Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation
Abstract
A new framework using a teacher model and segmentally smoothing loss improves timestamp-supervised temporal action segmentation by mitigating noise and enhancing label stability, achieving competitive performance with lower annotation costs.
Temporal action segmentation in videos has drawn much attention recently. Timestamp supervision is a cost-effective way for this task. To obtain more information to optimize the model, the existing method generated pseudo frame-wise labels iteratively based on the output of a segmentation model and the timestamp annotations. However, this practice may introduce noise and oscillation during the training, and lead to performance degeneration. To address this problem, we propose a new framework for timestamp supervised temporal action segmentation by introducing a teacher model parallel to the segmentation model to help stabilize the process of model optimization. The teacher model can be seen as an ensemble of the segmentation model, which helps to suppress the noise and to improve the stability of pseudo labels. We further introduce a segmentally smoothing loss, which is more focused and cohesive, to enforce the smooth transition of the predicted probabilities within action instances. The experiments on three datasets show that our method outperforms the state-of-the-art method and performs comparably against the fully-supervised methods at a much lower annotation cost.
Models citing this paper 1
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper