Paper Title
DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation
Paper Authors
Paper Abstract
This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a tenfold efficiency improvement over existing works without any performance degradation. Unlike current solutions that estimate every frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that watches only sparsely sampled frames, taking advantage of the continuity of human motion and a lightweight pose representation. Specifically, DeciWatch uniformly samples fewer than 10% of the video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the remaining frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation and body mesh recovery tasks across four datasets validate the efficiency and effectiveness of DeciWatch. Code is available at https://github.com/cure-lab/DeciWatch.
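To make the sample-denoise-recover pipeline concrete, below is a minimal PyTorch sketch of the idea as described in the abstract. This is not the authors' implementation (see the linked repository for that): the `SampleDenoiseRecover` class name, layer sizes, two-layer Transformer depths, and the linear-interpolation upsampling step are all illustrative assumptions.

```python
# Hypothetical sketch of a sample-denoise-recover pipeline; all module names
# and hyperparameters are assumptions, not the official DeciWatch code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampleDenoiseRecover(nn.Module):
    def __init__(self, pose_dim=34, d_model=128, sample_interval=10):
        super().__init__()
        self.sample_interval = sample_interval  # keep ~1 of every 10 frames
        self.embed = nn.Linear(pose_dim, d_model)
        # Denoising stage: a lightweight Transformer over the sparse,
        # noisy per-frame pose estimates.
        self.denoiser = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # Recovery stage: a second Transformer that refines the sequence
        # after it has been upsampled back to the full frame rate.
        self.recoverer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, sampled_poses, total_frames):
        # sampled_poses: (batch, num_sampled, pose_dim) estimates from a
        # per-frame pose estimator run only on the sparsely sampled frames.
        x = self.embed(sampled_poses)
        clean = self.denoiser(x)
        # Upsample the denoised sparse sequence to all frames; plain linear
        # interpolation stands in here for the paper's recovery mechanism.
        full = F.interpolate(clean.transpose(1, 2), size=total_frames,
                             mode="linear", align_corners=True).transpose(1, 2)
        return self.head(self.recoverer(full))  # poses for every frame

# Usage: 10 sampled frames stand in for a 100-frame clip
# (pose_dim=34 assumes 17 joints x 2D coordinates).
model = SampleDenoiseRecover()
sampled = torch.randn(1, 10, 34)
all_poses = model(sampled, total_frames=100)
print(all_poses.shape)  # torch.Size([1, 100, 34])
```

The key design point the sketch illustrates is that the expensive per-frame pose estimator runs on only ~10% of the frames; the denoising and recovery Transformers operate on compact pose vectors rather than images, which is where the claimed efficiency gain comes from.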