Title
Action recognition in real-world videos
Authors
Abstract
The goal of human action recognition is to temporally or spatially localize the human actions of interest in video sequences. Temporal localization (i.e., indicating the start and end frames of an action in a video) is referred to as frame-level detection. Spatial localization, which is more challenging, means identifying the pixels within each action frame that correspond to the action. This setting is usually referred to as pixel-level detection. In this chapter, we use the terms action, activity, and event interchangeably.
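To make the difference between the two settings concrete, the sketch below is an illustration only and not the chapter's method. It assumes frame-level annotations are given as one boolean per frame and pixel-level annotations as one boolean mask per frame; the helper names `frame_segments` and `frames_from_masks` are hypothetical. It shows how frame-level detection reduces to (start, end) frame intervals, and how a pixel-level mask implies the frame-level labels.

```python
from typing import List, Optional, Tuple


def frame_segments(frame_labels: List[bool]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame indices of each contiguous run of action frames."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, is_action in enumerate(frame_labels):
        if is_action and start is None:
            start = t  # the action begins at this frame
        elif not is_action and start is not None:
            segments.append((start, t - 1))  # the action ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(frame_labels) - 1))  # the action lasts until the final frame
    return segments


def frames_from_masks(masks: List[List[List[bool]]]) -> List[bool]:
    """Collapse per-frame pixel masks to frame labels: a frame counts as an action frame
    if at least one of its pixels is labeled as belonging to the action."""
    return [any(any(row) for row in mask) for mask in masks]


# Frame-level annotation: the action occurs in frames 1-2 and again in frame 5.
print(frame_segments([False, True, True, False, False, True]))  # [(1, 2), (5, 5)]

# Pixel-level annotation for three 2x2 frames; only frame 1 has an action pixel.
masks = [
    [[False, False], [False, False]],
    [[False, True], [False, False]],
    [[False, False], [False, False]],
]
print(frame_segments(frames_from_masks(masks)))  # [(1, 1)]
```

The direction of the conversion reflects why pixel-level detection is the harder problem: a pixel-level answer determines the frame-level one, but not the other way around.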