Paper Title
Pay Attention to Hidden States for Video Deblurring: Ping-Pong Recurrent Neural Networks and Selective Non-Local Attention
Paper Authors
Paper Abstract
Video deblurring models exploit information in neighboring frames to remove blur caused by camera and object motion. Recurrent Neural Networks (RNNs) are often adopted to model the temporal dependency between frames via hidden states. When motion blur is strong, however, hidden states struggle to deliver proper information because of the displacement between frames. While there have been attempts to update the hidden states, it is difficult to handle misaligned features beyond the receptive field of simple modules. Thus, we propose two modules to supplement the RNN architecture for video deblurring. First, we design the Ping-Pong RNN (PPRNN), which updates the hidden states by referring to the features from the current and the previous time steps alternately. PPRNN gathers relevant information from both features in an iterative and balanced manner by utilizing its recurrent architecture. Second, we use a Selective Non-Local Attention (SNLA) module to additionally refine the hidden state by aligning it with the positional information from the input frame feature. The attention score is scaled by the relevance to the input feature to focus on the necessary information. By paying attention to hidden states with both modules, which have strong synergy, our PAHS framework improves the representational power of RNN structures and achieves state-of-the-art deblurring performance on standard benchmarks and real-world videos.
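The two ideas in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the abstract gives no layer definitions, so everything below is an assumption made for illustration. Features are flattened to shape (N, C), with N spatial positions and C channels. The ping-pong update is a plain tanh layer applied alternately to the current and previous frame features. The relevance scaling in the attention step is taken to be a sigmoid of the raw similarity. The function names, the weight matrices `w_cur` and `w_prev`, and the parameter `num_iters` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def ping_pong_update(hidden, feat_prev, feat_cur, w_prev, w_cur, num_iters=2):
    """Alternately refine the hidden state with current and previous features.

    hidden, feat_prev, feat_cur: arrays of shape (N, C).
    w_prev, w_cur: hypothetical weights of shape (2C, C).
    """
    for _ in range(num_iters):
        # "Ping": fuse the hidden state with the current-frame feature.
        hidden = np.tanh(np.concatenate([hidden, feat_cur], axis=-1) @ w_cur)
        # "Pong": fuse the result with the previous-frame feature.
        hidden = np.tanh(np.concatenate([hidden, feat_prev], axis=-1) @ w_prev)
    return hidden


def selective_nonlocal_attention(hidden, feat_in):
    """Align the hidden state to the input feature with gated non-local attention.

    Queries come from the input feature, keys and values from the hidden state.
    The sigmoid gate is an assumed form of the relevance scaling.
    """
    n, c = hidden.shape
    scores = feat_in @ hidden.T / np.sqrt(c)      # (N, N) similarity
    attn = softmax(scores, axis=-1)               # standard non-local weights
    relevance = 1.0 / (1.0 + np.exp(-scores))     # assumed relevance gate in (0, 1)
    aligned = (attn * relevance) @ hidden         # (N, C) re-aligned hidden state
    return hidden + aligned                       # residual refinement
```

In this sketch the hidden state would first pass through `ping_pong_update` and then through `selective_nonlocal_attention`, mirroring the order in which the abstract introduces the two modules. A real model would use convolutional layers on (C, H, W) feature maps and learned projections for queries, keys, and values.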