Model-Free Prediction of Partially Observable Spatiotemporal Chaotic Systems

Paper Title

Selective Query-guided Debiasing for Video Corpus Moment Retrieval

Authors

Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo

Abstract

Video moment retrieval (VMR) aims to localize target moments in untrimmed videos pertinent to a given textual query. Existing retrieval systems tend to rely on retrieval bias as a shortcut and thus fail to sufficiently learn multi-modal interactions between query and video. This retrieval bias stems from learning frequent co-occurrence patterns between queries and moments, which spuriously correlate objects referred to in the query (e.g., a pencil) with moments where those objects frequently appear in the video (e.g., a scene of writing with a pencil), such that the systems converge on biased moment predictions. Although recent debiasing methods have focused on removing this retrieval bias, we argue that these biased predictions should sometimes be preserved, because for many queries the biased predictions are in fact helpful. To make use of this retrieval bias, we propose a Selective Query-guided Debiasing network (SQuiDNet), which incorporates the following two main properties: (1) Biased Moment Retrieval, which intentionally uncovers the biased moments inherent in objects of the query, and (2) Selective Query-guided Debiasing, which performs selective debiasing guided by the meaning of the query. Our experimental results on three moment retrieval benchmarks (i.e., TVR, ActivityNet, DiDeMo) show the effectiveness of SQuiDNet, and qualitative analysis shows improved interpretability.
