Paper Title

Learning Sequence Representations by Non-local Recurrent Neural Memory

Paper Authors

Wenjie Pei, Xin Feng, Canmiao Fu, Qiong Cao, Guangming Lu, Yu-Wing Tai

Paper Abstract

The key challenge of sequence representation learning is to capture the long-range temporal dependencies. Typical methods for supervised sequence representation learning are built upon recurrent neural networks to capture temporal dependencies. One potential limitation of these methods is that they only model one-order information interactions explicitly between adjacent time steps in a sequence, hence the high-order interactions between nonadjacent time steps are not fully exploited. This greatly limits the capability of modeling long-range temporal dependencies, since the temporal features learned by one-order interactions cannot be maintained for a long term due to temporal information dilution and gradient vanishing. To tackle this limitation, we propose the Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning, which performs non-local operations by means of a self-attention mechanism to learn full-order interactions within a sliding temporal memory block, and models global interactions between memory blocks in a gated recurrent manner. Consequently, our model is able to capture long-range dependencies. Besides, the latent high-level features contained in high-order interactions can be distilled by our model. We validate the effectiveness and generalization of our NRNM on three types of sequence applications across different modalities, including sequence classification, step-wise sequential prediction and sequence similarity learning. Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.

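The abstract describes two components: non-local (self-attention) operations over the hidden states inside a sliding temporal memory block, and a gated recurrent update that passes information between successive memory blocks. The sketch below illustrates this idea in PyTorch; it is a minimal, hypothetical reading of the abstract rather than the authors' implementation, and the module name NonLocalMemoryBlock, the LSTM backbone, the mean-pooling step and the gating form are all assumptions.

```python
# A minimal, hypothetical sketch of the mechanism described above, assuming an LSTM
# backbone: self-attention ("non-local operations") over the hidden states inside a
# sliding memory block, followed by a gated update that carries the distilled memory
# forward. Names and hyperparameters are illustrative, not the authors' code.
import torch
import torch.nn as nn


class NonLocalMemoryBlock(nn.Module):
    def __init__(self, hidden_dim, num_heads=4):
        super().__init__()
        # Full-order interactions within one memory block via self-attention.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Gate fusing the distilled block memory with the carried recurrent state.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, block_states, prev_memory):
        # block_states: (batch, block_len, hidden_dim) hidden states in the sliding window
        # prev_memory:  (batch, hidden_dim) memory carried over from the previous block
        attended, _ = self.attn(block_states, block_states, block_states)
        distilled = attended.mean(dim=1)  # collapse the block into a single memory vector
        g = torch.sigmoid(self.gate(torch.cat([distilled, prev_memory], dim=-1)))
        return g * distilled + (1.0 - g) * prev_memory  # gated recurrent memory update


if __name__ == "__main__":
    # Usage sketch: slide the memory block over the hidden states of an LSTM.
    batch, seq_len, dim, block_len = 2, 20, 64, 5
    lstm = nn.LSTM(dim, dim, batch_first=True)
    nrnm = NonLocalMemoryBlock(dim)

    x = torch.randn(batch, seq_len, dim)
    hidden_states, _ = lstm(x)                          # (batch, seq_len, dim)
    memory = hidden_states[:, :block_len].mean(dim=1)   # initialize from the first block
    for t in range(block_len, seq_len - block_len + 1, block_len):
        memory = nrnm(hidden_states[:, t:t + block_len], memory)
    print(memory.shape)  # torch.Size([2, 64])
```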