Paper Title

How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?

Paper Authors

Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal

Paper Abstract

Long short-term memory (LSTM) networks and their variants are capable of encapsulating long-range dependencies, which is evident from their performance on a variety of linguistic tasks. On the other hand, simple recurrent networks (SRNs), which appear more biologically grounded in terms of synaptic connections, have generally been less successful at capturing long-range dependencies as well as the loci of grammatical errors in an unsupervised setting. In this paper, we seek to develop models that bridge the gap between biological plausibility and linguistic competence. We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations and models the excitatory and inhibitory connections in a population of neurons. Besides its biological inspiration, our model also shows competitive performance relative to LSTMs on subject-verb agreement, sentence grammaticality, and language modeling tasks. These results provide some pointers towards probing the nature of the inductive biases required for RNN architectures to model linguistic phenomena successfully.
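The abstract describes the Decay RNN only at a high level. As an illustration, the snippet below is a minimal PyTorch sketch of what a decay-style recurrent update could look like: a learnable coefficient alpha leaky-integrates the previous hidden state with a standard SRN-style candidate. The exact update rule, how alpha is parameterised, and the excitatory/inhibitory weight constraints used in the paper are assumptions here, not the authors' released implementation.

import torch
import torch.nn as nn

class DecayRNNCell(nn.Module):
    """Hypothetical decay-style recurrent cell: the hidden state is a convex
    combination of its previous value and a plain recurrent candidate,
    weighted by a learnable decay coefficient alpha in (0, 1)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_x = nn.Linear(input_size, hidden_size)                # input-to-hidden
        self.W_h = nn.Linear(hidden_size, hidden_size, bias=False)   # hidden-to-hidden
        self.alpha_raw = nn.Parameter(torch.zeros(1))                # unconstrained; sigmoid gives decay rate

    def forward(self, x_t, h_prev):
        alpha = torch.sigmoid(self.alpha_raw)                        # decay coefficient in (0, 1)
        candidate = torch.tanh(self.W_x(x_t) + self.W_h(h_prev))     # standard SRN-style update
        # Leaky integration: the old activation decays while new input is mixed in.
        return alpha * h_prev + (1.0 - alpha) * candidate

# Toy usage: roll the cell over a short random sequence.
cell = DecayRNNCell(input_size=8, hidden_size=16)
h = torch.zeros(1, 16)
for _ in range(5):
    h = cell(torch.randn(1, 8), h)
print(h.shape)  # torch.Size([1, 16])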
