Paper Title

Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans

Authors

Yair Lakretz, Dieuwke Hupkes, Alessandra Vergallito, Marco Marelli, Marco Baroni, Stanislas Dehaene

Abstract

Recursive processing in sentence comprehension is considered a hallmark of human linguistic abilities. However, its underlying neural mechanisms remain largely unknown. We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing, namely the storing of grammatical number and gender information in working memory and its use in long-distance agreement (e.g., capturing the correct number agreement between subject and verb when they are separated by other phrases). Although the network, a recurrent architecture with Long Short-Term Memory units, was solely trained to predict the next word in a large corpus, analysis showed the emergence of a very sparse set of specialized units that successfully handled local and long-distance syntactic agreement for grammatical number. However, the simulations also showed that this mechanism does not support full recursion and fails with some long-range embedded dependencies. We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns, with or without embedding. Human and model error patterns were remarkably similar, showing that the model echoes various effects observed in human data. However, a key difference was that, with embedded long-range dependencies, humans remained above chance level, while the model's systematic errors brought it below chance. Overall, our study shows that exploring the ways in which modern artificial neural networks process sentences leads to precise and testable hypotheses about human linguistic performance.
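The abstract describes probing a next-word LSTM language model for long-distance subject-verb number agreement: after a prefix whose subject is separated from the verb by an "attractor" noun of the opposite number, one compares the probabilities the model assigns to the number-matched and mismatched verb forms. Below is a minimal, hypothetical PyTorch sketch of that kind of probe. The toy vocabulary, the `LSTMLanguageModel` class, and the `prefers_correct_verb` helper are illustrative assumptions, not the authors' actual model, which was a large LSTM trained on a large corpus.

```python
# Minimal sketch (not the authors' code): probing an LSTM next-word
# language model for long-distance subject-verb number agreement.
import torch
import torch.nn as nn

# Toy vocabulary; a real study would use a corpus-scale vocabulary.
vocab = ["<unk>", "the", "boy", "boys", "near", "car", "cars",
         "greets", "greet"]
idx = {w: i for i, w in enumerate(vocab)}

class LSTMLanguageModel(nn.Module):
    """A next-word prediction LM: embedding -> LSTM -> softmax logits."""
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)  # logits over next-word candidates

def prefers_correct_verb(model, prefix, correct, wrong):
    """True if the model assigns a higher next-word score to the
    number-matched verb form than to the mismatched one."""
    ids = torch.tensor([[idx[w] for w in prefix]])
    with torch.no_grad():
        logits = model(ids)[0, -1]  # prediction after the full prefix
    return (logits[idx[correct]] > logits[idx[wrong]]).item()

model = LSTMLanguageModel(len(vocab))  # untrained here, illustrative only
# Long-distance case with a plural attractor ("cars") intervening:
# "the boy near the cars ___" should still prefer the singular "greets".
prefix = ["the", "boy", "near", "the", "cars"]
print(prefers_correct_verb(model, prefix, correct="greets", wrong="greet"))
```

Aggregating this comparison over many such sentences, with the singular/plural status of each noun varied systematically, yields the model error patterns that the paper contrasts with human violation-detection performance.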
