Paper Title

The Architectural Bottleneck Principle

Authors

Tiago Pimentel, Josef Valvoda, Niklas Stoehr, Ryan Cotterell

Abstract

In this paper, we seek to measure how much information a component in a neural network could extract from the representations fed into it. Our work stands in contrast to prior probing work, most of which investigates how much information a model's representations contain. This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component. Relying on this principle, we estimate how much syntactic information is available to transformers through our attentional probe, a probe that exactly resembles a transformer's self-attention head. Experimentally, we find that, in three models (BERT, ALBERT, and RoBERTa), a sentence's syntax tree is mostly extractable by our probe, suggesting these models have access to syntactic information while composing their contextual representations. Whether this information is actually used by these models, however, remains an open question.
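The abstract describes an "attentional probe" whose form mirrors a transformer self-attention head. The following is a minimal sketch of what such a probe could look like, assuming a standard query/key scaled dot-product head whose scores are read as logits over candidate syntactic heads for each word; the class name `AttentionalProbe`, the chosen dimensions, and the usage shown are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an attention-style probe: query/key projections followed by
# scaled dot-product scores, matching the shape of a self-attention head.
# All names and dimensions below are illustrative assumptions.
import math
import torch
import torch.nn as nn


class AttentionalProbe(nn.Module):
    def __init__(self, hidden_dim: int, head_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, head_dim)  # query projection, as in a self-attention head
        self.key = nn.Linear(hidden_dim, head_dim)    # key projection
        self.scale = math.sqrt(head_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (sentence_length, hidden_dim) contextual representations
        q = self.query(hidden_states)
        k = self.key(hidden_states)
        # Scaled dot-product scores, shape (sentence_length, sentence_length);
        # row i is read here as logits over candidate syntactic heads for word i.
        return q @ k.transpose(-2, -1) / self.scale


# Illustrative usage: score candidate heads for a 5-word sentence with
# 768-dimensional contextual representations (dimensions chosen arbitrarily).
probe = AttentionalProbe(hidden_dim=768, head_dim=64)
reps = torch.randn(5, 768)
arc_logits = probe(reps)
predicted_heads = arc_logits.argmax(dim=-1)  # highest-scoring candidate head per word
print(predicted_heads)
```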
