Paper Title

Mapping the Timescale Organization of Neural Language Models

Paper Authors

Hsiang-Yun Sherry Chien, Jinhan Zhang, Christopher J. Honey

Paper Abstract

In the human brain, sequences of language input are processed within a distributed and hierarchical architecture, in which higher stages of processing encode contextual information over longer timescales. In contrast, in recurrent neural networks which perform natural language processing, we know little about how the multiple timescales of contextual information are functionally organized. Therefore, we applied tools developed in neuroscience to map the "processing timescales" of individual units within a word-level LSTM language model. This timescale-mapping method assigned long timescales to units previously found to track long-range syntactic dependencies. Additionally, the mapping revealed a small subset of the network (less than 15% of units) with long timescales and whose function had not previously been explored. We next probed the functional organization of the network by examining the relationship between the processing timescale of units and their network connectivity. We identified two classes of long-timescale units: "controller" units composed a densely interconnected subnetwork and strongly projected to the rest of the network, while "integrator" units showed the longest timescales in the network, and expressed projection profiles closer to the mean projection profile. Ablating integrator and controller units affected model performance at different positions within a sentence, suggesting distinctive functions of these two sets of units. Finally, we tested the generalization of these results to a character-level LSTM model and models with different architectures. In summary, we demonstrated a model-free technique for mapping the timescale organization in recurrent neural networks, and we applied this method to reveal the timescale and functional organization of neural language models.
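The abstract does not spell out the timescale-mapping procedure, so the sketch below is only a rough, hypothetical illustration (not the authors' code) of the general idea of probing individual units' dependence on distant context. It compares hidden-unit activations of a toy PyTorch word-level LSTM language model when the preceding context is kept intact versus shuffled; units whose activations change most are the ones most sensitive to long-range context. The model class, function names, and perturbation scheme (TinyLSTMLM, context_sensitivity, probe_len) are assumptions made for this example.

# Hypothetical sketch: probe per-unit context sensitivity in an LSTM language model.
# This is NOT the authors' implementation; the model, tokenization, and the
# shuffle-based perturbation are illustrative assumptions.

import torch
import torch.nn as nn


class TinyLSTMLM(nn.Module):
    """Minimal word-level LSTM language model (stand-in for the model studied)."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # Hidden states for every timestep: (batch, seq_len, hidden_dim)
        h, _ = self.lstm(self.embed(tokens))
        return self.decoder(h), h


def context_sensitivity(model, tokens, probe_len=10, n_shuffles=20):
    """Score each hidden unit by how much its activations on the final
    `probe_len` tokens change when the preceding context is shuffled.
    High scores indicate units that depend on long-range context, a crude
    proxy for the long processing timescales discussed in the paper."""
    model.eval()
    with torch.no_grad():
        _, h_intact = model(tokens)
        probe_intact = h_intact[:, -probe_len:, :]           # (1, probe_len, hidden)

        context, probe = tokens[:, :-probe_len], tokens[:, -probe_len:]
        diffs = []
        for _ in range(n_shuffles):
            perm = torch.randperm(context.shape[1])
            shuffled = torch.cat([context[:, perm], probe], dim=1)
            _, h_shuf = model(shuffled)
            probe_shuf = h_shuf[:, -probe_len:, :]
            diffs.append((probe_intact - probe_shuf).abs().mean(dim=(0, 1)))

    # One context-sensitivity score per hidden unit
    return torch.stack(diffs).mean(dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyLSTMLM()
    tokens = torch.randint(0, 1000, (1, 60))    # one toy "sentence" of 60 tokens
    scores = context_sensitivity(model, tokens)
    top_units = scores.argsort(descending=True)[:10]
    print("Most context-sensitive hidden units:", top_units.tolist())

Units with high scores under a probe like this would be candidates for the small long-timescale subset described in the abstract; the paper goes further by relating unit timescales to network connectivity (the "controller" and "integrator" classes) and by ablating those units, neither of which this sketch attempts.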
