Paper Title

Extracting and Validating Explanatory Word Archipelagoes using Dual Entropy

Author

Ohsawa, Yukio

Abstract

The logical connectivity of a text is represented by the connectivity of words that form archipelagoes. Here, each archipelago is a sequence of islands of the occurrences of a certain word, and an island is a local sequence of sentences in which the word is emphasized. An archipelago of a length comparable to the target text is extracted using the co-variation of entropy A (the window-based entropy), computed on the distribution of the word's occurrences, with the width of each time window. The logical connectivity of the text is then evaluated with entropy B (the graph-based entropy), computed on the distribution of sentences over connected word clusters obtained from the co-occurrence of words. The results show that the parts of the target text containing words that form archipelagoes extracted with entropy A, without learned or prepared knowledge, form an explanatory part of the text with smaller entropy B than the parts extracted by the baseline methods.
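
The abstract does not give formulas for the two entropies, so the following Python is only an illustrative sketch under stated assumptions, not the author's method. It assumes entropy A is the Shannon entropy of a word's occurrence counts over fixed-width windows of sentence indices, and that entropy B is the Shannon entropy of how sentences distribute over connected components of a word co-occurrence graph. The function names, the `min_cooc` threshold, and the rule that assigns each sentence to its best-overlapping cluster are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def shannon(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def window_entropy(positions, width):
    """Assumed form of entropy A: entropy of a word's occurrences
    (sentence indices) over consecutive windows of `width` sentences."""
    return shannon(Counter(p // width for p in positions).values())


def word_clusters(sentences, min_cooc=2):
    """Connected components of the graph linking words that co-occur
    in at least `min_cooc` sentences (hypothetical threshold)."""
    pair_counts = Counter()
    for s in sentences:
        pair_counts.update(combinations(sorted(set(s)), 2))

    parent = {}

    def find(w):
        parent.setdefault(w, w)
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (a, b), c in pair_counts.items():
        if c >= min_cooc:
            parent[find(a)] = find(b)

    clusters = {}
    for w in list(parent):
        clusters.setdefault(find(w), set()).add(w)
    return list(clusters.values())


def graph_entropy(sentences, clusters):
    """Assumed form of entropy B: entropy of sentences over clusters,
    each sentence going to the cluster it shares the most words with."""
    counts = Counter()
    for s in sentences:
        overlaps = [len(set(s) & c) for c in clusters]
        if overlaps and max(overlaps) > 0:
            counts[overlaps.index(max(overlaps))] += 1
    return shannon(counts.values())
```

Under these assumptions, a word whose occurrences all fall in one window has window entropy 0, while occurrences spread evenly over four windows give 2 bits. Recomputing `window_entropy` for several values of `width` is one way to observe how the entropy varies with window width, which is the quantity the abstract says is used for extraction.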
