Paper Title
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs
Paper Authors
Paper Abstract
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models in zero-shot evaluation on various downstream language reasoning tasks. Since these improvements are reported in aggregate, however, little is known about (i) how to select the appropriate knowledge for solid performance across tasks, (ii) how to combine this knowledge with neural language models, and (iii) how these pairings affect granular task performance. In this paper, we study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models. We study the effect of different synthetic datasets on language models with various architectures and sizes. The resulting models are evaluated against four task properties: domain overlap, answer similarity, vocabulary overlap, and answer length. Our experiments show that encoder-decoder models benefit from more data to learn from, whereas sampling strategies that balance across different aspects yield the best performance. Most of the improvement occurs on questions with short answers and dissimilar answer candidates, which corresponds to the characteristics of the data used for pre-training.
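To make the pipeline described in the abstract concrete, below is a minimal sketch of sampling knowledge-graph triples and converting them into synthetic multiple-choice questions for language-model adaptation. All names, templates, and the toy triple set are hypothetical illustrations, not the paper's actual data or code; the `sample_random` and `sample_balanced` functions stand in for the "sampling strategies" contrasted in the experiments.

```python
import random
from collections import defaultdict

# Hypothetical toy knowledge graph: (head, relation, tail) triples.
TRIPLES = [
    ("knife", "UsedFor", "cutting"),
    ("knife", "AtLocation", "kitchen"),
    ("oven", "UsedFor", "baking"),
    ("oven", "AtLocation", "kitchen"),
    ("book", "UsedFor", "reading"),
    ("book", "AtLocation", "library"),
]

def sample_random(triples, k, rng):
    """Uniform sampling: ignores how relations are distributed."""
    return rng.sample(triples, min(k, len(triples)))

def sample_balanced(triples, k, rng):
    """Balanced sampling: draws roughly equally from each relation,
    one way of 'balancing across different aspects' of the graph."""
    by_relation = defaultdict(list)
    for t in triples:
        by_relation[t[1]].append(t)
    per_relation = max(1, k // len(by_relation))
    sampled = []
    for group in by_relation.values():
        sampled.extend(rng.sample(group, min(per_relation, len(group))))
    return sampled[:k]

def to_synthetic_qa(triple, triples, rng, num_distractors=2):
    """Turn one triple into a synthetic multiple-choice example.
    Distractors are tails of other triples sharing the same relation."""
    head, relation, tail = triple
    pool = [t[2] for t in triples if t[1] == relation and t[2] != tail]
    distractors = rng.sample(pool, min(num_distractors, len(pool)))
    candidates = [tail] + distractors
    return {
        "question": f"{head} {relation} ___?",
        "answer": tail,
        "candidates": rng.sample(candidates, len(candidates)),  # shuffled
    }

rng = random.Random(0)
for triple in sample_balanced(TRIPLES, 4, rng):
    print(to_synthetic_qa(triple, TRIPLES, rng))
```

Varying the sampler (`sample_random` vs. `sample_balanced`) and the budget `k` yields the different synthetic datasets whose effects on models of various architectures and sizes the paper evaluates.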