Paper title
Data Augmentation for Hypernymy Detection
Paper authors
Paper abstract
The automatic detection of hypernymy relationships represents a challenging problem in NLP. The successful application of state-of-the-art supervised approaches using distributed representations has generally been impeded by the limited availability of high-quality training data. We have developed two novel data augmentation techniques which generate new training examples from existing ones. First, we combine the linguistic principles of hypernym transitivity and intersective modifier-noun composition to generate additional pairs of vectors, such as "small dog - dog" or "small dog - animal", for which a hypernymy relationship can be assumed. Second, we use generative adversarial networks (GANs) to generate pairs of vectors for which the hypernymy relation can also be assumed. We furthermore present two complementary strategies for extending an existing dataset by leveraging linguistic resources such as WordNet. In an evaluation across three different hypernymy detection datasets and two different vector spaces, we demonstrate that both the proposed automatic data augmentation strategies and the dataset extension strategies substantially improve classifier performance.
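
The first augmentation technique can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `transitive_closure` and `augment`, the toy vectors, and in particular the use of vector addition as the composition function are all assumptions made for illustration, since the abstract does not specify how modifier-noun vectors are composed.

```python
import numpy as np


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are already known."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and a != d and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def augment(pairs, modifiers, vectors, compose=lambda m, n: m + n):
    """Return (hyponym_label, hypernym_label, hyponym_vec, hypernym_vec) tuples.

    `compose` defaults to vector addition, which is an assumption here;
    any other composition function can be passed in instead.
    """
    closed = transitive_closure(pairs)
    nouns = sorted({hypo for hypo, _ in closed})
    examples = []
    for mod in modifiers:
        for noun in nouns:
            phrase = f"{mod} {noun}"
            phrase_vec = compose(vectors[mod], vectors[noun])
            # Intersective composition: "small dog" is a kind of "dog".
            examples.append((phrase, noun, phrase_vec, vectors[noun]))
            # Transitivity: "small dog" is also a kind of each hypernym of "dog".
            for hypo, hyper in sorted(closed):
                if hypo == noun:
                    examples.append((phrase, hyper, phrase_vec, vectors[hyper]))
    return examples


# Toy 3-dimensional vectors (hypothetical values, not from any real vector space).
toy_vectors = {
    "small": np.array([0.1, 0.0, 0.2]),
    "dog": np.array([0.9, 0.3, 0.1]),
    "animal": np.array([0.7, 0.6, 0.2]),
}

new_examples = augment([("dog", "animal")], ["small"], toy_vectors)
labels = [(hypo, hyper) for hypo, hyper, _, _ in new_examples]
print(labels)  # [('small dog', 'dog'), ('small dog', 'animal')]
```

Each generated tuple can then be added to the positive training examples of a supervised hypernymy classifier alongside the original pairs.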