Paper title
Like a bilingual baby: The advantage of visually grounding a bilingual language model
Authors
Abstract
Unlike most neural language models, humans learn language in a rich, multisensory, and often multilingual environment. Current language models typically fail to fully capture the complexities of multilingual language use. We train an LSTM language model on images and captions in English and Spanish from MS-COCO-ES. We find that visual grounding improves the model's understanding of semantic similarity both within and across languages, and that it improves perplexity. However, we find no significant advantage of visual grounding for abstract words. Our results provide additional evidence of the advantages of visually grounded language models and point to the need for more naturalistic language data from multilingual speakers and for multilingual datasets with perceptual grounding.