Paper Title


Pretraining on Interactions for Learning Grounded Affordance Representations

Authors

Jack Merullo, Dylan Ebert, Carsten Eickhoff, Ellie Pavlick

Abstract


Lexical semantics and cognitive science point to affordances (i.e. the actions that objects support) as critical for understanding and representing nouns and verbs. However, study of these semantic features has not yet been integrated with the "foundation" models that currently dominate language representation research. We hypothesize that predictive modeling of object state over time will result in representations that encode object affordance information "for free". We train a neural network to predict objects' trajectories in a simulated interaction and show that our network's latent representations differentiate between both observed and unobserved affordances. We find that models trained using 3D simulations from our SPATIAL dataset outperform conventional 2D computer vision models trained on a similar task, and, on initial inspection, that differences between concepts correspond to expected features (e.g., roll entails rotation). Our results suggest a way in which modern deep learning approaches to grounded language learning can be integrated with traditional formal semantic notions of lexical representations.
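The core idea of the abstract — pretrain a network to predict object state over time, then probe its latent representations for affordance information — can be illustrated with a toy sketch. The snippet below is not the paper's model or its SPATIAL dataset: it uses hypothetical two-dimensional states (position and rotation), a small numpy MLP as the trajectory predictor, and a linear probe that reads an affordance label (whether the object rolls) out of the mean hidden activation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_trajectory(rolls, T=20):
    # Toy state [x, theta]: rolling objects rotate as they translate,
    # non-rolling objects slide with zero rotation.
    x = np.cumsum(rng.normal(0.1, 0.01, T))
    theta = x.copy() if rolls else np.zeros(T)
    return np.stack([x, theta], axis=1)

# Dataset: trajectories with a per-trajectory affordance label (rolls or not).
trajs = [(make_trajectory(rolls), rolls)
         for rolls in (True, False) for _ in range(20)]
X = np.concatenate([t[:-1] for t, _ in trajs])  # state at time t
Y = np.concatenate([t[1:] for t, _ in trajs])   # state at time t+1

# One-hidden-layer next-state predictor; the hidden layer is the "latent".
H = 8
W1 = rng.normal(0, 0.1, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 2)); b2 = np.zeros(2)

def forward(S):
    Z = np.tanh(S @ W1 + b1)
    return Z, Z @ W2 + b2

lr = 0.05
for _ in range(500):  # plain gradient descent on mean squared error
    Z, P = forward(X)
    G = (P - Y) / len(X)
    W2 -= lr * Z.T @ G; b2 -= lr * G.sum(0)
    dZ = (G @ W2.T) * (1 - Z**2)
    W1 -= lr * X.T @ dZ; b1 -= lr * dZ.sum(0)

# Linear probe: recover the affordance label from the mean latent state.
feats = np.stack([forward(t)[0].mean(0) for t, _ in trajs])
labels = np.array([float(r) for _, r in trajs])
design = np.c_[feats, np.ones(len(feats))]
w, *_ = np.linalg.lstsq(design, labels, rcond=None)
acc = ((design @ w > 0.5) == labels.astype(bool)).mean()
```

Because rolling changes the rotation component of the state, the latent activations separate the two object classes and the probe's accuracy is high, mirroring the paper's finding that predictive pretraining encodes affordance distinctions "for free".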
