Paper Title

CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning

Authors

Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon

Abstract

Approaches to Grounded Language Learning typically focus on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict salient attributes or to generalise to unseen situations. To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes with three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations, in particular concerning attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with abstract and situated attributes. By using diagnostic classifiers, we show that current models learn representations that are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they do not learn strategies nor representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (zero-shot best accuracy 50.06%).
