Paper Title

Mapping Natural Language Instructions to Mobile UI Action Sequences

Authors

Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge

Abstract

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PIXELHELP, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in HowTo instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PIXELHELP.
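The grounding step described above can be illustrated with a small sketch: each UI object is represented by both its content and its screen position, the set of objects is contextualized with a Transformer encoder, and each object is scored against an encoded action-phrase description. The following PyTorch snippet is a minimal sketch of that idea only, not the paper's actual model; the class name GroundingSketch, all dimensions, and the dot-product scoring head are illustrative assumptions.

```python
# Minimal sketch of the grounding idea from the abstract (illustrative only):
# UI objects are embedded from their content and normalized screen position,
# contextualized with a Transformer encoder, and matched against an encoded
# action-phrase description. Names and dimensions are assumptions, not the
# paper's architecture.
import torch
import torch.nn as nn


class GroundingSketch(nn.Module):
    def __init__(self, vocab_size=10000, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.content_emb = nn.Embedding(vocab_size, d_model)
        # Screen position: normalized (left, top, right, bottom) -> d_model.
        self.pos_proj = nn.Linear(4, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, content_ids, bboxes, phrase_vec):
        # content_ids: (batch, n_objects)    token id summarizing each object
        # bboxes:      (batch, n_objects, 4) normalized screen coordinates
        # phrase_vec:  (batch, d_model)      encoded action-phrase description
        obj = self.content_emb(content_ids) + self.pos_proj(bboxes)
        ctx = self.encoder(obj)                  # contextual object representations
        scores = torch.einsum("bod,bd->bo", ctx, phrase_vec)
        return scores.softmax(dim=-1)            # distribution over UI objects


model = GroundingSketch()
ids = torch.randint(0, 10000, (1, 6))    # 6 UI objects on one screen (toy data)
boxes = torch.rand(1, 6, 4)              # toy normalized bounding boxes
phrase = torch.randn(1, 128)             # stand-in for the phrase encoding
print(model(ids, boxes, phrase).shape)   # torch.Size([1, 6])
```

The dot-product scoring over contextual object representations is one simple way to connect object descriptions to on-screen objects; under this sketch, the object with the highest score would be selected as the target of the predicted action.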
