Paper Title

SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences

Paper Authors

Peng Qi, Guangtao Wang, Jing Huang

Paper Abstract


Distilling supervision signal from a long sequence to make predictions is a challenging task in machine learning, especially when not all elements in the input sequence contribute equally to the desired output. In this paper, we propose SpanDrop, a simple and effective data augmentation technique that helps models identify the true supervision signal in a long sequence with very few examples. By directly manipulating the input sequence, SpanDrop randomly ablates parts of the sequence at a time and asks the model to perform the same task, to emulate counterfactual learning and achieve input attribution. Based on theoretical analysis of its properties, we also propose a variant of SpanDrop based on the beta-Bernoulli distribution, which yields diverse augmented sequences while providing a learning objective that is more consistent with the original dataset. We demonstrate the effectiveness of SpanDrop on a set of carefully designed toy tasks, as well as on various natural language processing tasks that require reasoning over long sequences to arrive at the correct answer, and show that it helps models improve performance both when data is scarce and when it is abundant.
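The core mechanism described in the abstract can be sketched in a few lines. Below is a minimal, hedged illustration of the two variants — independent Bernoulli span dropping and the beta-Bernoulli variant, which first samples a sequence-level drop rate so that augmented sequences vary in how much is removed. The function names and the Beta parameterization (mean `p` with a concentration knob) are my own assumptions for illustration, not the authors' implementation.

```python
import random

def span_drop(spans, p=0.1, rng=random):
    """Bernoulli SpanDrop: independently drop each span with probability p,
    producing an ablated copy of the input sequence for augmentation."""
    return [s for s in spans if rng.random() >= p]

def beta_span_drop(spans, p=0.1, concentration=8.0, rng=random):
    """Beta-Bernoulli variant: sample one sequence-level drop rate q from a
    Beta distribution with mean p (parameterization assumed here), then drop
    each span with probability q. This yields more diverse augmented lengths
    than the fixed-rate Bernoulli scheme above."""
    q = rng.betavariate(concentration * p, concentration * (1 - p))
    return [s for s in spans if rng.random() >= q]
```

During training, each augmented copy would keep the original task label, which is consistent only when the dropped spans are unlikely to contain the true supporting evidence — the counterfactual assumption the paper's analysis addresses.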
