Paper Title

Active Learning of Non-semantic Speech Tasks with Pretrained Models

Paper Authors

Harlin Lee, Aaqib Saeed, Andrea L. Bertozzi

Paper Abstract

Pretraining neural networks with massive unlabeled datasets has become popular as it equips the deep models with a better prior to solve downstream tasks. However, this approach generally assumes that the downstream tasks have access to annotated data of sufficient size. In this work, we propose ALOE, a novel system for improving the data- and label-efficiency of non-semantic speech tasks with active learning. ALOE uses pretrained models in conjunction with active learning to label data incrementally and learn classifiers for downstream tasks, thereby mitigating the need to acquire labeled data beforehand. We demonstrate the effectiveness of ALOE on a wide range of tasks, uncertainty-based acquisition functions, and model architectures. Training a linear classifier on top of a frozen encoder with ALOE is shown to achieve performance similar to several baselines that utilize the entire labeled data.
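
To make the pipeline the abstract describes concrete, below is a minimal sketch of an uncertainty-based active-learning loop: a linear classifier trained on embeddings from a frozen encoder, with a predictive-entropy acquisition function choosing which points to label next. The synthetic embeddings, the entropy scorer, the scikit-learn logistic-regression classifier, and the labeling budget are all illustrative assumptions, not ALOE's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for embeddings produced by a frozen pretrained speech encoder;
# in the real system these would be the pretrained model's outputs.
n_samples, emb_dim, n_classes = 2000, 64, 4
centers = rng.normal(size=(n_classes, emb_dim))
labels = rng.integers(0, n_classes, size=n_samples)
embeddings = centers[labels] + rng.normal(scale=2.0, size=(n_samples, emb_dim))

# Seed the labeled pool with one example per class; the rest stay unlabeled.
labeled = [int(np.flatnonzero(labels == c)[0]) for c in range(n_classes)]
unlabeled = [i for i in range(n_samples) if i not in set(labeled)]

def entropy_acquisition(probs):
    """Predictive entropy: higher score = more uncertain prediction."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

budget_per_round, n_rounds = 20, 10
for round_idx in range(n_rounds):
    # Train a linear classifier on top of the frozen embeddings labeled so far.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings[labeled], labels[labeled])

    # Score the unlabeled pool with the uncertainty-based acquisition function.
    probs = clf.predict_proba(embeddings[unlabeled])
    scores = entropy_acquisition(probs)

    # "Query the oracle" for the most uncertain points (here: ground truth).
    picked = np.argsort(scores)[-budget_per_round:]
    newly_labeled = [unlabeled[i] for i in picked]
    labeled.extend(newly_labeled)
    unlabeled = [i for i in unlabeled if i not in set(newly_labeled)]

    acc = clf.score(embeddings, labels)
    print(f"round {round_idx}: labeled={len(labeled)}, accuracy={acc:.3f}")
```

Because labels are requested incrementally and only for the most uncertain points, the number of queried labels stays a small fraction of the pool, which is the label-efficiency benefit the abstract claims.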
