Paper Title
Improving Disfluency Detection by Self-Training a Self-Attentive Model
Paper Authors
Paper Abstract
Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word embeddings are pre-trained on a large amount of unlabeled data, using additional unlabeled data to train a neural model might seem redundant. However, we show that self-training - a semi-supervised technique for incorporating unlabeled data - sets a new state-of-the-art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations. We also show that ensembling self-trained parsers provides further gains for disfluency detection.
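The abstract describes self-training only at a high level. Below is a minimal, illustrative sketch of the generic self-training loop (train on labeled data, pseudo-label unlabeled data, keep confident predictions, retrain on the union). It is not the paper's method: the self-attentive parser is replaced by a simple scikit-learn classifier as a stand-in, and the toy token data, features, and confidence threshold are all hypothetical.

```python
# Minimal self-training sketch (illustrative stand-in, not the paper's parser).
# Steps: 1) train on labeled data, 2) pseudo-label unlabeled data,
#        3) keep only confident pseudo-labels, 4) retrain on the union.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled token-level data: 1 = disfluent (edited) token, 0 = fluent token.
labeled_tokens = ["i", "uh", "i", "mean", "we", "went", "home"]
labels         = [  1,    1,    1,      1,    0,      0,      0]

# Unlabeled tokens (in practice, entire unannotated speech transcripts).
unlabeled_tokens = ["you", "know", "we", "uh", "we", "left", "early"]

CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff for accepting pseudo-labels

# Character n-gram features as a crude stand-in for contextualized embeddings.
vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(1, 3))
X_labeled = vectorizer.fit_transform(labeled_tokens)

# 1) Train the base model on the labeled data.
model = LogisticRegression().fit(X_labeled, labels)

# 2) Pseudo-label the unlabeled data.
X_unlabeled = vectorizer.transform(unlabeled_tokens)
probs = model.predict_proba(X_unlabeled)

# 3) Keep only predictions the model is confident about.
confident = probs.max(axis=1) >= CONFIDENCE_THRESHOLD
pseudo_labels = probs.argmax(axis=1)[confident]

# 4) Retrain on labeled data plus the confident pseudo-labeled examples.
X_train = vstack([X_labeled, X_unlabeled[confident]])
y_train = list(labels) + list(pseudo_labels)
model = LogisticRegression().fit(X_train, y_train)
```

In the paper's setting, the "model" is the self-attentive parser and the pseudo-labels are its own disfluency annotations on unlabeled transcripts; ensembling then combines several such self-trained parsers.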