Improving Named Entity Recognition in Telephone Conversations via Effective Active Learning with Human in the Loop

Paper Title

Improving Named Entity Recognition in Telephone Conversations via Effective Active Learning with Human in the Loop

Authors

Laskar, Md Tahmid Rahman; Chen, Cheng; Fu, Xue-Yong; TN, Shashi Bhushan

Abstract

Telephone transcription data can be very noisy due to speech recognition errors, disfluencies, etc. Not only is annotating such data very challenging for annotators, but the data may also contain many annotation errors even after the annotation job is completed, resulting in very poor model performance. In this paper, we present an active learning framework that leverages human-in-the-loop learning to identify the samples in an annotated dataset that are more likely to contain annotation errors, so that they can be re-annotated. In this way, we largely reduce the need to re-annotate the whole dataset. We conduct extensive experiments with our proposed approach for Named Entity Recognition and observe that by re-annotating only about 6% of the training instances in the whole dataset, the F1 score for a certain entity type can be significantly improved by about 25%.
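
The abstract does not say how samples are chosen for re-annotation. As an illustration only, the sketch below uses one common heuristic, which is an assumption and not necessarily the authors' method: train a model on the noisy annotations, then flag instances where the model confidently disagrees with the existing labels. All names here (`Instance`, `disagreement_score`, `select_for_reannotation`) are hypothetical, and the default `budget_fraction` of 0.06 simply mirrors the "about 6%" figure quoted above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Instance:
    tokens: List[str]
    gold_tags: List[str]     # existing, possibly noisy, annotation
    pred_tags: List[str]     # tags predicted by a model trained on the noisy data
    pred_confs: List[float]  # model confidence for each predicted tag, in [0, 1]


def disagreement_score(inst: Instance) -> float:
    """Highest model confidence among tokens where prediction and annotation differ.

    Returns 0.0 when the model agrees with the annotation on every token.
    """
    confs = [
        conf
        for gold, pred, conf in zip(inst.gold_tags, inst.pred_tags, inst.pred_confs)
        if gold != pred
    ]
    return max(confs, default=0.0)


def select_for_reannotation(
    data: Sequence[Instance], budget_fraction: float = 0.06
) -> List[int]:
    """Return indices of the instances to send back to human annotators.

    Instances are ranked by disagreement score (assumed heuristic) and the top
    `budget_fraction` of the dataset is kept; instances without any
    disagreement are never selected.
    """
    budget = max(1, round(len(data) * budget_fraction))
    ranked = sorted(
        range(len(data)),
        key=lambda i: disagreement_score(data[i]),
        reverse=True,
    )
    return [i for i in ranked[:budget] if disagreement_score(data[i]) > 0.0]
```

In a human-in-the-loop setting, the returned indices would go to annotators for correction, after which the model could be retrained on the updated labels and the selection repeated if the budget allows.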
