Paper title
An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts
Paper authors
Paper abstract
We present a simple yet effective method for training a named entity recognition (NER) model that operates on business telephone conversation transcripts, which contain noise from both the nature of spoken conversation and artifacts of automatic speech recognition. We first fine-tune LUKE, a state-of-the-art NER model, on a limited number of transcripts, then use it as the teacher model to train a smaller DistilBERT-based student model on a large amount of weakly labeled data and a small amount of human-annotated data. The resulting model achieves high accuracy while satisfying the practical constraints for inclusion in a commercial telephony product: real-time performance when deployed on cost-effective CPUs rather than GPUs.
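The data flow described in the abstract can be illustrated with a minimal sketch. It assumes the weak labels are the teacher's own predictions on unlabeled transcripts, which the abstract suggests but does not state outright. The names `toy_teacher` and `build_student_training_set` are hypothetical, and the rule-based tagger merely stands in for a fine-tuned LUKE model; no real model is loaded or trained here.

```python
from typing import Callable, List, Tuple

Tokens = List[str]
Tags = List[str]
Example = Tuple[Tokens, Tags]


def toy_teacher(tokens: Tokens) -> Tags:
    """Hypothetical stand-in for the fine-tuned teacher model.

    Tags any capitalized token as an entity; a real teacher would be
    a trained NER model, not a rule.
    """
    return ["ENT" if tok[:1].isupper() else "O" for tok in tokens]


def build_student_training_set(
    unlabeled: List[Tokens],
    human_annotated: List[Example],
    teacher: Callable[[Tokens], Tags],
) -> List[Example]:
    """Combine teacher-labeled (weak) examples with human-annotated ones.

    Assumption: weak labels are teacher predictions on unlabeled text.
    """
    weak = [(tokens, teacher(tokens)) for tokens in unlabeled]
    return weak + list(human_annotated)


# Toy data: a larger pool of unlabeled transcripts, a small gold set.
unlabeled = [
    ["call", "Alice", "tomorrow"],
    ["thanks", "for", "calling", "Acme"],
]
gold = [(["this", "is", "Bob"], ["O", "O", "ENT"])]

train_set = build_student_training_set(unlabeled, gold, toy_teacher)
```

The student model would then be trained on `train_set`; how the weak and human-annotated examples are weighted or ordered during training is not specified in the abstract and is left out of this sketch.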