Paper Title
Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education
Paper Authors
Paper Abstract
Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improving the accuracy and quality of the feedback a conversational agent can provide. Developing a conversational agent for a specific educational scenario is time-consuming, as it requires domain experts to label and annotate noisy data sources such as classroom videos. Previous approaches to modeling annotations have relied on labeling thousands of examples and computing inter-annotator agreement and majority votes in order to model the necessary scenarios. This method, while proven successful, ignores the strengths of individual annotators in labeling a data point and under-utilizes examples that lack a majority-vote label. We propose a multi-task weak supervision method combined with active learning to address these concerns. This approach requires less labeling than traditional methods and shows significant improvements in precision, efficiency, and time requirements over the majority-vote method (Ratner 2019). We demonstrate the validity of this method on the Google Jigsaw data set and then propose a scenario for applying it using the Instructional Quality Assessment (IQA) to define the labeling categories. We propose using probabilistic modeling of annotator labels to generate active learning examples for further labeling of the data. Active learning iteratively improves the training performance and accuracy of the original classification model. This approach combines the state-of-the-art labeling techniques of weak supervision and active learning to optimize results in the educational domain, and it could further be used to lessen the data requirements for expanded scenarios within the education domain through transfer learning.
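The weak-supervision-plus-active-learning loop described in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical illustration rather than the paper's implementation: it assumes the open-source snorkel package's LabelModel as a stand-in for the multi-task weak supervision step associated with Ratner (2019), a toy annotator-label matrix in place of IQA-coded classroom data, and a simple least-confidence rule for choosing which examples to send back to annotators.

```python
# Minimal sketch: combine noisy annotator labels with a weak-supervision
# label model, then pick the most uncertain examples for active learning.
# Assumes the snorkel package; the annotator matrix and query budget are
# illustrative placeholders, not the paper's data or hyperparameters.
import numpy as np
from snorkel.labeling.model import LabelModel

# L[i, j] = label from annotator j on example i; -1 means "did not label".
# In the paper's setting these would come from IQA-coded classroom transcripts.
L = np.array([
    [ 1,  1, -1],
    [ 0, -1,  0],
    [ 1,  0,  1],
    [-1,  0,  0],
    [ 1, -1,  1],
    [ 0,  0, -1],
    [-1,  1,  1],
    [ 0,  0,  1],
])

# Fit the generative label model: it estimates per-annotator accuracies
# rather than weighting every annotator equally as a majority vote would.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L, n_epochs=500, seed=123)

# Probabilistic (soft) labels used to train the downstream classifier.
probs = label_model.predict_proba(L)

# Active learning step: query the examples the label model is least sure of.
uncertainty = 1.0 - probs.max(axis=1)   # least-confidence score
query_budget = 2                        # illustrative budget
to_annotate = np.argsort(-uncertainty)[:query_budget]
print("Send these example indices back to annotators:", to_annotate)
```

In this sketch the probabilistic labels would train the classifier, and the queried examples, once re-annotated, would be folded back into the label matrix to repeat the loop; the specific uncertainty criterion and stopping rule are design choices not fixed by the abstract.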