Paper Title

Explaining the Effectiveness of Multi-Task Learning for Efficient Knowledge Extraction from Spine MRI Reports

Paper Authors

Arijit Sehanobish, McCullen Sandora, Nabila Abraham, Jayashri Pawar, Danielle Torres, Anasuya Das, Murray Becker, Richard Herzog, Benjamin Odry, Ron Vianu

Paper Abstract

Pretrained Transformer-based models fine-tuned on domain-specific corpora have changed the landscape of NLP. However, training or fine-tuning these models for individual tasks can be time-consuming and resource-intensive. Thus, a lot of current research is focused on using transformers for multi-task learning (Raffel et al., 2020) and on how to group tasks to help a multi-task model learn effective representations that can be shared across tasks (Standley et al., 2020; Fifty et al., 2021). In this work, we show that a single multi-task model can match the performance of task-specific models when the task-specific models show similar representations across all of their hidden layers and their gradients are aligned, i.e. their gradients follow the same direction. We hypothesize that the above observations explain the effectiveness of multi-task learning. We validate our observations on our internal radiologist-annotated datasets on the cervical and lumbar spine. Our method is simple and intuitive, and can be used in a wide range of NLP problems.
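The abstract points to two measurable diagnostics: alignment of per-task gradients and similarity of hidden-layer representations between task-specific models. The snippet below is a minimal PyTorch sketch of how such diagnostics are commonly computed, not the authors' released code: it measures gradient alignment as the cosine similarity of flattened per-task gradients, and representation similarity with linear CKA. The choice of linear CKA, the toy model, and the function names (gradient_cosine, linear_cka) are illustrative assumptions; the abstract itself does not fix a specific metric.

```python
# Illustrative sketch of the two diagnostics described in the abstract.
# NOTE: reconstruction under stated assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def gradient_cosine(model, loss_a, loss_b):
    """Cosine similarity between the gradients of two task losses w.r.t.
    the shared model's parameters ("aligned" gradients give values near +1)."""
    params = [p for p in model.parameters() if p.requires_grad]
    flat = []
    for loss in (loss_a, loss_b):
        grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        flat.append(torch.cat([g.flatten() for g in grads if g is not None]))
    return F.cosine_similarity(flat[0], flat[1], dim=0)

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, hidden_dim) activation matrices,
    a common choice for comparing hidden-layer representations."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (Y.T @ X).norm() ** 2
    return hsic / ((X.T @ X).norm() * (Y.T @ Y).norm())

# Toy usage: two task heads on one shared encoder.
torch.manual_seed(0)
encoder = torch.nn.Linear(16, 8)
head_a, head_b = torch.nn.Linear(8, 2), torch.nn.Linear(8, 2)
x = torch.randn(32, 16)
h = encoder(x)
loss_a = F.cross_entropy(head_a(h), torch.randint(0, 2, (32,)))
loss_b = F.cross_entropy(head_b(h), torch.randint(0, 2, (32,)))
print("gradient cosine:", gradient_cosine(encoder, loss_a, loss_b).item())

# Compare hidden representations of two separately initialized encoders,
# standing in for the per-layer comparison between task-specific models.
encoder_b = torch.nn.Linear(16, 8)
print("linear CKA:", linear_cka(h.detach(), encoder_b(x).detach()).item())
```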
