Paper Title
Improving Multi-task Learning via Seeking Task-based Flat Regions
Paper Authors
Paper Abstract
Multi-Task Learning (MTL) is a widely used and powerful learning paradigm for training deep neural networks that allows learning more than one objective with a single backbone. Compared to training tasks separately, MTL significantly reduces computational cost, improves data efficiency, and can enhance model performance by leveraging knowledge across tasks. Hence, it has been adopted in a variety of applications, ranging from computer vision to natural language processing and speech recognition. Among them, there is an emerging line of work in MTL that focuses on manipulating the task gradients to derive an ultimate gradient descent direction that benefits all tasks. Despite achieving impressive results on many benchmarks, directly applying these approaches without appropriate regularization techniques might lead to suboptimal solutions on real-world problems. In particular, standard training that minimizes the empirical loss on the training data can easily overfit to low-resource tasks or be spoiled by noisily labeled ones, which can cause negative transfer between tasks and an overall performance drop. To alleviate such problems, we propose to leverage a recently introduced training method, named Sharpness-Aware Minimization, which can enhance model generalization ability in single-task learning. Accordingly, we present a novel MTL training methodology that encourages the model to find task-based flat minima, coherently improving its generalization capability on all tasks. Finally, we conduct comprehensive experiments on a variety of applications to demonstrate the merit of our proposed approach over existing gradient-based MTL methods, as suggested by our developed theory.
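To make the idea of "task-based flat minima" concrete, the sketch below illustrates how per-task Sharpness-Aware Minimization gradients could be computed and combined in a shared-backbone training loop. This is a minimal illustration under assumed names (`model`, `task_losses`, `rho`) and a deliberately simple aggregation (plain averaging); it is not the paper's reference implementation, which combines the per-task SAM gradients with a gradient-based MTL aggregation scheme.

```python
# Minimal PyTorch-style sketch: per-task SAM gradients in a multi-task step.
# `task_losses` is assumed to be a list of callables; each call recomputes the
# forward pass and returns that task's scalar loss at the current weights.
import torch

def sam_mtl_step(model, task_losses, optimizer, rho=0.05):
    """For each task: (1) take the task gradient, (2) perturb the weights along it
    within an L2 ball of radius rho, (3) take the gradient at the perturbed point
    (the SAM gradient), (4) restore the weights. Then aggregate and step."""
    params = [p for p in model.parameters() if p.requires_grad]
    accumulated = [torch.zeros_like(p) for p in params]

    for loss_fn in task_losses:
        # (1) Task gradient at the current weights.
        optimizer.zero_grad()
        loss_fn().backward()
        grads = [p.grad.detach().clone() if p.grad is not None
                 else torch.zeros_like(p) for p in params]

        # (2) Ascend to the (approximate) worst-case point in the rho-ball.
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=rho / grad_norm)

        # (3) SAM gradient: task gradient evaluated at the perturbed weights.
        optimizer.zero_grad()
        loss_fn().backward()

        # (4) Restore the original weights and accumulate this task's SAM gradient.
        with torch.no_grad():
            for p, g, acc in zip(params, grads, accumulated):
                p.sub_(g, alpha=rho / grad_norm)
                if p.grad is not None:
                    acc.add_(p.grad.detach())

    # Aggregate the per-task SAM gradients (simple average here) and update.
    with torch.no_grad():
        for p, acc in zip(params, accumulated):
            p.grad = acc / len(task_losses)
    optimizer.step()
```

In practice, the averaging step would be replaced by whichever gradient-combination rule the MTL method of choice prescribes; the point of the sketch is that the sharpness-aware perturbation is applied per task before the task gradients are merged.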