Paper Title
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
Paper Authors
Paper Abstract
This work introduces a new multi-task, parameter-efficient language model (LM) tuning method that learns to transfer knowledge across different tasks via a mixture of soft prompts (small prefix embedding vectors pre-trained for different tasks). Our method, called ATTEMPT (ATTEntional Mixtures of Prompt Tuning), obtains source prompts as encodings of large-scale source tasks into a small number of parameters and trains an attention module to interpolate the source prompts and a newly initialized target prompt for every instance in the target task. During training, only the target task prompt and the attention weights, which are shared between tasks in multi-task training, are updated, while the original LM and source prompts remain intact. ATTEMPT is highly parameter-efficient (e.g., it updates 2,300 times fewer parameters than full fine-tuning) while achieving high task performance using knowledge from high-resource tasks. Moreover, it is modular, using pre-trained soft prompts, and can flexibly add or remove source prompts for effective knowledge transfer. Our experimental results across 21 diverse NLP datasets show that ATTEMPT significantly outperforms prompt tuning and outperforms or matches fully fine-tuned or other parameter-efficient tuning approaches that use over ten times more parameters. Finally, ATTEMPT outperforms previous work in few-shot learning settings.
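To make the attentional interpolation concrete, the following minimal PyTorch sketch shows one way an instance-wise mixture of frozen source prompts and a trainable target prompt could be computed. The class name, dimensions, max-pooled query, and down/up projection are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionalPromptMixture(nn.Module):
    """Illustrative sketch of an attentional mixture of soft prompts.

    Assumes frozen `source_prompts` of shape (num_sources, prompt_len, d_model)
    loaded from prompts pre-trained on large-scale source tasks.
    """

    def __init__(self, source_prompts, prompt_len=100, d_model=768, d_proj=256):
        super().__init__()
        # Frozen source prompts: not updated during target-task training.
        self.register_buffer("source_prompts", source_prompts)
        # Newly initialized target prompt: trainable.
        self.target_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        # Attention sub-network, shared across tasks in multi-task training.
        self.down = nn.Linear(d_model, d_proj)
        self.up = nn.Linear(d_proj, d_model)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, d_model) token embeddings of one instance.
        # Pool over tokens to obtain an instance-level query vector.
        query = self.up(F.silu(self.down(input_embeds.max(dim=1).values)))

        # Candidate prompts: frozen sources plus the trainable target prompt.
        candidates = torch.cat(
            [self.source_prompts, self.target_prompt.unsqueeze(0)], dim=0
        )  # (num_sources + 1, prompt_len, d_model)
        keys = candidates.mean(dim=1)  # (num_sources + 1, d_model)

        # Per-instance attention weights over the candidate prompts.
        weights = (query @ keys.T).softmax(dim=-1)  # (batch, num_sources + 1)

        # Interpolated prompt, prepended to the input embeddings.
        mixed = torch.einsum("bk,kld->bld", weights, candidates)
        return torch.cat([mixed, input_embeds], dim=1)
```

Under this sketch, only `target_prompt` and the `down`/`up` projections would receive gradients; the backbone LM and `source_prompts` stay frozen, which is what keeps the number of updated parameters small.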