Paper Title
TEMPERA: Test-Time Prompting via Reinforcement Learning
Paper Authors
Paper Abstract
Careful prompt design is critical to the use of large language models in zero-shot or few-shot learning. As a consequence, there is a growing interest in automated methods to design optimal prompts. In this work, we propose Test-time Prompt Editing using Reinforcement learning (TEMPERA). In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge, is adaptive to different queries, and provides an interpretable prompt for every query. To achieve this, we design a novel action space that allows flexible editing of the initial prompts, covering a wide set of commonly used components such as instructions, few-shot exemplars, and verbalizers. The proposed method achieves significant gains over recent SoTA approaches such as prompt tuning, AutoPrompt, and RLPrompt across a variety of tasks, including sentiment analysis, topic classification, natural language inference, and reading comprehension. Our method achieves an average 5.33x improvement in sample efficiency compared with traditional fine-tuning methods.
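To make the abstract's "action space for flexible prompt editing" concrete, below is a minimal sketch of what discrete, interpretable edits over instructions, few-shot exemplars, and verbalizers could look like, applied per query at test time. This is an illustrative assumption, not the paper's implementation; names such as PromptState, swap_exemplars, and score_with_lm are hypothetical, and the greedy loop stands in for the learned RL policy.

```python
from dataclasses import dataclass


@dataclass
class PromptState:
    instruction: str                     # task instruction
    exemplars: list                      # few-shot (input, label) pairs
    verbalizer: dict                     # label -> label word

    def render(self, query: str) -> str:
        """Assemble the full prompt for a single test query."""
        shots = "\n".join(
            f"Input: {x}\nLabel: {self.verbalizer[y]}" for x, y in self.exemplars
        )
        return f"{self.instruction}\n{shots}\nInput: {query}\nLabel:"


# Each action is a small, interpretable edit of one prompt component.
def swap_exemplars(state: PromptState, i: int, j: int) -> PromptState:
    state.exemplars[i], state.exemplars[j] = state.exemplars[j], state.exemplars[i]
    return state


def set_verbalizer(state: PromptState, label: str, word: str) -> PromptState:
    state.verbalizer[label] = word
    return state


def set_instruction(state: PromptState, text: str) -> PromptState:
    state.instruction = text
    return state


def greedy_edit(state: PromptState, query: str, candidate_actions, score_with_lm):
    """Stand-in for the RL policy: apply the single candidate edit that most
    improves a reward signal (e.g., the LM's score on the rendered prompt)."""
    best_state = state
    best_score = score_with_lm(state.render(query))
    for action in candidate_actions:
        # Copy the state so each candidate edit is evaluated independently.
        trial = PromptState(state.instruction, list(state.exemplars), dict(state.verbalizer))
        trial = action(trial)
        trial_score = score_with_lm(trial.render(query))
        if trial_score > best_score:
            best_state, best_score = trial, trial_score
    return best_state
```

In this sketch each action leaves the prompt human-readable, which is why the resulting per-query prompt stays interpretable; a trained policy would select such edits conditioned on the query rather than searching greedily.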