Prompted Opinion Summarization with GPT-3.5

Paper Title

Prompted Opinion Summarization with GPT-3.5

Authors

Adithya Bhaskar, Alexander R. Fabbri, Greg Durrett

Abstract

Large language models have shown impressive performance across a wide variety of tasks, including text summarization. In this paper, we show that this strong performance extends to opinion summarization. We explore several pipeline methods for applying GPT-3.5 to summarize a large collection of user reviews in a prompted fashion. To handle arbitrarily large numbers of user reviews, we explore recursive summarization as well as methods for selecting salient content to summarize through supervised clustering or extraction. On two datasets, an aspect-oriented summarization dataset of hotel reviews (SPACE) and a generic summarization dataset of Amazon and Yelp reviews (FewSum), we show that GPT-3.5 models achieve very strong performance in human evaluation. We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity to contrast these different methods.
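The abstract mentions recursive summarization as one way to handle arbitrarily many reviews. The sketch below is a minimal illustration of that idea, not the authors' pipeline. It makes two assumptions: each prompt receives a fixed number of texts (the hypothetical `fan_in` parameter), and a caller-supplied `summarize` function stands in for the prompted GPT-3.5 call. Reviews are summarized in groups, those summaries are grouped and summarized again, and the process repeats until a single prompt can hold everything.

```python
from typing import Callable, List


def recursive_summarize(
    texts: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 8,
) -> str:
    """Summarize an arbitrarily long list of texts by repeated group summarization.

    `summarize` is a hypothetical stand-in for a prompted LLM call that maps a
    small group of texts to one summary. `fan_in` (an assumption of this sketch)
    is the maximum number of texts placed in a single prompt.
    """
    if fan_in < 2:
        # With fan_in >= 2, each round strictly shrinks the list, so the loop terminates.
        raise ValueError("fan_in must be at least 2")
    if not texts:
        raise ValueError("need at least one text to summarize")

    level = list(texts)
    # While the current level is too large for one prompt, summarize it in chunks.
    while len(level) > fan_in:
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    # The remaining texts fit in one prompt, so a final call produces the summary.
    return summarize(level)
```

Passing a trivial stand-in such as `lambda group: " ".join(group)` exercises the control flow without any model. Grouping by count rather than by length is a simplification. A practical pipeline would more likely pack reviews up to the model's context limit.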
