Paper Title

Towards Abstractive Grounded Summarization of Podcast Transcripts

Authors

Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu

Abstract

Podcasts have recently shown a rapid rise in popularity. Summarization of podcast transcripts is of practical benefit to both content providers and consumers. It helps consumers to quickly decide whether they will listen to the podcasts and reduces the cognitive load of content providers to write summaries. Nevertheless, podcast summarization faces significant challenges including factual inconsistencies with respect to the inputs. The problem is exacerbated by speech disfluencies and recognition errors in transcripts of spoken language. In this paper, we explore a novel abstractive summarization method to alleviate these challenges. Specifically, our approach learns to produce an abstractive summary while grounding summary segments in specific portions of the transcript to allow for full inspection of summary details. We conduct a series of analyses of the proposed approach on a large podcast dataset and show that the approach can achieve promising results. Grounded summaries bring clear benefits in locating the summary and transcript segments that contain inconsistent information, and hence significantly improve summarization quality in both automatic and human evaluation metrics.
