Paper Title
Learning from Bootstrapping and Stepwise Reinforcement Reward: A Semi-Supervised Framework for Text Style Transfer
Paper Authors
Paper Abstract
Text style transfer is an important task in controllable language generation. Supervised approaches have pushed performance improvements on style-oriented rewriting such as formality conversion. However, challenges remain due to the scarcity of large-scale parallel data in many domains. While unsupervised approaches do not rely on annotated sentence pairs for each style, they are often plagued by instability issues such as mode collapse or quality degradation. To take advantage of both the supervised and unsupervised paradigms and tackle these challenges, in this work we propose a semi-supervised framework for text style transfer. First, the learning process is bootstrapped with supervision guided by pseudo-parallel pairs that are automatically constructed with lexical and semantic-based methods. Then the model learns from unlabeled data via reinforcement rewards. Specifically, we propose to improve the sequence-to-sequence policy gradient via stepwise reward optimization, providing fine-grained learning signals and stabilizing the reinforced learning process. Experimental results show that the proposed approach achieves state-of-the-art performance on multiple datasets, and produces effective generations with as little as 10\% of the training data.
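To make the stepwise-reward idea concrete, the following is a minimal sketch (not the authors' implementation) of how a per-step REINFORCE-style loss could be formed for a sampled sequence. It assumes a hypothetical scorer that rates each generated prefix for style and content preservation; the reward shaping, the mean baseline, and all names are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: stepwise-reward policy gradient for a seq2seq sampler (PyTorch).
# Assumes the model has already sampled an output and returned per-token
# log-probabilities, and that a prefix scorer (hypothetical) is available.
import torch

def stepwise_policy_gradient_loss(log_probs: torch.Tensor,
                                  prefix_scores: torch.Tensor) -> torch.Tensor:
    """log_probs:     (T,) log p(y_t | y_<t, x) for the sampled tokens.
    prefix_scores: (T,) score of the prefix y_<=t, e.g. a style-classifier
                   probability combined with a content metric (assumed).
    """
    # Stepwise reward: the improvement each emitted token contributes
    # to the prefix score, giving a fine-grained signal per decoding step.
    prev = torch.cat([prefix_scores.new_zeros(1), prefix_scores[:-1]])
    rewards = prefix_scores - prev

    # Return-to-go so earlier tokens are also credited for later gains.
    returns = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])

    # Simple mean baseline for variance reduction (an assumption here).
    baseline = returns.mean()

    # REINFORCE: weight each step's log-probability by its centered return.
    return -((returns - baseline).detach() * log_probs).sum()
```

Compared with a sequence-level policy gradient, where one scalar reward is applied uniformly to every token of the sample, this per-step weighting is one plausible way to realize the "fine-grained learning signals" the abstract refers to.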