Paper Title
GeDi: Generative Discriminator Guided Sequence Generation
Paper Authors
Paper Abstract
While large-scale language models (LMs) are able to imitate the distribution of natural language well enough to generate realistic text, it is difficult to control which regions of the distribution they generate. This is especially problematic because datasets used for training large LMs usually contain significant toxicity, hate, bias, and negativity. We propose GeDi as an efficient method for using smaller LMs as generative discriminators to guide generation from large LMs, making them safer and more controllable. GeDi guides generation at each step by computing classification probabilities for all possible next tokens via Bayes' rule, normalizing over two class-conditional distributions: one conditioned on the desired attribute, or control code, and another conditioned on the undesired attribute, or anti-control code. We find that GeDi gives stronger controllability than the state-of-the-art method while also achieving generation speeds more than 30 times faster. Additionally, training GeDi on only four topics allows us to controllably generate new topics zero-shot from just a keyword, unlocking a new capability that previous controllable generation methods do not have. Lastly, we show that GeDi can make GPT-2 (1.5B parameters) significantly less toxic without sacrificing linguistic quality, making it by far the most practical existing method for detoxifying large language models while maintaining a fast generation speed.
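To make the Bayes'-rule step concrete, the sketch below shows one way the per-token reweighting described in the abstract could be computed. This is a minimal illustration, not the authors' implementation: the function names, the `omega` guidance-strength knob, and the running log-prior arguments are all assumptions introduced here for exposition.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def gedi_next_token_probs(lm_logits, pos_cc_logits, neg_cc_logits,
                          log_prior_pos, log_prior_neg, omega=1.0):
    """Reweight a base LM's next-token distribution via Bayes' rule
    over two class-conditional LM distributions (a hypothetical sketch
    of the idea in the abstract, not the paper's actual code).

    lm_logits      : base LM logits over the vocabulary, shape (V,)
    pos_cc_logits  : class-conditional LM logits given the control code
    neg_cc_logits  : class-conditional LM logits given the anti-control code
    log_prior_pos/neg : log-likelihood of the prefix under each code
                        (accumulated over previous steps, plus class priors)
    omega          : guidance strength (an assumed tuning knob)
    """
    # Log-probability of the prefix extended by each candidate token,
    # under the control code and the anti-control code respectively.
    log_p_pos = log_prior_pos + log_softmax(pos_cc_logits)
    log_p_neg = log_prior_neg + log_softmax(neg_cc_logits)

    # Bayes' rule: P(desired | x_{1:t}) for every candidate next token,
    # normalizing over the two class-conditional distributions.
    log_p_class = log_p_pos - np.logaddexp(log_p_pos, log_p_neg)

    # Guide the base LM: combine P(x_t | x_{<t}) with the classifier
    # probability, then renormalize to get a valid distribution.
    guided = log_softmax(lm_logits) + omega * log_p_class
    return np.exp(guided - np.logaddexp.reduce(guided))
```

At each decoding step, the two small class-conditional LMs and the large base LM each score the vocabulary once, so guidance costs only two extra small forward passes per token; this is consistent with the large speedup over methods that re-run a discriminator on every candidate continuation.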