Paper Title
Discovering Useful Sentence Representations from Large Pretrained Language Models
Paper Authors
Paper Abstract
Despite the extensive success of pretrained language models as encoders for building NLP systems, they have not seen prominence as decoders for sequence generation tasks. We explore the question of whether these models can be adapted to be used as universal decoders. To be considered "universal," a decoder must have an implicit representation for any target sentence $s$, such that it can recover that sentence exactly when conditioned on its representation. For large transformer-based language models trained on vast amounts of English text, we investigate whether such representations can be easily discovered using standard optimization methods. We present and compare three representation injection techniques for transformer-based models and three accompanying methods that map sentences to and from this representation space. Experiments show that not only do representations exist for sentences from a variety of genres, but, more importantly, our methods recover these sentences almost perfectly, without needing complex optimization algorithms and without fine-tuning the underlying language model at all.
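As a concrete illustration of the procedure the abstract describes, the sketch below shows one way such a representation could be searched for: a pretrained language model is kept frozen, a single vector z serves as the sentence representation, it is injected additively into the input embeddings (one of several conceivable injection points), and z alone is optimized with cross-entropy so that greedy decoding conditioned on z reproduces the target sentence. The model choice (GPT-2 via the Hugging Face `transformers` library), the additive injection, the optimizer, and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch, not the authors' released code: optimize a single vector z so that
# a frozen pretrained LM recovers a target sentence when conditioned on z.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
for p in model.parameters():
    p.requires_grad_(False)  # the language model itself is never fine-tuned

target = "The quick brown fox jumps over the lazy dog."
target_ids = tokenizer.encode(target)
# Prepend BOS so the first target token is predicted from (BOS embedding + z).
ids = torch.tensor([[tokenizer.bos_token_id] + target_ids], device=device)

# Learnable sentence representation z: one vector of the model's hidden width.
z = torch.zeros(1, 1, model.config.n_embd, device=device, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=1e-2)

token_embeddings = model.get_input_embeddings()(ids)  # frozen embeddings of the target
for step in range(500):
    optimizer.zero_grad()
    # Inject z additively at every position; teacher-forced cross-entropy on the target.
    outputs = model(inputs_embeds=token_embeddings + z, labels=ids)
    outputs.loss.backward()  # gradients flow only into z
    optimizer.step()

# Recovery check: greedy autoregressive decoding conditioned only on z.
with torch.no_grad():
    bos = torch.tensor([[tokenizer.bos_token_id]], device=device)
    current = model.get_input_embeddings()(bos)
    generated = []
    for _ in range(len(target_ids)):
        logits = model(inputs_embeds=current + z).logits[:, -1]
        next_id = logits.argmax(-1, keepdim=True)
        generated.append(next_id.item())
        current = torch.cat([current, model.get_input_embeddings()(next_id)], dim=1)

print(tokenizer.decode(generated))  # ideally reproduces the target sentence
```

In this setup only the vector z receives gradient updates, which mirrors the abstract's claim that sentences can be recovered without fine-tuning the underlying language model.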