Paper Title
Context based Text-generation using LSTM networks
Paper Authors
Paper Abstract
Long short-term memory (LSTM) units in sequence-based models are used in translation, question-answering systems, and classification tasks because of their ability to learn long-term dependencies. In natural language generation, LSTM networks have produced impressive results on text-generation tasks by learning language models with grammatically stable syntax. The downside is that the network does not learn the context: it only learns an input-output function and generates text from a given set of input words irrespective of pragmatics. Because the model is trained without any such context, there is no semantic consistency among the generated sentences. The proposed model is trained to generate text for a given set of input words together with a context vector. A context vector is similar to a paragraph vector in that it captures the semantic meaning (context) of the sentence. Several methods of extracting context vectors are proposed in this work. While training the language model, context vectors are trained along with the input-output sequences, so the model learns the relation among the input words, the context vector, and the target word. Given a set of context terms, a well-trained model will generate text around the provided context. Based on how the context vectors are computed, two variations of the model are evaluated (word importance and word clustering). For the word clustering method, suitable embeddings across various domains are also explored. The results are evaluated on the semantic closeness of the generated text to the given context.
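The abstract does not specify how the context vector enters the network. Below is a minimal PyTorch sketch of one plausible reading, assuming the context vector is concatenated with the word embedding at every timestep before the LSTM; the class name, layer sizes, and the concatenation choice are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the paper's released code) of a context-conditioned
# LSTM language model: at each timestep the input word embedding is
# concatenated with a fixed context vector, so the network can learn the
# relation among input words, context vector, and target word.
import torch
import torch.nn as nn

class ContextLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, context_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # LSTM input = word embedding concatenated with the context vector
        self.lstm = nn.LSTM(embed_dim + context_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, context):
        # tokens: (batch, seq_len) word ids; context: (batch, context_dim)
        emb = self.embed(tokens)                                # (B, T, E)
        ctx = context.unsqueeze(1).expand(-1, emb.size(1), -1)  # (B, T, C)
        h, _ = self.lstm(torch.cat([emb, ctx], dim=-1))         # (B, T, H)
        return self.out(h)                                      # (B, T, V) next-word logits

# Usage sketch: a batch of 2 sequences of length 5 over a 1000-word vocabulary.
model = ContextLSTM(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 5))
context = torch.randn(2, 64)     # e.g., derived from context terms (word
                                 # importance or word clustering, per the abstract)
logits = model(tokens, context)  # (2, 5, 1000)
```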