Paper Title
A Comparison of LSTM and BERT for Small Corpus
Paper Authors
Paper Abstract
Recent advancements in the NLP field showed that transfer learning helps with achieving state-of-the-art results for new tasks by tuning pre-trained models instead of starting from scratch. Transformers have made a significant improvement in creating new state-of-the-art results for many NLP tasks, including but not limited to text classification, text generation, and sequence labeling. Most of these success stories were based on large datasets. In this paper, we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than with simple models? To answer this question, we use a small dataset for intent classification, collected for building chatbots, and compare the performance of a simple bidirectional LSTM model with that of a pre-trained BERT model. Our experimental results show that bidirectional LSTM models can achieve significantly better results than a BERT model on a small dataset, and that these simple models can be trained in much less time than it takes to fine-tune the pre-trained counterparts. We conclude that the performance of a model depends on the task and the data, and that these factors should therefore be considered before making a model choice, rather than defaulting to the most popular model.
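To make the comparison concrete, the sketch below shows a minimal bidirectional LSTM intent classifier of the kind the abstract describes as the "simple model" baseline. This is an illustrative Keras sketch, not the authors' reported architecture; the vocabulary size, embedding dimension, layer width, number of intent classes, and the dummy training data are all hypothetical placeholders.

```python
# Minimal sketch of a bidirectional LSTM intent classifier (illustrative only;
# hyperparameters and data are placeholders, not the paper's configuration).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 5000    # assumed vocabulary size
MAX_LEN = 30         # assumed maximum utterance length in tokens
EMBED_DIM = 100      # assumed embedding dimension
NUM_INTENTS = 20     # assumed number of intent classes

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),          # learn word embeddings from scratch
    layers.Bidirectional(layers.LSTM(64)),            # single bidirectional LSTM layer
    layers.Dropout(0.5),                              # regularization for a small dataset
    layers.Dense(NUM_INTENTS, activation="softmax"),  # intent probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy integer-encoded utterances and intent labels, standing in for the
# small chatbot intent-classification dataset.
X = np.random.randint(0, VOCAB_SIZE, size=(256, MAX_LEN))
y = np.random.randint(0, NUM_INTENTS, size=(256,))
model.fit(X, y, epochs=3, batch_size=32, validation_split=0.1)
```

A model of this size has only a few hundred thousand parameters and trains quickly on commodity hardware, which is the contrast the abstract draws against fine-tuning a large pre-trained BERT checkpoint on the same small dataset.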