Paper title
Sequential Targeting: an incremental learning approach for data imbalance in text classification
Paper authors
Abstract
Classification tasks require a balanced distribution of data to ensure that the learner is trained to generalize over all classes. In real-world datasets, however, the number of instances varies substantially among classes. This typically produces a learner that is biased towards the majority group because that group dominates the data. Methods for handling imbalanced datasets are therefore crucial for alleviating distributional skew and fully utilizing under-represented data, especially in text classification. When addressing imbalance in text data, most methods apply sampling to the numerical representation of the data, so their effectiveness is limited by how good that representation is. We propose a novel training method, Sequential Targeting (ST), which does not depend on the effectiveness of the representation method. ST enforces an incremental learning setting by splitting the data into mutually exclusive subsets and training the learner adaptively on them in sequence. To address problems that arise in incremental learning, such as catastrophic forgetting, we apply elastic weight consolidation (EWC). We demonstrate the effectiveness of our method through experiments on a benchmark dataset with simulated imbalance (IMDB) and on data collected from NAVER.
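The abstract describes two mechanisms: splitting the data into mutually exclusive subsets that are learned one after another, and using EWC to keep later training from overwriting what was learned earlier. The following is a minimal sketch of that idea, not the authors' implementation. Several parts are assumptions made for illustration. The splitting rule is hypothetical: the final subset is made class-balanced and the remaining examples are spread over the earlier subsets. A NumPy logistic regression stands in for the paper's text classifier. The EWC penalty uses the standard diagonal form (λ/2) Σ_i F_i (θ_i − θ*_i)², with an empirical Fisher estimate F that is accumulated across subsets, which is an online-EWC-style simplification. The function names `sequential_split`, `fisher_diagonal` and `train_sequential` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip logits to avoid overflow in exp for extreme values
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def sequential_split(labels, n_splits, seed=0):
    """Hypothetical ST-style split into mutually exclusive index subsets.

    Assumption: the last subset holds an equal number of examples per class,
    and the leftover (mostly majority-class) indices are divided evenly among
    the earlier subsets. Requires n_splits >= 2.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    k = counts.min()  # per-class size of the balanced final subset
    final, rest = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        final.append(idx[:k])
        rest.append(idx[k:])
    rest = rng.permutation(np.concatenate(rest))
    earlier = np.array_split(rest, n_splits - 1)
    return [np.sort(s) for s in earlier] + [np.sort(np.concatenate(final))]

def fisher_diagonal(w, X, y):
    # Empirical Fisher: mean squared per-example gradient of log p(y | x, w)
    g = (y - sigmoid(X @ w))[:, None] * X
    return np.mean(g ** 2, axis=0)

def train_sequential(X, y, subsets, lam=1.0, lr=0.5, epochs=200):
    """Train on each subset in order, with an EWC penalty after the first."""
    w = np.zeros(X.shape[1])
    w_prev, fisher = None, None
    for idx in subsets:
        Xs, ys = X[idx], y[idx]
        for _ in range(epochs):
            # Gradient of the mean binary cross-entropy on this subset
            grad = Xs.T @ (sigmoid(Xs @ w) - ys) / len(ys)
            if fisher is not None:
                # Gradient of (lam / 2) * sum_i F_i * (w_i - w_prev_i)^2
                grad = grad + lam * fisher * (w - w_prev)
            w = w - lr * grad
        # Consolidate: accumulate Fisher and anchor to the current weights
        f_new = fisher_diagonal(w, Xs, ys)
        fisher = f_new if fisher is None else fisher + f_new
        w_prev = w.copy()
    return w
```

As a usage example, build a synthetic 9:1 imbalanced dataset with a bias column. Call `subsets = sequential_split(y, 3)` and then `w = train_sequential(X, y, subsets)`. With this split, the learner first sees two majority-only subsets and then a balanced one. EWC pulls the weights toward the values learned earlier, and the strength of that pull is scaled by `lam` and by the Fisher estimate for each parameter.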