Paper Title
Non-Parametric Temporal Adaptation for Social Media Topic Classification
Paper Authors
Paper Abstract
User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns. However, most current NLP models are static and rely on fixed training data, which means they are unable to adapt to temporal change -- both test distribution shift and deleted training data -- without frequent, costly re-training. In this paper, we study temporal adaptation through the task of longitudinal hashtag prediction and propose a non-parametric dense retrieval technique, which does not require re-training, as a simple but effective solution. In experiments on a newly collected, publicly available, year-long Twitter dataset exhibiting temporal distribution shift, our method improves by 64.12% over the best parametric baseline without any of its costly gradient-based updating. Our dense retrieval approach is also particularly well-suited to dynamically deleted user data in line with data privacy laws, with negligible computational cost and performance loss.
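To make the idea of retrieval without re-training concrete, the following is a minimal sketch of a nearest-neighbour hashtag predictor over a mutable datastore. It is not the paper's implementation: the abstract does not specify the encoder, the similarity function, or how retrieved neighbours are aggregated. The names `embed` and `HashtagDatastore`, the hashed bag-of-words stand-in for a dense encoder, and the majority vote over the top `k` neighbours are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def embed(text, dim=512):
    """Toy stand-in for a dense encoder (an assumption, not the paper's model).

    Hashes each whitespace token into one of `dim` buckets and
    L2-normalises the result, so a dot product equals cosine similarity.
    """
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


class HashtagDatastore:
    """Hypothetical datastore mapping tweet ids to (embedding, hashtag)."""

    def __init__(self):
        self.entries = {}

    def add(self, tweet_id, text, hashtag):
        # Adapting to new data is an insertion, with no gradient updates.
        self.entries[tweet_id] = (embed(text), hashtag)

    def delete(self, tweet_id):
        # Honouring a deletion request is a dictionary removal.
        self.entries.pop(tweet_id, None)

    def predict(self, text, k=3):
        """Return the most common hashtag among the k most similar tweets."""
        if not self.entries:
            return None
        query = embed(text)
        scored = sorted(
            (
                (sum(q * v for q, v in zip(query, vec)), tag)
                for vec, tag in self.entries.values()
            ),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        votes = Counter(tag for _, tag in scored)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    store = HashtagDatastore()
    store.add(1, "kickoff tonight big match goal", "#football")
    store.add(2, "new album drops friday", "#music")
    print(store.predict("what a goal in the match tonight", k=1))
    store.delete(1)  # the deleted tweet can no longer influence predictions
    print(store.predict("what a goal in the match tonight", k=1))
```

In this sketch, both temporal adaptation and data deletion reduce to edits of the datastore, which is the property the abstract attributes to the non-parametric approach. A practical system would presumably replace `embed` with a learned sentence encoder and the linear scan with an approximate nearest-neighbour index.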