Paper Title

Cost-effective Variational Active Entity Resolution

Authors

Bogatu, Alex; Paton, Norman W.; Douthwaite, Mark; Davie, Stuart; Freitas, Andre

Abstract

Accurately identifying different representations of the same real-world entity is an integral part of data cleaning, and many methods have been proposed to accomplish it. The challenges of this entity resolution task that demand so much research attention are often rooted in the task-specificity and user-dependence of the process. Adopting deep learning techniques has the potential to lessen these challenges. In this paper, we set out to devise an entity resolution method that builds on the robustness conferred by deep autoencoders to reduce human-involvement costs. Specifically, we reduce the cost of training deep entity resolution models by performing unsupervised representation learning. This unveils a transferability property of the resulting model that can further reduce the cost of applying the approach to new datasets by means of transfer learning. Finally, we reduce the cost of labelling training data through an active learning approach that builds on the properties conferred by the use of deep autoencoders. Empirical evaluation confirms that these cost reductions are achieved while attaining effectiveness comparable to that of state-of-the-art alternatives.
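
The abstract names active learning as the mechanism for cutting labelling cost but does not state how pairs are chosen for labelling. As a rough illustration only, the sketch below shows generic uncertainty sampling over candidate record pairs. It assumes each record has already been encoded into a vector by some representation model (for example, an autoencoder) and that a pair counts as a match when the cosine similarity of its vectors reaches a threshold. The functions `cosine` and `select_uncertain_pairs`, the `threshold` and `budget` parameters, and the toy embeddings are all hypothetical; this is not the selection strategy used in the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record embeddings: each record id maps to a vector that some
# representation model (e.g. an autoencoder) is ASSUMED to have produced.
Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_uncertain_pairs(
    embeddings: Dict[str, Vector],
    candidate_pairs: List[Tuple[str, str]],
    threshold: float = 0.5,
    budget: int = 2,
) -> List[Tuple[str, str]]:
    """Pick the `budget` pairs whose similarity lies closest to `threshold`.

    Under the ASSUMED matcher "match iff cosine >= threshold", these are the
    pairs it is least sure about, so they are the ones worth asking a human
    to label (plain uncertainty sampling, not the paper's own strategy).
    """
    scored = [
        (abs(cosine(embeddings[a], embeddings[b]) - threshold), (a, b))
        for a, b in candidate_pairs
    ]
    scored.sort(key=lambda item: item[0])  # most uncertain first; ties keep input order
    return [pair for _, pair in scored[:budget]]


# Toy usage with made-up 2-d embeddings.
embeddings = {
    "r1": [1.0, 0.0],
    "r2": [1.0, 0.0],
    "r3": [0.0, 1.0],
    "r4": [1.0, 1.0],
    "r5": [1.0, 2.0],
}
pairs = [("r1", "r2"), ("r1", "r3"), ("r1", "r4"), ("r2", "r5")]
to_label = select_uncertain_pairs(embeddings, pairs, threshold=0.6, budget=2)
print(to_label)  # [('r1', 'r4'), ('r2', 'r5')]
```

Here (r1, r4) has similarity about 0.707 and (r2, r5) about 0.447, both close to the 0.6 threshold, whereas (r1, r2) at 1.0 and (r1, r3) at 0.0 are confidently decided and are not sent for labelling. In a full active-learning loop the newly labelled pairs would be used to retrain or recalibrate the matcher and the selection repeated until the labelling budget is spent; that loop is omitted here.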