Paper Title

AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

Authors

Dong, Xin Luna; He, Xiang; Kan, Andrey; Li, Xian; Liang, Yan; Ma, Jun; Xu, Yifan Ethan; Zhang, Chenwei; Zhao, Tong; Saldana, Gabriel Blanco; Deshpande, Saurabh; Manduca, Alexandre Michetti; Ren, Jay; Singh, Surender Pal; Xiao, Fan; Chang, Haw-Shiuan; Karamanolakis, Giannis; Mao, Yuning; Wang, Yaqing; Faloutsos, Christos; McCallum, Andrew; Han, Jiawei

Abstract

Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across a large number of categories, as well as a large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.
