Paper Title

TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis

Authors

Zhang, Haisong; Liu, Lemao; Jiang, Haiyun; Li, Yangming; Zhao, Enbo; Xu, Kun; Song, Linfeng; Zheng, Suncong; Zhou, Botong; Zhu, Jianchen; Feng, Xiao; Chen, Tao; Yang, Tao; Yu, Dong; Zhang, Feng; Kang, Zhanhui; Shi, Shuming

Abstract

This technical report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities. Compared with most previously released public text understanding systems and tools, TexSmart has several unique features. First, the NER function of TexSmart supports over 1,000 entity types, whereas most other public tools support only several to (at most) dozens of entity types. Second, TexSmart introduces new semantic analysis functions, such as semantic expansion and deep semantic representation, which are absent from most previous systems. Third, TexSmart implements a spectrum of algorithms for a single function, ranging from very fast algorithms to relatively slow but more accurate ones, to fulfill the requirements of different academic and industrial applications. Particular emphasis is placed on unsupervised or weakly supervised algorithms, so that our models can be easily updated with fresh data while requiring less human annotation effort. The main contents of this report include the major functions of TexSmart, the algorithms used to achieve them, how to use the TexSmart toolkit and Web APIs, and evaluation results for some key algorithms.
