Paper Title
ASET: Ad-hoc Structured Exploration of Text Collections [Extended Abstract]
Paper Authors
Paper Abstract
In this paper, we propose ASET, a new system that lets users perform structured exploration of text collections in an ad-hoc manner. ASET uses a new two-phase approach: it first extracts a superset of information nuggets from the texts using existing extractors such as named entity recognizers, and then matches these extractions, based on embeddings, to a structured table definition requested by the user. Our evaluation shows that ASET can thus extract high-quality structured data from real-world text collections without the need to design extraction pipelines up front.
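The two-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regex patterns stand in for real extractors such as named entity recognizers, the bag-of-words `embed` function stands in for learned embeddings, and every name here (`EXTRACTORS`, `extract_nuggets`, `match_row`, the sample document) is hypothetical.

```python
import math
import re
from collections import Counter

# Phase 1 stand-in: hypothetical regex "extractors" that deliberately
# over-extract, producing a superset of candidate nuggets.
EXTRACTORS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "number": re.compile(r"\b\d+\b"),
    "name": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),
}


def embed(text):
    """Toy embedding: bag-of-words counts (assumed stand-in for learned embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_nuggets(doc, window=30):
    """Phase 1: run every extractor and keep each match with its label and context."""
    nuggets = []
    for label, pattern in EXTRACTORS.items():
        for m in pattern.finditer(doc):
            start = max(0, m.start() - window)
            end = min(len(doc), m.end() + window)
            nuggets.append({"text": m.group(), "label": label, "context": doc[start:end]})
    return nuggets


def match_row(nuggets, attributes, threshold=0.0):
    """Phase 2: for each requested attribute, pick the most similar nugget (or None)."""
    row = {}
    for attr in attributes:
        query = embed(attr)
        scored = [
            (cosine(query, embed(f"{n['label']} {n['context']}")), n["text"])
            for n in nuggets
        ]
        score, text = max(scored, default=(0.0, None))
        row[attr] = text if score > threshold else None
    return row


# Example: the table definition is supplied only at query time.
doc = "The incident occurred on 2021-03-04 when pilot Jane Doe landed in Denver."
nuggets = extract_nuggets(doc)
row = match_row(nuggets, ["date", "pilot name", "airline"])
print(row)
```

The point of the split is that the extraction step does not depend on the table the user asks for, so it can run once over the collection, while the matching step is cheap enough to repeat for each new table definition. An attribute with no similar nugget is left empty rather than filled with a guess.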