Paper Title

Leam: An Interactive System for In-situ Visual Text Analysis

Authors

Sajjadur Rahman, Peter Griggs, Çağatay Demiralp

Abstract

With the increase in scale and availability of digital text generated on the web, enterprises such as online retailers and aggregators often use text analytics to mine and analyze the data to improve their services and products alike. Text data analysis is an iterative, non-linear process with diverse workflows spanning multiple stages, from data cleaning to visualization. Existing text analytics systems usually accommodate a subset of these stages and often fail to address challenges related to data heterogeneity, provenance, workflow reusability and reproducibility, and compatibility with established practices. Based on a set of design considerations we derive from these challenges, we propose Leam, a system that treats the text analysis process as a single continuum by combining advantages of computational notebooks, spreadsheets, and visualization tools. Leam features an interactive user interface for running text analysis workflows, a new data model for managing multiple atomic and composite data types, and an expressive algebra that captures diverse sets of operations representing various stages of text analysis and enables coordination among different components of the system, including data, code, and visualizations. We report our current progress in Leam development while demonstrating its usefulness with usage examples. Finally, we outline a number of enhancements to Leam and identify several research directions for developing an interactive visual text analysis system.
