Paper Title
OneLabeler: A Flexible System for Building Data Labeling Tools
Authors
Abstract
Labeled datasets are essential for supervised machine learning. Various data labeling tools have been built to collect labels in different usage scenarios. However, developing labeling tools is time-consuming, costly, and demands software development expertise. In this paper, we propose a conceptual framework for data labeling, together with OneLabeler, a system based on that framework that supports easy building of labeling tools for diverse usage scenarios. The framework consists of common modules and states in labeling tools, summarized through coding of existing tools. OneLabeler supports the configuration and composition of common software modules through visual programming to build data labeling tools. A module can be a human, machine, or mixed computation procedure in data labeling. We demonstrate the expressiveness and utility of the system through ten example labeling tools built with OneLabeler. A user study with developers provides evidence that OneLabeler supports efficient building of diverse data labeling tools.