Paper Title
Data Governance in the Age of Large-Scale Data-Driven Language Technology
Paper Authors
Abstract
The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values, and is grounded in an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we present is a multi-party international governance structure focused on language data, incorporating the technical and organizational tools needed to support its work.