Paper Title

No computation without representation: Avoiding data and algorithm biases through diversity

Paper Authors

Caitlin Kuhlman, Latifa Jackson, Rumi Chunara

Paper Abstract

The emergence and growth of research on issues of ethics in AI, and in particular algorithmic fairness, has roots in an essential observation that structural inequalities in society are reflected in the data used to train predictive models and in the design of objective functions. While research aiming to mitigate these issues is inherently interdisciplinary, the design of unbiased algorithms and fair socio-technical systems is a key desired outcome that depends on practitioners from the fields of data science and computing. However, these computing fields broadly also suffer from the same under-representation issues that are found in the datasets we analyze. This disconnect affects the design of both the desired outcomes and metrics by which we measure success. If the ethical AI research community accepts this, we tacitly endorse the status quo and contradict the goals of non-discrimination and equity which work on algorithmic fairness, accountability, and transparency seeks to address. Therefore, we advocate in this work for diversifying computing as a core priority of the field and our efforts to achieve ethical AI practices. We draw connections between the lack of diversity within academic and professional computing fields and the type and breadth of the biases encountered in datasets, machine learning models, problem formulations, and interpretation of results. Examining the current fairness/ethics in AI literature, we highlight cases where this lack of diverse perspectives has been foundational to the inequity in treatment of underrepresented and protected group data. We also look to other professional communities, such as in law and health, where disparities have been reduced both in the educational diversity of trainees and among professional practices. We use these lessons to develop recommendations that provide concrete steps for the computing community to increase diversity.
