Paper Title

Investigations of Performance and Bias in Human-AI Teamwork in Hiring

Paper Authors

Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Ece Kamar

Paper Abstract

In AI-assisted decision-making, effective hybrid (human-AI) teamwork does not depend solely on AI performance, but also on its impact on human decision-making. While prior work studies the effects of model accuracy on humans, we endeavour here to investigate the complex dynamics of how both a model's predictive performance and bias may transfer to humans in a recommendation-aided decision task. We consider the domain of ML-assisted hiring, where humans -- operating in a constrained selection setting -- can choose whether they wish to utilize a trained model's inferences to help select candidates from written biographies. We conduct a large-scale user study leveraging a re-created dataset of real bios from prior work, where humans predict the ground truth occupation of given candidates with and without the help of three different NLP classifiers (random, bag-of-words, and deep neural network). Our results demonstrate that while high-performance models significantly improve human performance in a hybrid setting, some models mitigate hybrid bias while others accentuate it. We examine these findings through the lens of decision conformity and observe that our model architecture choices have an impact on human-AI conformity and bias, motivating the explicit need to assess these complex dynamics prior to deployment.
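To make the study setup concrete, here is a minimal sketch, assuming scikit-learn, of the kind of bag-of-words occupation classifier the abstract describes (predicting a candidate's occupation from a written biography). The toy bios and labels below are invented for illustration; this is not the authors' pipeline or the re-created Bios dataset.

```python
# Illustrative bag-of-words occupation classifier over short biographies.
# The toy bios/labels stand in for the re-created Bios data used in the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

bios = [
    "She is a surgeon who completed her residency in cardiothoracic surgery.",
    "He teaches high school mathematics and coaches the chess club.",
    "She practices corporate law and advises startups on contracts.",
    "He is an attending physician specializing in internal medicine.",
]
occupations = ["physician", "teacher", "attorney", "physician"]

# Bag-of-words features feeding a linear classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(bios, occupations)

# Predict the occupation for an unseen biography.
print(model.predict(["She argued the case before the appellate court."]))
```

The decision-conformity lens can likewise be made concrete. A simple measure (an assumption here, not necessarily the paper's exact definition) is the fraction of trials in which the human's final decision matches the model's recommendation:

```python
# Fraction of trials where the human's final choice agrees with the
# model's recommendation (a simple proxy for decision conformity).
def conformity(model_recs, human_final):
    assert len(model_recs) == len(human_final)
    agree = sum(m == h for m, h in zip(model_recs, human_final))
    return agree / len(model_recs)

print(conformity(["physician", "teacher", "attorney"],
                 ["physician", "attorney", "attorney"]))  # -> 0.666...
```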
