Paper Title

Representation-Aware Experimentation: Group Inequality Analysis for A/B Testing and Alerting

Authors

Friedberg, Rina; Ambler, Stuart; Saint-Jacques, Guillaume

Abstract

As companies adopt increasingly experimentation-driven cultures, it is crucial to develop methods for understanding any potential unintended consequences of those experiments. We might have specific questions about those consequences (did a change increase or decrease gender representation equality among content creators?); we might also wonder whether we have not yet considered the right question (that is, we don't know what we don't know). Hence we address the problem of unintended consequences in experimentation from two perspectives: pre-specified versus data-driven selection of dimensions of interest. For a specified dimension, we introduce a statistic to measure deviation from equal representation (DER statistic), give its asymptotic distribution, and evaluate its finite-sample performance. We explain how to use this statistic to search across large-scale experimentation systems and alert us to any extreme unintended consequences on group representation. We complement this methodology by discussing a search for heterogeneous treatment effects along a set of dimensions with causal trees, modified slightly for practicalities in our ecosystem, and used here as a way to dive deeper into experiments flagged by the DER statistic alerts. We introduce a method for simulating data that closely mimics observed data at LinkedIn, and evaluate the performance of DER statistics in simulations. Finally, we give a case study from LinkedIn and show how these methodologies empowered us to discover surprising and important insights about group representation. Code for replication is available in an appendix.
