Paper Title

Multi-characteristic Subject Selection from Biased Datasets

Authors

Tahereh Arabghalizi, Alexandros Labrinidis

Abstract

Subject selection plays a critical role in experimental studies, especially ones with human subjects. Anecdotal evidence suggests that many such studies, done at or near university campus settings, suffer from selection bias, i.e., the too-many-college-kids-as-subjects problem. Unfortunately, traditional sampling techniques, when applied over biased data, will typically return biased results. In this paper, we tackle the problem of multi-characteristic subject selection from biased datasets. We present a constrained optimization-based method that finds the best possible sampling fractions for the different population subgroups, based on the desired sampling fractions provided by the researcher running the subject selection. We perform an extensive experimental study, using a variety of real datasets. Our results show that our proposed method outperforms the baselines for all problem variations by up to 90%.
