Adjusting for Bias with Procedural Data

Paper Title

Adjusting for Bias with Procedural Data

Authors

Shesh Narayan Gupta, Nicholas Bear Brown

Abstract

3D software is now capable of producing highly realistic images that look nearly indistinguishable from real images. This raises the question: can real datasets be enhanced with 3D rendered data? We investigate this question. In this paper we demonstrate the use of 3D rendered data (procedural data) for the adjustment of bias in image datasets. We perform an error analysis on images of animals, which shows that the misclassification of some animal breeds is largely a data issue. We then create procedural images of the poorly classified breeds and show that a model further trained on procedural data can better classify those breeds on real data. We believe that this approach can be used to enhance visual data for any underrepresented group, including rare diseases, or to address any data bias, potentially improving the accuracy and fairness of models. We find that the resulting representations rival or even outperform those learned directly from real data, but that good performance requires care in the generation of the 3D rendered procedural data. A 3D image dataset can be viewed as a compressed and organized copy of a real dataset, and we envision a future in which procedural data proliferates while real datasets become increasingly unwieldy, missing, or private. This paper suggests several techniques for dealing with visual representation learning in such a future.
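
The error-analysis step described in the abstract, finding which breeds a classifier handles poorly before generating procedural images for them, could look roughly like the sketch below. It is illustrative only and is not the authors' code: the function names, the 0.3 error-rate threshold, and the toy labels are all assumptions made for this example.

```python
from collections import Counter


def per_class_error_rates(true_labels, predicted_labels):
    """Return {class: fraction of that class's samples that were misclassified}."""
    totals = Counter(true_labels)
    errors = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    # Counter returns 0 for classes that had no errors.
    return {cls: errors[cls] / totals[cls] for cls in totals}


def classes_needing_augmentation(true_labels, predicted_labels, threshold=0.3):
    """List classes whose error rate exceeds the (hypothetical) threshold."""
    rates = per_class_error_rates(true_labels, predicted_labels)
    return sorted(cls for cls, rate in rates.items() if rate > threshold)


# Toy labels, invented for illustration only.
true_labels = ["beagle", "beagle", "pug", "pug", "pug", "husky", "husky"]
predicted_labels = ["beagle", "beagle", "pug", "beagle", "husky", "husky", "husky"]

weak = classes_needing_augmentation(true_labels, predicted_labels)
print(weak)
```

Classes returned by such a check would be the candidates for additional procedural images; how those images are rendered and mixed into training is not specified in the abstract and is left out here.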
