Paper Title
DASH: Visual Analytics for Debiasing Image Classification via User-Driven Synthetic Data Augmentation
Paper Authors
Paper Abstract
Image classification models often learn to predict a class based on irrelevant co-occurrences between input features and an output class in the training data. We call these unwanted correlations "data biases," and the visual features that cause them "bias factors." Identifying and mitigating biases automatically, without human intervention, is challenging. Therefore, we conducted a design study to find a human-in-the-loop solution. First, working with three experts, we identified user tasks that capture the bias mitigation process for image classification models. Then, to support these tasks, we developed a visual analytics system called DASH that allows users to visually identify bias factors, to iteratively generate synthetic images using a state-of-the-art image-to-image translation model, and to supervise the model training process to improve classification accuracy. Our quantitative evaluation and qualitative study with ten participants demonstrate the usefulness of DASH and provide lessons for future work.