Paper Title
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
Paper Authors
Paper Abstract
Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public. These models are capable of producing high-quality images that depict a variety of concepts and styles when conditioned on textual descriptions. However, these models adopt cultural characteristics associated with specific Unicode scripts from their vast amount of training data, which may not be immediately apparent. We show that by simply inserting single non-Latin characters into a textual description, common models reflect cultural stereotypes and biases in their generated images. We analyze this behavior both qualitatively and quantitatively, and identify a model's text encoder as the root cause of the phenomenon. Additionally, malicious users or service providers may try to intentionally bias the image generation to create racist stereotypes by replacing Latin characters with similar-looking characters from non-Latin scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations.
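To make the homoglyph substitution described in the abstract concrete, the following sketch (illustrative only, not the paper's code) replaces a Latin "o" in a prompt with its Cyrillic counterpart U+043E. The two prompts render nearly identically to a human reader, yet they are distinct Unicode strings and would be tokenized differently by a text encoder:

```python
import unicodedata

# Hypothetical prompt for illustration; the substitution technique is
# the homoglyph replacement described in the abstract.
latin_prompt = "A photo of an actor"
homoglyph_prompt = latin_prompt.replace("o", "\u043e", 1)  # Cyrillic small "o"

# The strings look alike but differ at the code-point level.
print(latin_prompt == homoglyph_prompt)   # False
print(unicodedata.name("o"))              # LATIN SMALL LETTER O
print(unicodedata.name("\u043e"))         # CYRILLIC SMALL LETTER O
```

Because text encoders operate on code points (or byte-level tokens) rather than rendered glyphs, such a visually invisible change can shift the embedding of the whole prompt, which is what enables the script attacks the paper studies.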