Paper Title

SKDCGN: Source-free Knowledge Distillation of Counterfactual Generative Networks using cGANs

Authors

Sameer Ambekar, Matteo Tafuro, Ankit Ankit, Diego van der Mast, Mark Alence, Christos Athanasiadis

Abstract

With the usage of appropriate inductive biases, Counterfactual Generative Networks (CGNs) can generate novel images from random combinations of shape, texture, and background manifolds. These images can be utilized to train an invariant classifier, avoiding the widespread problem of deep architectures learning spurious correlations rather than meaningful ones. As a consequence, out-of-domain robustness is improved. However, the CGN architecture comprises multiple over-parameterized networks, namely BigGAN and U2-Net. Training these networks requires appropriate background knowledge and extensive computation. Since one does not always have access to the precise training details, nor do they always possess the necessary knowledge of counterfactuals, our work addresses the following question: Can we use the knowledge embedded in pre-trained CGNs to train a lower-capacity model, assuming black-box access (i.e., only access to the pretrained CGN model) to the components of the architecture? In this direction, we propose a novel work named SKDCGN that attempts knowledge transfer using Knowledge Distillation (KD). In our proposed architecture, each independent mechanism (shape, texture, background) is represented by a student 'TinyGAN' that learns from the pretrained teacher 'BigGAN'. We demonstrate the efficacy of the proposed method using state-of-the-art datasets such as ImageNet and MNIST by using KD and appropriate loss functions. Moreover, as an additional contribution, our paper conducts a thorough study on the composition mechanism of the CGNs, to gain a better understanding of how each mechanism influences the classification accuracy of an invariant classifier. Code available at: https://github.com/ambekarsameer96/SKDCGN
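The distillation the abstract describes is generator-to-generator: a student TinyGAN is trained to reproduce the teacher BigGAN's output for the same latent/class input. A minimal sketch of one common objective for this setup is a pixel-level L1 distillation loss (as used in TinyGAN-style BigGAN distillation); the function name and toy data below are illustrative, and the exact SKDCGN loss terms are defined in the paper's code.

```python
import numpy as np

def pixel_distill_loss(student_img, teacher_img):
    """Pixel-level distillation: mean L1 distance pushing the student
    generator's output toward the teacher's for the same input.
    Hypothetical sketch; SKDCGN combines this idea with further terms."""
    return np.abs(student_img - teacher_img).mean()

# Toy illustration with random "images" of shape (batch, H, W, C).
rng = np.random.default_rng(0)
teacher_out = rng.normal(size=(2, 32, 32, 3))
student_out = teacher_out + 0.1 * rng.normal(size=(2, 32, 32, 3))
loss = pixel_distill_loss(student_out, teacher_out)
```

In a black-box setting this is attractive because it only needs the teacher's generated images, not its weights or gradients.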
