Paper Title
Dataset Condensation with Contrastive Signals
Paper Authors
Paper Abstract
Recent studies have demonstrated that gradient-matching-based dataset synthesis, or dataset condensation (DC), methods can achieve state-of-the-art performance when applied to data-efficient learning tasks. However, in this study, we show that existing DC methods can perform worse than random selection when task-irrelevant information forms a significant part of the training dataset. We attribute this to the absence of contrastive signals between classes, a consequence of the class-wise gradient matching strategy. To address this problem, we propose Dataset Condensation with Contrastive signals (DCC), which modifies the loss function so that DC methods effectively capture the differences between classes. In addition, we analyze the new loss function in terms of training dynamics by tracking the kernel velocity. Furthermore, we introduce a bi-level warm-up strategy to stabilize the optimization. Our experimental results indicate that while existing methods are ineffective for fine-grained image classification tasks, the proposed method can successfully generate informative synthetic datasets for such tasks. Moreover, the proposed method outperforms the baselines even on benchmark datasets such as SVHN, CIFAR-10, and CIFAR-100. Finally, we demonstrate the broad applicability of the proposed method by applying it to continual learning tasks.
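To make the distinction in the abstract concrete, below is a minimal PyTorch sketch of the two gradient-matching objectives. All tensors, the toy model, and the helper names are hypothetical illustrations, not the authors' released code: class-wise DC matches real and synthetic gradients one class at a time, while the DCC-style variant sums the per-class gradients over all classes before computing the matching distance, so inter-class (contrastive) differences can influence the synthetic images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flat_grad(loss, model):
    """Flatten the gradient of `loss` w.r.t. the model parameters into one vector."""
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def class_grads(model, x, y, classes):
    """Per-class gradient vectors of the cross-entropy loss."""
    return [flat_grad(F.cross_entropy(model(x[y == c]), y[y == c]), model)
            for c in classes]

def match(g_a, g_b):
    """Gradient-matching distance; cosine distance is a common choice in DC."""
    return 1.0 - F.cosine_similarity(g_a, g_b, dim=0)

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
classes = list(range(10))

# Hypothetical data: a real batch, and one learnable synthetic image per class.
x_real = torch.randn(100, 3, 32, 32)
y_real = torch.randint(0, 10, (100,))
x_syn = torch.randn(10, 3, 32, 32, requires_grad=True)
y_syn = torch.arange(10)

g_real = class_grads(model, x_real, y_real, classes)
g_syn = class_grads(model, x_syn, y_syn, classes)

# Class-wise DC: each class is matched independently, so no signal about
# differences *between* classes reaches the synthetic images.
loss_dc = sum(match(gr, gs) for gr, gs in zip(g_real, g_syn))

# DCC-style: sum gradients over all classes *before* matching, so the
# contrastive signal between classes participates in the objective.
loss_dcc = match(sum(g_real), sum(g_syn))

loss_dcc.backward()  # gradients flow back into the synthetic images x_syn
print(loss_dc.item(), loss_dcc.item(), x_syn.grad.abs().mean().item())
```

The bi-level warm-up strategy mentioned in the abstract, which the paper uses to stabilize this optimization, is omitted from the sketch for brevity.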