Paper Title


DC-BENCH: Dataset Condensation Benchmark

Authors

Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

Abstract


Dataset Condensation is a newly emerging technique that aims to learn a tiny dataset capturing the rich information encoded in the original dataset. As the datasets that contemporary machine learning models rely on grow ever larger, condensation methods become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing different condensation methods is non-trivial and remains an open issue. The quality of a condensed dataset is often shadowed by many critical factors contributing to the end performance, such as data augmentation and model architectures. The lack of a systematic way to evaluate and compare condensation methods not only hinders our understanding of existing techniques, but also discourages practical usage of the synthesized datasets. This work provides the first large-scale standardized benchmark on Dataset Condensation. It consists of a suite of evaluations that comprehensively reflect the generalizability and effectiveness of condensation methods through the lens of their generated datasets. Leveraging this benchmark, we conduct a large-scale study of current condensation methods and report many insightful findings that open up new possibilities for future development. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced to facilitate future research and application.
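The evaluation protocol the abstract describes can be sketched in miniature: a condensation method produces a tiny synthetic set, a model is trained on that set alone, and the model is scored on the original (real) test split. This is a hedged toy illustration, not DC-BENCH's actual API or any paper's method — here the "condensed" set is just the per-class mean (a trivial baseline standing in for a learned synthetic set), the "model" is a nearest-centroid classifier, and the function names `condense` and `evaluate` are hypothetical.

```python
# Toy sketch of the train-on-condensed / test-on-real evaluation loop.
# NOT DC-BENCH code: `condense` here is a trivial baseline (class means),
# standing in for a learned synthetic dataset.
import random

def condense(train, labels):
    """Return one synthetic point per class: the class mean (toy baseline)."""
    condensed = []
    for c in sorted(set(labels)):
        pts = [x for x, y in zip(train, labels) if y == c]
        mean = [sum(col) / len(pts) for col in zip(*pts)]
        condensed.append((mean, c))
    return condensed

def evaluate(condensed, test, test_labels):
    """Train on the condensed set only; report accuracy on the real test split.
    The 'model' is a nearest-centroid classifier over the condensed points."""
    def predict(x):
        nearest = min(condensed,
                      key=lambda mc: sum((a - b) ** 2 for a, b in zip(x, mc[0])))
        return nearest[1]
    correct = sum(predict(x) == y for x, y in zip(test, test_labels))
    return correct / len(test)

random.seed(0)
# Two well-separated 2-D blobs as a stand-in for an "original dataset".
train = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)] + \
        [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(100)]
labels = [0] * 100 + [1] * 100
test = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)] + \
       [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(50)]
test_labels = [0] * 50 + [1] * 50

small = condense(train, labels)           # 200 real points -> 2 synthetic points
acc = evaluate(small, test, test_labels)  # accuracy on held-out real data
print(len(small), acc)
```

A real benchmark run would swap in an actual condensation method and a neural network, and — as the abstract stresses — repeat the evaluation across multiple architectures and augmentation settings, since those factors can dominate the measured quality of the condensed set.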
