Paper Title

How stable are Transferability Metrics evaluations?

Paper Authors

Andrea Agostinelli, Michal Pándy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

Paper Abstract

Transferability metrics is a maturing field of increasing interest, which aims at providing heuristics for selecting the most suitable source models to transfer to a given target dataset, without fine-tuning them all. However, existing works rely on custom experimental setups which differ across papers, leading to inconsistent conclusions about which transferability metrics work best. In this paper we conduct a large-scale study by systematically constructing a broad range of 715k experimental setup variations. We discover that even small variations to an experimental setup lead to different conclusions about the superiority of one transferability metric over another. Then we propose better evaluations by aggregating across many experiments, enabling us to reach more stable conclusions. As a result, we reveal the superiority of LogME at selecting good source datasets to transfer from in a semantic segmentation scenario, NLEEP at selecting good source architectures in an image classification scenario, and GBC at determining which target task benefits most from a given source model. Yet, no single transferability metric works best in all scenarios.
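The abstract describes evaluating a transferability metric by checking how well its scores predict actual transfer performance, and then aggregating over many experimental setups to stabilize the conclusion. The minimal Python sketch below illustrates one plausible form of such an evaluation; the function and argument names (evaluate_metric_across_setups, metric_scores, transfer_accuracies) are hypothetical, and the paper's exact correlation measure and aggregation protocol may differ.

```python
import numpy as np
from scipy import stats

def evaluate_metric_across_setups(metric_scores, transfer_accuracies):
    """Rank-correlate a transferability metric's scores with the transfer
    performance actually obtained by fine-tuning each candidate model.

    metric_scores:       dict mapping setup_id -> array of metric scores,
                         one per candidate source model
    transfer_accuracies: dict with the same keys -> array of accuracies
                         measured after actually fine-tuning each candidate

    Returns the per-setup Kendall tau values and their mean. A single
    setup's tau can swing with small experimental changes; the mean over
    many setups is a more stable summary (hypothetical protocol, not
    necessarily the paper's exact one).
    """
    taus = []
    for setup_id, scores in metric_scores.items():
        # Kendall tau compares the ranking induced by the metric with the
        # ranking induced by real transfer performance.
        tau, _ = stats.kendalltau(scores, transfer_accuracies[setup_id])
        taus.append(tau)
    taus = np.asarray(taus)
    return taus, taus.mean()
```

A rank correlation is a natural choice here because transferability metrics are used to rank candidate source models rather than to predict accuracies exactly; only the ordering of candidates matters for model selection.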
