Paper Title

Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost

Authors

Lu Yin, Shiwei Liu, Meng Fang, Tianjin Huang, Vlado Menkovski, Mykola Pechenizkiy

Abstract

Lottery tickets (LTs) are able to discover accurate and sparse subnetworks that can be trained in isolation to match the performance of dense networks. Ensembling, in parallel, is one of the oldest time-proven tricks in machine learning, improving performance by combining the outputs of multiple independent models. However, the benefits of ensembling in the context of LTs are diluted, since ensembling does not directly lead to stronger sparse subnetworks but merely leverages their predictions for a better decision. In this work, we first observe that directly averaging the weights of adjacent learned subnetworks significantly boosts the performance of LTs. Encouraged by this observation, we further propose an alternative way to perform an 'ensemble' over the subnetworks identified by iterative magnitude pruning, via a simple interpolation strategy. We call our method Lottery Pools. In contrast to the naive ensemble, which brings no performance gains to each individual subnetwork, Lottery Pools yields much stronger sparse subnetworks than the original LTs without requiring any extra training or inference cost. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that our method achieves significant performance gains in both in-distribution and out-of-distribution scenarios. Impressively, evaluated with VGG-16 and ResNet-18, the produced sparse subnetworks outperform the original LTs by up to 1.88% on CIFAR-100 and 2.36% on CIFAR-100-C; the resulting dense network surpasses the pre-trained dense model by up to 2.22% on CIFAR-100 and 2.38% on CIFAR-100-C.
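The core operation the abstract describes, averaging or linearly interpolating the weights of subnetworks found by iterative magnitude pruning while keeping the result sparse, can be sketched as below. This is a minimal illustration assuming PyTorch-style state_dicts; the function name, the default coefficient, and the mask handling are illustrative assumptions and do not reproduce the paper's actual pool-selection or interpolation-coefficient search.

```python
import torch

def interpolate_tickets(state_a, state_b, alpha=0.5, mask=None):
    """Linearly interpolate two subnetwork checkpoints.

    state_a, state_b: parameter name -> tensor (e.g. model.state_dict()).
    alpha: interpolation coefficient; 0.5 corresponds to plain weight averaging.
    mask:  optional name -> {0,1} tensor, re-applied so the merged network
           keeps the desired sparsity pattern.
    """
    merged = {}
    for name, w_a in state_a.items():
        w = (1.0 - alpha) * w_a + alpha * state_b[name]
        if mask is not None and name in mask:
            w = w * mask[name]  # zero out pruned positions again
        merged[name] = w
    return merged

# Toy usage: two "tickets" for a single 2x2 layer sharing one sparsity mask.
a = {"layer.weight": torch.tensor([[1.0, 0.0], [2.0, 0.0]])}
b = {"layer.weight": torch.tensor([[3.0, 0.0], [4.0, 0.0]])}
m = {"layer.weight": torch.tensor([[1.0, 0.0], [1.0, 0.0]])}
print(interpolate_tickets(a, b, alpha=0.5, mask=m))
# {'layer.weight': tensor([[2., 0.], [3., 0.]])}
```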
