Paper Title
Effective training-time stacking for ensembling of deep neural networks
Paper Authors
Paper Abstract
Ensembling is a popular and effective method for improving machine learning (ML) models. It proves its value not only in classical ML but also in deep learning. Ensembles enhance the quality and trustworthiness of ML solutions and allow uncertainty estimation. However, they come at a price: training an ensemble of deep learning models consumes a huge amount of computational resources. Snapshot ensembling collects the ensemble members along a single training path. Since training is run only once, the computational cost is similar to that of training a single model. However, the quality of the models along the training path varies: typically, later models are better if no overfitting occurs, so the members are of differing utility. Our method improves snapshot ensembling by selecting and weighting ensemble members along the training path. It relies on training-time likelihoods and does not need the validation-sample errors that standard stacking methods require. Experimental evidence on the Fashion MNIST, CIFAR-10, and CIFAR-100 datasets demonstrates the superior quality of the proposed weighted ensembles compared to vanilla ensembling of deep learning models.
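To make the idea concrete, the sketch below shows one way a weighted snapshot ensemble could be assembled along a single training run. It is an illustrative assumption, not the paper's exact scheme: the snapshot schedule, the softmax weighting over training-time log-likelihoods, and all function names (`train_with_snapshots`, `weighted_ensemble_predict`) are placeholders introduced here.

```python
# Illustrative sketch only: snapshots are collected during one training run and
# weighted by their training-time log-likelihoods. The softmax weighting below
# is an assumed stand-in for the paper's selection/weighting scheme.
import copy
import torch
import torch.nn.functional as F


def train_with_snapshots(model, loader, epochs, snapshot_every, lr=0.1, device="cpu"):
    """Train once, saving periodic snapshots and their training-time log-likelihoods."""
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    snapshots, log_likelihoods = [], []
    for epoch in range(epochs):
        total_ll, n = 0.0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
            total_ll += -loss.item() * x.size(0)  # log-likelihood = -NLL
            n += x.size(0)
        if (epoch + 1) % snapshot_every == 0:
            snapshots.append(copy.deepcopy(model).eval())
            log_likelihoods.append(total_ll / n)
    # Assumed weighting: softmax over training-time log-likelihoods, so later
    # (usually better-fitting) snapshots receive larger weights.
    weights = torch.softmax(torch.tensor(log_likelihoods), dim=0)
    return snapshots, weights


@torch.no_grad()
def weighted_ensemble_predict(snapshots, weights, x):
    """Average the snapshots' predictive distributions with the given weights."""
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in snapshots])  # (M, B, C)
    return (weights.view(-1, 1, 1) * probs).sum(dim=0)                 # (B, C)
```

In this sketch the extra cost over training a single model is only the storage of the snapshots and one forward pass per member at prediction time, which reflects the abstract's claim that snapshot-style ensembling keeps training time close to that of one model.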