Title
Prevention is Better than Cure: Handling Basis Collapse and Transparency in Dense Networks
Authors
Abstract
Dense nets are an integral part of any classification and regression pipeline. Recently, these networks have found a new application as solvers for known representations in various domains. However, a crucial issue with dense nets is the interpretation of their features and their lack of reproducibility over multiple training runs. In this work, we identify the basis collapse issue as a primary cause and propose a modified loss function that circumvents this problem. We also provide a few general guidelines relating the choice of activations to loss-surface roughness and appropriate scaling for designing low-weight dense nets. We demonstrate through carefully chosen numerical experiments that the basis collapse issue leads to the design of massively redundant networks. Our approach results in substantially more concise nets, having $100\times$ fewer parameters, while achieving a much lower ($10\times$) MSE loss at scale than reported in prior works. Further, we show that the width of a dense net depends acutely on the feature complexity. This is in contrast to the dimension-dependent width choice reported in prior theoretical works. To the best of our knowledge, this is the first time these issues and contradictions have been reported and experimentally verified. With our design guidelines, we render the network design transparent in terms of a low-weight network. We share our code for full reproducibility at https://github.com/smjtgupta/Dense_Net_Regress.
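The modified loss function and the design guidelines themselves are given in the paper, not in this abstract. As a purely illustrative sketch of the setting described above (a low-parameter dense net used as a regression solver and trained with an MSE objective), a minimal PyTorch example follows; the class name `SmallDenseRegressor`, the layer widths, and the sine target are assumptions for illustration only and are not the authors' architecture or modified loss.

```python
# Illustrative sketch only: a compact dense net for 1-D function regression
# trained with plain MSE. This is NOT the paper's modified loss or design;
# widths and the target function are assumed for demonstration.
import torch
import torch.nn as nn

class SmallDenseRegressor(nn.Module):
    def __init__(self, in_dim=1, width=16):
        super().__init__()
        # Two narrow hidden layers with a smooth activation, in the spirit of
        # a "low-weight" dense regressor.
        self.net = nn.Sequential(
            nn.Linear(in_dim, width),
            nn.Tanh(),
            nn.Linear(width, width),
            nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, x):
        return self.net(x)

# Toy usage: fit y = sin(x) on [-pi, pi] with an MSE objective.
x = torch.linspace(-torch.pi, torch.pi, 256).unsqueeze(1)
y = torch.sin(x)
model = SmallDenseRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```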