Paper Title
On the Information Plane of Autoencoders
Paper Authors
Paper Abstract
The training dynamics of hidden layers in deep learning are poorly understood in theory. Recently, the Information Plane (IP), based on the information-theoretic concept of mutual information (MI), was proposed to analyze them. The Information Bottleneck (IB) theory predicts that layers maximize relevant information and compress irrelevant information. Due to the limitations of estimating MI from samples, there is an ongoing debate about the properties of the IP in the supervised learning case. In this work, we derive a theoretical convergence result for the IP of autoencoders. The theory predicts that ideal autoencoders with a large bottleneck layer size do not compress input information, whereas a small size causes compression only in the encoder layers. For the experiments, we use a Gram-matrix-based MI estimator recently proposed in the literature. We propose a new rule for adjusting its parameters that compensates for scale and dimensionality effects. Using this rule, we obtain experimental IPs that are closer to the theory. Our theoretical IP for autoencoders could serve as a benchmark for validating new methods of estimating MI in neural networks. In this way, experimental limitations could be recognized and corrected, contributing to the ongoing debate on the supervised learning case.
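For concreteness, the sketch below shows how a Gram-matrix-based MI estimator of the kind the abstract references can be computed: a kernel Gram matrix is normalized to unit trace, its eigenvalues yield a matrix-based Rényi α-entropy, and MI follows from the Hadamard product of two such matrices. This is a minimal illustration only; the Gaussian kernel, the value of α, and the bandwidth heuristic in the usage example are assumptions for demonstration and are not the parameter-adjustment rule proposed in the paper.

```python
import numpy as np

def gram_matrix(x, sigma):
    """Unit-trace Gram matrix of a Gaussian kernel over the rows of x."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * x @ x.T           # pairwise squared distances
    k = np.exp(-d2 / (2.0 * sigma ** 2))     # RBF kernel, diag(k) = 1
    return k / k.shape[0]                     # trace normalized to 1

def matrix_renyi_entropy(a, alpha=1.01):
    """Matrix-based Rényi alpha-entropy (in bits) from the eigenvalues of a."""
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

def matrix_mi(x, y, sigma_x, sigma_y, alpha=1.01):
    """I(X;Y) = S(A) + S(B) - S(A∘B), with the Hadamard product re-normalized."""
    a, b = gram_matrix(x, sigma_x), gram_matrix(y, sigma_y)
    ab = a * b                                # Hadamard product
    ab = ab / np.trace(ab)                    # restore unit trace
    return (matrix_renyi_entropy(a, alpha)
            + matrix_renyi_entropy(b, alpha)
            - matrix_renyi_entropy(ab, alpha))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 10))            # e.g. network inputs
    z = x @ rng.normal(size=(10, 4))          # e.g. a bottleneck layer's codes
    # Illustrative bandwidth heuristic (an assumption, not the paper's rule):
    # scale sigma with the per-dimension spread and the layer's dimensionality
    # so that layers of different size and scale remain comparable.
    sx = np.mean(np.std(x, axis=0)) * x.shape[1] ** 0.5
    sz = np.mean(np.std(z, axis=0)) * z.shape[1] ** 0.5
    print(matrix_mi(x, z, sx, sz))
```

Estimating I(X; T) and I(T; Y) this way for every layer T across training epochs is what produces the trajectories plotted on an Information Plane.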