Paper Title

Recovery of Linear Components: Reduced Complexity Autoencoder Designs

Authors

Zocco, Federico; McLoone, Seán

Abstract

Reducing dimensionality is a key preprocessing step in many data analysis applications, used to address the negative effects of the curse of dimensionality and collinearity on model performance and computational complexity, to denoise the data, or to reduce storage requirements. Moreover, in many applications it is desirable to reduce the input dimensions by choosing a subset of variables that best represents the entire set, without any a priori information being available. Unsupervised variable selection techniques provide a solution to this second problem. An autoencoder, if properly regularized, can solve both unsupervised dimensionality reduction and variable selection, but training large neural networks can be prohibitively slow in time-sensitive applications. We present an approach called Recovery of Linear Components (RLC), which serves as a middle ground between linear and non-linear dimensionality reduction techniques, reducing autoencoder training times while improving performance over purely linear techniques. With the aid of synthetic and real-world case studies, we show that RLC, compared with an autoencoder of similar complexity, achieves higher accuracy, similar robustness to overfitting, and faster training. Additionally, at the cost of a relatively small increase in computational complexity, RLC is shown to outperform the current state of the art for a semiconductor manufacturing wafer measurement site optimization application.
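The abstract contrasts two linear baselines: projecting data onto a few principal components, and unsupervised variable selection, where a subset of the original variables must reconstruct the whole set. The NumPy sketch below illustrates both ideas on synthetic data. It is not the RLC method, whose details are not given in the abstract, and the greedy forward search is a generic, swapped-in selection technique, not necessarily the state-of-the-art baseline the authors compare against. The function names (`pca_reconstruction_error`, `subset_reconstruction_error`, `greedy_variable_selection`) and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical helpers: a generic sketch of linear baselines, NOT the RLC algorithm.

def pca_reconstruction_error(X, k):
    """Relative squared error of reconstructing centered X from its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt; rows of Vt are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xk = (U[:, :k] * s[:k]) @ Vt[:k]  # best rank-k approximation of Xc
    return np.linalg.norm(Xc - Xk) ** 2 / np.linalg.norm(Xc) ** 2

def subset_reconstruction_error(X, cols):
    """Relative squared error of reconstructing ALL centered variables from the columns in `cols`."""
    Xc = X - X.mean(axis=0)
    S = Xc[:, cols]
    # Least-squares fit of every variable as a linear combination of the selected ones
    W, *_ = np.linalg.lstsq(S, Xc, rcond=None)
    return np.linalg.norm(Xc - S @ W) ** 2 / np.linalg.norm(Xc) ** 2

def greedy_variable_selection(X, k):
    """Greedy forward search: repeatedly add the variable that most reduces reconstruction error."""
    n_vars = X.shape[1]
    if k > n_vars:
        raise ValueError("k cannot exceed the number of variables")
    selected = []
    for _ in range(k):
        candidates = [j for j in range(n_vars) if j not in selected]
        errors = [subset_reconstruction_error(X, selected + [j]) for j in candidates]
        selected.append(candidates[int(np.argmin(errors))])
    return selected

# Synthetic example: six observed variables driven by two hidden factors plus small noise
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))  # hidden factors
A = rng.normal(size=(2, 6))    # mixing matrix
X = Z @ A + 0.01 * rng.normal(size=(200, 6))

pca_err = pca_reconstruction_error(X, 2)
sel = greedy_variable_selection(X, 2)
sub_err = subset_reconstruction_error(X, sel)
print(f"PCA (k=2) relative error: {pca_err:.2e}")
print(f"selected variables {sel}, relative error: {sub_err:.2e}")
```

Because the subset reconstruction `S @ W` has rank at most k, the Eckart–Young theorem makes the PCA error a lower bound on the error of any k-variable subset; variable selection gives up some of that accuracy in exchange for keeping original, directly measurable variables.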
