Paper Title
Structure Learning of Latent Factors via Clique Search on Correlation Thresholded Graphs
Paper Authors
Abstract
Despite the widespread application of latent factor analysis, existing methods suffer from the following weaknesses: requiring the number of factors to be known, lack of theoretical guarantees for learning the model structure, and nonidentifiability of the parameters due to rotation invariance properties of the likelihood. We address these concerns by proposing a fast correlation thresholding (CT) algorithm that simultaneously learns the number of latent factors and a rotationally identifiable model structure. Our novel approach translates this structure learning problem into the search for so-called independent maximal cliques in a thresholded correlation graph that can be easily constructed from the observed data. Our clique analysis technique scales well up to thousands of variables, while competing methods are not applicable in a reasonable amount of running time. We establish a finite-sample error bound and high-dimensional consistency for the structure learning of our method. Through a series of simulation studies and a real data example, we show that the CT algorithm is an accurate method for learning the structure of factor analysis models and is robust to violations of its assumptions.
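The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' CT algorithm: the abstract does not define what makes a maximal clique "independent", nor how the threshold is chosen, so the code below only builds a thresholded correlation graph and enumerates its maximal cliques with the basic Bron-Kerbosch recursion. The function names and the threshold value `tau = 0.5` are hypothetical, chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def correlation_matrix(data):
    """Sample Pearson correlations; `data` is a list of observation rows."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [sqrt(sum(x * x for x in c)) for c in centered]
    corr = [[1.0] * p for _ in range(p)]
    for i, j in combinations(range(p), 2):
        dot = sum(a * b for a, b in zip(centered[i], centered[j]))
        corr[i][j] = corr[j][i] = dot / (norms[i] * norms[j])
    return corr


def threshold_graph(corr, tau):
    """Adjacency sets: i ~ j when |corr[i][j]| > tau (tau is an assumed tuning value)."""
    p = len(corr)
    adj = {i: set() for i in range(p)}
    for i, j in combinations(range(p), 2):
        if abs(corr[i][j]) > tau:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Toy correlation matrix with two strongly correlated blocks of variables.
toy_corr = [
    [1.00, 0.80, 0.70, 0.10, 0.00],
    [0.80, 1.00, 0.75, 0.05, 0.10],
    [0.70, 0.75, 1.00, 0.00, 0.10],
    [0.10, 0.05, 0.00, 1.00, 0.90],
    [0.00, 0.10, 0.10, 0.90, 1.00],
]
print(maximal_cliques(threshold_graph(toy_corr, tau=0.5)))
# [[0, 1, 2], [3, 4]]
```

In this toy case each recovered clique corresponds to one block of variables, which is the intuition behind reading cliques as candidate factors. Plain Bron-Kerbosch is exponential in the worst case, so this sketch says nothing about the scalability the abstract reports for the authors' own clique analysis.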