Paper title
Power Transformations of Relative Count Data as a Shrinkage Problem
Paper authors
Paper abstract
Here we show an application of our recently proposed information-geometric approach to compositional data analysis (CoDA). This application concerns relative count data, such as those obtained from sequencing experiments. First, we review in some detail a variety of necessary concepts, ranging from basic count distributions and their information-geometric description, through the link between Bayesian statistics and shrinkage, to the use of power transformations in CoDA. We then show that powering, i.e., the equivalent of scalar multiplication on the simplex, can be understood as a shrinkage problem on the tangent space of the simplex. In information-geometric terms, traditional shrinkage corresponds to an optimization along a mixture (or m-) geodesic, while powering (or, as we call it, exponential shrinkage) can be optimized along an exponential (or e-) geodesic. While the m-geodesic corresponds to the posterior mean of the multinomial counts using a conjugate prior, the e-geodesic corresponds to an alternative parametrization of the posterior in which prior and data contributions are weighted by geometric rather than arithmetic means. To optimize the exponential shrinkage parameter, we use mean squared error as a cost function on the tangent space. This is just the expected squared Aitchison distance from the true parameter. We derive an analytic solution for its minimum based on the delta method and test it via simulations. We also discuss exponential shrinkage as an alternative to zero imputation for dimension reduction and data normalization.
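The relationship between powering and the two kinds of shrinkage can be illustrated with a minimal sketch. It is not the paper's implementation. The example composition, the uniform shrinkage target, the weight `lam`, and all function names are assumptions chosen for illustration. The tangent space is represented here by centered log-ratio (clr) coordinates, which is one common choice in CoDA.

```python
import math

def closure(x):
    """Rescale positive values so they sum to 1 (a point on the simplex)."""
    total = sum(x)
    return [v / total for v in x]

def power(p, alpha):
    """Powering: raise each part to alpha, then renormalize."""
    return closure([v ** alpha for v in p])

def clr(p):
    """Centered log-ratio transform: log parts minus their mean log."""
    logs = [math.log(v) for v in p]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(p, q):
    """Euclidean distance between the clr coordinates of p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(p), clr(q))))

def m_shrink(p, target, lam):
    """Point on the mixture (m-) geodesic: arithmetic weighting."""
    return [lam * t + (1 - lam) * v for v, t in zip(p, target)]

def e_shrink(p, target, lam):
    """Point on the exponential (e-) geodesic: normalized geometric weighting."""
    return closure([t ** lam * v ** (1 - lam) for v, t in zip(p, target)])

# Hypothetical, strictly positive relative counts (assumption for illustration).
p = closure([10, 3, 1, 6])
uniform = [1 / len(p)] * len(p)   # assumed shrinkage target
lam = 0.3                         # assumed shrinkage weight

powered = power(p, 1 - lam)
scaled_clr = [(1 - lam) * c for c in clr(p)]   # scalar multiplication in clr space
m_point = m_shrink(p, uniform, lam)
e_point = e_shrink(p, uniform, lam)

print("clr(powered):    ", [round(c, 4) for c in clr(powered)])
print("(1-lam) * clr(p):", [round(c, 4) for c in scaled_clr])
print("m-geodesic point:", [round(v, 4) for v in m_point])
print("e-geodesic point:", [round(v, 4) for v in e_point])
```

With a uniform target, the e-geodesic point coincides with the powered composition, and its Aitchison distance from the target shrinks by the factor `1 - lam`. The m-geodesic point generally differs. Note that `clr` requires strictly positive parts, so this sketch does not handle zero counts.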