Paper Title

Geometric structure guided model and algorithms for complete deconvolution of gene expression data

Authors

Duan Chen, Shaoyu Li, Xue Wang

Abstract

Complete deconvolution analysis of bulk RNAseq data helps determine whether differences in disease-associated gene expression profiles (GEPs) between tissues of patients and normal controls are due to changes in the cellular composition of the tissue samples or to changes in the GEPs of specific cells. One of the major techniques for complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is well known to be a strongly ill-posed problem, so applying it directly to RNAseq data leads to severe difficulties in interpreting the solutions. In this paper we develop an NMF-based mathematical model and corresponding computational algorithms to improve solution identifiability when deconvoluting bulk RNAseq data. Our approach combines the biological concept of marker genes with the solvability conditions of NMF theory to build a geometric structure guided optimization model. In this strategy, the geometric structure of the bulk tissue data is first explored by spectral clustering. The identified marker gene information is then integrated as solvability constraints, while the overall correlation graph is used as manifold regularization. Both synthetic and biological data are used to validate the proposed model and algorithms, and the results show significantly improved solution interpretability and accuracy.
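To make the NMF formulation in the abstract concrete, here is a minimal sketch of graph-regularized NMF with multiplicative updates. It is not the authors' model or algorithm: the marker-gene solvability constraints and the spectral clustering step are omitted. The function names `graph_regularized_nmf` and `correlation_affinity`, the weight `lam`, and the choice to place the graph over genes (rows of the data matrix) are all illustrative assumptions. The sketch minimizes ||X - WH||_F^2 + lam * trace(W^T L W), where L = D - A is the Laplacian of a gene-gene affinity graph.

```python
import numpy as np


def graph_regularized_nmf(X, k, A=None, lam=0.0, n_iter=500, seed=0, eps=1e-10):
    """Factor a nonnegative matrix X (genes x samples) as W @ H with W, H >= 0.

    W (genes x k) plays the role of cell-type expression profiles and
    H (k x samples) the role of unnormalized cell-type abundances.
    A is an optional symmetric, nonnegative gene-gene affinity matrix
    (hypothetical stand-in for a correlation graph); lam weights the
    penalty trace(W.T @ L @ W) with L = D - A.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    if A is None:
        A = np.zeros((m, m))
    D = np.diag(A.sum(axis=1))  # degree matrix of the affinity graph
    for _ in range(n_iter):
        # Multiplicative updates: ratios of nonnegative terms keep W, H >= 0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T + lam * (A @ W)) / (W @ (H @ H.T) + lam * (D @ W) + eps)
    return W, H


def correlation_affinity(X):
    """Assumed affinity: positive part of the gene-gene Pearson correlation."""
    A = np.clip(np.corrcoef(X), 0.0, None)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


# Synthetic example: 60 genes, 20 bulk samples, 3 cell types (made-up data).
rng = np.random.default_rng(1)
W_true = rng.random((60, 3))
H_true = rng.random((3, 20))
X = W_true @ H_true

W, H = graph_regularized_nmf(X, k=3, A=correlation_affinity(X), lam=0.1)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Even when the reconstruction error is small, the recovered `W` and `H` are in general determined only up to permutation and scaling of the factors (and often further ambiguities). This is the identifiability issue that the abstract's marker-gene constraints are intended to address.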
