Paper title

Data embedding and prediction by sparse tropical matrix factorization

Authors

Omanović, Amra; Kazan, Hilal; Oblak, Polona; Curk, Tomaž

Abstract

Matrix factorization methods are linear models, with limited capability to model complex relations. In our work, we use the tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values. We evaluate the efficiency of STMF on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that the STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes a normal distribution and tends toward the mean value, STMF can better fit extreme values and distributions. STMF is the first work that uses the tropical semiring on sparse data. We show that in certain cases semirings are useful because they capture structure that is different from, and simpler to understand than, the structure captured by standard linear algebra.
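To make the abstract's central idea concrete, here is a minimal sketch of how a factorization is evaluated in the tropical semiring, assuming the max-plus convention (⊕ = max, ⊗ = +), in which the ordinary product sum_k U[i][k] * V[k][j] is replaced by max_k (U[i][k] + V[k][j]). The helper names `maxplus_matmul` and `masked_frobenius_error`, the toy matrices, and the use of `None` to mark a missing value are hypothetical illustrations. This is not the STMF fitting algorithm, which the abstract does not describe; it only shows how a tropical product is computed, how error is measured on observed entries, and how a missing entry is read off the product.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def maxplus_matmul(U: Matrix, V: Matrix) -> Matrix:
    """Tropical (max-plus) product: (U ⊗ V)[i][j] = max_k (U[i][k] + V[k][j])."""
    n, r = len(U), len(U[0])
    if len(V) != r:
        raise ValueError("inner dimensions must match")
    m = len(V[0])
    return [[max(U[i][k] + V[k][j] for k in range(r)) for j in range(m)]
            for i in range(n)]


def masked_frobenius_error(X: List[List[Optional[float]]], A: Matrix) -> float:
    """Frobenius error between X and A over observed entries only (None = missing)."""
    return math.sqrt(sum((x - a) ** 2
                         for row_x, row_a in zip(X, A)
                         for x, a in zip(row_x, row_a)
                         if x is not None))


# Hypothetical rank-2 factors of a 3x3 matrix.
U = [[0, 1], [2, 0], [1, 3]]
V = [[1, 0, 2], [0, 3, 1]]
A = maxplus_matmul(U, V)
print(A)  # [[1, 4, 2], [3, 3, 4], [3, 6, 4]]

# Hypothetical data with one missing entry at position (0, 2).
X = [[1, 4, None], [3, 3, 4], [3, 6, 5]]
print(masked_frobenius_error(X, A))  # 1.0 (only X[2][2] = 5 differs from A[2][2] = 4)
print(A[0][2])                       # 2, the prediction for the missing entry
```

Because each entry of U ⊗ V is a maximum of sums, the model is piecewise linear rather than linear in its factors, which is the kind of non-linearity the abstract refers to.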