Paper Title

A generalized kernel machine approach to identify higher-order composite effects in multi-view datasets

Authors

Alam, Md Ashad; Qiu, Chuan; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen

Abstract

In recent years, the comprehensive study of multi-view datasets (e.g., multi-omics and imaging scans) has been at the forefront of biomedical research. State-of-the-art biomedical technologies enable us to collect multi-view biomedical datasets for the study of complex diseases. While the different views of the data tend to provide complementary information about a disease, analyzing multi-view data with complex interactions remains challenging, which hinders a deeper and more holistic understanding of biological systems. In this paper, we propose a novel generalized kernel machine approach to identify higher-order composite effects in multi-view biomedical datasets. This generalized semi-parametric approach (a mixed-effect linear model) includes the marginal and joint Hadamard products of features from different views of the data. The proposed kernel machine approach treats multi-view data as predictor variables to allow more thorough and comprehensive modeling of a complex trait. The proposed method can be applied to the study of any disease model for which multi-view datasets are available. We applied our approach to both synthesized datasets and real multi-view datasets from studies of adolescent brain development and osteoporosis, including an imaging scan dataset and five omics datasets. Our experiments demonstrate that the proposed method can effectively identify higher-order composite effects and suggest that the corresponding features (genes, regions of interest, and chemical taxonomies) act in concert. We show that the proposed method is more generalizable than existing ones.
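To make the "marginal and joint Hadamard product" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each view is summarized by a Gaussian kernel matrix over the same subjects and that a joint (higher-order) term is the element-wise product of the marginal kernel matrices. The view names (`gene`, `imaging`, `methylation`), the bandwidth, and the helper functions are hypothetical and chosen only for illustration.

```python
import numpy as np
from itertools import combinations


def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (assumed kernel choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def composite_kernels(views):
    """Return marginal kernels and all joint Hadamard-product kernels.

    `views` maps a (hypothetical) view name to an (n_subjects, n_features) array.
    Keys of the result are tuples of view names; a tuple of length k is the
    element-wise product of the k corresponding marginal kernel matrices.
    """
    marginal = {name: gaussian_kernel(X) for name, X in views.items()}
    kernels = {(name,): K for name, K in marginal.items()}
    names = sorted(marginal)
    for order in range(2, len(names) + 1):
        for combo in combinations(names, order):
            K = np.ones_like(marginal[combo[0]])
            for name in combo:
                K = K * marginal[name]  # Hadamard (element-wise) product
            kernels[combo] = K
    return kernels


# Synthetic data only: three views measured on the same 20 subjects.
rng = np.random.default_rng(0)
n_subjects = 20
views = {
    "gene": rng.normal(size=(n_subjects, 5)),
    "imaging": rng.normal(size=(n_subjects, 8)),
    "methylation": rng.normal(size=(n_subjects, 4)),
}
kernels = composite_kernels(views)
```

With three views this yields three marginal kernels, three pairwise products, and one three-way product. By the Schur product theorem, the element-wise product of positive semi-definite matrices is itself positive semi-definite, so each joint term is a valid kernel matrix that could, under these assumptions, enter a mixed-effect model as a separate variance component.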
