Paper title
ODBAE: a high-performance model identifying complex phenotypes in high-dimensional biological datasets
Paper authors
Paper abstract
Identifying complex phenotypes from high-dimensional biological data is challenging because different physiological indicators are intricately interdependent. Traditional approaches often focus on detecting outliers in single variables, overlooking the broader network of interactions that contribute to phenotype emergence. Here, we introduce ODBAE (Outlier Detection using Balanced Autoencoders), a machine learning method designed to uncover both subtle and extreme outliers by capturing latent relationships among multiple physiological parameters. ODBAE's revised loss function enhances its ability to detect two key types of outliers: influential points (IP), which disrupt latent correlations between dimensions, and high leverage points (HLP), which deviate from the norm but go undetected by traditional autoencoder-based methods. Using data from the International Mouse Phenotyping Consortium (IMPC), we show that ODBAE can identify knockout mice with complex, multi-indicator phenotypes: normal in each individual trait, but abnormal when the traits are considered together. In addition, the method reveals novel metabolism-related genes and uncovers coordinated abnormalities across metabolic indicators. Our results highlight the utility of ODBAE in detecting joint abnormalities and in advancing our understanding of homeostatic perturbations in biological systems.
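The distinction between influential points and high leverage points can be illustrated with a toy example. The sketch below is not ODBAE and does not use its revised loss function, which the abstract does not specify. It assumes a plain linear "autoencoder" with a one-dimensional latent code (equivalent to projecting onto the first principal axis) and scores points by ordinary squared reconstruction error. All names and the synthetic data are hypothetical.

```python
import math
import random


def fit_principal_axis(points):
    """Return (mean, unit vector) of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def reconstruction_error(point, mean, axis):
    """Squared distance between a point and its 1-D encode/decode round trip."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    code = dx * axis[0] + dy * axis[1]        # encoder: 1-D latent code
    rx, ry = code * axis[0], code * axis[1]   # decoder: back to 2-D (centred)
    return (dx - rx) ** 2 + (dy - ry) ** 2


# Hypothetical population: two strongly correlated indicators.
rng = random.Random(0)
population = []
for _ in range(500):
    t = rng.gauss(0.0, 1.0)
    population.append((t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1)))

mean, axis = fit_principal_axis(population)

# Typical sample: follows the correlation, moderate values.
err_typical = reconstruction_error((0.5, 0.5), mean, axis)
# Influential point: each value is unremarkable alone, but the pair
# breaks the correlation between the two indicators.
err_ip = reconstruction_error((1.5, -1.5), mean, axis)
# High leverage point: extreme values that still follow the correlation.
err_hlp = reconstruction_error((5.0, 5.0), mean, axis)

print(f"typical: {err_typical:.4f}  IP: {err_ip:.4f}  HLP: {err_hlp:.4f}")
```

In this toy setting the influential point receives a large reconstruction error even though neither coordinate is extreme on its own. The high leverage point lies along the learned axis, so its error stays close to zero and it would pass undetected. This is the kind of gap the abstract attributes to traditional autoencoder-based methods.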