Paper Title

Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Ensemble Feature Selection Methods

Authors

Hacilar, Hilal; Nalbantoglu, O. Ufuk; Aran, Oya; Bakir-Gungor, Burcu

Abstract

The tremendous advances in next-generation sequencing and omics technologies have made it possible to characterize the human gut microbiome (the collective genomes of the microbial community that resides in our gastrointestinal tract). While some of these microorganisms are considered essential regulators of our immune system, others can cause diseases such as inflammatory bowel disease (IBD), diabetes, and cancer. IBD is a gut-related disorder in which deviations from the healthy gut microbiome are considered to be associated with the disease. Although existing studies attempt to unveil the composition of the gut microbiome in relation to IBD, a comprehensive picture is far from complete. Owing to the complexity of metagenomic studies, state-of-the-art machine learning techniques have become popular for addressing a wide range of questions in metagenomic data analysis. In this regard, using an IBD-associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, and (iii) to find subgroups of IBD patients using k-means and hierarchical clustering. To deal with the high dimensionality of the features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), minimum redundancy maximum relevance (mRMR), and Extreme Gradient Boosting (XGBoost). In our experiments with 10-fold cross-validation, XGBoost had a considerable effect in minimizing the number of microbiota used for the diagnosis of IBD, thus reducing cost and time. We observed that, compared to single classifiers, ensemble methods such as kNN and LogitBoost resulted in better performance measures for the classification of IBD.
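
To make the feature selection step more concrete, the following is a minimal sketch of greedy mRMR in its difference form (relevance minus mean redundancy), written in plain Python for discrete features such as taxon presence/absence. It is an illustration under stated assumptions, not the authors' implementation: the function names `mutual_information` and `mrmr`, the taxon names, and the toy data are all hypothetical, and the abstract does not say which mRMR variant or discretization was used.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR, difference form (an assumed variant).

    At each step, pick the feature with the highest
    I(feature; labels) - mean I(feature; already selected feature).
    `features` maps a feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, label 1 = IBD, 0 = healthy.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "taxon_a":      [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "taxon_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # redundant duplicate of taxon_a
    "taxon_b":      [0, 0, 1, 0, 1, 1, 0, 1],  # weaker but complementary
    "taxon_noise":  [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

selected = mrmr(features, labels, k=2)
print(selected)
```

On this toy data the duplicate `taxon_a_copy` is passed over in the second step because its redundancy with `taxon_a` outweighs its relevance, which is the behavior that distinguishes mRMR from ranking features by relevance alone.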
