论文标题
值得信赖的AI的非成像医学数据综合:一项全面调查
Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey
论文作者
论文摘要
数据质量是发展医疗保健中值得信赖的AI的关键因素。大量具有控制混杂因素的策划数据集可以帮助提高下游AI算法的准确性,鲁棒性和隐私性。但是,访问高质量的数据集受数据获取的技术难度的限制,并且严格的道德限制阻碍了医疗保健数据的大规模共享。数据综合算法生成具有与实际临床数据相似的分布的数据,可以作为解决可信度AI的发展过程中缺乏优质数据的潜在解决方案。但是,最新的数据合成算法,尤其是深度学习算法,更多地集中在成像数据上,同时忽略了非成像医疗保健数据的综合,包括临床测量,医疗信号和波形以及电子保健记录(EHR)。因此,在本文中,我们将回顾合成算法,特别是对于非成像医学数据,目的是在该领域提供可信赖的AI。该教程风格的审查论文将对包括算法,评估,局限性和未来研究方向在内的各个方面进行全面描述。
Data quality is the key factor for the development of trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can help improve the accuracy, robustness and privacy of downstream AI algorithms. However, access to good quality datasets is limited by the technical difficulty of data acquisition and large-scale sharing of healthcare data is hindered by strict ethical restrictions. Data synthesis algorithms, which generate data with a similar distribution as real clinical data, can serve as a potential solution to address the scarcity of good quality data during the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data while neglecting the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). Thus, in this paper, we will review the synthesis algorithms, particularly for non-imaging medical data, with the aim of providing trustworthy AI in this domain. This tutorial-styled review paper will provide comprehensive descriptions of non-imaging medical data synthesis on aspects including algorithms, evaluations, limitations and future research directions.