Paper title
Latent Space Diffusion Models of Cryo-EM Structures
Paper authors
Paper abstract
Cryo-electron microscopy (cryo-EM) is unique among tools in structural biology in its ability to image large, dynamic protein complexes. Key to this ability are image processing algorithms for heterogeneous cryo-EM reconstruction, including recent deep learning-based approaches. The state-of-the-art method cryoDRGN uses a Variational Autoencoder (VAE) framework to learn a continuous distribution of protein structures from single-particle cryo-EM imaging data. While cryoDRGN can model complex structural motions, the Gaussian prior distribution of the VAE fails to match the aggregate approximate posterior, which prevents generative sampling of structures, especially for multi-modal distributions (e.g., compositional heterogeneity). Here, we train a diffusion model as an expressive, learnable prior in the cryoDRGN framework. Our approach learns a high-quality generative model over molecular conformations directly from cryo-EM imaging data. We show the ability to sample from the model on two synthetic and two real datasets, where samples accurately follow the data distribution, unlike samples from the VAE prior distribution. We also demonstrate how the diffusion model prior can be leveraged for fast latent space traversal and interpolation between states of interest. By learning an accurate model of the data distribution, our method unlocks tools in generative modeling, sampling, and distribution analysis for heterogeneous cryo-EM ensembles.
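To make the idea of a diffusion prior over latent embeddings more concrete, the following is a minimal sketch of the forward noising process and the noise-prediction training target, assuming a standard DDPM-style variance-preserving formulation. The abstract does not specify the formulation, so that choice is an assumption. The two-cluster array `z0` is a hypothetical stand-in for encoder outputs, the function names are illustrative and not part of cryoDRGN, and the denoising network itself is omitted (a zero predictor is used only as a baseline).

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    return betas, alpha_bar

def q_sample(z0, t, alpha_bar, rng):
    """Noise clean latents to step t: z_t = sqrt(a_t) * z0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t][:, None]  # one value per sample, broadcast over latent dims
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps, eps

def denoising_loss(predict_noise, z0, alpha_bar, rng):
    """Mean squared error between predicted and true noise at random timesteps."""
    t = rng.integers(0, len(alpha_bar), size=z0.shape[0])
    z_t, eps = q_sample(z0, t, alpha_bar, rng)
    return float(np.mean((predict_noise(z_t, t) - eps) ** 2))

rng = np.random.default_rng(0)

# Hypothetical bimodal 2-D latents standing in for encoder embeddings
# (two clusters, loosely mimicking compositional heterogeneity).
modes = rng.choice([-3.0, 3.0], size=(4000, 1))
z0 = modes + 0.3 * rng.standard_normal((4000, 2))

_, alpha_bar = make_schedule()

# At the final step the noised latents should be close to a standard normal.
t_last = np.full(z0.shape[0], len(alpha_bar) - 1)
z_T, _ = q_sample(z0, t_last, alpha_bar, rng)

# Baseline loss of a predictor that always outputs zero noise.
baseline = denoising_loss(lambda z_t, t: np.zeros_like(z_t), z0, alpha_bar, rng)

print("std of clean latents:", z0.std(axis=0))
print("std of fully noised latents:", z_T.std(axis=0))
print("zero-predictor loss:", baseline)
```

In this sketch, the clean latents are clearly non-Gaussian, while the fully noised latents are approximately standard normal. A trained denoiser would need to drive the loss below the zero-predictor baseline, and reversing the noising process with it would map Gaussian samples back onto the multi-modal latent distribution.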