Paper Title
Conditional $β$-VAE for De Novo Molecular Generation
Paper Authors
Abstract
Deep learning has significantly advanced and accelerated de novo molecular generation. Generative networks, namely Variational Autoencoders (VAEs), can not only randomly generate new molecules but also alter molecular structures to optimize specific chemical properties that are pivotal for drug discovery. Although VAEs have previously been proposed and studied for pharmaceutical applications, they have deficiencies that limit their ability both to optimize properties and to decode syntactically valid molecules. We present a recurrent, conditional $β$-VAE that disentangles the latent space to enhance post hoc molecule optimization. We create a mutual-information-driven training protocol and data augmentations that both increase molecular validity and promote the generation of longer sequences. We demonstrate the efficacy of our framework on the ZINC-250k dataset, achieving state-of-the-art (SOTA) unconstrained optimization results on the penalized LogP (pLogP) and QED scores while matching current SOTA validity, novelty, and uniqueness scores for random generation. We match the current SOTA on QED for the top-3 molecules at 0.948, set a new SOTA for pLogP optimization at 104.29, 90.12, and 69.68, and demonstrate improved results on the constrained optimization task.
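The abstract does not state the training objective, so the following is only a minimal sketch of the generic $β$-VAE loss that such a model would typically build on: a reconstruction term plus a KL term weighted by $β$. The function names `gaussian_kl` and `beta_vae_loss`, the default `beta=4.0`, and the diagonal-Gaussian posterior with a standard-normal prior are assumptions made for illustration. The paper's conditioning on properties and its mutual-information-driven training protocol are not reproduced here.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a single latent vector.

    Assumes a diagonal-Gaussian posterior and a standard-normal prior,
    which is the usual VAE setup (an assumption, not taken from the paper).
    """
    if len(mu) != len(logvar):
        raise ValueError("mu and logvar must have the same length")
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def beta_vae_loss(recon_nll, mu, logvar, beta=4.0):
    """Generic beta-VAE objective: reconstruction NLL + beta * KL.

    recon_nll: negative log-likelihood of the decoded sequence
               (e.g. summed token-level cross-entropy), computed elsewhere.
    beta:      KL weight; 4.0 is an illustrative value, not the paper's.
    """
    return recon_nll + beta * gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # One latent dimension with mean 1 and unit variance gives KL = 0.5.
    print(beta_vae_loss(recon_nll=2.0, mu=[1.0], logvar=[0.0], beta=4.0))  # 4.0
```

Setting `beta` above 1 penalizes the KL term more heavily than a plain VAE does, which is the standard mechanism by which $β$-VAEs encourage a disentangled latent space.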