Paper Title

AdaLead: A simple and robust adaptive greedy search algorithm for sequence design

Authors

Sam Sinai, Richard Wang, Alexander Whatley, Stewart Slocum, Elina Locane, Eric D. Kelsic

Abstract

Efficient design of biological sequences will have a great impact across many industrial and healthcare domains. However, discovering improved sequences requires solving a difficult optimization problem. Traditionally, this challenge was approached by biologists through a model-free method known as "directed evolution", the iterative process of random mutation and selection. As the ability to build models that capture the sequence-to-function map improves, such models can be used as oracles to screen sequences before running experiments. In recent years, interest in better algorithms that effectively use such oracles to outperform model-free approaches has intensified. These span from approaches based on Bayesian Optimization, to regularized generative models and adaptations of reinforcement learning. In this work, we implement an open-source Fitness Landscape EXploration Sandbox (FLEXS: github.com/samsinai/FLEXS) environment to test and evaluate these algorithms based on their optimality, consistency, and robustness. Using FLEXS, we develop an easy-to-implement, scalable, and robust evolutionary greedy algorithm (AdaLead). Despite its simplicity, we show that AdaLead is a remarkably strong benchmark that out-competes more complex state-of-the-art approaches in a variety of biologically motivated sequence design challenges.
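
The abstract describes directed evolution as an iterative process of random mutation and selection. A minimal Python sketch of that model-free baseline is given below for illustration only. It is not AdaLead and does not use the FLEXS API. The names `toy_fitness`, `mutate`, and `directed_evolution`, the DNA alphabet, and all parameter values are hypothetical, and the toy fitness function merely stands in for a wet-lab measurement or an oracle.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA alphabet, chosen only for illustration


def toy_fitness(seq, target):
    """Hypothetical stand-in for an experiment or oracle:
    the fraction of positions that match a fixed target sequence."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rate, rng):
    """Resample each position independently with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else c for c in seq
    )


def directed_evolution(start, target, rounds=20, pool_size=50,
                       keep=5, rate=0.1, seed=0):
    """Iterate random mutation and selection; return the best sequence
    found and the best fitness recorded after each round."""
    rng = random.Random(seed)
    parents = [start]
    history = []
    for _ in range(rounds):
        # Mutation: propose variants of randomly chosen parents.
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pool_size)]
        # Selection: deduplicate (order-preserving), then keep the top
        # `keep` sequences. Parents stay in the pool, so the best
        # fitness never decreases from one round to the next.
        pool = list(dict.fromkeys(parents + children))
        pool.sort(key=lambda s: toy_fitness(s, target), reverse=True)
        parents = pool[:keep]
        history.append(toy_fitness(parents[0], target))
    return parents[0], history


if __name__ == "__main__":
    best, history = directed_evolution("A" * 12, "ACGTACGTACGT")
    print(best, history[-1])
```

In this sketch every candidate is scored directly, which mirrors the model-free setting. The oracle-guided methods discussed in the abstract would instead use a learned sequence-to-function model to screen candidates before any measurement is spent on them.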
