Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Paper Title

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Authors

Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

Abstract

Directed Evolution (DE), a landmark wet-lab method that originated in the 1960s, enables the discovery of novel protein designs by evolving a population of candidate sequences. Recent advances in biotechnology have made it possible to collect high-throughput data, allowing the use of machine learning to map out a protein's sequence-to-function relationship. There is growing interest in machine learning-assisted DE for accelerating protein optimization. Yet the theoretical understanding of DE, as well as of the use of machine learning in DE, remains limited. In this paper, we connect DE with bandit learning theory and make a first attempt to study regret minimization in DE. We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements. TS-DE updates a posterior of the function based on the collected measurements. It uses a posterior-sampled function estimate to guide the crossover recombination and mutation steps in DE. In the case of a linear model, we show that TS-DE enjoys a Bayesian regret of order $\tilde O(d^{2}\sqrt{MT})$, where $d$ is the feature dimension, $M$ is the population size, and $T$ is the number of rounds. This regret bound is nearly optimal, confirming that bandit learning can provably accelerate DE. It may have implications for more general sequence optimization and evolutionary algorithms.
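The loop described in the abstract (measure, update the posterior, sample a function estimate, then let that sample guide mutation and crossover) can be illustrated with a small simulation. This is only a toy sketch under stated assumptions, not the authors' algorithm: sequences are encoded as ±1 feature vectors, the prior on the linear parameter is a standard Gaussian, the noise level is known, mutation keeps a proposed site flip only when it raises the sampled fitness, and crossover is uniform with a child replacing its parent only if its sampled fitness is no lower. The names `ts_de`, `theta_star`, and `mut_rate` are hypothetical.

```python
import numpy as np

def ts_de(theta_star, M=20, T=30, noise_sd=0.5, mut_rate=0.1, seed=0):
    """Toy Thompson Sampling-guided directed evolution with a linear fitness model."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    # Population of M sequences, each a +/-1 feature vector (assumed encoding).
    pop = rng.choice([-1.0, 1.0], size=(M, d))
    # Gaussian prior N(0, I) on the parameter, stored in precision form.
    precision = np.eye(d)
    b = np.zeros(d)
    history = []  # true mean fitness per round, for diagnostics only
    for _ in range(T):
        # 1. Noisy measurement of every sequence in the current population.
        y = pop @ theta_star + noise_sd * rng.standard_normal(M)
        history.append(float(np.mean(pop @ theta_star)))
        # 2. Bayesian linear regression update of the posterior.
        precision += pop.T @ pop / noise_sd**2
        b += pop.T @ y / noise_sd**2
        mean = np.linalg.solve(precision, b)
        # 3. Thompson sampling: one draw from N(mean, precision^{-1}),
        #    using the Cholesky factor of the precision matrix.
        L = np.linalg.cholesky(precision)
        theta_tilde = mean + np.linalg.solve(L.T, rng.standard_normal(d))
        # 4. Guided mutation (assumed rule): propose random site flips and keep
        #    only those that increase the sampled fitness estimate.
        proposed = rng.random((M, d)) < mut_rate
        improves = (pop * theta_tilde) < 0  # flipping x_i helps iff x_i * theta_i < 0
        pop = np.where(proposed & improves, -pop, pop)
        # 5. Crossover (assumed rule): uniform recombination with a random partner;
        #    the child replaces its parent if its sampled fitness is not lower.
        partners = pop[rng.permutation(M)]
        take_parent = rng.random((M, d)) < 0.5
        children = np.where(take_parent, pop, partners)
        keep_child = children @ theta_tilde >= pop @ theta_tilde
        pop = np.where(keep_child[:, None], children, pop)
    return pop, history

if __name__ == "__main__":
    theta = np.array([1.5, -1.0, 0.5, 2.0, -0.5, 1.0, -2.0, 0.75])
    final_pop, hist = ts_de(theta)
    print("mean true fitness, first vs last round:", hist[0], hist[-1])
```

Because the evolution steps act only on the sampled estimate, exploration comes from posterior randomness rather than from undirected mutation, which is the intuition behind treating DE as a bandit problem.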
