Paper Title


Attention-aware contrastive learning for predicting T cell receptor-antigen binding specificity

Authors

Yiming Fang, Xuejun Liu, Hui Liu

Abstract


It has been verified that only a small fraction of the neoantigens presented by MHC class I molecules on the cell surface can elicit T cell responses. This limitation can be attributed to the binding specificity of the T cell receptor (TCR) to the peptide-MHC complex (pMHC). Computational prediction of T cell binding to neoantigens is a challenging and unresolved task. In this paper, we propose an attentive-mask contrastive learning model, ATMTCR, for inferring TCR-antigen binding specificity. For each input TCR sequence, we use a Transformer encoder to transform it into a latent representation, and then mask a proportion of residues, guided by attention weights, to generate its contrastive view. After pretraining on large-scale TCR CDR3 sequences, we verified that contrastive learning significantly improved the prediction performance of TCR binding to the peptide-MHC complex (pMHC). Beyond detecting important amino acids and their locations in the TCR sequence, our model can also extract high-order semantic information underlying TCR-antigen binding specificity. Comparison experiments were conducted on two independent datasets, and our method achieved better performance than other existing algorithms. Moreover, we effectively identified important amino acids and their positional preferences through attention weights, which demonstrates the interpretability of our proposed model.
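The attention-guided masking step described in the abstract can be sketched as follows. This is a minimal illustration, not the ATMTCR implementation: the function and variable names are hypothetical, and the per-residue attention scores would in practice come from the Transformer encoder rather than being supplied directly.

```python
import numpy as np

def attention_guided_mask(tokens, attn_weights, mask_ratio=0.15, mask_token="<MASK>"):
    """Mask the highest-attention residues of a TCR CDR3 sequence to
    produce a contrastive view (illustrative sketch, not the paper's code)."""
    n_mask = max(1, int(round(len(tokens) * mask_ratio)))
    # argsort is ascending; take the indices of the n_mask largest weights.
    top_idx = np.argsort(attn_weights)[-n_mask:]
    view = list(tokens)
    for i in top_idx:
        view[i] = mask_token
    return view

# Example: mask a CDR3 sequence using toy attention scores.
cdr3 = list("CASSLGQAYEQYF")
toy_attn = np.random.rand(len(cdr3))
masked_view = attention_guided_mask(cdr3, toy_attn)
```

The original and masked views of the same sequence would then be paired in a contrastive objective, encouraging the encoder to produce representations that are robust to the removal of the residues it attends to most.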
