Antibody Watch: Text Mining Antibody Specificity from the Literature

Paper Title

Antibody Watch: Text Mining Antibody Specificity from the Literature

Authors

Hsu, Chun-Nan; Chang, Chia-Hui; Poopradubsil, Thamolwan; Lo, Amanda; William, Karen A.; Lin, Ko-Wei; Bandrowski, Anita; Ozyurt, Ibrahim Burak; Grethe, Jeffrey S.; Martone, Maryann E.

Abstract

Antibodies are widely used reagents to test for expression of proteins and other antigens. However, they might not always reliably produce results when they do not specifically bind to the target proteins that their providers designed them for, leading to unreliable research results. While many proposals have been developed to deal with the problem of antibody specificity, it is still challenging to cover the millions of antibodies that are available to researchers. In this study, we investigate the feasibility of automatically generating alerts to users of problematic antibodies by extracting statements about antibody specificity reported in the literature. The extracted alerts can be used to construct an "Antibody Watch" knowledge base containing supporting statements of problematic antibodies. We developed a deep neural network system and tested its performance with a corpus of more than two thousand articles that reported uses of antibodies. We divided the problem into two tasks. Given an input article, the first task is to identify snippets about antibody specificity and classify if the snippets report that any antibody exhibits non-specificity, and thus is problematic. The second task is to link each of these snippets to one or more antibodies mentioned in the snippet. The experimental evaluation shows that our system can accurately perform both classification and linking tasks with weighted F-scores over 0.925 and 0.923, respectively, and 0.914 overall when combined to complete the joint task. We leveraged Research Resource Identifiers (RRID) to precisely identify antibodies linked to the extracted specificity snippets. The result shows that it is feasible to construct a reliable knowledge base about problematic antibodies by text mining.
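
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' deep neural network system. It assumes that antibody RRIDs appear in text as "RRID:AB_" followed by digits, and that "weighted F-score" means the per-class F-score averaged with weights proportional to each class's number of true instances. The function names and all example values are hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List, Tuple

# Assumption: antibody RRIDs are written as "RRID:AB_<digits>";
# optional whitespace after the colon is tolerated.
RRID_PATTERN = re.compile(r"RRID:\s*(AB_\d+)")


def find_antibody_rrids(text: str) -> List[str]:
    """Return distinct antibody RRIDs in order of first appearance."""
    found: List[str] = []
    for match in RRID_PATTERN.finditer(text):
        rrid = match.group(1)
        if rrid not in found:
            found.append(rrid)
    return found


def weighted_f_score(per_class: Dict[str, Tuple[float, int]]) -> float:
    """Average per-class F-scores, weighting each class by its support.

    per_class maps a class label to (f_score, support), where support is
    the number of true instances of that class.
    """
    total_support = sum(support for _, support in per_class.values())
    if total_support == 0:
        raise ValueError("total support must be positive")
    weighted_sum = sum(f * support for f, support in per_class.values())
    return weighted_sum / total_support


if __name__ == "__main__":
    # Hypothetical snippet and identifiers, for illustration only.
    snippet = (
        "The antibody (RRID:AB_123456) produced several non-specific bands, "
        "whereas RRID: AB_987 and RRID:AB_123456 were compared in knockout tissue."
    )
    print(find_antibody_rrids(snippet))

    # Hypothetical per-class scores, not results from the paper.
    scores = {"nonspecific": (1.0, 1), "specific": (0.5, 3)}
    print(weighted_f_score(scores))
```

A pattern match only finds identifiers that are written out in the text being searched; deciding which antibody a given specificity statement refers to is the linking task the abstract describes, and this sketch does not attempt it.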
