PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data

Title

PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data

Author

Calderone, Alberto

Abstract

The number of scientific papers published every day is daunting and constantly increasing, and keeping up with the literature is a challenge. For anyone starting to explore a new topic, it is hard to get the big picture without reading many articles. Furthermore, as one reads through the literature, making mental connections is crucial for asking new questions that might lead to discoveries. In this work, I present a web tool that uses a text-mining strategy to transform large collections of unstructured biomedical articles into structured data. The generated results give a quick overview of complex topics and can suggest information that is not explicitly reported. In particular, I show two data science analyses. First, I present a literature-based rare disease network built using this tool, in the hope that it will help clarify some aspects of these less popular pathologies. Second, I show how a literature-based analysis conducted with PubSqueezer results makes it possible to describe known facts about SARS-CoV-2. In one sentence, data generated with PubSqueezer make it easy to use scientific literature in any computational analysis, such as machine learning or natural language processing. Availability: http://www.pubsqueezer.com
