Paper title
Social media mining for toxicovigilance of prescription medications: End-to-end pipeline, challenges and future work
Authors
Abstract
Substance use, substance use disorder, and overdoses related to substance use are major public health problems globally and in the United States. A key aspect of addressing these problems from a public health standpoint is improved surveillance. Traditional surveillance systems are laggy, and social media are potentially useful sources of timely data. However, mining knowledge from social media is challenging, and requires the development of advanced artificial intelligence, specifically natural language processing (NLP) and machine learning methods. We developed a sophisticated end-to-end pipeline for mining information about nonmedical prescription medication use from social media, namely Twitter and Reddit. Our pipeline employs supervised machine learning and NLP for filtering out noise and characterizing the chatter. In this paper, we describe our end-to-end pipeline developed over four years. In addition to describing our data mining infrastructure, we discuss existing challenges in social media mining for toxicovigilance, and possible future research directions.