Paper Title
A Pipeline to Understand Emerging Illness via Social Media Data Analysis: A Case Study on Breast Implant Illness
Paper Authors
Paper Abstract
Background: A new illness may first come to public attention over social media before it is medically defined, formally documented, or systematically studied. One example is a phenomenon known as breast implant illness (BII), which has been extensively discussed on social media but is only vaguely defined in the medical literature.
Objectives: The objective of this study is to construct a data analysis pipeline for understanding emerging illnesses using social media data, and to apply the pipeline to understand key attributes of BII.
Methods: We built a social media data analysis pipeline using Natural Language Processing (NLP) and topic modeling. We extracted mentions of signs/symptoms, diseases/disorders, and medical procedures from social media data using the Clinical Text Analysis and Knowledge Extraction System (cTAKES). We mapped the mentions to standard medical concepts. We summarized the mapped concepts into topics using Latent Dirichlet Allocation (LDA). Finally, we applied this pipeline to understand BII using data from several BII-dedicated social media sites.
Results: Our pipeline identified topics related to toxicity, cancer, and mental health issues that are highly associated with BII. It also showed that, based on social media discussions, cancers, autoimmune disorders, and mental health problems are emerging concerns associated with breast implants. The pipeline further identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, along with toxicity from silicone implants.
Conclusions: Our study could inspire future work on the suggested symptoms and factors of BII. It provides the first analysis and derived knowledge of BII from social media using NLP techniques, and demonstrates the potential of using social media information to better understand similar emerging illnesses.
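The first two Methods steps (mention extraction and concept mapping) can be illustrated with a minimal Python sketch. This is not the authors' implementation: a small hand-written dictionary stands in for cTAKES, the concept identifiers are made-up placeholders rather than real terminology codes, and the two example posts are invented for illustration. The sketch only shows the shape of the data that a topic model such as LDA would consume, namely a per-post count of mapped concepts.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> (concept id, semantic type).
# The ids are placeholders invented for this sketch, not real medical codes,
# and the dictionary lookup is a stand-in for a real extractor such as cTAKES.
LEXICON = {
    "pain": ("CONCEPT_PAIN", "sign_symptom"),
    "aches": ("CONCEPT_PAIN", "sign_symptom"),
    "fatigue": ("CONCEPT_FATIGUE", "sign_symptom"),
    "tired": ("CONCEPT_FATIGUE", "sign_symptom"),
    "rupture": ("CONCEPT_RUPTURE", "disease_disorder"),
    "infection": ("CONCEPT_INFECTION", "disease_disorder"),
    "explant": ("CONCEPT_EXPLANT", "procedure"),
}


def extract_concepts(post):
    """Return the concept id of every lexicon term found in a post."""
    tokens = re.findall(r"[a-z]+", post.lower())
    return [LEXICON[t][0] for t in tokens if t in LEXICON]


def concept_counts(posts):
    """Count mapped concepts per post (one Counter per post)."""
    return [Counter(extract_concepts(p)) for p in posts]


def to_matrix(counts):
    """Build a post-by-concept count matrix with a sorted concept vocabulary."""
    vocab = sorted({c for doc in counts for c in doc})
    rows = [[doc.get(c, 0) for c in vocab] for doc in counts]
    return vocab, rows


# Invented example posts, for illustration only.
posts = [
    "Constant fatigue and joint pain since my implants.",
    "Had an explant after the rupture; still tired, and the pain aches.",
]
vocab, matrix = to_matrix(concept_counts(posts))
```

Different surface forms ("tired", "fatigue") collapse to one concept, which is the point of mapping before topic modeling. The resulting count matrix is the kind of input that an LDA implementation, for example scikit-learn's `LatentDirichletAllocation`, could be fitted on to group co-occurring concepts into topics.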