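Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing and Word-vector Models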

Paper Title

Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing and Word-vector Models

Authors

Tu, Chaofan; Bai, Ruibin; Lu, Zheng; Aickelin, Uwe; Ge, Peiming; Zhao, Jianshuang

Abstract

In this paper, we propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification. The regular expressions are automatically generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods achieve high-quality performance in most Natural Language Processing (NLP) applications, their solutions are regarded as uninterpretable black boxes to humans. Therefore, rule-based methods are often introduced when interpretable solutions are needed, especially in the medical field. However, the construction of regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce manual effort while maintaining high-quality solutions.
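
To make the idea of optimizing a regular-expression rule with annealing more concrete, here is a minimal sketch in Python. It is not the authors' method: it uses plain single-solution simulated annealing rather than the pool-based variant (the abstract does not describe how the pool works), it does not use word-vector models, and the toy data, the candidate keyword list, and the names `build_regex`, `f1_score`, and `anneal` are all assumptions made for illustration.

```python
import math
import random
import re

# Toy labeled data (hypothetical, not from the paper): 1 = cardiac complaint, 0 = other.
DATA = [
    ("patient reports chest pain and palpitations", 1),
    ("chest tightness after exercise", 1),
    ("irregular heartbeat noted at night", 1),
    ("persistent cough and sore throat", 0),
    ("knee pain after a fall", 0),
    ("mild headache and fever", 0),
]

# Candidate keywords a constructive step might propose (assumed for illustration).
CANDIDATES = ["chest", "pain", "heartbeat", "palpitations", "cough", "fever"]


def build_regex(selected):
    """Join the selected keywords into one human-readable alternation pattern."""
    words = [w for w, keep in zip(CANDIDATES, selected) if keep]
    if not words:
        return None
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")


def f1_score(selected):
    """F1 of the rule 'text matches the regex => positive class' on DATA."""
    pattern = build_regex(selected)
    if pattern is None:
        return 0.0
    tp = fp = fn = 0
    for text, label in DATA:
        predicted = 1 if pattern.search(text) else 0
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def anneal(steps=500, temp=1.0, cooling=0.99, seed=0):
    """Plain simulated annealing over keyword subsets (not the pool-based variant)."""
    rng = random.Random(seed)
    current = [True] * len(CANDIDATES)
    current_score = f1_score(current)
    best, best_score = current[:], current_score
    for _ in range(steps):
        # Neighbor move: flip one keyword in or out of the pattern.
        neighbor = current[:]
        i = rng.randrange(len(neighbor))
        neighbor[i] = not neighbor[i]
        score = f1_score(neighbor)
        delta = score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = neighbor, score
            if score > best_score:
                best, best_score = neighbor[:], score
        temp *= cooling
    return best, best_score


if __name__ == "__main__":
    best, best_score = anneal()
    print(build_regex(best).pattern, best_score)
```

The resulting rule stays interpretable because it is an ordinary regular expression that a clinician can read and edit by hand, which is the property the abstract contrasts with black-box DNN classifiers.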
