Unsupervised Simplification of Legal Texts

Paper title

Unsupervised Simplification of Legal Texts

Authors

Mert Cemri, Tolga Çukur, Aykut Koç

Abstract

The processing of legal texts is an emerging field in natural language processing (NLP). Legal texts contain unique jargon and complex linguistic attributes in vocabulary, semantics, syntax, and morphology. Developing text simplification (TS) methods specific to the legal domain is therefore important both for helping ordinary people understand legal text and for providing inputs to high-level models in mainstream legal NLP applications. While a recent study proposed a rule-based TS method for legal text, learning-based TS in the legal domain has not been considered previously. Here we introduce an unsupervised simplification method for legal texts (USLT). USLT performs domain-specific TS by replacing complex words and splitting long sentences. To this end, USLT detects complex words in a sentence, generates candidates via a masked-transformer model, and selects a candidate for substitution based on a rank score. Afterward, USLT recursively decomposes long sentences into a hierarchy of shorter core and context sentences while preserving semantic meaning. We demonstrate that USLT outperforms state-of-the-art domain-general TS methods in text simplicity while keeping the semantics intact.
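
The two stages named in the abstract (word substitution, then recursive sentence splitting) can be illustrated with a toy sketch. This is not the authors' implementation. The frequency table, the complexity threshold, the `toy_candidates` stand-in for a masked-transformer model, the rank score (model probability times word frequency), and the split markers are all assumptions made for illustration. The splitter also returns a flat list rather than the core/context hierarchy the abstract describes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical relative word frequencies (higher = more common); not from the paper.
TOY_FREQ: Dict[str, float] = {
    "the": 1.0, "tenant": 0.6, "shall": 0.5, "pay": 0.8, "rent": 0.7,
    "remit": 0.05, "give": 0.9, "send": 0.85,
}

Candidate = Tuple[str, float]  # (replacement word, assumed model probability)
Proposer = Callable[[List[str], int], List[Candidate]]


def is_complex(word: str, threshold: float = 0.1) -> bool:
    """Assumed criterion: a word is complex if its frequency is below the threshold."""
    return TOY_FREQ.get(word.lower(), 0.0) < threshold


def toy_candidates(tokens: List[str], i: int) -> List[Candidate]:
    """Stand-in for a masked language model that proposes fillers for tokens[i]."""
    table = {"remit": [("pay", 0.6), ("send", 0.3), ("give", 0.1)]}
    return table.get(tokens[i].lower(), [])


def rank_score(candidate: Candidate) -> float:
    """Assumed rank score: model probability weighted by word frequency."""
    word, prob = candidate
    return prob * TOY_FREQ.get(word, 0.0)


def simplify_words(sentence: str, propose: Proposer) -> str:
    """Replace each complex word with its highest-ranked candidate, if any."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if is_complex(tok):
            cands = propose(tokens, i)
            if cands:
                tokens[i] = max(cands, key=rank_score)[0]
    return " ".join(tokens)


# Hypothetical split markers, tried in order.
SPLIT_MARKERS = (" provided that ", " and ")


def split_recursive(sentence: str, max_words: int = 8) -> List[str]:
    """Recursively split at the first marker found until each piece is short enough."""
    if len(sentence.split()) <= max_words:
        return [sentence]
    for marker in SPLIT_MARKERS:
        head, sep, tail = sentence.partition(marker)
        if sep:
            return split_recursive(head, max_words) + split_recursive(tail, max_words)
    return [sentence]  # no marker found; leave the sentence as is
```

Under these assumptions, `simplify_words("the tenant shall remit rent", toy_candidates)` replaces "remit" with "pay", since "pay" has the highest assumed rank score. In practice the candidate proposer would be backed by a real masked language model, and the splitter would rely on syntactic analysis rather than fixed string markers.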
