Paper title
Towards Causality Extraction from Requirements
Authors
Abstract
System behavior is often based on causal relations between certain events (e.g., if event1, then event2). Consequently, those causal relations are also textually embedded in requirements. We want to extract this causal knowledge and use it to derive test cases automatically and to reason about dependencies between requirements. Existing NLP approaches fail to extract causality from natural language (NL) with reasonable performance. In this paper, we describe first steps towards building a new approach for causality extraction and contribute: (1) an NLP architecture based on Tree Recursive Neural Networks (TRNN) that we will train to identify causal relations in NL requirements and (2) an annotation scheme and a dataset suitable for training TRNNs. Our dataset contains 212,186 sentences from 463 publicly available requirement documents and is a first step towards a gold standard corpus for causality extraction. We encourage fellow researchers to contribute to our dataset and help us finalize the causality annotation process. Additionally, the dataset can be annotated further to serve as a benchmark for other RE-relevant NLP tasks such as requirements classification.