IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

Paper Title

IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

Authors

Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Pavel Smrz

Abstract

In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Causal News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in a sentence from news media. We detect cause-effect-signal spans in a sentence using T5, a pre-trained autoregressive language model. We iteratively identify all cause-effect-signal span triplets, always conditioning the prediction of the next triplet on the previously predicted ones. To predict the triplet itself, we consider different causal relationships such as cause$\rightarrow$effect$\rightarrow$signal. Each triplet component is generated via a language model conditioned on the sentence, the previous parts of the current triplet, and previously predicted triplets. Despite training on an extremely small dataset of 160 samples, our approach achieved competitive performance, placing second in the competition. Furthermore, we show that assuming either cause$\rightarrow$effect or effect$\rightarrow$cause order achieves similar results.
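
The iterative decoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt format, the stopping rule, the `max_triplets` cap, and the names `build_prompt` and `extract_triplets` are all assumptions made for illustration, and `generate` is a hypothetical stand-in for a call to a fine-tuned T5 model that maps a prompt string to a generated span.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]  # always stored as (cause, effect, signal)


def build_prompt(sentence: str, history: List[Triplet],
                 partial: Dict[str, str], component: str) -> str:
    """Serialize the conditioning context (hypothetical format, not from the paper)."""
    past = " | ".join(f"cause: {c} ; effect: {e} ; signal: {s}" for c, e, s in history)
    current = " ; ".join(f"{name}: {span}" for name, span in partial.items())
    return f"sentence: {sentence} history: {past} current: {current} generate: {component}"


def extract_triplets(sentence: str,
                     generate: Callable[[str], str],
                     order: Sequence[str] = ("cause", "effect", "signal"),
                     max_triplets: int = 4) -> List[Triplet]:
    """Predict triplets one at a time, each conditioned on the earlier ones."""
    history: List[Triplet] = []
    for _ in range(max_triplets):
        partial: Dict[str, str] = {}
        # Each component sees the sentence, the parts of the current
        # triplet generated so far, and all previously predicted triplets.
        for component in order:
            prompt = build_prompt(sentence, history, partial, component)
            partial[component] = generate(prompt).strip()
        triplet = (partial["cause"], partial["effect"], partial["signal"])
        # Assumed stopping rule: an empty cause or effect, or a repeated
        # triplet, means the model has nothing more to extract.
        if not triplet[0] or not triplet[1] or triplet in history:
            break
        history.append(triplet)
    return history
```

Passing `order=("effect", "cause", "signal")` switches the generation order while keeping the returned tuples in (cause, effect, signal) form, which mirrors the order comparison mentioned in the abstract.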
