Paper Title

ELQA: A Corpus of Metalinguistic Questions and Answers about English

Authors

Shabnam Behzad, Keisuke Sakaguchi, Nathan Schneider, Amir Zeldes

Abstract

We present ELQA, a corpus of questions and answers in and about the English language. Collected from two online forums, the >70k questions (from English learners and others) cover wide-ranging topics including grammar, meaning, fluency, and etymology. The answers include descriptions of general properties of English vocabulary and grammar as well as explanations about specific (correct and incorrect) usage examples. Unlike most NLP datasets, this corpus is metalinguistic -- it consists of language about language. As such, it can facilitate investigations of the metalinguistic capabilities of NLU models, as well as educational applications in the language learning domain. To study this, we define a free-form question answering task on our dataset and conduct evaluations on multiple LLMs (Large Language Models) to analyze their capacity to generate metalinguistic answers.
