Paper title
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
Paper authors
Paper abstract
We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish for a total of 180,000 samples. Mintaka includes 8 types of complex questions, including superlative, intersection, and multi-hop questions, which were naturally elicited from crowd workers. We run baselines over Mintaka, the best of which achieves 38% hits@1 in English and 31% hits@1 multilingually, showing that existing models have room for improvement. We release Mintaka at https://github.com/amazon-research/mintaka.
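The abstract reports results as hits@1, which is commonly read as the fraction of questions whose top-ranked prediction matches an accepted answer. The following is a minimal sketch of that metric under this assumption, not the authors' evaluation script. The function name `hits_at_1`, the input layout (one ranked candidate list and one set of gold answers per question), and the placeholder identifiers are all hypothetical.

```python
from typing import Hashable, Sequence, Set


def hits_at_1(
    predictions: Sequence[Sequence[Hashable]],
    gold_answers: Sequence[Set[Hashable]],
) -> float:
    """Return the share of questions whose first prediction is a gold answer.

    predictions[i] is a ranked list of candidate answers for question i
    (best first); gold_answers[i] is the set of accepted answers.
    Both structures are assumptions made for illustration.
    """
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have equal length")
    if not predictions:
        return 0.0

    hits = 0
    for ranked, gold in zip(predictions, gold_answers):
        # A question with no prediction at all counts as a miss.
        if ranked and ranked[0] in gold:
            hits += 1
    return hits / len(predictions)


if __name__ == "__main__":
    # Placeholder identifiers, not real Mintaka or Wikidata entries.
    preds = [["A1", "A2"], ["B2", "B1"], [], ["D1"]]
    gold = [{"A1"}, {"B1"}, {"C1"}, {"D1", "D2"}]
    print(f"hits@1 = {hits_at_1(preds, gold):.2f}")
```

Under this reading, a multilingual score such as the one quoted above would be obtained by running the same computation per language and aggregating the results; how the authors aggregate is not stated in the abstract.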