Paper Title

MMTM: Multi-Tasking Multi-Decoder Transformer for Math Word Problems

Authors

Keyur Faldu, Amit Sheth, Prashant Kikani, Darshan Patel

Abstract

Recently, quite a few novel neural architectures have been proposed to solve math word problems by predicting expression trees. These architectures range from seq2seq models to encoders that leverage graph relationships, combined with tree decoders. Such models achieve good performance on various MWP datasets but perform poorly when applied to the adversarial challenge dataset SVAMP. We present MMTM, a novel model that leverages multi-tasking and multiple decoders during pre-training. It creates variant tasks by deriving labels from pre-order, in-order, and post-order traversals of expression trees, and uses task-specific decoders in a multi-tasking framework. We use a transformer architecture with lower dimensionality and initialize its weights from a RoBERTa model. The MMTM model achieves better mathematical reasoning ability and generalisability, which we demonstrate by outperforming the best state-of-the-art baseline models from Seq2Seq, GTS, and Graph2Tree with a relative improvement of 19.4% on the adversarial challenge dataset SVAMP.
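
To make the "variant tasks" concrete: the abstract states that labels are derived from pre-order, in-order, and post-order traversals of the expression tree, each serving as the target sequence for a task-specific decoder. The following is a minimal sketch (not the authors' released code) of how those three label sequences can be produced from a binary expression tree; the Node class, function names, and the example expression are hypothetical illustrations.

```python
# Minimal sketch: deriving three label sequences (pre-order, in-order,
# post-order) from an expression tree, as variant targets for
# task-specific decoders. Names here are illustrative assumptions.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    token: str                       # operator ('+', '*', ...) or operand ('N0', '3', ...)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def preorder(n: Optional[Node]) -> List[str]:
    # root, then left subtree, then right subtree -> prefix (Polish) notation
    return [] if n is None else [n.token] + preorder(n.left) + preorder(n.right)

def inorder(n: Optional[Node]) -> List[str]:
    # left subtree, then root, then right subtree -> infix-like sequence
    return [] if n is None else inorder(n.left) + [n.token] + inorder(n.right)

def postorder(n: Optional[Node]) -> List[str]:
    # left subtree, then right subtree, then root -> postfix (reverse Polish) notation
    return [] if n is None else postorder(n.left) + postorder(n.right) + [n.token]

# Example: the expression (N0 + N1) * N2
tree = Node("*", Node("+", Node("N0"), Node("N1")), Node("N2"))
print(preorder(tree))   # ['*', '+', 'N0', 'N1', 'N2']
print(inorder(tree))    # ['N0', '+', 'N1', '*', 'N2']
print(postorder(tree))  # ['N0', 'N1', '+', 'N2', '*']
# Each of the three sequences would serve as the supervision target for
# one decoder in the multi-tasking framework described in the abstract.
```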
