Title
MulT: An End-to-End Multitask Learning Transformer
Authors
Abstract
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the-art multitask convolutional neural network models and all the respective single-task transformer models. Our experiments further highlight the benefits of sharing attention across all the tasks, and demonstrate that our MulT model is robust and generalizes well to new domains. Our project website is at https://ivrl.github.io/MulT/.
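To make the shared-encoder, multi-head design described in the abstract concrete, below is a minimal PyTorch sketch. It is an illustrative assumption, not the authors' implementation: a generic transformer encoder stands in for the Swin backbone, plain convolutional heads stand in for the task-specific transformer decoder heads, the paper's shared cross-task attention mechanism is not reproduced, and all module names (SharedEncoder, TaskHead, MultitaskModel), task channel counts, and hyper-parameters are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical per-task output channels; the six tasks are those named in the abstract.
TASKS = {
    "depth": 1,          # depth estimation: single-channel map
    "segmentation": 21,  # semantic segmentation: e.g. 21 classes (assumed)
    "normals": 3,        # surface normal estimation: 3-channel map
    "reshading": 1,
    "keypoints2d": 1,
    "edges": 1,
}

class SharedEncoder(nn.Module):
    """Patch-embed the image and run a shared transformer encoder (stand-in for Swin)."""
    def __init__(self, dim=256, depth=4, patch=16):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        tokens = self.embed(x)                      # (B, dim, H/p, W/p)
        b, c, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens)               # shared representation
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class TaskHead(nn.Module):
    """Task-specific decoder that maps the shared features to a dense prediction."""
    def __init__(self, dim, out_channels, patch=16):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, out_channels, 1),
            nn.Upsample(scale_factor=patch, mode="bilinear", align_corners=False),
        )

    def forward(self, feats):
        return self.decode(feats)

class MultitaskModel(nn.Module):
    """One shared encoder, one decoder head per task."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = SharedEncoder(dim=dim)
        self.heads = nn.ModuleDict({t: TaskHead(dim, c) for t, c in TASKS.items()})

    def forward(self, x):
        shared = self.encoder(x)  # single shared representation for all tasks
        return {t: head(shared) for t, head in self.heads.items()}

if __name__ == "__main__":
    model = MultitaskModel()
    preds = model(torch.randn(2, 3, 224, 224))
    for task, out in preds.items():
        print(task, tuple(out.shape))
```

The design point the abstract emphasizes is that every task head reads from the same encoded representation, so the encoder is trained jointly by all task losses; MulT additionally shares the attention itself across tasks, which this sketch does not model.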