Paper Title
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Paper Authors
Paper Abstract
The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights. However, these multiple updates can impede optimal training by pulling the model in conflicting directions. We present Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency. GradDrop is implemented as a simple deep layer that can be used in any deep net and synergizes with other gradient balancing approaches. We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
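To make the masking rule concrete, below is a minimal NumPy sketch of one plausible reading of the procedure the abstract describes: compute an elementwise "positive sign purity" from the per-task gradients, then stochastically keep only positive-sign or only negative-sign gradients at each position. The function name `graddrop`, the purity formula, and the single shared uniform sample per element are illustrative assumptions, not the authors' reference implementation (in particular, the paper applies the procedure to gradient signals at a chosen activation layer, which this sketch abstracts away).

```python
import numpy as np

def graddrop(task_grads, rng=None):
    # task_grads: list of same-shape arrays, one gradient per loss/task.
    # (Hypothetical sketch -- not the paper's reference implementation.)
    rng = rng if rng is not None else np.random.default_rng(0)
    g = np.stack(task_grads)          # shape: (num_tasks, *activation_shape)
    eps = 1e-12                        # guard against division by zero
    # Elementwise positive sign purity: 1.0 if all gradient mass points in
    # the positive direction, 0.0 if all negative, 0.5 if fully conflicting.
    purity = 0.5 * (1.0 + g.sum(axis=0) / (np.abs(g).sum(axis=0) + eps))
    u = rng.uniform(size=purity.shape)   # one shared sample per element
    keep_pos = u < purity                # keep positive-sign grads here...
    mask = np.where(keep_pos[None], g > 0, g < 0)  # ...negative-sign elsewhere
    return (mask * g).sum(axis=0)        # masked gradients, summed over tasks

# Usage: two task gradients that agree on the first element and
# conflict on the others. Consistent elements pass through intact;
# conflicting elements keep only one sign, chosen probabilistically.
g1 = np.array([0.5, -0.2,  0.3])
g2 = np.array([0.4,  0.1, -0.3])
print(graddrop([g1, g2]))
```

Note the design choice this illustrates: where all tasks agree in sign, the purity saturates at 0 or 1 and the combined gradient is untouched, so GradDrop only randomizes updates where tasks genuinely conflict, which is consistent with the abstract's link between multiloss training and gradient stochasticity.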