Paper Title

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout

Authors

Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov

Abstract

The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights. However, these multiple updates can impede optimal training by pulling the model in conflicting directions. We present Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency. GradDrop is implemented as a simple deep layer that can be used in any deep net and synergizes with other gradient balancing approaches. We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
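The abstract describes GradDrop only at a high level: at a shared activation, each element picks a single gradient sign with probability proportional to how much of the total gradient mass points that way, and per-loss gradients of the opposite sign are zeroed before summation. Below is a minimal NumPy sketch of that masking rule under our reading of the paper; the function name graddrop_mask, the epsilon guard, and the exact sampling layout are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def graddrop_mask(grads, rng=None):
    """Sketch of GradDrop's sign-based masking for a list of per-loss
    gradients taken at the same shared activation (all arrays same shape)."""
    rng = np.random.default_rng() if rng is None else rng
    G = np.stack(grads)  # shape: (num_losses, *activation_shape)

    # Per element, the fraction of total gradient mass pointing in the
    # positive direction ("sign purity"); 1e-12 guards against division by zero.
    P = 0.5 * (1.0 + G.sum(axis=0) / (np.abs(G).sum(axis=0) + 1e-12))

    # Pick one sign per activation element: positive with probability P.
    U = rng.random(G.shape[1:])
    keep = ((U < P) & (G > 0)) | ((U >= P) & (G < 0))

    # Gradients whose sign disagrees with the pick are dropped; the rest are summed.
    return np.where(keep, G, 0.0).sum(axis=0)

# Toy usage: two losses with partially conflicting gradients.
g1 = np.array([0.5, -1.0, 0.2])
g2 = np.array([-0.3, 0.4, 0.2])
print(graddrop_mask([g1, g2]))
```

In a full network this masking would run inside the backward pass at a chosen activation layer, with the masked sum replacing the ordinary sum of per-loss gradients flowing into the shared weights.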
