Paper Title

Extreme Classification via Adversarial Softmax Approximation

Paper Authors

Robert Bamler, Stephan Mandt

Paper Abstract

Training a classifier over a large number of classes, known as 'extreme classification', has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes $C$, which is often prohibitively expensive. A popular scalable softmax approximation relies on uniform negative sampling, which suffers from slow convergence due to a poor signal-to-noise ratio. In this paper, we propose a simple training method for drastically enhancing the gradient signal by drawing negative samples from an adversarial model that mimics the data distribution. Our contributions are three-fold: (i) an adversarial sampling mechanism that produces negative samples at a cost only logarithmic in $C$, thus still resulting in cheap gradient updates; (ii) a mathematical proof that this adversarial sampling minimizes the gradient variance while any bias due to non-uniform sampling can be removed; (iii) experimental results on large-scale datasets that show a reduction of the training time by an order of magnitude relative to several competitive baselines.
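
The abstract describes drawing negative samples from a non-uniform proposal distribution at a cost logarithmic in $C$ and then removing the bias that non-uniform sampling introduces. The sketch below illustrates that general idea only; it uses a fixed, hand-made proposal distribution in place of the paper's adversarial model, and the names (`q`, `W`, `sampled_softmax_loss`) and the number of negatives are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

C, D = 10_000, 32                          # number of classes, embedding dimension
W = 0.01 * rng.standard_normal((C, D))     # class embeddings (the model parameters)

# Hypothetical non-uniform proposal q over classes. In the paper this role is
# played by an adversarial model that mimics the data distribution; here it is
# just a fixed skewed distribution for illustration.
q = rng.random(C) ** 4
q /= q.sum()
cdf = np.cumsum(q)                         # sampling via searchsorted costs O(log C)

def sampled_softmax_loss(x, y, k=20):
    """Sampled-softmax-style loss with negatives drawn from the proposal q.

    Subtracting log q(c) from each logit compensates for the non-uniform
    sampling, so the estimate remains unbiased in expectation.
    """
    neg = np.minimum(np.searchsorted(cdf, rng.random(k)), C - 1)  # k negatives
    classes = np.concatenate(([y], neg))                          # positive first
    logits = W[classes] @ x - np.log(q[classes])                  # corrected logits
    logits -= logits.max()                                        # numerical stability
    log_p = logits[0] - np.log(np.exp(logits).sum())
    return -log_p                                                 # approximate cross-entropy

# Toy usage: one data point whose true class is 42.
x = rng.standard_normal(D)
print(sampled_softmax_loss(x, y=42))
```

In this sketch the per-negative cost is the binary search in `np.searchsorted`, which is where the logarithmic dependence on $C$ mentioned in the abstract shows up; the full-softmax normalization over all $C$ classes is never computed.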
