Paper Title


Exploring Alternatives to Softmax Function

Paper Authors

Kunal Banerjee, Vishak Prasad C, Rishi Raj Gupta, Karthik Vyas, Anushree H, Biswajit Mishra

Paper Abstract


Softmax function is widely used in artificial neural networks for multiclass classification, multilabel classification, attention mechanisms, etc. However, its efficacy is often questioned in literature. The log-softmax loss has been shown to belong to a more generic class of loss functions, called spherical family, and its member log-Taylor softmax loss is arguably the best alternative in this class. In another approach which tries to enhance the discriminative nature of the softmax function, soft-margin softmax (SM-softmax) has been proposed to be the most suitable alternative. In this work, we investigate Taylor softmax, SM-softmax and our proposed SM-Taylor softmax, an amalgamation of the earlier two functions, as alternatives to softmax function. Furthermore, we explore the effect of expanding Taylor softmax up to ten terms (original work proposed expanding only to two terms) along with the ramifications of considering Taylor softmax to be a finite or infinite series during backpropagation. Our experiments for the image classification task on different datasets reveal that there is always a configuration of the SM-Taylor softmax function that outperforms the normal softmax function and its other alternatives.
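For intuition, below is a minimal sketch of Taylor softmax and the SM-Taylor softmax idea described in the abstract: the exponential in softmax is replaced by its Taylor-series approximation up to a chosen (even) order, and a soft margin is subtracted from the ground-truth logit before normalization. The function names, the NumPy-based implementation, and the margin value are illustrative assumptions, not the authors' code.

```python
import numpy as np

def taylor_exp(z, order=2):
    # Taylor-series approximation of exp(z): sum_{i=0}^{order} z^i / i!
    # Even orders keep the result strictly positive (e.g. order 2 gives
    # 1 + z + z^2/2 > 0), which is what makes normalization well defined.
    result = np.zeros_like(z, dtype=float)
    term = np.ones_like(z, dtype=float)
    for i in range(order + 1):
        result += term
        term = term * z / (i + 1)
    return result

def taylor_softmax(logits, order=2):
    # Normalize the Taylor approximation of exp over the class axis.
    f = taylor_exp(logits, order)
    return f / f.sum(axis=-1, keepdims=True)

def sm_taylor_softmax(logits, labels, margin=0.5, order=2):
    # Sketch of an SM-Taylor softmax training step: subtract a soft margin
    # from the ground-truth logit (as in soft-margin softmax), then apply
    # Taylor softmax. The margin value 0.5 is an arbitrary example.
    z = logits.copy()
    z[np.arange(len(labels)), labels] -= margin
    return taylor_softmax(z, order)

# Example: 2 samples, 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
print(taylor_softmax(logits, order=2))
print(sm_taylor_softmax(logits, labels))
```

Increasing `order` (the paper explores up to ten terms) makes the approximation closer to the standard softmax, while the margin controls how strongly the ground-truth class is penalized during training.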
