Paper Title
Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning
Paper Authors
Paper Abstract
The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks, namely: 1) the negative cancellation property of ReLU tends to treat negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive, which leads to a bias shift effect in the network layers; and 4) the multilinear structure of ReLU restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the baseline, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. In addition, PFTS achieved the highest mean rank among the compared methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
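The abstract does not state the functional form of PFTS; below is a minimal PyTorch sketch under the assumption that PFTS follows the Flatten-T Swish formulation f(x) = x·sigmoid(x) + T for x ≥ 0 and f(x) = T otherwise, with the threshold T made a trainable parameter. The class name, the initial value of T (-0.20), and the per-layer (scalar) parameterization are illustrative assumptions, not the authors' exact specification.

```python
import torch
import torch.nn as nn


class PFTS(nn.Module):
    """Sketch of a Parametric Flatten-T Swish activation (assumed form).

    f(x) = x * sigmoid(x) + T  for x >= 0
    f(x) = T                   otherwise
    where T is a learnable threshold updated by backpropagation.
    """

    def __init__(self, init_t: float = -0.20):  # initial T is an assumption
        super().__init__()
        # Learnable threshold T, trained jointly with the network weights.
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Swish-like response for non-negative inputs,
        # flat response equal to T for negative inputs.
        return torch.where(x >= 0, x * torch.sigmoid(x) + self.t, self.t)


if __name__ == "__main__":
    act = PFTS()
    x = torch.linspace(-3.0, 3.0, steps=7)
    print(act(x))   # activation values
    print(act.t)    # current value of the learnable threshold T
```

Because T is an nn.Parameter, each layer using such a module could, in principle, learn its own threshold during training, which is one way the "adaptive" behavior described in the abstract might be realized.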