Paper Title

Exploring the Long-Term Generalization of Counting Behavior in RNNs

Authors

Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde

Abstract

In this study, we investigate the generalization of LSTM, ReLU and GRU models on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs have the capacity for counting with suitable configuration, while GRUs have limitations that prevent correct counting over longer sequences. Despite this and some positive empirical results for LSTMs on Dyck-1 languages, our experimental results show that LSTMs fail to learn correct counting behavior for sequences that are significantly longer than in the training data. ReLUs show much larger variance in behavior and in most cases worse generalization. The long sequence generalization is empirically related to validation loss, but reliable long sequence generalization seems not practically achievable through backpropagation with current techniques. We demonstrate different failure modes for LSTMs, GRUs and ReLUs. In particular, we observe that the saturation of activation functions in LSTMs and the correct weight setting for ReLUs to generalize counting behavior are not achieved in standard training regimens. In summary, learning generalizable counting behavior is still an open problem and we discuss potential approaches for further research.
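The abstract's statement that "RNNs with ReLU activation ... have the capacity for counting with suitable configuration" refers to a known hand-set weight construction rather than anything learned by training. The sketch below is an illustration only, not the authors' code; the function name relu_counter and the exact encoding are assumptions. It shows a single ReLU unit with recurrent weight 1 and input weights +1/-1 computing the Dyck-1 bracket balance exactly, for arbitrarily long sequences.

```python
import numpy as np

# Minimal sketch (assumption: not the authors' implementation) of the hand-set
# weight configuration under which one ReLU RNN unit counts exactly.
def relu_counter(tokens):
    """Run a one-unit ReLU RNN over a Dyck-1 string and return the final count.

    Recurrence: h_t = ReLU(W_h * h_{t-1} + W_x . x_t + b)
    with W_h = 1, W_x = [+1, -1] (one-hot input: '(' -> index 0, ')' -> index 1),
    and b = 0. For well-formed Dyck-1 prefixes the running balance never goes
    negative, so the ReLU never clips it and h_t equals the exact bracket count,
    independent of sequence length.
    """
    W_h = 1.0
    W_x = np.array([1.0, -1.0])
    b = 0.0
    h = 0.0
    for tok in tokens:
        x = np.array([1.0, 0.0]) if tok == "(" else np.array([0.0, 1.0])
        h = max(0.0, W_h * h + float(W_x @ x) + b)  # ReLU activation
    return h

# Balanced strings end at 0 regardless of length, so this fixed configuration
# generalizes far beyond any finite training set.
print(relu_counter("(" * 1000 + ")" * 1000))  # 0.0
print(relu_counter("((()"))                   # 2.0
```

Setting the recurrent weight to exactly 1 is what makes the count persist indefinitely; weights that deviate slightly from this configuration let the stored count decay or blow up over long sequences, which is consistent with the generalization failures the paper reports for networks trained by backpropagation.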
