Paper Title

Minimal Width for Universal Property of Deep RNN

Authors

Chang hoon Song, Geonho Hwang, Jun ho Lee, Myungjoo Kang

Abstract

A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with the widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer perceptron and an RNN, our theory and proof technique can be an initial step toward further research on deep RNNs.
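
The width bound $d_x+d_y+2$ in the abstract refers to the per-layer hidden width of a deep, narrow RNN. Below is a minimal, illustrative sketch (with randomly initialized weights; it is not the constructive network from the paper's proof, and the class and parameter names are hypothetical) of such an architecture: a stack of ReLU recurrent layers, each of hidden width $d_x+d_y+2$, mapping a length-$T$ sequence in $\mathbb{R}^{d_x}$ to a sequence in $\mathbb{R}^{d_y}$ through a per-time-step linear readout.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

class NarrowDeepRNN:
    """Illustrative deep narrow RNN: every layer has hidden width d_x + d_y + 2."""

    def __init__(self, d_x, d_y, num_layers, seed=0):
        rng = np.random.default_rng(seed)
        self.width = d_x + d_y + 2          # per-layer width from the stated upper bound
        dims = [d_x] + [self.width] * num_layers
        # Each layer holds input-to-hidden (W) and hidden-to-hidden (U) weights and a bias.
        self.layers = [
            dict(W=rng.standard_normal((dims[i + 1], dims[i])) * 0.1,
                 U=rng.standard_normal((dims[i + 1], dims[i + 1])) * 0.1,
                 b=np.zeros(dims[i + 1]))
            for i in range(num_layers)
        ]
        # Linear readout from the last hidden sequence to R^{d_y}.
        self.V = rng.standard_normal((d_y, self.width)) * 0.1

    def forward(self, xs):
        """xs: array of shape (T, d_x); returns outputs of shape (T, d_y)."""
        seq = xs
        for layer in self.layers:
            h = np.zeros(layer["b"].shape)
            outs = []
            for x_t in seq:                 # recurrence within one layer
                h = relu(layer["W"] @ x_t + layer["U"] @ h + layer["b"])
                outs.append(h)
            seq = np.stack(outs)            # hidden sequence feeds the next layer
        return seq @ self.V.T               # per-time-step linear readout

# Usage: a length-5 sequence in R^3 mapped to a sequence in R^2.
rnn = NarrowDeepRNN(d_x=3, d_y=2, num_layers=4)
xs = np.random.default_rng(1).standard_normal((5, 3))
print(rnn.forward(xs).shape)  # (5, 2)
```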
