Paper Title

Interference and Generalization in Temporal Difference Learning

Authors

Emmanuel Bengio, Joelle Pineau, Doina Precup

Abstract

We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD($λ$) and regression tasks such as Monte-Carlo policy evaluation. We hope that these new findings can guide the future discovery of better bootstrapping methods.
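
To make the definition of interference concrete, here is a minimal sketch that computes the inner product of two per-sample gradients. It is not the authors' code: it assumes a linear value function v(s) = w · phi(s) and a semi-gradient TD(0) loss, and the function names, feature vectors, rewards, and discount factor are all hypothetical choices made for illustration.

```python
import numpy as np


def td0_semi_gradient(w, phi_s, reward, phi_next, gamma=0.9):
    """Gradient of 0.5 * delta**2 w.r.t. w for a linear v(s) = w @ phi(s).

    Assumption: the bootstrapped target r + gamma * v(s') is treated as a
    constant (semi-gradient), as is usual for TD(0).
    """
    delta = reward + gamma * (w @ phi_next) - (w @ phi_s)
    return -delta * phi_s


def interference(grad_a, grad_b):
    """Interference as the inner product of two per-sample gradients."""
    return float(np.dot(grad_a, grad_b))


# Hypothetical features: states a and b share a feature, state c shares none.
w = np.zeros(3)
phi_a = np.array([1.0, 0.0, 0.0])
phi_b = np.array([1.0, 1.0, 0.0])
phi_c = np.array([0.0, 0.0, 1.0])

g_a = td0_semi_gradient(w, phi_a, reward=1.0, phi_next=phi_c)
g_b = td0_semi_gradient(w, phi_b, reward=1.0, phi_next=phi_c)
g_c = td0_semi_gradient(w, phi_c, reward=1.0, phi_next=phi_a)

rho_ab = interference(g_a, g_b)  # shared feature: gradients are aligned
rho_ac = interference(g_a, g_c)  # disjoint features: gradients are orthogonal
print(rho_ab, rho_ac)
```

In this toy setup the two transitions with a shared feature have positive interference, so to first order a gradient step on one also reduces the loss on the other, while the transitions with disjoint features have zero interference and an update on one leaves the other unchanged.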
