Paper Title

AKF-SR: Adaptive Kalman Filtering-based Successor Representation

Authors

Parvin Malekzadeh, Mohammad Salimibeni, Ming Hou, Arash Mohammadi, Konstantinos N. Plataniotis

Abstract

Recent studies in neuroscience suggest that Successor Representation (SR)-based models adapt to changes in goal locations or the reward function faster than model-free algorithms, at a lower computational cost than model-based algorithms. However, it is not known how such a representation might help animals manage uncertainty in their decision-making. Existing methods for SR learning do not capture uncertainty about the estimated SR. To address this issue, the paper presents a Kalman filter-based SR framework, referred to as Adaptive Kalman Filtering-based Successor Representation (AKF-SR). First, the Kalman temporal difference approach, a combination of the Kalman filter and the temporal difference method, is used within the AKF-SR framework to cast the SR learning procedure as a filtering problem; this provides an uncertainty estimate of the SR and also reduces memory requirements and sensitivity to model parameters in comparison with deep neural network-based algorithms. An adaptive Kalman filtering approach is then applied within the proposed AKF-SR framework to tune the measurement noise covariance and measurement mapping function of the Kalman filter, the parameters that most strongly affect the filter's performance. Moreover, an active learning method that exploits the estimated uncertainty of the SR to form the behaviour policy, leading to more visits to states with less certain values, is proposed to improve the agent's overall performance in terms of the rewards received while interacting with its environment.
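A minimal sketch of how SR learning might be cast as a Kalman filtering problem in the spirit of the abstract: the vectorized SR matrix is treated as the filter's hidden state, each observed transition supplies a TD-style linear measurement, and the posterior covariance provides the uncertainty that an exploration bonus can exploit. The tabular/one-hot setup, the class and parameter names, the innovation-based adaptation of the measurement noise covariance, and the uncertainty bonus are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

class KTDSuccessorRepresentation:
    """Illustrative Kalman-filter view of tabular SR learning (not the paper's exact algorithm).

    State:       theta = vec(M), the flattened n x n SR matrix (row-major).
    Process:     random walk, theta_t = theta_{t-1} + process noise.
    Measurement: the TD identity M[s, :] ~= e_s + gamma * M[s', :], rewritten as
                 e_s = H_t @ theta + noise, with H_t selecting row s minus gamma * row s'.
    """

    def __init__(self, n_states, gamma=0.95, process_var=1e-3, meas_var=1.0, adapt_rate=0.05):
        self.n = n_states
        self.gamma = gamma
        self.theta = np.eye(n_states).flatten()        # prior mean: SR initialized to the identity
        self.P = np.eye(n_states ** 2)                 # prior covariance over vec(M)
        self.Q = process_var * np.eye(n_states ** 2)   # process noise covariance
        self.R = meas_var * np.eye(n_states)           # measurement noise covariance (adapted online)
        self.adapt_rate = adapt_rate

    def _measurement_matrix(self, s, s_next):
        """H_t in R^{n x n^2}: weight 1 on row s of M and weight -gamma on row s_next."""
        H = np.zeros((self.n, self.n ** 2))
        rows = np.arange(self.n)
        H[rows, s * self.n + rows] += 1.0
        H[rows, s_next * self.n + rows] += -self.gamma
        return H

    def update(self, s, s_next):
        """One Kalman correction step from an observed transition s -> s_next."""
        P_pred = self.P + self.Q                       # predict: random walk keeps the mean, inflates covariance

        H = self._measurement_matrix(s, s_next)
        y = np.zeros(self.n)
        y[s] = 1.0                                     # one-hot observation of occupying state s

        innovation = y - H @ self.theta
        S = H @ P_pred @ H.T + self.R                  # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)            # Kalman gain

        self.theta = self.theta + K @ innovation
        self.P = (np.eye(self.n ** 2) - K @ H) @ P_pred

        # Crude residual-based adaptation of R, standing in for the paper's adaptive filtering step.
        residual = y - H @ self.theta
        self.R = (1 - self.adapt_rate) * self.R + self.adapt_rate * (
            np.outer(residual, residual) + H @ self.P @ H.T)

    def sr_matrix(self):
        return self.theta.reshape(self.n, self.n)

    def value_with_bonus(self, reward_weights, kappa=1.0):
        """SR-derived state values plus an uncertainty bonus from the posterior covariance,
        mimicking an uncertainty-driven (active-learning) behaviour policy."""
        M = self.sr_matrix()
        row_var = np.array([np.trace(self.P[s * self.n:(s + 1) * self.n,
                                            s * self.n:(s + 1) * self.n])
                            for s in range(self.n)])
        return M @ reward_weights + kappa * np.sqrt(row_var)
```

Vectorizing the SR makes the TD measurement linear in the state, so the standard Kalman update applies directly and the posterior covariance doubles as the SR uncertainty estimate; a practical implementation would work with feature-based successor features rather than a full tabular matrix to keep the state dimension manageable.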
