Towards neoRL networks; the emergence of purposive graphs

Paper title

Towards neoRL networks; the emergence of purposive graphs

Author

Leikanger, Per R.

Abstract

The neoRL framework for purposive AI implements latent learning by emulated cognitive maps, with general value functions (GVFs) expressing operant desires toward separate states. The agent's expectancy of reward, expressed as learned projections in the considered space, allows the neoRL agent to extract purposive behavior from the learned map according to the reward hypothesis. We explore this allegory further, considering neoRL modules as nodes in a network with desire as input and state-action Q-values as output; we see that action sets with Euclidean significance imply an interpretation of state-action vectors as Euclidean projections of desire. Autonomous desire from neoRL nodes within the agent allows for deeper neoRL behavioral graphs. Experiments confirm the effect of neoRL networks governed by autonomous desire, verifying the four principles for purposive networks. A neoRL agent governed by purposive networks can navigate Euclidean spaces in real time while learning, exemplifying how modern AI can still profit from inspiration drawn from early psychology.
