Title
A Gentle Lecture Note on Filtrations in Reinforcement Learning
Authors
Abstract
This note aims to provide a basic intuition on the concept of filtrations as used in the context of reinforcement learning (RL). Filtrations are often used to formally define RL problems, yet their implications might not be evident for those without a background in measure theory. Essentially, a filtration is a construct that captures partial knowledge up to time $t$, without revealing future information that may already have been simulated but has not yet been revealed to the decision-maker. We illustrate this with simple examples from the finance domain, on both discrete and continuous outcome spaces. Furthermore, we show that an explicit notion of filtration is not needed, as basing decisions solely on the current problem state (which is possible due to the Markov property) suffices to eliminate future knowledge from the decision-making process.
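The core idea in the abstract can be illustrated with a minimal sketch. Assuming a toy coin-flip price model (a hypothetical stand-in for the note's finance examples, not the paper's actual code), the entire sample path is simulated up front, yet the information available at time $t$ — the filtration — is only the price history up to $t$. A Markovian policy then needs just the current price, making explicit filtration bookkeeping unnecessary:

```python
import random

random.seed(0)

# Pre-simulate a full sample path of coin-flip price moves (+1 or -1).
# The whole future already exists in memory, but the decision-maker
# must not be allowed to peek at it.
T = 5
moves = [random.choice([-1, 1]) for _ in range(T)]
prices = [100]
for m in moves:
    prices.append(prices[-1] + m)

def observed_history(t):
    """Information revealed by time t: the price path up to t only.

    This models the filtration at time t -- the pre-simulated future
    (prices[t+1:]) stays hidden from the decision-maker.
    """
    return prices[: t + 1]

def policy(current_price):
    """A hypothetical Markovian policy: it reads only the current price.

    By the Markov property, conditioning on the current state alone
    suffices; no access to the full history (or the future) is needed.
    """
    return "buy" if current_price < 100 else "hold"

for t in range(T):
    history = observed_history(t)   # adapted to the filtration at time t
    action = policy(history[-1])    # depends only on the current state
```

The threshold rule in `policy` is arbitrary; the point is that its input is a single state, so future knowledge cannot leak into the decision even though the path was generated in advance.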