Paper Title
Bridging the gap between Markowitz planning and deep reinforcement learning
Paper Authors
Paper Abstract
While researchers in the asset management industry have mostly focused on financial and risk planning techniques such as the Markowitz efficient frontier, minimum variance, maximum diversification, or equal risk parity, another community in machine learning has, in parallel, been working on reinforcement learning, and more particularly deep reinforcement learning, to solve other decision-making problems in challenging tasks such as autonomous driving, robot learning, and, on a more conceptual side, game solving such as Go. This paper aims to bridge the gap between these two approaches by showing that Deep Reinforcement Learning (DRL) techniques can shed new light on portfolio allocation thanks to a more general optimization setting that casts portfolio allocation as an optimal control problem: not just a one-step optimization, but rather a continuous-control optimization with a delayed reward. The advantages are numerous: (i) DRL directly maps market conditions to actions by design and hence should adapt to changing environments, (ii) DRL does not rely on traditional financial risk assumptions, such as risk being represented by variance, and (iii) DRL can incorporate additional data and is a multi-input method, as opposed to more traditional optimization methods. We present some encouraging experimental results using convolutional networks.
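To make the contrast between one-step Markowitz-style planning and the sequential DRL framing concrete, below is a minimal illustrative sketch, not the paper's implementation: the window length, transaction-cost term, reward definition, and equal-weight placeholder policy are hypothetical choices, and the convolutional policy network mentioned in the abstract is not included.

```python
# Illustrative sketch (not the paper's method): one-step minimum-variance
# allocation vs. portfolio allocation framed as a sequential-decision environment
# with a per-step reward, as a DRL agent would see it.
import numpy as np

def markowitz_min_variance(returns):
    """One-step planning: closed-form minimum-variance weights from a sample covariance."""
    cov = np.cov(returns, rowvar=False)           # (n_assets, n_assets)
    inv = np.linalg.pinv(cov)
    ones = np.ones(cov.shape[0])
    return inv @ ones / (ones @ inv @ ones)       # w = C^{-1}1 / (1'C^{-1}1)

class PortfolioEnv:
    """Sequential framing: observation = trailing window of asset returns,
    action = portfolio weights, reward = realized return net of a simple
    transaction cost (hypothetical reward choice for illustration)."""
    def __init__(self, returns, window=30, cost=1e-3):
        self.returns = returns                    # (T, n_assets) array of returns
        self.window = window
        self.cost = cost

    def reset(self):
        self.t = self.window
        n = self.returns.shape[1]
        self.prev_w = np.ones(n) / n
        return self.returns[self.t - self.window:self.t]

    def step(self, action):
        w = np.clip(action, 0.0, None)
        w = w / (w.sum() + 1e-12)                 # long-only weights on the simplex
        r = float(w @ self.returns[self.t])       # portfolio return for this period
        reward = r - self.cost * np.abs(w - self.prev_w).sum()
        self.prev_w, self.t = w, self.t + 1
        done = self.t >= len(self.returns)
        obs = None if done else self.returns[self.t - self.window:self.t]
        return obs, reward, done

# Toy usage on synthetic data (illustration only).
rng = np.random.default_rng(0)
rets = rng.normal(0.0005, 0.01, size=(500, 4))
print("min-variance weights:", markowitz_min_variance(rets).round(3))

env = PortfolioEnv(rets)
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(np.ones(4) / 4)  # placeholder equal-weight policy
    total += reward
print("cumulative reward of equal-weight policy:", round(total, 4))
```

In this framing, a DRL agent would replace the placeholder equal-weight policy with a learned mapping from the observation window to weights, trained on the cumulative (and hence delayed) reward rather than on a single-period variance objective.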