Paper Title

G-Learner and GIRL: Goal Based Wealth Management with Reinforcement Learning

Authors

Matthew Dixon, Igor Halperin

Abstract

We present a reinforcement learning approach to goal based wealth management problems such as optimization of retirement plans or target dated funds. In such problems, an investor seeks to achieve a financial goal by making periodic investments in the portfolio while being employed, and periodically draws from the account when in retirement, in addition to the ability to re-balance the portfolio by selling and buying different assets (e.g. stocks). Instead of relying on a utility of consumption, we present G-Learner: a reinforcement learning algorithm that operates with explicitly defined one-step rewards, does not assume a data generation process, and is suitable for noisy data. Our approach is based on G-learning - a probabilistic extension of the Q-learning method of reinforcement learning. In this paper, we demonstrate how G-learning, when applied to a quadratic reward and Gaussian reference policy, gives an entropy-regulated Linear Quadratic Regulator (LQR). This critical insight provides a novel and computationally tractable tool for wealth management tasks which scales to high dimensional portfolios. In addition to the solution of the direct problem of G-learning, we also present a new algorithm, GIRL, that extends our goal-based G-learning approach to the setting of Inverse Reinforcement Learning (IRL) where rewards collected by the agent are not observed, and should instead be inferred. We demonstrate that GIRL can successfully learn the reward parameters of a G-Learner agent and thus imitate its behavior. Finally, we discuss potential applications of the G-Learner and GIRL algorithms for wealth management and robo-advising.
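As a rough illustration of the "probabilistic extension of Q-learning" idea, the sketch below runs a G-learning-style backup on a hypothetical discrete toy MDP: the hard max over actions in the Q-learning target is replaced by a KL-regularized soft-max (log-sum-exp) against a reference policy. All names, sizes, and parameter values here are illustrative assumptions; the paper's actual setting is continuous, with quadratic rewards and a Gaussian reference policy, which is what turns the same backup into an entropy-regulated LQR.

```python
import numpy as np

def soft_value(G_row, prior, beta):
    """Free energy F(s) = (1/beta) * log sum_a prior(a) * exp(beta * G(s, a))."""
    z = beta * G_row
    m = z.max()                        # stabilize the log-sum-exp
    return (m + np.log(prior @ np.exp(z - m))) / beta

def g_backup(G, rewards, P, prior, gamma, beta):
    """One synchronous backup: G(s,a) = r(s,a) + gamma * E_{s'|s,a}[F(s')]."""
    F = np.array([soft_value(G[s], prior, beta) for s in range(G.shape[0])])
    return rewards + gamma * P @ F     # P: (S, A, S') contracts with F: (S',)

# hypothetical 2-state, 2-action MDP (illustrative data only)
rng = np.random.default_rng(0)
n_s, n_a = 2, 2
rewards = rng.normal(size=(n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # transition probabilities
prior = np.full(n_a, 1.0 / n_a)                    # uniform reference policy
gamma, beta = 0.9, 5.0

G = np.zeros((n_s, n_a))
for _ in range(500):                   # iterate the backup to a fixed point
    G = g_backup(G, rewards, P, prior, gamma, beta)

# the optimal stochastic policy tilts the prior by the exponentiated G-values
policy = prior * np.exp(beta * (G - G.max(axis=1, keepdims=True)))
policy /= policy.sum(axis=1, keepdims=True)
```

As `beta` grows, the soft-max approaches the hard max and the policy concentrates on the greedy action, recovering ordinary Q-learning; small `beta` keeps the policy close to the reference prior.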
