Solving The Lunar Lander Problem under Uncertainty using Reinforcement Learning

Paper Title

Solving The Lunar Lander Problem under Uncertainty using Reinforcement Learning

Authors

Gadgil, Soham; Xin, Yunfeng; Xu, Chengzhe

Abstract

Reinforcement Learning (RL) is an area of machine learning concerned with enabling an agent to navigate an uncertain environment in order to maximize some notion of cumulative long-term reward. In this paper, we implement and analyze two different RL techniques, Sarsa and Deep Q-Learning, on OpenAI Gym's LunarLander-v2 environment. We then introduce additional uncertainty to the original problem to test the robustness of these techniques. With our best models, we achieve average rewards of 170+ with the Sarsa agent and 200+ with the Deep Q-Learning agent on the original problem. We also show that these techniques are able to overcome the additional uncertainties and achieve positive average rewards of 100+ with both agents. We then perform a comparative analysis of the two techniques to conclude which agent performs better.
