Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework

Paper Title

Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework

Authors

Li, Guangliang; Dibeklioğlu, Hamdi; Whiteson, Shimon; Hung, Hayley

Abstract

Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user. Previous research showed that humans give copious feedback early in training but very sparsely thereafter. In this article, we investigate the potential for agents to learn from trainers' facial expressions by interpreting them as evaluative feedback. To do so, we implemented TAMER, a popular interactive reinforcement learning method, in a reinforcement-learning benchmark problem, Infinite Mario, and conducted the first large-scale study of TAMER, involving 561 participants. Our analysis, based on a CNN-RNN model that we designed, shows that telling trainers to use facial expressions and introducing competition can improve the accuracy of estimating positive and negative feedback from facial expressions. In addition, our results from a simulation experiment show that learning solely from feedback predicted from facial expressions is possible, and that with strong, effective prediction models or a regression method, facial responses would significantly improve the performance of agents. Furthermore, our experiment supports previous studies demonstrating the importance of bi-directional feedback and competitive elements in the training interface.

