Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)

Paper Title

Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)

Authors

Bojun Huang

Abstract

This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. In this approach, optimal Q-functions are formulated as saddle points of a nonlinear Lagrangian function derived from the classic Bellman optimality equation. The paper shows that the Lagrangian enjoys strong duality despite its nonlinearity, which paves the way for a general Lagrangian method for Q-function learning. As a demonstration, the paper develops an imitation learning algorithm based on this duality theory and applies it to a state-of-the-art machine translation benchmark. The paper then demonstrates a symmetry-breaking phenomenon regarding the optimality of the Lagrangian saddle points, which justifies a largely overlooked direction in developing the Lagrangian method.
