Paper Title
Statistical Inference for Online Decision Making via Stochastic Gradient Descent
Paper Authors
Paper Abstract
Online decision making aims to learn the optimal decision rule by making personalized decisions and recursively updating the decision rule. Big data has made this easier than before, but it also brings new challenges. Because the decision rule should be updated at every step, an offline update that uses all the historical data is inefficient in both computation and storage. To this end, we propose a fully online algorithm that makes decisions and updates the decision rule online via stochastic gradient descent. It is not only efficient but also supports all kinds of parametric reward models. Focusing on statistical inference for online decision making, we establish the asymptotic normality of both the parameter estimator produced by our algorithm and the online inverse probability weighted value estimator that we use to estimate the optimal value. We also provide online plug-in estimators for the variances of the parameter and value estimators and show that they are consistent, so that interval estimation and hypothesis testing are possible with our method. The proposed algorithm and theoretical results are evaluated through simulations and a real-data application to news article recommendation.
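To make the setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm. The abstract does not specify the exploration scheme, reward model, or step sizes, so everything here is an assumption: a linear reward model per action, epsilon-greedy exploration, a plain squared-loss SGD step with a decaying learning rate, and a running-mean inverse probability weighted (IPW) value estimate. The function name `run_online_sgd` and all parameters are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def run_online_sgd(true_beta, n_steps=20000, eps=0.1, seed=0):
    """Simulate online decisions with SGD updates and an IPW value estimate.

    true_beta: one coefficient list per action (assumed linear reward model).
    eps: epsilon-greedy exploration rate (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_actions, dim = len(true_beta), len(true_beta[0])
    beta = [[0.0] * dim for _ in range(n_actions)]  # current estimates
    value_est = 0.0  # running IPW estimate of the greedy rule's value

    for t in range(1, n_steps + 1):
        # Covariates: an intercept plus uniform features (simulated data).
        x = [1.0] + [rng.uniform(-1.0, 1.0) for _ in range(dim - 1)]

        # Current decision rule: pick the action with the largest predicted reward.
        greedy = max(range(n_actions), key=lambda a: dot(beta[a], x))

        # Epsilon-greedy: explore uniformly with probability eps.
        action = rng.randrange(n_actions) if rng.random() < eps else greedy

        # Observe a noisy reward from the (simulated) environment.
        reward = dot(true_beta[action], x) + rng.gauss(0.0, 0.1)

        # One SGD step on the squared loss for the chosen action only.
        lr = 0.5 * t ** -0.6  # decaying step size (assumed schedule)
        err = dot(beta[action], x) - reward
        beta[action] = [b - lr * err * xi for b, xi in zip(beta[action], x)]

        # IPW term: keep the reward only when the taken action matches the
        # greedy rule, reweighted by the probability of that event.
        p_greedy = (1.0 - eps) + eps / n_actions
        ipw = reward / p_greedy if action == greedy else 0.0
        value_est += (ipw - value_est) / t  # online running mean

    return beta, value_est


if __name__ == "__main__":
    est_beta, est_value = run_online_sgd([[0.0, 1.0], [0.0, -1.0]])
    print("estimated coefficients:", est_beta)
    print("estimated value:", est_value)
```

Each step uses only the current observation, so memory and per-step cost stay constant regardless of how much history has accumulated, which is the efficiency argument the abstract makes against offline updates. The variance estimators and confidence intervals mentioned in the abstract are not reproduced here, since their form is not given in this text.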