Paper Title

On the Generalization for Transfer Learning: An Information-Theoretic Analysis

Authors

Wu, Xuetong; Manton, Jonathan H.; Aickelin, Uwe; Zhu, Jingge

Abstract

Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different probability distributions. In this work, we give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu \| \mu')$ plays an important role in the characterizations, where $\mu$ and $\mu'$ denote the distribution of the training data and the testing data, respectively. Specifically, we provide generalization error and excess risk upper bounds for learning algorithms where data from both distributions are available in the training phase. Recognizing that the bounds could be sub-optimal in general, we provide improved excess risk upper bounds for a certain class of algorithms, including the empirical risk minimization (ERM) algorithm, by making stronger assumptions through the \textit{central condition}. To demonstrate the usefulness of the bounds, we further extend the analysis to the Gibbs algorithm and the noisy stochastic gradient descent method. We then generalize the mutual information bound with other divergences such as the $\phi$-divergence and the Wasserstein distance, which may lead to tighter bounds and can handle the case when $\mu$ is not absolutely continuous with respect to $\mu'$. Several numerical results are provided to demonstrate our theoretical findings. Lastly, to address the problem that the bounds are often not directly applicable in practice due to the absence of distributional knowledge of the data, we develop an algorithm (called InfoBoost) that dynamically adjusts the importance weights for both source and target data based on certain information measures. The empirical results show the effectiveness of the proposed algorithm.
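
For orientation, the KL divergence appearing in these bounds is recalled below, together with the classical mutual-information generalization bound from this line of work (stated for a $\sigma$-sub-Gaussian loss and $n$ i.i.d. training samples). The paper's transfer-learning theorems refine this shape with an additional $D(\mu \| \mu')$ term; the display is only an illustrative sketch, not the paper's exact statement.

```latex
% KL divergence between the training distribution \mu and the
% testing distribution \mu' (assumes \mu is absolutely continuous
% with respect to \mu'):
D(\mu \| \mu') = \int \log \frac{\mathrm{d}\mu}{\mathrm{d}\mu'} \, \mathrm{d}\mu

% Classical mutual-information bound for a \sigma-sub-Gaussian loss
% and n i.i.d. samples; the transfer-learning bounds in the paper
% augment this shape with a D(\mu \| \mu') term:
\left| \mathbb{E}\,\mathrm{gen}(W, S) \right|
  \le \sqrt{\frac{2\sigma^2}{n}\, I(W; S)}
```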

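The InfoBoost algorithm itself is specified in the paper; purely as a hypothetical illustration of the underlying idea (re-weighting source samples according to how representative they look of the target distribution), a minimal sketch is given below. The Gaussian density-ratio weights and the weighted logistic regression are assumptions made for this sketch, not the information measures or update rules of InfoBoost.

```python
import numpy as np

# Hypothetical sketch of importance-weighted training on pooled
# source/target data. The density-ratio weights stand in for the
# information measures used by InfoBoost; this is NOT the paper's
# algorithm, only an illustration of the re-weighting idea.

rng = np.random.default_rng(0)

def make_data(n, shift):
    """1-D Gaussian features with a class-dependent mean plus a domain shift."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y + shift, scale=1.0, size=n)
    return x, y

# Source domain (plentiful, but shifted w.r.t. the target domain).
xs, ys = make_data(500, shift=0.5)
# Target domain (only a few labeled samples available).
xt, yt = make_data(50, shift=0.0)

def gaussian_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

# Crude per-sample density-ratio estimate d(target)/d(source),
# assuming both marginals are roughly Gaussian (an assumption made
# purely for this sketch).
w = np.exp(
    gaussian_logpdf(xs, xt.mean(), xt.std())
    - gaussian_logpdf(xs, xs.mean(), xs.std())
)
w /= w.mean()  # normalize so the source data keeps unit average weight

# Weighted logistic regression on the pooled data via gradient descent.
x_all = np.concatenate([xs, xt])
y_all = np.concatenate([ys, yt])
weights = np.concatenate([w, np.ones_like(xt)])

a, b = 0.0, 0.0  # slope and intercept
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(a * x_all + b)))  # predicted probabilities
    grad = weights * (p - y_all)                # weighted logistic gradient
    a -= 0.1 * np.mean(grad * x_all)
    b -= 0.1 * np.mean(grad)

acc = np.mean(((1.0 / (1.0 + np.exp(-(a * xt + b)))) > 0.5) == yt)
print(f"target-domain accuracy: {acc:.2f}")
```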