Paper Title
Algorithms that Approximate Data Removal: New Results and Limitations
Paper Authors
Abstract
We study the problem of deleting user data from machine learning models trained using empirical risk minimization. Our focus is on learning algorithms which return the empirical risk minimizer and approximate unlearning algorithms that comply with deletion requests arriving in streaming minibatches. Leveraging the infinitesimal jackknife, we develop an online unlearning algorithm that is both computationally and memory efficient. Unlike prior memory-efficient unlearning algorithms, we target models that minimize objectives with non-smooth regularizers, such as the commonly used $\ell_1$, elastic net, or nuclear norm penalties. We also provide generalization, deletion capacity, and unlearning guarantees that are consistent with state-of-the-art methods. Across a variety of benchmark datasets, our algorithm empirically improves upon the runtime of prior methods while maintaining the same memory requirements and test accuracy. Finally, we open a new direction of inquiry by proving that all approximate unlearning algorithms introduced so far fail to unlearn in problem settings where common hyperparameter tuning methods, such as cross-validation, have been used to select models.
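To make the high-level description concrete, the following is a minimal Python sketch of an infinitesimal-jackknife-style deletion update combined with a proximal step for an $\ell_1$ penalty. It is an illustration under our own simplifying assumptions, not the paper's exact algorithm: the function names (`ij_unlearn`, `soft_threshold`) are hypothetical, the Hessian is assumed to come from the smooth part of the objective, the prox step uses a unit step size, and the noise injection typically required for certified unlearning guarantees is omitted.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ij_unlearn(theta_hat, hessian, removed_grads, n, lam=0.0):
    """One approximate deletion step via an infinitesimal-jackknife correction.

    theta_hat     : (d,) ERM solution computed on the full n-point dataset
    hessian       : (d, d) Hessian of the smooth part of the empirical risk
                    at theta_hat
    removed_grads : (m, d) per-example loss gradients at theta_hat for the
                    m points being deleted (one streaming minibatch)
    lam           : l1 penalty weight; the non-smooth part is handled by a
                    proximal step rather than by differentiating the penalty
    """
    m = removed_grads.shape[0]
    # Newton-style correction: since theta_hat is stationary for the full
    # objective, the leave-m-out gradient reduces to the (rescaled) sum of
    # the deleted points' gradients, pushed through the inverse Hessian.
    correction = np.linalg.solve(hessian, removed_grads.sum(axis=0)) / (n - m)
    theta_minus = theta_hat + correction
    # Non-smooth regularizer (assumed l1 here) handled via its prox.
    if lam > 0.0:
        theta_minus = soft_threshold(theta_minus, lam)
    return theta_minus
```

Processing deletion requests then amounts to applying `ij_unlearn` once per incoming minibatch, which avoids retraining from scratch; maintaining (or cheaply updating) the Hessian factorization across requests is what makes such a scheme memory and compute efficient in the streaming setting the abstract describes.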