Title
Self-Updating Models with Error Remediation
Authors
Abstract
Many environments currently employ machine learning models for data processing and analytics that were built using a limited number of training data points. Once deployed, these models are exposed to significant amounts of previously unseen data, not all of which is representative of the original, limited training data. However, updating deployed models can be difficult due to logistical, bandwidth, time, hardware, and/or data sensitivity constraints. We propose a framework, Self-Updating Models with Error Remediation (SUMER), in which a deployed model updates itself as new data becomes available. SUMER uses techniques from semi-supervised learning and noise remediation to iteratively retrain a deployed model, using intelligently chosen predictions from the model as the labels for new training iterations. A key component of SUMER is error remediation, since self-labeled data can be susceptible to the propagation of errors. We investigate the use of SUMER across various data sets and iterations. We find that self-updating models (SUMs) generally perform better than models that do not attempt to self-update when presented with additional previously unseen data. This performance gap is accentuated when only a limited amount of initial training data is available. We also find that SUMER generally performs better than SUMs, demonstrating a benefit of applying error remediation. Consequently, SUMER can autonomously enhance the operational capabilities of existing data processing systems by intelligently updating models in dynamic environments.
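The abstract does not specify SUMER's base model, its rule for choosing which predictions become labels, or its remediation technique. The following is a minimal sketch of the general self-updating loop under stated assumptions only: a nearest-centroid classifier, a fixed confidence threshold for selecting self-labels, and a k-nearest-neighbour agreement filter used as a stand-in for error remediation. All function names (`fit_centroids`, `predict_with_confidence`, `knn_agrees`, `self_update`) and parameters (`threshold`, `k`, `remediate`) are hypothetical. Setting `remediate=False` corresponds roughly to a plain self-updating model (SUM).

```python
import math
from collections import Counter


def fit_centroids(X, y):
    """Assumed base model: one mean vector (centroid) per class label."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (label, confidence) via a softmax over negative squared distances."""
    scores = {label: -sum((a - b) ** 2 for a, b in zip(x, c))
              for label, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    label = max(exps, key=exps.get)
    return label, exps[label] / sum(exps.values())


def knn_agrees(x, label, X_ref, y_ref, k):
    """Hypothetical remediation check: keep a self-label only if it matches
    the majority label of the k nearest points in the current training set."""
    dists = sorted(
        ((sum((a - b) ** 2 for a, b in zip(x, xr)), yr) for xr, yr in zip(X_ref, y_ref)),
        key=lambda t: t[0],
    )
    votes = Counter(yr for _, yr in dists[:k])
    return votes.most_common(1)[0][0] == label


def self_update(X_train, y_train, unlabeled_batches, threshold=0.9, k=3, remediate=True):
    """Retrain after each batch of unseen data, adding selected self-labels."""
    X, y = list(X_train), list(y_train)
    model = fit_centroids(X, y)
    for batch in unlabeled_batches:
        for x in batch:
            label, conf = predict_with_confidence(model, x)
            if conf < threshold:
                continue  # not confident enough to use as a training label
            if remediate and not knn_agrees(x, label, X, y, k):
                continue  # likely erroneous self-label; skip to limit error propagation
            X.append(x)
            y.append(label)
        model = fit_centroids(X, y)  # new training iteration on original + self-labeled data
    return model, len(X) - len(X_train)
```

A small usage example with two well-separated clusters: the ambiguous midpoint `[2.5, 2.5]` falls below the confidence threshold and is not self-labeled, while the other three unseen points are added.

```python
X_train = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
y_train = ["a", "a", "b", "b"]
batches = [[[0.1, 0.3], [4.8, 5.2]], [[2.5, 2.5], [5.3, 5.0]]]
model, n_added = self_update(X_train, y_train, batches)
```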