Time-Constrained Learning

Paper Title

Time-Constrained Learning

Authors

Filho, Sergio; Laber, Eduardo; Lazera, Pedro; Molinaro, Marco

Abstract

Consider a scenario in which we have a huge labeled dataset ${\cal D}$ and limited time to train a given learner using ${\cal D}$. Since we may not be able to use the whole dataset, how should we proceed? Questions of this nature motivate the definition of the Time-Constrained Learning Task (TCL): given a dataset ${\cal D}$ sampled from an unknown distribution $\mu$, a learner ${\cal L}$ and a time limit $T$, the goal is to obtain, in at most $T$ units of time, the classification model with the highest possible accuracy with respect to $\mu$ among those that can be built by ${\cal L}$ using the dataset ${\cal D}$. We propose TCT, an algorithm for the TCL task designed based on principles from Machine Teaching. We present an experimental study involving 5 different learners and 20 datasets in which we show that TCT consistently outperforms two other algorithms: the first is a Teacher for black-box learners proposed in [Dasgupta et al., ICML 19], and the second is a natural adaptation of random sampling to the TCL setting. We also compare TCT with Stochastic Gradient Descent training; our method is again consistently better. While our work is primarily practical, we also show that a stripped-down version of TCT has provable guarantees. Under reasonable assumptions, the time our algorithm takes to achieve a certain accuracy is never much larger than the time it takes the batch teacher (which sends a single batch of examples) to achieve similar accuracy, and in some cases it is almost exponentially better.
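
To make the TCL setting concrete, the following is a minimal sketch of one possible time-budgeted random-sampling loop. It is not TCT, and it is not necessarily the random-sampling baseline evaluated in the paper, whose details the abstract does not give. The doubling schedule, the `majority_learner` stand-in, the injectable `clock` parameter and all function names are assumptions made purely for illustration.

```python
import random
import time
from collections import Counter


def majority_learner(examples):
    """Hypothetical stand-in learner: always predicts the most common label."""
    majority = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda x: majority


def accuracy(model, examples):
    """Fraction of (x, y) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


def budgeted_random_sampling(dataset, learner, time_limit,
                             clock=time.monotonic, start_size=32, seed=0):
    """Train on random samples of doubling size (an assumed schedule).

    Returns (model, sample_size) for the last model whose training
    finished within the time limit, or (None, 0) if none did.
    """
    rng = random.Random(seed)
    deadline = clock() + time_limit
    last_model, last_size = None, 0
    size = start_size
    while True:
        n = min(size, len(dataset))
        model = learner(rng.sample(dataset, n))
        if clock() > deadline:
            break  # this round overran the budget, so its model is discarded
        last_model, last_size = model, n
        if n == len(dataset):
            break  # the whole dataset has been used; nothing larger to try
        size *= 2
    return last_model, last_size
```

Here elapsed time is measured with `time.monotonic` by default; passing a different `clock` makes the loop deterministic for testing. Note that a training call is never interrupted, so a real implementation would also need to account for rounds that run past the deadline.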
