Paper Title

Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing

Paper Authors

Josh Alman, Jiehao Liang, Zhao Song, Ruizhe Zhang, Danyang Zhuo

Paper Abstract

Over the last decade, deep neural networks have transformed our society, and they are already widely applied in various machine learning applications. State-of-the-art deep neural networks are becoming larger in size every year to deliver increasing model accuracy, and as a result, model training consumes substantial computing resources and will only consume more in the future. Using current training methods, in each iteration, to process a data point $x \in \mathbb{R}^d$ in a layer, we need to spend $\Theta(md)$ time to evaluate all the $m$ neurons in the layer. This means processing the entire layer takes $\Theta(nmd)$ time for $n$ data points. Recent work [Song, Yang and Zhang, NeurIPS 2021] reduces this time per iteration to $o(nmd)$, but requires exponential time to preprocess either the data or the neural network weights, making it unlikely to have practical usage. In this work, we present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly and dynamically detect which neurons fire at each iteration. Our method requires only $O(nmd)$ time for preprocessing and still achieves $o(nmd)$ time per iteration. We complement our new algorithm with a lower bound, proving that, assuming a popular conjecture from complexity theory, one could not substantially speed up our algorithm for dynamic detection of firing neurons.
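To make the abstract's idea concrete: in the shifted-ReLU-style setting of this line of work, a neuron $i$ "fires" on input $x$ exactly when its correlation $\langle w_i, x \rangle$ exceeds an activation threshold. If, for each data point, the $m$ correlations are kept in a max tree, then one gradient step that touches a single neuron costs only $O(d + \log m)$ to maintain, and the $k$ firing neurons can be reported in roughly $O(k \log m)$ time, sublinear in $m$ when few neurons fire. The following is a minimal sketch of this technique using a max segment tree; the class name `CorrelationTree` and its interface are illustrative assumptions, not the authors' actual data structure.

```python
import numpy as np


class CorrelationTree:
    """Max segment tree over the correlations <w_i, x> for one fixed data point x.

    Preprocessing computes all m inner products once (O(md)); afterwards,
    updating one neuron's weights costs O(d + log m), and reporting the k
    neurons whose correlation exceeds the activation threshold costs about
    O(k log m), which is sublinear in m whenever few neurons fire.
    """

    def __init__(self, W: np.ndarray, x: np.ndarray):
        self.x = x
        self.m = W.shape[0]
        self.size = 1
        while self.size < self.m:
            self.size *= 2
        # Leaves hold <w_i, x>; padding leaves hold -inf so they never "fire".
        self.tree = np.full(2 * self.size, -np.inf)
        self.tree[self.size:self.size + self.m] = W @ x
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = max(self.tree[2 * node], self.tree[2 * node + 1])

    def update(self, i: int, w_new: np.ndarray) -> None:
        """Re-correlate neuron i after a gradient step: O(d + log m)."""
        node = self.size + i
        self.tree[node] = float(w_new @ self.x)
        node //= 2
        while node >= 1:
            self.tree[node] = max(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2

    def fired(self, threshold: float) -> list[int]:
        """Indices of all neurons with <w_i, x> > threshold, pruning dead subtrees."""
        out, stack = [], [1]
        while stack:
            node = stack.pop()
            if self.tree[node] <= threshold:
                continue  # no neuron in this subtree fires
            if node >= self.size:
                out.append(node - self.size)
            else:
                stack.extend((2 * node, 2 * node + 1))
        return out


# Toy usage: detect firing neurons, update one weight vector, re-detect.
rng = np.random.default_rng(0)
m, d, tau = 8, 4, 0.5  # tau plays the role of the activation threshold
W = rng.standard_normal((m, d))
x = rng.standard_normal(d)

tree = CorrelationTree(W, x)
print(tree.fired(tau))                # neurons active at this iteration

W[3] += 0.1 * rng.standard_normal(d)  # SGD touches one neuron's weights
tree.update(3, W[3])
print(tree.fired(tau))
```

One such tree per data point gives $O(nmd)$ total preprocessing, matching the abstract's bound; the per-iteration savings then come from the output-sensitive `fired` query, which never scans neurons whose entire subtree sits below the threshold.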
