Paper Title

Memory Planning for Deep Neural Networks

Paper Authors

Levental, Maksim

Paper Abstract

We study memory allocation patterns in DNNs during inference, in the context of large-scale systems. We observe that, under multi-threading, such memory allocation patterns are subject to high latencies, due to \texttt{mutex} contention in the system memory allocator. Latencies incurred due to such \texttt{mutex} contention produce undesirable bottlenecks in user-facing services. Thus, we propose a "memorization"-based technique, \texttt{MemoMalloc}, for optimizing overall latency, with only moderate increases in peak memory usage. Specifically, our technique consists of a runtime component, which captures all allocations and uniquely associates them with their high-level source operations, and a static analysis component, which constructs an efficient allocation "plan". We present an implementation of \texttt{MemoMalloc} in the PyTorch deep learning framework and evaluate memory consumption and execution performance on a wide range of DNN architectures. We find that \texttt{MemoMalloc} outperforms state-of-the-art general-purpose memory allocators, with respect to DNN inference latency, by as much as 40\%.
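The abstract describes the technique only at a high level. The sketch below illustrates the two-pass idea in Python; it is not the paper's implementation (which is built into PyTorch), and the names (MemoPlanner, Allocation, record, build_plan) as well as the greedy first-fit packing heuristic are assumptions chosen for illustration.

from dataclasses import dataclass


@dataclass
class Allocation:
    op_id: str   # high-level source operation that requested the buffer
    size: int    # requested size in bytes
    start: int   # step index at which the buffer becomes live
    end: int     # step index after which the buffer is dead


class MemoPlanner:
    """Hypothetical two-pass planner: trace once, then reuse a fixed layout."""

    def __init__(self):
        self.trace = []     # Allocations recorded by the runtime component
        self.plan = {}      # op_id -> byte offset into the slab
        self.slab_size = 0  # total bytes one inference needs

    def record(self, alloc):
        # Runtime component: capture every allocation and uniquely
        # associate it with its high-level source operation.
        self.trace.append(alloc)

    def build_plan(self):
        # Static analysis component: pack the recorded buffers into a
        # single slab with greedy first-fit over buffers sorted by
        # decreasing size, never letting buffers with overlapping
        # lifetimes share bytes.
        placed = []  # (offset, Allocation) pairs already laid out
        for a in sorted(self.trace, key=lambda x: -x.size):
            conflicts = sorted(
                (off, off + b.size)
                for off, b in placed
                if not (a.end <= b.start or b.end <= a.start)
            )
            offset = 0
            for lo, hi in conflicts:
                if offset + a.size <= lo:
                    break  # fits in the gap before this conflicting buffer
                offset = max(offset, hi)
            self.plan[a.op_id] = offset
            placed.append((offset, a))
            self.slab_size = max(self.slab_size, offset + a.size)


planner = MemoPlanner()
# Pass 1: trace one representative inference (op names, sizes, and
# lifetimes below are purely illustrative).
planner.record(Allocation("conv1.out", 4096, start=0, end=2))
planner.record(Allocation("relu1.out", 4096, start=1, end=3))
planner.record(Allocation("fc.out", 1024, start=2, end=4))
planner.build_plan()

# Pass 2: one upfront allocation; every per-op "allocation" is now an
# offset lookup into the slab, so the system allocator (and its mutex)
# is never touched on the hot path.
slab = bytearray(planner.slab_size)
conv1_buf = memoryview(slab)[planner.plan["conv1.out"]:][:4096]

In this toy trace, the plan packs the three buffers into an 8 KiB slab (the "fc.out" buffer reuses the bytes of the by-then-dead "conv1.out" buffer), trading a single upfront allocation for lock-free per-operation allocation during inference, which matches the latency-for-peak-memory trade-off the abstract reports.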
