Paper Title

Memory Planning for Deep Neural Networks

Paper Authors

Levental, Maksim

Paper Abstract

We study memory allocation patterns in DNNs during inference, in the context of large-scale systems. We observe that, under multi-threading, such memory allocation patterns are subject to high latencies, due to \texttt{mutex} contention in the system memory allocator. Latencies incurred due to such \texttt{mutex} contention produce undesirable bottlenecks in user-facing services. Thus, we propose a "memorization"-based technique, \texttt{MemoMalloc}, for optimizing overall latency, with only moderate increases in peak memory usage. Specifically, our technique consists of a runtime component, which captures all allocations and uniquely associates them with their high-level source operations, and a static analysis component, which constructs an efficient allocation "plan". We present an implementation of \texttt{MemoMalloc} in the PyTorch deep learning framework and evaluate memory consumption and execution performance on a wide range of DNN architectures. We find that \texttt{MemoMalloc} outperforms state-of-the-art general-purpose memory allocators, with respect to DNN inference latency, by as much as 40\%.
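The abstract describes the technique only at a high level. The sketch below illustrates the two-pass idea in Python; it is not the paper's implementation (which is built into PyTorch), and the names (MemoPlanner, Allocation, record, build_plan) as well as the greedy first-fit packing heuristic are assumptions chosen for illustration.

from dataclasses import dataclass


@dataclass
class Allocation:
    op_id: str   # high-level source operation that requested the buffer
    size: int    # requested size in bytes
    start: int   # step index at which the buffer becomes live
    end: int     # step index after which the buffer is dead


class MemoPlanner:
    """Hypothetical two-pass planner: trace once, then reuse a fixed layout."""

    def __init__(self):
        self.trace = []     # Allocations recorded by the runtime component
        self.plan = {}      # op_id -> byte offset into the slab
        self.slab_size = 0  # total bytes one inference needs

    def record(self, alloc):
        # Runtime component: capture every allocation and uniquely
        # associate it with its high-level source operation.
        self.trace.append(alloc)

    def build_plan(self):
        # Static analysis component: pack the recorded buffers into a
        # single slab with greedy first-fit over buffers sorted by
        # decreasing size, never letting buffers with overlapping
        # lifetimes share bytes.
        placed = []  # (offset, Allocation) pairs already laid out
        for a in sorted(self.trace, key=lambda x: -x.size):
            conflicts = sorted(
                (off, off + b.size)
                for off, b in placed
                if not (a.end <= b.start or b.end <= a.start)
            )
            offset = 0
            for lo, hi in conflicts:
                if offset + a.size <= lo:
                    break  # fits in the gap before this conflicting buffer
                offset = max(offset, hi)
            self.plan[a.op_id] = offset
            placed.append((offset, a))
            self.slab_size = max(self.slab_size, offset + a.size)


planner = MemoPlanner()
# Pass 1: trace one representative inference (op names, sizes, and
# lifetimes below are purely illustrative).
planner.record(Allocation("conv1.out", 4096, start=0, end=2))
planner.record(Allocation("relu1.out", 4096, start=1, end=3))
planner.record(Allocation("fc.out", 1024, start=2, end=4))
planner.build_plan()

# Pass 2: one upfront allocation; every per-op "allocation" is now an
# offset lookup into the slab, so the system allocator (and its mutex)
# is never touched on the hot path.
slab = bytearray(planner.slab_size)
conv1_buf = memoryview(slab)[planner.plan["conv1.out"]:][:4096]

In this toy trace, the plan packs the three buffers into an 8 KiB slab (the "fc.out" buffer reuses the bytes of the by-then-dead "conv1.out" buffer), trading a single upfront allocation for lock-free per-operation allocation during inference, which matches the latency-for-peak-memory trade-off the abstract reports.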
