Paper Title
Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing
Paper Authors
Paper Abstract
This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing. To preserve inference accuracy when partitioning an inference task, we account for the receptive field when performing segment-based partitioning. To maximize the parallelization between the communication and computing processes, and thereby minimize the total inference time of a task, we design a novel task collaboration scheme, named HALP, in which the overlapping zones of the sub-tasks assigned to secondary edge servers (ESs) are executed on the host ES. We further extend HALP to the scenario of multiple tasks. Experimental results show that HALP accelerates CNN inference on VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on the GTX 1080 Ti and Jetson AGX Xavier, outperforming the state-of-the-art scheme MoDNN. Moreover, we evaluate service reliability under time-variant channels, showing that HALP is an effective solution for ensuring high service reliability under strict service deadlines.
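The receptive-field consideration mentioned above can be illustrated with a small sketch (this is an assumption-laden illustration, not the paper's code): when an input feature map is split into segments, each output segment depends on an input slice enlarged by a halo determined by the kernel sizes, strides, and paddings of the preceding layers. The function name and layer encoding below are hypothetical.

```python
# Hedged sketch of receptive-field-aware segment partitioning.
# Each layer is an illustrative (kernel, stride, padding) tuple.

def input_range(out_lo, out_hi, layers, input_size):
    """Map an output row interval back to the input rows it depends on,
    propagating the interval backward through the layer stack."""
    lo, hi = out_lo, out_hi
    for kernel, stride, pad in reversed(layers):
        lo = lo * stride - pad
        hi = hi * stride - pad + kernel - 1
    # Clamp to the valid input range (padding covers the rest).
    return max(lo, 0), min(hi, input_size - 1)

# Example: two 3x3, stride-1, padding-1 conv layers (VGG-style block).
vgg_block = [(3, 1, 1), (3, 1, 1)]
# The top half of a 224-row output needs input rows 0..113,
# i.e. a 2-row halo beyond the naive split point at row 111.
print(input_range(0, 111, vgg_block, 224))  # -> (0, 113)
```

This halo is exactly the overlapping zone that HALP executes on the host ES rather than on the secondary ESs, so that segment outputs match those of unpartitioned inference.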