Paper Title
Variant Parallelism: Lightweight Deep Convolutional Models for Distributed Inference on IoT Devices
Paper Authors
Paper Abstract
Two major techniques are commonly used to meet real-time inference constraints when distributing models across resource-constrained IoT devices: (1) model parallelism (MP) and (2) class parallelism (CP). In MP, transmitting bulky intermediate data (orders of magnitude larger than the input) between devices imposes huge communication overhead. Although CP solves this problem, it limits the number of sub-models. In addition, neither solution is fault tolerant, a concern when deployed on edge devices. We propose variant parallelism (VP), an ensemble-based deep learning distribution method in which different variants of a main model are generated and can each be deployed on a separate machine. We design a family of lighter models around the original model and train them simultaneously to improve accuracy over single models. Our experimental results on six common mid-sized object recognition datasets demonstrate that our models can have 5.8-7.1x fewer parameters, 4.3-31x fewer multiply-accumulations (MACs), and 2.5-13.2x lower response time on atomic inputs compared with MobileNetV2, while achieving comparable or higher accuracy. Our technique easily generates several variants of the base architecture. Each variant returns only 2k outputs, where 1 <= k <= (#classes / 2), representing the Top-k classes, instead of the large volume of floating-point values exchanged in MP. Since each variant provides a full-class prediction, our approach maintains higher availability than MP and CP in the presence of failures.
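
To make the aggregation step concrete, the Python sketch below is our own illustration (not the authors' reference code) of one plausible way to combine the 2k outputs, i.e. Top-k (class, score) pairs, emitted by each variant into a single full-class ensemble prediction: scores are summed per class and the highest total wins. The function name aggregate_topk and the exact scoring rule are assumptions for illustration; the paper's aggregation scheme may differ.

from collections import defaultdict
from typing import List, Tuple

def aggregate_topk(variant_outputs: List[List[Tuple[int, float]]]) -> int:
    """Combine Top-k (class_id, score) pairs from each variant by
    summing scores per class and returning the best class overall."""
    totals = defaultdict(float)
    for pairs in variant_outputs:          # one list of (class, score) per variant
        for class_id, score in pairs:
            totals[class_id] += score
    # A failed variant simply contributes no pairs, so the ensemble
    # still returns a prediction -- the availability property above.
    return max(totals, key=totals.get)

# Example: three variants, each reporting Top-2 classes with their scores.
outputs = [
    [(3, 0.71), (5, 0.12)],   # variant 1
    [(3, 0.64), (1, 0.20)],   # variant 2
    [(5, 0.45), (3, 0.40)],   # variant 3 (mild disagreement)
]
print(aggregate_topk(outputs))  # -> 3

Because each variant transmits only 2k values rather than bulky intermediate feature maps, this kind of aggregation also illustrates why VP's communication cost stays far below MP's.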