Title
Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge
Authors
Abstract
The success of deep neural networks (DNNs) is heavily dependent on computational resources. While DNNs are often employed on cloud servers, there is a growing need to operate DNNs on edge devices. Edge devices are typically limited in their computational resources, yet multiple edge devices are often deployed in the same environment and can reliably communicate with each other. In this work we propose to facilitate the application of DNNs on the edge by allowing multiple users to collaborate during inference to improve their accuracy. Our mechanism, coined {\em edge ensembles}, is based on having diverse predictors at each device, which form an ensemble of models during inference. To mitigate the communication overhead, the users share quantized features, and we propose a method for aggregating multiple decisions into a single inference rule. We analyze the latency induced by edge ensembles, showing that the performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network. Our experiments demonstrate that collaborative inference via edge ensembles equipped with compact DNNs substantially improves the accuracy over having each user infer locally, and can outperform using a single centralized DNN larger than all the networks in the ensemble together.
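The abstract describes devices that each hold a diverse predictor, exchange quantized features, and fuse multiple decisions into a single inference rule. The following is a minimal illustrative sketch of that flow, not the paper's actual method: it assumes each device outputs a class-probability vector, uses simple uniform quantization as a stand-in for the paper's quantized feature sharing, and averaging as a stand-in for the proposed aggregation rule. All function names (`quantize`, `ensemble_predict`) and the example numbers are hypothetical.

```python
import numpy as np

def quantize(probs, num_bits=4):
    """Uniformly quantize values in [0, 1] to 2**num_bits - 1 levels.
    A simple stand-in for the quantized features shared between devices."""
    levels = 2 ** num_bits - 1
    return np.round(np.clip(probs, 0.0, 1.0) * levels) / levels

def ensemble_predict(local_probs, num_bits=4):
    """Aggregate quantized per-device class-probability vectors by
    averaging them, then pick the argmax class. This averaging rule is an
    illustrative assumption, not the aggregation rule from the paper."""
    quantized = [quantize(p, num_bits) for p in local_probs]
    avg = np.mean(quantized, axis=0)
    return int(np.argmax(avg))

# Three hypothetical edge devices, each with its own (diverse) prediction
# over 3 classes; two of the three put most probability mass on class 1.
device_outputs = [
    np.array([0.2, 0.5, 0.3]),
    np.array([0.1, 0.7, 0.2]),
    np.array([0.6, 0.3, 0.1]),
]
print(ensemble_predict(device_outputs))  # -> 1
```

Even with coarse 4-bit quantization of the shared vectors, the averaged ensemble decision here matches the majority tendency of the individual devices, which is the intuition behind trading a small communication cost for improved accuracy.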