Paper Title

Weightless Neural Networks for Efficient Edge Inference

Paper Authors

Zachary Susskind, Aman Arora, Igor Dantas Dos Santos Miranda, Luis Armando Quintanilla Villon, Rafael Fontella Katopodis, Leandro Santiago de Araujo, Diego Leonel Cadette Dutra, Priscila Machado Vieira Lima, Felipe Maia Galvao Franca, Mauricio Breternitz Jr., Lizy K. John

Paper Abstract

Weightless Neural Networks (WNNs) are a class of machine learning models that use table lookups to perform inference. This is in contrast with Deep Neural Networks (DNNs), which use multiply-accumulate operations. State-of-the-art WNN architectures have a fraction of the implementation cost of DNNs, but still lag behind them on accuracy for common image recognition tasks. Additionally, many existing WNN architectures suffer from high memory requirements. In this paper, we propose a novel WNN architecture, BTHOWeN, with key algorithmic and architectural improvements over prior work, namely counting Bloom filters, hardware-friendly hashing, and Gaussian-based nonlinear thermometer encodings to improve model accuracy and reduce area and energy consumption. BTHOWeN targets the large and growing edge computing sector by providing superior latency and energy efficiency to comparable quantized DNNs. Compared to state-of-the-art WNNs across nine classification datasets, BTHOWeN on average reduces error by more than 40% and model size by more than 50%. We then demonstrate the viability of the BTHOWeN architecture by presenting an FPGA-based accelerator, and compare its latency and resource usage against similarly accurate quantized DNN accelerators, including Multi-Layer Perceptron (MLP) and convolutional models. The proposed BTHOWeN models consume almost 80% less energy than the MLP models, with nearly 85% reduction in latency. In our quest for efficient ML on the edge, WNNs are clearly deserving of additional attention.
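The abstract names three mechanisms: counting Bloom filters, hardware-friendly hashing, and Gaussian-based nonlinear thermometer encoding. The sketch below is a minimal, illustrative Python rendering of how such pieces can compose into table-lookup (weightless) inference. It is not the paper's tuned BTHOWeN configuration: all hyperparameters (thermometer bits, tuple size, filter depth, hash count, bleaching threshold) and the `Discriminator`/`CountingBloomFilter` helper names are assumptions, and an H3-style XOR hash is used as one common hardware-friendly choice.

```python
"""Minimal sketch of WNN-style inference with the three mechanisms named in
the abstract. Hyperparameters and class structure are illustrative
assumptions, not the paper's tuned BTHOWeN configuration."""
import numpy as np
from statistics import NormalDist

NUM_BITS = 8        # thermometer bits per input feature (assumed)
TUPLE_SIZE = 16     # encoded bits addressed by each Bloom filter (assumed)
FILTER_LOG2 = 10    # each filter holds 2**10 counters (assumed)
NUM_HASHES = 3      # H3-style hash functions per filter (assumed)
BLEACH = 1          # bleaching threshold; tuned on validation data in practice

rng = np.random.default_rng(0)

def gaussian_thermometer(x, mean, std, bits=NUM_BITS):
    """Nonlinear thermometer code: thresholds sit at Gaussian quantiles of
    the feature's distribution, concentrating resolution where data is dense."""
    dist = NormalDist(mean, std)
    thresholds = [dist.inv_cdf((i + 1) / (bits + 1)) for i in range(bits)]
    return np.array([x > t for t in thresholds], dtype=np.uint8)

class H3Hash:
    """Hardware-friendly hashing: XOR together random rows selected by the
    set input bits (H3 family) -- no multipliers needed in hardware."""
    def __init__(self, n_bits, log2_entries):
        self.rows = rng.integers(0, 2**log2_entries, size=n_bits,
                                 dtype=np.uint64)

    def __call__(self, bits):
        h = np.uint64(0)
        for b, row in zip(bits, self.rows):
            if b:
                h ^= row
        return int(h)

class CountingBloomFilter:
    """Counting Bloom filter as a RAM node: counters instead of single bits,
    so rarely seen patterns can be rejected via a bleaching threshold."""
    def __init__(self, n_bits):
        self.counters = np.zeros(2**FILTER_LOG2, dtype=np.uint32)
        self.hashes = [H3Hash(n_bits, FILTER_LOG2) for _ in range(NUM_HASHES)]

    def train(self, bits):
        for h in self.hashes:
            self.counters[h(bits)] += 1

    def respond(self, bits, bleach=BLEACH):
        # Min over the k counters upper-bounds how often this tuple was seen.
        return min(self.counters[h(bits)] for h in self.hashes) >= bleach

class Discriminator:
    """One per class: a fixed pseudo-random mapping splits the encoded input
    into tuples, each scored by its own filter (table lookups, no MACs)."""
    def __init__(self, total_bits):
        assert total_bits % TUPLE_SIZE == 0
        self.order = rng.permutation(total_bits)
        self.filters = [CountingBloomFilter(TUPLE_SIZE)
                        for _ in range(total_bits // TUPLE_SIZE)]

    def _tuples(self, enc):
        return enc[self.order].reshape(-1, TUPLE_SIZE)

    def train(self, enc):
        for f, t in zip(self.filters, self._tuples(enc)):
            f.train(t)

    def score(self, enc):
        return sum(f.respond(t) for f, t in zip(self.filters, self._tuples(enc)))

if __name__ == "__main__":
    # Toy usage: two classes, two features (16 encoded bits => one tuple).
    stats = [(0.0, 1.0), (5.0, 2.0)]  # per-feature (mean, std), assumed known
    def encode(x):
        return np.concatenate([gaussian_thermometer(v, m, s)
                               for v, (m, s) in zip(x, stats)])
    discs = [Discriminator(2 * NUM_BITS) for _ in range(2)]
    discs[0].train(encode([-0.5, 4.0]))
    discs[1].train(encode([1.5, 8.0]))
    scores = [d.score(encode([-0.5, 4.0])) for d in discs]
    print("scores:", scores, "-> predicted class", int(np.argmax(scores)))
```

At inference, each per-class discriminator sums its filters' responses and the highest score wins; raising the bleaching threshold suppresses responses from patterns seen only rarely during training, which is what the counters (rather than plain Bloom filter bits) make possible.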
