Paper Title
Exposing Hardware Building Blocks to Machine Learning Frameworks
Paper Authors
Paper Abstract
A plethora of applications demand high-throughput, low-latency algorithms that leverage machine learning methods. This need for real-time processing can be seen in settings ranging from neural-network-based pre-distorters for enhanced mobile broadband to FPGA-based triggers designed for CERN's major particle physics efforts. In this thesis, we explore how such niche domains can benefit greatly if we view each neuron as a distinct Boolean function of the form $f: B^{I} \rightarrow B^{O}$, where $B = \{0,1\}$. We focus on how to design topologies that complement this view of neurons, how to automate this neural network design strategy, and how to perform inference of such networks on Xilinx FPGAs. Major hardware-borne constraints arise when designing topologies that treat neurons as unique Boolean functions. Fundamentally, realizing such topologies in hardware imposes a strict limit on the 'fan-in' bits of a neuron, because the number of possible input combinations doubles with every added input bit. We address this limit by exploring different methods of implementing sparsity and by exploring activation quantization. Further, we develop a library that supports training neural networks with custom sparsity and quantization. This library also supports conversion of trained sparse quantized networks from PyTorch to Verilog code, which is then synthesized using Vivado; all of this is part of the LogicNet tool-flow. To aid faster prototyping, the library also supports calculating the worst-case hardware cost of any given topology. We hope that our insights into the behavior of extremely sparse quantized neural networks are of use to the research community and, by extension, allow people to use the LogicNet design flow to deploy highly efficient neural networks.
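The fan-in constraint mentioned in the abstract comes from tabulating each neuron as a lookup table: a neuron with F quantized inputs of b bits each needs a truth table of $2^{Fb}$ rows, so every additional input bit doubles the table. The sketch below illustrates this accounting with a toy quantized neuron and a crude worst-case row-count proxy for a whole topology; it is a minimal illustration only. The `enumerate_neuron_lut` and `worst_case_lut_rows` names, the `neuron` callable, and the cost proxy are hypothetical and are not the LogicNet library's API or its actual hardware cost model.

```python
from itertools import product

def lut_rows(fan_in: int, input_bits: int) -> int:
    # Each of the `fan_in` inputs carries `input_bits` bits, so the
    # truth table has one row per possible input bit pattern.
    return 2 ** (fan_in * input_bits)

def enumerate_neuron_lut(neuron, fan_in: int, input_bits: int):
    # Exhaustively evaluate a quantized neuron on every possible input
    # pattern; each (inputs, output) pair is one row of the lookup table
    # (conceptually, one entry of a Verilog case statement).
    levels = range(2 ** input_bits)  # all quantized activation codes
    for codes in product(levels, repeat=fan_in):
        yield codes, neuron(codes)

def worst_case_lut_rows(layers):
    # layers: iterable of (num_neurons, fan_in, input_bits) per layer.
    # Worst-case proxy: every neuron is tabulated independently.
    return sum(n * lut_rows(f, b) for n, f, b in layers)

if __name__ == "__main__":
    # Toy 2-bit neuron: averaged sum of its quantized inputs, clipped to 2 bits.
    neuron = lambda codes: min(sum(codes) // len(codes), 3)
    table = list(enumerate_neuron_lut(neuron, fan_in=3, input_bits=2))
    print(len(table))                                     # 2**(3*2) = 64 rows
    print(worst_case_lut_rows([(32, 3, 2), (10, 3, 2)]))  # (32+10)*64 = 2688
```

Under this view, exhaustive enumeration of each neuron's truth table is what allows a trained sparse quantized network to be emitted as hardware that hard-codes every neuron as a small lookup structure, which is presumably why the fan-in and activation bit-width must be kept small.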