Paper title
RoadNet-RT: High Throughput CNN Architecture and SoC Design for Real-Time Road Segmentation
Paper authors
Paper abstract
In recent years, convolutional neural networks (CNNs) have gained popularity in many engineering applications, especially computer vision. To achieve better performance, more complex structures and advanced operations are often incorporated into the networks, which results in very long inference times. For time-critical tasks such as autonomous driving and virtual reality, real-time processing is fundamental. To reach real-time processing speed, this paper proposes RoadNet-RT, a lightweight, high-throughput CNN architecture for road segmentation. It achieves a MaxF score of 90.33% on the test set of the KITTI road segmentation task at 8 ms per frame when running on a GTX 1080 GPU. Compared with the state-of-the-art network, RoadNet-RT speeds up inference by a factor of 20 at the cost of only 6.2% accuracy loss. For hardware design optimization, several techniques, such as depthwise separable convolution and non-uniform kernel size convolution, are custom designed to further reduce the processing time. The proposed CNN architecture has been successfully implemented on a ZCU102 MPSoC FPGA platform, achieving a computation capability of 83.05 GOPS. The system throughput reaches 327.9 frames per second with an image size of 1216x176.
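The figures in the abstract can be related to each other with some simple arithmetic, and the benefit of depthwise separable convolution can be illustrated by counting multiply-accumulate operations (MACs). The following is a minimal sketch, not the authors' code. All function names are hypothetical, and the 64-channel 3x3 layer is an assumed example rather than a layer taken from RoadNet-RT. Dividing GOPS by frames per second to estimate the work per frame also assumes that the reported GOPS figure is sustained on this workload, which the abstract does not state.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


def latency_ms_from_fps(fps: float) -> float:
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


def giga_ops_per_frame(gops: float, fps: float) -> float:
    """Rough work per frame (in giga-operations), assuming gops is sustained."""
    return gops / fps


def standard_conv_macs(h: int, w: int, c_in: int, c_out: int,
                       k_h: int, k_w: int) -> int:
    """MACs of a standard convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k_h * k_w


def separable_conv_macs(h: int, w: int, c_in: int, c_out: int,
                        k_h: int, k_w: int) -> int:
    """MACs of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * c_in * k_h * k_w   # one k_h x k_w filter per input channel
    pointwise = h * w * c_in * c_out       # 1x1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Figures quoted in the abstract.
    print("GPU frame rate at 8 ms/frame:", fps_from_latency_ms(8.0))
    print("FPGA latency at 327.9 fps (ms):", latency_ms_from_fps(327.9))
    print("Estimated GOP per frame on FPGA:", giga_ops_per_frame(83.05, 327.9))

    # Hypothetical layer: 176 x 1216 feature map, 64 -> 64 channels, 3x3 kernel.
    std = standard_conv_macs(176, 1216, 64, 64, 3, 3)
    sep = separable_conv_macs(176, 1216, 64, 64, 3, 3)
    print("Separable / standard MAC ratio:", sep / std)
```

Under these assumptions, 8 ms per frame corresponds to 125 frames per second on the GPU, and 327.9 frames per second corresponds to roughly 3.05 ms per frame on the FPGA. For the assumed layer, the MAC ratio equals 1/c_out + 1/(k_h * k_w), so the separable form needs about one eighth of the multiply-accumulates of the standard convolution. The kernel height and width are separate parameters so that non-square kernels can be tried, although the abstract does not spell out what its non-uniform kernel sizes are.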