Paper Title
A Real Time Super Resolution Accelerator with Tilted Layer Fusion
Authors
Abstract
Deep-learning-based super-resolution achieves high-quality results, but its heavy computational workload, large buffers, and high external memory bandwidth hinder its use on mobile devices. To address these issues, this paper proposes a real-time hardware accelerator that uses a tilted layer fusion method, which reduces external DRAM bandwidth by 92% and requires only 102 KB of on-chip memory. Implemented in a 40 nm CMOS process, the design achieves 1920x1080@60fps throughput with a gate count of 544.3K when running at 600 MHz; it offers higher throughput and lower area cost than previous designs.
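To give a sense of scale for the throughput figure, the short sketch below works out the pixel rate implied by 1920x1080@60fps and the average number of clock cycles available per pixel at 600 MHz. It assumes that 1920x1080 refers to the output frame size, which the abstract does not state explicitly; the function names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope timing budget from the figures quoted in the abstract.
# Assumption: 1920x1080 is the output (super-resolved) frame size.

def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be produced per second."""
    return width * height * fps


def cycles_per_pixel(clock_hz: float, width: int, height: int, fps: int) -> float:
    """Average clock cycles available per pixel at the given clock."""
    return clock_hz / pixel_rate(width, height, fps)


rate = pixel_rate(1920, 1080, 60)
budget = cycles_per_pixel(600e6, 1920, 1080, 60)

print(f"pixel rate: {rate / 1e6:.3f} Mpixel/s")
print(f"cycle budget: {budget:.2f} cycles per pixel")
```

Under that assumption the accelerator has fewer than five cycles on average to produce each pixel, which suggests why the computation must be heavily parallelized and why repeated trips to external DRAM would be costly.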