Parallel convolution processing using an integrated photonic tensor core

Paper title

Parallel convolution processing using an integrated photonic tensor core

Authors

Feldmann, Johannes; Youngblood, Nathan; Karpov, Maxim; Gehring, Helge; Li, Xuan; Stappers, Maik; Le Gallo, Manuel; Fu, Xin; Lukashchuk, Anton; Raja, Arslan; Liu, Junqiu; Wright, David; Sebastian, Abu; Kippenberg, Tobias; Pernice, Wolfram; Bhaskaran, Harish

Abstract

With the proliferation of ultra-high-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence, the world is generating exponentially increasing amounts of data that need to be processed in a fast, efficient and smart way. These developments are pushing the limits of existing computing paradigms, and highly parallelized, fast and scalable hardware concepts are becoming progressively more important. Here, we demonstrate a computation-specific integrated photonic tensor core, the optical analog of an ASIC, capable of operating at tera-multiply-accumulate-per-second (TMAC/s) speeds. The photonic core achieves parallelized photonic in-memory computing using phase-change memory arrays and photonic chip-based optical frequency combs (soliton microcombs). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 GHz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates, ultra-low-loss silicon nitride waveguides, and high-speed on-chip detectors and modulators, our approach provides a path towards full CMOS wafer-scale integration of the photonic tensor core. While we focus on convolution processing, more generally our results indicate the major potential of integrated photonics for parallel, fast, and efficient computational hardware in demanding AI applications such as autonomous driving, live video processing, and next-generation cloud computing services.
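The abstract measures throughput in multiply-accumulate (MAC) operations. As a rough illustration of why convolution suits a tensor core, the sketch below recasts a small 2D convolution as a series of matrix-vector products and counts the MACs involved. It is a minimal software sketch, not the authors' implementation. The function names, the toy image and the kernels are all hypothetical. Reading the weight matrix as the values stored in the phase-change memory array, and each patch vector as the optical input, is an assumed analogy rather than a description taken from the paper.

```python
def extract_patches(image, k):
    """Flatten every k x k patch of a 2D image (stride 1, no padding) into a vector."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patches.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return patches


def matvec(weights, vector):
    """Matrix-vector product; each output element is a chain of multiply-accumulates."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def convolve_as_matvec(image, kernels):
    """Apply several k x k kernels at once by flattening them into one weight matrix.

    Kernels are applied without flipping (cross-correlation), as is usual in
    convolutional neural networks.
    """
    k = len(kernels[0])
    # One row per kernel: assumed analog of the fixed weights held in memory.
    weights = [[kernel[i][j] for i in range(k) for j in range(k)] for kernel in kernels]
    patches = extract_patches(image, k)
    # One matrix-vector product per patch: assumed analog of one input vector.
    outputs = [matvec(weights, patch) for patch in patches]
    mac_count = len(patches) * len(weights) * k * k
    return outputs, mac_count


# Hypothetical toy data: a 3 x 3 image and two 2 x 2 kernels.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]

outputs, mac_count = convolve_as_matvec(image, kernels)
print(outputs)    # [[6, 6], [8, 8], [12, 12], [14, 14]]
print(mac_count)  # 32
```

Each inner list in `outputs` holds the results of both kernels for one patch, so every matrix-vector product handles all kernels in parallel. That is the property a parallel tensor core exploits.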
