Marvel: A Data-centric Compiler for DNN Operators on Spatial Accelerators

Paper title

Marvel: A Data-centric Compiler for DNN Operators on Spatial Accelerators

Authors

Chatarasi, Prasanth; Kwon, Hyoukjun; Raina, Natesh; Malik, Saurabh; Haridas, Vaisakh; Parashar, Angshuman; Pellauer, Michael; Krishna, Tushar; Sarkar, Vivek

Abstract

The efficiency of a spatial DNN accelerator depends heavily on the ability of the compiler and its cost model to generate optimized mappings of the various operators of DNN models onto the accelerator's compute and memory resources. However, existing cost models lack a formal boundary over the operators for precise and tractable analysis, which poses adaptability challenges for new DNN operators. To address this challenge, we leverage the recently introduced Maestro Data-Centric (MDC) notation. We develop a formal understanding of DNN operators whose mappings can be described in the MDC notation, because any mapping adhering to the notation is always analyzable by the MDC cost model. Furthermore, we introduce a transformation for translating mappings into the MDC notation for exploring the mapping space. Searching for optimal mappings is challenging because the space of mappings is large, and this challenge is exacerbated by new operators and diverse accelerator configurations. To address this challenge, we propose a decoupled off-chip/on-chip approach that decomposes the mapping space into off-chip and on-chip subspaces, and optimizes the off-chip subspace first, followed by the on-chip subspace. The motivation for this decomposition is to reduce the size of the search space dramatically and to prioritize the optimization of off-chip data movement, which is 2-3 orders of magnitude more expensive than on-chip data movement. We implemented our approach in a tool called Marvel. Another major benefit of our approach is that it is applicable to any DNN operator conformable with the MDC notation.
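The decoupled off-chip/on-chip idea described in the abstract can be illustrated with a small sketch. The code below is not Marvel's implementation and does not use the MDC notation or its cost model. It assumes a toy matrix multiplication of size M x N x K, a hypothetical on-chip buffer capacity `buffer_words`, a hypothetical PE count `num_pes`, and two simplified cost functions (`off_chip_traffic` and `on_chip_steps`) invented purely for illustration. The point it shows is that the two stages search |off-chip| + |on-chip| candidates rather than their product.

```python
from itertools import product
from math import ceil


def divisors(n):
    """All positive divisors of n, used as candidate tile sizes."""
    return [d for d in range(1, n + 1) if n % d == 0]


def off_chip_traffic(M, N, K, tm, tn, tk):
    """Toy off-chip traffic model (an assumption, not the MDC cost model).

    The output tile is assumed to stay on-chip across the K loop, so A is
    re-read once per column tile, B once per row tile, and C is written once.
    tk does not affect this toy model.
    """
    return M * K * (N // tn) + K * N * (M // tm) + M * N


def on_chip_steps(tile, parallel_dim, num_pes):
    """Toy on-chip cost: compute steps when one tile dimension is spread over PEs."""
    steps = 1
    for dim, size in tile.items():
        steps *= ceil(size / num_pes) if dim == parallel_dim else size
    return steps


def decoupled_search(M, N, K, buffer_words, num_pes):
    # Stage 1 (off-chip subspace): pick tile sizes that fit in the buffer
    # and minimize the toy off-chip traffic.
    candidates = [
        (tm, tn, tk)
        for tm, tn, tk in product(divisors(M), divisors(N), divisors(K))
        if tm * tk + tk * tn + tm * tn <= buffer_words
    ]
    best_tile = min(candidates, key=lambda t: off_chip_traffic(M, N, K, *t))

    # Stage 2 (on-chip subspace): with the tile fixed, pick the dimension
    # to parallelize across PEs that minimizes the toy on-chip cost.
    tile = dict(zip("mnk", best_tile))
    best_dim = min("mnk", key=lambda d: on_chip_steps(tile, d, num_pes))
    return tile, best_dim


if __name__ == "__main__":
    tile, dim = decoupled_search(M=8, N=8, K=8, buffer_words=48, num_pes=4)
    print("tile:", tile, "parallel dimension:", dim)
```

In this sketch the off-chip stage is solved first because its cost dominates, and the on-chip stage only explores choices compatible with the tile already selected. A buffer too small to hold even a 1 x 1 x 1 tile leaves no candidates, in which case `min` raises `ValueError`.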
