Chip Placement with Deep Reinforcement Learning

Paper Title

Chip Placement with Deep Reinforcement Learning

Paper Authors

Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Anand Babu, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean

Paper Abstract

In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.
