Paper Title

Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing

Paper Authors

Aditya Desai, Keren Zhou, Anshumali Shrivastava

Paper Abstract

Advancements in deep learning are often associated with increasing model sizes. Model size dramatically affects the deployment cost and latency of deep models. For instance, models like BERT cannot be deployed on edge devices and mobile phones due to their sheer size. As a result, most advances in deep learning have yet to reach the edge. Model compression has received much-deserved attention in the literature across the natural language processing, vision, and recommendation domains. This paper proposes a model-agnostic, cache-friendly model compression approach: Random Operation Access Specific Tile (ROAST) hashing. ROAST collapses the parameters by clubbing them through a lightweight mapping. Notably, while clubbing these parameters, ROAST exploits cache hierarchies by aligning the memory access pattern with the parameter access pattern. ROAST is up to $\sim 25\times$ faster to train and $\sim 50\times$ faster to infer than the popular parameter-sharing method HashedNet. Additionally, ROAST introduces global weight sharing, which is empirically and theoretically superior to the local weight sharing in HashedNet and may be of independent interest in itself. With ROAST, we present the first compressed BERT that is $100\times$-$1000\times$ smaller yet suffers no quality degradation. These compression levels on a universal architecture such as the Transformer are promising for the future deployment of SOTA models on resource-constrained devices such as mobile and edge devices.
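The central idea described in the abstract, recovering a layer's weights on the fly from a much smaller shared parameter array via a lightweight hash, with lookups performed per contiguous tile so that memory accesses stay cache-friendly, can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; the tile size, hash function, shared-store size, and layer shapes are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the authors' kernel) of tile-based hashed parameter
# sharing in the spirit of ROAST: every layer draws its weights from one
# shared 1-D parameter store ("global weight sharing"), and each lookup
# copies a contiguous tile rather than a single scalar, so the memory
# access pattern follows the parameter access pattern.

TILE = 64            # assumed tile length (contiguous block of weights)
SHARED_SIZE = 10_000 # assumed size of the shared ("compressed") store
rng = np.random.default_rng(0)
shared_params = rng.standard_normal(SHARED_SIZE).astype(np.float32)

def tile_hash(layer_id: int, tile_id: int) -> int:
    """Lightweight hash mapping a (layer, tile) pair to an offset in the
    shared store; any cheap universal hash would do here."""
    h = (layer_id * 0x9E3779B1 + tile_id * 0x85EBCA77) & 0xFFFFFFFF
    return h % (SHARED_SIZE - TILE)  # leave room for a full tile

def recover_weight(layer_id: int, shape: tuple) -> np.ndarray:
    """Materialize a layer's weight tensor from the shared store,
    one contiguous tile at a time."""
    n = int(np.prod(shape))
    n_tiles = -(-n // TILE)  # ceiling division
    flat = np.empty(n_tiles * TILE, dtype=np.float32)
    for t in range(n_tiles):
        off = tile_hash(layer_id, t)
        flat[t * TILE:(t + 1) * TILE] = shared_params[off:off + TILE]
    return flat[:n].reshape(shape)

# Example: two "layers" of very different sizes share the same 10k
# parameters, so the true memory footprint is SHARED_SIZE floats.
w1 = recover_weight(layer_id=0, shape=(768, 768))
w2 = recover_weight(layer_id=1, shape=(3072, 768))
print(w1.shape, w2.shape, shared_params.nbytes // 1024, "KiB shared")
```

Reading a whole tile per hash lookup keeps each access sequential within the shared store; hashing every scalar independently (as in HashedNet) would scatter reads across memory, which is the cache behavior the abstract contrasts ROAST against.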
