Paper Title
From Discrete to Continuous Convolution Layers
Paper Authors
Paper Abstract
A basic operation in Convolutional Neural Networks (CNNs) is spatial resizing of feature maps. This is done either by strided convolution (downscaling) or transposed convolution (upscaling). Such operations are limited to a fixed filter moving at predetermined integer steps (strides). Spatial sizes of consecutive layers are related by integer scale factors, predetermined at architectural design time, and remain fixed throughout training and inference. We propose a generalization of the common Conv-layer, from a discrete layer to a Continuous Convolution (CC) layer. CC layers naturally extend Conv-layers by representing the filter as a learned continuous function over sub-pixel coordinates. This allows learnable and principled resizing of feature maps to any size, dynamically and consistently across scales. Once trained, a CC layer can output any scale/size chosen at inference time; the scale can be non-integer and differ between the axes. CC opens up new freedoms for architectural design, such as dynamic layer shapes at inference time, or gradual architectures where the size changes by a small factor at each layer. This gives rise to many desirable CNN properties, new architectural design capabilities, and useful applications. We further show that current Conv-layers suffer from inherent misalignments, which are ameliorated by CC layers.
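To make the core mechanism concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a CC-style layer: the filter is represented as a learned continuous function over sub-pixel coordinates, here parameterized by a small MLP, and sampled at the tap positions implied by the requested scale. The class name ContinuousConv2d, the MLP parameterization, the tap-count heuristic, and the resample-then-convolve realization are all illustrative assumptions made for brevity.

```python
# Minimal sketch of a Continuous Convolution (CC) style layer.
# Assumptions (not from the paper): the continuous filter is a small MLP
# mapping 2-D sub-pixel offsets to filter weights, and resizing is realized
# by bilinear resampling of the grid followed by a convolution whose taps
# are sampled from that continuous filter at scale-dependent positions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContinuousConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, support: int = 3, hidden: int = 64):
        super().__init__()
        self.in_ch, self.out_ch, self.support = in_ch, out_ch, support
        # Learned continuous filter: (dy, dx) -> in_ch * out_ch weight values.
        self.filter_net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, in_ch * out_ch),
        )

    def sample_filter(self, scale: float) -> torch.Tensor:
        # Number of discrete taps grows with the scale (kept odd for symmetry).
        taps = max(int(self.support * scale), self.support) | 1
        # Tap positions expressed in input-pixel (sub-pixel) coordinates:
        # neighboring output pixels are 1/scale input pixels apart.
        offs = (torch.arange(taps, dtype=torch.float32) - taps // 2) / scale
        dy, dx = torch.meshgrid(offs, offs, indexing="ij")
        coords = torch.stack([dy, dx], dim=-1).reshape(-1, 2)
        w = self.filter_net(coords)                        # (taps*taps, in*out)
        return (w.reshape(taps, taps, self.in_ch, self.out_ch)
                 .permute(3, 2, 0, 1).contiguous())        # (out, in, kH, kW)

    def forward(self, x: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
        # Resample the grid to the requested (possibly non-integer) size,
        # then convolve with the filter sampled at matching sub-pixel taps.
        h, w = x.shape[-2:]
        out_size = (int(round(h * scale)), int(round(w * scale)))
        x = F.interpolate(x, size=out_size, mode="bilinear", align_corners=False)
        weight = self.sample_filter(scale)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)


# The same trained layer can emit different, non-integer scales at inference:
layer = ContinuousConv2d(in_ch=16, out_ch=32)
x = torch.randn(1, 16, 40, 40)
y_up = layer(x, scale=1.5)     # -> (1, 32, 60, 60)
y_down = layer(x, scale=0.75)  # -> (1, 32, 30, 30)
```

Because the scale only determines where the continuous filter is sampled, the learned parameters are shared across all scales, which is what lets one trained layer serve any output size, including per-axis and non-integer factors.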