Paper Title
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing
Paper Authors
Paper Abstract
As convolution has empowered many smart applications, dynamic convolution further equips it with the ability to adapt to diverse inputs. However, static and dynamic convolutions are either layout-agnostic or computation-heavy, making them inappropriate for layout-specific applications, e.g., face recognition and medical image segmentation. We observe that these applications naturally exhibit the characteristics of large intra-image (spatial) variance and small cross-image variance. This observation motivates our efficient translation variant convolution (TVConv) for layout-aware visual processing. Technically, TVConv is composed of affinity maps and a weight-generating block. While affinity maps depict pixel-paired relationships gracefully, the weight-generating block can be explicitly overparameterized for better training while maintaining efficient inference. Although conceptually simple, TVConv significantly improves the efficiency of the convolution and can be readily plugged into various network architectures. Extensive experiments on face recognition show that TVConv reduces the computational cost by up to 3.1x and improves the corresponding throughput by 2.3x while maintaining a high accuracy compared to the depthwise convolution. Moreover, for the same computation cost, we boost the mean accuracy by up to 4.21%. We also conduct experiments on the optic disc/cup segmentation task and obtain better generalization performance, which helps mitigate the critical data scarcity issue. Code is available at https://github.com/JierunChen/TVConv.
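The core idea of a translation *variant* convolution is that, unlike a standard convolution whose single kernel is shared across all spatial positions, each spatial location gets its own kernel, generated from a learned, input-independent affinity map. The sketch below illustrates only this spatial-variance mechanism in plain NumPy; the array shapes, the loop-based evaluation, and the per-pixel weight tensor are illustrative assumptions, not the paper's actual (vectorized, overparameterized) implementation.

```python
import numpy as np

def translation_variant_depthwise_conv(x, weights):
    """Depthwise convolution whose kernel differs at every spatial
    location (translation variant), in contrast to a standard
    convolution that reuses one kernel everywhere.

    x:       (C, H, W) input feature map
    weights: (H, W, C, k, k) per-pixel depthwise kernels, e.g. the
             output of a weight-generating block applied to affinity
             maps (that block is omitted here for brevity)
    """
    C, H, W = x.shape
    k = weights.shape[-1]
    pad = k // 2
    # Zero-pad spatial dims so the output keeps the input resolution.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            patch = xp[:, i:i + k, j:j + k]               # (C, k, k)
            # Location-specific kernel: weights[i, j] varies with (i, j).
            out[:, i, j] = (patch * weights[i, j]).sum(axis=(1, 2))
    return out

# Minimal usage: identity kernels (center tap = 1) at every location
# reproduce the input, confirming the per-pixel weighting is wired up.
x = np.arange(12, dtype=float).reshape(1, 3, 4)
w = np.zeros((3, 4, 1, 3, 3))
w[..., 1, 1] = 1.0
y = translation_variant_depthwise_conv(x, w)
```

Because the affinity maps are learned parameters rather than functions of the input, the per-pixel weights can be computed once and cached, which is what keeps inference cheap compared to input-dependent dynamic convolutions.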