Scene Text Recognition With Finer Grid Rectification

Paper title

Scene Text Recognition With Finer Grid Rectification

Author

Wang, Gang

Abstract

Scene text recognition is a challenging problem because of irregular text styles and various distortions. This paper proposes an end-to-end trainable model, Firbarn, consisting of a finer rectification module and a bidirectional attentional recognition network. The rectification module adopts a finer grid to rectify the distorted input image, and the bidirectional decoder contains only one decoding layer instead of two separate ones. Firbarn can be trained in a weakly supervised way, requiring only scene text images and the corresponding word labels. With the flexible rectification and the novel bidirectional decoder, extensive evaluation on standard benchmarks shows that Firbarn outperforms previous work, especially on irregular datasets.
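
The abstract does not give the details of the rectification module, so the following is only a minimal sketch of the general idea of grid-based rectification, not the paper's method. It assumes a grayscale image and a small grid of per-node pixel offsets (dy, dx); in a trained model such offsets would presumably be predicted by a network, whereas here they are supplied by hand. The function names `bilinear_sample` and `rectify_with_grid` are hypothetical. A finer grid (more nodes) lets the offset field follow more local distortions.

```python
import numpy as np


def bilinear_sample(image, ys, xs):
    """Sample a 2-D array at float coordinates (ys, xs) using bilinear interpolation.

    Coordinates outside the array are clamped to the border.
    """
    h, w = image.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def rectify_with_grid(image, grid_offsets):
    """Warp `image` using offsets defined on a coarse grid (assumed layout).

    grid_offsets has shape (gh, gw, 2) and holds (dy, dx) in pixels at each
    grid node. The offsets are interpolated to every output pixel, and the
    output pixel (y, x) is read from the input at (y + dy, x + dx).
    """
    h, w = image.shape
    gh, gw, _ = grid_offsets.shape
    # Position of every output pixel expressed in grid-node coordinates.
    gy = np.linspace(0, gh - 1, h)
    gx = np.linspace(0, gw - 1, w)
    gyy, gxx = np.meshgrid(gy, gx, indexing="ij")
    # Dense per-pixel offsets obtained by interpolating the grid offsets.
    dy = bilinear_sample(grid_offsets[..., 0], gyy, gxx)
    dx = bilinear_sample(grid_offsets[..., 1], gyy, gxx)
    yy, xx = np.meshgrid(
        np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij"
    )
    return bilinear_sample(image, yy + dy, xx + dx)


if __name__ == "__main__":
    img = np.arange(12.0).reshape(3, 4)
    # A zero offset grid leaves the image unchanged.
    same = rectify_with_grid(img, np.zeros((2, 3, 2)))
    print(np.allclose(same, img))
```

Because the sampling is built from interpolation, the same warp is differentiable with respect to the offsets when written in an autodiff framework, which is consistent with the abstract's statement that the model is trained end to end from word labels alone.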
