Paper Title
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
Paper Authors
Paper Abstract
Scene text spotting aims to detect and recognize entire words or sentences with multiple characters in natural images. It remains challenging because ambiguity often occurs when the spacing between characters is large or the characters are evenly spread across multiple rows and columns, making many groupings of the characters visually plausible (e.g., "BERLIN" is incorrectly detected as "BERL" and "IN" in Fig. 1(c)). Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection. The proposed AE TextSpotter has three important benefits. 1) The linguistic representation is learned together with the visual representation in a single framework. To our knowledge, this is the first time a language model has been used to improve text detection. 2) A carefully designed language module is utilized to reduce the detection confidence of incorrect text lines, so that they can be easily pruned in the detection stage. 3) Extensive experiments show that AE TextSpotter outperforms other state-of-the-art methods by a large margin. For example, we carefully select a validation set of extremely ambiguous samples from the IC19-ReCTS dataset, on which our approach surpasses other methods by more than 4%. The code has been released at https://github.com/whai362/AE_TextSpotter. The image list and evaluation scripts of the validation set have been released at https://github.com/whai362/TDA-ReCTS.
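The core idea described in benefit 2) — lowering the confidence of linguistically implausible text lines so they are pruned — can be illustrated with a minimal sketch. This is not the paper's actual architecture (which jointly learns visual and linguistic representations); here a toy vocabulary-based scorer stands in for the learned language module, and the names `language_score` and `rescore_and_prune` are hypothetical:

```python
# Toy sketch of ambiguity elimination by linguistic re-scoring.
# Visually plausible but linguistically implausible groupings
# (e.g. "BERL" + "IN" instead of "BERLIN") end up with a low
# combined score and are pruned.

def language_score(text: str, vocabulary: set) -> float:
    """Stand-in linguistic score: fraction of words found in a vocabulary.
    The real AE TextSpotter uses a learned language module instead."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)

def rescore_and_prune(candidates, vocabulary, weight=0.5, threshold=0.5):
    """Blend visual and linguistic scores, then drop low-confidence lines.

    candidates: list of (text, visual_confidence) pairs.
    Returns surviving (text, combined_score) pairs, highest score first.
    """
    rescored = []
    for text, visual_conf in candidates:
        combined = ((1 - weight) * visual_conf
                    + weight * language_score(text, vocabulary))
        if combined >= threshold:
            rescored.append((text, combined))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

vocab = {"berlin", "station", "main"}
# Three overlapping candidate detections of the same sign.
candidates = [("BERLIN", 0.90), ("BERL", 0.88), ("IN", 0.85)]
print(rescore_and_prune(candidates, vocab))  # only "BERLIN" survives
```

With equal weighting, the spurious fragments "BERL" and "IN" score below the threshold despite high visual confidence, while "BERLIN" is kept — the qualitative effect the abstract attributes to the language module.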