Paper title
Reading Chinese in Natural Scenes with a Bag-of-Radicals Prior
Paper authors
Paper abstract
Scene text recognition (STR) on Latin datasets has been extensively studied in recent years, and state-of-the-art (SOTA) models often reach high accuracy. However, performance on non-Latin scripts, such as Chinese, is not satisfactory. In this paper, we collect six open-source Chinese STR datasets and evaluate a series of classic methods that perform well on Latin datasets, finding a significant performance drop. To improve performance on Chinese datasets, we propose a novel radical-embedding (RE) representation that utilizes the ideographic descriptions of Chinese characters. The ideographic descriptions are first converted to bags of radicals and then fused with learnable character embeddings by a character-vector-fusion-module (CVFM). In addition, we use the bag of radicals as a supervision signal for multi-task training to improve our model's perception of ideographic structure. Experiments show that the model with RE + CVFM + multi-task training outperforms the baseline on the six Chinese STR datasets.
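The conversion from an ideographic description to a bag of radicals can be illustrated with a minimal sketch. It is not the authors' implementation: the decomposition table `TOY_IDS`, the helper names, and the choice to drop the Unicode structure operators (U+2FF0 to U+2FFF) are all assumptions made for illustration, and the abstract does not say how the paper handles any of them. The fusion step (CVFM) is not shown, because its design is not described here.

```python
from collections import Counter

# Hypothetical toy table: character -> ideographic description sequence (IDS).
# A real system would load a full decomposition resource instead.
TOY_IDS = {
    "好": "⿰女子",
    "明": "⿰日月",
    "林": "⿰木木",
    "森": "⿱木⿰木木",
}


def is_structure_operator(ch: str) -> bool:
    """True for Unicode Ideographic Description Characters (layout operators)."""
    return 0x2FF0 <= ord(ch) <= 0x2FFF


def bag_of_radicals(ids: str) -> Counter:
    """Count the components in an IDS, discarding layout operators (an assumption)."""
    return Counter(ch for ch in ids if not is_structure_operator(ch))


def build_vocab(table: dict) -> dict:
    """Assign a stable index to every component that appears in the table."""
    radicals = sorted({r for ids in table.values() for r in bag_of_radicals(ids)})
    return {radical: index for index, radical in enumerate(radicals)}


def to_count_vector(ids: str, vocab: dict) -> list:
    """Encode an IDS as a fixed-length vector of component counts."""
    vec = [0] * len(vocab)
    for radical, count in bag_of_radicals(ids).items():
        vec[vocab[radical]] = count
    return vec


RADICAL_VOCAB = build_vocab(TOY_IDS)

if __name__ == "__main__":
    for char, ids in TOY_IDS.items():
        print(char, to_count_vector(ids, RADICAL_VOCAB))
```

A count vector of this kind could serve both as an input to be fused with a learnable character embedding and as an auxiliary target for multi-task training. Because the bag discards layout, characters that share components but differ in arrangement receive the same vector, which may be one reason the abstract pairs the bag with a learnable per-character embedding.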