Paper Title

Self-Training for Domain Adaptive Scene Text Detection

Authors

Yudi Chen, Wei Wang, Yu Zhou, Fei Yang, Dongbao Yang, Weiping Wang

Abstract

Though deep learning based scene text detection has achieved great progress, well-trained detectors suffer severe performance degradation when applied to different domains. In general, a tremendous amount of data is indispensable to train the detector in the target domain. However, data collection and annotation are expensive and time-consuming. To address this problem, we propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images. To reduce the noise in these hard examples, a novel text mining module is implemented based on the fusion of detection and tracking results. Then, an image-to-video generation method is designed for tasks where videos are unavailable and only images can be used. Experimental results on standard benchmarks, including ICDAR2015, MSRA-TD500, and ICDAR2017 MLT, demonstrate the effectiveness of our self-training method. A simple Mask R-CNN adapted with self-training and fine-tuned on real data can achieve results comparable or even superior to state-of-the-art methods.
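The fusion step described in the abstract can be sketched as follows: a detected text box on an unannotated frame is accepted as a pseudo-label only if it is confirmed by an overlapping tracked box. This is a minimal illustrative sketch, not the authors' implementation; the function names, box format, and IoU threshold are all assumptions.

```python
# Hypothetical sketch of pseudo-label mining by detection/tracking fusion.
# Boxes are axis-aligned rectangles (x1, y1, x2, y2); names and the
# 0.5 threshold are illustrative assumptions, not the paper's exact values.

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mine_pseudo_labels(detections, tracks, iou_thresh=0.5):
    """Keep only detected boxes confirmed by some tracked box."""
    return [det for det in detections
            if any(iou(det, trk) >= iou_thresh for trk in tracks)]

# Toy frame: the first detection overlaps a track, the second does not,
# so only the first survives as a pseudo-label.
detections = [(10, 10, 50, 30), (100, 100, 140, 120)]
tracks = [(12, 11, 52, 31)]
pseudo_labels = mine_pseudo_labels(detections, tracks)
```

In this sketch, boxes that the tracker never confirms are treated as detection noise and discarded, which matches the abstract's goal of reducing noise in the mined hard examples.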
