Paper Title

Text Extraction and Restoration of Old Handwritten Documents

Authors

Mayank Wadhwani, Debapriya Kundu, Deepayan Chakraborty, Bhabatosh Chanda

Abstract

Image restoration is a crucial computer vision task. This paper describes two novel methods for restoring old, degraded handwritten documents using deep neural networks. In addition, a small-scale dataset of 26 heritage letter images is introduced. The ground-truth data needed to train the networks is generated semi-automatically through a pragmatic combination of color transformation, Gaussian mixture model based segmentation, and shape correction using mathematical morphological operators. In the first approach, a deep neural network extracts the text from the document image, and the background is then reconstructed using Gaussian mixture modeling. Because Gaussian mixture modeling requires parameters to be set manually, we propose a second approach in which both background reconstruction and foreground extraction (which includes extracting the text in its original color) are performed by deep neural networks. Experiments demonstrate that the proposed systems perform well on handwritten document images with severe degradation, even when trained on a small dataset. Hence, the proposed methods are well suited to digital heritage preservation repositories. It is worth mentioning that these methods can easily be extended to degraded printed documents.
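The ground-truth generation step named in the abstract (Gaussian mixture model based segmentation followed by morphological shape correction) can be illustrated with a small sketch. This is not the authors' implementation: it assumes grayscale intensities in [0, 1], a two-component one-dimensional mixture fitted by EM, and a 3x3 morphological opening. All function names (`fit_gmm_1d`, `segment_text`, `opening`, `demo_page`) are hypothetical, and the color transformation and semi-automatic correction are omitted.

```python
import math

def weighted_pdf(v, weight, mean, var):
    """Mixture weight times the Gaussian density at v."""
    return weight * math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(values, iters=30, var_floor=1e-4):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                                   # start at the darkest and brightest values
    variances = [max(((hi - lo) / 4) ** 2, var_floor)] * 2
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each pixel value
        resp = []
        for v in values:
            p = [weighted_pdf(v, weights[k], means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300                   # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            spread = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(spread, var_floor)      # floor keeps the density finite
    return weights, means, variances

def segment_text(gray):
    """Return a binary mask with 1 where a pixel is more likely ink (the darker component)."""
    weights, means, variances = fit_gmm_1d([v for row in gray for v in row])
    ink = 0 if means[0] <= means[1] else 1
    paper = 1 - ink
    def is_ink(v):
        return (weighted_pdf(v, weights[ink], means[ink], variances[ink])
                > weighted_pdf(v, weights[paper], means[paper], variances[paper]))
    return [[int(is_ink(v)) for v in row] for row in gray]

def _morph(mask, combine):
    """Apply a 3x3 neighbourhood test; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    def at(r, c):
        return mask[r][c] if 0 <= r < h and 0 <= c < w else 0
    return [[int(combine(at(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]

def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the 3x3 element."""
    return _morph(_morph(mask, all), any)

def demo_page():
    """Synthetic 7x7 'page': bright paper, a dark 3x3 stroke, and one dark speck."""
    gray = [[0.85 + 0.01 * ((r + c) % 3) for c in range(7)] for r in range(7)]
    for r in range(2, 5):
        for c in range(2, 5):
            gray[r][c] = 0.15 + 0.01 * ((r * c) % 3)
    gray[0][6] = 0.2                                   # isolated noise pixel
    return gray

if __name__ == "__main__":
    for row in opening(segment_text(demo_page())):
        print("".join("#" if v else "." for v in row))
```

A 3x3 opening also erases strokes thinner than three pixels, so on real scans the structuring element would have to be chosen relative to the stroke width. A library mixture model and morphology routines would normally replace these hand-written loops.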
