Paper Title
Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks
Paper Authors
Paper Abstract
Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem, not only because of the large variation in table layouts and styles, but also because of variations in page layouts and noise contamination levels. Much research has addressed table structure identification, most of it applying heuristics, aided by optical character recognition (OCR), to hand-pick layout features of the tables. These methods fail to generalize well because of the variations in table layouts and the errors generated by OCR. In this paper, we propose a robust deep-learning-based approach to extract rows and columns with high precision from a detected table in document images. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU), followed by a fully-connected layer with softmax activation. The network scans the images from top to bottom as well as from left to right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperformed the state-of-the-art table structure extraction systems by a significant margin.
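To make the described pipeline concrete, the following is a minimal sketch of a bi-directional GRU forward pass followed by a fully-connected softmax layer, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, biases are omitted, the hidden size and image size are arbitrary, each image row (or each column, after transposing) is assumed to be one time step, the two output classes are assumed to be "not a separator" and "separator", and rows and columns are assumed to use two independently parameterised networks. All names (GRUCell, classify_lines, and so on) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would load learned values instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class GRUCell:
    """A single GRU cell (biases omitted for brevity; an assumption of this sketch)."""

    def __init__(self, n_in, n_hid, rng):
        # Input (W*) and recurrent (U*) weights for update gate z, reset gate r, candidate n.
        self.Wz, self.Uz = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)
        self.Wr, self.Ur = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)
        self.Wn, self.Un = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wn, x), matvec(self.Un, rh))]
        # Blend previous state and candidate (one of the two common GRU conventions).
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def run(cell, seq, n_hid):
    """Run a GRU cell over a sequence and return the hidden state at every step."""
    h = [0.0] * n_hid
    states = []
    for x in seq:
        h = cell.step(x, h)
        states.append(h)
    return states


def classify_lines(image, fwd, bwd, W_out, n_hid):
    """Return [p_not_separator, p_separator] for every row of a 2-D image."""
    f = run(fwd, image, n_hid)                # scan top to bottom
    b = run(bwd, image[::-1], n_hid)[::-1]    # scan bottom to top, then realign
    # Concatenate both directions, then apply the fully-connected softmax layer.
    return [softmax(matvec(W_out, hf + hb)) for hf, hb in zip(f, b)]


def transpose(image):
    return [list(col) for col in zip(*image)]


rng = random.Random(0)
height, width, n_hid = 6, 8, 4
# Stand-in for a pre-processed table image with pixel values in [0, 1].
image = [[rng.random() for _ in range(width)] for _ in range(height)]

# Row-separator network: each image row is one time step.
row_fwd, row_bwd = GRUCell(width, n_hid, rng), GRUCell(width, n_hid, rng)
row_out = rand_matrix(2, 2 * n_hid, rng)
row_probs = classify_lines(image, row_fwd, row_bwd, row_out, n_hid)

# Column-separator network: transpose so each image column is one time step.
col_fwd, col_bwd = GRUCell(height, n_hid, rng), GRUCell(height, n_hid, rng)
col_out = rand_matrix(2, 2 * n_hid, rng)
col_probs = classify_lines(transpose(image), col_fwd, col_bwd, col_out, n_hid)
```

Here row_probs holds one probability pair per image row and col_probs one pair per image column. With untrained weights the values carry no meaning; the sketch only shows how the two scan directions and the softmax layer fit together.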