Paper Title
Vision-Based Layout Detection from Scientific Literature using Recurrent Convolutional Neural Networks
Paper Authors
Paper Abstract
We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Scientific publications contain multiple types of information sought by researchers in various disciplines, organized into an abstract, a bibliography, and sections documenting related work, experimental methods, and results; however, there is no effective way to extract this information because of the diverse layouts of these publications. In this paper, we present a novel approach to developing an end-to-end learning framework that segments and classifies the major regions of a scientific document. We treat scientific document layout analysis as an object detection task over digital images, without requiring any additional text features to be added to the network during training. Our technical objective is to implement transfer learning by fine-tuning pre-trained networks, and thereby to demonstrate that this deep learning architecture is suitable for tasks that lack the very large document corpora needed for training ab initio. As part of the experimental test bed for the empirical evaluation of this approach, we created a merged multi-corpus data set for scientific publication layout detection tasks. Our results show a good improvement from fine-tuning a pre-trained base network on this merged data set, compared with a baseline convolutional neural network architecture.
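The abstract frames layout analysis as object detection over page images, with transfer learning implemented by fine-tuning a pre-trained network on document-region classes. The sketch below is a minimal illustration of that recipe only; it assumes a torchvision Faster R-CNN as the pre-trained base detector, a hypothetical `DocLayoutDataset` standing in for the merged multi-corpus data set, and an illustrative set of region labels. The paper's actual recurrent convolutional architecture and data pipeline are not reproduced here.

```python
# Illustrative sketch: fine-tuning a pre-trained detector for page-layout classes.
# The base network, the DocLayoutDataset class, and the region labels below are
# assumptions made for illustration, not the paper's specified implementation.
import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Hypothetical region classes for scientific-document layout (index 0 = background).
CLASSES = ["__background__", "title", "abstract", "body_text",
           "figure", "table", "bibliography"]

def build_model(num_classes: int):
    # Start from weights pre-trained on a large natural-image corpus and replace
    # only the box-classification head, so the detector can be fine-tuned on a
    # comparatively small document corpus instead of being trained ab initio.
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

def fine_tune(model, dataset, epochs=10, lr=5e-3, device="cuda"):
    # Standard supervised fine-tuning loop: each dataset item is expected to be
    # (image_tensor, {"boxes": FloatTensor[N, 4], "labels": Int64Tensor[N]}).
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=2, shuffle=True,
                        collate_fn=lambda batch: tuple(zip(*batch)))
    optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                                lr=lr, momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for images, targets in loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            loss_dict = model(images, targets)   # dict of detection losses in train mode
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

# Usage (DocLayoutDataset is a placeholder for the merged multi-corpus data set):
# model = fine_tune(build_model(len(CLASSES)), DocLayoutDataset("train"))
```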