Improving accuracy and speeding up Document Image Classification through parallel systems

Paper title

Improving accuracy and speeding up Document Image Classification through parallel systems

Authors

Ferrando, Javier; Dominguez, Juan Luis; Torres, Jordi; Garcia, Raul; Garcia, David; Garrido, Daniel; Cortada, Jordi; Valero, Mateo

Abstract

This paper presents a study showing the benefits of EfficientNet models over heavier Convolutional Neural Networks (CNNs) in the Document Classification task, an essential problem in the digitalization process of institutions. We show on the RVL-CDIP dataset that a much lighter model can improve on previous results, and we demonstrate its transfer learning capabilities on a smaller in-domain dataset, Tobacco3482. Moreover, we present an ensemble pipeline that improves on the results of image input alone by combining the image model's predictions with those generated by a BERT model on text extracted by OCR. We also show that the batch size can be increased effectively without hindering accuracy, so that the training process can be sped up by parallelizing across multiple GPUs, decreasing the computational time needed. Lastly, we report the differences in training performance between the PyTorch and TensorFlow Deep Learning frameworks.
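The abstract does not state how the image-model and BERT predictions are combined. As a minimal sketch, assuming simple late fusion (a weighted average of the two models' class probabilities), the idea looks like this. The function names `softmax` and `ensemble_predict` and the weight `alpha` are hypothetical illustrations, not the authors' code, and the toy 3-class logits are made up for demonstration.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(image_logits: Sequence[float],
                     text_logits: Sequence[float],
                     alpha: float = 0.5) -> tuple[int, list[float]]:
    """Hypothetical late fusion of an image classifier and a text classifier.

    Assumption: both models score the same set of document classes, and the
    fused distribution is alpha * p_image + (1 - alpha) * p_text. The paper's
    actual combination rule is not given in the abstract.
    """
    if len(image_logits) != len(text_logits):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_img = softmax(image_logits)   # e.g. from an image CNN
    p_txt = softmax(text_logits)    # e.g. from BERT on OCR-extracted text
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(p_img, p_txt)]
    # The predicted class is the argmax of the fused distribution.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Toy example: the image model slightly prefers class 0,
# while the text model clearly prefers class 1.
label, probs = ensemble_predict([2.0, 1.9, -1.0], [0.1, 2.5, 0.0], alpha=0.5)
print(label)  # 1
```

With `alpha=1.0` the sketch reduces to the image model alone (which would pick class 0 here), so the text branch is what changes the decision; in practice such a weight would presumably be tuned on a validation split.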
