Paper Title

Efficient Information Sharing in ICT Supply Chain Social Network via Table Structure Recognition

Authors

Bin Xiao, Yakup Akkaya, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

Abstract

The global Information and Communications Technology (ICT) supply chain is a complex network consisting of all types of participants. It is often formulated as a Social Network in order to discuss the supply chain network's relations, properties, and development in supply chain management. Information sharing plays a crucial role in improving the efficiency of the supply chain, and datasheets are the most common data format for describing e-component commodities in the ICT supply chain because they are human-readable. However, the surging number of electronic documents has far exceeded the capacity of human readers, and processing tabular data automatically is also challenging because of complex table structures and heterogeneous layouts. Table Structure Recognition (TSR) aims to represent tables with complex structures in a machine-interpretable format so that the tabular data can be processed automatically. In this paper, we formulate TSR as an object detection problem and propose to generate an intuitive representation of a complex table structure to enable structuring of the tabular data related to the commodities. To cope with border-less and small layouts, we propose a cost-sensitive loss function that takes the detection difficulty of each class into account. In addition, we propose a novel anchor generation method that exploits a characteristic of tables: columns in a table should share an identical height, and rows in a table should share the same width. We implement our proposed method based on Faster-RCNN and achieve 94.79% mean Average Precision (AP), consistently improving AP by more than 1.5% for different benchmark models.
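
The two ideas named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the loss formula or the anchor procedure, so everything here is an assumption for illustration only: the per-class weights, the sliding-window scheme, and the function names cost_sensitive_cross_entropy, column_anchors, and row_anchors are hypothetical and are not the authors' implementation. The loss scales each sample's cross-entropy by a weight for its true class, so harder classes can be given larger weights. The anchor helpers use the stated table property: column anchors all span the full table height, and row anchors all span the full table width.

```python
import math


def cost_sensitive_cross_entropy(probs, labels, class_weights):
    """Mean cross-entropy with each sample scaled by the weight of its true class.

    Assumption: class_weights encodes detection difficulty (larger = harder).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -class_weights[y] * math.log(p[y])
    return total / len(labels)


def column_anchors(table_box, widths, stride):
    """Slide boxes of the given widths across the table.

    Every anchor spans the full table height, so only x and width vary.
    """
    x0, y0, x1, y1 = table_box
    anchors = []
    for w in widths:
        x = x0
        while x + w <= x1:
            anchors.append((x, y0, x + w, y1))
            x += stride
    return anchors


def row_anchors(table_box, heights, stride):
    """Slide boxes of the given heights down the table.

    Every anchor spans the full table width, so only y and height vary.
    """
    x0, y0, x1, y1 = table_box
    anchors = []
    for h in heights:
        y = y0
        while y + h <= y1:
            anchors.append((x0, y, x1, y + h))
            y += stride
    return anchors


if __name__ == "__main__":
    table = (0, 0, 100, 50)  # hypothetical table box: (x0, y0, x1, y1)
    print(column_anchors(table, widths=[50], stride=25))
    print(row_anchors(table, heights=[25], stride=25))
    print(cost_sensitive_cross_entropy(
        probs=[[0.5, 0.5], [0.5, 0.5]], labels=[0, 1], class_weights=[1.0, 2.0]))
```

Fixing one dimension of every anchor to the table's extent shrinks the search space compared with generic anchors of arbitrary aspect ratio, which is one plausible reading of why the table property helps. In a real Faster-RCNN pipeline the anchors would be generated on feature-map locations rather than by this simple sliding scheme.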
