Paper Title

ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images

Authors

Parag Mali, Puneeth Kukkadapu, Mahshad Mahdavi, Richard Zanibbi

Abstract

We introduce the Scanning Single Shot Detector (ScanSSD) for locating math formulas offset from text and embedded in textlines. ScanSSD uses only visual features for detection: no formatting or typesetting information such as layout, font, or character labels are employed. Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, after which candidate detections are pooled to obtain page-level results. For our experiments we use the TFD-ICDAR2019v2 dataset, a modification of the GTDB scanned math article collection. ScanSSD detects characters in formulas with high accuracy, obtaining a 0.926 f-score, and detects formulas with high recall overall. Detection errors are largely minor, such as splitting formulas at large whitespace gaps (e.g., for variable constraints) and merging formulas on adjacent textlines. Formula detection f-scores of 0.796 (IOU $\geq 0.5$) and 0.733 (IOU $\geq 0.75$) are obtained. Our data, evaluation tools, and code are publicly available.
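
The f-scores above are reported at two IOU (intersection over union) thresholds. As a rough illustration of what such a threshold means, the following minimal Python sketch scores a set of detected boxes against ground-truth boxes. The box format `(x_min, y_min, x_max, y_max)`, the function names `iou` and `f_score`, and the greedy one-to-one matching rule are all assumptions made for this example; they are not taken from the authors' evaluation tools, which may match detections differently.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_score(detections: List[Box], truths: List[Box], threshold: float) -> float:
    """F-score with greedy one-to-one matching (an assumption of this sketch).

    Each detection is paired with the unmatched ground-truth box it overlaps
    most, and counts as a true positive if that IOU reaches the threshold.
    """
    matched = set()
    true_positives = 0
    for det in detections:
        best_index, best_iou = -1, 0.0
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_index, best_iou = j, overlap
        if best_index >= 0 and best_iou >= threshold:
            matched.add(best_index)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detections)
    recall = true_positives / len(truths)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 0, 30, 10)]
    detections = [(0, 0, 10, 8), (20, 0, 30, 5), (50, 50, 60, 60)]
    print(f_score(detections, truths, 0.5))
    print(f_score(detections, truths, 0.75))
```

In this toy input, a detection covering only half of a formula's box has an IOU of 0.5, so it counts as correct at the looser threshold but not at the stricter one. This is the kind of effect that makes the reported score lower at IOU $\geq 0.75$ than at IOU $\geq 0.5$.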
