On Calibration of Scene-Text Recognition Models

Paper Title

On Calibration of Scene-Text Recognition Models

Authors

Ron Slossberg, Oron Anschel, Amir Markovitz, Ron Litman, Aviad Aberdam, Shahar Tsiper, Shai Mazor, Jon Wu, R. Manmatha

Abstract

In this work, we study the problem of word-level confidence calibration for scene-text recognition (STR). Although confidence calibration has been an active research area for the last several decades, the case of structured and sequence prediction calibration has been scarcely explored. We analyze several recent STR methods and show that they are consistently overconfident. We then focus on calibrating STR models at the word level rather than the character level. In particular, we demonstrate that for attention-based decoders, calibration of individual character predictions increases word-level calibration error compared to an uncalibrated model. In addition, we apply existing calibration methodologies as well as new sequence-based extensions to numerous STR models, demonstrating reduced calibration error by up to a factor of nearly 7. Finally, we show consistently improved accuracy results by applying our proposed sequence calibration method as a preprocessing step to beam-search.
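To make the notion of word-level calibration error concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a word's confidence is the product of its per-character top probabilities and that calibration error is measured as expected calibration error (ECE) with equal-width bins; the abstract states neither choice, and the function names and toy data are hypothetical.

```python
import math


def word_confidence(char_probs):
    """Assumed word-level score: product of per-character top probabilities."""
    return math.prod(char_probs)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical toy data: per-character probabilities for four decoded words
# and whether each whole word matched the ground truth.
char_probs_per_word = [
    [0.99, 0.98, 0.97],
    [0.95, 0.90, 0.99, 0.96],
    [0.99, 0.99],
    [0.90, 0.85, 0.92],
]
word_is_correct = [True, False, True, False]

confidences = [word_confidence(p) for p in char_probs_per_word]
ece = expected_calibration_error(confidences, word_is_correct)
print(f"word-level ECE: {ece:.3f}")
```

An overconfident model in this sense is one whose word confidences sit above the observed word accuracy in most bins, which inflates the ECE value.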
