Paper Title
A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video
Paper Authors
Paper Abstract
Detecting text located on the torsos of marathon runners and sports players in video is a challenging problem due to poor quality and the adverse effects caused by flexible/colorful clothing and the varied structures of human bodies and actions. This paper presents a new unified method for tackling the above challenges. The proposed method fuses gradient magnitude and direction coherence of text pixels in a new way to detect candidate regions. The candidate regions are used to determine the number of temporal frame clusters obtained by K-means clustering on frame differences, which in turn detects key frames. The proposed method explores Bayesian probability for skin portions using color values at both the pixel and component levels of temporal frames, which yields fused images containing skin components. Based on the skin information, the proposed method then detects faces and torsos by finding structural and spatial coherence between them. We further propose adaptive pixel linking with a deep learning model for text detection from torso regions. The proposed method is evaluated on our own dataset collected from marathon/sports videos and on three standard marathon-image datasets, namely RBNR, MMM, and R-ID. It is also tested on the standard natural scene text datasets, namely CTW1500 and MS-COCO Text, to show its generality. A comparative study with state-of-the-art methods for bib number/text detection on the different datasets shows that the proposed method outperforms the existing methods.
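The key-frame detection step the abstract describes (K-means clustering on frame differences) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the quantile-based center initialization, and the choice of the highest-difference cluster as the key-frame cluster are all assumptions for exposition.

```python
import numpy as np

def frame_differences(frames):
    # Mean absolute intensity difference between consecutive frames.
    frames = np.asarray(frames, dtype=np.float64)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

def kmeans_1d(values, k, iters=50):
    # Minimal 1-D k-means; centers initialized at evenly spaced quantiles.
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels, centers

def key_frame_indices(frames, k=2):
    # Frames falling in the cluster with the largest mean difference are
    # taken as key frames (an assumption for illustration only).
    diffs = frame_differences(frames)
    labels, centers = kmeans_1d(diffs, k)
    key_cluster = int(np.argmax(centers))
    # The i-th difference compares frame i with frame i+1.
    return [int(i) + 1 for i in np.where(labels == key_cluster)[0]]
```

In the paper's pipeline the number of clusters k is driven by the candidate text regions; here it is simply a fixed parameter.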
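The pixel-level skin probability mentioned in the abstract is typically computed with Bayes' rule over color histograms. The sketch below illustrates that rule only; the color quantization, bin count, and histograms are hypothetical stand-ins for whatever skin/non-skin model the authors actually train.

```python
import numpy as np

def color_bin(rgb, bins_per_channel=4):
    # Quantize an RGB triple into a single histogram bin index.
    r, g, b = (int(v) * bins_per_channel // 256 for v in rgb)
    return (r * bins_per_channel + g) * bins_per_channel + b

def skin_posterior(rgb, skin_hist, nonskin_hist, p_skin=0.5):
    # Bayes' rule:
    #   P(skin | c) = P(c | skin) P(skin)
    #                 / (P(c | skin) P(skin) + P(c | non-skin) P(non-skin))
    c = color_bin(rgb)
    num = skin_hist[c] * p_skin
    den = num + nonskin_hist[c] * (1.0 - p_skin)
    return num / den if den > 0 else 0.0
```

A pixel is then labeled skin when its posterior exceeds a threshold; applying the same idea at the component level, as the abstract describes, would aggregate these posteriors over connected regions.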