Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

Title

Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

Authors

Tobias Weyand, Andre Araujo, Bingyi Cao, Jack Sim

Abstract

While image retrieval and instance recognition techniques are progressing rapidly, there is a need for challenging datasets that accurately measure their performance while also posing novel challenges relevant to practical applications. We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground truth annotations for both the retrieval and recognition tasks. The ground truth construction involved over 800 hours of human annotator work. Our new dataset has several challenging properties inspired by real-world applications that previous datasets did not consider: an extremely long-tailed class distribution, a large fraction of out-of-domain test photos and large intra-class variability. The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos. We provide baseline results for both recognition and retrieval tasks based on state-of-the-art methods as well as competitive results from a public challenge. We further demonstrate the suitability of the dataset for transfer learning by showing that image embeddings trained on it achieve competitive retrieval performance on independent datasets. The dataset images, ground-truth and metric scoring code are available at https://github.com/cvdfoundation/google-landmark.
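The abstract describes retrieval with learned image embeddings and evaluation with metric scoring code. As a minimal sketch, assuming embeddings have already been extracted (random vectors stand in for them here), retrieval can rank index images by cosine similarity to each query. The ranking can then be scored with one common definition of mean average precision at k (mAP@k), where each query's precision sum is divided by min(number of relevant images, k). The function names `l2_normalize`, `retrieve` and `map_at_k` are hypothetical and do not come from the paper. The official scoring code in the repository linked above is authoritative for the actual metrics and their edge cases.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def retrieve(query_emb: np.ndarray, index_emb: np.ndarray, k: int = 100) -> np.ndarray:
    """Return, per query, the indices of the top-k index images by cosine similarity."""
    q = l2_normalize(query_emb)
    d = l2_normalize(index_emb)
    scores = q @ d.T  # shape: (num_queries, num_index)
    k = min(k, d.shape[0])
    # Sorting negated scores gives descending similarity; "stable" keeps ties deterministic.
    return np.argsort(-scores, axis=1, kind="stable")[:, :k]


def map_at_k(predictions, relevant, k: int = 100) -> float:
    """Mean average precision at k (one common definition, assumed here).

    predictions: ranked lists of index ids, one per query.
    relevant: sets of relevant index ids, one per query.
    Queries with no relevant images are skipped (an assumed convention).
    """
    aps = []
    for preds, rel in zip(predictions, relevant):
        if not rel:
            continue
        hits = 0
        precision_sum = 0.0
        for rank, item in enumerate(list(preds)[:k], start=1):
            if item in rel:
                hits += 1
                precision_sum += hits / rank  # precision at this relevant rank
        aps.append(precision_sum / min(len(rel), k))
    return float(np.mean(aps)) if aps else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    index_emb = rng.normal(size=(1000, 128))
    # Hypothetical queries: slightly perturbed copies of index images 0..9,
    # so each query has exactly one relevant index image.
    query_emb = index_emb[:10] + 0.01 * rng.normal(size=(10, 128))
    ranked = retrieve(query_emb, index_emb, k=100)
    relevant = [{i} for i in range(10)]
    print(map_at_k(ranked.tolist(), relevant, k=100))
```

Normalizing once and using a single matrix product keeps the sketch simple. At GLDv2 scale (millions of index images), an approximate nearest-neighbor index would be the more practical choice.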