Measures of Complexity for Large Scale Image Datasets

Title


Measures of Complexity for Large Scale Image Datasets

Authors

Rahane, Ameet Annasaheb; Subramanian, Anbumani

Abstract


Large-scale image datasets are a growing trend in machine learning. However, it is hard to quantitatively understand or specify how datasets compare to each other, i.e., whether one dataset is more complex, or harder for a deep-learning-based network to "learn", than another. In this work, we build a series of relatively simple computational methods to measure the complexity of a dataset. Furthermore, we present an approach for visualizing high-dimensional data to assist with the visual comparison of datasets. We present our analysis using four datasets from the autonomous driving research community: Cityscapes, IDD, BDD and Vistas. Using entropy-based metrics, we derive a rank order of these datasets by complexity, which we compare with an established rank order with respect to deep learning.
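The abstract does not specify which entropy-based metrics are used, so the following is only a minimal illustrative sketch, not the authors' method. It assumes one simple choice: the Shannon entropy (in bits) of each grayscale image's pixel-intensity histogram, H = sum of p * log2(1/p), averaged over a dataset to give a single score, with datasets then ranked from highest to lowest score. The functions `image_entropy`, `dataset_complexity` and `rank_datasets`, and the toy images, are hypothetical and for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical representation: a grayscale image as rows of 8-bit intensities (0-255).
Image = Sequence[Sequence[int]]


def image_entropy(image: Image) -> float:
    """Shannon entropy (bits) of one image's intensity histogram.

    Assumption: entropy over raw pixel values; the paper's actual
    entropy-based metrics may be defined differently.
    """
    counts = Counter(px for row in image for px in row)
    total = sum(counts.values())
    # H = sum over observed intensities of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def dataset_complexity(images: List[Image]) -> float:
    """Assumed dataset-level score: the mean per-image entropy."""
    return sum(image_entropy(img) for img in images) / len(images)


def rank_datasets(datasets: Dict[str, List[Image]]) -> List[str]:
    """Return dataset names ordered from most to least complex under this score."""
    scores = {name: dataset_complexity(imgs) for name, imgs in datasets.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy stand-ins only, not real Cityscapes/IDD/BDD/Vistas images.
    flat = [[0, 0], [0, 0]]            # one intensity: 0 bits
    two_tone = [[0, 255], [0, 255]]    # two equally likely intensities: 1 bit
    four_tone = [[0, 85], [170, 255]]  # four equally likely intensities: 2 bits
    toy = {
        "toy_a": [flat, two_tone],       # mean entropy 0.5 bits
        "toy_b": [four_tone, two_tone],  # mean entropy 1.5 bits
    }
    print(rank_datasets(toy))  # ['toy_b', 'toy_a']
```

A histogram entropy like this is cheap to compute, which fits the stated goal of computationally simple measures, but it ignores spatial structure and semantic labels; a real comparison would likely need richer features.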
