Paper Title

Synbols: Probing Learning Algorithms with Synthetic Datasets

Authors

Alexandre Lacoste, Pau Rodríguez, Frédéric Branchaud-Charron, Parmida Atighehchian, Massimo Caccia, Issam Laradji, Alexandre Drouin, Matt Craddock, Laurent Charlin, David Vázquez

Abstract

Progress in the field of machine learning has been fueled by the introduction of benchmark datasets that push the limits of existing algorithms. Enabling the design of datasets that test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. To this end, we introduce Synbols (Synthetic Symbols), a tool for rapidly generating new datasets with a rich composition of latent features rendered in low-resolution images. Synbols leverages the large number of symbols available in the Unicode standard and the wide range of artistic fonts provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect the limitations and flaws of standard learning algorithms in various learning setups, including supervised learning, active learning, out-of-distribution generalization, unsupervised representation learning, and object counting.
