FLAIR: Federated Learning Annotated Image Repository

Paper title

FLAIR: Federated Learning Annotated Image Repository

Authors

Congzheng Song, Filip Granqvist, Kunal Talwar

Abstract

Cross-device federated learning is an emerging machine learning (ML) paradigm in which a large population of devices collectively trains an ML model while the data remains on the devices. This research field has a unique set of practical challenges, and making systematic advances requires new datasets curated to be compatible with this paradigm. Existing federated learning benchmarks in the image domain do not accurately capture the scale and heterogeneity of many real-world use cases. We introduce FLAIR, a challenging large-scale annotated image dataset for multi-label classification suitable for federated learning. FLAIR has 429,078 images from 51,414 Flickr users and captures many of the intricacies typically encountered in federated learning, such as heterogeneous user data and a long-tailed label distribution. We implement multiple baselines in different learning setups for different tasks on this dataset. We believe FLAIR can serve as a challenging benchmark for advancing the state of the art in federated learning. Dataset access and the code for the benchmark are available at https://github.com/apple/ml-flair.
