Replication-Robust Payoff Allocation for Machine Learning Data Markets

Paper title

Replication-Robust Payoff-Allocation for Machine Learning Data Markets

Authors

Dongge Han, Michael Wooldridge, Alex Rogers, Olga Ohrimenko, Sebastian Tschiatschek

Abstract

Submodular functions have been a powerful mathematical model for a wide range of real-world applications. Recently, they have become increasingly important in machine learning (ML) for modelling notions such as information and redundancy among entities such as data and features. In these applications, a key question is payoff allocation, i.e., how to evaluate the importance of each entity towards the collective objective. To this end, classic solution concepts from cooperative game theory offer principled approaches to payoff allocation. However, despite the extensive body of game-theoretic literature, payoff allocation in submodular games is relatively under-researched. In particular, an important notion that arises in emerging submodular applications is redundancy, which may come from various sources such as abundant data or malicious manipulation, where a player replicates its resource and acts under multiple identities. Though many game-theoretic solution concepts can be used directly in submodular games, naively applying them for payoff allocation in these settings may incur robustness issues against replication. In this paper, we systematically study the replication manipulation in submodular games and investigate replication robustness, a metric that quantitatively measures the robustness of solution concepts against replication. Using this metric, we present conditions that theoretically characterise the robustness of semivalues, a wide family of solution concepts including the Shapley and Banzhaf values. Moreover, we empirically validate our theoretical results on an emerging submodular ML application, i.e., the ML data market.
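To make the replication manipulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy coverage game (a standard example of a submodular function) in which each player holds a set of items and a coalition's value is the number of distinct items it covers. The player names, the item sets, and the helpers `coverage_value`, `semivalue`, `shapley_weight` and `banzhaf_weight` are all hypothetical and chosen for illustration. The sketch computes the Shapley and Banzhaf values before and after player A adds a replica of its resource under a second identity, A2, and compares the total payoff A collects.

```python
from itertools import combinations
from math import factorial


def coverage_value(resources):
    """Build v(S) = number of distinct items covered by coalition S (submodular)."""
    def v(coalition):
        covered = set()
        for player in coalition:
            covered |= resources[player]
        return len(covered)
    return v


def semivalue(players, v, weight):
    """Weighted sum of marginal contributions; weight(s, n) depends only on |S| = s."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for s in range(n):
            for coalition in combinations(others, s):
                total += weight(s, n) * (v(coalition + (i,)) - v(coalition))
        values[i] = total
    return values


def shapley_weight(s, n):
    # Shapley value: each coalition size is equally likely.
    return factorial(s) * factorial(n - s - 1) / factorial(n)


def banzhaf_weight(s, n):
    # Banzhaf value: each coalition of the other players is equally likely.
    return 1.0 / 2 ** (n - 1)


# Hypothetical toy data: A and B overlap on item 2, C is independent.
original = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
# A replicates its resource and acts under a second identity, A2.
replicated = dict(original, A2=set(original["A"]))

results = {}
for name, weight in [("shapley", shapley_weight), ("banzhaf", banzhaf_weight)]:
    before = semivalue(list(original), coverage_value(original), weight)
    after = semivalue(list(replicated), coverage_value(replicated), weight)
    results[name] = (before["A"], after["A"] + after["A2"])

for name, (before_total, after_total) in results.items():
    print(f"{name}: before={before_total:.3f} after={after_total:.3f}")
```

In this toy game, A's total Shapley payoff rises after replication (from 1.5 to about 1.67), while its total Banzhaf payoff stays at 1.5. This is a single hand-picked instance meant only to illustrate the kind of gain that a replication-robustness metric would measure; it does not reproduce the conditions or experiments reported in the paper.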
