Paper Title


Sample-Rank: Weak Multi-Objective Recommendations Using Rejection Sampling

Authors

Abhay Shukla, Jairaj Sathyanarayana, Dipyaman Banerjee

Abstract


Online food ordering marketplaces are multi-stakeholder systems where recommendations impact the experience and growth of each participant in the system. A recommender system in this setting has to encapsulate the objectives and constraints of different stakeholders in order to find the utility of an item for recommendation. Constrained-optimization-based approaches to this problem typically involve complex formulations and have high computational complexity in production settings involving millions of entities. Simplifications and relaxation techniques (for example, scalarization) help but introduce sub-optimality and can be time-consuming due to the amount of tuning needed. In this paper, we introduce a method involving multi-goal sampling followed by ranking for user relevance (Sample-Rank), to nudge recommendations towards multi-objective (MO) goals of the marketplace. The proposed method's novelty is that it reduces the MO recommendation problem to sampling from a desired multi-goal distribution and then using that sample to build a production-friendly learning-to-rank (LTR) model. In offline experiments we show that we are able to bias recommendations towards MO criteria with acceptable trade-offs in metrics like AUC and NDCG. We also show results from a large-scale online A/B experiment where this approach gave a statistically significant lift of 2.64% in average revenue per order (RPO) (objective #1) with no drop in conversion rate (CR) (objective #2), while holding the average last-mile distance traversed flat (objective #3), vs. the baseline ranking method. This method also significantly reduces the time to model development and deployment in MO settings and allows for trivial extensions to more objectives and other types of LTR models.
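The core idea of the sampling step can be sketched, loosely, as standard rejection sampling over logged interactions: each record is accepted with probability proportional to a weight encoding the desired multi-goal distribution, and the accepted set is then used as LTR training data. The sketch below is an illustrative assumption, not the paper's actual formulation; the record schema, the revenue-proportional weight function, and the `max_weight` bound are all hypothetical.

```python
import random

def rejection_sample(records, target_weight, max_weight):
    """Accept each logged record with probability
    target_weight(record) / max_weight, biasing the accepted
    set toward the desired multi-goal distribution.

    max_weight must upper-bound target_weight over all records,
    so every acceptance probability lies in [0, 1].
    """
    accepted = []
    for rec in records:
        if random.random() < target_weight(rec) / max_weight:
            accepted.append(rec)
    return accepted

# Hypothetical example: upweight high-revenue orders (objective #1).
records = [{"revenue": r} for r in [50, 100, 150, 200, 250]]
biased = rejection_sample(records,
                          target_weight=lambda rec: rec["revenue"],
                          max_weight=250.0)
# `biased` now over-represents high-revenue records relative to
# the logged distribution; it would feed the downstream LTR model.
```

In this reading, extending to more objectives only changes the weight function (e.g. a product of per-objective weights for RPO, CR, and last-mile distance), which is consistent with the abstract's claim that the approach extends trivially to additional objectives.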
