Paper Title
An Embedding-Based Grocery Search Model at Instacart
Authors
Abstract
The key to e-commerce search is how to best utilize large yet noisy log data. In this paper, we present our embedding-based model for grocery search at Instacart. The system learns query and product representations with a two-tower, transformer-based encoder architecture. To tackle the cold-start problem, we focus on content-based features. To train the model efficiently on noisy data, we propose a self-adversarial learning method and a cascade training method. On an offline human evaluation dataset, we achieve a 10% relative improvement in RECALL@20, and in online A/B testing, we achieve improvements of 4.1% in cart-adds per search (CAPS) and 1.5% in gross merchandise value (GMV). We describe how we train and deploy the embedding-based search model and give a detailed analysis of the effectiveness of our method.
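To make the retrieval setting and the RECALL@20 metric concrete, here is a minimal sketch of how a two-tower model is typically used at query time and how recall@k can be computed. It is not the authors' implementation: the abstract does not state the similarity function, so cosine similarity is an assumption, and the function names, product IDs, and two-dimensional vectors are hypothetical stand-ins for the outputs of the query and product encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed scoring function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, product_vecs, k):
    """Return the IDs of the k products whose embeddings are closest to the query."""
    ranked = sorted(
        product_vecs,
        key=lambda pid: cosine(query_vec, product_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant products that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical toy embeddings standing in for the product-tower outputs.
product_vecs = {
    "organic_banana": [0.9, 0.1],
    "whole_milk": [0.1, 0.9],
    "banana_chips": [0.7, 0.3],
}
# Hypothetical query-tower output for a query such as "banana".
query_vec = [1.0, 0.0]

top2 = retrieve(query_vec, product_vecs, k=2)
score = recall_at_k(top2, relevant={"organic_banana", "banana_chips"}, k=2)
```

In a production system the exhaustive sort would normally be replaced by an approximate nearest-neighbor index over precomputed product embeddings; the brute-force ranking here only illustrates the scoring logic.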