Semantic Product Search for Matching Structured Product Catalogs in E-Commerce
Summary
The problem with product search is that products, unlike the documents in other kinds of search, have heterogeneous attributes (e.g., dimensions, color) that cannot naturally be represented as text, which makes it hard to embed a product into a latent space. In this paper, the authors embed every field of a product and aggregate the field embeddings into a single vector. Information retrieval typically has two stages: (a) candidate generation and (b) re-ranking. This paper addresses the former, so the exact ordering of the candidates matters less than recall: a relevant product missed at this stage can never be recovered by the re-ranker.
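A minimal sketch of the two ideas above, assuming a simple averaging scheme: per-field vectors are averaged into one product vector, candidates are the top-k products by cosine similarity, and candidate generation is judged by recall@k. The helper names (`product_embedding`, `top_k`, `recall_at_k`), the mean aggregation, and the toy vectors are all hypothetical illustrations, not the paper's learned aggregation.

```python
import math

Vector = list[float]

def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def product_embedding(fields: dict[str, list[Vector]]) -> Vector:
    """Assumed aggregation: average the instances within each field, then average the fields."""
    return mean([mean(instances) for instances in fields.values()])

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: Vector, catalog: dict[str, Vector], k: int) -> list[str]:
    """Candidate generation: the k product ids most similar to the query."""
    return sorted(catalog, key=lambda pid: cosine(query, catalog[pid]), reverse=True)[:k]

def recall_at_k(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant products that appear anywhere in the candidate set."""
    return len(relevant & set(retrieved)) / len(relevant)

# Hypothetical toy catalog: a field such as "color" may hold several instances.
catalog = {
    "p1": product_embedding({"title": [[1.0, 0.0]], "color": [[0.8, 0.2], [0.6, 0.4]]}),
    "p2": product_embedding({"title": [[0.0, 1.0]], "color": [[0.1, 0.9]]}),
    "p3": product_embedding({"title": [[0.7, 0.3]]}),
}
candidates = top_k([1.0, 0.1], catalog, k=2)
print(candidates, recall_at_k(candidates, {"p1", "p3"}))  # ['p1', 'p3'] 1.0
```

Note that `recall_at_k` ignores the order inside the candidate set, which mirrors why ranking quality is secondary at this stage.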
Model Architecture:
Each query is passed through a transformer block to generate a single query embedding.
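To make the query-encoding step concrete, here is a heavily simplified stand-in for a transformer block: single-head scaled dot-product self-attention with identity query/key/value projections, followed by mean-pooling into one vector. It assumes hypothetical 2-d token vectors and omits learned weights, the feed-forward layer, residual connections and layer normalization, so it is not the paper's encoder; `encode_query` and the inputs are illustrative only.

```python
import math

Vector = list[float]

def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens: list[Vector]) -> list[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output token is an attention-weighted mix of all input tokens.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def encode_query(tokens: list[Vector]) -> Vector:
    """Contextualize the token vectors, then mean-pool them into one query vector."""
    contextual = self_attention(tokens)
    d = len(contextual[0])
    return [sum(t[i] for t in contextual) / len(contextual) for i in range(d)]

# Hypothetical embeddings for the two tokens of a query such as "red sofa".
query_vec = encode_query([[1.0, 0.0], [0.0, 1.0]])
print(query_vec)
```

The output has the same dimensionality as the token vectors, so it can be compared directly with a product vector built in the same space.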
Product Fields
For each product, the authors use the following fields to represent it:
Annotations
« This method has shown to be effective compared to directly using the vector for [CLS] token or max-pooling »(2)
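The annotation contrasts "this method" with two common pooling alternatives. Assuming "this method" refers to mean-pooling the encoder's output token vectors (the note does not name it), the sketch below shows all three strategies side by side on hypothetical encoder outputs, with a padding mask so that padded positions do not leak into the pooled vector.

```python
Vector = list[float]

def cls_pooling(token_vectors: list[Vector]) -> Vector:
    """Use the output vector at the first ([CLS]) position."""
    return list(token_vectors[0])

def max_pooling(token_vectors: list[Vector], mask: list[int]) -> Vector:
    """Element-wise maximum over the non-padding positions."""
    kept = [v for v, m in zip(token_vectors, mask) if m]
    return [max(dim) for dim in zip(*kept)]

def mean_pooling(token_vectors: list[Vector], mask: list[int]) -> Vector:
    """Average of the non-padding token vectors (mask[i] == 1 keeps position i)."""
    kept = [v for v, m in zip(token_vectors, mask) if m]
    return [sum(dim) / len(kept) for dim in zip(*kept)]

# Hypothetical encoder outputs: [CLS], two word tokens, one padding position.
outputs = [[0.0, 3.0], [1.0, -1.0], [2.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(cls_pooling(outputs))         # [0.0, 3.0]
print(max_pooling(outputs, mask))   # [2.0, 3.0]
print(mean_pooling(outputs, mask))  # [1.0, 1.0]
```

Mean-pooling lets every token contribute, whereas [CLS] relies on a single position and max-pooling can be dominated by one extreme token.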
Date : 08-18-2020
Authors : Jason Ingyu Choi, Surya Kallumadi, Bhaskar Mitra, Eugene Agichtein, Faizan Javed
Paper Link : http://arxiv.org/abs/2008.08180
Citation :
@article{Choi_Kallumadi_Mitra_Agichtein_Javed_2020,
  title={Semantic Product Search for Matching Structured Product Catalogs in E-Commerce},
  url={http://arxiv.org/abs/2008.08180},
  DOI={10.48550/arXiv.2008.08180},
  abstractNote={Retrieving all semantically relevant products from the product catalog is an important problem in E-commerce. Compared to web documents, product catalogs are more structured and sparse due to multi-instance fields that encode heterogeneous aspects of products (e.g. brand name and product dimensions). In this paper, we propose a new semantic product search algorithm that learns to represent and aggregate multi-instance fields into a document representation using state of the art transformers as encoders. Our experiments investigate two aspects of the proposed approach: (1) effectiveness of field representations and structured matching; (2) effectiveness of adding lexical features to semantic search. After training our models using user click logs from a well-known E-commerce platform, we show that our results provide useful insights for improving product search. Lastly, we present a detailed error analysis to show which types of queries benefited the most by fielded representations and structured matching.},
  note={arXiv:2008.08180},
  number={arXiv:2008.08180},
  publisher={arXiv},
  author={Choi, Jason Ingyu and Kallumadi, Surya and Mitra, Bhaskar and Agichtein, Eugene and Javed, Faizan},
  year={2020},
  month=aug
}