Large Language Models are Zero-Shot Rankers for Recommender Systems
Summary
In this paper, the authors use pre-trained LLMs as zero-shot ranking models for recommendation. To do so, they formalize the recommendation problem as a conditional ranking task, where the condition is the user's previous interaction history with items.
The whole procedure has three parts: (1) H: the sequence of previous user interactions, (2) C: the candidate items to be ranked (retrieved by other candidate generation models), and (3) the final ranking module (the LLM).
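The three parts above can be sketched as a single prompt-building function. This is an illustrative reconstruction, not the paper's exact template; item names and the wording are my own:

```python
def build_ranking_prompt(history, candidates):
    """Assemble a zero-shot conditional-ranking prompt: H is the ordered
    interaction history, C the retrieved candidates; the LLM then acts as
    the ranking module."""
    hist = ", ".join(f"'{i}. {title}'" for i, title in enumerate(history))
    # Label candidates A, B, C, ... so the LLM can answer with short indices.
    cand = ", ".join(f"'{chr(65 + i)}. {title}'" for i, title in enumerate(candidates))
    return (
        f"I've watched the following movies in the past in order: {hist}. "
        f"Now there are {len(candidates)} candidate movies: {cand}. "
        "Please rank these candidate movies according to my watching history."
    )
```

The returned string would then be sent to the LLM (the paper sets temperature to 0.2 for these calls).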
The authors identified several issues, and potential solutions, in using an LLM as a recommendation model:
LLMs struggle to perceive the order
For the first module, the authors found that presenting the interaction sequence in the prompt in chronological order versus random order makes no difference in performance: LLMs do not read the interactions as a sequence but look at them as a whole. The authors propose multiple solutions:
- Recency-focused prompting: in addition to providing the full sequence, the prompt restates the most recently watched item in an extra sentence at the end.
I've watched the following movies in the past in order: '0. Multiplicity', '1. Jurassic Park', ... Note that my most recently watched movie is Dead Presidents. ...
My assumption, however, is that even with this prompting the model may simply use the latest item as extra information and still ignore the order entirely. The authors do not provide a comparison of random order + recency prompting vs. sequential order + recency prompting.
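The missing ablation is easy to set up: build the same recency-focused prompt from an ordered and from a shuffled history and compare ranking quality. A minimal sketch (my own helper, not from the paper):

```python
import random

def recency_prompt(history, shuffle=False, seed=0):
    """Build the history portion of a recency-focused prompt. With
    shuffle=True the listed order is randomized while the recency note is
    kept, isolating whether the model uses order or only the last item."""
    most_recent = history[-1]          # recency note always uses the true last item
    items = list(history)
    if shuffle:
        random.Random(seed).shuffle(items)
    listed = ", ".join(f"'{i}. {t}'" for i, t in enumerate(items))
    return (
        f"I've watched the following movies in the past in order: {listed}. "
        f"Note that my most recently watched movie is {most_recent}."
    )
```

If the two variants score the same, the recency line, not the ordering, is doing the work.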
- In-context learning: the authors add a few in-context examples to the prompt, demonstrating how a history maps to the next item.
What did authors try to accomplish?
What are the key elements of the approach?
What can I use for myself?
What are the other references I can read next?
3+ Most Important Things
1+ Deficiencies
3+ New Ideas
Annotations
« Overall, we attempt to answer the following key questions: – What factors affect the zero-shot ranking performance of LLMs? – What data or knowledge do LLMs rely on for recommendation? »(2)
« where user ratings are regarded as interactions, »(5)
« where reviews are regarded as interactions »(5)
« We filter out users and items with fewer than five interactions. »(5)
« The hyperparameter temperature of calling LLMs is set to 0.2. »(5)
« In LLM-based methods, historical interactions are naturally arranged in an ordered sequence »(6)
« LLMs struggle to perceive the order of given historical user behaviors »(6)
« indicating that LLMs are not sensitive to the order of the given historical user interactions. »(6)
« Therefore too many historical user behaviors (e.g., |H| = 50) may overwhelm LLMs and lead to a performance drop. »(6)
« though the best strategy may vary on different datasets »(7)
« Observation 1. LLMs struggle to perceive the order of the given sequential interaction histories. By employing specifically designed promptings, LLMs can be triggered to perceive the order of historical user behaviors, leading to improved ranking performance. »(7)
Strategy 1: use the recency prompt, i.e., explicitly state at the end which item was watched last. But I expect the LLM only uses that extra information rather than the whole sequence. A good check would be to pair a randomly ordered sequence with the recency prompt and measure the effect.
Strategy 2: give examples of how the model should use the sequence to predict the next item. Though it seems to help the results, I don't think it has much effect on learning sequence order; examples/ICL almost always help LLMs.
« It has been shown that LLMs are generally sensitive to the order of examples in the prompts for NLP tasks »(8)
« In this way, one candidate may appear in different positions. We then merge the results of each round to derive the final ranking »(8)
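The bootstrapping idea in that quote can be sketched as follows: rank several shuffled copies of the candidate list and merge the per-round rankings. Averaging positions is one plausible merge rule (the paper's exact aggregation may differ), and `rank_fn` stands in for the LLM call:

```python
import random

def bootstrap_rank(candidates, rank_fn, rounds=3, seed=0):
    """Alleviate position bias: each round shuffles the candidates so an
    item appears in different positions, then rankings are merged by the
    item's average rank across rounds."""
    rng = random.Random(seed)
    positions = {c: [] for c in candidates}
    for _ in range(rounds):
        shuffled = candidates[:]
        rng.shuffle(shuffled)
        ranking = rank_fn(shuffled)  # e.g., an LLM call returning an ordered list
        for pos, item in enumerate(ranking):
            positions[item].append(pos)
    # Lower average position = ranked higher overall.
    return sorted(candidates, key=lambda c: sum(positions[c]) / len(positions[c]))
```

With a position-insensitive `rank_fn` the merged ranking equals a single-round ranking; differences between rounds are exactly the position bias being averaged out.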
« Observation 2. LLMs suffer from position bias and popularity bias while ranking, which can be alleviated by bootstrapping or specially designed prompting strategies. »(9)
« We would emphasize that the goal of evaluating zero-shot recommendation methods is not to surpass conventional models. The goal is to demonstrate the strong recommendation capabilities of pre-trained base models, which can be further adapted and transferred to downstream scenarios. »(10)
Date : 01-24-2024
Authors : Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian McAuley, Wayne Xin Zhao
Paper Link : http://arxiv.org/abs/2305.08845
Zotero Link: Preprint PDF
Tags : ##now, ##p1
Citation : @article{Hou_Zhang_Lin_Lu_Xie_McAuley_Zhao_2024, title={Large Language Models are Zero-Shot Rankers for Recommender Systems}, url={http://arxiv.org/abs/2305.08845}, DOI={10.48550/arXiv.2305.08845}, abstractNote={Recently, large language models (LLMs) (e.g., GPT-4) have demonstrated impressive general-purpose task-solving abilities, including the potential to approach recommendation tasks. Along this line of research, this work aims to investigate the capacity of LLMs that act as the ranking model for recommender systems. We first formalize the recommendation problem as a conditional ranking task, considering sequential interaction histories as conditions and the items retrieved by other candidate generation models as candidates. To solve the ranking task by LLMs, we carefully design the prompting template and conduct extensive experiments on two widely-used datasets. We show that LLMs have promising zero-shot ranking abilities but (1) struggle to perceive the order of historical interactions, and (2) can be biased by popularity or item positions in the prompts. We demonstrate that these issues can be alleviated using specially designed prompting and bootstrapping strategies. Equipped with these insights, zero-shot LLMs can even challenge conventional recommendation models when ranking candidates are retrieved by multiple candidate generators. The code and processed datasets are available at https://github.com/RUCAIBox/LLMRank.}, note={arXiv:2305.08845}, number={arXiv:2305.08845}, publisher={arXiv}, author={Hou, Yupeng and Zhang, Junjie and Lin, Zihan and Lu, Hongyu and Xie, Ruobing and McAuley, Julian and Zhao, Wayne Xin}, year={2024}, month=jan }