Large Language Models are Zero-Shot Rankers for Recommender Systems

Summary

In this paper, the authors use pre-trained LLMs as zero-shot rankers for recommendation. They formalize the recommendation problem as a conditional ranking task, where the condition is the user's previous interactions with items (e.g., previously watched movies).

The procedure has three parts: (1) H, the sequence of the user's previous interactions, used as the condition; (2) C, the candidate items to be ranked; (3) the final ranking module, which asks the LLM to rank C given H.
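As a rough sketch of this setup (the prompt wording and helper names are my own, not the paper's exact templates; it assumes the OpenAI Python client, and the temperature of 0.2 is the value reported in the paper):

```python
# Sketch only: illustrative prompt wording, not the paper's exact template.
from openai import OpenAI

client = OpenAI()

def build_ranking_prompt(history, candidates):
    # H: the user's sequential interaction history, used as the condition
    h = ", ".join(f"'{i}. {title}'" for i, title in enumerate(history))
    # C: candidate items retrieved by a separate candidate-generation model
    c = "\n".join(f"{i}. {title}" for i, title in enumerate(candidates))
    return (
        f"I've watched the following movies in the past in order: {h}.\n"
        f"Now there are {len(candidates)} candidate movies:\n{c}\n"
        "Rank these candidates by how likely I am to watch them next, "
        "most likely first. Answer with the numbered titles only."
    )

def llm_rank(history, candidates, model="gpt-3.5-turbo"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_ranking_prompt(history, candidates)}],
        temperature=0.2,  # value reported in the paper
    )
    return resp.choices[0].message.content
```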

The authors identify several issues with using LLMs as ranking models, along with potential solutions:

LLMs struggle to perceive the order

In the first part, the authors found that whether the interaction history appears in chronological order or in random order in the prompt makes no difference to performance: the LLM does not treat the interactions as a sequence but looks at them as a whole. The authors propose multiple solutions:

  1. Recency-focused prompting: in addition to listing the sequence, the most recent interaction is restated explicitly in an extra sentence at the end:
I’ve watched the following movies in the past in order: ’0. Multiplicity’, ’1. Jurassic Park’, ... Note that my most recently watched movie is Dead Presidents. ...

My assumption, however, is that even with this prompting the model might simply use the restated latest item as additional information and still not look at the order at all. The authors do not compare random order + recency prompting against sequential order + recency prompting.

  2. In-context learning: the prompt additionally includes demonstration examples showing how a history maps to the next interaction (see the sketch after this list).
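A minimal sketch of how these two strategies change the history part of the prompt (the wording is illustrative; only the recency sentence in Strategy 1 follows the quoted template above):

```python
def recency_focused(history):
    # Strategy 1: restate the most recent interaction in an extra sentence.
    seq = ", ".join(f"'{i}. {t}'" for i, t in enumerate(history))
    return (f"I've watched the following movies in the past in order: {seq}. "
            f"Note that my most recently watched movie is {history[-1]}.")

def in_context_example(history, demo_history, demo_next):
    # Strategy 2: prepend a worked example showing how a history maps to
    # the next watched item before asking about the real history.
    demo = ", ".join(f"'{i}. {t}'" for i, t in enumerate(demo_history))
    real = ", ".join(f"'{i}. {t}'" for i, t in enumerate(history))
    return (f"If I've watched {demo} in order, then the next movie I watch is "
            f"'{demo_next}'.\n"
            f"I've watched the following movies in the past in order: {real}. "
            "What movie will I watch next?")
```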

What did authors try to accomplish?

What are the key elements of the approach?

What can I use for myself?

3+ Most Important Things

1+ Deficiencies

3+ New Ideas

Annotations

Annotation

« We first formalize the recommendation problem as a conditional ranking task, »()

Annotation

« considering sequential interaction histories as conditions and the items retrieved by other candidate generation models as candidates. »()

Annotation

« Overall, we attempt to answer the following key questions: – What factors affect the zero-shot ranking performance of LLMs? – What data or knowledge do LLMs rely on for recommendation? »(2)

Annotation

« where user ratings are regarded as interactions, »(5)

Annotation

« where reviews are regarded as interactions »(5)

Annotation

« We filter out users and items with fewer than five interactions. »(5)

Annotation

« The hyperparameter temperature of calling LLMs is set to 0.2. »(5)

Annotation

« In LLM-based methods, historical interactions are naturally arranged in an ordered sequence »(6)

Annotation

« LLMs struggle to perceive the order of given historical user behaviors »(6)

Annotation

« indicating that LLMs are not sensitive to the order of the given historical user interactions. »(6)

Annotation

« Therefore too many historical user behaviors (e.g., |H| = 50) may overwhelm LLMs and lead to a performance drop. »(6)

Annotation

« though the best strategy may vary on different datasets »(7)

Annotation

« Observation 1. LLMs struggle to perceive the order of the given sequential interaction histories. By employing specifically designed promptings, LLMs can be triggered to perceive the order of historical user behaviors, leading to improved ranking performance. »(7)

Annotation

Strategy 1: give a recency prompt, i.e., state explicitly at the end which item was watched last. But I expect the LLM may only be using that extra sentence rather than the whole sequence. A good check would be to pair a randomly ordered sequence with the recency prompt and compare it to the ordered version; a sketch of this check follows after these notes.

Strategy 2: give examples of how the model should use the sequence to predict the next item. Though it seems to help the results, I don't think it has much effect on learning sequence order; examples / ICL generally help LLMs anyway.
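A sketch of the ablation I'd want to see for Strategy 1, keeping the recency sentence fixed while shuffling the rest of the history (the helper name and the comparison are my own idea, not from the paper):

```python
import random

def history_with_recency(history, shuffle=False, seed=0):
    # The recency sentence always names the true latest item, so shuffling
    # only removes order information from the listed sequence.
    latest = history[-1]
    items = list(history)
    if shuffle:
        random.Random(seed).shuffle(items)
    seq = ", ".join(f"'{i}. {t}'" for i, t in enumerate(items))
    return (f"I've watched the following movies in the past in order: {seq}. "
            f"Note that my most recently watched movie is {latest}.")

# If ranking metrics with shuffle=True roughly match shuffle=False, the gain
# from recency-focused prompting comes from the extra sentence, not the order.
```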

Annotation

« It has been shown that LLMs are generally sensitive to the order of examples in the prompts for NLP tasks »(8)

Annotation

« In this way, one candidate may appear in different positions. We then merge the results of each round to derive the final ranking »(8)
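Roughly, the bootstrapping described here could look like the following (my own simplification: `rank_fn` stands in for one LLM ranking call, and the merge rule is a simple average of positions):

```python
import random
from collections import defaultdict

def bootstrap_rank(candidates, rank_fn, rounds=3, seed=0):
    """rank_fn(titles) should return the same titles in the LLM's ranked order.
    Shuffling the candidate list each round puts every item at different
    positions, which is meant to average out position bias."""
    rng = random.Random(seed)
    position_sum = defaultdict(float)
    for _ in range(rounds):
        shuffled = candidates[:]
        rng.shuffle(shuffled)
        ranked = rank_fn(shuffled)
        for title in candidates:
            # Items the LLM drops from its answer get the worst position.
            pos = ranked.index(title) if title in ranked else len(candidates)
            position_sum[title] += pos
    # Lower average position across rounds = ranked higher in the merged result.
    return sorted(candidates, key=lambda t: position_sum[t] / rounds)
```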

Annotation

« Observation 2. LLMs suffer from position bias and popularity bias while ranking, which can be alleviated by bootstrapping or specially designed prompting strategies. »(9)

Annotation

« We would emphasize that the goal of evaluating zero-shot recommendation methods is not to surpass conventional models. The goal is to demonstrate the strong recommendation capabilities of pre-trained base models, which can be further adapted and transferred to downstream scenarios. »(10)


Related Notes