Pre-Fill in LLM

#llm #interview

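Pre-fill is the initial phase of inference, in which the inference engine (e.g. vLLM) processes the whole prompt, typically in one parallel forward pass. It computes the query, key, and value tensors for every prompt token and saves the keys and values into the KV cache (the queries are used during the pass but not cached). The decode phase that follows reuses the cached keys and values to generate one token at a time.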
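
To make the definition concrete, here is a minimal single-head sketch in pure Python. It is not how vLLM or any other engine implements pre-fill. The names `prefill`, `decode_step`, `attend`, and `rand_matrix`, the dict-based cache layout, and the random weights are all assumptions made for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Attention output for one query vector over the given keys and values."""
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def prefill(prompt_embeddings, w_q, w_k, w_v):
    """Pre-fill: project every prompt token at once and cache the keys and values."""
    q = matmul(prompt_embeddings, w_q)
    k = matmul(prompt_embeddings, w_k)
    v = matmul(prompt_embeddings, w_v)
    kv_cache = {"k": k, "v": v}  # hypothetical cache layout; queries are not stored
    # The last prompt token attends to every prompt position (causal attention).
    last_output = attend(q[-1], k, v)
    return kv_cache, last_output

def decode_step(token_embedding, w_q, w_k, w_v, kv_cache):
    """Decode: project only the new token, append its K/V, and reuse the cache."""
    q = matmul([token_embedding], w_q)[0]
    kv_cache["k"].append(matmul([token_embedding], w_k)[0])
    kv_cache["v"].append(matmul([token_embedding], w_v)[0])
    return attend(q, kv_cache["k"], kv_cache["v"])

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d = 4                                   # toy model dimension (assumption)
w_q, w_k, w_v = (rand_matrix(d, d) for _ in range(3))
prompt = rand_matrix(5, d)              # stand-in embeddings for a 5-token prompt
new_token = rand_matrix(1, d)[0]        # stand-in embedding for the next token

kv_cache, last_out = prefill(prompt, w_q, w_k, w_v)
next_out = decode_step(new_token, w_q, w_k, w_v, kv_cache)
print(len(kv_cache["k"]), len(kv_cache["v"]))
```

The pre-fill call touches all five prompt tokens in one pass, while each decode step projects a single token and only appends one row to the cached keys and values. This is why pre-fill tends to be compute-bound and decode tends to be memory-bound.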

References

Related Notes

Layer Normalization
Feature Hashing
Parallelism in LLM
Decoder Only Transformer
ROUGE-N Score