do_sample (vLLM vs HF)
Huggingface
In Hugging Face, there is a do_sample flag that determines whether decoding is probabilistic sampling or deterministic.
If do_sample=True, it converts logits → probability distribution → multinomial sample
else, it just uses torch.argmax to take the next token
When do_sample=True, the logits are first scaled by temperature (via a logits warper) before sampling.
When do_sample=False, the temperature value is completely ignored
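A minimal sketch of that branch, using NumPy instead of torch for brevity (next_token is a hypothetical helper, not the real Hugging Face API):

```python
import numpy as np

def next_token(logits: np.ndarray, do_sample: bool,
               temperature: float = 1.0, rng=None) -> int:
    """Simplified view of HF's do_sample branch."""
    if do_sample:
        # scale logits by temperature, softmax to probabilities, then sample
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())  # subtract max for stability
        probs /= probs.sum()
        rng = rng or np.random.default_rng()
        return int(rng.choice(len(probs), p=probs))
    # greedy path: temperature plays no role at all
    return int(np.argmax(logits))
```

Note that in the greedy path the temperature argument is never read, which matches the behavior above.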
vLLM
In vLLM, there is no explicit do_sample.
Instead, if the temperature is below a small epsilon threshold, decoding is greedy (equivalent to do_sample=False); otherwise, it is probabilistic.
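A sketch of that threshold check (the epsilon value mirrors vLLM's internal `_SAMPLING_EPS`; the helper function itself is hypothetical):

```python
_SAMPLING_EPS = 1e-5  # assumed threshold, modeled on vLLM's _SAMPLING_EPS

def sampling_type(temperature: float) -> str:
    """Classify decoding mode the way vLLM does: by temperature alone."""
    if temperature < _SAMPLING_EPS:
        # near-zero temperature collapses to argmax, i.e. greedy decoding
        return "greedy"
    return "random"
```

So passing `temperature=0` in vLLM is the practical equivalent of Hugging Face's `do_sample=False`.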