do_sample (vLLM vs HF)

Hugging Face

In Hugging Face transformers, generate() takes a do_sample flag, which determines whether decoding is probabilistic (sampling) or deterministic (greedy).

If do_sample=True, the logits are converted into a probability distribution and the next token is drawn with a multinomial sample (logits → softmax → torch.multinomial).
If do_sample=False, the next token is simply torch.argmax over the logits.
On the sampling path, the logits are first scaled by temperature via logits = logits / temperature.

When do_sample=False, the temperature value is completely ignored (and since dividing by a positive constant does not change the argmax, it could not change the greedy choice anyway).
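
A minimal sketch of the two paths, assuming we already have a 1-D logits tensor for the next token (the toy logits values and the temperature below are made up for illustration):

```python
import torch

def next_token(logits: torch.Tensor, do_sample: bool, temperature: float = 1.0) -> int:
    """Pick the next token id from raw logits, mimicking the two HF decoding paths."""
    if do_sample:
        # Sampling path: scale by temperature, convert to probabilities, draw one token.
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1).item()
    # Greedy path: just take the highest-scoring token; temperature plays no role here.
    return torch.argmax(logits, dim=-1).item()

logits = torch.tensor([1.0, 2.5, 0.3, 2.4])  # toy logits over a 4-token vocabulary
print(next_token(logits, do_sample=False))                  # always 1
print(next_token(logits, do_sample=True, temperature=0.8))  # varies run to run
```

In the real API this corresponds to calling model.generate(..., do_sample=True, temperature=0.8) versus model.generate(..., do_sample=False).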

vLLM

In vLLM, there is no explicit do_sample flag.

As sketched below, if the temperature is below a small internal threshold, decoding is treated as greedy (equivalent to do_sample=False); otherwise it is probabilistic.
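
A minimal sketch of how this maps to the user-facing API. The _SAMPLING_EPS constant and the is_greedy helper below are illustrative (vLLM keeps a tiny epsilon like this internally); the SamplingParams/LLM calls are vLLM's actual API, and the model name is just an example:

```python
from vllm import LLM, SamplingParams

_SAMPLING_EPS = 1e-5  # illustrative: temperatures below a tiny epsilon are treated as greedy

def is_greedy(temperature: float) -> bool:
    """Mirror vLLM's decision: zero/near-zero temperature means argmax decoding."""
    return temperature < _SAMPLING_EPS

llm = LLM(model="facebook/opt-125m")  # example model

greedy = SamplingParams(temperature=0.0, max_tokens=32)   # behaves like do_sample=False
sampled = SamplingParams(temperature=0.8, max_tokens=32)  # behaves like do_sample=True

print(is_greedy(greedy.temperature), is_greedy(sampled.temperature))  # True False
print(llm.generate(["The capital of France is"], greedy)[0].outputs[0].text)
print(llm.generate(["The capital of France is"], sampled)[0].outputs[0].text)
```

So instead of toggling do_sample, you pick greedy versus probabilistic decoding purely through the temperature you pass in SamplingParams.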

