do_sample (vLLM vs HF)

Hugging Face

In Hugging Face transformers, generate() takes a do_sample flag, which determines whether decoding is probabilistic (sampling) or deterministic (greedy).

If do_sample=True, the logits are converted into a probability distribution and the next token is drawn with a multinomial sample (logits → softmax → torch.multinomial).
If do_sample=False, the next token is simply torch.argmax over the logits.
On the sampling path, the logits are first scaled by temperature via logits = logits / temperature.

When do_sample=False, the temperature value is completely ignored (and since dividing by a positive constant does not change the argmax, it could not change the greedy choice anyway).
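
A minimal sketch of the two paths, assuming we already have a 1-D logits tensor for the next token (the toy logits values and the temperature below are made up for illustration):

```python
import torch

def next_token(logits: torch.Tensor, do_sample: bool, temperature: float = 1.0) -> int:
    """Pick the next token id from raw logits, mimicking the two HF decoding paths."""
    if do_sample:
        # Sampling path: scale by temperature, convert to probabilities, draw one token.
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1).item()
    # Greedy path: just take the highest-scoring token; temperature plays no role here.
    return torch.argmax(logits, dim=-1).item()

logits = torch.tensor([1.0, 2.5, 0.3, 2.4])  # toy logits over a 4-token vocabulary
print(next_token(logits, do_sample=False))                  # always 1
print(next_token(logits, do_sample=True, temperature=0.8))  # varies run to run
```

In the real API this corresponds to calling model.generate(..., do_sample=True, temperature=0.8) versus model.generate(..., do_sample=False).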

vLLM

In vLLM, there is no explicit do_sample flag.

As sketched below, if the temperature is below a small internal threshold, decoding is treated as greedy (equivalent to do_sample=False); otherwise it is probabilistic.
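
A minimal sketch of how this maps to the user-facing API. The _SAMPLING_EPS constant and the is_greedy helper below are illustrative (vLLM keeps a tiny epsilon like this internally); the SamplingParams/LLM calls are vLLM's actual API, and the model name is just an example:

```python
from vllm import LLM, SamplingParams

_SAMPLING_EPS = 1e-5  # illustrative: temperatures below a tiny epsilon are treated as greedy

def is_greedy(temperature: float) -> bool:
    """Mirror vLLM's decision: zero/near-zero temperature means argmax decoding."""
    return temperature < _SAMPLING_EPS

llm = LLM(model="facebook/opt-125m")  # example model

greedy = SamplingParams(temperature=0.0, max_tokens=32)   # behaves like do_sample=False
sampled = SamplingParams(temperature=0.8, max_tokens=32)  # behaves like do_sample=True

print(is_greedy(greedy.temperature), is_greedy(sampled.temperature))  # True False
print(llm.generate(["The capital of France is"], greedy)[0].outputs[0].text)
print(llm.generate(["The capital of France is"], sampled)[0].outputs[0].text)
```

So instead of toggling do_sample, you pick greedy versus probabilistic decoding purely through the temperature you pass in SamplingParams.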

