Greedy Decoding

#deep-learning #interview #nlp

Greedy decoding is a method decoders use to generate text.
At each step, the decoder picks the most likely word (highest probability) given the previously generated words.
The main issue is that it often misses the globally optimal result: committing to the locally best word at each step can rule out a sequence whose overall probability is higher.
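
A minimal sketch of why greedy decoding can miss the best sequence. The two-step probability tables below (P_FIRST, P_SECOND) are made-up numbers for illustration, not output from a real model. Greedy picks the best first word and then the best second word, while the exhaustive search scores every full sequence.

```python
from itertools import product

# Hypothetical toy model: P(first word) and P(second word | first word).
P_FIRST = {"A": 0.6, "B": 0.4}
P_SECOND = {
    "A": {"x": 0.55, "y": 0.45},
    "B": {"x": 0.9, "y": 0.1},
}


def sequence_prob(first, second):
    """Joint probability of the two-word sequence."""
    return P_FIRST[first] * P_SECOND[first][second]


def greedy_decode():
    """Pick the most likely word at each step, never revisiting a choice."""
    first = max(P_FIRST, key=P_FIRST.get)
    second = max(P_SECOND[first], key=P_SECOND[first].get)
    return (first, second)


def exhaustive_decode():
    """Score every possible two-word sequence and keep the best one."""
    candidates = product(P_FIRST, ["x", "y"])
    return max(candidates, key=lambda seq: sequence_prob(*seq))


greedy = greedy_decode()
best = exhaustive_decode()
print("greedy:", greedy, sequence_prob(*greedy))
print("best:  ", best, sequence_prob(*best))
```

Here greedy commits to "A" because 0.6 > 0.4, but the sequence starting with "B" has the higher joint probability (about 0.36 versus about 0.33). Exhaustive search is only feasible for toy cases like this one.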

Greedy Decoding Formula

$\hat{y}_{t} = \arg\max_{w} P_{\theta}(y_{t} = w \mid y_{1:t-1}, X)$
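
The formula can be sketched as a short loop. This is an illustration under assumptions, not how any particular library implements it: next_token_probs is a hypothetical callable standing in for the decoder (the input X is implicit in it), and toy_next_token_probs with its table of probabilities is invented for the example.

```python
def greedy_decode(next_token_probs, eos_token, max_steps=10):
    """At each step t, append argmax_w P(y_t = w | y_{1:t-1}, X)."""
    generated = []
    for _ in range(max_steps):
        # Hypothetical interface: returns a dict mapping token -> probability
        # conditioned on the tokens generated so far.
        probs = next_token_probs(generated)
        best = max(probs, key=probs.get)
        generated.append(best)
        if best == eos_token:
            break
    return generated


def toy_next_token_probs(prefix):
    """Made-up stand-in for a real decoder's next-token distribution."""
    table = {
        (): {"the": 0.7, "a": 0.2, "<eos>": 0.1},
        ("the",): {"cat": 0.6, "dog": 0.3, "<eos>": 0.1},
        ("the", "cat"): {"sat": 0.5, "<eos>": 0.4, "the": 0.1},
    }
    # Any prefix not in the table ends the sequence.
    return table.get(tuple(prefix), {"<eos>": 1.0})


tokens = greedy_decode(toy_next_token_probs, eos_token="<eos>")
print(tokens)  # ['the', 'cat', 'sat', '<eos>']
```

The max_steps cap plays the same role as a token limit in a real generation call: it stops decoding even if the end-of-sequence token never becomes the most likely one.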

How to use in Transformers Library:

greedy_output = model.generate(**model_inputs, max_new_tokens=40, do_sample=False, num_beams=1)

Related Notes