- It's a method used to generate text based on the probability of candidate output sequences
- Its behavior depends on the beam size B
- Better than Greedy Decoding, since it explores multiple candidate sequences instead of just one
- When B = 1, it reduces to Greedy Decoding
- Larger B: better results, slower decoding
- Smaller B: worse results, faster decoding
- Beam Search is mostly used at inference time, but it can also be used during training [1]
Steps:
1. Start with the `<SOS>` token
2. At each time step t, compute the probability of each candidate next word for every current beam, given the encoded input X and the output generated so far
3. Keep the top B candidate sequences by cumulative probability
4. Repeat from Step 2 until `<EOS>` is generated (a minimal sketch follows this list)
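
Below is a minimal Python sketch of these steps, assuming a hypothetical scoring function `next_token_logprobs(x, tokens)` that returns a log-probability for every vocabulary word given the encoded input X and the tokens generated so far; the function name, the toy vocabulary, and the default parameters are illustrative, not from the source:

```python
import math
from typing import Callable, Dict, List, Tuple

SOS, EOS = "<SOS>", "<EOS>"
# Scoring function type: (encoded input X, generated tokens) -> {word: log-prob}
ScoreFn = Callable[[object, List[str]], Dict[str, float]]

def beam_search(next_token_logprobs: ScoreFn, x: object,
                beam_size: int = 3, max_len: int = 20) -> List[str]:
    """Return the highest-scoring token sequence found with beam size B."""
    # Each beam is (token list, cumulative log-probability); Step 1: start with <SOS>.
    beams: List[Tuple[List[str], float]] = [([SOS], 0.0)]
    finished: List[Tuple[List[str], float]] = []

    for _ in range(max_len):
        candidates: List[Tuple[List[str], float]] = []
        for tokens, score in beams:
            # Step 2: score every candidate next word given X and the output so far.
            for word, logp in next_token_logprobs(x, tokens).items():
                candidates.append((tokens + [word], score + logp))
        # Step 3: keep only the top B candidates by cumulative log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            # Step 4: a beam stops expanding once it generates <EOS>.
            (finished if tokens[-1] == EOS else beams).append((tokens, score))
        if not beams:  # every surviving beam has finished
            break

    best = max(finished or beams, key=lambda c: c[1])
    return best[0]


# Toy usage with a hypothetical fixed "model": log-probs depend only on the last token.
TOY = {
    SOS:   {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.7), EOS: math.log(0.3)},
    "a":   {"cat": math.log(0.5), EOS: math.log(0.5)},
    "cat": {EOS: math.log(1.0)},
}
print(beam_search(lambda x, toks: TOY[toks[-1]], x=None, beam_size=2))
# ['<SOS>', 'the', 'cat', '<EOS>']
```

With `beam_size=1` the loop keeps only the single best candidate at each step, which is exactly Greedy Decoding, matching the note above.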