Gumbel Softmax

Gumbel Softmax is mostly used in models that need to sample discrete variables and still back-propagate through the sampling step. The usual decoding process picks a value with argmax, which is not differentiable, so gradients cannot flow through it.

The idea of Gumbel Softmax is to add noise sampled from Gumbel(0, 1) to the log-probabilities and then apply a softmax to the result instead of an argmax. A temperature parameter controls how close to discrete (one-hot) the output distribution is.
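The noise step can be sketched in a few lines of plain Python. This is a minimal sketch, not code from the paper: the helper names `sample_gumbel` and `gumbel_max_sample` and the example probabilities are made up for illustration, and it assumes the standard inverse-transform identity g = -log(-log(u)) with u drawn from Uniform(0, 1). Taking the argmax of log(π) + g (often called the Gumbel-max trick) draws a class with probability π; this is the discrete sampling step that Gumbel Softmax relaxes.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.random()
    # rng.random() can return exactly 0.0, where log(u) is undefined.
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(probs, rng):
    """Return a class index drawn according to probs (hypothetical helper)."""
    perturbed = [math.log(p) + sample_gumbel(rng) for p in probs]
    return max(range(len(probs)), key=lambda i: perturbed[i])


rng = random.Random(0)       # fixed seed, only for reproducibility
probs = [0.1, 0.6, 0.3]      # example class probabilities (assumed values)
num_draws = 20000
counts = [0] * len(probs)
for _ in range(num_draws):
    counts[gumbel_max_sample(probs, rng)] += 1

# Empirical frequencies; these should land close to probs.
frequencies = [c / num_draws for c in counts]
```

The argmax in `gumbel_max_sample` is still not differentiable; the sketch only shows that perturbing log-probabilities with Gumbel noise reproduces sampling from π.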

From the paper

As we can see from the figure, Gumbel Softmax is differentiable with respect to the class probabilities because it is a smooth function for every temperature τ > 0.

Gumbel Softmax distribution formula:

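$$y_i = \frac{\exp\left((\log(\pi_i) + g_i)/\tau\right)}{\sum_{j=1}^{n} \exp\left((\log(\pi_j) + g_j)/\tau\right)}$$

Here π_i is the probability of class i, g_i is the noise sampled from Gumbel(0, 1), τ is the temperature, and n is the number of classes.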
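The formula can be written out directly in plain Python. The following is a minimal sketch under stated assumptions rather than a reference implementation: the function names `sample_gumbel` and `gumbel_softmax_sample` and the example probabilities are hypothetical, the Gumbel noise is drawn with g = -log(-log(u)) for u from Uniform(0, 1), and the maximum is subtracted before exponentiating purely for numerical stability (it cancels in the ratio).

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.random()
    # rng.random() can return exactly 0.0, where log(u) is undefined.
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_softmax_sample(probs, tau, rng):
    """Return one relaxed sample y (a list summing to 1) for temperature tau > 0."""
    # (log(pi_i) + g_i) / tau for every class i.
    scaled = [(math.log(p) + sample_gumbel(rng)) / tau for p in probs]
    # Subtracting the max does not change the softmax but avoids overflow.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)       # fixed seed, only for reproducibility
probs = [0.1, 0.6, 0.3]      # example class probabilities (assumed values)
y_cold = gumbel_softmax_sample(probs, tau=0.05, rng=rng)   # low temperature
y_hot = gumbel_softmax_sample(probs, tau=50.0, rng=rng)    # high temperature
```

With a low temperature the sample tends toward a one-hot vector, and with a high temperature it tends toward a uniform vector, which is the trade-off the temperature parameter controls.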

References


Related Notes