KL Divergence
- KL-Divergence = Kullback-Leibler Divergence
- KL divergence at its core measures how different two probability distributions are: it is the expected value of the log-ratio of the two distributions
- Very common in Reinforcement Learning from Human Feedback (RLHF) algorithms (e.g. Proximal Policy Optimization (PPO), GRPO, KTO)
- So, for two discrete probability distributions P and Q over the same support:

  $D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$
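
To make the formula concrete, here is a minimal sketch of discrete KL divergence in Python. The function name `kl_divergence` and the example distributions `p` and `q` are illustrative choices, not from the original notes; the sketch assumes both inputs are valid probability vectors over the same support and that Q is nonzero wherever P is nonzero.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms where P(x) == 0 contribute 0 by convention (0 * log 0 = 0),
    # so we only sum over the support of P.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.6, 0.3, 0.1]

print(kl_divergence(p, q))  # ~0.0239 nats
print(kl_divergence(q, p))  # ~0.0231 nats
```

Note that the two printed values differ: KL divergence is asymmetric, so $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$ in general, which is why it is a divergence rather than a distance. In RLHF settings, P and Q are typically the current policy's and a reference model's token distributions, and the KL term penalizes the policy for drifting too far from the reference.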