GRPO


References

  1. https://yugeten.github.io/posts/2025/01/ppogrpo/
  2. https://huggingface.co/blog/NormalUhr/grpo
  3. https://medium.com/@sulbha.jindal/proximal-policy-optimization-ppo-vs-group-relative-policy-optimization-grpo-988fa7af0241
  4. https://arxiv.org/abs/2402.03300

Related Notes