These are all very rough notes. Writing rough notes allows me to share more content, since polishing takes a lot of time. While I hope they're useful, they're likely lower quality and less carefully considered. It's very possible I wouldn't stand by this content if I thought about it more, so take it with a grain of salt. - Adapted from colah's blog
Also, if you find something incorrect, please let me know by leaving a comment (comments are enabled at the end of each note).
⭐️ Featured Notes ⭐️
Recent Notes (See All)
| Sliding Window Attention |
| Multi-Head Latent Attention |
| Why do we scale attention weights? |
| Pre-Fill in LLM |