#transformer
2 total
- Attention Residuals: Making Residual Connections Attention-Like
A reading of the Kimi Team's Attention Residuals technical report: why residual connections should also become attention-like, and how Full AttnRes and Block AttnRes turn that idea into a trainable, deployable system
- Attention Is All You Need: The Transformer Blueprint
A study note on the Transformer paper, with real Python code examples