#LLM
10 total
- Externalization in LLM Agents: Cognitive Artifacts for Agent Engineering
A reading of Externalization in LLM Agents through the lens of cognitive artifacts: agent progress is increasingly about moving memory, skills, protocols, and runtime governance outside the model.
- AutoCodeBench: When LLMs Generate Code Benchmarks
Why the Elixir column stands out in AutoCodeBench, and how it opens up a discussion of difficulty equivalence in automatically generated multilingual code benchmarks
- Attention Residuals: Making Residual Connections Attention-Like
A reading of Kimi Team's Attention Residuals technical report: why residual connections should become attention-like too, and how Full AttnRes / Block AttnRes turn that idea into a trainable, deployable system
- Training Compute-Optimal Large Language Models: What Chinchilla Changed
The Chinchilla paper: why most large models were undertrained, and how to spend your compute budget wisely, with real Python code examples
- Scaling Laws for Neural Language Models: The Mathematics of Scale
The mathematics of scale: why bigger models are predictably better, with real Python code examples
- Language Models are Few-Shot Learners: GPT-3 and In-Context Learning
Why larger models are better at picking up tasks from context alone, with real Python code examples
- BERT: The Pre-Training Blueprint for Language Understanding
Establishing the pre-training paradigm, with real Python code examples
- Sequence to Sequence Learning: The Encoder-Decoder Blueprint
Establishing the encoder-decoder paradigm, with real Python code examples
- Neural Machine Translation by Jointly Learning to Align and Translate: Attention Before Transformers
The origin of the attention mechanism, with real Python code examples
- Attention Is All You Need: The Transformer Blueprint
A study note on the Transformer paper, with real Python code examples