GPU Fundamentals for LLM Engineers: CUDA, VRAM, and What Actually Matters
GPU architecture, CUDA basics, VRAM budgeting, and mixed precision training — the hardware fundamentals every LLM engineer needs to know.
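As a taste of the VRAM budgeting covered here, the rough memory footprint of a model can be estimated from its parameter count. The byte-per-parameter figures below are common rules of thumb (fp16/bf16 weights for inference; fp16 weights and gradients plus fp32 Adam master weights and moment estimates for mixed-precision training), and the function name is illustrative, not from any library. Activations and KV cache are deliberately excluded.

```python
def estimate_vram_gb(n_params: float, mode: str = "inference") -> float:
    """Back-of-the-envelope VRAM estimate in GB, excluding activations and KV cache."""
    bytes_per_param = {
        # fp16/bf16 weights only
        "inference": 2,
        # fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
        # + Adam first/second moments in fp32 (4 + 4) = 16 bytes/param
        "train_adam": 16,
    }[mode]
    return n_params * bytes_per_param / 1e9

print(estimate_vram_gb(7e9, "inference"))   # 14.0 -> a 7B model needs ~14 GB just for weights
print(estimate_vram_gb(7e9, "train_adam"))  # 112.0 -> full mixed-precision Adam training
```

This is why a 7B model that serves comfortably on a single 24 GB consumer GPU still needs multiple data-center GPUs (or memory-saving techniques like LoRA/QLoRA) to train.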