GPU Fundamentals for LLM Engineers: CUDA, VRAM, and What Actually Matters
GPU architecture, CUDA basics, VRAM budgeting, and mixed precision training — the hardware fundamentals every LLM engineer needs to know.
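The VRAM-budgeting topic above can be sketched with a back-of-the-envelope estimate. This is a minimal sketch, not from the article itself: it counts only weight storage (4 bytes per parameter in FP32, 2 in FP16/BF16) and deliberately ignores activations, optimizer state, and the KV cache, which dominate during training.

```python
def weights_vram_gb(num_params: float, bytes_per_param: float = 2) -> float:
    """Rough VRAM (GiB) needed just to hold model weights.

    bytes_per_param: 4 for FP32, 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit.
    Ignores activations, gradients, optimizer state, and the KV cache.
    """
    return num_params * bytes_per_param / 1024**3

# A hypothetical 7B-parameter model, weights only:
print(f"FP32: {weights_vram_gb(7e9, 4):.1f} GiB")   # ~26 GiB
print(f"FP16: {weights_vram_gb(7e9, 2):.1f} GiB")   # ~13 GiB
```

This simple ratio is why mixed precision roughly halves the weight footprint, and why a 7B model in FP16 fits on a 24 GB consumer GPU but the FP32 copy does not.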