Category: Transformer Architecture

Deep dives into transformer components: attention, positional encoding, mixture-of-experts (MoE), and more.