Why This Course Exists
I'm on a journey to deepen my understanding of AI, and I wanted to properly learn how the transformer architecture works - not just at a hand-wavy conceptual level, but well enough to build one from scratch.
The problem was that most tutorials weren't clicking for me, and it came down to three things:
- They're written in Python. I'm a C# developer. Python isn't hard to read, but working in an unfamiliar language adds cognitive load in exactly the wrong place. You end up spending mental energy on syntax and idioms instead of on the concepts you're trying to learn.
- They lean on libraries like NumPy, PyTorch, and Hugging Face. Powerful tools, but when a single function call hides an entire matrix multiplication or an attention computation, you don't really understand what's happening underneath. You're learning the API, not the algorithm.
- They assume you're comfortable with calculus. I'm not. When a tutorial casually drops derivative notation and expects you to follow along, that's another barrier that has nothing to do with understanding how a transformer works.
So I wanted a tutorial that removed all three of those barriers. One written in C#, with zero external dependencies, where every operation is visible in the code, and where concepts like gradients are explained through practical techniques you can run and verify - not through math notation you're expected to already know.
I couldn't find one that fit, so I built this course to fill that gap. If you're a C# developer who wants to understand transformers at the implementation level, without needing a math degree to get there, this is for you.
Course Map
The course builds a complete GPT-style language model from scratch in C#, with zero ML framework dependencies. By the end, you'll have a working character-level language model that learns patterns from text and generates new, plausible-sounding text.
Every chapter produces runnable code that builds on the previous one. The concepts layer like this:
Chapter 0: Project Setup - creating the project and file structure
Chapter 1: Value - a single number that tracks how it was computed
Chapter 2: Backward - teaching Value to compute gradients automatically
Chapter 3: Tokenizer - turning text into numbers and back
Chapter 4: Bigram Model - the simplest possible "language model" (no neural net)
Chapter 5: Linear + Softmax - the two workhorses of neural networks
Chapter 6: Embeddings + Loss - giving tokens a learned identity
Chapter 7: Training Loop + Adam - making the model learn
Chapter 8: RMSNorm + Residuals - stabilising deep networks
Chapter 9: Attention - the mechanism that lets tokens "look at" each other
Chapter 10: Multi-Head Attention + MLP - the full Transformer block
Chapter 11: Full GPT - assembling everything into a model class
Chapter 12: Inference - generating new text from the trained model
Each chapter tells you what it depends on, what it adds, and what the code should do when you run it.
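To give a flavour of what the first two chapters build, here is a minimal sketch of the Value idea: a number that remembers which operation produced it, so gradients can flow backwards through the whole computation. The class name matches the course's Value.cs, but the specific fields and API below are my own assumption for illustration, not the course's implementation.

```csharp
using System;
using System.Collections.Generic;

// Sketch of an autograd "Value": each number remembers its parent values
// and a small function that pushes gradient back to them.
public class Value
{
    public double Data;   // the number itself
    public double Grad;   // d(output)/d(this), filled in by Backward()
    private readonly Value[] _parents;
    private readonly Action _backward;

    public Value(double data, Value[]? parents = null, Action? backward = null)
    {
        Data = data;
        _parents = parents ?? Array.Empty<Value>();
        _backward = backward ?? (() => { });
    }

    public static Value operator +(Value a, Value b)
    {
        Value result = null!;
        result = new Value(a.Data + b.Data, new[] { a, b }, () =>
        {
            // Addition passes the gradient straight through to both inputs.
            a.Grad += result.Grad;
            b.Grad += result.Grad;
        });
        return result;
    }

    public static Value operator *(Value a, Value b)
    {
        Value result = null!;
        result = new Value(a.Data * b.Data, new[] { a, b }, () =>
        {
            // Product rule: each input's gradient is scaled by the other input.
            a.Grad += b.Data * result.Grad;
            b.Grad += a.Data * result.Grad;
        });
        return result;
    }

    public void Backward()
    {
        // Order the graph so every node comes after its parents, then walk it
        // in reverse, applying each node's local backward rule.
        var topo = new List<Value>();
        var visited = new HashSet<Value>();
        void Build(Value v)
        {
            if (!visited.Add(v)) return;
            foreach (var p in v._parents) Build(p);
            topo.Add(v);
        }
        Build(this);
        Grad = 1.0; // d(output)/d(output) = 1
        for (int i = topo.Count - 1; i >= 0; i--) topo[i]._backward();
    }
}
```

For example, with x = 2 and y = 3, calling Backward() on z = x * y + x fills in x.Grad = 4 and y.Grad = 2 automatically - the "gradient" concept from Chapter 2, with no calculus notation in sight.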
Project File Structure
By the end of the course, your project will contain these files:
MicroGPT/
├── MicroGPT.csproj          Created by dotnet CLI
├── input.txt                Training dataset (32K names)
│
│ ── Core files (permanent) ──────────────────────────
│
├── Value.cs                 The computation recorder (Ch 1-2)
├── GradientCheck.cs         Nudge-test verification tool (Ch 1)
├── Tokenizer.cs             Text-to-numbers conversion (Ch 3)
├── BigramModel.cs           Simplest possible language model (Ch 4)
├── Helpers.cs               Pure math functions (Ch 5, 6, 8)
├── Model.cs                 The GPT model (Ch 11)
├── AdamOptimiser.cs         Reusable Adam optimiser (Ch 11)
├── Program.cs               Entry point - final training + inference (Ch 11-12)
│
│ ── Chapter exercises (created as you go) ───────────
│
├── Chapter1Exercise.cs      Verify Value operations and gradient checking
├── Chapter2Exercise.cs      Verify backward pass computes correct gradients
├── Chapter3Exercise.cs      Verify tokenization encode/decode
├── Chapter4Exercise.cs      Run the bigram baseline model
├── Chapter5Exercise.cs      Verify Softmax produces valid probabilities
├── Chapter6Exercise.cs      Embeddings, forward pass, and loss
├── Chapter7Exercise.cs      Training loop with simplified model
├── Chapter8Exercise.cs      Verify RmsNorm
├── Chapter9Exercise.cs      Hand-crafted single-head attention demo
├── Chapter10Exercise.cs     Multi-head attention + MLP block demo
│
└── FullTraining.cs          The full training loop and inference (Ch 11-12)
The core files build up over the course and remain in the final project. The exercise files are self-contained - each one exposes a Run() method.
Program.cs is a small dispatcher that routes to whichever chapter you want to run: dotnet run -- ch3 runs Chapter 3, and dotnet run -- full runs the final training and inference from Chapters 11-12. With no arguments it defaults to full. You'll build the dispatcher stub in Chapter 0 and fill in each case as the course progresses.
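A dispatcher along these lines is what Chapter 0 stubs out. The dictionary-based routing below is just one way to write it (a plain switch works equally well), and the printed stubs stand in for the real Chapter3Exercise.Run(), FullTraining.Run(), etc. so this sketch compiles on its own:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the Program.cs dispatcher: map a command-line key to a
// chapter entry point. The Console.WriteLine stubs are placeholders for
// the course's actual Run() methods.
public static class Program
{
    private static readonly Dictionary<string, Action> Routes = new()
    {
        ["ch1"] = () => Console.WriteLine("Chapter 1 exercise"),
        ["ch3"] = () => Console.WriteLine("Chapter 3 exercise"),
        // ... one entry per chapter ...
        ["full"] = () => Console.WriteLine("Full training + inference"),
    };

    // "dotnet run -- ch3" yields args = ["ch3"]; no args defaults to "full".
    public static string Resolve(string[] args) =>
        args.Length > 0 ? args[0] : "full";

    public static void Main(string[] args)
    {
        if (Routes.TryGetValue(Resolve(args), out var run))
            run();
        else
            Console.WriteLine($"Unknown option. Try: {string.Join(", ", Routes.Keys)}");
    }
}
```

Splitting argument resolution (Resolve) from dispatch keeps the default-to-full rule in one testable place.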
Reference implementation
The complete source code for this course lives at Garyljackson/GPT-From-Scratch-CSharp on GitHub. You can clone it to follow along, or check your work against it as you go.