What You'll Build
A .NET console project that's ready to run, set up the way the rest of the course expects, with the training dataset downloaded.
Prerequisites
You'll need the .NET 10 SDK (or later) installed. Check with:
dotnet --version
If you don't have it, download it from https://dotnet.microsoft.com/download.
Create the Project
Open a terminal and run:
# Create a new console application
dotnet new console -n MicroGPT -f net10.0
# Move into the project directory
cd MicroGPT
This creates a MicroGPT.csproj file and a Program.cs with a "Hello, World!" placeholder.
Download the Training Data
The dataset is a text file with ~32,000 human names, one per line. Download it into the project directory:
# Linux / macOS
curl -o input.txt https://raw.githubusercontent.com/karpathy/makemore/refs/heads/master/names.txt
# Windows (PowerShell)
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/karpathy/makemore/refs/heads/master/names.txt" -OutFile "input.txt"
A Note for Visual Studio Users
If you plan to run the project from Visual Studio rather than with dotnet run at the command line, do this now, or the program won't find input.txt. Visual Studio runs your application from the bin/Debug/net10.0/ folder, not the project root, so the dataset needs to sit alongside the compiled binary.
The fix is to tell the build system to copy input.txt into the output folder automatically. Open MicroGPT.csproj and add this inside the <Project> element:
<!-- --- MicroGPT.csproj (add inside the <Project> element) --- -->
<ItemGroup>
<None Update="input.txt">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
</ItemGroup>
This copies input.txt alongside the compiled binary whenever it's newer than the existing copy. After this change, both dotnet run and Visual Studio's Run/Debug button will find the file.
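If you're curious what a code-side fallback looks like instead, here's a sketch (not part of the course code) that checks the current working directory first and falls back to the folder the binary lives in. AppContext.BaseDirectory is the standard way to get that folder in .NET:

```csharp
using System;
using System.IO;

class PathSketch
{
    static void Main()
    {
        // Sketch: fall back to the binary's folder when input.txt isn't in
        // the current working directory. This works both under `dotnet run`
        // (project root) and under Visual Studio's Run button
        // (bin/Debug/net10.0/), whichever directory the process starts in.
        string path = File.Exists("input.txt")
            ? "input.txt"
            : Path.Combine(AppContext.BaseDirectory, "input.txt");

        Console.WriteLine($"Reading dataset from: {path}");
    }
}
```

The csproj change above is still the cleaner fix; this is just how you'd paper over the problem in code if you couldn't edit the project file.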
Create the Core Source Files
Create the empty source files alongside Program.cs. These are the permanent files that make up the final model.
# Linux / macOS
touch Value.cs GradientCheck.cs Tokenizer.cs BigramModel.cs Helpers.cs Model.cs AdamOptimiser.cs FullTraining.cs
# Windows (PowerShell)
"Value.cs", "GradientCheck.cs", "Tokenizer.cs", "BigramModel.cs", "Helpers.cs", "Model.cs", "AdamOptimiser.cs", "FullTraining.cs" | ForEach-Object { New-Item -ItemType File -Name $_ }
As you work through the chapters, you'll also create exercise files (Chapter1Exercise.cs, Chapter2Exercise.cs, and so on). Each chapter will tell you when to create one. They're self-contained - each has a static class with a Run() method - and you run them via a dispatcher in Program.cs:
dotnet run -- ch1 # runs Chapter1Exercise.Run()
dotnet run -- ch7 # runs Chapter7Exercise.Run()
dotnet run -- full # runs the final training + inference
dotnet run # same as "full"
You'll build the dispatcher skeleton now and add a line to it at the end of every chapter, so dotnet run -- chN just works from Chapter 1 onwards.
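Every exercise file follows the same shape. Here's a sketch of what Chapter1Exercise.cs will look like structurally; the class name and Run() method are the course's convention, but the body here is just a placeholder (the real contents come in Chapter 1):

```csharp
// --- Chapter1Exercise.cs (structural skeleton; real body comes in Chapter 1) ---
using System;

namespace MicroGPT;

public static class Chapter1Exercise
{
    // Each exercise exposes exactly one entry point, so the
    // dispatcher in Program.cs can call it by name.
    public static void Run()
    {
        Console.WriteLine("Chapter 1 exercise running.");
    }
}
```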
Create the Dispatcher and Verify the Setup
Clear out the placeholder Program.cs and replace it with the dispatcher skeleton below. Every chapter case is already wired up but commented out - you'll uncomment one line at the end of each chapter as you go. Until then, the default branch prints a sanity check that confirms the training dataset is in place:
// --- Program.cs ---
using System;
using System.IO;

namespace MicroGPT;

public static class Program
{
    public static void Main(string[] args)
    {
        string chapter = args.Length > 0 ? args[0].ToLowerInvariant() : "";
        switch (chapter)
        {
            // Uncomment each case as you complete the corresponding chapter.
            // case "gradcheck":
            //     GradientCheck.RunAll();
            //     break;
            // case "ch1":
            //     Chapter1Exercise.Run();
            //     break;
            // case "ch2":
            //     Chapter2Exercise.Run();
            //     break;
            // case "ch3":
            //     Chapter3Exercise.Run();
            //     break;
            // case "ch4":
            //     Chapter4Exercise.Run();
            //     break;
            // case "ch5":
            //     Chapter5Exercise.Run();
            //     break;
            // case "ch6":
            //     Chapter6Exercise.Run();
            //     break;
            // case "ch7":
            //     Chapter7Exercise.Run();
            //     break;
            // case "ch8":
            //     Chapter8Exercise.Run();
            //     break;
            // case "ch9":
            //     Chapter9Exercise.Run();
            //     break;
            // case "ch10":
            //     Chapter10Exercise.Run();
            //     break;
            // case "full":
            //     FullTraining.Run();
            //     break;
            default:
                Console.WriteLine("MicroGPT project is ready.");
                Console.WriteLine($"Dataset exists: {File.Exists("input.txt")}");
                if (File.Exists("input.txt"))
                {
                    int lineCount = File.ReadAllLines("input.txt").Length;
                    Console.WriteLine($"Dataset lines: {lineCount}");
                }
                break;
        }
    }
}
Run it:
dotnet run
You should see:
MicroGPT project is ready.
Dataset exists: True
Dataset lines: 32033
If that works, you're ready to start building.
The Big Picture: How a Neural Network Learns
Before we write any code, here's the 60-second version of what we're building and why. If you already know what "forward pass," "loss," and "gradient" mean, skip ahead to Chapter 1.
A neural network is a math function with thousands of adjustable numbers called parameters. At the start, these parameters are random - the function produces garbage. Training is the process of adjusting them until the function produces something useful.
Training repeats four steps over and over:
Step 1 - The forward pass. Feed an input (like the letters of a name) through the math function. The function does a chain of operations - additions, multiplications, exponentials - and produces an output: a prediction of what character comes next.
Step 2 - The loss. Compare the prediction to the correct answer. The loss is a single number that measures how wrong the prediction was. A loss of zero means the prediction was perfect. A high loss means the model is basically guessing randomly. Our goal is to make this number go down.
Step 3 - The backward pass. This is the clever bit. For each of the thousands of parameters, we need to know: "if I nudged this number up a tiny bit, would the loss go up or down, and by how much?" That "how much" is called the gradient.
Computing gradients by hand would be impossibly tedious. Instead, we record every operation during the forward pass, then walk that record backward using a calculus shortcut called the chain rule. This process is called backpropagation, and it gives us the gradient for every parameter automatically.
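To make "nudge it and see" concrete, here's a standalone sketch (illustrative, not course code) that estimates a gradient numerically for a toy loss L(w) = (w·x − y)², then compares it to the calculus answer. Checking the two against each other is exactly the idea behind the GradientCheck.cs file you created earlier:

```csharp
using System;

class GradientSketch
{
    static void Main()
    {
        double x = 2.0, y = 10.0;  // one training example: input 2, target 10
        double w = 3.0;            // the current parameter value

        // Loss as a function of the parameter: how wrong is w*x versus y?
        Func<double, double> loss = p => Math.Pow(p * x - y, 2);

        // Finite difference: nudge w up and down a tiny amount and
        // measure how much the loss changes per unit of nudge.
        double h = 1e-5;
        double numericGrad = (loss(w + h) - loss(w - h)) / (2 * h);

        // Analytic gradient via the chain rule: dL/dw = 2 * (w*x - y) * x
        double analyticGrad = 2 * (w * x - y) * x;

        Console.WriteLine($"numeric:  {numericGrad:F4}");   // -16.0000
        Console.WriteLine($"analytic: {analyticGrad:F4}");  // -16.0000
    }
}
```

The gradient is negative, which tells us nudging w up would make the loss go down. Backpropagation computes the analytic version for thousands of parameters at once, but the finite-difference version is the ground truth we can always check it against.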
Step 4 - The update. Now that we know each parameter's gradient, we nudge every parameter a tiny step in the direction that makes the loss smaller. Repeat from Step 1 with the next piece of training data.
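The four steps above fit in a dozen lines. This sketch (again illustrative, not course code) trains a single parameter w so that w·x learns to predict y = 5·x, running exactly the loop just described:

```csharp
using System;

class TrainingLoopSketch
{
    static void Main()
    {
        // Toy dataset: the model w * x should learn that y = 5 * x.
        double[] xs = { 1, 2, 3, 4 };
        double[] ys = { 5, 10, 15, 20 };

        double w = 0.0;              // starting parameter (deliberately wrong)
        double learningRate = 0.01;

        for (int step = 0; step < 200; step++)
        {
            double grad = 0;
            for (int i = 0; i < xs.Length; i++)
            {
                double pred = w * xs[i];      // Step 1: forward pass
                double diff = pred - ys[i];   // Step 2: (squared-error) loss is diff^2
                grad += 2 * diff * xs[i];     // Step 3: gradient of the loss w.r.t. w
            }
            w -= learningRate * grad;         // Step 4: step against the gradient
        }

        Console.WriteLine($"learned w = {w:F4}");   // 5.0000
    }
}
```

The real model has thousands of parameters instead of one, and backpropagation computes all their gradients in one backward pass instead of the hand-derived formula in Step 3, but the loop itself never changes.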
That's the entire learning loop. Everything in this course - the Value class, the Backward method, the Softmax function, the Adam optimiser - is a piece of this loop. When you see these terms in the chapters that follow, you'll know where they fit.