
Zyphorah


Prysma: Anatomy of an LLVM Compiler Built from Scratch in 8 Weeks

Prysma: https://github.com/prysma-llvm/prysma

This is a compiler development project I started about 8 weeks ago. I’m a CEGEP student, and since systems engineering of this scale isn’t taught at my level, I decided to build my own low-level ecosystem from scratch. Prysma isn’t just a student project; it’s a complete language and a modular infrastructure designed with the constraints of industrial production tools in mind. This document is a technical dissection of the architecture, my engineering choices, and the effort invested in the project.

  1. Meta-generation and automation of the frontend

Developing a compiler normally requires manually coding hundreds of classes for the Abstract Syntax Tree (AST) and its visitors, which generates a lot of technical debt. To avoid this, I created a compiler generator in Python.
Prysma’s grammar is defined in an ast.yaml file. My Python engine (engine_generation.py), which uses Jinja2, reads this specification and generates all the C++ code for the frontend (classes, virtual methods, interfaces). This strategy is inspired by LLVM’s TableGen. It allows me to add a new operator in 30 seconds. Without this technique, it would take me about an hour to add a single node, because I would have to manually modify the token, the lexer, the parser, and the visitors, with a high risk of errors. Now, everything is handled by automated templates.
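To give a feel for the boilerplate such a generator saves, here is a hand-written sketch of the kind of C++ a template might emit for a literal node and a binary-operator node, plus one visitor. All names (INode, IVisitor, NodeLiteral, NodeBinaryOp, PrinterVisitor, printExample) are hypothetical and are not taken from Prysma's ast.yaml or its generated code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical names throughout: this is not Prysma's actual generated code.
struct NodeLiteral;
struct NodeBinaryOp;

// Visitor interface: one virtual method per node kind listed in the spec.
struct IVisitor {
    virtual ~IVisitor() = default;
    virtual void visit(const NodeLiteral& node) = 0;
    virtual void visit(const NodeBinaryOp& node) = 0;
};

// Common base class: every node only knows how to dispatch to a visitor.
struct INode {
    virtual ~INode() = default;
    virtual void accept(IVisitor& visitor) const = 0;
};

struct NodeLiteral : INode {
    int value;
    explicit NodeLiteral(int v) : value(v) {}
    void accept(IVisitor& visitor) const override { visitor.visit(*this); }
};

struct NodeBinaryOp : INode {
    std::string op;
    std::unique_ptr<INode> left, right;
    NodeBinaryOp(std::string o, std::unique_ptr<INode> l, std::unique_ptr<INode> r)
        : op(std::move(o)), left(std::move(l)), right(std::move(r)) {}
    void accept(IVisitor& visitor) const override { visitor.visit(*this); }
};

// A hand-written visitor that prints the tree in prefix form.
struct PrinterVisitor : IVisitor {
    std::string out;
    void visit(const NodeLiteral& node) override { out += std::to_string(node.value); }
    void visit(const NodeBinaryOp& node) override {
        out += "(" + node.op + " ";
        node.left->accept(*this);
        out += " ";
        node.right->accept(*this);
        out += ")";
    }
};

// Helper for the example: builds the tree for 1 + 2 and returns its printed form.
std::string printExample() {
    NodeBinaryOp add("+", std::make_unique<NodeLiteral>(1), std::make_unique<NodeLiteral>(2));
    PrinterVisitor printer;
    add.accept(printer);
    return printer.out;
}
```

Every new node kind touches the forward declarations, the visitor interface, and each concrete visitor, which is the repetitive part a template-driven generator can take over.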

  2. Parallel Orchestration with llvm::ThreadPool

A modern compiler needs to be fast, so I architected the orchestrator around llvm::ThreadPool. Prysma processes files in parallel for the lexing, parsing, and IR generation phases. The technical challenge was that an LLVM context is not thread-safe: it cannot be used from several threads at once. I had to isolate each compilation unit in its own context and in-memory module before the final merge by the linker. Avoiding race conditions on global symbols required strict control over object lifetimes.
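The isolation idea can be sketched with the standard library alone. This is not Prysma's orchestrator: std::async stands in for llvm::ThreadPool, and the hypothetical UnitContext struct stands in for a per-unit LLVM context and module. The point is only that each task owns its state, and merging happens on a single thread afterwards.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Stand-in for a per-unit context + module pair (hypothetical type).
struct UnitContext {
    std::string sourceName;
    std::vector<std::string> symbols;  // symbols defined by this unit only
};

// Compiles one unit. It touches nothing but its own UnitContext,
// so no locking is needed while the workers run.
UnitContext compileUnit(const std::string& sourceName) {
    UnitContext ctx;
    ctx.sourceName = sourceName;
    ctx.symbols.push_back(sourceName + "::main");  // pretend lexing, parsing, IR generation ran
    return ctx;
}

// Fans the units out to worker threads, then merges the results on one thread.
std::vector<std::string> compileAll(const std::vector<std::string>& sources) {
    std::vector<std::future<UnitContext>> jobs;
    jobs.reserve(sources.size());
    for (const auto& src : sources) {
        jobs.push_back(std::async(std::launch::async, compileUnit, src));
    }
    std::vector<std::string> merged;
    for (auto& job : jobs) {
        UnitContext ctx = job.get();  // blocks until this unit is done
        merged.insert(merged.end(), ctx.symbols.begin(), ctx.symbols.end());
    }
    return merged;
}
```

Because the merge loop is the only place where results from different units meet, it is also the only place that would need to detect duplicate global symbols.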

  3. Native Object Model and V-Tables

Prysma implements a class model directly in LLVM IR, including encapsulation (public, private, protected). Implementing polymorphism was one of the most complex aspects. I modeled navigation in virtual method tables (V-Tables) at the binary level using LLVM’s opaque types (llvm::StructType). Calls are resolved at runtime: GetElementPtr (GEP) instructions compute the address of the V-Table slot, from which the function pointer is loaded. Because an offset that is wrong by a single byte is enough to cause a segfault, this part of the compiler is still unstable.
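The sequence a virtual call goes through can be written out by hand in plain C++. This is an illustrative sketch, not Prysma's IR: the Object layout, the slot numbering, and the function names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: the first field of every object points to its class's V-Table.
struct Object;
using Method = int (*)(Object* self);

struct Object {
    const Method* vtable;  // points at the first slot of the table
    int field;
};

// Two "classes" with one virtual method each, placed in slot 0.
int baseValue(Object* self) { return self->field; }
int derivedValue(Object* self) { return self->field * 2; }

const Method kBaseVTable[] = {baseValue};
const Method kDerivedVTable[] = {derivedValue};

// What a virtual call lowers to: load the V-Table pointer, index the slot
// (the address computation a GEP performs), load the function pointer,
// then call it indirectly with the object as the first argument.
int callVirtual(Object* obj, std::size_t slot) {
    const Method* table = obj->vtable;  // load the V-Table pointer
    Method fn = table[slot];            // slot address + load
    return fn(obj);                     // indirect call
}
```

Nothing checks the slot index or the object layout here, which is why a wrong offset in the generated IR tends to crash rather than fail cleanly.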

  4. Memory Management: Arena and Heap

Memory allocation is crucial for speed. For the AST nodes, I use a memory arena (llvm::BumpPtrAllocator). The compiler reserves a large block and simply advances a pointer for each allocation, so allocating is $O(1)$. Everything is freed at once at the end, as in Clang.
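A minimal bump arena looks roughly like the following. It is a simplified stand-in written for this example, not llvm::BumpPtrAllocator itself: it uses a single fixed block and returns nullptr when full, whereas the LLVM allocator grows by adding slabs. The Arena and AstNode names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>

// Minimal bump arena: one fixed block, no growth, no per-object free.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // Returns aligned storage, or nullptr when the block is exhausted.
    // `align` must be a power of two.
    void* allocate(std::size_t size, std::size_t align) {
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        std::uintptr_t current = base + offset_;
        std::uintptr_t aligned = (current + align - 1) & ~(std::uintptr_t(align) - 1);
        std::size_t newOffset = static_cast<std::size_t>(aligned - base) + size;
        if (newOffset > capacity_) return nullptr;
        offset_ = newOffset;  // the "bump": just move the offset forward
        return reinterpret_cast<void*>(aligned);
    }

    // Constructs a T inside the arena. Destructors are never run,
    // so T should be trivially destructible.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        void* mem = allocate(sizeof(T), alignof(T));
        return mem ? new (mem) T{std::forward<Args>(args)...} : nullptr;
    }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;  // released in one go with the arena
    std::size_t capacity_;
    std::size_t offset_ = 0;
};

// Hypothetical AST node used only for this example.
struct AstNode {
    int kind;
    AstNode* child;
};
```

Allocation is a handful of integer operations, and destroying the Arena releases every node at once, which matches the allocate-many, free-together lifetime of an AST.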

For the Prysma language itself, I implemented dynamic allocation with the new and delete keywords, which communicate with libc’s malloc and free. Loops also manage their stack via LLVM’s alloca instruction.

  5. Unit and Functional Testing System

To ensure the reliability of the backend, I set up a test pipeline. I use Catch2 for C++ tests of the AST and the register. I also developed a test orchestrator in Python (orchestrator_test.py) that uses templates to compile and execute hundreds of files in parallel. This allows testing recursion, variable shadowing, and thread collisions. GitHub Actions blocks deployment if a single test fails.

  6. Execution Volume and Work Methodology

Systems engineering demands a significant amount of execution time. To make this much progress in 8 weeks, I worked 14 hours a day, 7 days a week. Designing an LLVM backend requires reading thousands of pages of documentation and debugging complex memory errors.

AI was a great help in understanding this complexity. My method was iterative: I generated LLVM IR code (version 18) from C++ code to inspect and understand each line. I combined Doxygen’s technical documentation with questions posed to the AI to master everything. To maintain this pace, I managed my fatigue with caffeine (a maximum of three times a week to avoid upregulation), accepting the impact on my mental health to achieve my goals. I was completely absorbed by the project.

  7. Data-Oriented Design (Work by Félix-Olivier Dumas)

Félix-Olivier Dumas joined the Prysma team to restructure the project’s algorithmic foundation. He implemented a Data-Oriented Design (DOD) architecture for managing the AST, which is more efficient.
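The post does not describe the actual layout, so the following is only a generic illustration of what data-oriented AST storage can look like, not the design implemented in Prysma. Nodes live in parallel arrays and refer to each other by index instead of by pointer; every name here (AstStore, NodeId, NodeKind, evaluate) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using NodeId = std::uint32_t;
constexpr NodeId kNoNode = 0xFFFFFFFFu;  // marks "no child"

enum class NodeKind : std::uint8_t { Literal, Add };

// Struct-of-arrays storage: entry i of every array describes node i.
struct AstStore {
    std::vector<NodeKind> kind;
    std::vector<std::int64_t> value;  // meaningful for Literal nodes
    std::vector<NodeId> lhs, rhs;     // meaningful for Add nodes

    NodeId add(NodeKind k, std::int64_t v, NodeId l, NodeId r) {
        kind.push_back(k);
        value.push_back(v);
        lhs.push_back(l);
        rhs.push_back(r);
        return static_cast<NodeId>(kind.size() - 1);
    }
    NodeId literal(std::int64_t v) { return add(NodeKind::Literal, v, kNoNode, kNoNode); }
    NodeId sum(NodeId l, NodeId r) { return add(NodeKind::Add, 0, l, r); }
};

// Walks the tree through indices rather than pointers.
std::int64_t evaluate(const AstStore& ast, NodeId id) {
    if (ast.kind[id] == NodeKind::Literal) return ast.value[id];
    return evaluate(ast, ast.lhs[id]) + evaluate(ast, ast.rhs[id]);
}
```

A pass that only needs one field, for example counting nodes of a given kind, then scans a single contiguous array instead of chasing pointers across the heap.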

Prysma: https://github.com/prysma-llvm/prysma
