A High-Level Overview of Address Spaces: Their Place in ClangIR and LLVM

For a couple of months now, I've been on a mission to immerse myself in a highly impactful open source project. As a student, compilers have been the kind of thing that sparks my interest, ever since I built a programming language from scratch about a year ago, powered by LLVM and other somewhat esoteric dependencies like GNU Bison. I've been wanting to get into the trenches and see how production-level development is really done at the big companies.

ClangIR was one of the projects that caught my attention. Here's a brief overview:

ClangIR essentially aims to modernize Clang's well-established codegen pipeline by representing high-level semantics through its own dialect in MLIR. One of the main problems with the current state of the pipeline is that LLVM IR by itself is very low-level: it's mainly designed around representing a program to be run on a CPU, and it drops high-level constructs like polymorphism, STL abstractions, and coroutines early on. Preserving these high-level semantics can enable better optimizations, and even better diagnostics through MLIR's ability to attach AST nodes directly to the IR.

If you'd like to know more, I highly recommend watching this LLVM talk, which covers the fundamentals and motivation behind the project.

I've been contributing to this project since summer, mainly involved in porting X86 intrinsics and modeling them in the CIR dialect. I've been helping with certain lowering fixes related to these and have explored the structure of the CUDA runtime from Clang's perspective.

In the context of this blog, I'd like to emphasize ClangIR's potential to make it easier to model C/C++-derived programming languages (mainly CUDA, HIP, OpenACC, and many more!) whose offloading models are directly represented in some of the core upstream MLIR dialects (see: MLIR Dialects). We have the potential to connect these front-end representations to the robust infrastructure that revolves around MLIR.

To me, this looked like a strong enough reason to get into the project. On paper, I'd be contributing to an upstream LLVM subproject (this is gold), one with the potential to revolutionize the current state of C++ compilation. As a student, this is an amazing opportunity because of the learning possibilities I strongly value. The cool thing about this whole process is that we already have a reference for how to perform the implementation: Clang's CodeGen library has been around for roughly 15 years and in most cases is our main source of truth. We also have a wide variety of tests to prove the equivalence of the IR we generate.

Offloading Programming Languages and the Concept of Address Spaces

As Moore's Law has slowed over the last 20 years, we've increasingly needed hardware beyond traditional CPUs to keep computations fast. In the programming models that target heterogeneous hardware, the developer is free to choose whether a function executes on the host (CPU) or the device (an accelerator such as a GPU). Conceptually, this necessity arose from the stagnation of single-core performance growth in recent years. I found this very interesting post from 2015 documenting the trend: https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/

Back in 2007, the release of CUDA made us realize how important and powerful offload models can become. In heterogeneous computing, the distinction between host and device is central: certain programs benefit from performing computations with low data dependency on the device (GPU), while others remain faster on traditional CPUs.

For GPUs: take matrix multiplication as an example. Each element of the output matrix is an independent dot product of one row and one column, with no dependency on any other output element. Therefore, we can distribute threads across the output matrix and compute its entries in parallel, extremely quickly.
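
To make that concrete, here's a minimal CUDA sketch (my own illustration, not tuned code): each thread computes one independent element of C = A * B for square N x N matrices, with no tiling or shared-memory tricks, just the raw data-parallel structure.

__global__ void matmul(const float *A, const float *B, float *C, int N) {
    // One thread per output element: (row, col) identifies "my" entry of C.
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;  // no other thread writes this element
    }
}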

On the other hand, for CPUs: tasks with very sequential control flow, operations on small data sets, and memory-intensive work with complex branching may all run better on the CPU. The snippet below shows the offload model in action: a CUDA kernel that executes on the device, launched from host code that manages unified memory.

#include <cuda_runtime.h>

__global__ void add(float *A, float *B, float *C, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) C[i] = A[i] + B[i];
}

void main_offload(int N) {
    float *A, *B, *C;
    // 1. Allocate memory accessible by both CPU and GPU (Unified Memory)
    cudaMallocManaged((void**)&A, N * sizeof(float));
    cudaMallocManaged((void**)&B, N * sizeof(float));
    cudaMallocManaged((void**)&C, N * sizeof(float));

    // 2. Offload/Execute on GPU: (N+255)/256 rounds up to enough
    //    256-thread blocks to cover all N elements
    add<<< (N+255)/256, 256 >>>(A, B, C, N);

    // 3. Wait for the GPU, then access results on the CPU
    cudaDeviceSynchronize();

    // 4. Cleanup
    cudaFree(A); cudaFree(B); cudaFree(C);
}
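
If you want to try this yourself, a .cu file along these lines is typically built with NVIDIA's nvcc, or with Clang's own CUDA support, which is exactly the front-end path that ClangIR plugs into.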

Why Address Spaces Matter

This is where address spaces come into play. Compilers need an abstract way of reasoning about where memory is located — not just for correctness, but also for potential optimizations. Think about it: if the compiler knows a pointer refers to GPU memory versus CPU memory, it can make very different decisions about caching, prefetching, and access patterns.

Pointers are conceptually the best candidate to hold this information, and that's exactly how it's modeled in LLVM. By attaching address space information to pointer types, we give the compiler the context it needs to generate efficient code for heterogeneous systems.

Notice how the address space is bound to the pointer types in this LLVM IR snippet; even the mangled name @_Z3fooPU3AS1i encodes the address-space qualifier on the parameter:


define dso_local void @_Z3fooPU3AS1i(ptr addrspace(1) noundef %arg) #0 {
entry:
  %arg.addr = alloca ptr addrspace(1), align 8
  store ptr addrspace(1) %arg, ptr %arg.addr, align 8
  ret void
}

Two Flavors: Language-Specific vs. Target Address Spaces

There are two flavors of address spaces in the Clang/LLVM world: language-specific and target. The following table summarizes their fundamental differences:

| Aspect | Language-Specific Address Space | Target Address Space |
| --- | --- | --- |
| Level of abstraction | High-level concept used by a front-end like Clang to represent memory qualifiers from the source code. | Low-level implementation detail defined by the backend, representing actual physical memory regions. |
| Purpose | To provide language-specific alias information and guide front-end optimizations, such as mapping local and global memory in OpenCL. | To represent distinct hardware memory spaces for code generation and target-specific optimizations. |
| Representation | Stored as language-defined enumerations, like clang::LangAS in Clang, with specific names (e.g., opencl_local). | Stored as integer identifiers in the LLVM IR, whose meaning is interpreted by the backend. |
| Mapping | The front-end maps the language-specific address spaces to integer identifiers in the LLVM IR. | The target backend provides the semantic meaning for the integer identifiers. For a CPU, all may map to address space 0; for a GPU, different integers map to distinct memory regions. |
| Example | The OpenCL __local keyword is a language-specific address space that indicates memory shared by a workgroup. | On a GPU, the __local address space might map to a target-specific integer, such as 3, that represents on-chip, workgroup-shared memory. |
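
To make the example row concrete, here's a small CUDA illustration (my own sketch, assuming 256-thread blocks): __shared__ memory is a language-specific address space that Clang tracks as LangAS::cuda_shared, and on the NVPTX backend it lowers to pointers in addrspace(3).

// 'tile' lives in on-chip shared memory, visible to one thread block.
__global__ void reverse_block(float *data) {
    __shared__ float tile[256];
    int i = threadIdx.x;
    tile[i] = data[blockIdx.x * blockDim.x + i];
    __syncthreads();  // make sure the whole block has finished loading
    data[blockIdx.x * blockDim.x + i] = tile[blockDim.x - 1 - i];
}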

The Implementation Journey

With this high-level overview, let's talk about the process of implementing this. I have to note first that address spaces were already implemented in the incubator, but we went through an interesting redesign after receiving some amazing feedback from the maintainers. Some of the code had to be drastically changed — and honestly, I think that was the best part of this process. I didn't need to blindly follow whatever we had implemented a couple of months ago. Rather, I had to come up with reasonable solutions that were going to go through a significant feedback process.

Starting with Target Address Spaces

Initially, we implemented target address spaces. I could expand on the details, the decisions we debated, and what we ended up shipping, but instead I'd like to show you the structure of the address space attribute and its place in the pointer type.

//===----------------------------------------------------------------------===//
// TargetAddressSpaceAttr
//===----------------------------------------------------------------------===//

def CIR_TargetAddressSpaceAttr : CIR_Attr< "TargetAddressSpace",
                                         "target_address_space"> {
  let summary = "Represents a target-specific numeric address space";
  let description = [{
    The TargetAddressSpaceAttr represents a target-specific numeric address
    space, corresponding to the LLVM IR `addrspace` qualifier and the clang
    `address_space` attribute.

    A value of zero represents the default address space. The semantics of
    non-zero address spaces are target-specific.

    Example:
    ```mlir
    // Target-specific numeric address spaces
    !cir.ptr<!s32i, addrspace(target<1>)>
    !cir.ptr<!s32i, addrspace(target<10>)>
    ```
  }];
}

Here's how it looks in practice with actual C code:

void foo(int __attribute__((address_space(1))) *arg) {
  return;
}

And its respective lowering to CIR — notice how the address space is attached to the pointer types:

cir.func dso_local @_Z3fooPU3AS1i(%arg0: !cir.ptr<!s32i, target_address_space(1)> loc(fused[#loc3, #loc4])) inline(never) {
  %0 = cir.alloca !cir.ptr<!s32i, target_address_space(1)>, !cir.ptr<!cir.ptr<!s32i, target_address_space(1)>>, ["arg", init] {alignment = 8 : i64} loc(#loc8)
  cir.store %arg0, %0 : !cir.ptr<!s32i, target_address_space(1)>, !cir.ptr<!cir.ptr<!s32i, target_address_space(1)>> loc(#loc5)
  cir.return loc(#loc6)
} loc(#loc7)

The attribute flows through the entire representation — from the function parameter, to the allocation, to the store operation. This consistency is crucial for maintaining correctness throughout the compilation pipeline.

From Concept to Code

The minimum viable goal I set for myself was straightforward: ensure that address spaces are correctly represented in the underlying IR when code flows through the CIR pipeline. Sounds simple, right? But once you start implementing an attribute in MLIR, you realize there's a whole chain of decisions to make:

First, how should the assembly format look? You need to design syntax that's both human-readable and consistent with MLIR conventions. Second, how will your attribute be parsed? The parser needs to handle the syntax you've designed and convert it into the internal representation. Third, and this is where things get really interesting in the context of Clang: we're consuming information directly from the AST (Abstract Syntax Tree). So how do we bridge that gap and transform high-level AST nodes carrying address space information into the concrete CIR types and attributes we need to generate? A rough sketch of that last step follows.
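
To give a flavor of it, here's a heavily simplified, hypothetical sketch. The helper name and the exact TargetAddressSpaceAttr::get signature are illustrative, not the upstream API; clang::isTargetAddressSpace and clang::toTargetAddressSpace are real helpers from clang/Basic/AddressSpaces.h.

#include "clang/Basic/AddressSpaces.h"
#include "mlir/IR/Attributes.h"
#include "mlir/IR/MLIRContext.h"

// Hypothetical sketch, not the upstream implementation: map the
// clang::LangAS recorded on a qualified type to a CIR attribute.
mlir::Attribute buildAddrSpaceAttr(mlir::MLIRContext *ctx, clang::LangAS as) {
  // The default address space is represented by the absence of an attribute.
  if (as == clang::LangAS::Default)
    return {};

  // __attribute__((address_space(N))) is encoded past
  // LangAS::FirstTargetAddressSpace; toTargetAddressSpace recovers N.
  if (clang::isTargetAddressSpace(as))
    return cir::TargetAddressSpaceAttr::get(ctx,
                                            clang::toTargetAddressSpace(as));

  // Language-specific address spaces (opencl_local, cuda_shared, ...)
  // belong to the follow-up work described later in this post.
  return {};
}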

These were the core challenges I tackled during this contribution. It's the kind of work that takes you on a journey from a high-level language construct all the way down to its underlying structural representation — exactly the kind of compiler work I find fascinating.

What I Learned: Navigating Large Codebases

I think one of the most valuable skills I developed through this process is learning to navigate massive codebases. Clang and LLVM are huge — we're talking millions of lines of code built up over nearly two decades. Finding the right abstraction, understanding where your changes fit into the existing architecture, and tracing how data flows through multiple layers of transformation — these are skills you can only develop by doing. And honestly, this experience will stick with me forever.

For anyone interested in the technical details and implementation specifics, I'd highly recommend checking out my PR: https://github.com/llvm/llvm-project/pull/161028

Scope and Future Work

I want to be transparent: what we landed covers just a foundational portion of what address spaces truly encompass. There's a lot I didn't tackle yet: the implementation of language-specific address spaces, conversions between different address spaces (I'm almost done with this; a small source-level example follows), how they interact with type qualifiers, and how they propagate through even lower layers of abstraction during LLVM IR generation and, eventually, code generation.
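
To give a taste of the conversion work, here's a small C illustration of my own (not from the patch): with Clang's address_space attribute, implicit conversions between distinct address spaces are rejected, an explicit cast is required, and that cast surfaces as an addrspacecast instruction in the generated LLVM IR.

// Illustrative only: an explicit conversion between address spaces.
typedef int __attribute__((address_space(1))) as1_int;

int load_via_generic(as1_int *p) {
  int *q = (int *)p;  // explicit cast from addrspace(1) to the default
  return *q;
}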

I kept the scope manageable for a reason. Between school, other commitments, and this being a side project, I haven't been able to immerse myself in this full-time as much as I'd like (yeah, school has definitely been taking a toll on me lately). But I'm proud of what we shipped, and it lays the groundwork for future iterations.

Reflections on Open Source

Just to wrap up, I want to highlight something that still feels surreal to me: the power of open source as a learning platform. Through this project, I've had the opportunity to collaborate with engineers from AMD, NVIDIA, and Meta — people who work on production compilers and toolchains used by millions.

If you had asked me a year ago whether I'd have the skills or confidence to contribute to a project like this, to engage in design discussions with industry professionals, or to have my code reviewed by compiler experts — I probably wouldn't have believed it was possible. Yet here we are.

That's the magic of open source: it doesn't care about your resume or your credentials. It cares about your willingness to learn, your ability to take feedback, and your commitment to shipping quality work. For any student reading this who's on the fence about diving into a large open source project — just do it. The learning curve is steep, but the growth is exponential.
