<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Rivera</title>
    <description>The latest articles on DEV Community by David Rivera (@david_rivera_8d845b35931e).</description>
    <link>https://dev.to/david_rivera_8d845b35931e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1822745%2Fbb1b8a92-39bc-4f0d-9577-d54a62189396.png</url>
      <title>DEV Community: David Rivera</title>
      <link>https://dev.to/david_rivera_8d845b35931e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/david_rivera_8d845b35931e"/>
    <language>en</language>
    <item>
      <title>OSD Final week: My last set of contributions in the course</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Mon, 08 Dec 2025 01:28:07 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/osd-final-week-my-last-set-of-contributions-in-the-course-8ac</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/osd-final-week-my-last-set-of-contributions-in-the-course-8ac</guid>
      <description>&lt;p&gt;It's almost the end of the semester. For the final release (0.4) of the OSD course at Seneca, we're tasked with delivering meaningful work in the shape of contributions to projects. Luckily, I know what to do: over the last couple of months, one of my sources of satisfaction has been contributing to ClangIR and to the LLVM project in general, with sporadic contributions to other projects that spark my interest (both IREE and wgpu). My planning for this may have come a little late, mainly because other things got in the way. I really wish I had the time to spend most of my effort on compilers and on understanding what's behind abstractions (alongside working on a &lt;a href="https://github.com/RiverDave/riscv-emu" rel="noopener noreferrer"&gt;simple RISC-V emulator&lt;/a&gt;, which I'm building to understand low-level instruction encoding), but hey, I'm in school, so I don't have a lot of options here.&lt;/p&gt;

&lt;p&gt;By my standards, a week might not be enough to deliver meaningful work. Meaningful work is something you really feel proud of, and it takes time: maybe months, or even years. By the time I'm writing this, I've cleared out most of my assignments for the courses I took this semester, so my entire energy is going towards this. I want to emphasize my enthusiasm for compilers and high-performance computing, specifically GPUs. Getting a bit more specific, I'm working on the inner logic of how host and device communicate, which is something very explicit in CUDA and HIP compilation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want to tackle this week:
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ClangIR/LLVM:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've been adding CUDA/HIP support to ClangIR, and that's still the way forward in my case. I'm very fortunate to have an interest in this field, since there are actual engineers with decades of experience working on this project. People working in national laboratories (in the US) and at big chip companies are putting effort into this, so this is a fantastic opportunity. It's also quite an interesting way to learn about heterogeneous computing. As I've mentioned in my last couple of posts, I'm not learning by following a tutorial or anything like that; I'm learning by looking at the infrastructure around these computing paradigms in one of the main projects that supports them, which is LLVM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebGPU/naga&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've also found interest in the way graphics APIs work. I may have talked about this in a different post, but I'm working on Naga (a sub-project of wgpu), which is the fundamental translation layer between WGSL and target-specific shading languages. &lt;a href="https://github.com/issues/recent?q=is%3Aissue+involves%3A%40me+sort%3Aupdated-desc&amp;amp;issue=gfx-rs%7Cwgpu%7C7106" rel="noopener noreferrer"&gt;There are a couple of optimization opportunities I wanna tackle&lt;/a&gt;, which I'll try to deliver before the end of the week.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Update: More CIR PRs and Release 0.3</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Mon, 24 Nov 2025 20:35:55 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/update-more-cir-prs-and-release-03-1lne</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/update-more-cir-prs-and-release-03-1lne</guid>
      <description>&lt;p&gt;The last couple of weeks were pretty productive, despite the heavy academic load of the semester. For this release, I wanted to focus mainly on ClangIR. It feels natural to follow the momentum I've built on this project; ultimately, it is the area I feel most comfortable with and the one that has pushed me to learn the most about compilers.&lt;/p&gt;

&lt;p&gt;I completed two main PRs: one focused on backporting work done in October to the incubator, and the other focused on implementing missing CUDA features within CIR.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backporting AddressSpaces from Upstream - ClangIR
&lt;/h3&gt;

&lt;p&gt;Link: &lt;a href="https://github.com/llvm/clangir/pull/1986" rel="noopener noreferrer"&gt;https://github.com/llvm/clangir/pull/1986&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the time of writing, I am still addressing some feedback on this PR. The main idea, which I blogged about previously, is to model how different offload programming languages represent memory address spaces on various hardware. This matters because utilizing specific memory locations can drastically reduce latency and offer orders-of-magnitude performance improvements; this is certainly true for shared memory located within a GPU workgroup/wave.&lt;/p&gt;

&lt;p&gt;The attribute was modeled on top of pointer types. Initially, the implementation was geared towards representing target address spaces. The main challenge of backporting this to the incubator was maintaining compatibility with language-specific address spaces as well: in theory, you could attach either a target or a language-specific attribute to a pointer. I am currently implementing an interface to make both generic, but it is proving difficult due to the intrinsic complexities of TableGen and MLIR definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding Support for Stream-Per-Thread in CUDA - ClangIR
&lt;/h3&gt;

&lt;p&gt;Link: &lt;a href="https://github.com/llvm/clangir/pull/1997" rel="noopener noreferrer"&gt;https://github.com/llvm/clangir/pull/1997&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the perks of contributing to LLVM is learning from the "gold standard" implementations of C-like languages within Clang's ecosystem.&lt;/p&gt;

&lt;p&gt;In GPU programming, streams act as a command buffer between the host and the device. Operations sent to a stream are executed in the exact order they are received. In older CUDA versions, the default stream was shared across all host threads, meaning massive queues could cause false serialization and performance overhead. (See: Nvidia Blog on CUDA 7 Streams).&lt;/p&gt;

&lt;p&gt;While older CUDA versions didn't support stream-per-thread, it has become the norm in modern versions.&lt;/p&gt;

&lt;p&gt;On reflection, you'll find that the Clang code generation pipeline holds an embedded runtime for CUDA, OpenCL, and HIP. The main takeaway is that many features have a clear baseline in ClangIR compared to the original CodeGen implementation. I didn't need extensive domain-specific CUDA knowledge to support this; it is essentially a driver flag that instructs the compiler to alter the default stream semantics during code generation.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>devjournal</category>
      <category>opensource</category>
    </item>
    <item>
      <title>OSD600: Lab 9</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Sun, 23 Nov 2025 22:38:32 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/osd600-release-03-4p46</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/osd600-release-03-4p46</guid>
      <description>&lt;p&gt;Repo Link: &lt;a href="https://github.com/RiverDave/rrcm/tree/main" rel="noopener noreferrer"&gt;https://github.com/RiverDave/rrcm/tree/main&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I haven't been too enthusiastic about the development of this project in the last few weeks. The main reason is that most of what we did was based on learning and implementing fairly small details in our projects (like testing, CI, and so on). Something else to consider is that this was a pre-defined project we had to develop, so it wasn't something I was extremely passionate about from the beginning (although I think most of these points are valid from an academic perspective: it would be hard to evaluate a whole group with different projects and different ideas, and the course is also aimed at people just getting started in OSS). I felt engaged throughout those first sessions, but the dopamine hits I was getting from trying to develop a neat, structured, and clean project faded away after diverting my attention to the boring mandatory courses my program had to offer (especially the APD/Enterprise Java shit). I've also spent a lot of time on a recent sprint to amplify my stream of contributions throughout the semester, something I really wanna keep doing given the personal and "professional?" satisfaction it provides.&lt;/p&gt;

&lt;p&gt;I renamed my project from rust-cli-tool, which was more of a placeholder name, to RRCM, short for Rusty Repo Context Manager. It's a good thing I noticed, as renaming it was the most sensible thing to do. It's not an amazing name, but it will do until the end of the semester.&lt;/p&gt;

&lt;p&gt;I'll follow up with some of the questions posed at the lab wiki:&lt;/p&gt;

&lt;h3&gt;
  
  
  Which release tool and package registry did you choose? Provide links to everything you mention.
&lt;/h3&gt;

&lt;p&gt;The project has been released on crates.io at the following &lt;a href="https://crates.io/crates/rusty-repo-context-manager" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What was the process for creating a release? Be detailed so that other developers could read your blog and get some idea how to do the same.
&lt;/h3&gt;

&lt;p&gt;The release process was easier than I expected; the main step is to add a git tag that "locks in" a checkpoint for your source code. This procedure is really straightforward, at least in the Rust ecosystem. In my case, the complex part was figuring out how to get the release automation right through GitHub Actions, but luckily, I found very useful templates externally.&lt;/p&gt;

&lt;h3&gt;
  
  
  What did you learn? Were there any "aha!" moments, or did you get stuck?
&lt;/h3&gt;

&lt;p&gt;What I learnt mainly had to do with git tags. I wasn't familiar with them at all, but now it makes sense how all of these big projects lock their source code not only to builds but also to different platforms/architectures. I didn't really get stuck, but I imagine it would've been considerably harder to release a project in a "different" ecosystem like C++ (although C++ doesn't really have a standard package ecosystem in the end...).&lt;/p&gt;

&lt;h3&gt;
  
  
  How much did you have to alter your code, files, build, etc., to use the chosen package format and registry?
&lt;/h3&gt;

&lt;p&gt;Mostly YAML and TOML files. I was also having some issues with the code style I was using in certain functions, so in the end, I had to write a separate patch to solve those.&lt;/p&gt;
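&lt;p&gt;For context, publishing to crates.io requires some metadata in &lt;code&gt;Cargo.toml&lt;/code&gt;. A minimal sketch of what that section might look like (the values below are illustrative, not my exact manifest):&lt;/p&gt;

```toml
[package]
name = "rusty-repo-context-manager"
version = "0.1.0"              # bumped and tagged for each release
edition = "2021"
# crates.io rejects a publish without a description and a license
description = "Rusty Repo Context Manager (illustrative placeholder)"
license = "MIT"                # or license-file; MIT here is illustrative
repository = "https://github.com/RiverDave/rrcm"
```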

&lt;h3&gt;
  
  
  How did the User Testing session with your partner go? Describe what it was like, where they got stuck, and how you corrected the problems.
&lt;/h3&gt;

&lt;p&gt;I honestly did this a bit late (Saturday, Nov 22nd, to be exact), and therefore did the testing by myself. I did it on different machines and had zero issues. The only nit from that testing session was that the machine I used was fairly old, so the Cargo toolchain was a bit problematic to set up. But besides that, I'd say it was a really smooth process.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do users install and use your project now that it has been released? Provide links and some instructions.
&lt;/h3&gt;

&lt;p&gt;I have a series of steps to follow in &lt;a href="https://github.com/RiverDave/rrcm/blob/main/README.md#installation" rel="noopener noreferrer"&gt;my project's README&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>learning</category>
      <category>github</category>
      <category>devjournal</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Lab 8: CI pipelines</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Sun, 16 Nov 2025 22:29:19 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/lab-8-ci-pipelines-60i</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/lab-8-ci-pipelines-60i</guid>
      <description>&lt;p&gt;CI/CD pipelines are one of those concepts you'll usually encounter when working on projects meant to be deployed in production. In my experience, they're usually the source of truth for verifying that changes to the codebase didn't break anything else. This is especially beneficial in projects that build on different platforms, architectures, or operating systems. I've been somewhat familiar with them ever since my first co-op at Kinaxis (if I recall correctly, the system we had in place wasn't as advanced as what's possible with GitHub Actions, given that we had to manually trigger the runs, which was very annoying).&lt;/p&gt;

&lt;p&gt;I could mention a lot of reasons why CI/CD pipelines are necessary, but two cases stand out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We may want to enforce a certain coding style throughout the codebase, so that all of our files have the same formatting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In projects I've contributed to, like LLVM, which targets multiple hardware platforms and operating systems (whether the system runs x64 or ARM64, and whether its OS is Linux, Windows, or macOS), you want to verify that all the tests you write pass equally well everywhere.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now here are my responses to the questions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;How did you set up your GitHub Actions CI Workflow? What did the YAML for this workflow look like?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm not the best at writing these YAML files from scratch. I find the most effective way of setting up CI in a project is through some sort of template; GitHub provides facilities for this when you instantiate a repo. It is widely recommended to research linters, formatters, and the respective testing frameworks, and to integrate them into every "serious" project's CI.&lt;/p&gt;
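&lt;p&gt;For illustration, here is a minimal sketch of what such a workflow could look like for a Rust project (a generic template, not the exact file from my repo):&lt;/p&gt;

```yaml
# .github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:

jobs:
  check:
    runs-on: ubuntu-latest   # GitHub-hosted runners ship with a Rust toolchain
    steps:
      - uses: actions/checkout@v4
      - name: Check formatting
        run: cargo fmt --check
      - name: Lint with Clippy
        run: cargo clippy --all-targets -- -D warnings
      - name: Run tests
        run: cargo test
```

&lt;p&gt;From there, covering more platforms is mostly a matter of extending &lt;code&gt;runs-on&lt;/code&gt; with a build matrix.&lt;/p&gt;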

&lt;ul&gt;
&lt;li&gt;How did your partner's repo and testing setup differ from yours?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To be honest, I had never written tests in Go, so this was a somewhat new process for me. What always surprises me about Go is the inherent simplicity of the language: every time I come across a Go project, I find it syntactically familiar. Regarding the tests, what surprised me was the way the comments were laid out, in a Given/When/Then style, as if they belonged to a mathematical exercise. I replicated the style of the other tests; in this case, we're testing whether the provided pattern (&lt;code&gt;*.log&lt;/code&gt;) correctly ignores .log files in a directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestIsIgnored_FilenameMatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Given: GitIgnore with a filename pattern&lt;/span&gt;
    &lt;span class="n"&gt;tempDir&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TempDir&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;gitignorePath&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tempDir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;".gitignore"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WriteFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gitignorePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"*.log"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="m"&gt;0644&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to create .gitignore: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;gi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewGitIgnore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tempDir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to create GitIgnore: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// When/Then: Should match filename pattern&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;gi&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsIgnored&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test.log"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Expected 'test.log' to be ignored"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gi&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsIgnored&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test.txt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Expected 'test.txt' to not be ignored"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a very isolated example, but I found it neat to see how a mathematical style of reasoning merges so smoothly with programming.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What was it like writing tests for a project you didn't create?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I mentioned at the beginning, Go being such a simple language to understand simplifies things incredibly. I've written tests in complicated C++ projects, specifically with GTest, and have written a bunch of LIT tests in LLVM, and I must say, Go is up there with Rust when it comes to quality of life while writing tests.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What do you think of CI now that you've set it up for yourself?
If you did the optional challenges, talk about how that went and what you learned.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CI is something I had set up before this lab, and to be honest, when developing things from scratch, I find it to be something I cannot live without. Implementing a feature with the confidence that what you added didn't break anything else is truly amazing. I did set up a linter with Clippy, which is a core part of the Rust ecosystem. I found it a little annoying to work with because it tends to be quite picky about whitespace.&lt;/p&gt;

</description>
      <category>cicd</category>
      <category>devops</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>OSD600 Lab 7: Testing</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Sat, 08 Nov 2025 04:29:30 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/osd600-lab-7-testing-4d5j</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/osd600-lab-7-testing-4d5j</guid>
      <description>&lt;p&gt;I'll make this one really straightforward:&lt;/p&gt;

&lt;p&gt;Over the last couple of months of working on the repo context manager project, testing was one of the fundamental things one cannot miss. Specifically, as I was iterating heavily and adding features over and over again, it was essential for me to keep track of correctness throughout the main logic. Although not explicitly stated in any of the labs so far, I've been adding tests all along.&lt;/p&gt;

&lt;p&gt;This dedication to testing was a crucial habit, as it dramatically boosted my confidence while adding features. Having a robust set of tests meant I could refactor or introduce major changes quickly, secure in the knowledge that if I broke something core, the test suite would immediately flag it. For an evolving project, this continuous verification is key to maintaining high code quality and ensuring the main logic remains sound, no matter how fast development moves.&lt;/p&gt;

&lt;p&gt;Luckily, in modern programming languages like Rust (in my case), testing functionality is very straightforward and is already built into the ecosystem itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rust testing capabilities
&lt;/h2&gt;

&lt;p&gt;Rust's testing is deeply integrated into the language and its build tool, Cargo, making it straightforward and idiomatic. Note that in order to simulate test scenarios by generating dummy files and directories, I used the &lt;a href="https://docs.rs/tempdir/latest/tempdir/" rel="noopener noreferrer"&gt;TempDir&lt;/a&gt; crate.&lt;/p&gt;
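&lt;p&gt;To show the general idea behind such scratch directories, here is a stdlib-only sketch (the helper names are made up for illustration; the actual crates add unique naming and automatic cleanup on drop, which this sketch leaves to the caller):&lt;/p&gt;

```rust
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical helper: create a scratch directory under the system temp dir.
// Crates like `tempdir`/`tempfile` do this with unique names and auto-cleanup.
fn make_scratch_dir(label: &str) -> std::io::Result<PathBuf> {
    let dir = env::temp_dir().join(format!("{}-{}", label, std::process::id()));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

// Hypothetical helper: drop a dummy file into the scratch directory,
// the kind of fixture a filesystem-oriented test would assert against.
fn write_dummy_file(dir: &Path, name: &str, contents: &str) -> std::io::Result<PathBuf> {
    let path = dir.join(name);
    fs::write(&path, contents)?;
    Ok(path)
}
```

&lt;p&gt;A test can then create its fixtures, assert against them, and remove the directory at the end.&lt;/p&gt;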

&lt;h3&gt;
  
  
  Unit Tests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt;: Unit tests are typically placed in a separate tests module within the same file as the code they are testing. This allows them to test private functions and internal implementation details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syntax&lt;/strong&gt;: They are defined using the &lt;code&gt;#[cfg(test)]&lt;/code&gt; attribute on the module and the &lt;code&gt;#[test]&lt;/code&gt; attribute on the function itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution&lt;/strong&gt;: You run them using the command &lt;code&gt;cargo test&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Assertions&lt;/strong&gt;: Tests rely on built-in assertion macros like &lt;code&gt;assert!&lt;/code&gt;, &lt;code&gt;assert_eq!&lt;/code&gt;, and &lt;code&gt;assert_ne!&lt;/code&gt; to check for expected outcomes. For example, &lt;code&gt;assert_eq!(2 + 2, 4);&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration Tests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt;: Integration tests are placed in a dedicated tests directory at the root of your project (not inside the src folder). Each file in this directory is treated as an independent crate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: These tests only interact with the public interface of your library, simulating how external code would use it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution&lt;/strong&gt;: They are also run using the &lt;code&gt;cargo test&lt;/code&gt; command, and are discovered automatically by Cargo in the tests directory.&lt;/p&gt;
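&lt;p&gt;To make the unit vs. integration distinction concrete, here is a self-contained sketch: the &lt;code&gt;my_lib&lt;/code&gt; module stands in for a library crate (in a real project it would be &lt;code&gt;src/lib.rs&lt;/code&gt;, and the caller would live in a file under &lt;code&gt;tests/&lt;/code&gt;); all names here are hypothetical:&lt;/p&gt;

```rust
// Stand-in for a library crate; in a real project this would be src/lib.rs.
mod my_lib {
    pub fn add_two(a: i32) -> i32 {
        a + 2
    }

    // Private details like this are reachable from unit tests in the same
    // file, but NOT from integration tests in the tests/ directory.
    #[allow(dead_code)]
    fn internal_detail() {}
}

// Integration-style check: only the public API is exercised,
// the way an external consumer of the crate would use it.
fn exercise_public_api() -> i32 {
    my_lib::add_two(40)
}
```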

&lt;p&gt;Here is an example of what unit tests in &lt;code&gt;src/lib.rs&lt;/code&gt; could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In src/lib.rs, inside the module with the function to test&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;add_two&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[cfg(test)]&lt;/span&gt; &lt;span class="c1"&gt;// This attribute tells Rust to compile this module only when running tests&lt;/span&gt;
&lt;span class="k"&gt;mod&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Bring the outer items (like add_two) into scope&lt;/span&gt;

    &lt;span class="nd"&gt;#[test]&lt;/span&gt; &lt;span class="c1"&gt;// This attribute marks a function as a test&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;it_works&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Assertions are the core of tests&lt;/span&gt;
        &lt;span class="nd"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;add_two&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Checks if the function's output equals the expected value&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;#[test]&lt;/span&gt;
    &lt;span class="nd"&gt;#[should_panic]&lt;/span&gt; &lt;span class="c1"&gt;// This attribute ensures the test passes only if the function panics&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handles_bad_input&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Example: a function that panics on zero input&lt;/span&gt;
        &lt;span class="c1"&gt;// some_function(0);&lt;/span&gt;
        &lt;span class="nd"&gt;assert!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Placeholder for a panicking call&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In summary, Rust provides a powerful, zero-setup testing environment where unit and integration tests are handled seamlessly by Cargo, encouraging developers to write tests alongside their code.&lt;/p&gt;

</description>
      <category>devjournal</category>
      <category>softwaredevelopment</category>
      <category>testing</category>
    </item>
    <item>
      <title>Leaving my comfort zone: Contributing to wgpu—A light overview of graphics APIs</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Tue, 28 Oct 2025 19:18:19 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/the-process-of-immersing-myself-in-a-new-oss-project-wgpu-43a</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/the-process-of-immersing-myself-in-a-new-oss-project-wgpu-43a</guid>
      <description>&lt;p&gt;Over the last year, I've been widely immersed in the LLVM Project and some other projects derived from it (like IREE) that aim to deliver robust compiler infrastructure across multiple hardware and multiple domains. It has honestly been an amazing experience, especially given the technical skills it has provided me.&lt;/p&gt;

&lt;p&gt;If you're interested in seeing my involvement, refer to my &lt;a href="https://github.com/RiverDave" rel="noopener noreferrer"&gt;Github Profile&lt;/a&gt;. I have a portfolio that encompasses some projects aligned with my interests.&lt;/p&gt;

&lt;p&gt;I've been very curious about GPUs these days. While I've been a consumer of these products, specifically chips oriented towards gaming, I've grown curious about what all the AI hype revolves around. It is no surprise that discovering the benefits of exploiting GPU parallelism for neural networks and AI in general has empowered the infrastructure of some of the most important companies, like OpenAI or Anthropic.&lt;/p&gt;

&lt;p&gt;This blog is not oriented towards my exploration of GPUs in that field (&lt;a href="https://github.com/iree-org/iree/pulls?q=sort%3Aupdated-desc+is%3Apr+author%3ARiverDave" rel="noopener noreferrer"&gt;I plan to publish one soon after I finish my work on IREE&lt;/a&gt;). This time I'm heading to uncharted territory (at least until last weekend), which is the field of graphics APIs—specifically through a weekend contribution to the wgpu project.&lt;/p&gt;

&lt;p&gt;More specifically, let's talk about wgpu, a WebGPU API implementation (one of the major ones, used in Firefox and maintained by Mozilla). I'll cover a basic overview of what a graphics API is, along with some of the core components I was able to identify in this project: WGSL and Naga. I might be missing some fundamentals here, but consider that this was merely a weekend project (my type of leisure).&lt;/p&gt;

&lt;h3&gt;
  
  
  A graphics API from 10000 feet
&lt;/h3&gt;

&lt;p&gt;A graphics API mainly serves as a bridge between an application's high-level rendering commands and the underlying graphics hardware (in most cases, a GPU!). &lt;/p&gt;

&lt;p&gt;By providing this set of functions, a graphics API simplifies the complexities of multiple hardware architectures, which allows developers to write portable code across multiple systems.&lt;/p&gt;

&lt;p&gt;(In practice, this doesn't sound too different from compilers, specifically JIT compilers: at the end of the day, what matters is that you're abstracting something away and making certain layers of your stack portable between targets. This is true in most cases.)&lt;/p&gt;

&lt;p&gt;Think about it this way: if we were to express rendering semantics in machine code directly from our app, it would be very tedious, since we'd have to worry about every architecture we'd need to target.&lt;/p&gt;

&lt;h3&gt;
  
  
  WebGPU and one of its implementations: wgpu
&lt;/h3&gt;

&lt;p&gt;One of the major graphics API specifications is WebGPU. There's probably no better explanation than Mozilla's:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The WebGPU API enables web developers to use the underlying system's GPU (Graphics Processing Unit) to carry out high-performance computations and draw complex images that can be rendered in the browser.&lt;/p&gt;

&lt;p&gt;WebGPU is the successor to WebGL, providing better compatibility with modern GPUs, support for general-purpose GPU computations, faster operations, and access to more advanced GPU features.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While I could delve into more detail on how it differs from the other major APIs, for simplicity's sake the main takeaway from what I found is that it provides better portability, since it allows developers to run it in the browser through WebAssembly.&lt;/p&gt;

&lt;p&gt;There are multiple implementations of the WebGPU API, most notably:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dawn: Utilized in Chrome and Edge, written in C++&lt;/li&gt;
&lt;li&gt;wgpu: Utilized in Firefox, written in Rust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's talk about wgpu, shading languages (specifically WGSL and MSL), and naga.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Translation Layer: Where WGSL Meets Metal
&lt;/h3&gt;

&lt;p&gt;When working with wgpu, you write your shaders in WGSL (WebGPU Shading Language), a modern, platform-agnostic shading language designed for the WebGPU specification. But here's the thing: your GPU doesn't understand WGSL directly. Each graphics backend—whether it's Metal on macOS/iOS, Vulkan on Linux/Windows, or DirectX on Windows—has its own native shading language.&lt;/p&gt;

&lt;p&gt;This is where naga comes in. Naga is the shader translation library at the heart of wgpu, acting as a sophisticated compiler that bridges this gap. Think of it as a polyglot translator for GPU code.&lt;/p&gt;

&lt;h3&gt;
  
  
  How the Pipeline Works
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;The translation happens in several stages&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;Your Application → WGSL Shader Code → Naga's Internal Representation (IR) → Backend-Specific Shading Language → Native GPU Compilation → GPU Execution&lt;/p&gt;

&lt;p&gt;When you write a shader in WGSL and load it through wgpu's bindings, naga parses your shader code, validates it, converts it to an internal representation, and then translates it to the appropriate backend language. On macOS, that means translating WGSL to Metal Shading Language (MSL). On Linux with Vulkan, it's SPIR-V. On Windows with DirectX, it becomes HLSL.&lt;/p&gt;

&lt;p&gt;The beauty of this architecture is that you, as a developer, never have to think about these backend differences. You write WGSL once, and naga handles the rest.&lt;/p&gt;
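&lt;p&gt;To make the backend fan-out concrete, here is a minimal Rust sketch of that last translation step. This is purely illustrative: the enum and function names are mine, not naga's actual API.&lt;/p&gt;

```rust
// Hypothetical sketch (not naga's real API): after validation, the
// chosen backend decides which shading language the IR is emitted as.
#[derive(Debug, PartialEq)]
enum Backend {
    Metal,   // macOS / iOS
    Vulkan,  // Linux / Windows
    DirectX, // Windows
}

// Map each backend to the target shading language described above.
fn target_language(backend: &Backend) -> &'static str {
    match backend {
        Backend::Metal => "MSL",     // Metal Shading Language
        Backend::Vulkan => "SPIR-V", // binary IR consumed by Vulkan drivers
        Backend::DirectX => "HLSL",  // High-Level Shading Language
    }
}

fn main() {
    assert_eq!(target_language(&Backend::Metal), "MSL");
    assert_eq!(target_language(&Backend::Vulkan), "SPIR-V");
    println!("DirectX -> {}", target_language(&Backend::DirectX));
}
```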

&lt;h3&gt;
  
  
  Inside the Translation: A Real Example
&lt;/h3&gt;

&lt;p&gt;Let me show you what this translation looks like in practice, using a contribution I recently made to naga's Metal backend (&lt;a href="https://github.com/gfx-rs/wgpu/pull/8432" rel="noopener noreferrer"&gt;PR #8432&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Consider a simple WGSL shader that computes a dot product on integer vectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vec2&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec2&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vec2&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec2&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In WGSL, the &lt;code&gt;dot()&lt;/code&gt; function works uniformly across all vector types—floats, integers, you name it. But Metal Shading Language has a quirk: its built-in dot() function only works with floating-point vectors. For integer vectors, we need a different approach.&lt;/p&gt;

&lt;p&gt;Previously, naga would inline the dot product calculation directly wherever it appeared:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works, but if you use integer dot products multiple times in your shader, you get the same verbose expression repeated throughout your code. It's not ideal for code size or readability.&lt;/p&gt;

&lt;p&gt;My PR changed this by introducing wrapper helper functions for integer dot products. Now, naga generates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;naga_dot_int2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;int2&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;int2&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Later in your shader:&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;naga_dot_int2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wrapper function is emitted once at the top of the generated Metal shader and reused throughout. For each concrete type combination (like int2, uint3, long4), naga generates a uniquely named helper function using a mangling scheme: &lt;code&gt;naga_dot_{type}{size}&lt;/code&gt;.&lt;/p&gt;
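&lt;p&gt;As a rough sketch of how a backend might assemble such a helper, here is the naming scheme and wrapper body in Rust. The function names are mine and the output is simplified (e.g. the return type is fixed to &lt;code&gt;int&lt;/code&gt;); this is not the actual naga implementation.&lt;/p&gt;

```rust
// Illustrative sketch of emitting a dot-product helper following the
// naga_dot_{type}{size} naming scheme described above. Simplified:
// the return type is hardcoded to `int` for brevity.
fn emit_dot_wrapper(scalar: &str, size: u8) -> String {
    let name = format!("naga_dot_{}{}", scalar, size);
    let vec_ty = format!("metal::{}{}", scalar, size);
    // Build the component-wise sum: a.x * b.x + a.y * b.y + ...
    let comps = ["x", "y", "z", "w"];
    let body: Vec<String> = comps[..size as usize]
        .iter()
        .map(|c| format!("a.{c} * b.{c}"))
        .collect();
    format!(
        "int {name}({vec_ty} a, {vec_ty} b) {{\n    return ({});\n}}",
        body.join(" + ")
    )
}

fn main() {
    let w = emit_dot_wrapper("int", 2);
    assert!(w.contains("naga_dot_int2"));
    assert!(w.contains("a.x * b.x + a.y * b.y"));
    println!("{w}");
}
```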

&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;This might seem like a small optimization, but it illustrates a crucial point about how shader translation works in practice. Naga isn't just doing a mechanical find-and-replace translation. It's actively making decisions about how to best represent WGSL semantics in each target language, accounting for the quirks and capabilities of different backends.&lt;/p&gt;

&lt;p&gt;These optimizations happen transparently. You never have to write different shader code for different platforms. You don't even have to know that Metal's dot() function has this limitation. Naga handles the complexity so you can focus on writing your graphics code.&lt;/p&gt;

&lt;p&gt;This is the power of having a dedicated translation layer like naga. It's not just making wgpu cross-platform—it's making it intelligently cross-platform, generating efficient, idiomatic code for each backend while maintaining perfect semantic equivalence with the source WGSL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reflection
&lt;/h3&gt;

&lt;p&gt;As I stated at the start, I didn't expect to fall into this rabbit hole (yes, I'm very prone to these). This project is the first time I've been exposed to writing Rust in production! And honestly, it wasn't as bad as I thought. By far the best thing Rust does, in my opinion, is pattern matching, which is very important when writing compilers, since you often want to map different data structures/translations 1:1.&lt;/p&gt;

&lt;p&gt;There are more issues I'll definitely look forward to tackling in this project, as my experience was surprisingly smooth. In practical terms, build times were much shorter than I'm used to, Cargo is perhaps the best developer tooling I've worked with, and in general the Rust ecosystem provides a ton of quality of life compared to C++.&lt;/p&gt;

&lt;p&gt;Just to wrap up, it feels pretty good to contribute to these projects knowing that this technology has the potential to be used by millions of users. I believe I highlighted this in my first blog, but it's certainly how I feel about most projects I get into. One thing is for sure: I won't stop exploring OSS projects in the short term. It's quite enriching to interact with developers across different domains :).&lt;/p&gt;

</description>
    </item>
    <item>
      <title>OSD600: Lab 6 — Repomix's token count tree feature and its prototyping in Rust</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Tue, 28 Oct 2025 03:12:38 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/osd600-lab-6-repomixs-token-count-tree-feature-and-its-prototyping-in-rust-2a2m</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/osd600-lab-6-repomixs-token-count-tree-feature-and-its-prototyping-in-rust-2a2m</guid>
      <description>&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I analyzed how Repomix implements its token-count-based repository insights—especially the Token Count Tree—and prototyped an analogous capability in my Rust CLI tool. My prototype adds two insights to the summary section:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language breakdown by file extension (files, lines, MB, % of total lines)&lt;/li&gt;
&lt;li&gt;Top files by line count (a quick “hotspot” view)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While Repomix focuses on token counts (via OpenAI’s tiktoken models), I started with line counts to deliver a fast, useful proof-of-concept with zero external dependencies. Below are my reading notes, code references, and design takeaways.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the Feature?
&lt;/h2&gt;

&lt;p&gt;Repomix exposes “Token Count Optimization” and a “token count tree” visualization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docs: README → Token Count Optimization

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/yamadashy/repomix" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix&lt;/a&gt; (search for “Token Count Tree”)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;CLI options include &lt;code&gt;--token-count-tree&lt;/code&gt; and summary knobs like &lt;code&gt;--top-files-len&lt;/code&gt;
&lt;/li&gt;

&lt;li&gt;Output integrates token metrics into summaries and trees, enabling threshold filtering and hotspot discovery&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where It Lives in Repomix
&lt;/h2&gt;

&lt;p&gt;While Repomix is a TypeScript project, the architecture is modular. The token counting and metrics are implemented across a few core modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Token counting metrics&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;src/core/metrics/TokenCounter.ts&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/metrics/TokenCounter.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/metrics/TokenCounter.ts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;src/core/metrics/tokenCounterFactory.ts&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/metrics/tokenCounterFactory.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/metrics/tokenCounterFactory.ts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;src/core/metrics/calculateMetrics.ts&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/metrics/calculateMetrics.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/metrics/calculateMetrics.ts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;src/core/tokenCount/buildTokenCountStructure.ts&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/tokenCount/buildTokenCountStructure.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/tokenCount/buildTokenCountStructure.ts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Output generation (where metrics flow into formatted output)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;src/core/output/outputGenerate.ts&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/output/outputGenerate.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/output/outputGenerate.ts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;src/core/output/outputSort.ts&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/output/outputSort.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/output/outputSort.ts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Configuration&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;README “Configuration Options” → output.topFilesLength, output.tokenCountTree, tokenCount.encoding, etc.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works (High-Level)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A TokenCounter abstracts tokenization and can be configured for different encodings (e.g., o200k_base for GPT-4o).&lt;/li&gt;
&lt;li&gt;Repomix builds a structured view of token counts per file and per directory (buildTokenCountStructure.ts), enabling tree visualization and threshold filtering.&lt;/li&gt;
&lt;li&gt;Metrics are computed centrally (calculateMetrics.ts), then consumed by output generators to render in XML/Markdown/JSON.&lt;/li&gt;
&lt;li&gt;Configuration flags control whether to include summaries, how many top files to display, and whether to include a token count tree.&lt;/li&gt;
&lt;/ul&gt;
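&lt;p&gt;As a rough analogy in Rust (Repomix's real code is TypeScript, and these trait/struct names are mine, not Repomix's), the pluggable-counter design might look like this:&lt;/p&gt;

```rust
// Analogous sketch of Repomix's TokenCounter + factory design.
// Names here are illustrative; the real project is TypeScript.
trait TokenCounter {
    fn count(&self, text: &str) -> usize;
}

// A naive stand-in "encoding": whitespace-separated words.
// Repomix plugs in real tiktoken encodings (e.g. o200k_base) instead.
struct WordCounter;

impl TokenCounter for WordCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

// Factory keyed on an encoding name, mirroring the role of
// tokenCounterFactory.ts: unknown encodings are rejected.
fn make_counter(encoding: &str) -> Option<Box<dyn TokenCounter>> {
    match encoding {
        "words" => Some(Box::new(WordCounter)),
        _ => None,
    }
}

fn main() {
    let counter = make_counter("words").expect("known encoding");
    println!("{} tokens", counter.count("fn main() {}"));
}
```

Keeping the counter behind a trait is what lets the metrics code stay independent of any particular tokenizer, which matches the separation of concerns noted above.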

&lt;h2&gt;
  
  
  What I Learned Reading the Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Separation of Concerns: Token counting is isolated from output generation via metrics modules. This keeps formatting logic simple and makes metrics reusable.&lt;/li&gt;
&lt;li&gt;Pluggable Encodings: The factory for TokenCounter makes it easy to switch models/encodings.&lt;/li&gt;
&lt;li&gt;Tree Building: The token count tree is computed from a structured aggregation (not rendered-on-the-fly), which simplifies sorting and thresholding.&lt;/li&gt;
&lt;li&gt;Config-Driven Output: The same metrics flow to multiple output styles with minimal branching.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Strategies I Used to Read the Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub UI browsing of src/core/metrics, src/core/tokenCount, and src/core/output&lt;/li&gt;
&lt;li&gt;Skimmed README to map CLI flags to modules&lt;/li&gt;
&lt;li&gt;Targeted searches (file names and keywords like TokenCounter, tokenCountTree) to find implementations&lt;/li&gt;
&lt;li&gt;Cross-referenced type names between modules to follow data flow&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prototype in My Rust Tool
&lt;/h2&gt;

&lt;p&gt;I implemented a fast proof-of-concept using line counts instead of tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File: src/output.rs&lt;/li&gt;
&lt;li&gt;Functionality added in the Summary section:

&lt;ul&gt;
&lt;li&gt;Language breakdown (by file extension): files, lines, bytes, and % of total lines&lt;/li&gt;
&lt;li&gt;Top files by lines (first 10)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Rationale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero new deps; leverages existing FileContext (with lines and size)&lt;/li&gt;
&lt;li&gt;Mirrors Repomix’s “top files” and “token-tree” spirit with a simpler metric&lt;/li&gt;
&lt;li&gt;Stable prototype surface for future evolution to real token counting&lt;/li&gt;
&lt;/ul&gt;
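&lt;p&gt;The core of those two insights can be sketched roughly as follows. This is a simplified stand-in, not the actual src/output.rs code: a file is reduced to a (path, line_count) pair and "language" is just the file extension.&lt;/p&gt;

```rust
use std::collections::HashMap;

// Simplified sketch of the two summary insights. A file is just
// (path, line_count); language detection is the raw extension
// (a path with no dot falls through as-is, which is good enough
// for a sketch).
fn language_breakdown(files: &[(&str, u64)]) -> HashMap<String, (u64, u64)> {
    let mut by_ext: HashMap<String, (u64, u64)> = HashMap::new();
    for (path, lines) in files {
        let ext = path.rsplit('.').next().unwrap_or("unknown").to_string();
        let entry = by_ext.entry(ext).or_insert((0, 0));
        entry.0 += 1;     // file count
        entry.1 += lines; // total lines
    }
    by_ext
}

// "Hotspot" view: top N files by line count, descending.
fn top_files<'a>(files: &[(&'a str, u64)], n: usize) -> Vec<(&'a str, u64)> {
    let mut sorted: Vec<_> = files.to_vec();
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    sorted.truncate(n);
    sorted
}

fn main() {
    let files = [("src/main.rs", 120u64), ("src/lib.rs", 300), ("README.md", 40)];
    let breakdown = language_breakdown(&files);
    assert_eq!(breakdown["rs"], (2, 420));
    let top = top_files(&files, 2);
    assert_eq!(top[0], ("src/lib.rs", 300));
}
```

Swapping lines for tokens later only means replacing the `u64` line count with a tokenizer call, which is why starting with line counts made for a stable prototype surface.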

&lt;h2&gt;
  
  
  Next Steps (Planned Enhancements)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Add CLI/config knobs to toggle these sections and set list lengths (e.g., --top-files-len)&lt;/li&gt;
&lt;li&gt;Improve language detection (map extensions to canonical names; optionally integrate linguist-like detection)&lt;/li&gt;
&lt;li&gt;Introduce token counting via a Rust tokenizer (tiktoken-rs or tokenizers), behind a feature flag&lt;/li&gt;
&lt;li&gt;Add a “Line Count Tree” with optional threshold to mirror Repomix’s token-count-tree UX, then later swap lines→tokens&lt;/li&gt;
&lt;li&gt;Expand tests to cover new summary content deterministically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Much of this is already being tracked in the repository's &lt;a href="https://github.com/RiverDave/rust-cli-tool/issues" rel="noopener noreferrer"&gt;issues&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Which tokenizer/encoding should we support first (o200k_base vs cl100k_base)?&lt;/li&gt;
&lt;li&gt;How should we handle binary and generated files in metrics? (Repomix excludes large/binary files by default)&lt;/li&gt;
&lt;li&gt;Where to surface metrics in non-Markdown outputs (JSON, plain)?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repomix repo: &lt;a href="https://github.com/yamadashy/repomix" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Token Counter: &lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/metrics/TokenCounter.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/metrics/TokenCounter.ts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Token structure: &lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/tokenCount/buildTokenCountStructure.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/tokenCount/buildTokenCountStructure.ts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Output generator: &lt;a href="https://github.com/yamadashy/repomix/blob/main/src/core/output/outputGenerate.ts" rel="noopener noreferrer"&gt;https://github.com/yamadashy/repomix/blob/main/src/core/output/outputGenerate.ts&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rust</category>
      <category>cli</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>A High-Level Overview of Address Spaces: Their Place in ClangIR and LLVM</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Fri, 24 Oct 2025 23:04:55 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/a-high-level-overview-of-address-spaces-their-place-in-clangir-5e0k</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/a-high-level-overview-of-address-spaces-their-place-in-clangir-5e0k</guid>
      <description>&lt;p&gt;For a couple of months now, I've been on a mission to get immersed in a highly impactful open source project. As a student, getting into compilers has always been the kind of thing that sparked my interest — ever since building a programming language from scratch about a year ago, powered by LLVM and other somewhat esoteric dependencies like GNU Bison. I've been wanting to get into the trenches and see how production-level development is really done by the big companies.&lt;/p&gt;

&lt;p&gt;ClangIR was one of the projects that caught my attention. Here's a brief overview:&lt;/p&gt;

&lt;p&gt;ClangIR essentially aims to modernize Clang's well-established codegen pipeline by representing high-level semantics through its own dialect in MLIR. One of the main problems with the current state of the pipeline is that LLVM IR by itself is very low-level — it's mainly designed around representing a program to be run on a CPU, and it often drops high-level constructs like polymorphism, STL concepts, coroutines, and similar features. Preserving these high-level semantics will potentially enable better optimizations and even better diagnostics through MLIR's capability of attaching AST nodes directly to the IR.&lt;/p&gt;

&lt;p&gt;If you'd like to know more, I highly recommend watching this LLVM talk, which covers the fundamentals and motivation behind the project:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/XNOPO3ogdfQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;I've been contributing to this project since summer, mainly involved in porting X86 intrinsics and modeling them in the CIR dialect. I've been helping with certain lowering fixes related to these and have explored the structure of the CUDA runtime from Clang's perspective.&lt;/p&gt;

&lt;p&gt;In the context of this blog, I'd like to emphasize ClangIR's potential to also make it easier to model C/C++-derived programming languages (mainly CUDA, HIP, OpenACC, and many more!) whose offloading models are directly represented in some of the core upstream MLIR dialects — see: &lt;a href="https://mlir.llvm.org/docs/Dialects/" rel="noopener noreferrer"&gt;MLIR Dialects&lt;/a&gt;. We have the potential of connecting these front-end representations to the robust infrastructure that revolves around MLIR.&lt;/p&gt;

&lt;p&gt;Note also that I might use compiler jargon and present some TableGen, CIR, and LLVM IR snippets along the way.&lt;/p&gt;

&lt;p&gt;To me, this looked like a strong enough reason to get into the project. On paper, I'd be contributing to an upstream LLVM subproject (this is gold), which has the potential to revolutionize the current state of C++. As a student, this is such an amazing opportunity because of the learning possibilities I strongly value. The cool thing about this whole process is that we already have a reference for how to perform this implementation — the Codegen library has been around for roughly 15 years and in most cases is our main source of truth. We also have a wide variety of tests to prove the equivalence of the IR we generate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Offloading Programming Languages and the Concept of Address Spaces
&lt;/h3&gt;

&lt;p&gt;As we've witnessed the slowdown of Moore's Law over the last 20 years, we've seen the necessity of relying on different hardware to perform computations beyond traditional CPUs. In some of the programming models that revolve around targeting heterogeneous hardware, the developer is free to choose whether a function is executed on the host (CPU) or the device (some other accelerator). Conceptually, this necessity arose because of the slowdown in single-core performance gains over recent years. I found this very interesting post that dates back to 2015 documenting this trend: &lt;a href="https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/" rel="noopener noreferrer"&gt;https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filzl03fxmeuq0sq5qb4j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filzl03fxmeuq0sq5qb4j.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in 2007, the invention of CUDA made us realize how important and powerful offload models can become. In heterogeneous computing, the concept of host and device is very relevant — certain programs may benefit from performing computations with low data dependency on the device (GPU), while others may be faster on traditional CPUs. &lt;/p&gt;

&lt;p&gt;For GPUs, take matrix multiplication as an example: while mathematically the order in which products are accumulated is irrelevant, under IEEE 754 floating-point arithmetic the order does affect the result due to rounding. GPUs exploit parallelism by distributing these computations across threads, which can lead to small numerical differences depending on execution order.&lt;/p&gt;
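&lt;p&gt;The ordering effect is easy to demonstrate even on a CPU. This tiny example (mine, using f32 arithmetic) shows that regrouping one addition changes the result:&lt;/p&gt;

```rust
// Floating-point addition is not associative: grouping changes
// rounding under IEEE 754, so a parallel reduction (as on a GPU)
// can give a slightly different sum than a sequential loop.
fn sum_left(a: f32, b: f32, c: f32) -> f32 {
    (a + b) + c
}

fn sum_right(a: f32, b: f32, c: f32) -> f32 {
    a + (b + c)
}

fn main() {
    let (a, b, c) = (1.0e8_f32, -1.0e8_f32, 1.0e-3_f32);
    // (a + b) cancels exactly to 0.0, so the tiny term survives...
    assert_eq!(sum_left(a, b, c), 1.0e-3);
    // ...but (b + c) rounds back to -1.0e8 (1e-3 is far below the
    // spacing between f32 values near 1e8), so the tiny term is lost.
    assert_eq!(sum_right(a, b, c), 0.0);
}
```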

&lt;p&gt;On the other hand, for CPUs: tasks with very sequential control flow, operations on small data sets, memory-intensive tasks with complex branching, and many more may benefit from running on the CPU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main_offload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Allocate memory accessible by both CPU/GPU (Unified Memory)&lt;/span&gt;
    &lt;span class="n"&gt;cudaMallocManaged&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;cudaMallocManaged&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;cudaMallocManaged&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Offload/Execute on GPU&lt;/span&gt;
    &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. Wait for GPU and access results on CPU&lt;/span&gt;
    &lt;span class="n"&gt;cudaDeviceSynchronize&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; 

    &lt;span class="c1"&gt;// 4. Cleanup&lt;/span&gt;
    &lt;span class="n"&gt;cudaFree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;cudaFree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;cudaFree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Address Spaces Matter
&lt;/h3&gt;

&lt;p&gt;This is where address spaces come into play. Compilers need an abstract way of reasoning about where memory is located — not just for correctness, but also for potential optimizations. Think about it: if the compiler knows a pointer refers to GPU memory versus CPU memory, it can make very different decisions about caching, prefetching, and access patterns.&lt;/p&gt;

&lt;p&gt;Pointers are conceptually the best candidate to hold this information, and that's exactly how it's modelled in LLVM. By attaching address space information to pointer types, we give the compiler the context it needs to generate efficient code for heterogeneous systems.&lt;/p&gt;

&lt;p&gt;Notice how address spaces are bound to pointers in this snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight llvm"&gt;&lt;code&gt;
&lt;span class="k"&gt;define&lt;/span&gt; &lt;span class="k"&gt;dso_local&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@_Z3fooPU3AS1i&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noundef&lt;/span&gt; &lt;span class="nv"&gt;%arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="vg"&gt;#0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="nl"&gt;entry:&lt;/span&gt;
  &lt;span class="nv"&gt;%arg.addr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;alloca&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
  &lt;span class="k"&gt;store&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;%arg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="nv"&gt;%arg.addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
  &lt;span class="k"&gt;ret&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Two Flavors: Language-Specific vs. Target Address Spaces
&lt;/h3&gt;

&lt;p&gt;LLVM presents two types of address spaces: &lt;strong&gt;Language-Specific&lt;/strong&gt; and &lt;strong&gt;Target&lt;/strong&gt;. In the following table, I present their fundamental differences:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Language-Specific Address Space&lt;/th&gt;
&lt;th&gt;Target Address Space&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Level of Abstraction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High-level concept used by a front-end like Clang to represent memory qualifiers from the source code.&lt;/td&gt;
&lt;td&gt;Low-level implementation detail defined by the backend, representing actual physical memory regions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;To provide language-specific alias information and guide front-end optimizations, such as mapping &lt;code&gt;local&lt;/code&gt; and &lt;code&gt;global&lt;/code&gt; memory in OpenCL.&lt;/td&gt;
&lt;td&gt;To represent distinct hardware memory spaces for code generation and target-specific optimizations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Representation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stored as language-defined enumerations, like &lt;code&gt;clang::LangAS&lt;/code&gt; in Clang, with specific names (e.g., &lt;code&gt;opencl_local&lt;/code&gt;).&lt;/td&gt;
&lt;td&gt;Stored as integer identifiers in the LLVM IR, whose meaning is interpreted by the backend.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mapping&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The front-end maps the language-specific address spaces to integer identifiers in the LLVM IR.&lt;/td&gt;
&lt;td&gt;The target backend provides the semantic meaning for the integer identifiers. For a CPU, all may map to address space 0. For a GPU, different integers map to distinct memory regions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The OpenCL &lt;code&gt;__local&lt;/code&gt; keyword is a language-specific address space that indicates memory is shared by a workgroup.&lt;/td&gt;
&lt;td&gt;On a GPU, the &lt;code&gt;__local&lt;/code&gt; address space might map to a target-specific integer, such as &lt;code&gt;3&lt;/code&gt;, that represents on-chip, workgroup-shared memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
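&lt;p&gt;To make the last two rows concrete, here is a hedged sketch of what the OpenCL &lt;code&gt;__local&lt;/code&gt; qualifier could look like once lowered to LLVM IR on a GPU target where the backend treats address space 3 as workgroup-shared memory (the exact integer is target-defined, so treat it as an assumption):&lt;/p&gt;

```llvm
; Hypothetical lowering: a kernel parameter declared __local in OpenCL
; arrives in LLVM IR as a pointer in address space 3 on this target.
define void @kernel(ptr addrspace(3) %wg_buf) {
entry:
  ; The backend knows stores through %wg_buf hit on-chip shared memory,
  ; so it can select the matching hardware instructions.
  store i32 0, ptr addrspace(3) %wg_buf, align 4
  ret void
}
```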

&lt;h3&gt;
  
  
  The Implementation Journey
&lt;/h3&gt;

&lt;p&gt;With this high-level overview, let's talk about the process of implementing this. I have to note first that address spaces were already implemented in the &lt;a href="https://github.com/llvm/clangir" rel="noopener noreferrer"&gt;incubator&lt;/a&gt;, but we went through an interesting redesign after receiving some amazing feedback from the maintainers. Some of the code had to be drastically changed — and honestly, I think that was the best part of this process. I didn't need to blindly follow whatever we had implemented a couple of months ago. Rather, I had to come up with reasonable solutions that were going to go through a significant feedback process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Starting with Target Address Spaces
&lt;/h3&gt;

&lt;p&gt;Initially, we implemented target address spaces. While I could expand on the details of the decisions we debated and what we ended up shipping, I'd rather show you the structure of the address space attribute and its place in the pointer type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;//===----------------------------------------------------------------------===//&lt;/span&gt;
&lt;span class="c1"&gt;// TargetAddressSpaceAttr&lt;/span&gt;
&lt;span class="c1"&gt;//===----------------------------------------------------------------------===//&lt;/span&gt;

&lt;span class="n"&gt;def&lt;/span&gt; &lt;span class="n"&gt;CIR_TargetAddressSpaceAttr&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CIR_Attr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;"TargetAddressSpace"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                         &lt;span class="s"&gt;"target_address_space"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;let&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Represents a target-specific numeric address space"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;let&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;TargetAddressSpaceAttr&lt;/span&gt; &lt;span class="n"&gt;represents&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;specific&lt;/span&gt; &lt;span class="n"&gt;numeric&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt; &lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;corresponding&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;LLVM&lt;/span&gt; &lt;span class="n"&gt;IR&lt;/span&gt; &lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="n"&gt;addressspace&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt; &lt;span class="n"&gt;qualifier&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;clang&lt;/span&gt;
    &lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="n"&gt;address_space&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt; &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;zero&lt;/span&gt; &lt;span class="n"&gt;represents&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt; &lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;semantics&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;non&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;zero&lt;/span&gt;
    &lt;span class="n"&gt;address&lt;/span&gt; &lt;span class="n"&gt;spaces&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;specific&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="nl"&gt;Example:&lt;/span&gt;
    &lt;span class="err"&gt;```&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;mlir&lt;/span&gt;
    &lt;span class="c1"&gt;// Target-specific numeric address spaces&lt;/span&gt;
    &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;cir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="n"&gt;s32i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;cir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;!&lt;/span&gt;&lt;span class="n"&gt;s32i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addrspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="err"&gt;```&lt;/span&gt;
  &lt;span class="p"&gt;}];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's how it looks in practice with actual C code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;__attribute__&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;address_space&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And its respective lowering to CIR — notice how the address space is attached to the pointer types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cir.func dso_local @_Z3fooPU3AS1i(%arg0: !cir.ptr&amp;lt;!s32i, target_address_space(1)&amp;gt; loc(fused[#loc3, #loc4])) inline(never) {
  %0 = cir.alloca !cir.ptr&amp;lt;!s32i, target_address_space(1)&amp;gt;, !cir.ptr&amp;lt;!cir.ptr&amp;lt;!s32i, target_address_space(1)&amp;gt;&amp;gt;, ["arg", init] {alignment = 8 : i64} loc(#loc8)
  cir.store %arg0, %0 : !cir.ptr&amp;lt;!s32i, target_address_space(1)&amp;gt;, !cir.ptr&amp;lt;!cir.ptr&amp;lt;!s32i, target_address_space(1)&amp;gt;&amp;gt; loc(#loc5)
  cir.return loc(#loc6)
} loc(#loc7)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The attribute flows through the entire representation — from the function parameter, to the allocation, to the store operation. This consistency is crucial for maintaining correctness throughout the compilation pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Journey: From Concept to Code
&lt;/h2&gt;

&lt;p&gt;The minimum viable goal I set for myself was straightforward: ensure that address spaces are correctly represented in the underlying IR when code flows through the CIR pipeline. Sounds simple, right? But once you start implementing an attribute in MLIR, you realize there's a whole chain of decisions to make:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, how should the assembly format look? You need to design syntax that's both human-readable and consistent with MLIR conventions. &lt;strong&gt;Second&lt;/strong&gt;, how will your attribute be parsed? The parser needs to handle the syntax you've designed and convert it into the internal representation. &lt;strong&gt;Third&lt;/strong&gt;, and this is where things get really interesting in the context of Clang — we're consuming information directly from the AST (Abstract Syntax Tree). So the question becomes: how do we bridge that gap? How do we transform high-level AST nodes carrying address space information into the concrete CIR operations we need to generate?&lt;/p&gt;
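&lt;p&gt;To make the first two questions concrete, here is a simplified, hypothetical sketch (not the code we actually landed; all names are illustrative) of how an MLIR attribute declares its storage and assembly format in TableGen ODS:&lt;/p&gt;

```tablegen
// Hypothetical attribute: one integer parameter plus a declarative
// assembly format. ODS generates the C++ class, parser, and printer.
def MyAddrSpaceAttr : AttrDef&amp;lt;My_Dialect, "MyAddrSpace"&amp;gt; {
  let mnemonic = "my_addrspace";
  // Storage: a single unsigned integer identifying the space.
  let parameters = (ins "unsigned":$value);
  // Syntax: prints as #my_dialect.my_addrspace&amp;lt;1&amp;gt;
  let assemblyFormat = "`&amp;lt;` $value `&amp;gt;`";
}
```

&lt;p&gt;The third question, bridging from the Clang AST, then becomes a matter of reading the qualifier off the AST type and constructing this attribute while building the CIR pointer type.&lt;/p&gt;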

&lt;p&gt;These were the core challenges I tackled during this contribution. It's the kind of work that takes you on a journey from a high-level language construct all the way down to its underlying structural representation — exactly the kind of compiler work I find fascinating.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Learned: Navigating Large Codebases
&lt;/h3&gt;

&lt;p&gt;I think one of the most valuable skills I developed through this process is learning to navigate massive codebases. Clang and LLVM are &lt;em&gt;huge&lt;/em&gt; — we're talking millions of lines of code built up over more than two decades. Finding the right abstraction, understanding where your changes fit into the existing architecture, and tracing how data flows through multiple layers of transformation — these are skills you can only develop by doing. And honestly, this experience will stick with me forever.&lt;/p&gt;

&lt;p&gt;For anyone interested in the technical details and implementation specifics, I'd highly recommend checking out my PR: &lt;a href="https://github.com/llvm/llvm-project/pull/161028" rel="noopener noreferrer"&gt;https://github.com/llvm/llvm-project/pull/161028&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scope and Future Work
&lt;/h3&gt;

&lt;p&gt;I want to be transparent: what we landed covers just a foundational portion of what address spaces truly encompass. There's a lot I haven't tackled yet — the implementation of language-specific address spaces, how conversions work between different address spaces (&lt;a href="https://github.com/llvm/llvm-project/pull/161212" rel="noopener noreferrer"&gt;I'm almost done with this&lt;/a&gt;), how they interact with type qualifiers, and how they propagate through even lower layers of abstraction during LLVM IR generation and, eventually, code generation.&lt;/p&gt;

&lt;p&gt;I kept the scope manageable for a reason. Between school, other commitments, and this being a side project, I haven't been able to immerse myself in this full-time as much as I'd like (yeah, school has definitely been taking a toll on me lately). But I'm proud of what we shipped, and it lays the groundwork for future iterations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reflections on Open Source
&lt;/h3&gt;

&lt;p&gt;Just to wrap up, I want to highlight something that still feels surreal to me: the power of open source as a learning platform. Through this project, I've had the opportunity to collaborate with engineers from AMD, NVIDIA, and Meta — people who work on production compilers and toolchains used by millions. &lt;/p&gt;

&lt;p&gt;If you had asked me a year ago whether I'd have the skills or confidence to contribute to a project like this, to engage in design discussions with industry professionals, or to have my code reviewed by compiler experts — I probably wouldn't have believed it was possible. Yet here we are.&lt;/p&gt;

&lt;p&gt;That's the magic of open source: it doesn't care about your resume or your credentials. It cares about your willingness to learn, your ability to take feedback, and your commitment to shipping quality work. For any student reading this who's on the fence about diving into a large open source project — just do it. The learning curve is steep, but the growth is exponential.&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>programming</category>
      <category>opensource</category>
      <category>cpp</category>
    </item>
    <item>
      <title>OSD600 Lab 5: Git rebases</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Mon, 13 Oct 2025 05:29:53 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/osd600-lab-5-git-rebases-2o4f</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/osd600-lab-5-git-rebases-2o4f</guid>
      <description>&lt;p&gt;This week we went through some foundational topic in git operations and in general a very important concept when contributing to open source...&lt;/p&gt;

&lt;p&gt;Git rebases are essential for maintaining a clean and linear project history. They allow you to integrate changes from one branch into another by applying commits from one branch onto another, which helps in simplifying the commit history and making it easier to follow. Rebasing can also help resolve conflicts early and keep your codebase up-to-date with the latest changes from other branches.&lt;/p&gt;

&lt;p&gt;In my personal experience, I started using rebases in production-ready projects: initially at Kinaxis, and later when working on LLVM, to fold review feedback into a single commit. These, among many others, are the fundamental benefits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rewriting your commits on top of another branch's commits (like syncing to main)&lt;/strong&gt;:&lt;br&gt;
When you want to update your feature branch with the latest changes from the main branch, a rebase allows you to apply your commits on top of the updated main branch. This keeps your commit history linear and avoids unnecessary merge commits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For squashing commits&lt;/strong&gt;:&lt;br&gt;
Git rebases can be used to combine multiple commits into a single, cohesive commit. This is useful for cleaning up your commit history before merging into a main branch, making it easier to review and understand the changes.&lt;/p&gt;
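&lt;p&gt;Both workflows boil down to a couple of commands. Here is a minimal, self-contained sketch you can run in a throwaway directory (repo layout, branch names, and commit messages are all made up for the demo):&lt;/p&gt;

```shell
# Set up a throwaway repository with one commit on main.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "base" > main.txt
git add main.txt
git commit -q -m "base"
git branch -M main

# Branch off and do some feature work.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Meanwhile, main moves ahead.
git checkout -q main
echo "more" >> main.txt
git commit -q -am "main moves ahead"

# Replay the feature commit on top of the updated main:
# no merge commit, and the history stays linear.
git checkout -q feature
git rebase -q main
git log --format=%s   # feature work, main moves ahead, base
```

&lt;p&gt;For the squashing use case, the same command with &lt;code&gt;-i&lt;/code&gt; (&lt;code&gt;git rebase -i main&lt;/code&gt;) opens the todo list where you can mark commits as &lt;code&gt;squash&lt;/code&gt; or &lt;code&gt;fixup&lt;/code&gt;.&lt;/p&gt;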

</description>
      <category>git</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>OSD600 Lab 3: Git merges and parallel branches + Some 0.2 updates</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Sun, 28 Sep 2025 22:59:38 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/osd600-lab-3-git-merges-and-parallel-branches-3e5b</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/osd600-lab-3-git-merges-and-parallel-branches-3e5b</guid>
      <description>&lt;p&gt;This week we went through the process of merging different branches and generally working in parallel with different features. Explicitly  performing merges was something I Was not familiar with. In most of the foss projects I have contributed on, it is pretty much a standard that your code would get into the main branch through a pull request(which, in theory is a merge on approval).&lt;/p&gt;

&lt;p&gt;There are two small features I implemented for my project, represented in the following issues:&lt;br&gt;
&lt;a href="https://github.com/RiverDave/rust-cli-tool/issues/31" rel="noopener noreferrer"&gt;Issue-31&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/RiverDave/rust-cli-tool/issues/32" rel="noopener noreferrer"&gt;Issue-32&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These were pretty straightforward to implement and required minimal effort on my side. Perhaps the most repetitive part was adjusting the tests to account for some new members I added to the core data structures, but aside from that it went very smoothly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The importance of parallel branches&lt;/strong&gt;&lt;br&gt;
It is common that, as developers, we have to tackle multiple things at the same time when working on a project. To maximize our productivity, Git provides a lot of facilities, and branches are the best abstraction for developers to utilize. Imagine a large project where multiple people contribute to the same file. To resolve the clashes that may arise because of this, Git provides a very comprehensive conflict-resolution system, which marks each conflicting region with separators (&lt;code&gt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&lt;/code&gt;, &lt;code&gt;=======&lt;/code&gt;, &lt;code&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;). After that, it is up to us to resolve the conflict.&lt;/p&gt;
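&lt;p&gt;For reference, a conflicted region looks like the snippet below (the file contents and branch name are invented): everything between the markers is the two competing versions, and resolving the conflict means replacing the whole region with the code you actually want before running &lt;code&gt;git add&lt;/code&gt;:&lt;/p&gt;

```
&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; HEAD
int timeout = 30;        // our branch's version
=======
int64_t timeout = 30;    // the incoming branch's version
&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt; feature/widen-types
```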

&lt;p&gt;Conflicts were pretty common when I was working at my co-op at Kinaxis. Sometimes we'd have codebase-wide changes that introduced or replaced certain data types (e.g., replacing &lt;code&gt;int&lt;/code&gt; with &lt;code&gt;int64_t&lt;/code&gt;). Thankfully, nowadays IDEs contain amazing features to make conflict resolution as smooth as possible.&lt;/p&gt;

&lt;p&gt;While none of this is new to me, in the lecture we had last Friday I saw some pretty useful techniques for working with branches that I was not familiar with at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Working in parallel is a fundamental thing that, thankfully, Git abstracts for us amazingly well. It also provides facilities to, for instance, duplicate branches, which lets us experiment further and deliver better-quality code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some brief updates on Release 0.2
&lt;/h2&gt;

&lt;p&gt;This week went amazingly with regard to the projects I'm currently contributing to. For ClangIR: I'll be diving deep into this project, as &lt;a href="https://github.com/llvm/llvm-project/issues/160386#issuecomment-3325379722" rel="noopener noreferrer"&gt;I was allowed&lt;/a&gt; to upstream the work into the main LLVM repo! (I'll expand on what this represents next week.)&lt;/p&gt;

&lt;p&gt;I also spent a decent chunk of my week working on &lt;a href="https://github.com/iree-org/iree" rel="noopener noreferrer"&gt;IREE&lt;/a&gt;, and I was able to &lt;a href="https://github.com/iree-org/iree/issues/22047" rel="noopener noreferrer"&gt;make amazing progress&lt;/a&gt; on one of the issues I chose, with support from the maintainers. While contributing to this project wasn't very straightforward, I DID LEARN A LOT. In fact, it's a blessing that it's been this complicated. (Again, I'll talk more about this next week.)&lt;/p&gt;

&lt;p&gt;Regarding my FOSS journey so far, I can confidently say &lt;strong&gt;I'm helping to shape compiler infrastructure at the age of 21, which is amazing&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>git</category>
      <category>github</category>
      <category>opensource</category>
      <category>devjournal</category>
    </item>
    <item>
      <title>OSD600: Lab 2</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Sat, 20 Sep 2025 02:03:19 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/osd600-lab-2-32ag</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/osd600-lab-2-32ag</guid>
      <description>&lt;p&gt;My initial idea for this blog was to advise on how to navigate large codebases; however, in the blog's coming weeks, when I start contributing to meaningful projects, either that I already know or where I'll do my first contribution, perhaps I'll definitely have that inspiration. I'll keep this one short this time.&lt;/p&gt;

&lt;p&gt;I undertook the task of implementing the &lt;code&gt;--recent&lt;/code&gt; flag specified in Lab 2 in &lt;a href="https://github.com/Abdulgafar4/repo-context-packager" rel="noopener noreferrer"&gt;Abdulgafar's repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I have to give props to him, considering how simple it was to navigate his code. Overall, I found most of it very well written; it keeps things relatively simple and easy to understand for a first-time contributor to the project.&lt;/p&gt;

&lt;p&gt;I also received a &lt;a href="https://github.com/RiverDave/rust-cli-tool/pull/28" rel="noopener noreferrer"&gt;PR&lt;/a&gt; in my repo. I'm very impressed by the quality of the code Parker provided in that PR, especially considering the relative sophistication I had introduced in certain parts of the code. On top of that, there was a CI/CD pipeline aspect involved, as well as adding regression tests along the way. So, in the end, props to him as well.&lt;/p&gt;

&lt;p&gt;Now that Parker has been exposed to that process, I believe I might need to relax some of the testing infrastructure I'm currently enforcing. This is obviously not something you'd do if you were deploying this app to production; however, adding a new feature often requires significant changes to the tests themselves. The main goal is to reduce the friction for other developers who want to contribute.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Why I Chose Rust for My OSD600 Project (And What I Learned)</title>
      <dc:creator>David Rivera</dc:creator>
      <pubDate>Sat, 20 Sep 2025 01:09:01 +0000</pubDate>
      <link>https://dev.to/david_rivera_8d845b35931e/why-i-chose-rust-for-my-osd600-project-and-what-i-learned-5gh1</link>
      <guid>https://dev.to/david_rivera_8d845b35931e/why-i-chose-rust-for-my-osd600-project-and-what-i-learned-5gh1</guid>
      <description>&lt;p&gt;As part of the course OSD600 (Open Source Development), we were asked to create a tool scratch that could provide context to LLMs based on a git repository. The tool should index a codebase and provide the contents, file structure in (initially) Markdown format. I found this pretty cool given we could choose the programming language we wanted.&lt;/p&gt;

&lt;p&gt;Admittedly, this is a toy project. Modern IDEs have exceptional agentic tools that can index huge codebases super fast, with responses tailored to the developer's needs.&lt;/p&gt;

&lt;p&gt;I really wanted to push myself with this project; therefore, I chose Rust. Now, I'd say I'm very proficient in C++, having a solid track record of open-source contributions in a large codebase like LLVM. The only risk this posed was that this is a collaborative course, so my code needs to be extended by other students, which implies several things. Most students nowadays are acquainted with the "pop" programming languages: Python, JavaScript, Java, etc. There's nothing wrong with that; however, moving from those programming paradigms to Rust could pose a significant challenge. Still, I decided to take the leap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rust Programming Language
&lt;/h2&gt;

&lt;p&gt;Rust learns a lot from the C family of languages (C/C++). It not only provides explicit memory management but also provides safety on top of it. Think about it: one of the main improvements to memory allocation in modern C++ is the introduction of smart pointers. Smart pointers are based on the RAII paradigm (Resource Acquisition Is Initialization), which implies that a resource (such as dynamic memory) is acquired in a constructor and released when the corresponding destructor runs. The same principle applies to every object in Rust intrinsically. This allows us to associate each object with a lifetime, which is crucial to avoiding dangling pointers, invalid memory accesses, and other problems all too common in C/C++.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MyResource&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Drop&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;MyResource&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"My resource is being cleaned up!"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MyResource&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"My resource is in use."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;// The _r variable goes out of scope here, and `drop` is automatically called.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I will not go into detail on every single Rust "goodie"; however, the example above illustrates how the developer experience improves when transitioning from C++ to Rust. In fact, I skipped perhaps the most important feature, the borrow checker, but as I mentioned earlier, I do not want to overcomplicate things.&lt;/p&gt;
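&lt;p&gt;That said, here's a minimal sketch of what the borrow checker enforces (my own illustration, not code from the project): at any point you may hold either any number of shared references or exactly one mutable reference, never both.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;fn main() {
    let mut s = String::from("hello");

    let r = &amp;amp;s;           // shared borrow of `s`
    println!("{}", r);       // last use of `r`; its borrow ends here

    let m = &amp;amp;mut s;       // an exclusive borrow is now allowed
    m.push_str(", world");
    // println!("{}", r);    // would not compile: `r` cannot be used
    //                       // across the exclusive borrow of `s`

    println!("{}", s);       // prints "hello, world"
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The commented-out line is rejected at compile time, which is exactly the class of dangling-reference bug that C++ would only reveal at runtime, if at all.&lt;/p&gt;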

&lt;h3&gt;
  
  
  My previous experience with the language
&lt;/h3&gt;

&lt;p&gt;I had written some casual programs in Rust before, and it was certainly a pleasant experience. Although it sometimes took a bit longer to get a prototype working, that's expected given the language's design. &lt;a href="https://github.com/RiverDave/Pong" rel="noopener noreferrer"&gt;In one of these small projects&lt;/a&gt;, I tried to replicate the famous game Pong using the SDL2 library. &lt;a href="https://github.com/RiverDave/Cool-Proj" rel="noopener noreferrer"&gt;In another project&lt;/a&gt;, I attempted to write a lexer generator; I eventually dropped it once I started investing my time heavily in the open-source projects that interest me.&lt;/p&gt;

&lt;p&gt;To conclude this section: what exactly did I expect from the language? A smooth development experience, without the weird stack traces and obscure error messages my compiler throws at me when writing C++. I can say in advance that it definitely met my expectations. It may sound shallow, but most of the time, once my code compiled, it ran without any memory-related problems.&lt;/p&gt;

&lt;p&gt;Now, let's talk about the tool itself.&lt;/p&gt;

&lt;p&gt;Every "serious" project requires careful planning. One thing I had in my mind for sure is that I didn't really want to write things from scratch altogether. Certain things like command line argument parsing and such are amazing in the Rust ecosystem. I'll denote the Crates I utilized and its functionalities and relevance in my project.&lt;/p&gt;

&lt;p&gt;Core CLI &amp;amp; Framework&lt;br&gt;
• &lt;a href="https://crates.io/crates/clap" rel="noopener noreferrer"&gt;clap&lt;/a&gt; - Command-line argument parsing and help generation.&lt;/p&gt;

&lt;p&gt;Git Integration&lt;br&gt;
• &lt;a href="https://crates.io/crates/git2" rel="noopener noreferrer"&gt;git2&lt;/a&gt; - Git repository operations and metadata extraction&lt;br&gt;
• &lt;a href="https://crates.io/crates/chrono" rel="noopener noreferrer"&gt;chrono&lt;/a&gt; - Date/time formatting for git commit timestamps&lt;/p&gt;

&lt;p&gt;File Processing&lt;br&gt;
• &lt;a href="https://crates.io/crates/globset" rel="noopener noreferrer"&gt;globset&lt;/a&gt; - Pattern matching for file filtering (&lt;em&gt;.rs, src/&lt;/em&gt;*) &lt;br&gt;
• &lt;a href="https://crates.io/crates/ptree" rel="noopener noreferrer"&gt;ptree&lt;/a&gt; - Directory tree visualization in terminal output (loved this one)&lt;/p&gt;

&lt;p&gt;In the following diagram, I'll demonstrate how the main logic and these dependencies were glued together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp26ohscig26vv7jgjor.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp26ohscig26vv7jgjor.png" alt=" " width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'd like to mention that, out of all the dependencies assembled, none was particularly difficult to use. My development experience was very smooth, and perhaps the largest portion of my time was spent thinking about how to write tests for each isolated piece of functionality I added. I'm aware that tests and a CI/CD pipeline were not required; however, as I believe I mentioned earlier, they're the best tools we have to keep track of the changes we make as we progress.&lt;/p&gt;
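&lt;p&gt;To give a flavour of those isolated tests, here's a hedged sketch (the helper name is hypothetical; the real tool matches patterns with globset rather than a bare ends_with):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Hypothetical helper: keep only paths ending in `.rs`.
fn filter_rust_files(paths: &amp;amp;[&amp;amp;str]) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    paths
        .iter()
        .filter(|p| p.ends_with(".rs"))
        .map(|p| p.to_string())
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn keeps_only_rust_sources() {
        let input = ["src/main.rs", "README.md", "src/lib.rs"];
        assert_eq!(filter_rust_files(&amp;amp;input), vec!["src/main.rs", "src/lib.rs"]);
    }
}

fn main() {
    // prints ["src/main.rs"]
    println!("{:?}", filter_rust_files(&amp;amp;["src/main.rs", "Cargo.toml"]));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Each small, pure function like this gets its own test, which is what kept the CI pipeline cheap to maintain.&lt;/p&gt;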

&lt;p&gt;&lt;strong&gt;Lessons learned and the road ahead&lt;/strong&gt;: As I highlighted at the very beginning of this blog, it's always good to push yourself out of your comfort zone. I realize I could have spent less time by writing this in a simpler way. I believe great thinkers, engineers, and technical people expose themselves to many technologies to build a solid foundation for their knowledge. Using Rust for this assignment will help me strengthen my systems programming skills, and who knows, I may want to contribute to the Rust compiler at some point :).&lt;/p&gt;

&lt;p&gt;The foundation of this tool is now set, and anyone willing to contribute to my project will always have my support. I can't hide my excitement: the following weeks are where the fun begins. &lt;a href="https://github.com/iree-org/iree" rel="noopener noreferrer"&gt;IREE&lt;/a&gt; and other great projects that align with my skill set are the next stop in my open-source journey. &lt;strong&gt;It's time to make compilers great&lt;/strong&gt;.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
