What is Intermediate Representation - A Gist

#beginners #llvm #ir

Here's a problem statement: What if you have a programming language specification and want to write a compiler for it? What about if you have two languages? Or, three? Or more?

While languages differ from one another, they all share similarities under the hood. Some differences are superficial, such as basic syntax and file structure. Others run deeper, such as variable hoisting rules, lifetimes and garbage collection. But when everything is stripped away, your code eventually has to be converted into something machine-readable.

Take the same program written in C++ and in JavaScript:

#include <iostream>
#include <string>

int main(void) {
    const std::string message = "Hello, world!";
    std::cout << message << std::endl;

    return 0;
}

const message = 'Hello, world!';
console.log(message);

These two snippets look very different, and that's because C++ and JavaScript are about as far apart as mainstream languages get (short of, perhaps, a functional language) without going into the deep end. And yet, an ideal IR aims to represent both in a single, complete and lossless format.

So, why not try to write a generic compiler for everything? A refined version of this idea can be implemented using an intermediate representation, or IR.

What is IR?

Contrary to popular belief, an IR is not necessarily a programming language. It can be either a data structure (such as an abstract syntax tree) or code, used internally by a compiler to represent the original source code.
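As a minimal sketch of the data-structure flavour of IR, here is a tiny tree for arithmetic expressions. The names `Expr`, `literal`, `binary`, `dump` and `example` are made up for illustration and do not come from any real compiler; the point is only that the tree records what the program means, not how it was spelled in the source.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical tree-shaped IR: a node is either an integer literal or a
// binary operation over two child nodes.
struct Expr {
    char op;                    // '\0' for a literal, otherwise '+' or '*'
    int value;                  // meaningful only for literals
    std::unique_ptr<Expr> lhs;  // set only for binary operations
    std::unique_ptr<Expr> rhs;
};

std::unique_ptr<Expr> literal(int value) {
    return std::unique_ptr<Expr>(new Expr{'\0', value, nullptr, nullptr});
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> lhs,
                             std::unique_ptr<Expr> rhs) {
    return std::unique_ptr<Expr>(new Expr{op, 0, std::move(lhs), std::move(rhs)});
}

// Render the tree in a prefix notation that is independent of source syntax.
std::string dump(const Expr& e) {
    if (e.op == '\0') return std::to_string(e.value);
    assert(e.lhs && e.rhs);  // binary nodes always carry both operands
    return "(" + std::string(1, e.op) + " " + dump(*e.lhs) + " " + dump(*e.rhs) + ")";
}

// The source expression `1 + 2 * 3`, built the way a parser might build it.
std::unique_ptr<Expr> example() {
    return binary('+', literal(1), binary('*', literal(2), literal(3)));
}
```

Calling `dump(*example())` yields `(+ 1 (* 2 3))`. Operator precedence is already baked into the shape of the tree, so later compiler stages never need to look at the original text again.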

The LLVM project provides what is probably the best-known example of a widely used IR: LLVM IR.

LLVM consists of three major parts: the frontends, the IR and the backends. A frontend's job is to take source code in its source language and convert it into LLVM IR. The job of the various backends is to generate machine code from LLVM IR for the required target instruction set.
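The frontend/IR/backend split can be sketched with a toy pipeline. Everything here is hypothetical (`Instr`, `parseInfix`, `parseLisp`, `emitStackAsm`, `interpret`), the "languages" only understand a single addition, and none of it resembles LLVM's real API. It only illustrates that two frontends can target one shared IR, and that every backend then works for both.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical shared IR: a flat list of stack-machine instructions.
struct Instr {
    enum Kind { Push, Add } kind;
    int value;  // operand for Push, ignored for Add
};
using IR = std::vector<Instr>;

// Frontend A: infix source such as "1 + 2".
IR parseInfix(const std::string& src) {
    std::istringstream in(src);
    int a = 0, b = 0;
    char plus = 0;
    in >> a >> plus >> b;
    return {{Instr::Push, a}, {Instr::Push, b}, {Instr::Add, 0}};
}

// Frontend B: Lisp-like source such as "(+ 1 2)".
IR parseLisp(const std::string& src) {
    std::istringstream in(src);
    char paren = 0, plus = 0;
    int a = 0, b = 0;
    in >> paren >> plus >> a >> b;
    return {{Instr::Push, a}, {Instr::Push, b}, {Instr::Add, 0}};
}

// Backend 1: emit made-up textual assembly for an imaginary stack machine.
std::string emitStackAsm(const IR& ir) {
    std::string out;
    for (const Instr& i : ir) {
        if (i.kind == Instr::Push) {
            out += "push " + std::to_string(i.value) + "\n";
        } else {
            out += "add\n";
        }
    }
    return out;
}

// Backend 2: execute the IR directly.
int interpret(const IR& ir) {
    std::vector<int> stack;
    for (const Instr& i : ir) {
        if (i.kind == Instr::Push) {
            stack.push_back(i.value);
        } else {
            assert(stack.size() >= 2);  // Add needs two operands
            int rhs = stack.back(); stack.pop_back();
            int lhs = stack.back(); stack.pop_back();
            stack.push_back(lhs + rhs);
        }
    }
    assert(!stack.empty());
    return stack.back();
}
```

Both `parseInfix("1 + 2")` and `parseLisp("(+ 1 2)")` produce the same IR, so either backend can consume either language. Adding a third language means writing one new frontend, not one new compiler per target.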

Why IR?

The biggest and most common reason to use an IR is shared optimisations. If source code in any language can be converted into an IR, one general set of optimisations can be performed on that IR. The alternative is the traditional approach of writing language-specific optimisations for every compiler frontend.
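To make "optimising the IR" concrete, here is a rough sketch of one classic optimisation, constant folding, written against a made-up tree IR. `Node`, `constant`, `variable`, `binary` and `fold` are illustrative names only, and this is not how LLVM implements the pass. What matters is that `fold` never asks which source language produced the tree.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical tree IR with constants, named variables and two operators.
struct Node {
    enum Kind { Const, Var, Add, Mul } kind;
    int value;                   // Const only
    std::string name;            // Var only
    std::shared_ptr<Node> lhs;   // Add and Mul only
    std::shared_ptr<Node> rhs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr constant(int v) {
    return std::make_shared<Node>(Node{Node::Const, v, "", nullptr, nullptr});
}
NodePtr variable(std::string n) {
    return std::make_shared<Node>(Node{Node::Var, 0, std::move(n), nullptr, nullptr});
}
NodePtr binary(Node::Kind k, NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{k, 0, "", std::move(l), std::move(r)});
}

// Constant folding: replace any operation whose operands are both known
// constants with the computed result, working from the leaves upwards.
NodePtr fold(const NodePtr& n) {
    assert(n && "fold expects a non-null node");
    if (n->kind == Node::Const || n->kind == Node::Var) return n;
    NodePtr l = fold(n->lhs);
    NodePtr r = fold(n->rhs);
    if (l->kind == Node::Const && r->kind == Node::Const) {
        return constant(n->kind == Node::Add ? l->value + r->value
                                             : l->value * r->value);
    }
    return binary(n->kind, l, r);  // keep the operation, with folded children
}
```

Folding the tree for `x * (2 + 3)` gives the tree for `x * 5`, and folding `1 + 2 * 3` collapses to the single constant `7`. Written once against the IR, the pass benefits every frontend that emits that IR.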

The second major benefit of an IR is that the same representation can be compiled either just-in-time (JIT) or ahead-of-time (AOT), depending on the use case.

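Here's the crux of the LLVM IR for a simple C "Hello, world" program: first the C source, then the IR that a frontend such as Clang emits for it with optimisations disabled (note the optnone attribute).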

#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}

@.str = private unnamed_addr constant [15 x i8] c"Hello, world!\0A\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, ptr %1, align 4
  %2 = call i32 (ptr, ...) @printf(ptr noundef @.str)
  ret i32 0
}

declare dso_local i32 @printf(ptr noundef, ...) #1
