I wrote 6 lines of Triton…
and it turned into thousands of GPU instructions.
Python → TTIR → TTGIR → LLVM → AMDGCN → HSACO
👉 a + b → buffer_load_b128
👉 mask → v_cmp + conditional execution
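For context, here is roughly what one Triton program instance computes in that masked `a + b` pattern, sketched in plain Python (a hypothetical reconstruction mirroring Triton's `tl.load`/`tl.store` masking semantics, not the actual kernel or the compiler's output — `BLOCK` and the helper name are made up for illustration):

```python
# Sketch of one program instance of a masked vector add.
# BLOCK is a hypothetical block size; n is the logical vector length.
def masked_add_block(a, b, out, pid, BLOCK, n):
    offsets = [pid * BLOCK + i for i in range(BLOCK)]
    mask = [off < n for off in offsets]   # the bounds check that lowers to v_cmp
    for off, m in zip(offsets, mask):
        if m:                             # predicated (conditional) execution
            out[off] = a[off] + b[off]    # the a + b served by wide buffer loads

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
out = [0.0] * 3
masked_add_block(a, b, out, pid=0, BLOCK=4, n=3)
# out is now [11.0, 22.0, 33.0]; the off == 3 lane is masked off, so no out-of-bounds write
```

On the GPU this loop doesn't exist: every lane runs in lockstep, and the mask becomes a per-lane predicate rather than a branch.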
Here’s the truth:
Your code is NOT what runs on the GPU.
The compiler builds an entire execution pipeline in between.
I dumped every stage and traced one kernel end-to-end 👇
https://www.compilersutra.com/docs/ml-compilers/mlcompilerstack/
After this, ML compilers don’t feel like “magic” anymore.