Most people think they are running Python when they train ML models.
They are not.
Python is only the interface.
The real execution happens somewhere completely different: inside an ML compiler stack.
🧠 What actually happens?
When you write something like:
matmul → add → relu
It looks simple.
But internally, the system transforms it into multiple layers:
- Python (model definition)
- Graph (tensor operations)
- Execution plan (optimized structure)
- Kernels (GPU/CPU instructions)
- Hardware execution
At no point does the GPU "run Python".
It runs compiled kernels.
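The layering above can be sketched with a minimal example. NumPy here is only a stand-in for a real framework: the Python lines describe the computation, while the actual arithmetic runs in compiled kernels (BLAS routines), not in Python bytecode.

```python
import numpy as np

def forward(x, w, b):
    """The matmul -> add -> relu chain at the Python level.

    Each line reads like it executes immediately, but in a compiled
    ML stack it would only record a node in a computation graph;
    even here, NumPy dispatches the math to compiled C/Fortran code.
    """
    y = x @ w                # matmul
    y = y + b                # add
    return np.maximum(y, 0)  # relu

x = np.array([[1.0, -2.0]])
w = np.array([[2.0], [1.0]])
b = np.array([0.5])
print(forward(x, w, b))
```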
⚙️ Why ML compilers exist
Because raw model code, executed eagerly one operation at a time, maps poorly onto hardware.
Without a compiler:
- Too many kernel launches
- Unnecessary memory transfers
- No operator fusion
- Poor GPU utilization
With a compiler:
- Operations are fused
- Memory movement is reduced
- Execution is optimized for hardware
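Operator fusion is easiest to see side by side. This is only an illustration of the idea, not a real compiler transform: NumPy does not actually fuse anything, but the unfused version materializes two intermediate arrays in memory, while a fused kernel would compute the same result in a single pass.

```python
import numpy as np

def unfused(x, w, b):
    # Three separate "kernel launches": each step writes a full
    # intermediate tensor to memory before the next step reads it back.
    t1 = x @ w
    t2 = t1 + b
    return np.maximum(t2, 0)

def fused(x, w, b):
    # What a fused matmul+add+relu kernel computes: same math,
    # intermediates stay in registers instead of round-tripping memory.
    return np.maximum(x @ w + b, 0)

rng = np.random.default_rng(0)
x, w, b = rng.standard_normal((4, 8)), rng.standard_normal((8, 2)), rng.standard_normal(2)
assert np.allclose(unfused(x, w, b), fused(x, w, b))
```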
🔥 Key concepts covered
This article builds the foundation for:
- MLIR (multi-level IR systems)
- TVM (end-to-end ML compiler stack)
- GPU kernel execution model
- Operator fusion & memory planning
- Compilation pipeline design
🧭 Roadmap (what you'll learn)
- Tensors, shapes, memory layout
- CPU vs GPU execution model
- Compiler basics (IR, lowering, passes)
- ML compiler optimizations
- Real systems (TVM, MLIR, XLA)
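As a taste of the compiler-pass material ahead, here is a toy "IR" for the matmul → add → relu graph, plus one rewrite pass that fuses add + relu into a single op. The representation and names are purely illustrative, not any real compiler's API.

```python
# Each op is (name, output_tensor); a graph is just an ordered list of ops.
graph = [("matmul", "t0"), ("add", "t1"), ("relu", "t2")]

def fuse_add_relu(ops):
    """A single rewrite pass: scan the op list and replace every
    adjacent add -> relu pair with one fused op."""
    out, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i][0] == "add" and ops[i + 1][0] == "relu":
            # The fused op produces the relu's output tensor directly;
            # the add's intermediate ("t1") is never materialized.
            out.append(("fused_add_relu", ops[i + 1][1]))
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out

print(fuse_add_relu(graph))
# -> [('matmul', 't0'), ('fused_add_relu', 't2')]
```

Real systems (TVM, MLIR, XLA) work the same way in spirit: represent the program as data, then run rewrite passes over it until it maps well onto the hardware.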