At first I thought Artificial Intelligence was mainly about neural networks and training models. After learning about transformers, and about other things like GPUs, memory systems, compilers, and hardware optimization, I realized that modern Artificial Intelligence is actually a huge problem that involves a lot of different systems working together.
One of the things I learned is that Artificial Intelligence models mostly do a lot of math, like matrix multiplications and tensor operations. That is why GPUs are so important. They can do thousands of these operations at the same time, which CPUs cannot do at anything close to that scale.
Having powerful hardware is not enough.
Modern Artificial Intelligence is heavily limited by how quickly we can move data around. GPUs can do math quickly, but if the data they need does not reach them fast enough, they just sit idle. I learned that moving data around can be more expensive than doing the math itself.
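One rough way to see this is the roofline model, which caps performance at the smaller of peak compute and (FLOPs per byte moved) times memory bandwidth. The sketch below is only an illustration: the hardware numbers `PEAK_FLOPS` and `MEM_BW` are made up, and the byte counts assume the ideal case where each float32 value is moved exactly once.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved

def attainable_flops(intensity, peak_flops, mem_bandwidth):
    """Roofline model: capped by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, intensity * mem_bandwidth)

# Hypothetical accelerator (assumed numbers, not a real device).
PEAK_FLOPS = 100e12  # 100 TFLOP/s
MEM_BW = 1e12        # 1 TB/s

# Elementwise add of two float32 vectors of length n:
# n FLOPs, 3 arrays * 4 bytes * n (two reads, one write).
n = 1_000_000
add_ai = arithmetic_intensity(n, 3 * 4 * n)

# Square float32 matmul of size N x N:
# 2 * N^3 FLOPs, 3 matrices * 4 bytes * N^2 in the ideal case.
N = 4096
mm_ai = arithmetic_intensity(2 * N**3, 3 * 4 * N**2)

add_perf = attainable_flops(add_ai, PEAK_FLOPS, MEM_BW)
mm_perf = attainable_flops(mm_ai, PEAK_FLOPS, MEM_BW)

print(f"add:    {add_ai:.3f} FLOPs/byte, {add_perf / PEAK_FLOPS:.4%} of peak")
print(f"matmul: {mm_ai:.1f} FLOPs/byte, {mm_perf / PEAK_FLOPS:.0%} of peak")
```

Under these assumptions the elementwise add is starved for data and reaches well under 1% of peak, while the large matrix multiplication does enough math per byte to keep the chip busy.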
I also learned about kernels, compiler optimization, and runtimes. Tools like TVM, MLIR, TensorRT, and Triton help Artificial Intelligence models run better on different types of hardware by reducing how much data needs to be moved around and by making better use of the GPU.
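A common optimization of this kind is kernel fusion: combining several small operations into one pass so that intermediate results never go back to memory. The sketch below is a plain-Python illustration, not how any of these tools actually implement it. The function names are hypothetical, and the traffic counts assume float32 data with every kernel reading its input from memory and writing its output back.

```python
BYTES_PER_FLOAT32 = 4

def unfused_kernels(xs, a, b):
    """Three separate passes: scale, shift, ReLU. Each pass stores a full intermediate."""
    t1 = [a * x for x in xs]           # kernel 1: read xs, write t1
    t2 = [t + b for t in t1]           # kernel 2: read t1, write t2
    return [max(t, 0.0) for t in t2]   # kernel 3: read t2, write result

def fused_kernel(xs, a, b):
    """One pass: each element is read once, fully processed, and written once."""
    return [max(a * x + b, 0.0) for x in xs]

def unfused_traffic(n):
    """Bytes moved by three kernels, each reading n floats and writing n floats."""
    return 3 * 2 * n * BYTES_PER_FLOAT32

def fused_traffic(n):
    """Bytes moved by one kernel reading n floats and writing n floats."""
    return 2 * n * BYTES_PER_FLOAT32

xs = [-2.0, -0.5, 0.0, 1.5, 3.0]
assert fused_kernel(xs, 2.0, 1.0) == unfused_kernels(xs, 2.0, 1.0)

n = 1_000_000
print(unfused_traffic(n) / fused_traffic(n))  # ratio of bytes moved, unfused vs fused
```

Both versions do the same arithmetic and give the same result. Only the memory traffic changes, which is exactly the cost these compilers try to cut.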
The important thing I learned is this:
Artificial Intelligence is no longer just about building smarter models. It is about using hardware efficiently to make intelligence that can be scaled up.
Looking at Artificial Intelligence from a systems perspective made it a lot more interesting to me than I thought it would be.