DEV Community

Julien Simon

Originally published at julsimon.medium.com


Video: SLM inference on AWS Graviton4

CPU inference? Hell yes.

In this episode, Lorenzo Winfrey, Jeff Underhill, and I discuss how there's hope beyond huge closed models and expensive GPU instances. Yes, AWS Graviton4 packs a punch and is possibly the most cost-effective platform for SLM inference. To prove our point, I show how to quantize and run our Llama-3.1-SuperNova-Lite model on a small Graviton4 instance. You won't believe the text generation speed 😃
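If you'd like to try something similar yourself, here is a rough sketch of one way to quantize and run a small model on a Graviton4 instance with llama.cpp. These are not necessarily the exact commands from the video: the instance type, file names, and quantization level (Q4_0) are my assumptions.

```shell
# Hedged sketch, not the video's exact steps: quantize an SLM with llama.cpp
# and run it on CPU on an AWS Graviton4 (Arm Neoverse V2) instance.
# Assumes git, cmake, python3, and huggingface-cli are already installed.

# Build llama.cpp; its CMake setup detects the Arm CPU features
# (NEON, SVE, i8mm) available on Graviton4 automatically.
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j

# Fetch the model weights and convert them to a GGUF file.
huggingface-cli download arcee-ai/Llama-3.1-SuperNova-Lite --local-dir supernova-lite
python3 convert_hf_to_gguf.py supernova-lite --outfile supernova-lite-f16.gguf

# Quantize to 4-bit (Q4_0 is an assumption; other types work too).
./build/bin/llama-quantize supernova-lite-f16.gguf supernova-lite-q4_0.gguf Q4_0

# Generate text on CPU, using all available cores.
./build/bin/llama-cli -m supernova-lite-q4_0.gguf -t $(nproc) \
  -p "Explain why CPU inference makes sense for small language models."
```

On Graviton, 4-bit GGUF weights can take advantage of the Arm int8 matrix-multiply (i8mm) instructions, which is a big part of why CPU-only generation speed is competitive for models in this size class.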
