
Julien Simon

Originally published at julsimon.medium.com

Video: SLM inference on AWS Graviton4

CPU inference? Hell yes.

In this episode, Lorenzo Winfrey, Jeff Underhill, and I discuss how there's hope beyond huge closed models and expensive GPU instances. Yes, AWS Graviton4 packs a punch, and it may well be the most cost-effective platform for SLM inference. To prove the point, I show how to quantize and run our Llama-3.1-SuperNova-Lite model on a small Graviton4 instance. You won't believe the text generation speed 😃

