DEV Community

Julien Simon

Originally published at julsimon.medium.com


Video: SLM inference on AWS Graviton4

CPU inference? Hell yes.

In this episode, Lorenzo Winfrey, Jeff Underhill, and I discuss how there's hope beyond huge closed models and expensive GPU instances. Yes, AWS Graviton4 packs a punch and is possibly the most cost-effective platform for SLM inference. To prove our point, I show how to quantize and run our Llama-3.1-SuperNova-Lite model on a small Graviton4 instance. You won't believe the text generation speed 😃
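If you'd like to try something similar yourself, here is a rough sketch of one way to quantize and run a small model on a Graviton4 instance with llama.cpp. These are not necessarily the exact commands from the video: the instance type, file names, and quantization level (Q4_0) are my assumptions.

```shell
# Hedged sketch, not the video's exact steps: quantize an SLM with llama.cpp
# and run it on CPU on an AWS Graviton4 (Arm Neoverse V2) instance.
# Assumes git, cmake, python3, and huggingface-cli are already installed.

# Build llama.cpp; its CMake setup detects the Arm CPU features
# (NEON, SVE, i8mm) available on Graviton4 automatically.
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j

# Fetch the model weights and convert them to a GGUF file.
huggingface-cli download arcee-ai/Llama-3.1-SuperNova-Lite --local-dir supernova-lite
python3 convert_hf_to_gguf.py supernova-lite --outfile supernova-lite-f16.gguf

# Quantize to 4-bit (Q4_0 is an assumption; other types work too).
./build/bin/llama-quantize supernova-lite-f16.gguf supernova-lite-q4_0.gguf Q4_0

# Generate text on CPU, using all available cores.
./build/bin/llama-cli -m supernova-lite-q4_0.gguf -t $(nproc) \
  -p "Explain why CPU inference makes sense for small language models."
```

On Graviton, 4-bit GGUF weights can take advantage of the Arm int8 matrix-multiply (i8mm) instructions, which is a big part of why CPU-only generation speed is competitive for models in this size class.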
