DEV Community

Julien Simon

Posted on • Originally published at julsimon.Medium on

Video deep dive: Advanced distributed training with Hugging Face LLMs and AWS Trainium

Following up on my recent “Hugging Face on AWS accelerators” deep dive, this new video zooms in on distributed training with NeuronX Distributed, Optimum Neuron, and AWS Trainium.

First, we explain the basics and benefits of advanced distributed training techniques: tensor parallelism, pipeline parallelism, sequence parallelism, and DeepSpeed ZeRO. Then, we discuss how these techniques are implemented in NeuronX Distributed and Optimum Neuron. Finally, we launch a Trainium-powered Amazon EC2 instance and demonstrate these techniques with distributed training runs on the TinyLlama and Llama 2 7B models.
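To give a rough feel for the first of these techniques, here is a minimal sketch of column-wise tensor parallelism for a single linear layer. It is simulated in one Python process, with plain lists standing in for per-device tensors. Each hypothetical "device" holds only a slice of the weight matrix's columns and computes a partial output, and concatenating the partial outputs (the role an all-gather plays on real hardware) reproduces the full result. All function names here are illustrative assumptions, not the NeuronX Distributed or Optimum Neuron API.

```python
# Minimal sketch of column-wise tensor parallelism, simulated on one process.
# Assumption: plain Python lists stand in for device tensors; no Neuron APIs are used.

def matmul(x, w):
    """Multiply an (m x k) matrix x by a (k x n) matrix w."""
    k, n = len(w), len(w[0])
    return [[sum(row[i] * w[i][j] for i in range(k)) for j in range(n)] for row in x]

def split_columns(w, num_shards):
    """Split w's columns into num_shards contiguous, equal-sized shards."""
    n = len(w[0])
    assert n % num_shards == 0, "output dim must be divisible by the shard count"
    size = n // num_shards
    return [[row[s * size:(s + 1) * size] for row in w] for s in range(num_shards)]

def concat_columns(parts):
    """Concatenate per-shard outputs along the column dimension (like an all-gather)."""
    return [sum((p[r] for p in parts), []) for r in range(len(parts[0]))]

def column_parallel_linear(x, w, num_shards):
    # Each hypothetical "device" stores only its weight shard, so per-device
    # weight memory shrinks by a factor of num_shards.
    shards = split_columns(w, num_shards)
    # Every device sees the full input and computes its slice of the output.
    partial_outputs = [matmul(x, shard) for shard in shards]
    # Gathering the slices reconstructs the full layer output.
    return concat_columns(partial_outputs)

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, -1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, num_shards=2))  # [[1, 2, 4, 5], [3, 4, 10, 9]]
```

The sharded result matches a plain `matmul(x, w)`. Real implementations typically pair a column-parallel layer with a row-parallel one so that only one collective operation is needed per pair, but that detail is left out of this sketch.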

Of course, we share results on training time and cost, which will probably surprise you!
