DEV Community

Pitchers


Scaling LiveKit Egress for Recordings (Private Meetings + Livestream Platform)

Hi everyone,

I'm currently designing the architecture for a live streaming and private meeting platform and would appreciate some guidance on scaling recording infrastructure using LiveKit Egress.

I haven't implemented autoscaling yet and want to design the system correctly before moving forward.

Current Stack

Frontend: Angular

Backend: .NET

Media Server: Self-hosted LiveKit

Infrastructure: AWS (EC2 + containerized services)

Coordination: Redis

OS: Ubuntu on EC2

Recording Use Cases

The platform supports two types of sessions:

  1. Private Meetings

Recording uses RoomComposite Egress to capture the entire meeting.

  2. Livestream Classes

Recording uses Participant Egress to record only the instructor stream.

Recording is optional and triggered by the instructor, so demand can vary significantly. For example, several instructors could start recording sessions simultaneously.
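The two job types put very different load on a worker. RoomComposite renders the whole room in a headless browser before encoding, while Participant egress encodes a single participant's tracks. The following is a minimal sketch of cost-aware admission. It assumes illustrative per-job CPU costs and a hypothetical `Worker` class, and neither is part of LiveKit's API. It shows why counting recordings alone can mislead: under these assumed costs, an 8-core worker fits eight Participant jobs but only two RoomComposite jobs.

```python
from dataclasses import dataclass, field

# Assumed CPU cost per job type, in cores (illustrative; measure your own).
JOB_COST = {
    "room_composite": 3.0,  # headless-browser render + encode of the whole room
    "participant": 1.0,     # encode of a single participant's tracks
}


@dataclass
class Worker:
    """Hypothetical egress worker with a fixed CPU budget (cores)."""
    name: str
    cpu_budget: float
    active: list = field(default_factory=list)

    def used(self) -> float:
        return sum(JOB_COST[kind] for kind in self.active)

    def try_accept(self, kind: str) -> bool:
        # Accept the job only if it fits in the remaining budget.
        if self.used() + JOB_COST[kind] <= self.cpu_budget:
            self.active.append(kind)
            return True
        return False


def assign(workers, kind):
    """First-fit placement; None means no worker has room (scale out)."""
    for worker in workers:
        if worker.try_accept(kind):
            return worker.name
    return None


if __name__ == "__main__":
    pool = [Worker("egress-a", cpu_budget=8.0), Worker("egress-b", cpu_budget=8.0)]
    for kind in ["room_composite", "room_composite", "participant", "room_composite"]:
        print(kind, "->", assign(pool, kind))
```

First-fit is the simplest placement policy. A job that fits nowhere (`assign` returns `None`) is the signal that the pool needs another worker. Real per-job costs should come from load-testing each job type on the EC2 instance type you plan to use.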

Problem I'm Trying to Solve

Because recording jobs are processed by egress workers, I want a design where the worker pool absorbs bursts of recording requests without failures.

My concern is handling situations where many recordings start at the same time. Without proper scaling, this could lead to:

recording requests failing

egress workers becoming overloaded

timeouts during recording initialization
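One common backend-side mitigation for these failure modes is to retry recording starts that fail transiently. The retries use capped exponential backoff with random jitter, so a burst of simultaneous requests does not retry in lockstep. The following is a minimal sketch. It assumes a hypothetical `start_fn` that wraps your backend's call to start an egress, and a hypothetical `TransientError` for failures worth retrying, such as no worker having capacity yet.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for retryable start failures."""


def start_with_retry(start_fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                     sleep=time.sleep, rng=random.random):
    """Call start_fn, retrying transient failures with capped exponential
    backoff and full jitter (a random delay in [0, cap))."""
    for attempt in range(max_attempts):
        try:
            return start_fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)  # jitter spreads out retries from a burst
```

Retries only buy time for new workers to come up, so they complement autoscaling rather than replace it. `sleep` and `rng` are injectable so the backoff schedule can be tested without actually waiting.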

What I'm Trying to Achieve

Ideally the system should:

Automatically scale egress workers when recording demand increases

Scale down when idle to reduce infrastructure cost

Handle bursts where many recordings start simultaneously

Support both RoomComposite and Participant egress jobs efficiently
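Fast scale-up for bursts and cautious scale-down for cost pull in opposite directions. Asymmetric rules usually reconcile them:

- Scale out immediately.
- Scale in slowly, behind a cooldown.
- Keep a small warm pool, because a new EC2 instance plus container can take minutes to become ready.
- Use a hysteresis band so the pool does not flap.

The following is a minimal sketch. It assumes a hypothetical `ScalePolicy` and a pool-wide utilization figure computed elsewhere.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    """Hypothetical scaling knobs; placeholder values to tune from load tests."""
    min_workers: int = 2             # warm pool kept ready for bursts
    max_workers: int = 20            # cost ceiling
    target_utilization: float = 0.6  # headroom left for jobs about to start
    scale_down_below: float = 0.3    # hysteresis: hold between 0.3 and 0.6
    cooldown_s: float = 600.0        # minimum seconds between scale-downs


def decide(current: int, utilization: float, now: float,
           last_scale_down: float, policy: ScalePolicy) -> int:
    """Return the desired egress worker count.

    utilization is used capacity / total capacity across the pool; it may
    exceed 1.0 when pending jobs are counted as demand.
    """
    if utilization > policy.target_utilization:
        # Scale out at once: grow the pool so utilization falls back to
        # target, adding at least one worker.
        needed = math.ceil(current * utilization / policy.target_utilization)
        return min(policy.max_workers, max(needed, current + 1))
    if utilization < policy.scale_down_below and now - last_scale_down >= policy.cooldown_s:
        # Scale in one worker at a time, never below the warm pool.
        return max(policy.min_workers, current - 1)
    # Inside the hysteresis band or still cooling down: hold, clamped to bounds.
    return max(policy.min_workers, min(policy.max_workers, current))
```

Before removing a worker, drain it: stop sending it new jobs and let its active recordings finish. Terminating a worker mid-recording could cut off recordings in progress.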

Questions

For developers running LiveKit in production:

What is the recommended strategy to scale LiveKit egress workers?

Should autoscaling be based on:

CPU / memory usage

number of active recordings

number of pending egress jobs

pipelines per worker
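CPU usage alone tends to lag. A burst of requests shows up in the pending queue before it shows up as CPU load, and a RoomComposite job costs far more than a Participant job. One option is to size the pool from weighted demand (active plus pending jobs per type) and keep CPU as a safety check. The counts could come from your backend's own record of start requests combined with LiveKit's egress webhook events. The following is a minimal sketch. The per-type costs, worker size, and target utilization are all hypothetical and should be replaced with measured values.

```python
# Assumed per-job cost in whole CPU cores, by job type (illustrative only).
JOB_COST = {"room_composite": 3, "participant": 1}


def desired_workers(active, pending, cores_per_worker=8, target_pct=60,
                    min_workers=2, max_workers=20):
    """Size the egress pool from weighted demand rather than CPU alone.

    `active` and `pending` map job type -> number of jobs.
    """
    demand = sum(JOB_COST[k] * n for k, n in active.items())
    demand += sum(JOB_COST[k] * n for k, n in pending.items())
    # Usable capacity per worker, in core-percent units (cores * percent).
    usable_per_worker = cores_per_worker * target_pct
    # Integer ceiling division keeps the rounding exact.
    needed = -(-(demand * 100) // usable_per_worker)
    return max(min_workers, min(max_workers, needed))
```

For example, take 2 active RoomComposite jobs, 3 active Participant jobs, and 4 pending RoomComposite jobs. That gives a demand of 21 cores. With 8-core workers held at 60% utilization (4.8 usable cores each), 21 / 4.8 = 4.375, which rounds up to 5 workers.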

Has anyone successfully implemented autoscaling for egress workers on AWS (ECS, EC2, or Kubernetes)?

When LiveKit server load increases (many rooms), how do you typically scale LiveKit media servers alongside egress workers?

Context

I'm still in the architecture design stage, so any suggestions, reference architectures, or lessons learned from production deployments would be extremely helpful.

Thanks in advance!
