DEV Community

Ankush Banyal for Ant Media

Designing Video Architecture That Scales With Your Product (Not Against It)

If you’re building a modern app with video, chances are your requirements didn’t stop at “just a video call.”

It usually starts simple:

One-to-one video calls
Then evolves into:

Live streaming

Audience interaction

Real-time gifts, reactions, overlays

That’s when architecture choices start to matter — a lot.

This article walks through how teams typically handle private video calls and interactive live streaming in the same product, what works well in practice, and where things usually break.

Two Video Use Cases That Look Similar — But Aren’t

At a glance, these both involve video:

Private one-to-one calls

One-to-many live broadcasts with interaction

Under the hood, they behave completely differently in terms of:

Bandwidth

Latency

Scaling

Infrastructure cost

Trying to force one solution to handle both almost always leads to compromises.

One-to-One Video Calls: P2P Still Wins

For private calls, the goals are clear:

Lowest possible latency

Direct communication

Minimal backend involvement

The Practical Setup (Still Valid in 2025)

WebRTC peer-to-peer for audio/video

Backend only for signaling, auth, and discovery

STUN + TURN (coturn) for NAT/firewall reliability

This setup has aged well because it does exactly what it should:

Media flows directly when possible

Falls back gracefully when networks get messy

Keeps infrastructure costs predictable

For 1:1 calls, routing media through your backend is usually unnecessary overhead.
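To make the setup concrete, here's a minimal sketch of building the ICE configuration a client would pass to `RTCPeerConnection`: one STUN entry for NAT discovery, one TURN entry (served by something like coturn) as the relay fallback. The hostnames and credentials are placeholders, not real endpoints.

```typescript
// Shape of the ICE inputs our hypothetical backend hands the client.
interface IceConfig {
  stunUrl: string;
  turnUrl: string;
  turnUsername: string;
  turnCredential: string;
}

// Build an RTCConfiguration-compatible object: STUN first (cheap, direct
// paths), TURN second (relay fallback when networks get messy).
function buildRtcConfig(cfg: IceConfig): {
  iceServers: { urls: string[]; username?: string; credential?: string }[];
} {
  return {
    iceServers: [
      { urls: [cfg.stunUrl] }, // NAT discovery only, no credentials
      {
        urls: [cfg.turnUrl], // relayed media when direct connection fails
        username: cfg.turnUsername,
        credential: cfg.turnCredential,
      },
    ],
  };
}

// In the browser this would feed straight into:
//   new RTCPeerConnection(buildRtcConfig({ ... }))
const config = buildRtcConfig({
  stunUrl: "stun:stun.example.com:3478",
  turnUrl: "turn:turn.example.com:3478",
  turnUsername: "user",
  turnCredential: "secret",
});
console.log(config.iceServers.length); // 2
```

In practice TURN credentials should be short-lived and minted per session by the backend, which is part of why signaling and auth stay server-side even though media doesn't.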

Why P2P Doesn’t Scale for Live Streaming

Live streaming changes everything.

If one broadcaster has:

50 viewers

100 viewers

500 viewers

With pure P2P, the broadcaster uploads a separate copy of the stream to every viewer.

On mobile, that’s a hard no:

Battery drain

Upload limits

Dropped frames

Crashes under load

This is where many early-stage apps hit their first real wall.
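The arithmetic behind that wall is worth seeing once. Assuming an illustrative 2.5 Mbps per 720p stream (a made-up but plausible figure), the broadcaster's upload requirement under pure P2P grows linearly with the audience, while an SFU keeps it flat:

```typescript
// Broadcaster upload cost: pure P2P sends one copy per viewer,
// an SFU receives one copy and fans it out server-side.
function broadcasterUploadMbps(
  viewers: number,
  bitrateMbps: number,
  useSfu: boolean
): number {
  return useSfu ? bitrateMbps : viewers * bitrateMbps;
}

console.log(broadcasterUploadMbps(100, 2.5, false)); // 250 — no phone can do this
console.log(broadcasterUploadMbps(100, 2.5, true)); // 2.5 — a single ordinary upload
```

At 500 viewers the P2P number hits 1,250 Mbps of upstream from one device, which is why the failure mode shows up long before "massive scale."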

SFU: The Missing Middle Layer

To scale live video properly, you need a Selective Forwarding Unit (SFU).

The idea is simple:

Broadcaster uploads one stream

SFU forwards it efficiently to viewers

Latency stays low

The broadcaster’s device survives

This model is why SFUs power most real-time live platforms today.

Gifts, Reactions, and Why Latency Matters

Live gifts only feel meaningful if:

The broadcaster reacts instantly

Viewers see reactions in sync

Latency stays very low

This is where traditional RTMP → HLS pipelines struggle:

15–30 seconds of delay kills interaction

Gifts feel disconnected from reality

That’s why many teams combine:

WebRTC (via SFU) for interactive viewers

HLS / LL-HLS for large, passive audiences

It’s not either/or — it’s choosing the right tool per audience size.
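That per-audience decision can be captured in a few lines. The thresholds below are illustrative assumptions, not recommendations; the point is that the routing decision is explicit, not baked into one monolithic media path:

```typescript
type MediaPath = "webrtc-p2p" | "webrtc-sfu" | "ll-hls";

// Pick a media path per session. Thresholds are placeholders — tune
// them to your SFU capacity and latency budget.
function choosePath(participants: number, interactive: boolean): MediaPath {
  if (participants <= 2) return "webrtc-p2p"; // private call: direct is best
  if (interactive) return "webrtc-sfu"; // live room: low latency matters
  return "ll-hls"; // large passive audience: scale matters
}

console.log(choosePath(2, true)); // "webrtc-p2p"
console.log(choosePath(500, true)); // "webrtc-sfu"
console.log(choosePath(50000, false)); // "ll-hls"
```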

Running 1:1 Calls and Live Rooms in the Same App

This is a common concern, and yes — it works well if you keep boundaries clear.

What Can Be Shared

Authentication

User identity

Payments and gifting logic

Chat, reactions, UI components

What Should Stay Separate

Media routing paths

Scaling logic

Session lifecycle handling

Trying to reuse the exact same media flow for everything usually leads to tight coupling and painful refactors later.
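One lightweight way to enforce that boundary is at the type level: shared identity and gifting types, but distinct session types per media path. The names below are hypothetical, purely to illustrate where the seam goes:

```typescript
// Shared across both features.
interface User {
  id: string;
  displayName: string;
}

// Separate session shapes — each path keeps its own lifecycle.
interface PrivateCall {
  kind: "p2p-call";
  participants: [User, User]; // exactly two peers, no more
}

interface LiveRoom {
  kind: "sfu-room";
  broadcaster: User;
  viewerCount: number;
}

type MediaSession = PrivateCall | LiveRoom;

// Code can branch on `kind` without the two media paths leaking
// into each other's scaling or teardown logic.
function describeSession(s: MediaSession): string {
  return s.kind === "p2p-call"
    ? `call between ${s.participants[0].id} and ${s.participants[1].id}`
    : `broadcast by ${s.broadcaster.id} to ${s.viewerCount} viewers`;
}
```

The discriminated union makes the separation cheap to maintain: adding SFU-specific fields later never touches the call path, and vice versa.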

Where Platforms Like Ant Media Fit In

When teams don’t want to build and maintain all of this from scratch, they often look for solutions that already support multiple streaming models.

For example, platforms like Ant Media Server are commonly used in setups where:

WebRTC P2P is needed for private calls

WebRTC SFU is needed for interactive live streams

HLS or LL-HLS is needed for scale

Mobile clients are first-class citizens

The value isn’t just protocol support — it’s having one backend that can handle different video paths cleanly, depending on the use case.

Whether you build yourself or use an existing platform, the architecture principles stay the same.

Common Mistakes Teams Regret Later

Some patterns show up again and again:

Forcing P2P to handle live broadcasts

Adding gifts on top of high-latency streams

Ignoring TURN usage until production bills arrive

Testing only on good Wi-Fi

Over-optimizing for massive scale too early

Most of these come from over-simplifying the media path before the product's needs are clear.

If I Were Starting Fresh Today

I’d design with intent from day one:

WebRTC P2P for private calls

WebRTC SFU for live, interactive streams

HLS / LL-HLS only when scale demands it

Gifts and reactions built as real-time events

Clear separation between call logic and broadcast logic

It’s not the smallest setup — but it’s one that grows without fighting you.
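"Gifts and reactions as real-time events" means they travel as small messages alongside the media (over a data channel or WebSocket), never muxed into the video itself. A minimal sketch of such an event, with an assumed payload shape rather than any standard:

```typescript
// A gift modeled as a tiny, timestamped event. Field names are
// illustrative — adapt to your own event schema.
interface GiftEvent {
  type: "gift";
  fromUserId: string;
  giftId: string;
  sentAt: number; // epoch ms, lets clients reconcile ordering
}

function makeGiftEvent(
  fromUserId: string,
  giftId: string,
  now: number = Date.now()
): GiftEvent {
  return { type: "gift", fromUserId, giftId, sentAt: now };
}

// Serialized and fanned out to the room; viewers render it the moment
// it arrives, independent of the video pipeline's latency.
const wire = JSON.stringify(makeGiftEvent("u123", "rose"));
```

Because the event path is decoupled from the media path, gifts stay instant even for viewers watching on a slightly delayed LL-HLS stream.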

Final Thought

Video isn’t hard because of codecs or APIs.

It’s hard because:

Latency shapes user behavior

Mobile networks are unpredictable

Different use cases need different paths

Get the architecture right early, and everything else — features, scale, monetization — becomes much easier.

Hopefully this saves someone a painful rewrite down the road.