DEV Community: Ankush Banyal

MoQ: The Next Big Shift in Ultra-Low Latency Streaming?

Ankush Banyal — Wed, 20 May 2026 10:46:44 +0000

Media over QUIC (MoQ) is starting to become one of the most interesting shifts in real-time streaming infrastructure.

For years, the industry has been balancing between:

Ultra-low latency with WebRTC
Scalability with HLS/CDN architectures

MoQ introduces a different direction — combining QUIC-based transport with scalable real-time delivery models that can potentially reduce infrastructure overhead while maintaining very low latency.

Why this matters:

Large-scale interactive streaming becomes more practical
Better transport efficiency compared to traditional approaches
Reduced dependency on heavy peer/session management
More flexibility for modern live experiences like esports, live shopping, auctions, and interactive broadcasts

At Ant Media, we’re actively exploring and supporting MoQ-based workflows alongside WebRTC and LL-HLS architectures.

One important point:
MoQ is still evolving, and adoption will require ecosystem changes — especially around playback support and player integrations. But the direction is extremely promising for the future of scalable real-time media delivery.

The streaming stack is evolving fast:
RTMP → HLS → WebRTC → MoQ

Excited to see where the ecosystem goes next 🚀

MoQ #MediaOverQUIC #WebRTC #LowLatencyStreaming #StreamingInfrastructure #LiveStreaming #QUIC #RealtimeMedia #AntMedia #OTT #Esports

The Next Chapter of Live Streaming: MoQ and SCTE-35 Are Here

Ankush Banyal — Wed, 13 May 2026 10:40:44 +0000

Live streaming has always been a game of trade-offs. You either had low latency or you had scale. You either had seamless ad delivery or you had complex infrastructure. For years, the industry worked around these limitations rather than solving them.

Two technologies are changing that. Media over QUIC (MoQ) and SCTE-35. And at Ant Media, we're building for both.

Why the old stack is showing its age

Today, most streaming platforms run on one of two protocols. WebRTC for real-time, interactive use cases — think live auctions, one-to-one video, sports betting. And HLS/DASH for large-scale broadcast delivery — think concerts, news channels, OTT platforms.

The problem? WebRTC delivers sub-second interactivity but can be expensive and difficult to scale beyond small audiences. HLS and DASH scale beautifully through HTTP CDNs but introduce latency — in practice, often six seconds or more. Ant Media
Every team building a serious streaming platform eventually hits this wall. You pick one and sacrifice the other.

Enter MoQ — one protocol to do both

MoQ combines the low-latency interactivity of WebRTC, the scalability of HLS/DASH, and the simplicity of a single architecture — all built on a modern transport layer. Medium
It's built on QUIC, the same transport protocol behind HTTP/3. MoQ's choice of QUIC as its transport foundation enables faster connection and retransmission times, organises media into a clean Tracks/Groups/Objects hierarchy, and supports web browsers natively through WebTransport and WebCodecs. Ant Media

What does that mean practically? Sub-second join times and internet-scale fan-out — without maintaining thousands of individual real-time sessions. Concerts, sports broadcasts, and live events where you need sub-second latency for a million simultaneous viewers — a CDN relay model makes this economical. Ant Media

The architecture is also simpler to operate. Relays don't need to understand the media. This makes relay infrastructure far simpler and cheaper than the equivalent WebRTC SFU or HLS origin/packager chain. Fan-out is built into MoQ — one upstream subscription from a relay to a publisher can serve thousands of downstream subscribers. Ant Media

Is MoQ production-ready today? Not fully. WebTransport, which MoQ depends on in browsers, represents a fraction of a percent of web page loads versus WebRTC's stable 0.35%. But the momentum is real — Cloudflare, Akamai, Google, Meta, and Cisco are all invested in the standard. And at Ant Media, we showcased MoQ live at NAB Show 2026. We're not waiting for the market to catch up. We're helping build it. Ant Media

SCTE-35 — the unsexy technology that makes you money

While MoQ gets the headlines, SCTE-35 quietly solves one of the biggest revenue problems in live streaming: ad insertion.

SCTE-35 signals are used to identify national and local ad breaks as well as program content like intros, chapters, blackouts, and extensions when a live program runs long. For modern streaming applications, they are included within a transport stream and converted into metadata embedded in HLS and DASH manifests.

In plain terms: SCTE-35 tells your ad system exactly when and where to insert an ad, without interrupting the stream.
Since SCTE-35 is designed for server-side ad insertion (SSAI), ads blend seamlessly into the stream. No buffering. No jarring cuts. The viewer barely notices. The advertiser gets precise placement. The platform gets paid. iotum
With live events, it isn't always possible to know exactly where to insert the ads — for example, you may need to go to an ad break at any time during a game depending on the flow of play. Dynamic SCTE-35 ad markers can automatically detect when an event is going to break and insert the markers in real time.

Ant Media Server supports SCTE-35 natively. Any SSAI platform that supports HLS with SCTE-35 markers will work — such as AWS MediaTailor, Google Ad Manager, Broadpeak, or Yospace. You bring the content, AMS handles the signalling, and your ad platform does the rest.

What this means for you

If you're building a streaming platform today, the infrastructure decisions you make now will determine how much you can scale tomorrow — and how much revenue you can capture along the way.

MoQ gives you a path to sub-second latency at broadcast scale without building and operating an SFU fleet. SCTE-35 gives your platform a professional-grade monetisation layer that works with the ad ecosystem you already rely on.

At Ant Media, our strategy has always been simple: give you a multi-protocol streaming engine that evolves with the industry, without locking you into any single approach.
MoQ and SCTE-35 are not the future. They're what the best streaming platforms are building with right now.

Why MoQ + SCTE-35 Could Change the Future of Live Streaming

Ankush Banyal — Wed, 06 May 2026 10:09:52 +0000

For years, live streaming infrastructure has been split into two worlds.

One side focused on ultra-low latency. The other focused on broadcast-grade reliability and monetization.

If you wanted sub-second delivery, you usually sacrificed traditional broadcast workflows. If you wanted SCTE-35 markers, SSAI, and mature OTT tooling, you accepted higher latency.

Now that may finally be changing.

Between Media over QUIC (MoQ) and modern SCTE-35 workflows, we are starting to see the foundations of a streaming stack that can support both:

real-time interactivity
broadcast-scale delivery
and professional monetization

all at the same time.

And honestly, that is a pretty big shift.

First, What Exactly Is MoQ?

MoQ (Media over QUIC) is an emerging transport protocol being developed to improve how live media is delivered over the internet.

Instead of relying on older HTTP-based delivery methods that were never truly designed for real-time media, MoQ is built on top of QUIC.

That matters because QUIC gives us:

lower latency
better congestion handling
reduced buffering
faster recovery from packet loss
improved scalability for live delivery

The important thing is this:

MoQ is not just "another streaming protocol."

It is an attempt to rethink internet media delivery from the ground up.

Especially for:

live sports
auctions
betting
cloud gaming
interactive broadcasts
multi-view streaming
real-time experiences

Basically, anywhere milliseconds matter.

The Problem With Traditional Streaming

Traditional OTT delivery usually looks something like this:

Camera → Encoder → Packager → CDN → HLS/DASH → Viewer

This works well for scale.

But latency becomes painful.

Even with Low-Latency HLS, many workflows still operate in the 2-6 second range.

For interactive experiences, that is often unacceptable.

Meanwhile, WebRTC solves latency extremely well.

But historically, WebRTC environments have lacked some of the mature broadcast ecosystem features like:

SCTE-35 signaling
SSAI workflows
advanced packaging pipelines
large-scale CDN integration
traditional OTT monetization

This created a gap in the market.

MoQ is interesting because it has the potential to bridge these two worlds.

Where SCTE-35 Fits Into All This

SCTE-35 is one of the hidden building blocks of modern broadcast infrastructure.

Most viewers never hear about it.

But broadcasters rely on it constantly.

SCTE-35 markers tell downstream systems:

when ads should start
when ads should stop
where content boundaries exist
how SSAI systems should behave

Without SCTE-35, dynamic ad insertion becomes much harder.

And monetization at scale becomes messy.

This is why preserving SCTE-35 markers across workflows matters so much.

Especially when moving from traditional broadcast pipelines into internet-native streaming architectures.

Why Developers Should Care

If you are building modern streaming infrastructure today, you are probably seeing the same trend everyone else is:

Users expect television-quality reliability. But they also expect real-time responsiveness.

That combination changes everything.

For example:

Sports Betting

A 5-second delay can completely break the experience.

But sports platforms still need:

ad insertion
monetization
CDN scalability
regional feeds
metadata signaling
Interactive Events

Live polls, synchronized experiences, and audience participation only work when latency stays extremely low.

Creator Platforms

Modern creators increasingly want:

low latency
simulcasting
monetization
dynamic ad insertion
audience engagement

The infrastructure stack is evolving fast because these expectations are colliding.

MoQ Is Still Early, But Important

It is important to be realistic here.

MoQ is still emerging.

The ecosystem is not fully mature yet.

Tooling, interoperability, CDN adoption, and workflow standards are still evolving.

But the direction is very clear.

The industry is actively moving toward:

lower latency
metadata-aware delivery
more efficient transport layers
internet-native broadcasting
scalable real-time media systems

And that is exactly why developers should pay attention now.

Because the teams that understand these architectures early will be the teams building the next generation of streaming platforms.

The Bigger Picture

For a long time, streaming architecture has been fragmented.

Broadcast engineers lived in one ecosystem. Real-time developers lived in another.

Now those worlds are starting to merge.

Protocols like MoQ. Technologies like SCTE-35 passthrough. Low-latency delivery. Modern SSAI. WebRTC. QUIC.

These are no longer separate conversations.

They are becoming part of the same infrastructure stack.

And honestly, that is what makes this moment exciting.

We are finally reaching a point where streaming platforms may not have to choose between:

low latency
scalability
monetization
or broadcast-grade workflows

They may eventually be able to have all four.

Final Thoughts

MoQ will not replace every protocol overnight.

SCTE-35 is not new.

But together, they represent something important:

The industry is moving toward streaming architectures that are both real-time and production-grade.

That shift is going to affect:

OTT platforms
broadcasters
sports streaming
betting platforms
cloud gaming
creator ecosystems
enterprise streaming
and pretty much every company building live video products

The next few years in streaming infrastructure are going to be very interesting.

And this time, the internet might finally be catching up to what live media actually needs.

Picking a Live Streaming Stack in 2026: Ant Media vs Wowza vs Mux vs Janus

Ankush Banyal — Wed, 29 Apr 2026 10:55:17 +0000

A practical buyer's guide for CTOs and developers who are done with marketing pages and just want to know which one to pick.

If you have spent more than a weekend evaluating streaming infrastructure, you already know the problem. Every vendor's homepage promises ultra-low latency, infinite scale, and a developer-first API. None of them tell you the thing you actually need to know: which workload they were designed for, and where the bill stops being predictable.

I have spent years working with teams building everything from live auctions and 24/7 IPTV channels to virtual classrooms, surveillance dashboards, and live shopping platforms. The same four names come up in nearly every conversation: Ant Media, Wowza, Mux, and Janus. They are usually compared as if they are interchangeable. They are not.

This piece is a clear-headed walk through what each one is built for, where each one breaks down, and why for most modern streaming workloads in 2026, Ant Media is the right answer. No hand-waving, no "it depends" cop-outs.

The four players, in plain language

Before any comparison makes sense, you need to know what each of these things actually is. They sit at different layers of the stack.

Ant Media Server is a self-hosted media server. You run it on your own infrastructure (cloud, on-premise, Kubernetes) and you pay a license per running server. It speaks WebRTC, RTMP, HLS, LL-HLS, SRT, RTSP, CMAF, WHIP, WHEP, and more. The headline feature is sub-500ms WebRTC at scale, with everything else (recording, ABR, transcoding, REST API, mobile SDKs) included. PAYG starts at $0.24/hour, annual at $1,068/year per server, perpetual at $2,799 one-time.

Wowza Streaming Engine is the elder statesman. Self-hosted streaming server software that has been around long enough to power a generation of broadcast workflows. Strong on protocol breadth and traditional broadcast pipelines. Pricing has shifted toward per-instance and per-hour models that punish always-on workloads, with starting plans around $195/month and per-hour streaming fees layered on top.

Mux is a fully managed video API. You don't run servers; you call their API, they encode, store, and deliver. Pricing is per minute encoded, stored, and delivered. Optimized for VoD-heavy applications and short-form UGC. Live streaming exists but it is RTMP-in, HLS-out, not WebRTC at sub-second latency.

Janus is a general-purpose open-source WebRTC server developed by Meetecho. It is a gateway with a plugin architecture, not a turnkey media server. You write or extend plugins for your specific use case. Free under GPLv3, but you bring your own everything: signaling, recording strategy, ABR logic, scaling architecture, mobile SDKs.

That distinction matters. Ant Media and Wowza are media servers you deploy. Mux is a SaaS you call. Janus is a toolkit you build with. Comparing their prices line-by-line is misleading; you are comparing very different commitments.

The five questions that decide the answer

Forget feature checklists. The choice almost always comes down to five concrete questions.

What latency do you actually need?

This is the first fork in the road, and people get it wrong constantly.

If your use case is interactive (auctions, betting, live shopping with a host who responds to chat, telehealth, online tutoring with raise-hand, multi-party video) you need WebRTC and you need sub-500ms end-to-end. Anything above one second breaks the interaction model. HLS, even low-latency HLS at 2 to 5 seconds, is too slow.

This rules Mux out for the live interactive half of the workload immediately. Mux delivers HLS; their live latency lives in the 5 to 10 second range, which is fine for sports broadcasts but useless for an auctioneer trying to take bids in real time.

Ant Media delivers ~300ms WebRTC latency reliably at scale. This is the single most important number for interactive use cases, and it is why Ant Media is the default choice for live shopping platforms, betting apps, online classrooms, and remote inspection products.

Janus also does sub-second WebRTC well, but only WebRTC. No HLS, no DASH, no RTMP ingest out of the box. You either build those bridges yourself or pair Janus with another server. For most teams, that means running two stacks where Ant Media runs one.

How predictable is your traffic?

This is where pricing models start to bite, and it is the single biggest source of "we picked wrong" stories I hear.

Mux charges per minute delivered. Every minute a viewer watches is a minute you pay for. This is great when traffic is small and predictable. It is brutal when something goes viral or when you are running a 24/7 channel. Mux's own examples show $574/month for 45,000 monthly active users on short UGC clips, but a single viral 1080p stream at 5 Mbps will burn through bandwidth costs that genuinely surprise people. For 24/7 broadcast, IPTV, long-form VoD libraries, or any workload with high concurrent viewers on long content, the per-minute model is the wrong shape and there is no architectural lever to pull, because you don't run the servers.

Wowza's published pricing has moved toward per-instance hourly fees on top of monthly minimums. For occasional events and webinars this is fine. For a station streaming continuously, the math gets ugly fast, and complaints about Wowza's bill structure are easy to find.

Ant Media charges per running server. PAYG is $0.24/hour per active server with no per-viewer charges, ever. Annual licenses are $1,068/year per server. Perpetual licenses are $2,799 per server (one-time, optional support renewal after year one). The license has no hard limit on viewers or broadcasters. Capacity is bounded only by what your hardware can serve. For high-concurrency or always-on workloads, this is the cheapest of the three by a wide margin, and it is the only model where your costs do not scale linearly with success.

Janus is free in licensing terms. You pay for hardware, ops, and the engineering time to run it. That last cost is the one teams underestimate.

Worked example: 1000 concurrent viewers, 8 hours/day, 30 days

Ant Media (PAYG, single server): $0.24 × 24 × 30 = $172.80/month, plus your VPS.

Mux (1080p delivery, ~5 Mbps): roughly 1 GB per viewer per hour. 1000 viewers × 8 hours × 30 days = 240,000 viewer-hours, or about 240 TB of delivery. At Mux's per-minute rates this comfortably exceeds $4,000 to $6,000/month.

The gap is not 2x or 3x. It is 20x. And it widens further as you scale.

How much engineering do you have to spare?

There is a hidden axis on every comparison chart that vendors don't print: how much of your team's time will this consume?

Mux is the lowest-effort option by an order of magnitude. You upload, you embed a player, you ship. There is almost no operational burden. For a small team building a one-off video feature inside a larger product, this is real value. The trade-off is you are renting their opinions about encoding, storage, and delivery, and the bill scales with success in a way you cannot architect around.

Ant Media sits in the sweet spot. You install a server, configure SSL with one command, wire up your storage backend, and you have a working WebRTC + HLS + recording stack within hours. The REST API is comprehensive. SDKs cover JavaScript, Android, iOS, Flutter, React Native, and Unity. Clustering for horizontal scale is well-documented and battle-tested in production at thousands of companies. You get the operational control of self-hosting without the build-it-yourself overhead of Janus.

Wowza is similar in operational shape to Ant Media but the documentation has aged badly. Engineers I have worked with describe upgrades as painful and the configuration surface as overwhelming. For new builds in 2026, this is a real cost.

Janus is the most engineering-intensive option by far. You are writing or extending C plugins, designing your own signaling layer, building your own recording pipeline, figuring out scaling architecture from scratch, and maintaining mobile SDKs that don't exist out of the box. It is the right choice when you need exactly what Janus does and nothing else, or when you have very specific protocol or extension needs that turnkey servers can't satisfy. It is the wrong choice when you have a deadline.

Where does your data live, and who owns the recordings?

This is the question that separates self-hosted from managed, and it has both compliance and economic dimensions.

With Ant Media (or Wowza or Janus), the streams and recordings flow through your servers, into your storage. Ant Media has native, one-toggle integration with AWS S3, DigitalOcean Spaces, Wasabi, Cloudflare R2, MinIO, Google Cloud Storage, and Azure Blob. You hold the keys, you set the lifecycle rules, you control the access. For regulated industries (telehealth, financial compliance, government, defence) this is non-negotiable. It also means recording playback can be served directly from cheap object storage, decoupled from the live streaming server. That decoupling matters a lot if your VoD load dwarfs your live load. You only run the Ant Media license during live hours.

With Mux, your media lives in their infrastructure. You get great tooling, but you do not control the data plane. Some teams cannot accept this for regulatory reasons. Others cannot accept it for economic reasons: serving the same recording 10,000 times costs you 10,000x the bandwidth bill on Mux, while on a self-hosted Ant Media + S3 setup it is one upload and a per-GB egress charge from your cloud provider.

Janus by itself does not handle recording in any production-ready way; you bolt that on with plugins or external pipelines. This is one of the bigger hidden costs of the Janus path.

What protocols do you actually need to ingest?

Most teams have an existing camera, encoder, or upstream system that emits a specific protocol. Match the server to that, not the other way around.

Ant Media handles RTMP, RTSP (with ONVIF auto-discovery), SRT, WebRTC, WHIP, HLS, and Zixi natively. Whatever you have, it ingests.

Wowza handles RTMP, RTSP, SRT, and WebRTC. Solid breadth.

Mux ingests RTMP and SRT into their live product, and serves HLS only. No camera support, no WebRTC ingest at sub-second.

Janus ingests WebRTC. Anything else needs custom bridging.

If you have IP cameras (huge for surveillance, security, retail analytics, and physical operations use cases) Ant Media is in a class of its own. Native RTSP pull, ONVIF discovery, WebRTC playback in a browser with no plugins. There is no equivalent in the managed-API world.

The matrix, simplified

Here is how I actually think about it when someone asks "which one should we pick?"

Interactive live (auctions, betting, telehealth, classrooms, live shopping): Ant Media. Sub-300ms WebRTC, recording included, REST API, predictable per-server pricing. Nothing else delivers this combination at a comparable price.

Long-form VoD with high concurrent viewers (training platforms, course libraries, sports replays): Ant Media + S3/R2. Mux bandwidth bills become punishing at scale; serving from object storage is dramatically cheaper, and Ant Media's S3 integration makes this trivial.

24/7 linear channels, IPTV, broadcast workflows: Ant Media. Long-running RTMP/HLS pipelines without per-hour or per-viewer punishment.

IP camera surveillance dashboards: Ant Media. Native RTSP/ONVIF, WebRTC playback in browser, no plugins. This is where Ant Media has no real competition.

Hybrid live interactive plus large VoD library: Ant Media. One stack, both workloads, decouple VoD playback to object storage to keep the bill predictable.

Short-form VoD or UGC inside a non-video product (you just need video to work): Mux. Zero ops, fine to pay the markup, ship it.

Multi-party video conferencing with deeply custom protocol logic and an experienced WebRTC team: Janus. Full control, zero license cost.

Existing Wowza deployment that already works: Wowza. Don't migrate for the sake of it.

For roughly 80% of new streaming projects in 2026 (anything interactive, anything camera-fed, anything with sustained traffic, anything requiring data sovereignty) Ant Media is the answer.

Where each one quietly fails

A buyer's guide that only lists strengths is useless. Here is the failure mode for each.

Ant Media fails when you treat it like a SaaS. It is a media server. You have to run it, monitor it, and patch it (new releases roughly every 2 months). If your team has zero ops capacity and you are doing low-volume short-form video, Mux's higher unit price might be worth it for the operational simplicity. For everyone else, the per-server license savings are too large to ignore.

Wowza fails on operational ergonomics in 2026. The documentation lags behind the product, upgrades are painful, the configuration surface is overwhelming, and the per-instance-per-hour pricing punishes 24/7 use. It is still solid for traditional broadcast, but new builds rarely choose it over Ant Media on technical merit anymore.

Mux fails on bandwidth economics at scale and on live interactive use cases. The moment your viewer-minutes get large, the bill curves up sharply with no architectural lever. It also cannot serve sub-second WebRTC, full stop. If your product needs both Mux and a WebRTC server, you are running two stacks and the simplicity argument evaporates. At which point you should have just run Ant Media.

Janus fails on time-to-market. It is brilliant if you want to write a custom WebRTC application from primitives, and it is the wrong tool if you want to ship a streaming product this quarter. Recording, ABR, mobile SDK parity, and operational tooling are all things you will end up building yourself.

Why Ant Media is the right call for most teams

Bringing it together: most teams evaluating streaming stacks in 2026 are building something interactive, something always-on, or something with cameras. All three of those workloads have the same answer.

Ant Media gives you sub-300ms WebRTC, every other protocol you might need, native cloud storage integration, comprehensive REST API and mobile SDKs, recording included, ABR included, and pricing that does not punish you for being successful. The PAYG model means you can start at $0.24/hour with no commitment and scale up to annual or perpetual licenses when you know your shape. Self-hosted means you keep your data, your customers' data, and your architectural flexibility.

The license alone replaces the spend you would otherwise put on Mux bandwidth, Wowza per-hour fees, or six months of Janus engineering. The 14-day free trial is the lowest-risk way to find out.

If you are evaluating right now, the cleanest path is:

Spin up the Enterprise free trial at antmedia.io. Run it against your actual workload, your real cameras, real encoders, real concurrent viewer count. Compare the bill at the end of the month against what Mux's calculator estimates for the same traffic.

Most teams stop the evaluation there.

Final word

Pick the tool whose pricing model rewards your traffic shape, whose protocol set matches what you actually ingest, and whose operational footprint matches your team's capacity. Mux is right for some niches. Janus is right for some niches. Wowza is right when you already run it.

For nearly everything else, and especially anything where latency, scale, or camera ingest matter, Ant Media is the right call.

About the author

Ankush Banyal is a Solutions Specialist at Ant Media. He works with engineering teams on streaming architecture across live commerce, video surveillance, virtual classrooms, and broadcast infrastructure. Reach out at ankush.banyal@antmedia.io or book a call at calendly.com/antmedia/call.

We're at NAB 2026 — And Here's What We've Been Building for Live Streaming at Scale

Ankush Banyal — Wed, 08 Apr 2026 11:21:05 +0000

We're going to Las Vegas 🎰

NAB Show 2026 is almost here — April 19 to 22 at the Las Vegas Convention Center — and the Ant Media team will be there.

If you are attending and working on anything related to live streaming, low latency video, IP camera infrastructure, broadcast workflows, or real-time applications — come find us at our booth. We would genuinely love to talk shop, no sales pitch required.

But before that, let me share some of what we have been working on and thinking about — because NAB is not just a conference, it is a moment to reflect on where the industry is heading.

The problem nobody talks about enough: latency vs scale

Most streaming solutions make you choose. You either get low latency or you get scale. WebRTC gives you sub-half-second latency but historically has been brutal to scale beyond a few hundred viewers. HLS scales beautifully but 8 to 10 seconds of delay makes it useless for anything interactive — auctions, live sports betting, game shows, real-time monitoring.

The thing we have been obsessing over at Ant Media is collapsing that trade-off.

Here is the architecture pattern that actually works at scale:

Publishers (OBS / hardware encoders / WebRTC)
          ↓
    Origin Cluster (ingest + stream metadata)
          ↓
    Edge Cluster (WebRTC delivery to viewers)
          ↓
    Viewers (sub-500ms latency, thousands concurrent)

The key insight is separating ingest from delivery. Origins handle publishers. Edges handle viewers. They talk to each other via a shared MongoDB cluster for stream metadata and routing. Horizontal scaling becomes trivial — add Edge nodes when viewer count grows, add Origins when publisher count grows. They never compete for the same resources.

On a c5.9xlarge (36 vCPU), a single Edge node handles roughly 800 to 830 concurrent WebRTC viewers at 720p before hitting limits. Scale math becomes predictable.

RTSP ingestion — the unsexy backbone of enterprise video

WebRTC gets all the attention. But a huge chunk of real-world video infrastructure runs on RTSP. IP cameras. VMS systems. Security feeds. Industrial monitoring. Every camera in every warehouse, factory, hospital and data center is almost certainly pushing RTSP streams somewhere.

We have been doing a lot of work on high-volume RTSP ingestion — pulling streams from cameras, transcoding or passthrough routing, and distributing to AI clusters or human viewers downstream.

A pattern we see a lot:

200 x IP Cameras (RTSP, 4K, H264)
          ↓
    Ant Media (ingest + transcode)
          ↓ ↓ ↓
    AI Cluster 1 (4K @ 15fps)
    AI Cluster 2 (1080p @ 15fps)
    AI Cluster 3 (4K @ 1fps for snapshots)

The 1fps output is the one people always underestimate. For computer vision workloads that just need periodic frame analysis rather than full video — dropping to 1fps cuts GPU load to almost nothing compared to full rate output. Small detail, big impact on server count.

SRT is quietly becoming the protocol of choice for contribution

If you work in broadcast or live production, you already know this. SRT (Secure Reliable Transport) has become the go-to for contribution links — getting video from the field into your ingest point reliably over unpredictable networks.

We support SRT ingest natively. One thing that bit us recently in a Kubernetes deployment — the default Helm chart was only exposing RTMP port 1935 through the load balancer. Port 4200 UDP for SRT was missing. If you are deploying Ant Media on AKS or any Kubernetes cluster and wondering why your SRT streams are not reaching the server, check your load balancer config and make sure 4200 UDP is exposed.

# Make sure this is in your service config
- port: 4200
  protocol: UDP
  name: srt

Small thing, but it has caught a few teams out.

Kubernetes deployments — the IP assignment question

Since we are talking about Kubernetes — this is something that comes up every single time someone deploys Ant Media on AKS in a private enterprise network.

The question is always: which components consume VNet IPs vs which ones use overlay IPs?

Here is the short answer for Azure CNI Overlay deployments:

Component	IP Type
Origin Pods	CNI Overlay IP
Edge Pods	CNI Overlay IP
MongoDB Pod	CNI Overlay IP
Ingress Controller	VNet IP
Azure Load Balancer	VNet IP
AKS Nodes	VNet IP

With hostNetwork set to false on your AMS pods, all application pods get Overlay IPs. Only the external-facing entry points consume VNet subnet IPs. This matters a lot in enterprise environments where VNet IP space is limited and carefully managed.

Also — if you are not using WebRTC (pure RTMP/SRT/HLS deployments), disable Coturn entirely. It is not needed and it adds unnecessary complexity to the IP routing.

Come talk to us at NAB

We will be at NAB Show 2026, April 19 to 22 in Las Vegas.

If you are working on:

Live streaming infrastructure at scale
Low latency WebRTC delivery
RTSP camera ingestion and distribution
AKS / cloud-native streaming deployments
Broadcast contribution workflows with SRT
AI video analytics pipelines

...come find us. We are happy to talk architecture, share what we have learned, and hear what you are building.

No forced demos. No sales scripts. Just streaming engineers talking about streaming problems. Which honestly is the best kind of conversation.

See you in Vegas 🎲

Ant Media Server is an open source and enterprise live streaming solution supporting WebRTC, RTMP, HLS, SRT, RTSP and more. antmedia.io

Evaluating Low-Latency Streaming Architectures and Protocol Evolution at NAB 2026

Ankush Banyal — Wed, 01 Apr 2026 10:26:57 +0000

The NAB Show has consistently reflected the direction of the media and streaming industry. In 2026, the focus has moved beyond incremental improvements in delivery toward structural changes in transport protocols, real-time processing, and cloud-native infrastructure.

This technical overview outlines the architectural shifts shaping modern streaming systems.

1. Transport-Layer Optimization: The Rise of MoQ

Historically, streaming optimizations were confined to the application layer. In 2026, the industry is moving down the stack to the transport layer.

Media over QUIC (MoQ)

The most significant development is the transition toward Media over QUIC. By utilizing the QUIC transport protocol, MoQ provides the low-latency benefits of WebRTC with the caching and scalability of HTTP-based delivery.

At NAB 2026, production-ready demonstrations (such as those in the West Hall) are showcasing MoQ achieving ~1s latency without the complex signaling overhead found in traditional WebRTC.

Protocol Comparison for Engineers

Protocol	Latency	Delivery Model	Primary Constraint
WebRTC	< 1s	P2P / SFU	Connection overhead at scale
LL-HLS	2–6s	Segmented	TCP head-of-line blocking
MoQ (QUIC)	~1s	Datagram/Stream	Browser implementation maturity

2. Technical Case Study: Ant Media at NAB 2026

A recurring challenge in streaming architecture is maintaining ultra-low latency while scaling horizontally. Ant Media's presence at NAB 2026 (Booth W3317) serves as a technical case study for addressing this through auto-scaling WebRTC clusters.

Key Technical Demonstrations:

Protocol Interoperability: Side-by-side comparisons of Media over QUIC (MoQ) vs. WebRTC, highlighting the reduction in server-side state management when moving to QUIC-based relays.
Auto-Scaling Infrastructure: Demonstrations of one-click, self-managed live streaming services that utilize Kubernetes to scale WebRTC nodes dynamically across multi-cloud environments (AWS, Azure, Google Cloud).
AI-Driven Workflows: Integration of AI sidecars within the Ant Media Server pipeline for real-time video processing, including automated subtitling via Speech-to-Text and Server-Guided Ad Insertion (SGAI).
Ecosystem Integration: Collaborative workflows with partners like SyncWords (AI captioning), Mobiotics (SGAI/SSAI logic), and Spaceport (Free Viewpoint Video capture), showing how modular plugins are replacing monolithic streaming engines.

3. AI as a Pipeline Primitive

AI is no longer an external post-processing step. In 2026, AI components are integrated as sidecar containers directly within the media pipeline.

Core Implementation Areas:

Neural Transcoding: Using AI models to optimize bitrate-to-quality ratios in real-time.
On-the-Fly Inference: Integrating Speech-to-Text and Translation engines as middle-layer services.
9:16 Auto-Cropping: Real-time AI tools that track players or objects in a broadcast and automatically crop 16:9 feeds for vertical mobile viewing at "true broadcast speed" (minimal induction of delay).

4. Modular and Kubernetes-Native Design

Modern architectures are defined by functional decoupling and container orchestration. The industry is moving toward "studio-in-a-box" solutions that are entirely software-defined.

The Modern Streaming Stack:

Ingest Layer: Securely handling RTMP, SRT, or WebRTC ingest via VPC endpoints.
Management Plane: Decoupled logic for stream routing and session management.
Data Plane: Specialized Kubernetes pods for transcoding and AI sidecars.
Delivery Layer: QUIC-based edge nodes or multi-CDN egress.

Cloud-Agnosticism: Standardizing on Helm charts to ensure the stack runs identically on private bare metal or public clouds.
Security: A shift toward zero-trust networking where media servers have no public IP exposure, utilizing private endpoints for all internal traffic.

5. Backend Monetization (SSAI/SGAI)

Client-side ad insertion is increasingly deprecated due to performance issues and ad-blockers.

Server-Side Ad Insertion (SSAI): The server stitches ads directly into the media segments, providing a seamless stream.
Server-Guided Ad Insertion (SGAI): A hybrid approach where the server provides precise instructions to the client, allowing for local interactivity without the overhead of client-side SDKs.

6. Technical Landscape Summary

For engineers observing the 2026 technical landscape, these areas represent the current frontier:

QUIC Interoperability: Testing how different MoQ implementations behave across browser engines.
Wasm at the Edge: Executing lightweight business logic (watermarking, authentication) at the CDN edge using WebAssembly.
K8s Operators for Video: Developing specialized Kubernetes Operators to manage the lifecycle of media-specific workloads.

Conclusion

Streaming systems are evolving into intelligent, modular ecosystems. The next generation of platforms is defined by the integration of low-latency transport, real-time AI inference, and immutable, cloud-native infrastructure.

streaming #videoengineering #architecture #quic #webrtc #cloudnative #antmedia #nabshow

While Everyone Was Buffering, Ant Media Rewrote the Rules of Live Streaming

Ankush Banyal — Wed, 25 Mar 2026 10:53:36 +0000

In a world where a single second of delay can cost you a viewer, a sale, or a life — one streaming engine decided that "low latency" wasn't low enough.

Picture this: a surgeon in Berlin is guiding a procedure happening in real time in Lagos. A sports bettor in Tokyo is watching a penalty kick that's already been decided by the time the stream reaches him. A classroom of 500 students asks their teacher a question — and waits.
In each of these scenarios, latency isn't just an inconvenience. It's the difference between useful and useless.
This is the world that Ant Media Server was built for.

The Latency Problem Nobody Solved — Until Now
For years, the streaming industry accepted a dirty compromise: either you get quality, or you get speed.
HLS, the backbone of most streaming platforms, delivers a clean picture — but carries an 8 to 10 second delay. For pre-recorded content, that's fine. For anything live and interactive, it's a disaster.
Ant Media Server took a different approach. By building its architecture around WebRTC — the same protocol that powers real-time video calls — it delivers streaming latency under 0.5 seconds. Not "almost real-time." Actual real-time.

"The biggest thing for us with Ant Media Server is the zero latency streaming service and really good support from the team."
— Verified Customer Review

Not Just Fast — Remarkably Flexible
Speed without scale is a party trick. What sets Ant Media apart is that it delivers sub-second latency at any scale — from a single IP camera feed to a global broadcast with hundreds of thousands of concurrent viewers.
The platform supports an extraordinary range of protocols out of the box:

WebRTC — real-time interactive streaming under 0.5 seconds
HLS and LL-HLS — broad compatibility and CDN delivery (8–10 seconds)
RTMP, RTSP, SRT, CMAF, WHIP/WHEP, and Zixi
Adaptive Bitrate (ABR) — automatically matches viewer bandwidth
Full SDK support — iOS, Android, Flutter, React Native, Unity, and JavaScript

Whether you are building for mobile, desktop, or embedded devices — the protocol is never the bottleneck.

The Numbers That Matter
MetricValueWebRTC Latency< 0.5 secondsHLS Latency8–10 secondsCompanies Using It2,000+Countries120+Free Trial14 days

Who Is Actually Using It?
The roster of real-world deployments tells the story better than any benchmark.
The German Red Cross uses Ant Media Server to power live aerial drone feeds during emergency rescue operations.
Mojio, a global leader in connected vehicle technology, relies on it for real-time dashcam streaming across large automotive fleets.
Financial and insurance SaaS platforms use it for eKYC and remote inspection workflows where regulatory compliance and sub-second latency are both non-negotiable.
In education, healthcare, live auctions, sports broadcasting, IP surveillance, and interactive entertainment — the use cases are as diverse as the industries themselves.

Enterprise Power, Without Enterprise Complexity
What truly separates Ant Media from the competition is not just the technology — it's the philosophy.
Deploy on AWS, Azure, Google Cloud, Oracle, or on-premise. Run it in a private cloud, an air-gapped network, or a hybrid setup. Scale horizontally with auto-managed clusters or run a single node for a focused use case. The infrastructure bends to your needs, not the other way around.
One of the most consistent themes across hundreds of verified user reviews is how surprisingly easy Ant Media Server is to set up and operate. Clear documentation, well-designed REST APIs, and a thoughtful onboarding experience mean that teams can go from trial to production without months of integration work.
Security is not an afterthought either. Token-based authentication, stream-level access control, SSL/TLS encryption, IP filtering, and watermarking are all built in — critical for industries handling sensitive content or regulated data.

The Bottom Line
The streaming landscape is crowded. But most solutions were designed for a world where "good enough" latency was acceptable, where scale meant sacrifice, and where flexibility came at the cost of simplicity.
Ant Media Server was designed for a different standard.
If you are building anything where what happens on screen needs to match what is happening in the world — in real time, at scale, on any device — there is now a clear answer to which platform you should be evaluating first.

"Streaming means Ant Media Server. What they are providing is really value for money. For every business use case they have the best plans available."
— Verified Customer Review

Get Started
🚀 Start your free 14-day trial: https://antmedia.io
📖 Quick Start Guide: https://docs.antmedia.io/quick-start/
💬 Have questions? Reach out at contact@antmedia.io

Written by Ankush Banyal, Solutions Specialist at Ant Media

The Internet is Moving Toward Real-Time — Are We Ready?

Ankush Banyal — Wed, 11 Mar 2026 10:50:37 +0000

A few years ago, most of the internet was built around static content.

You loaded a webpage.
You watched a video.
You refreshed to see updates.

Everything worked on a simple principle: request → response → wait.

But the internet is changing.

Today, users expect things to happen instantly.

Live sports with sub-second delay
Interactive classrooms where students ask questions in real time
Multiplayer gaming with voice and video
Live auctions where milliseconds matter
Creator streams where audiences react instantly

This shift is pushing the internet toward something very different:

Real-time infrastructure.

The Latency Problem

Most of the video streaming infrastructure that powers the internet today was designed for scale, not interaction.

Protocols like HLS and DASH were revolutionary when they were introduced. They allowed platforms to distribute video to millions of viewers reliably.

But they come with a trade-off.

Typical latency with HLS is:

8–30 seconds

For watching a movie, that’s perfectly fine.

For interactive experiences, it’s a problem.

Imagine:

answering a question in a live class 20 seconds late
placing a bid after the auction already closed
reacting to a goal in a football match after your friends already celebrated

As digital experiences become more interactive, latency becomes the bottleneck.

Enter WebRTC

WebRTC was originally designed for peer-to-peer communication.

It powers things like:

Google Meet
Discord voice chat
Telemedicine platforms
collaborative tools

But something interesting happened.

Developers realized WebRTC could also be used to build ultra-low-latency streaming systems.

Instead of:

10–30 seconds latency

You can achieve:

<500 milliseconds

That changes what kinds of applications become possible.

The Rise of Real-Time Platforms

We’re starting to see a new category of platforms emerging that rely heavily on real-time video infrastructure.

Some examples include:

Live commerce

Shopping streams where viewers buy products instantly.

Interactive education

Teachers and students engaging in live classes.

Gaming and esports

Real-time gameplay broadcasts with audience interaction.

Telehealth

Doctors consulting patients over video.

Live events

Concerts, conferences, and hybrid experiences.

All of these require something traditional streaming wasn’t built for:

two-way interaction at scale.

The Architecture Challenge

Building real-time video systems is not trivial.

Developers suddenly need to think about things like:

WebRTC signaling
media servers
bandwidth optimization
horizontal scaling
load balancing
real-time transport protocols

A single live event with thousands of viewers can generate enormous traffic.

For example:

5000 viewers × 1.5 Mbps = 7.5 Gbps

Handling that efficiently requires smart architecture decisions.

Many teams end up building clusters of media servers that handle:

ingest
transcoding
distribution
real-time delivery

This is where specialized streaming infrastructure platforms enter the picture.

The Developer Experience Matters

Historically, video infrastructure has been complicated.

Developers often had to deal with:

low-level media pipelines
codec tuning
complicated server deployments

The trend today is toward simplifying real-time media infrastructure.

Developers want:

simple APIs
scalable architectures
flexible deployment options
cloud-native infrastructure

Just like databases evolved from complex setups to easy cloud services, video infrastructure is undergoing the same transformation.

The Next Wave of the Internet

We’re slowly moving toward an internet that feels less like watching content and more like participating in experiences.

Instead of passive consumption, users want:

interaction
presence
immediacy

In many ways, real-time video is becoming the new user interface of the internet.

It’s already happening in:

social platforms
remote work
online education
creator economies

And we’re probably still early.

Final Thoughts

When developers talk about the future of the web, the conversation often revolves around things like:

AI
blockchain
decentralized systems

But another transformation is happening quietly in the background:

the shift toward real-time digital experiences.

The infrastructure we build today will define how people interact online tomorrow.

And increasingly, the expectation is simple:

If it’s live, it should feel instant.

LinkedIn Is Moving Beyond Kafka — And Why Platforms Like Ant Media Server Matter More Than Ever in Real-Time Streaming

Ankush Banyal — Wed, 18 Feb 2026 10:58:16 +0000

When LinkedIn — the original creator of Apache Kafka — starts rethinking its streaming architecture, it naturally grabs attention.

Kafka has powered real-time data pipelines for over a decade. It became the backbone of event-driven systems across finance, e-commerce, social platforms, and analytics. So when LinkedIn evolves beyond it, it’s not drama — it’s progress.

But here’s the part that often gets overlooked.

There’s a big difference between real-time data streaming and real-time media streaming.

And that’s where platforms like Ant Media Server quietly play a very different — and very critical — role.

Real-Time Data vs. Real-Time Media

Kafka (and similar systems) are built for event streaming:

Logs

Messages

Notifications

Clickstream data

Backend service communication

Latency here usually means milliseconds to seconds. That’s great for analytics and system coordination.

But when we talk about live sports, auctions, betting, live commerce, virtual classrooms, or interactive events — “real-time” means something completely different.

It means:

Sub-second glass-to-glass latency

Stable video delivery

Adaptive bitrate

Scaling to thousands (or millions) of viewers

Handling unpredictable network conditions

Keeping audio/video perfectly in sync

That’s not a data problem.
That’s a media infrastructure problem.

Where Ant Media Server Fits In

This is where Ant Media Server comes in.

While Kafka moves structured data between systems, Ant Media Server is built specifically for ultra-low latency audio and video delivery using WebRTC and LL-HLS.

For example:

WebRTC delivery with ~0.5 second latency

Adaptive bitrate streaming (ABR)

Horizontal scaling via clustering

Cloud or on-prem deployment

Support for large-scale concurrent viewers

In many modern architectures, you’ll actually see both working together:

Kafka (or another data pipeline) handles:

Bidding events

Chat messages

Notifications

User actions

Ant Media Server handles:

The actual live video stream

Real-time interaction

Viewer delivery at scale

Different layers of the stack. Same real-time ambition.

Why This Matters

As companies push for more immersive, interactive experiences, the definition of “real-time” keeps getting stricter.

It’s no longer enough for data to move quickly.
Users expect video and audio to feel instant.

Whether it’s a live auction where milliseconds impact bids, a sports broadcast where fans can’t tolerate delay, or a virtual classroom where interaction must feel natural — media latency becomes the business differentiator.

That’s where specialized real-time media servers become essential.

The Bigger Picture

LinkedIn evolving beyond Kafka doesn’t mean Kafka failed. It means scale and requirements evolve.

The same applies to media streaming.

As use cases become more interactive and latency-sensitive, companies increasingly look beyond traditional CDN-only models and adopt WebRTC-based infrastructure platforms like Ant Media Server to achieve true low-latency delivery.

Real-time isn’t one technology.
It’s a layered architecture.

And as the stack evolves, both data pipelines and real-time media platforms have their place.

The future of streaming won’t be built on one tool.
It will be built on the right combination of tools — working together.

Designing Video Architecture That Scales With Your Product (Not Against It)

Ankush Banyal — Wed, 18 Feb 2026 10:57:10 +0000

If you’re building a modern app with video, chances are your requirements didn’t stop at “just a video call.”

It usually starts simple:

One-to-one video calls
Then evolves into:

Live streaming

Audience interaction

Real-time gifts, reactions, overlays

That’s when architecture choices start to matter — a lot.

This article walks through how teams typically handle private video calls and interactive live streaming in the same product, what works well in practice, and where things usually break.

Two Video Use Cases That Look Similar — But Aren’t

At a glance, these both involve video:

Private one-to-one calls

One-to-many live broadcasts with interaction

Under the hood, they behave completely differently in terms of:

Bandwidth

Latency

Scaling

Infrastructure cost

Trying to force one solution to handle both almost always leads to compromises.

One-to-One Video Calls: P2P Still Wins

For private calls, the goals are clear:

Lowest possible latency

Direct communication

Minimal backend involvement

The Practical Setup (Still Valid in 2025)

WebRTC peer-to-peer for audio/video

Backend only for signaling, auth, and discovery

STUN + TURN (coturn) for NAT/firewall reliability

This setup has aged well because it does exactly what it should:

Media flows directly when possible

Falls back gracefully when networks get messy

Keeps infrastructure costs predictable

For 1:1 calls, routing media through your backend is usually unnecessary overhead.

Why P2P Doesn’t Scale for Live Streaming

Live streaming changes everything.

If one broadcaster has:

50 viewers

100 viewers

500 viewers

Pure P2P means the broadcaster uploads that many streams.

On mobile, that’s a hard no:

Battery drain

Upload limits

Dropped frames

Crashes under load

This is where many early-stage apps hit their first real wall.

SFU: The Missing Middle Layer

To scale live video properly, you need a Selective Forwarding Unit (SFU).

The idea is simple:

Broadcaster uploads one stream

SFU forwards it efficiently to viewers

Latency stays low

The broadcaster’s device survives

This model is why SFUs power most real-time live platforms today.

Gifts, Reactions, and Why Latency Matters

Live gifts only feel meaningful if:

The broadcaster reacts instantly

Viewers see reactions in sync

Latency stays very low

This is where traditional RTMP → HLS pipelines struggle:

15–30 seconds of delay kills interaction

Gifts feel disconnected from reality

That’s why many teams combine:

WebRTC (via SFU) for interactive viewers

HLS / LL-HLS for large, passive audiences

It’s not either/or — it’s choosing the right tool per audience size.

Running 1:1 Calls and Live Rooms in the Same App

This is a common concern, and yes — it works well if you keep boundaries clear.

What Can Be Shared

Authentication

User identity

Payments and gifting logic

Chat, reactions, UI components

What Should Stay Separate

Media routing paths

Scaling logic

Session lifecycle handling

Trying to reuse the exact same media flow for everything usually leads to tight coupling and painful refactors later.

Where Platforms Like Ant Media Fit In

When teams don’t want to build and maintain all of this from scratch, they often look for solutions that already support multiple streaming models.

For example, platforms like Ant Media Server are commonly used in setups where:

WebRTC P2P is needed for private calls

WebRTC SFU is needed for interactive live streams

HLS or LL-HLS is needed for scale

Mobile clients are first-class citizens

The value isn’t just protocol support — it’s having one backend that can handle different video paths cleanly, depending on the use case.

Whether you build yourself or use an existing platform, the architecture principles stay the same.

Common Mistakes Teams Regret Later

Some patterns show up again and again:

Forcing P2P to handle live broadcasts

Adding gifts on top of high-latency streams

Ignoring TURN usage until production bills arrive

Testing only on good Wi-Fi

Over-optimizing for massive scale too early

Most of these come from trying to simplify too much.

If I Were Starting Fresh Today

I’d design with intent from day one:

WebRTC P2P for private calls

WebRTC SFU for live, interactive streams

HLS / LL-HLS only when scale demands it

Gifts and reactions built as real-time events

Clear separation between call logic and broadcast logic

It’s not the smallest setup — but it’s one that grows without fighting you.

Final Thought

Video isn’t hard because of codecs or APIs.

It’s hard because:

Latency shapes user behavior

Mobile networks are unpredictable

Different use cases need different paths

Get the architecture right early, and everything else — features, scale, monetization — becomes much easier.

Hopefully this saves someone a painful rewrite down the road.