Lalit Mishra

Posted on Feb 5

The Sidecar Pattern: Scaling Mediasoup with Python and Node.js

#python #mediasoup #webrtc #fastapi

The Death of the Monolith

For years, the WebRTC landscape was dominated by monolithic media servers. Tools like Kurento, Janus, and Jitsi Videobridge operated as standalone black boxes. You downloaded a binary, wrote a config file, started the process, and then spoke to it via a proprietary REST or WebSocket API.

While effective, these monoliths imposed a rigid architecture. They dictated your deployment strategy, your scaling model, and often your signaling logic.

Mediasoup represents a fundamental shift in this paradigm. It is not a server; it is a library. It does not listen on port 80 or 443 by default. It does not have a config file. It is designed to be imported directly into a Node.js (or Rust) application, giving the developer granular, programmatic control over every single media track.

Under the hood, Mediasoup spawns C++ subprocesses (Workers) that handle the heavy lifting of packet routing, encryption (DTLS-SRTP), and congestion control (GCC). The Node.js layer controls these worker processes over an efficient inter-process communication (IPC) channel; the media code itself does not run inside V8.

For a better understanding, follow the videos on the YouTube channel: The Lalit Official.
Subscribe to the channel and share the videos with your friends.


The Python Engineer's Dilemma

For a team running a mature Python backend (Django, FastAPI, Quart), Mediasoup poses a problem. It is strictly a Node.js or Rust ecosystem tool. There are no official Python bindings.

This leaves you with two bad options and one great one:

The Rewrite: Rewrite your entire backend in Node.js. (Costly, risky).
The Hack: Try to write Python C++ bindings for the Mediasoup core. (Maintenance nightmare).
The Sidecar: Treat the Mediasoup Node.js script as a specialized microservice—a "Media Sidecar"—that runs alongside your Python application.

In practice, the third option is the one that holds up in production. In this architecture, Python remains the System of Record. It handles authentication, database state, room management, billing, and the WebSocket signaling with the client. The Node.js service becomes a "dumb muscle" layer, responsible solely for shuffling RTP packets.

Architecting the Sidecar

The Sidecar Pattern decouples Signaling (Python) from Media (Node.js).

1. The Service Boundary

Your infrastructure consists of two distinct services, likely deployed as separate containers in the same Kubernetes Pod or ECS Task:

Signaling Service (Python/Quart): Manages the WebSocket connection to the browser. It knows who is in the room and what permissions they have.
Media Service (Node.js/Mediasoup): Runs the Mediasoup workers. It exposes an internal control API (e.g., gRPC or HTTP/JSON) that is not accessible to the public internet.

2. The Internal Control Protocol

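The Node.js service keeps only ephemeral media state (active routers, transports, producers) and no business logic, so Python has to drive it through every step of a session. That makes the traffic between the two processes frequent, which is why it must be fast and strictly typed.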

gRPC is the ideal candidate here. You define a .proto file:

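syntax = "proto3";

// Request/response message definitions are omitted for brevity.
service MediaController {
  rpc CreateRouter (RouterRequest) returns (RouterResponse);
  rpc CreateWebRtcTransport (TransportRequest) returns (TransportResponse);
  rpc ConnectTransport (ConnectRequest) returns (ConnectResponse);
  rpc Produce (ProduceRequest) returns (ProduceResponse);
}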

When the first user joins a room, Python calls MediaController.CreateRouter(). Node.js spins up a Mediasoup router and returns its RTP Capabilities. Python forwards these to the client, and later joiners reuse the same router. The client never talks to the Node.js API directly; it only sends media (UDP/TCP) to the ports opened by Mediasoup.
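To make this concrete, here is a minimal Python sketch of the signaling side of this call, assuming an async control client. `MediaController`, `FakeMediaController`, and `RoomRegistry` are hypothetical names, and the fake returns placeholder capabilities instead of calling Node.js. What it illustrates is that a Router belongs to a room, so Python creates it once and hands the cached capabilities to every later joiner.

```python
import asyncio
from typing import Protocol


class MediaController(Protocol):
    """Hypothetical async client for the sidecar's internal control API."""

    async def create_router(self, room_id: str) -> dict: ...


class FakeMediaController:
    """In-memory stand-in for the Node.js sidecar (illustration only)."""

    def __init__(self) -> None:
        self.calls = 0

    async def create_router(self, room_id: str) -> dict:
        self.calls += 1
        # A real sidecar would create a mediasoup Router and return its
        # router.rtpCapabilities; this placeholder returns an empty codec list.
        return {"routerId": f"router-{room_id}", "rtpCapabilities": {"codecs": []}}


class RoomRegistry:
    """Creates one Router per room on first join and reuses it afterwards."""

    def __init__(self, media: MediaController) -> None:
        self._media = media
        self._routers: dict[str, dict] = {}
        self._lock = asyncio.Lock()  # stops two concurrent joins creating two routers

    async def join(self, room_id: str) -> dict:
        async with self._lock:
            if room_id not in self._routers:
                self._routers[room_id] = await self._media.create_router(room_id)
        # Only the capabilities are forwarded to the browser.
        return self._routers[room_id]["rtpCapabilities"]


async def main() -> None:
    media = FakeMediaController()
    rooms = RoomRegistry(media)
    await rooms.join("standup")
    await rooms.join("standup")
    print(media.calls)  # prints 1: the second join reuses the existing router


asyncio.run(main())
```

The single lock serializes joins across all rooms, which keeps the sketch short; a per-room lock would avoid that contention.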

The Protocol Dance: How It Works

Let's trace the lifecycle of a user publishing a video stream in this polyglot architecture.

Step 1: Capabilities Handshake

Client (JS): Connects to Python via WebSocket. Sends getRouterRtpCapabilities.
Python: Forwards request to Node.js via gRPC.
Node.js: Reads mediasoupRouter.rtpCapabilities (a property, not a method). Returns it as JSON to Python.
Python: Forwards JSON to Client.
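Step 1 on the Python side could be a small request dispatcher like the sketch below. The JSON message shape (`id`, `method`, `data`), the `handles` decorator, and `sidecar_get_capabilities` are assumptions for illustration; the sidecar call is stubbed with a placeholder codec list rather than a real gRPC request.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Hypothetical registry: client method name -> coroutine that handles it.
Handler = Callable[[dict], Awaitable[dict]]
HANDLERS: dict[str, Handler] = {}


def handles(method: str) -> Callable[[Handler], Handler]:
    """Register a coroutine as the handler for one client request type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[method] = fn
        return fn
    return register


async def sidecar_get_capabilities(room_id: str) -> dict:
    # Stub for the control RPC; the real sidecar would return
    # router.rtpCapabilities for this room's Router.
    return {"codecs": [{"kind": "video", "mimeType": "video/VP8", "clockRate": 90000}]}


@handles("getRouterRtpCapabilities")
async def get_router_rtp_capabilities(data: dict) -> dict:
    return await sidecar_get_capabilities(data["roomId"])


async def dispatch(raw: str) -> str:
    """Decode one WebSocket frame, run its handler, and encode the reply."""
    msg = json.loads(raw)
    handler = HANDLERS.get(msg.get("method"))
    if handler is None:
        return json.dumps({"id": msg.get("id"), "ok": False, "error": "unknown method"})
    result = await handler(msg.get("data", {}))
    return json.dumps({"id": msg.get("id"), "ok": True, "data": result})


request = {"id": 1, "method": "getRouterRtpCapabilities", "data": {"roomId": "standup"}}
print(asyncio.run(dispatch(json.dumps(request))))
```

A Quart or FastAPI WebSocket endpoint can pass each received text frame to `dispatch` and send the returned string back, which keeps the handlers independent of the framework.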

Step 2: Transport Creation

Client: Sends createWebRtcTransport.
Python: Authenticates user. Calls CreateWebRtcTransport on Node.js.
Node.js: Calls router.createWebRtcTransport(). This allocates a UDP port and generates DTLS parameters (fingerprints).
Node.js: Returns the id, iceParameters, iceCandidates, and dtlsParameters to Python.
Python: Stores the transport_id in Redis (mapping UserID -> TransportID) and sends parameters to Client.
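A minimal sketch of Step 2, assuming a hypothetical `FakeSidecar` in place of the real CreateWebRtcTransport RPC and a plain dictionary in place of Redis. `allowed_users` stands in for whatever authentication the backend actually performs. The mapping holds a set per user because a client typically creates one send transport and one receive transport.

```python
import asyncio
import itertools


class FakeSidecar:
    """Stand-in for the CreateWebRtcTransport RPC (illustration only)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    async def create_webrtc_transport(self, router_id: str) -> dict:
        # Field names mirror the transport parameters Node.js returns in Step 2.
        return {
            "id": f"transport-{next(self._ids)}",
            "iceParameters": {},
            "iceCandidates": [],
            "dtlsParameters": {},
        }


class TransportService:
    def __init__(self, sidecar: FakeSidecar, allowed_users: set[str]) -> None:
        self.sidecar = sidecar
        self.allowed_users = allowed_users  # stand-in for real authentication
        # Stand-in for Redis: user_id -> ids of the transports that user created.
        self.transports_by_user: dict[str, set[str]] = {}

    async def create(self, user_id: str, router_id: str) -> dict:
        if user_id not in self.allowed_users:
            raise PermissionError(f"{user_id} is not allowed in this room")
        params = await self.sidecar.create_webrtc_transport(router_id)
        self.transports_by_user.setdefault(user_id, set()).add(params["id"])
        return params  # forwarded unchanged to the client


svc = TransportService(FakeSidecar(), allowed_users={"alice"})
send = asyncio.run(svc.create("alice", "router-standup"))
recv = asyncio.run(svc.create("alice", "router-standup"))
print(svc.transports_by_user)
```

With Redis, the same bookkeeping could be one set per user (for example via SADD), so that any signaling replica can look it up.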

Step 3: The Connection

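Client: The mediasoup-client transport emits its "connect" event (triggered by the first produce() or consume() call) with { dtlsParameters }, and the app sends them to Python.
Python: Receives DTLS params. Sends ConnectTransport command to Node.js.
Node.js: Calls transport.connect({ dtlsParameters }). ICE connectivity checks and the DTLS handshake then complete over the UDP link.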
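Step 3 is where Python's role as gatekeeper matters: the client supplies a transport id, and Python should confirm that the id belongs to that user before forwarding anything. The sketch below assumes a hypothetical `FakeSidecar.connect_transport` stub and a user-to-transports mapping like the one Python stores in Step 2; the `dtlsParameters` shape (`role`, `fingerprints`) follows what mediasoup-client emits.

```python
import asyncio


class FakeSidecar:
    """Stand-in for the ConnectTransport RPC (illustration only)."""

    def __init__(self) -> None:
        self.connected: dict[str, dict] = {}

    async def connect_transport(self, transport_id: str, dtls_parameters: dict) -> None:
        # The real sidecar would call transport.connect({ dtlsParameters }).
        self.connected[transport_id] = dtls_parameters


async def handle_connect(sidecar: FakeSidecar, transports_by_user: dict[str, set[str]],
                         user_id: str, data: dict) -> dict:
    transport_id = data["transportId"]
    # Do not trust a client-supplied id: it must belong to the caller.
    if transport_id not in transports_by_user.get(user_id, set()):
        raise PermissionError("transport does not belong to this user")
    dtls = data["dtlsParameters"]
    if not dtls.get("fingerprints"):
        raise ValueError("dtlsParameters must contain at least one fingerprint")
    await sidecar.connect_transport(transport_id, dtls)
    return {"connected": True}


sidecar = FakeSidecar()
owned = {"alice": {"transport-1"}}
msg = {"transportId": "transport-1",
       "dtlsParameters": {"role": "client",
                          "fingerprints": [{"algorithm": "sha-256", "value": "AB:CD"}]}}
print(asyncio.run(handle_connect(sidecar, owned, "alice", msg)))
```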

Step 4: Producing Media

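Client: Calls transport.produce({ track }). mediasoup-client emits a "produce" event carrying { kind, rtpParameters } (here kind: 'video'), which the app forwards to Python.
Python: Validates permissions (e.g., "Is this user muted?"). Calls Produce on Node.js.
Node.js: Calls transport.produce({ kind, rtpParameters }). Returns a producer_id, which Python passes back to the client.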
Python: Broadcasts "User X started video" to all other participants.

Notice that Python never touches an RTP packet. It only manages the intent of the media.
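Step 4 could look like the following Python sketch. `Room`, its `muted` set, and `sidecar_produce` are hypothetical; each member's `asyncio.Queue` stands in for the outbound side of that user's WebSocket connection. Only the producer id and kind are broadcast, so the RTP itself never passes through Python.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Room:
    # user_id -> outbound message queue drained by that user's WebSocket task
    members: dict[str, asyncio.Queue] = field(default_factory=dict)
    muted: set[str] = field(default_factory=set)  # hypothetical moderation state


async def sidecar_produce(transport_id: str, kind: str, rtp_parameters: dict) -> str:
    # Stand-in for the Produce RPC; Node.js would call transport.produce().
    return f"producer-{transport_id}-{kind}"


async def handle_produce(room: Room, user_id: str, data: dict) -> dict:
    if user_id in room.muted:
        raise PermissionError(f"{user_id} is muted and may not publish")
    kind = data["kind"]
    producer_id = await sidecar_produce(data["transportId"], kind, data["rtpParameters"])
    # Announce the new producer: Python sends intent, never RTP.
    event = {"event": "newProducer", "userId": user_id, "producerId": producer_id, "kind": kind}
    for other_id, queue in room.members.items():
        if other_id != user_id:
            queue.put_nowait(event)
    return {"id": producer_id}  # returned to the publishing client


async def demo() -> None:
    room = Room(members={"alice": asyncio.Queue(), "bob": asyncio.Queue()})
    reply = await handle_produce(room, "alice",
                                 {"kind": "video", "transportId": "t1", "rtpParameters": {}})
    print(reply, room.members["bob"].get_nowait()["event"])


asyncio.run(demo())
```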

Advanced Capabilities: Simulcast and SVC

One of Mediasoup's "killer features" is its robust support for Simulcast and Scalable Video Coding (SVC). In a monolithic MCU (Multipoint Control Unit), the server transcodes video for every user, burning massive CPU.

Mediasoup is an SFU (Selective Forwarding Unit). It doesn't transcode. Instead, it relies on the client to send multiple versions of the stream.

Simulcast: The browser sends several independent encodings of the same track, typically up to three, for example High (1080p), Medium (720p), and Low (360p).
SVC: The browser sends a single stream encoded in layers, spatial and/or temporal (e.g., a base layer at 15fps plus an enhancement layer that adds 15fps to reach 30fps).

In the Sidecar pattern, Python is responsible for the policy.
If User A is on a 3G network, Python (monitoring stats) can instruct the Node.js sidecar: "For Consumer A, force the 'Low' spatial layer."
The Node.js service executes: consumer.setPreferredLayers({ spatialLayer: 0 }).
Mediasoup then stops forwarding the packets of the higher layers to that consumer, saving bandwidth without CPU-intensive transcoding. This kind of per-consumer layer control can take more work to wire up in older servers like Janus; in Mediasoup it is a native primitive of the Consumer API.
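The policy half of this split can be a pure function in Python. The sketch below assumes a hypothetical three-layer simulcast stream with made-up per-layer bitrates, plus a measured available bandwidth for the consumer (for example, derived from stats the sidecar reports). `SetPreferredLayers` is an illustrative command object that the sidecar would translate into `consumer.setPreferredLayers({ spatialLayer })`.

```python
from dataclasses import dataclass

# Hypothetical bitrate needs (bps) for spatial layers 0, 1, 2; real values
# depend on the encodings the client actually publishes.
LAYER_BITRATES = [150_000, 500_000, 1_500_000]


@dataclass(frozen=True)
class SetPreferredLayers:
    """Command Python sends; the sidecar maps it to consumer.setPreferredLayers()."""
    consumer_id: str
    spatial_layer: int


def choose_spatial_layer(available_bps: int, headroom: float = 0.8) -> int:
    """Pick the highest layer that fits within a fraction of the measured bandwidth."""
    budget = available_bps * headroom
    best = 0  # layer 0 is always allowed, even on a very poor link
    for layer, needed in enumerate(LAYER_BITRATES):
        if needed <= budget:
            best = layer
    return best


def policy(consumer_id: str, available_bps: int) -> SetPreferredLayers:
    return SetPreferredLayers(consumer_id, choose_spatial_layer(available_bps))


print(policy("consumer-A", 400_000))    # 3G-like link -> spatial layer 0
print(policy("consumer-B", 5_000_000))  # broadband -> spatial layer 2
```

Preferred layers act as an upper bound: Mediasoup can still send a lower layer when its own bandwidth estimation calls for it, so this policy caps quality rather than guaranteeing it.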

Vertical Scaling: The PipeTransport

The Node.js control process is single-threaded, and so is each Mediasoup Worker: a Worker is a C++ process that can use at most one CPU core, so the usual practice is to launch one Worker per core. Every Router belongs to exactly one Worker.

The Limit: If you put all users in one Router, you are limited to the capacity of one CPU core (roughly 500 consumers, depending on the host CPU).

The Solution: PipeTransport.
Mediasoup allows you to pipe streams between Routers, even if those Routers are on different Workers (cores) or different servers.

To scale a large room (e.g., 2,000 listeners) on a single machine:

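Python asks Node.js to create Router A on Worker 1 and Router B on Worker 2.
Python commands Node.js to pipe the Producer from A to B (for example with router.pipeToRouter(), which creates and connects a pair of PipeTransports).
Producer User connects to Router A.
Consumer Users connect to Router B, and to further Routers on other Workers once B approaches its consumer limit.
Mediasoup forwards the Producer's RTP from the Worker on Core 1 to the Worker on Core 2 over a local UDP connection (127.0.0.1 by default), with no transcoding involved.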

This allows the Python backend to treat a multicore server as a massive cluster, balancing users across cores while maintaining a single logical "Room" state.
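The Python-side bookkeeping for this fan-out can stay small. The sketch below assumes one Router per Worker and the rough 500-consumer budget quoted above; `RoomPlan`, `RouterSlot`, and the router ids are hypothetical. When every fan-out Router is full, it opens a new one on the next Worker and records the pipe the sidecar would create, for example with `router.pipeToRouter({ producerId, router })`.

```python
from dataclasses import dataclass, field

# Rough per-Worker budget quoted in the text; measure on your own hardware.
CONSUMERS_PER_WORKER = 500


@dataclass
class RouterSlot:
    router_id: str
    worker_index: int  # one Router per Worker, one Worker per CPU core
    consumers: int = 0


@dataclass
class RoomPlan:
    """Python-side bookkeeping for one logical room spread across Routers."""
    origin: RouterSlot  # Router A: the producer publishes here
    fanout: list[RouterSlot] = field(default_factory=list)
    pipes: list[tuple[str, str]] = field(default_factory=list)  # (from_router, to_router)

    def assign_listener(self) -> str:
        """Return the Router a new listener should consume from."""
        for slot in self.fanout:
            if slot.consumers < CONSUMERS_PER_WORKER:
                slot.consumers += 1
                return slot.router_id
        # Every fan-out Router is full: open one on the next Worker and record
        # the pipe the sidecar must create from the origin Router.
        index = len(self.fanout) + 1
        slot = RouterSlot(f"router-{index}", worker_index=index, consumers=1)
        self.fanout.append(slot)
        self.pipes.append((self.origin.router_id, slot.router_id))
        return slot.router_id


plan = RoomPlan(origin=RouterSlot("router-0", worker_index=0))
for _ in range(2000):
    plan.assign_listener()
print(len(plan.fanout), [s.consumers for s in plan.fanout])  # prints 4 [500, 500, 500, 500]
```

For the 2,000-listener example this yields four fan-out Routers plus the origin, so the machine needs at least five cores. The origin Router also carries one pipe consumer per fan-out Router, which is small next to the listener load.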

Deployment: The Docker Compose Pattern

In production, the Sidecar pattern maps 1:1 with Docker containers.

docker-compose.yml:

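services:
  backend-signaling:
    image: my-python-quart-app
    environment:
      # media-service uses host networking, so its service name does not
      # resolve from this container; reach it through the Docker host instead
      - MEDIA_SERVICE_URL=http://host.docker.internal:3000
    extra_hosts:
      - "host.docker.internal:host-gateway"  # needs Docker 20.10+ on Linux
    ports:
      - "443:443"

  media-service:
    image: my-node-mediasoup-app
    # Host networking is often required for WebRTC UDP port ranges.
    # Keep the control port (3000) firewalled from the public internet.
    network_mode: host
    environment:
      - LISTEN_IP=0.0.0.0
      - ANNOUNCED_IP=1.2.3.4  # placeholder: the host's public IP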
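On the Python side, reading `MEDIA_SERVICE_URL` once at startup and validating it makes a misconfigured compose file fail immediately instead of on the first join. A minimal sketch; `MediaSettings`, the localhost default, and the `MEDIA_SERVICE_TIMEOUT` variable are assumptions and are not part of the compose file above.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class MediaSettings:
    service_url: str
    timeout_s: float


def load_media_settings(env: Mapping[str, str] = os.environ) -> MediaSettings:
    """Read the sidecar address from the environment and fail fast if it is malformed."""
    url = env.get("MEDIA_SERVICE_URL", "http://localhost:3000")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"MEDIA_SERVICE_URL is not a usable http(s) URL: {url!r}")
    # MEDIA_SERVICE_TIMEOUT is a hypothetical knob for control-RPC deadlines.
    timeout = float(env.get("MEDIA_SERVICE_TIMEOUT", "2.0"))
    return MediaSettings(service_url=url.rstrip("/"), timeout_s=timeout)


print(load_media_settings({"MEDIA_SERVICE_URL": "http://host.docker.internal:3000/"}))
```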

Health Checks & Failure Recovery:
Because the services are decoupled, each one can fail and recover independently.

If Python dies: The WebSocket disconnects. The client attempts to reconnect. The media streams (handled by Node.js) might actually stay alive for a few seconds, allowing for a "glitch-free" reload if Python recovers instantly.
If Node.js dies: Python detects the gRPC failure (or a failed health check). It invalidates the room state and sends a reconnect signal to all affected clients, forcing them to restart the negotiation flow on a healthy media service instance.
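A minimal sketch of the Node.js-failure path, assuming every control RPC goes through one wrapper. `SignalingState`, the `reconnect` message shape, and the failover target (`replacement`) are hypothetical; a refused connection or a timeout is treated as a dead sidecar. The wrapper re-raises so the original request still fails, while every client in an affected room gets a queued reconnect signal.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class SignalingState:
    members: dict[str, set[str]] = field(default_factory=dict)    # room -> user ids
    room_media: dict[str, str] = field(default_factory=dict)      # room -> media instance
    router_ids: dict[str, str] = field(default_factory=dict)      # room -> router on that instance
    outbox: list[tuple[str, dict]] = field(default_factory=list)  # (user_id, message) to send

    def on_media_failure(self, failed: str, replacement: str) -> None:
        """Invalidate every room hosted on `failed` and tell its clients to renegotiate."""
        for room_id, instance in list(self.room_media.items()):
            if instance != failed:
                continue
            self.router_ids.pop(room_id, None)      # the old router died with the process
            self.room_media[room_id] = replacement  # next join creates a fresh router there
            for user_id in self.members.get(room_id, set()):
                self.outbox.append((user_id, {"event": "reconnect", "roomId": room_id}))

    async def call_media(self, instance: str, replacement: str,
                         rpc: Callable[[], Awaitable[dict]], timeout_s: float = 2.0) -> dict:
        """Run one control RPC; a timeout or refused connection marks the sidecar dead."""
        try:
            return await asyncio.wait_for(rpc(), timeout_s)
        except (ConnectionError, asyncio.TimeoutError):
            self.on_media_failure(instance, replacement)
            raise


async def refused() -> dict:
    raise ConnectionError("connection refused")


state = SignalingState(members={"standup": {"alice", "bob"}},
                       room_media={"standup": "media-1"},
                       router_ids={"standup": "router-7"})
try:
    asyncio.run(state.call_media("media-1", "media-2", refused))
except ConnectionError:
    pass
print(state.room_media, len(state.outbox))
```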

Conclusion: The Best of Both Worlds

The Sidecar Pattern is not a workaround; it is an architectural upgrade. By keeping Mediasoup in its native Node.js environment, you benefit from the library's rapid release cycle and performance optimizations without fighting the friction of foreign function interfaces. By keeping your orchestration in Python, you retain the development velocity, rich ecosystem, and maintainability of your existing backend.

You stop building "A Mediasoup App" and start building "A Python Application (that happens to have world-class media powers)."

