Improve live streaming experience with stream mixing

Stream mixing is a technology that combines multiple audio or video streams into a single stream in the cloud. It is widely used in live streaming, online education, live audio rooms, and similar scenarios.

By playing the mixed stream, viewers can see and hear every member of a room, and developers do not need to manage each stream in the room separately.

Benefits of stream mixing

Stream mixing is widely used in audio and video applications because it offers developers the following benefits.

1. Low costs

In large-scale live streaming or online education scenarios, if multiple parties co-host in a room, all users in the room need to play multiple streams.

With the stream mixing technology, only one stream needs to be played.

That is, with two-party co-hosting, playback costs are cut in half. More generally, with n co-hosts, each viewer plays one stream instead of n, so costs are reduced by (n - 1)/n.

If review mechanisms such as obscene content moderation are used, review costs are also reduced by (n - 1)/n, because only the images of one stream need to be reviewed.
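As a quick check of the (n - 1)/n figure, here is a minimal sketch. It assumes that cost is proportional to the number of streams each viewer plays and ignores the cost of the mixing task itself; the function name `playback_saving` is made up for this illustration.

```python
from fractions import Fraction


def playback_saving(n: int) -> Fraction:
    """Fraction of playback cost saved when n co-host streams
    are replaced by one mixed stream (assumes cost ~ streams played)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Fraction(n - 1, n)


for hosts in (2, 3, 4):
    print(hosts, playback_saving(hosts))
# 2 1/2
# 3 2/3
# 4 3/4
```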

2. Simple code logic

When multiple hosts are co-hosting and stream mixing is used, the audience only needs to play and render the mixed stream instead of playing and rendering multiple streams.

3. Easy forwarding between different platforms

Without stream mixing, a multi-party co-hosting session cannot be forwarded to Facebook, YouTube, and other live streaming platforms, because these platforms provide only one RTMP address and multiple streams cannot be forwarded to one address.

4. Multi-party co-hosting supported by web browsers on mobile clients

For iPhone users, Safari does not support playing multiple audio sources simultaneously: when several streams are played, only one of them is actually heard. Stream mixing solves this problem.

Because of the limits of phone and browser performance, most mobile web browsers can generally play at most four streams.

With stream mixing, the audience plays a single stream regardless of how many hosts appear in it, so many more co-hosts can be shown without extra bandwidth or performance cost on the client.

What is stream mixing

As shown in the following figure, when two users in a room publish streams, the server combines the two streams into one stream based on the layout configuration. The audience plays the mixed stream to view the screens of user A and user B.

Stream mixing implementation

1. Stream mixing process

  1. The server listens for stream changes in a room.
  2. The host on a client publishes a stream.
  3. When the server detects the first new stream, it starts stream mixing.
  4. A co-host starts publishing a stream.
  5. The server detects the added stream and updates the stream mixing layout configuration.
  6. The co-host stops publishing.
  7. The server detects the removed stream and updates the stream mixing layout configuration.
  8. The room is disbanded, and the stream mixing task stops.
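The steps above can be sketched as a small state tracker. This is only an illustration: the class `MixTask`, its method names, and the action strings are hypothetical and not part of any ZEGOCLOUD API. It also treats "the last stream leaves" as the point where mixing stops.

```python
class MixTask:
    """Tracks the streams in one room and reports which mixing action is needed."""

    def __init__(self):
        self.streams = []      # order of arrival, oldest first
        self.running = False   # whether a mixing task is active

    def on_stream_added(self, stream_id):
        if stream_id in self.streams:
            return "noop"
        self.streams.append(stream_id)
        if not self.running:
            self.running = True
            return "start"     # first stream: start mixing
        return "update"        # later streams: update the layout

    def on_stream_removed(self, stream_id):
        if stream_id not in self.streams:
            return "noop"
        self.streams.remove(stream_id)
        if not self.streams:
            self.running = False
            return "stop"      # no streams left: stop mixing
        return "update"        # fewer streams: update the layout


task = MixTask()
actions = [
    task.on_stream_added("host"),
    task.on_stream_added("cohost"),
    task.on_stream_removed("cohost"),
    task.on_stream_removed("host"),
]
print(actions)  # ['start', 'update', 'update', 'stop']
```

In a real service, each returned action would trigger the corresponding call to the mixing backend.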

2. Client logic

The client does not need to manage the stream mixing logic.

It only needs to decide whether to play the original streams or the mixed stream, depending on whether the client itself publishes a stream, as shown in the following figure.
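A minimal sketch of that decision follows. It assumes that a client that publishes (a host or co-host) plays the other participants' original streams, while a pure viewer plays the mixed stream; the text does not spell out this mapping, so treat it as an assumption. The function and parameter names are hypothetical.

```python
def streams_to_play(is_publisher, own_stream_id, room_stream_ids, mixed_stream_id):
    """Return the list of stream IDs this client should play."""
    if is_publisher:
        # Assumption: publishers play every original stream except their own.
        return [s for s in room_stream_ids if s != own_stream_id]
    # Assumption: viewers play only the single mixed stream.
    return [mixed_stream_id]


room = ["userA", "userB"]
print(streams_to_play(True, "userA", room, "mixed"))   # ['userB']
print(streams_to_play(False, None, room, "mixed"))     # ['mixed']
```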

Figure: client logic

3. Server logic

The server needs to listen for stream changes in the room and update the stream mixing configuration whenever a stream is added or removed.

Depending on your requirements, the layout configuration varies with the number of streams. When the number of streams drops to 0, stream mixing needs to be stopped.

Figure: server logic

4. Layout configuration

ZEGOCLOUD provides a layout configuration API. Developers only need to set the position and size of each stream. The following examples show some typical layouts.

In the following examples, the resolution of the mixed video canvas is set to 360 × 640.

Layout 1: two views side by side
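A minimal sketch of how the two rectangles could be computed for the 360 × 640 canvas. The `Rect` type and its field names are illustrative assumptions, not the ZEGOCLOUD layout schema; coordinates are in pixels with the origin at the top-left corner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int


def two_side_by_side(first_id, second_id, width=360, height=640):
    """Split the canvas into a left half and a right half."""
    half = width // 2
    return {
        first_id: Rect(0, 0, half, height),
        second_id: Rect(half, 0, width, height),
    }


layout = two_side_by_side("userA", "userB")
print(layout["userA"])  # Rect(left=0, top=0, right=180, bottom=640)
print(layout["userB"])  # Rect(left=180, top=0, right=360, bottom=640)
```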


Layout 2: four views tiled vertically
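One way to read this layout is four equal rows stacked from top to bottom, each 360 × 160 on the 360 × 640 canvas. The sketch below follows that reading; the dictionary keys (`left`, `top`, `right`, `bottom`) are illustrative names, not the ZEGOCLOUD layout schema.

```python
def tile_vertically(stream_ids, width=360, height=640):
    """Stack the streams in equal-height rows, first stream at the top."""
    if not stream_ids:
        raise ValueError("at least one stream is required")
    row = height // len(stream_ids)
    return [
        {"stream": sid, "left": 0, "top": i * row, "right": width, "bottom": (i + 1) * row}
        for i, sid in enumerate(stream_ids)
    ]


for item in tile_vertically(["s1", "s2", "s3", "s4"]):
    print(item)
# {'stream': 's1', 'left': 0, 'top': 0, 'right': 360, 'bottom': 160}
# {'stream': 's2', 'left': 0, 'top': 160, 'right': 360, 'bottom': 320}
# {'stream': 's3', 'left': 0, 'top': 320, 'right': 360, 'bottom': 480}
# {'stream': 's4', 'left': 0, 'top': 480, 'right': 360, 'bottom': 640}
```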


Layout 3: one large view with two small views floating on top


The layer of an input stream is determined by its position in the input stream list. The later a stream appears in the list, the higher its layer.

In Layout 3, input streams 2 and 3 come after input stream 1 in the list, so their layer is higher and they float over the view of input stream 1.
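The layering rule can be illustrated with a small sketch. The rectangle positions and the tuple format `(left, top, right, bottom)` are assumptions chosen for this example on a 360 × 640 canvas; they are not taken from the ZEGOCLOUD API. The function returns the stream that is visible at a given pixel, which is the last stream in the list whose rectangle covers that pixel.

```python
# Order matters: later entries are drawn on top of earlier ones.
inputs = [
    ("stream1", (0, 0, 360, 640)),      # large background view
    ("stream2", (250, 20, 340, 180)),   # small floating view (90 x 160)
    ("stream3", (250, 200, 340, 360)),  # small floating view (90 x 160)
]


def visible_stream(input_list, x, y):
    """Return the ID of the topmost stream covering pixel (x, y), or None."""
    top = None
    for stream_id, (left, upper, right, lower) in input_list:
        if left <= x < right and upper <= y < lower:
            top = stream_id  # a later entry overrides an earlier one
    return top


print(visible_stream(inputs, 300, 100))  # stream2
print(visible_stream(inputs, 300, 250))  # stream3
print(visible_stream(inputs, 100, 500))  # stream1
```

Moving `stream1` to the end of the list would hide both small views, because the full-canvas rectangle would then be drawn last.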

For a detailed description of layouts, see the ZEGOCLOUD documentation.

Sign up with ZEGOCLOUD, get 10,000 minutes free every month.
