<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leo</title>
    <description>The latest articles on DEV Community by Leo (@leopf).</description>
    <link>https://dev.to/leopf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1893595%2F21b06da0-dc11-4a3a-a880-91859b86bc3b.png</url>
      <title>DEV Community: Leo</title>
      <link>https://dev.to/leopf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leopf"/>
    <language>en</language>
    <item>
      <title>Connecting streamtasks over the Internet</title>
      <dc:creator>Leo</dc:creator>
      <pubDate>Wed, 21 Aug 2024 12:26:46 +0000</pubDate>
      <link>https://dev.to/leopf/connecting-streamtasks-over-the-internet-1mmc</link>
      <guid>https://dev.to/leopf/connecting-streamtasks-over-the-internet-1mmc</guid>
      <description>&lt;p&gt;Streamtasks is build on a robust and flexible networking layer.&lt;/p&gt;

&lt;p&gt;The networking layer, unlike the classical IP network, does not distinguish between communication inside a process, between processes or between different machines. Everything is handled over a uniform network.&lt;/p&gt;

&lt;p&gt;This enables us to connect different computers running instances of streamtasks, to distribute data or workloads. &lt;/p&gt;

&lt;p&gt;One interesting application is connecting two instances that cannot reach each other directly over IP. &lt;/p&gt;

&lt;p&gt;This is when we need an intermediate server running in the cloud.&lt;/p&gt;

&lt;p&gt;One way to solve this is to run a tunneling server such as ngrok. We can also solve it with streamtasks directly.&lt;/p&gt;

&lt;p&gt;This demo shows how to run such an intermediate server and connect your instances over WebSockets. This lets you benefit from the existing tooling around WebSockets, such as standardized encryption and nginx. &lt;/p&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;To securely send data over an internet connection we need to use encryption.&lt;/p&gt;

&lt;p&gt;We have great standards for this and don't need to reinvent the wheel.&lt;/p&gt;

&lt;p&gt;On the cloud server, you can install streamtasks and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamtasks &lt;span class="nt"&gt;--serve&lt;/span&gt; ws://127.0.0.1:9000 &lt;span class="nt"&gt;-C&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs streamtasks and accepts WebSocket connections on 127.0.0.1:9000.&lt;/p&gt;

&lt;p&gt;To add encryption, we use nginx as a reverse proxy with a TLS certificate to secure our connections.&lt;/p&gt;

&lt;p&gt;Our config could look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt; &lt;span class="s"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;yourdomain.com&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;# SSL Configuration&lt;/span&gt;
    &lt;span class="kn"&gt;ssl_certificate&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/ssl/yourdomain.com.crt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;ssl_certificate_key&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/ssl/yourdomain.com.key&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://localhost:9000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;# Proxy headers for WebSocket&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_http_version&lt;/span&gt; &lt;span class="mf"&gt;1.1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Upgrade&lt;/span&gt; &lt;span class="nv"&gt;$http_upgrade&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Connection&lt;/span&gt; &lt;span class="s"&gt;"upgrade"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Host&lt;/span&gt; &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;# Other proxy headers (optional)&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Real-IP&lt;/span&gt; &lt;span class="nv"&gt;$remote_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Forwarded-For&lt;/span&gt; &lt;span class="nv"&gt;$proxy_add_x_forwarded_for&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Forwarded-Proto&lt;/span&gt; &lt;span class="nv"&gt;$scheme&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Redirect HTTP to HTTPS&lt;/span&gt;
&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;yourdomain.com&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;301&lt;/span&gt; &lt;span class="s"&gt;https://&lt;/span&gt;&lt;span class="nv"&gt;$host$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Authentication
&lt;/h2&gt;

&lt;p&gt;Right now streamtasks includes a rather primitive authentication method: you can specify an authentication token in the URL, which the server verifies. For this to work, you must include the same token in the URL on both the client and the server.&lt;/p&gt;

&lt;p&gt;On the server side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamtasks &lt;span class="nt"&gt;--serve&lt;/span&gt; ws://127.0.0.1:9000?auth&lt;span class="o"&gt;=&lt;/span&gt;1234 &lt;span class="nt"&gt;-C&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will make sure only clients with the correct credentials are allowed to join.&lt;/p&gt;
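Under the hood, such a check amounts to comparing the auth query parameter of the connection URL against the configured token. A minimal Python sketch of the idea (illustrative only, not streamtasks' actual implementation):

```python
from urllib.parse import urlparse, parse_qs

EXPECTED_TOKEN = "1234"  # must match the token in the server's --serve URL

def is_authorized(request_url: str) -> bool:
    """Accept a connection only if the auth query parameter matches."""
    query = parse_qs(urlparse(request_url).query)
    return query.get("auth", [None])[0] == EXPECTED_TOKEN

print(is_authorized("ws://127.0.0.1:9000?auth=1234"))  # True
print(is_authorized("ws://127.0.0.1:9000?auth=9999"))  # False
print(is_authorized("ws://127.0.0.1:9000"))            # False
```

Note that the token travels inside the URL, which is one more reason to only accept connections over TLS.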

&lt;h2&gt;
  
  
  Connecting
&lt;/h2&gt;

&lt;p&gt;To connect to a remote instance, specify the connect URL when starting your local instance.&lt;/p&gt;

&lt;p&gt;In this case we do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamtasks &lt;span class="nt"&gt;--connect&lt;/span&gt; wss://yourdomain.com?auth&lt;span class="o"&gt;=&lt;/span&gt;1234
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try streamtasks!&lt;br&gt;
GitHub: &lt;a href="https://github.com/leopf/streamtasks" rel="noopener noreferrer"&gt;https://github.com/leopf/streamtasks&lt;/a&gt;&lt;br&gt;
Documentation/Homepage: &lt;a href="https://streamtasks.3-klicks.de" rel="noopener noreferrer"&gt;https://streamtasks.3-klicks.de&lt;/a&gt;&lt;br&gt;
X: &lt;a href="https://x.com/leopfff" rel="noopener noreferrer"&gt;https://x.com/leopfff&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nginx</category>
      <category>tunnel</category>
      <category>showdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Podcast Production with streamtasks</title>
      <dc:creator>Leo</dc:creator>
      <pubDate>Wed, 14 Aug 2024 09:33:11 +0000</pubDate>
      <link>https://dev.to/leopf/podcast-production-with-streamtasks-1h09</link>
      <guid>https://dev.to/leopf/podcast-production-with-streamtasks-1h09</guid>
      <description>&lt;p&gt;Podcasts are easy to produce. Let's make it even easier!&lt;/p&gt;

&lt;p&gt;A podcast usually has multiple people speaking, each with their own microphone and camera. When composing the podcast, it is most often cut based on who is currently speaking.&lt;/p&gt;

&lt;p&gt;In this example we assume the podcast has two participants, with a camera and a microphone for each, all plugged into one computer.&lt;/p&gt;

&lt;p&gt;Our goal is to automate the switching between participants, then store the video for each participant and the composed video separately.&lt;/p&gt;

&lt;p&gt;We start off with two audio inputs. We mix these inputs using an audio mixer and then encode each audio stream (the stream for P1, P2 and for P1+P2).&lt;/p&gt;
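Conceptually, mixing audio streams just sums them sample by sample. A tiny Python sketch of the idea (illustrative only; streamtasks' audio mixer task does this for you):

```python
def mix(a, b):
    """Mix two mono float audio streams by summing them sample-wise
    and clipping to [-1.0, 1.0] to avoid overdrive."""
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

p1 = [0.5, -0.2, 0.9]  # samples from participant 1
p2 = [0.5, -0.2, 0.9]  # samples from participant 2
print(mix(p1, p2))  # [1.0, -0.4, 1.0] (last sample clipped from 1.8)
```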

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1hd62p8d68tddzn48ks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1hd62p8d68tddzn48ks.png" alt="two audio inputs mixed and encoded" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next we store our encoded audio streams using an output container (as an MP4).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo54t0ma4b1971oqlh934.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo54t0ma4b1971oqlh934.png" alt="two mixed audio inputs stored as mp4 files" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That is the audio side done. Now let's add some video. As with the audio inputs, we create video inputs and encode them as H.264 video.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpit4w9a2450d0yckjlm5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpit4w9a2450d0yckjlm5.png" alt="two video inputs with encoders" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To store the video for each participant we add a video track to each output container and connect the encoded video to it. Let's start by adding video tracks to the individual output files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcq34z9rikqu8uq6bp027.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcq34z9rikqu8uq6bp027.png" alt="storing the video in the individual outputs" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's do the exciting part: switching between the video feeds and saving the result to our &lt;code&gt;p12.mp4&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;We start by adding a video track to our composed output container and add a media switch to our deployment. We connect each video feed to one of the inputs on the media switch. We connect the output of the switch to the video track input of our output container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fmh0sgyrzwuriivcopj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fmh0sgyrzwuriivcopj.png" alt="adding a media switch" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Right now the switch does nothing. To actually switch between video feeds, we will use the volume of our microphones to drive the control signal of the media switch. The media switch expects numbers on its control inputs and switches to the input whose corresponding control input carries the highest value. So it switches to "input 1" if "control 1" has the highest control value.&lt;/p&gt;
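The switching rule itself is simple: forward the input whose control input currently carries the highest number. A minimal Python sketch (illustrative, not streamtasks' actual implementation):

```python
def select_input(control_values):
    """Return the index of the input whose control value is highest.

    Mirrors the media switch's rule: forward the input whose
    corresponding control input carries the highest number.
    """
    return max(range(len(control_values)), key=lambda i: control_values[i])

# P1 is louder than P2, so the switch shows P1's video (index 0).
print(select_input([0.8, 0.2]))  # 0
print(select_input([0.1, 0.6]))  # 1
```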

&lt;p&gt;To extract the audio volume from our microphones we use the audio volume meter task. It outputs a number representing the volume of the audio input. The volume output is then connected to the corresponding control inputs. "control 1" is for P1, which means we use the audio volume of the P1 audio data as the control signal for the video of P1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0g5tvswcmokw9n0vy5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0g5tvswcmokw9n0vy5x.png" alt="working switching logic" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now our automatic switching works, but it is a little slow. Ideally you want to switch just before someone starts speaking. We can't look into the future, but we can look into the past.&lt;/p&gt;

&lt;p&gt;There are multiple ways of doing this. One is to delay the video feeds by one second before switching between them. Another is to move our control signals into the future: we simply change the timestamps of the control messages, and the synchronizer in the media switch then aligns the streams, delaying the video streams by one second. This approach is better because we don't change the timestamps of the video, which we later use in our output container. If we changed the video time, we would have to change it back before putting it into the output container; otherwise the video would not be synchronized with the audio.&lt;/p&gt;

&lt;p&gt;This is the config of the timestamp updaters, moving the control messages 1000 ms into the future.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xdfk9t6eyibdm3hsi1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xdfk9t6eyibdm3hsi1a.png" alt="timestamp updater config" width="576" height="178"&gt;&lt;/a&gt;&lt;/p&gt;
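The effect of the timestamp updater can be sketched as a pure function over messages: only the timestamp changes, the payload does not. (Illustrative sketch; the field names are assumptions, not streamtasks' actual message format.)

```python
def shift_timestamps(messages, offset_ms=1000):
    """Move control messages into the future by offset_ms.

    Only the timestamp changes; the payload is untouched. The media
    switch's synchronizer then delays the (unchanged) video streams
    to line up with these shifted control signals.
    """
    return [{**m, "timestamp": m["timestamp"] + offset_ms} for m in messages]

controls = [{"timestamp": 0, "value": 0.8}, {"timestamp": 40, "value": 0.2}]
print(shift_timestamps(controls))
# [{'timestamp': 1000, 'value': 0.8}, {'timestamp': 1040, 'value': 0.2}]
```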

&lt;p&gt;Our final deployment looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7xgnzl0io6r8qqxwfok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7xgnzl0io6r8qqxwfok.png" alt="back in time switching works" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like that! Within minutes you can automate your podcasting setup.&lt;/p&gt;

&lt;p&gt;Try streamtasks!&lt;br&gt;
GitHub: &lt;a href="https://github.com/leopf/streamtasks" rel="noopener noreferrer"&gt;https://github.com/leopf/streamtasks&lt;/a&gt;&lt;br&gt;
Documentation/Homepage: &lt;a href="https://streamtasks.3-klicks.de" rel="noopener noreferrer"&gt;https://streamtasks.3-klicks.de&lt;/a&gt;&lt;br&gt;
X: &lt;a href="https://x.com/leopfff" rel="noopener noreferrer"&gt;https://x.com/leopfff&lt;/a&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>opensource</category>
      <category>podcast</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AI Integration with streamtasks</title>
      <dc:creator>Leo</dc:creator>
      <pubDate>Thu, 08 Aug 2024 05:29:48 +0000</pubDate>
      <link>https://dev.to/leopf/ai-integration-with-streamtasks-4f40</link>
      <guid>https://dev.to/leopf/ai-integration-with-streamtasks-4f40</guid>
      <description>&lt;p&gt;Streamtasks empowers you to configure real-time data pipelines with ease. By leveraging these capabilities, you can build your video production pipelines in software, seamlessly integrating tools like AI that would otherwise be hard to integrate.&lt;/p&gt;

&lt;p&gt;With Streamtasks, the possibilities for incorporating AI into your workflows are vast and varied. I'll showcase some of the ways to harness the power of AI with streamtasks.&lt;/p&gt;

&lt;p&gt;The AI tasks are not included in the MSI and Flatpak installers, since they would take up too much space and bloat the installers. To use the AI tasks, you must install streamtasks manually with pip, as described in the &lt;a href="https://leopf.github.io/streamtasks/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamtasks[media,inference]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Improving Speech
&lt;/h2&gt;

&lt;p&gt;Two tasks are available to enhance the quality of your audio data: Spectral Mask Enhancement and Waveform Speech Enhancement. Both tasks remove background noise from the audio you supply to them, improving its overall quality.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spectral Mask Enhancement boosts speech clarity by applying a spectral mask. It effectively removes background noise and produces consistent results, but may occasionally introduce additional noise.&lt;/li&gt;
&lt;li&gt;Waveform Speech Enhancement is good at eliminating background noise, but sometimes truncates the speech signal.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These tasks only accept mono, floating point audio data at 16 kHz. Speech enhancement is powered by the SpeechBrain library. If you would like to use different models or explore how it works, I recommend visiting the &lt;a href="https://huggingface.co/speechbrain" rel="noopener noreferrer"&gt;SpeechBrain page on Hugging Face&lt;/a&gt;.&lt;/p&gt;
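To illustrate the expected input format, here is a naive Python sketch of downmixing and resampling to mono 16 kHz. Streamtasks provides resampler tasks that do this for you; the nearest-sample resampling below is a deliberate simplification, not production-quality resampling.

```python
def to_mono_16k(samples, rate):
    """Convert interleaved stereo float audio to mono 16 kHz.

    samples: list of (left, right) float pairs at the given sample rate.
    Downmix by averaging the channels, then pick the nearest source
    sample for each 16 kHz output sample (a crude stand-in for a
    proper resampling filter).
    """
    mono = [(left + right) / 2 for left, right in samples]
    n_out = int(len(mono) / rate * 16000)
    step = len(mono) / n_out
    return [mono[int(i * step)] for i in range(n_out)]

one_second_stereo_48k = [(0.0, 0.0)] * 48000
print(len(to_mono_16k(one_second_stereo_48k, 48000)))  # 16000
```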

&lt;p&gt;In the following example we take an audio input, such as a microphone, resample its audio to mono 16 kHz, and apply speech enhancement before playing it through an audio output. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warning: do not do this if there is a feedback loop between your audio output and input (between your speakers and your microphone).&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;In addition we have an audio switch and a radio button UI in front of our audio output, allowing us to switch between the different enhancement methods and test what the different options sound like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0eok6edfpsfi61b1uno.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0eok6edfpsfi61b1uno.png" alt="switching between speech enhancements" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Live Transcription
&lt;/h2&gt;

&lt;p&gt;In addition to speech enhancement SpeechBrain also provides us with great ways of integrating automatic speech recognition (ASR). &lt;/p&gt;

&lt;p&gt;We can use the &lt;em&gt;Asr Speech Recognition&lt;/em&gt; task to transform our audio data into a text transcription. Like the speech enhancement tasks, it only accepts mono channel floating point audio data at 16 kHz.&lt;/p&gt;

&lt;p&gt;In this example, we take data from an audio input, resample it, transcribe it, and then display the resulting transcription using a text display.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsulxjph9mt2yiyo3de0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsulxjph9mt2yiyo3de0.png" alt="asr to text display" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To boost the performance of our speech recognition system, we enhance the quality of the speech first. We then use a switch to toggle through the various enhancement methods and determine which one yields the best results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrfsm5002e8mpbpeyr28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrfsm5002e8mpbpeyr28.png" alt="speech enhancement deployment" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I create a dashboard and arrange the radio buttons and text display in a way that makes testing easy. In my case, it looks something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxalc36m9p7fb9qd19yd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxalc36m9p7fb9qd19yd.png" alt="speech enhancement dashboard" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Text to Speech
&lt;/h2&gt;

&lt;p&gt;We can also use SpeechBrain's models for text-to-speech functionality.&lt;/p&gt;

&lt;p&gt;Let's give it a try!&lt;/p&gt;

&lt;p&gt;In this example, I'm using a text input (a text field with a send button) and connecting it to the &lt;em&gt;Tts Fast Speech 2&lt;/em&gt; task. This task generates audio data, which we output through an audio output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsc6tuw6uy21fb18qnmvd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsc6tuw6uy21fb18qnmvd.png" alt="tts deployment simple" width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  LLaMA.cpp
&lt;/h2&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/SK1uyyu2noU"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;We can use LLaMA.cpp (GGUF) models alongside SpeechBrain models. The &lt;em&gt;Llama.cpp Chat&lt;/em&gt; task processes text data, generating text as a chat assistant: user messages serve as input, and the assistant's responses are output. To test this functionality, we use the Text Input and Text Display tasks as input and output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn643th49k3ko2ide2s8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn643th49k3ko2ide2s8w.png" alt="llamacpp chat simple" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After sending the text "Hello!" to our &lt;em&gt;Llama.cpp Chat&lt;/em&gt; task, we get this output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rn9k9m2pso5qfp1p1ln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rn9k9m2pso5qfp1p1ln.png" alt="llamacpp chat simple dashboard" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works!&lt;/p&gt;

&lt;h2&gt;
  
  
  LLaMA.cpp + ASR + TTS
&lt;/h2&gt;

&lt;p&gt;Let's now try building something more complicated and put it all together to make a speaking and listening LLaMA chat bot.&lt;/p&gt;

&lt;p&gt;In order to create a proper message for our chat bot, we have to concatenate the fragments of text the ASR task outputs. We use a &lt;em&gt;String Concatenator&lt;/em&gt; task to do this. It receives our text on "input" and a "control" signal. When the control signal goes from low to high (from below 0.5 to above 0.5), it outputs the concatenated text and clears its buffer. We use a &lt;em&gt;Message Detector&lt;/em&gt; task and a &lt;em&gt;Calculator&lt;/em&gt; task to create this control signal. The &lt;em&gt;Message Detector&lt;/em&gt; is configured with a timeout of 2 seconds: while receiving text it outputs 1, and after not receiving any message for 2 seconds it outputs 0. With the calculator we invert this signal, producing the desired control signal, which goes from 0 to 1 after no message has arrived for 2 seconds. We then use the output of the &lt;em&gt;String Concatenator&lt;/em&gt; as the input for our LLaMA chat bot. The chat bot's output is transformed into speech using the TTS task, which is then played through an audio output.&lt;br&gt;
In addition, we have three text displays: a live display that appends and shows all incoming text fragments, one for the actual chat bot input, and one for the output of our chat bot.&lt;/p&gt;
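The buffering and rising-edge logic described above can be sketched like this (illustrative only; the method names and exact flush behavior are assumptions, not streamtasks' implementation):

```python
class StringConcatenator:
    """Buffers text fragments; flushes when the control signal rises above 0.5.

    In the described wiring, the Message Detector outputs 1 while text is
    arriving and 0 after a 2 s timeout; the Calculator inverts that signal,
    so the control rises exactly when the speaker pauses.
    """
    def __init__(self):
        self.buffer = []
        self.last_control = 0.0

    def push_text(self, fragment):
        self.buffer.append(fragment)

    def push_control(self, value):
        flushed = None
        # rising edge: previous control at or below 0.5, new control above it
        if value > 0.5 >= self.last_control and self.buffer:
            flushed = " ".join(self.buffer)
            self.buffer.clear()
        self.last_control = value
        return flushed

cat = StringConcatenator()
cat.push_text("hello")
cat.push_text("world")
print(cat.push_control(0.0))  # None: speaker is still talking
print(cat.push_control(1.0))  # hello world
```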

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8byq9dnolosxuv56hgpl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8byq9dnolosxuv56hgpl.png" alt="llama asr tts deployment" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We then create a dashboard to arrange our text displays.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0a6xxzd80lgt45h73y2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0a6xxzd80lgt45h73y2.png" alt="llama asr tts dashboard" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've finally achieved our goal of creating a conversational AI: a chat bot that we can engage in discussions with.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/j6mSyNTFCM4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future
&lt;/h2&gt;

&lt;p&gt;I'm planning to add more advanced models to streamtasks, including Whisper and Bark, as well as the currently missing image generation and segmentation models. These new models will expand what is possible with the system.&lt;/p&gt;

&lt;p&gt;To further integrate AI-generated data with multimedia content, I plan to add support for subtitles to the Output Container task, allowing users to include live transcriptions from ASR tasks in their live streams and video files.&lt;/p&gt;

&lt;p&gt;Portability is a priority, as machine learning frameworks are often too large to be included in prebuilt installers. However, emerging frameworks like &lt;a href="https://github.com/tinygrad/tinygrad" rel="noopener noreferrer"&gt;tinygrad&lt;/a&gt; are small enough to be reasonably included, making installation easier for less technical users. I'm planning to eventually switch all inference tasks to one framework, which will further simplify installation.&lt;/p&gt;

&lt;p&gt;Try streamtasks!&lt;br&gt;
GitHub: &lt;a href="https://github.com/leopf/streamtasks" rel="noopener noreferrer"&gt;https://github.com/leopf/streamtasks&lt;/a&gt;&lt;br&gt;
Documentation: &lt;a href="https://streamtasks.3-klicks.de" rel="noopener noreferrer"&gt;https://streamtasks.3-klicks.de&lt;/a&gt;&lt;br&gt;
X: &lt;a href="https://x.com/leopfff" rel="noopener noreferrer"&gt;https://x.com/leopfff&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Video Production with Streamtasks</title>
      <dc:creator>Leo</dc:creator>
      <pubDate>Wed, 07 Aug 2024 01:35:14 +0000</pubDate>
      <link>https://dev.to/leopf/video-production-with-streamtasks-36hl</link>
      <guid>https://dev.to/leopf/video-production-with-streamtasks-36hl</guid>
      <description>&lt;p&gt;Since the inception of video production, we have faced a significant challenge: processing vast amounts of data. Initially, computers were too slow, leading to the emergence of an industry focused on building specialized hardware boxes for various tasks such as encoding, decoding, transmitting, receiving, switching, and displaying video. These boxes became indispensable, making video production an expensive endeavor.&lt;/p&gt;

&lt;p&gt;Today, however, we have powerful general-purpose hardware, such as GPUs and CPUs, capable of handling the video and audio data we need to process. But the industry is still behind on this. Why do video encoder boxes still exist when GPUs can perform these tasks?&lt;/p&gt;

&lt;h2&gt;
  
  
  So what is the problem?
&lt;/h2&gt;

&lt;p&gt;Integration and software. We have bits and pieces of software that can do some things but not others. For simple recording tasks, tools like OBS suffice. For more complex switching, tools like vMix are used. All these tools have one thing in common: they are not as flexible as plugging in wires wherever you want. You are stuck with what is handed to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;We make boxes and wires in software. Instead of buying encoder boxes, you encode your video with little programs and connect these little programs with virtual wires. &lt;strong&gt;This is the idea of streamtasks.&lt;/strong&gt; You can even change the codec, the resolution, the pixel format, or whatever else this little piece of software ("Task") allows you to change. To get a better understanding of how the software works, I recommend watching the overview video. &lt;iframe width="710" height="399" src="https://www.youtube.com/embed/cMmoPQrjQ0M"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F005orgwa2f8pe7tm2e8h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F005orgwa2f8pe7tm2e8h.png" alt="Video input and encoder" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Streamtasks allows you to connect topics, which are displayed as colored circles, with each other. During operation, each task processes its input data in real time. For example, a video encoder will encode the raw video it receives and then output the encoded video on its output topic.&lt;/p&gt;

&lt;p&gt;What you can do with streamtasks depends on two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the tasks available&lt;/li&gt;
&lt;li&gt;your imagination&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While I cannot improve your imagination, I can make tasks and simplify the process of creating new ones.&lt;br&gt;
Right now, there are over 50 different tasks integrated into streamtasks. These range from very simple to more complex and powerful tasks, many of which are documented &lt;a href="https://leopf.github.io/streamtasks/tasks/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The general idea is to give the user as much control as possible. If a task can be split into two simpler ones, it is usually a good idea. &lt;/p&gt;

&lt;p&gt;If you can write software, you can write your own tasks. The simplest task can be written in 4 lines of code (even less, if you really try). If you are interested in making your own tasks, I refer you to the documentation: &lt;a href="https://leopf.github.io/streamtasks/custom-task.html" rel="noopener noreferrer"&gt;https://leopf.github.io/streamtasks/custom-task.html&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  AI
&lt;/h2&gt;

&lt;p&gt;One of the greatest advancements of our time is AI, which has not yet been widely integrated into video production. Streamtasks provides a strong framework for integrating AI and other kinds of software into your production pipelines. You can implement AI in your workflows today and build things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live Audio Transcription&lt;/li&gt;
&lt;li&gt;Live Face Detection&lt;/li&gt;
&lt;li&gt;Face Tracking&lt;/li&gt;
&lt;li&gt;Text To Speech&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can watch this demo of a chatbot with TTS to get an idea of what is possible.&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/j6mSyNTFCM4"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;
&lt;h2&gt;
  
  
  Distributing Data and Workloads
&lt;/h2&gt;

&lt;p&gt;While our computers are really fast today, there is a limit to their performance. In order to build larger production pipelines that process more data, you will have to distribute data and workload across multiple machines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is possible with streamtasks today&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can have multiple instances of streamtasks running on different machines and connect them by either using the connection manager UI or specifying a connect/serve URL on the command line.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# the main instance (started with -C) accepting tcp connections on 9002 &lt;/span&gt;
streamtasks &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="nt"&gt;--serve&lt;/span&gt; tcp://0.0.0.0:9002
&lt;span class="c"&gt;# a sub instance connecting to the main system (in this case running on the same machine)&lt;/span&gt;
streamtasks &lt;span class="nt"&gt;--connect&lt;/span&gt; tcp://127.0.0.1:9002
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Now you can create your pipelines with all tasks available on either machine. The only apparent difference between tasks running on different machines is a dotted line connecting their topics.&lt;/p&gt;

&lt;p&gt;Each task available on both machines will now be displayed twice in the task selection, once for each node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1huo44zl0wfwhw7l83y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1huo44zl0wfwhw7l83y.png" alt="Task selection" width="297" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a result, you can now start tasks on different machines and connect them seamlessly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlf4fng60yp1vzl4j6xq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlf4fng60yp1vzl4j6xq.png" alt="Audio mixing" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this example we have two different audio inputs (like microphones), one running on node/machine "demo1" and the other on "demo2". The audio is mixed and then played on an audio output on "demo1". Be cautious: the audio data from the input on demo2 is being sent over TCP. If you send raw video over a TCP connection, you will likely not get the result you wish for. Raw 1080p 30fps RGB24 video is about 1493 Mb/s of data; if you do not have enough bandwidth, it won't work.&lt;/p&gt;
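&lt;p&gt;That bitrate figure is simple arithmetic. As a quick sanity check in plain Python:&lt;/p&gt;

```python
# Back-of-the-envelope bitrate of raw 1080p 30fps RGB24 video:
# 3 bytes per pixel (RGB24), 1920x1080 pixels per frame, 30 frames per second.
width, height = 1920, 1080
bytes_per_pixel = 3  # RGB24
fps = 30

bits_per_second = width * height * bytes_per_pixel * fps * 8
print(f"{bits_per_second / 1e6:.0f} Mb/s")  # -> 1493 Mb/s
```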

&lt;h3&gt;
  
  
  Distributing Workloads
&lt;/h3&gt;

&lt;p&gt;By having tasks run on different machines, you can distribute heavy workloads, like encoding video. This can be done while still extracting useful information, like the amount of activity in a video (how many pixels have changed, and by how much), before the video is encoded.&lt;/p&gt;

&lt;p&gt;In the following example we have two video inputs (like cameras), one on each node. The video activity is measured in order to switch between the video feeds, and the video is encoded on each node. The encoded video data from demo2 is then sent over to demo1, where we switch between the video feeds based on their activity. The most active video is then displayed in a video viewer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefthzs3k7iqc3r6k8vhk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefthzs3k7iqc3r6k8vhk.png" alt="Video switching" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This allows us to use one machine as an encoder for each video feed. When using GPU Hardware encoders, each machine can usually encode much more than just one video feed. As we scale up, we will want to distribute the load across machines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Distributing Data
&lt;/h3&gt;

&lt;p&gt;Local internet connections are often very limited. Even if you are lucky enough to get 1 Gb/s of upload speed, it still does not compare to the kind of speed you can achieve from a server running in a datacenter.&lt;/p&gt;

&lt;p&gt;When streaming to multiple streaming services at the same time, you might want to first send your video to a server in a datacenter and from there distribute it to multiple streaming services. &lt;/p&gt;

&lt;p&gt;In the following example we have a video input running on our computer. The video data is encoded and then sent to 3 different output containers. These output containers will then each publish the video feed to a streaming service as an RTMP stream (see &lt;a href="https://www.youtube.com/watch?v=ZkH5p3I3e1M" rel="noopener noreferrer"&gt;the live streaming demo&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1jqajq8wgo0w2zttobn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1jqajq8wgo0w2zttobn.png" alt="Live Streaming to multiple Services" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The streamtasks network is smart enough to send the video feed only once to our server running in the datacenter, despite there being three tasks receiving the data. As a result, we use only a third of the upload bandwidth on our local machine while streaming to three different services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Dashboards
&lt;/h2&gt;

&lt;p&gt;The more complex your production pipelines are, the more oversight you need in order to monitor them. Streamtasks allows you to create dashboards that consist of windows; each window is the "display" of a task. This can be the video of a video viewer, the text of a text display, or the switch of a switch UI.&lt;/p&gt;

&lt;p&gt;The following example is a kill switch. The timestamp updater moves the time of the video feed 5s into the future. When synchronized with the other video feeds, this causes the actual feed to be pushed back 5s in time. That way you can preview the video in the live view and then switch the output to a static image 5s before it is displayed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi3rqjxj5c6c2g6jhi6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi3rqjxj5c6c2g6jhi6y.png" alt="Kill switch deployment" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;
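&lt;p&gt;The delay mechanism can be sketched in a few lines of plain Python. This is a conceptual model, not the actual streamtasks API: frames carry timestamps, the timestamp updater adds a fixed offset, and a downstream consumer that plays frames at their timestamps therefore shows the feed that much later.&lt;/p&gt;

```python
# Conceptual sketch of the kill-switch delay (NOT the streamtasks API).
# A frame is a (timestamp, payload) pair; the "timestamp updater" moves every
# timestamp 5s into the future, so a player that honors timestamps displays
# the feed 5 seconds after it was captured.
OFFSET_S = 5.0  # delay used by the kill switch above

def shift_timestamps(frames, offset=OFFSET_S):
    """Return the same frames with every timestamp moved into the future."""
    return [(ts + offset, payload) for ts, payload in frames]

live = [(0.0, "frame0"), (1.0, "frame1"), (2.0, "frame2")]
delayed = shift_timestamps(live)
print(delayed)  # [(5.0, 'frame0'), (6.0, 'frame1'), (7.0, 'frame2')]
```

&lt;p&gt;Within that 5-second window, the operator can flip the switch to a static image before the offending frames ever reach the output.&lt;/p&gt;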

&lt;p&gt;The video viewers each display a video, which can be arranged in the dashboard. The switch UI, used to turn on/off the video, displays a control element.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qsh5i69s3uys92mt3ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qsh5i69s3uys92mt3ss.png" alt="Kill Switch Dashboard Off" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk2nkyv469qvl5l8ti9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk2nkyv469qvl5l8ti9l.png" alt="Kill Switch Dashboard On" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  More Examples
&lt;/h2&gt;

&lt;p&gt;Since you are free to combine any tasks as you wish, you can build awesome things. Here are some examples:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can implement video switching by having multiple video streams and using a switch to switch between them. All you need to actually switch is a way of controlling the switch, like radio buttons: a little UI that displays some options and lets you select one. The switch always switches to the video feed that has the largest number as its control signal.
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4shjsnypqibe7beyt4d.png" alt="Video Switching" width="800" height="475"&gt;
&lt;/li&gt;
&lt;li&gt;Reducing the volume of ambient sound (like stadium microphones) when someone is speaking.
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof3r8c4edfkb61oxbtrd.png" alt="Audio Mixing" width="800" height="424"&gt;
&lt;/li&gt;
&lt;li&gt;Switching the video of podcasts by always switching to the loudest speaker.
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcb5jthyins6jwvyzzrj.png" alt="Podcast Switching" width="800" height="380"&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=S6cEn3XuzyM" rel="noopener noreferrer"&gt;Playing sounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=j6mSyNTFCM4" rel="noopener noreferrer"&gt;Creating chatbots that speak&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
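&lt;p&gt;The switching rule from example 1 reduces to an argmax over the control signals. Here is a toy model in plain Python (again, an illustration, not the actual streamtasks API):&lt;/p&gt;

```python
# Toy model of the switch rule: given one control value per video feed,
# forward the feed whose control signal is the largest number.
def select_feed(feeds, controls):
    """feeds and controls are parallel lists; return the selected feed."""
    best = max(range(len(controls)), key=lambda i: controls[i])
    return feeds[best]

feeds = ["camera-a", "camera-b", "camera-c"]
controls = [0.2, 0.9, 0.5]  # e.g. activity scores or radio-button values
print(select_feed(feeds, controls))  # camera-b
```

&lt;p&gt;In a real pipeline the control signals would come from a UI like the radio buttons, or from measurements like video activity or audio volume.&lt;/p&gt;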

&lt;p&gt;If you are interested in seeing more working examples, you can watch them on &lt;a href="https://www.youtube.com/@streamtasks/videos" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Try streamtasks!&lt;br&gt;
GitHub: &lt;a href="https://github.com/leopf/streamtasks" rel="noopener noreferrer"&gt;https://github.com/leopf/streamtasks&lt;/a&gt;&lt;br&gt;
Documentation: &lt;a href="https://streamtasks.3-klicks.de" rel="noopener noreferrer"&gt;https://streamtasks.3-klicks.de&lt;/a&gt;&lt;br&gt;
X: &lt;a href="https://x.com/leopfff" rel="noopener noreferrer"&gt;https://x.com/leopfff&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>videoproduction</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
