Daniel Moretti V.

Posted on Sep 22

The Joy of the Unknown: Exploring Audio Streams with Rust and Circular Buffers

I recently found myself diving into the fascinating world of Rust for a challenging project (I come from a web development background): developing a cross-platform microphone and desktop audio recorder. Little did I know that this journey would lead me to rediscover the thrill of working with low-level APIs and data structures. Let me take you through this exhilarating ride!

The Challenge

The task seemed simple enough: create an audio recorder that could capture microphone input and desktop audio output simultaneously on Windows, Linux, and macOS. The complexity quickly became apparent, though, when I realized that the Rust ecosystem offers no high-level API that handles cross-platform audio capture, resampling, and real-time stream merging in one place.

Instead of taking the easy route with a pre-built solution, I decided to embrace the challenge and build it from scratch in Rust, with Tauri for the frontend. This decision set me on a path of discovery and innovation.

Capturing Desktop Audio Output Across Platforms

Capturing desktop audio output proved to be a unique challenge on each operating system:

Windows: The solution was straightforward. Windows lets you capture the system's audio output directly as an input device, which made recording desktop audio relatively simple.
Linux: The approach was more intricate. I wrote Linux-specific code (gated on target_os) that creates two recording streams visible to PipeWire. I then patched (linked) the audio nodes of the relevant devices and their monitors to these recording streams, redirecting the sound into them.
macOS: This was the most challenging platform. I ended up writing custom platform code that calls the CoreAudio APIs through FFI (foreign function interface). By programmatically creating a multi-output device and adding the default output device to it, I was able to capture desktop audio.
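The per-platform split above maps naturally onto Rust's conditional compilation. Below is a minimal sketch of how a `target_os` dispatch could be organized; the `DesktopCapture` enum and `desktop_capture_strategy` function are hypothetical names for illustration, and the actual platform backends are not shown.

```rust
/// Hypothetical labels for the desktop-capture approach used on each OS.
#[allow(dead_code)] // only one variant is constructed per target
#[derive(Debug, PartialEq)]
enum DesktopCapture {
    /// Windows: record the system output through an input device.
    DirectInputDevice,
    /// Linux: recording streams linked to device/monitor nodes in PipeWire.
    PipeWireStreams,
    /// macOS: a multi-output device created through CoreAudio FFI.
    CoreAudioMultiOutput,
    /// Any other target.
    Unsupported,
}

// Exactly one of these definitions is compiled, selected by `target_os`.
#[cfg(target_os = "windows")]
fn desktop_capture_strategy() -> DesktopCapture {
    DesktopCapture::DirectInputDevice
}

#[cfg(target_os = "linux")]
fn desktop_capture_strategy() -> DesktopCapture {
    DesktopCapture::PipeWireStreams
}

#[cfg(target_os = "macos")]
fn desktop_capture_strategy() -> DesktopCapture {
    DesktopCapture::CoreAudioMultiOutput
}

#[cfg(not(any(target_os = "windows", target_os = "linux", target_os = "macos")))]
fn desktop_capture_strategy() -> DesktopCapture {
    DesktopCapture::Unsupported
}
```

Because the gating happens at compile time, code for the other platforms (and their system bindings) never needs to build on the current one.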

The Virtual Rubber Duck Approach

As I navigated this complex terrain, I found myself needing a way to organize my thoughts and troubleshoot problems effectively. This is where the concept of rubber duck debugging came into play. Traditionally, this involves explaining your code line-by-line to an inanimate object (like a rubber duck) to find bugs or clarify logic.

Armed with Neovim and Copilot Chat (shoutout to the amazing CopilotChat.nvim plugin!), I embarked on a journey of collaborative problem-solving. Copilot Chat became my virtual rubber duck, allowing me to bounce ideas back and forth in a ping-pong conversation style.

The Triple Circular Buffer Solution

After much deliberation and experimentation, I landed on an elegant solution: a triple circular buffer structure. This approach allowed me to handle synchronization, resampling, and output simultaneously.

Understanding the Triple Circular Buffer

The key challenge was aligning audio streams that arrive with different amounts of data at different times. Since a single incoming packet might not contain enough samples to resample effectively, I needed to accumulate samples until there were enough for a consistent resampled chunk. Additionally, the resampled chunk rarely has the same length as the input chunk (it is shorter when downsampling and longer when upsampling), which can lead to audio distortions if not managed properly.

To address this, I used three separate buffers:

Input Buffer (ring_input): Accumulates microphone input data after it has been resampled to the target rate.
Output Buffer (ring_output): Accumulates desktop audio output data.
Resampler Buffer (ring_resampler): Holds data that needs to be resampled to match the target sample rate.
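The resampler buffer's job is to hold raw samples until there are enough for one fixed-size chunk. A std-only sketch of that accumulate-then-release behavior follows; `ChunkAccumulator` is a hypothetical name, and a `VecDeque` stands in for the real ring buffer.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: buffers variable-sized packets and hands out
/// fixed-size chunks only once enough samples have accumulated.
struct ChunkAccumulator {
    pending: VecDeque<f32>,
    chunk_size: usize,
}

impl ChunkAccumulator {
    fn new(chunk_size: usize) -> Self {
        Self { pending: VecDeque::new(), chunk_size }
    }

    /// Append one incoming packet, whatever its length.
    fn push_packet(&mut self, packet: &[f32]) {
        self.pending.extend(packet.iter().copied());
    }

    /// Return the next full chunk, or `None` if not enough data is buffered yet.
    fn next_chunk(&mut self) -> Option<Vec<f32>> {
        if self.pending.len() < self.chunk_size {
            return None;
        }
        Some(self.pending.drain(..self.chunk_size).collect())
    }
}
```

Leftover samples stay queued for the next call, so nothing is dropped between packets.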

By aligning the buffers based on time rather than the number of samples, I ensured that when the buffers drain, they are synchronized, regardless of the sample sizes at that moment.
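Aligning by time means converting durations into frame counts at each stream's rate. The hypothetical helpers below show the arithmetic: the same 10 ms window holds 441 frames at 44.1 kHz but 480 frames at 48 kHz, which is why the streams have to be compared in time (or at a common rate) rather than by raw sample count.

```rust
/// Number of frames covering `duration_ms` at `sample_rate` (rounded down).
fn frames_for_duration(sample_rate: u32, duration_ms: u32) -> u64 {
    sample_rate as u64 * duration_ms as u64 / 1000
}

/// Duration in milliseconds covered by `frames` at `sample_rate`.
fn duration_ms_for_frames(sample_rate: u32, frames: u64) -> f64 {
    frames as f64 * 1000.0 / sample_rate as f64
}
```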

Here's a peek at the code that sets up these buffers:

pub fn with_input_resampler<T, U>(
    &mut self,
    input_device: cpal::Device,
    output_device: cpal::Device,
    target_rate: usize,
    origin_rate: usize,
) -> Result<(), anyhow::Error>
where
    U: CustomSample + 'static,
    T: CustomSample + 'static,
{
    // ... (setup code omitted for brevity)

    let buffer_size = 8192; // Example buffer size
    let ring_output = HeapRb::<f32>::new(buffer_size);
    let ring_input = HeapRb::<f32>::new(buffer_size);
    let ring_resampler = HeapRb::<f32>::new(buffer_size);

    let (mut producer_input, mut consumer_input) = ring_input.split();
    let (mut producer_output, mut consumer_output) = ring_output.split();
    let (mut producer_resampler, mut consumer_resampler) = ring_resampler.split();

    // ... (rest of the implementation)
}

This snippet creates three circular buffers (HeapRb from the ringbuf crate) for input, output, and resampling, and splits each one into a producer half and a consumer half. The beauty of this approach lies in its ability to manage different sample rates and keep the audio streams aligned in real time.
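For readers new to circular buffers, here is a deliberately simplified, single-threaded stand-in for what a ring buffer like `HeapRb` provides. It is not the ringbuf crate's implementation (which splits into a producer/consumer pair usable across threads); `SimpleRing` is a hypothetical type whose method names mirror the ones used in this post.

```rust
/// Fixed-capacity circular buffer of f32 samples (single-threaded sketch).
struct SimpleRing {
    data: Vec<f32>,
    head: usize, // index of the oldest sample
    len: usize,  // number of occupied slots
}

impl SimpleRing {
    fn new(capacity: usize) -> Self {
        Self { data: vec![0.0; capacity], head: 0, len: 0 }
    }

    fn occupied_len(&self) -> usize {
        self.len
    }

    /// Push as many samples as fit; returns how many were written.
    fn push_slice(&mut self, samples: &[f32]) -> usize {
        let cap = self.data.len();
        let n = samples.len().min(cap - self.len);
        for (i, &s) in samples[..n].iter().enumerate() {
            // Write after the newest sample, wrapping around the end of the Vec.
            let idx = (self.head + self.len + i) % cap;
            self.data[idx] = s;
        }
        self.len += n;
        n
    }

    /// Pop up to `out.len()` samples in FIFO order; returns how many were read.
    fn pop_slice(&mut self, out: &mut [f32]) -> usize {
        let cap = self.data.len();
        let n = out.len().min(self.len);
        for slot in out[..n].iter_mut() {
            *slot = self.data[self.head];
            self.head = (self.head + 1) % cap;
        }
        self.len -= n;
        n
    }
}
```

The storage never moves or grows; only the `head` index and the occupied length change, which is what makes ring buffers attractive on a real-time audio path.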

Handling Synchronization and Alignment

Synchronization between the microphone input and desktop audio output was critical. I managed this by carefully controlling the buffer sizes. When the buffers drain, they do so in alignment, independent of the number of samples at that particular moment. This ensures that the combined audio stream remains coherent and free of distortions.
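One way to express "draining in alignment" in code, assuming both buffers already hold mono samples at the same rate: take only as many samples as both streams have available, and mix them sample by sample. `mix_aligned` is a hypothetical helper, `VecDeque` stands in for the ring buffers, and mixing by averaging is an illustrative assumption rather than necessarily what the recorder does.

```rust
use std::collections::VecDeque;

/// Drain the same number of samples from both streams and mix them.
/// Samples left over in the longer stream stay buffered for the next call,
/// so the two streams never drift apart in time.
fn mix_aligned(mic: &mut VecDeque<f32>, desktop: &mut VecDeque<f32>) -> Vec<f32> {
    let n = mic.len().min(desktop.len());
    mic.drain(..n)
        .zip(desktop.drain(..n))
        // Averaging keeps the result within [-1.0, 1.0] when both inputs are.
        .map(|(a, b)| 0.5 * (a + b))
        .collect()
}
```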

The Resampling Challenge

One of the most intriguing aspects of this project was dealing with resampling. Working with audio streams of different sample rates means we need to create more or fewer samples to align them. This process can easily lead to misalignment if not handled carefully.
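To see what "creating more or fewer samples" means, here is a deliberately naive linear-interpolation resampler. It is not what `FftFixedIn` does (rubato's FFT resampler works in the frequency domain and is far higher quality), and it keeps no state between chunks, but it shows how the output length scales with the ratio of the two rates.

```rust
/// Naive linear-interpolation resampler for a single mono chunk.
/// Illustration only: no anti-aliasing filter and no state carried
/// between chunks, so chunk boundaries are not handled seamlessly.
fn resample_linear(input: &[f32], origin_rate: u32, target_rate: u32) -> Vec<f32> {
    if input.is_empty() {
        return Vec::new();
    }
    // Output length scales with the rate ratio, e.g. 441 frames at 44.1 kHz -> 480 at 48 kHz.
    let out_len = (input.len() as u64 * target_rate as u64 / origin_rate as u64) as usize;
    let step = origin_rate as f64 / target_rate as f64; // input frames per output frame
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Interpolate between the two neighbouring input samples.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```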

Choosing `FftFixedIn` for Resampling

Since the system works in real time, I needed a synchronous resampler (in rubato's terms, one with a fixed resampling ratio). I chose FftFixedIn from the rubato crate, an FFT-based resampler that takes a fixed number of input frames per call and provides efficient, high-quality resampling suitable for real-time applications.

Here's how I tackled the resampling challenge:

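let mut resampler = FftFixedIn::<f32>::new(
    origin_rate,  // Input sample rate (the captured stream's rate)
    target_rate,  // Output sample rate
    1024,         // Input chunk size, in frames
    2,            // Number of sub-chunks the FFT work is split into
    1,            // Number of channels (mono)
)?;
// Allocate the output buffer once: one Vec per channel, sized for the largest possible output
let mut resampler_output_buffer = resampler.output_buffer_allocate(true);
// Reusable input buffer, so nothing is allocated inside the loop
let mut data_buffer = vec![0.0f32; resampler.input_frames_max()];
let mut next_input_frames = resampler.input_frames_next();

// Only resample once a full input chunk has accumulated
while consumer_resampler.occupied_len() >= next_input_frames {
    consumer_resampler.pop_slice(&mut data_buffer[..next_input_frames]);

    // process_into_buffer reports how many frames it read and wrote
    let (_frames_read, frames_written) = resampler.process_into_buffer(
        &[&data_buffer[..next_input_frames]],
        &mut resampler_output_buffer,
        None,
    )?;
    next_input_frames = resampler.input_frames_next();

    // Forward only the frames actually produced to the mixing buffer
    producer_input.push_slice(&resampler_output_buffer[0][..frames_written]);
}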

This loop pulls fixed-size chunks of raw microphone samples out of ring_resampler, resamples them with FftFixedIn, and pushes only the frames actually produced into ring_input, where they wait at the target rate to be merged with the desktop audio stream. Processing whole chunks and forwarding the output straight into the input buffer is how I kept the two streams synchronized despite their different sample rates.

Performance Considerations

Real-time audio processing is performance-sensitive. To ensure the application runs efficiently and with low latency, I implemented several performance optimizations:

Multithreading: Each major function and stream runs on its own thread, including separate threads for the microphone input, system audio output, merging, and resampling. Isolating these functions prevented bottlenecks and kept processing smooth and real-time.
Thread Synchronization: I used Arc<Mutex> constructs to control the recording state across threads. This allowed safe sharing of state without compromising performance.
Efficient Buffer Management: Choosing appropriate buffer sizes and managing them effectively was crucial for maintaining synchronization and minimizing latency.
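A minimal std-only sketch of the thread-per-stream layout with a shared recording flag follows. `spawn_workers` is a hypothetical helper, and each worker just counts iterations as a placeholder for real audio work; the flag uses `Arc<Mutex<bool>>` to match the approach described above.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawn one worker per named stream; each runs until the shared flag is cleared.
/// Each handle returns the worker's name and how many iterations it ran.
fn spawn_workers(names: &[&str], recording: &Arc<Mutex<bool>>) -> Vec<JoinHandle<(String, u64)>> {
    names
        .iter()
        .map(|name| {
            let name = name.to_string();
            let recording = Arc::clone(recording);
            thread::spawn(move || {
                let mut iterations = 0u64;
                // The lock guard is a temporary of the condition, so the lock
                // is held only while the flag is read.
                while *recording.lock().unwrap() {
                    iterations += 1; // placeholder for real per-stream work
                    thread::sleep(Duration::from_millis(1));
                }
                (name, iterations)
            })
        })
        .collect()
}
```

Clearing the flag from any thread stops every worker on its next check, after which the handles can be joined.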

Integration with Tauri Frontend

Connecting the Rust backend to the Tauri frontend was an interesting part of the project. The integration happens at the level of Tauri commands, through a shared Recorder struct wrapped in an Arc<Mutex>. Each command locks the Recorder, starts the work, and turns on the recording flag. Once the recording has started, the lock is dropped so the Recorder can be used by other commands, such as the one that stops the recording.

This approach allows seamless interaction between the frontend and backend, providing a responsive user interface while handling complex audio processing tasks in Rust.
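The lock-then-release pattern can be sketched without Tauri itself. Below, `start_recording` and `stop_recording` are plain functions standing in for Tauri commands, and the `Recorder` fields are hypothetical; the point is that the lock on the Recorder is released as soon as recording has started, so a later stop command can acquire it.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical recorder state shared between commands.
#[derive(Default)]
struct Recorder {
    recording: Arc<Mutex<bool>>,
    worker: Option<JoinHandle<()>>,
}

/// Stand-in for a "start" command: lock, start the work, release the lock.
fn start_recording(recorder: &Arc<Mutex<Recorder>>) {
    let mut guard = recorder.lock().unwrap();
    *guard.recording.lock().unwrap() = true;
    let flag = Arc::clone(&guard.recording);
    guard.worker = Some(thread::spawn(move || {
        while *flag.lock().unwrap() {
            thread::sleep(Duration::from_millis(1)); // placeholder for capture work
        }
    }));
    // `guard` goes out of scope here, releasing the lock for other commands.
}

/// Stand-in for a "stop" command: it can lock the Recorder because start released it.
fn stop_recording(recorder: &Arc<Mutex<Recorder>>) {
    let worker = {
        let mut guard = recorder.lock().unwrap();
        *guard.recording.lock().unwrap() = false;
        let handle = guard.worker.take();
        handle
    }; // release the lock before joining so other commands are not blocked meanwhile
    if let Some(handle) = worker {
        handle.join().unwrap();
    }
}
```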

Error Handling and Robustness

In real-time audio processing, robustness is key. I implemented error handling strategies to ensure the application remains stable:

Error Logging: Errors are logged for later analysis, but the streams continue working whenever possible. This prevents minor issues from disrupting the entire audio recording.
Immediate Patches: Thanks to the update mechanisms of Tauri v2, I can deploy patches immediately upon identifying errors, enhancing the application's reliability.
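The "log and keep going" policy can be sketched as follows. `process_packet` is a hypothetical per-packet step that rejects malformed input; errors are printed with `eprintln!` as a stand-in for a real logger, and the loop moves on to the next packet instead of stopping the stream.

```rust
/// Hypothetical per-packet step: rejects packets containing NaN samples.
fn process_packet(packet: &[f32]) -> Result<f32, String> {
    if packet.iter().any(|s| s.is_nan()) {
        return Err(format!("packet of {} samples contains NaN", packet.len()));
    }
    // Peak amplitude as a stand-in for real processing.
    Ok(packet.iter().fold(0.0f32, |peak, s| peak.max(s.abs())))
}

/// Process every packet, logging failures instead of aborting the stream.
/// Returns the successful results and the number of packets that failed.
fn run_stream(packets: &[Vec<f32>]) -> (Vec<f32>, usize) {
    let mut results = Vec::new();
    let mut errors = 0;
    for (i, packet) in packets.iter().enumerate() {
        match process_packet(packet) {
            Ok(peak) => results.push(peak),
            Err(e) => {
                errors += 1;
                eprintln!("stream error on packet {i}: {e}"); // log and continue
            }
        }
    }
    (results, errors)
}
```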

The Joy of Discovery

Throughout this project, I found myself constantly amazed by the elegant solutions that emerged from grappling with low-level concepts. The process of working with circular buffers, managing audio streams, and implementing real-time resampling was both challenging and incredibly rewarding.

It reminded me of why I fell in love with programming in the first place: the thrill of solving complex problems and the satisfaction of seeing your code come to life.

Conclusion

This journey into the world of audio processing with Rust has been an eye-opening experience. It's reminded me of the importance of stepping out of our comfort zones and embracing the unknown. While it's often tempting to reach for high-level abstractions and familiar tools, there's an indescribable joy in rolling up your sleeves and diving into the nitty-gritty details.

So, my fellow developers, I encourage you to seek out projects that challenge you, that make you uncomfortable, and that force you to learn new concepts. It's in these moments of struggle and discovery that we truly grow as programmers and rediscover the passion that drew us to this field in the first place.

Remember, the next time you're faced with a daunting task, don't shy away from it. Embrace the challenge, let your curiosity guide you, and who knows? You might just find yourself falling in love with programming all over again.

Happy coding, and may your buffers always be circular and your streams perfectly aligned!

P.S. If you're interested in innovative data solutions, check out the open positions at Mappa.ai. Join us on our mission of mapping and building the most amazing teams possible!

DEV Community