Om Prakash

Posted on May 27 • Originally published at pixelapi.dev

#python #ai #showdev #webdev

Cleaning Up Dirty Audio with a REST API: A Developer's Workflow

I record screencasts for documentation. My desk is next to a window, and anyone who has tried to record audio in a home office knows what that means: HVAC hum, keyboard clatter, occasional traffic rumble bleeding through the glass. For the longest time, my post-processing workflow was a mess — I'd export from my recording software, open a separate audio editor, apply a noise profile manually, re-export, and then bring the clean file back into my video editor.

It worked. It also took 15 minutes per video and required me to not forget any of the steps.

A few weeks ago I finally got fed up and looked for a way to automate it. I landed on PixelAPI's Audio Denoise endpoint and wired it into my processing script. It took about an hour from zero to working pipeline, and now the whole thing runs automatically when I drop a file into a folder.

Here's how I did it and what the code looks like.

The Setup

My recording pipeline is a Python script that already handled file organization and thumbnail generation. Adding audio denoising was a matter of adding one more step before the video got assembled.

PixelAPI exposes audio denoising as a plain REST endpoint — you POST a file, you get back a cleaned file. No SDK to install, no configuration wizard, no account dashboard you have to click through to find an API key. You sign up, grab a key, and you're making requests in minutes. They give you free credits to start without requiring a card, which meant I could test it against real recordings before committing to anything.

The Code

Here's the relevant chunk of my pipeline script:

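import mimetypes
import os

import httpx

# Read the key from the environment so it never ends up in the script itself.
PIXELAPI_KEY = os.environ["PIXELAPI_KEY"]

def denoise_audio(input_path: str, output_path: str) -> None:
    """Upload one audio file for denoising and save the cleaned result."""
    url = "https://pixelapi.dev/api/v1/audio/denoise"

    # Guess the MIME type from the extension rather than hardcoding
    # "audio/wav", so MP3 and M4A uploads are not mislabeled.
    content_type = mimetypes.guess_type(input_path)[0] or "application/octet-stream"

    with open(input_path, "rb") as f:
        response = httpx.post(
            url,
            headers={"Authorization": f"Bearer {PIXELAPI_KEY}"},
            files={"audio": (os.path.basename(input_path), f, content_type)},
            timeout=30.0,
        )

    # Raise on 4xx/5xx so an error body is never written out as audio.
    response.raise_for_status()

    with open(output_path, "wb") as out:
        out.write(response.content)

    print(f"Denoised: {input_path} -> {output_path}")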

That's the whole thing. I call this function in my pipeline right after the raw recording lands on disk, before anything else touches the file.

I'm using httpx here because I was already using it elsewhere in the project, but requests works fine too — it's a standard multipart POST.

Latency in Practice

The thing that surprised me most was the round-trip time. The docs say sub-3 second latency, and in my experience with typical screencast audio (5–15 minute WAV files) I've been seeing responses in the 1.5–2.5 second range. For an automated pipeline that's running in the background while I go do something else, that's completely fine.

If you're building something user-facing (say, a voice memo app or a podcast recording tool where the user is waiting on the result), that latency is fast enough to feel responsive without needing to architect around async polling. I briefly considered making the denoising step async anyway, but for recordings of the length I work with, the synchronous approach is fine and keeps the code simpler.
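To check those numbers against your own recordings, a small timing wrapper is enough. This is a minimal sketch: `timed` is a hypothetical helper introduced here for illustration, not part of httpx or PixelAPI.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed_seconds).

    perf_counter is a monotonic clock, so the measurement is not
    affected by system clock adjustments during the call.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

In the pipeline this would wrap the existing call, for example `_, seconds = timed(denoise_audio, "raw/take1.wav", "clean/take1.wav")`. The measured time includes the upload and download, so it depends on your connection as well as on the service.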

Where I Actually Use This

A few scenarios where this has saved me real time:

Screencasts and tutorials. My original use case. The HVAC hum is completely gone. I still try to record in reasonably quiet conditions, but I no longer lose takes because a truck rumbled by outside.

Interview recordings. I do occasional developer interviews for a side project. Remote audio quality varies wildly: some guests are on good setups, some are on laptop mics in lived-in kitchens. Running the audio through denoising before editing means I spend less time in the editor compensating for bad source quality.

Podcast episode drafts. Before sending a rough cut to a guest for approval, running it through denoising removes distracting background noise without me having to do it manually per-clip.

Integrating It Into a Watch Script

Here's the folder-watcher version I actually run day-to-day. When a new WAV lands in my raw/ directory, it gets processed automatically:

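import time
from pathlib import Path
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

# denoise_audio is the function from the pipeline script shown earlier.

class RecordingHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Ignore new directories and anything that is not an audio file.
        if event.is_directory:
            return
        path = Path(event.src_path)
        if path.suffix.lower() not in (".wav", ".mp3", ".m4a"):
            return

        # Write the result to a sibling clean/ directory, creating it if needed.
        output = path.parent.parent / "clean" / path.name
        output.parent.mkdir(parents=True, exist_ok=True)
        print(f"Processing {path.name}...")
        try:
            denoise_audio(str(path), str(output))
        except Exception as exc:
            # Report the failure and keep watching; the raw file is untouched.
            print(f"Failed to denoise {path.name}: {exc}")

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(RecordingHandler(), path="./raw", recursive=False)
    observer.start()
    print("Watching ./raw for new recordings...")
    try:
        # Keep the main thread alive; the observer runs in its own thread.
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()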

This runs as a background process. I record, save, and by the time I've opened my video editor the clean version is already sitting in ./clean/ waiting for me.
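One caveat with folder watchers: the created event fires when the file first appears, which can be before the recording software has finished writing it. A possible guard is to wait until the file size stops changing before uploading. The sketch below is an assumption about how you might handle that, not something the original script does; `wait_until_stable` and its default intervals are hypothetical.

```python
import os
import time


def wait_until_stable(path: str, interval: float = 1.0, timeout: float = 60.0) -> bool:
    """Return True once the file is non-empty and its size is unchanged
    between two consecutive checks; return False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not visible yet, or briefly locked by the writer
        # An empty file is treated as still being written.
        if size > 0 and size == last_size:
            return True
        last_size = size
        time.sleep(interval)
    return False
```

In the handler, this would run just before the upload: call `wait_until_stable(str(path))` and skip the file if it returns False. A size check is a heuristic; a writer that pauses longer than `interval` can still fool it.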

A Few Practical Notes

File formats. WAV, MP3, and M4A all work. I've mostly used WAV because my recording software exports that by default, but I tested M4A and it works fine.

Error handling. Add proper error handling before putting this anywhere near production. I keep the raw file around until I've verified the denoised version sounds right. response.raise_for_status() catches HTTP errors, but you should also handle timeouts and log failures somewhere meaningful.
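As one way to handle transient failures, here is a minimal retry-with-backoff sketch. `with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary assumptions rather than values recommended by the PixelAPI docs.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("denoise")


def with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn, retrying on the given exception types with exponential backoff.

    The last failure is re-raised so the caller still sees the error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                log.error("giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            log.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            time.sleep(delay)
    raise AssertionError("unreachable")
```

With httpx, a reasonable choice is `retry_on=(httpx.TransportError,)`, which covers timeouts and connection errors, wrapped around the call as `with_retries(lambda: denoise_audio(src, dst), (httpx.TransportError,))`. Retrying on `httpx.HTTPStatusError` is usually only worthwhile for 5xx responses, since a 4xx will fail the same way every time.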

Batch processing. If you're processing a backlog, just loop over your files. The API handles it without issue; just be sensible about concurrency if you're parallelizing.
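For a backlog, a thread pool with a small worker cap is one way to stay sensible about concurrency. This is a sketch under assumptions: `process_batch` is a hypothetical helper, and `max_workers=2` is an arbitrary conservative default, since I don't know what rate limits apply.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def process_batch(
    paths: Iterable[Path],
    process: Callable[[Path], None],
    max_workers: int = 2,
) -> Dict[Path, Optional[BaseException]]:
    """Run process(path) for every path with at most max_workers in flight.

    Returns a mapping of path -> None on success, or the exception raised,
    so one bad file does not abort the rest of the batch.
    """
    results: Dict[Path, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process, p): p for p in paths}
        for future in as_completed(futures):
            # The future is already finished here, so exception() does not block.
            results[futures[future]] = future.exception()
    return results
```

With the function from earlier, the call would look something like `process_batch(Path("raw").glob("*.wav"), lambda p: denoise_audio(str(p), str(Path("clean") / p.name)))`. Threads suit this job because the work is network-bound, not CPU-bound.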

The predictable pricing was the other thing that sold me on this for a pipeline context. With some usage-based APIs you occasionally get surprised by a bill because something ran more than expected. A known cost per request makes it easy to reason about what the automation will actually cost as volume grows.

If you're dealing with any kind of audio in your app or workflow — recording tools, podcast platforms, transcription pipelines, voice interfaces — denoising at the API level rather than shipping DSP code yourself is a significant complexity reduction. The REST interface means you can add it in any language, any stack, in an afternoon.
