Hadi Karaki

Posted on Jun 2

Why I Built a Local-First Media Suite to Fight Subscription Fatigue (And How It Works Under the Hood)

#showdev #react #ai #javascript

We've all been there. You need to quickly compress a video asset before pushing to production, or strip a background from a mockup. You search for an online tool, upload your file, wait in a server queue, and then get hit with a "Please create an account and pay $12/month" wall. Worse yet, if you are handling sensitive client data, you just handed it over to a random third-party server.

I got tired of the "everything-is-a-SaaS" model. I wanted utilities that were fast, entirely private, and lived on my machine.

So I built Formatif, a native desktop media processing suite split into two local-first applications: Formatif Pro (59 classic media utilities) and Formatif AI (local neural processing).

Here is a deep dive into how I brought heavy media pipelines and machine learning models to the desktop using Electron, Node.js, and ONNX Runtime, without blocking the UI.

The Core Desktop Architecture

As a web developer, my native habitat is the JavaScript/TypeScript ecosystem. To build this across platforms, I used Electron, React, and Tailwind CSS.

The fundamental rule of the desktop architecture is a strict separation of concerns. The React frontend handles user interaction and batch state, while the Electron main process acts as an offline job broker, handing heavy operations off to isolated native processes.

The Flow: React UI (File Paths & Presets) ➔ Preload IPC Bridge ➔ Electron Main Process (Controlled Batch Queue) ➔ Native Binaries & ONNX Inference Engines.

Deep Dive: Inside Formatif Pro (Classic Utilities)

Formatif Pro wraps battle-tested media tools (the FFmpeg, FFprobe, and ImageMagick binaries, plus the Sharp library for Node.js) behind Electron's Inter-Process Communication (IPC) layer.

1. Safe Process Orchestration

Instead of firing twenty massive encodes at once and freezing the user's computer, Formatif treats batches as a controlled sequential queue.
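A minimal sketch of what such a queue could look like. This is not Formatif's actual code: createSequentialQueue is a hypothetical name, and the sketch assumes each job is a function that returns a promise.

```javascript
// Hypothetical helper: runs async jobs strictly one at a time, in submission order.
function createSequentialQueue() {
  let tail = Promise.resolve(); // end of the chain of pending jobs

  return {
    // `job` is any function returning a promise (for example, one encode).
    enqueue(job) {
      const result = tail.then(() => job());
      // Keep the chain alive even if this job rejects,
      // so one failed file does not stall the rest of the batch.
      tail = result.catch(() => {});
      return result; // the caller still sees this job's own success or failure
    },
  };
}
```

Each call to enqueue returns a promise for that one job, so the UI can report per-file results while the batch as a whole keeps moving.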

To prevent shell injection and quoting bugs, the main process never executes giant command strings. Instead, it builds clean argument arrays and executes binaries using execFile-style child processes:

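// A simplified look at the native binary runner
const { execFile } = require("child_process");

// taskId -> ChildProcess, so a running job can be cancelled later
const activeProcesses = new Map();

function runBinaryJob({ taskId, binary, args, duration, send }) {
  return new Promise((resolve, reject) => {
    // args is an array, so no shell ever parses the command
    const child = execFile(binary, args, { windowsHide: true });
    activeProcesses.set(taskId, child);

    // Lifecycle tracking: drop the map entry even when the spawn itself fails
    child.once("error", err => {
      activeProcesses.delete(taskId);
      reject(err);
    });
    child.once("close", (code, signal) => {
      activeProcesses.delete(taskId);
      if (code === 0) resolve({ ok: true });
      else reject(new Error(`Job failed (exit code ${code}, signal ${signal})`));
    });
  });
}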

2. Giving the UI Real Progress (The Stderr Trick)

If you've ever built an FFmpeg wrapper, you know it reports its execution progress through stderr, not stdout.

To give users a real progress bar instead of a fake loading spinner, the app parses the incoming data stream on the fly:

child.stderr?.on("data", chunk => {
  const seconds = parseFfmpegTime(chunk.toString()); // Looks for "time=00:01:23.45"

  if (seconds && duration > 0) {
    send("progress", {
      taskId,
      progress: Math.min(
        100,
        Math.round((seconds / duration) * 100)
      ),
    });
  }
});
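For reference, here is one way the parseFfmpegTime helper could be written. The body is an assumption, since only its name and the "time=00:01:23.45" format appear above. It returns null when no timestamp is present (for example "time=N/A") and does not handle a timestamp that is split across two chunks.

```javascript
// Hypothetical implementation of the parseFfmpegTime helper used above.
// Returns the latest timestamp in the text as seconds, or null if none is found.
const TIME_PATTERN = /time=(\d+):(\d{2}):(\d{2}(?:\.\d+)?)/g;

function parseFfmpegTime(text) {
  let latest = null;
  // A single chunk can hold several progress updates; keep the last one.
  for (const match of text.matchAll(TIME_PATTERN)) {
    const [, hours, minutes, seconds] = match;
    latest = Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
  }
  return latest;
}
```

Because a missing timestamp yields null, the `if (seconds && duration > 0)` guard in the handler simply skips chunks that carry no progress information.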

If a user hits Cancel, the main process looks up the child process by task ID in the activeProcesses map and sends it a termination signal:

child.kill("SIGTERM");

This asks FFmpeg to stop, so a cancelled job does not linger as an orphaned process consuming CPU. On Windows, Node.js has no POSIX signals to send, so the process is simply terminated forcefully. Once the process has exited, the application can clean up any temporary partial files created during processing.
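The cancel path could be sketched roughly like this. cancelTask and the partialOutputs map (task ID to the path of the file being written) are hypothetical names introduced for illustration; only activeProcesses and the kill call come from the code above.

```javascript
const fs = require("fs");

// Hypothetical cancel handler.
// activeProcesses: Map of taskId -> ChildProcess
// partialOutputs:  Map of taskId -> path of the partially written output (assumption)
function cancelTask(taskId, activeProcesses, partialOutputs) {
  const child = activeProcesses.get(taskId);
  if (!child) return false; // already finished, or never started

  // Delete the partial file only after the process has fully exited;
  // before that, the file may still be open or locked by the encoder.
  child.once("close", () => {
    const partial = partialOutputs.get(taskId);
    if (partial) {
      fs.rm(partial, { force: true }, () => {}); // force: ignore a missing file
      partialOutputs.delete(taskId);
    }
  });

  return child.kill("SIGTERM"); // true if the signal was delivered
}
```

Waiting for "close" rather than deleting right after kill avoids racing the encoder's last write.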

Deep Dive: Inside Formatif AI (Local Neural Processing)

Formatif AI handles complex neural tasks like image upscaling, background removal, and detail restoration. Instead of routing files to an expensive cloud API, it runs models locally via ONNX Runtime.

1. Centralized Session Caching

Creating an ONNX Runtime inference session is expensive: the runtime has to load the model file, optimize the graph, and allocate memory on the target device.

To prevent a massive startup delay on every image in a batch, Formatif AI implements a centralized session cache:

import onnxruntime as ort

_sessions = {}

def get_session(task_type, model_path, providers):
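    # Include the model path in the key so two models for the same task never share a session
    cache_key = (task_type, str(model_path), tuple(str(p) for p in providers))

    if cache_key not in _sessions:
        # Cache the session so later batch items skip model loading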
        _sessions[cache_key] = ort.InferenceSession(
            model_path,
            providers=providers
        )

    return _sessions[cache_key]

The first image pays the initialization cost, but every subsequent image can reuse the already-loaded model session. This dramatically improves throughput when processing large batches and prevents unnecessary GPU memory reallocations.

2. Hardware Fallbacks & Dynamic Precision

Not every user has a high-end dedicated GPU.

On startup, the engine checks which execution providers the installed ONNX Runtime build exposes and builds a fallback chain from them. It prefers a native accelerator such as DirectML (Windows), CoreML (macOS), or CUDA, and falls back to the CPU when the preferred provider is unavailable.

def provider_chain(preferred):
    available = ort.get_available_providers()

    candidates = {
        "cuda": "CUDAExecutionProvider",
        "coreml": "CoreMLExecutionProvider",
        "dml": "DmlExecutionProvider"
    }

    selected = candidates.get(preferred)

    if selected in available:
        return [selected, "CPUExecutionProvider"]

    return ["CPUExecutionProvider"]

This allows the same application to run efficiently across a wide range of hardware configurations, from high-end workstations with dedicated GPUs to entry-level machines that rely entirely on CPU inference.

To further improve stability, the engine inspects model metadata at runtime and dynamically switches between FP16 and FP32 precision modes. FP16 reduces memory consumption and can significantly increase inference speed on supported hardware, while FP32 provides maximum compatibility on systems that lack native half-precision acceleration.

This adaptive approach helps prevent out-of-memory errors while ensuring that users get the best performance their hardware can safely deliver.

3. Real-World Engineering: Tiled Inference

A model might work perfectly in a demo on a tiny 512px image, but real users throw 4K photos and massive screenshots at it.

Running a large image through an upscaling model all at once can quickly exhaust available memory and trigger an out-of-memory (OOM) crash on a typical laptop.

To solve this, Formatif AI utilizes a tiled inference strategy. Large images are divided into overlapping regions, each tile is processed independently by the local model, and the results are blended back together to produce a seamless final image.

This approach dramatically reduces peak memory usage while allowing high-resolution images to be processed reliably on consumer hardware.
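The tiling step can be illustrated with a small planning helper. This is a sketch under assumptions, not Formatif's implementation: tileStarts and planTiles are hypothetical names, the tile size and overlap are free parameters, and the inference and blending steps are left out.

```javascript
// Hypothetical helper: start offsets of overlapping tiles along one axis.
function tileStarts(length, tileSize, overlap) {
  if (overlap >= tileSize) {
    throw new RangeError("overlap must be smaller than tileSize");
  }
  if (length <= tileSize) return [0]; // the whole axis fits in one tile

  const stride = tileSize - overlap; // distance between neighbouring tile starts
  const starts = [];
  for (let pos = 0; pos + tileSize < length; pos += stride) {
    starts.push(pos);
  }
  starts.push(length - tileSize); // final tile sits flush with the far edge
  return starts;
}

// Combine both axes into a list of tile rectangles covering the image.
function planTiles(width, height, tileSize, overlap) {
  const tiles = [];
  for (const y of tileStarts(height, tileSize, overlap)) {
    for (const x of tileStarts(width, tileSize, overlap)) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, width),
        height: Math.min(tileSize, height),
      });
    }
  }
  return tiles;
}
```

Peak memory then scales with the tile size rather than the image size. Pinning the last tile to the edge keeps every tile the same size, which is convenient for models that expect fixed input dimensions, at the cost of a larger overlap on the final row and column.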

Lessons Learned as a Developer

Building a local-first application fundamentally changed the way I think about resource optimization.

When you're writing software for the cloud, it's often possible to solve performance issues by provisioning more RAM or scaling infrastructure. Desktop applications don't have that luxury. You're a guest on the user's machine.

That means respecting CPU usage, handling file-locking edge cases, managing memory carefully, and cleaning up every process you spawn. Small inefficiencies that might go unnoticed on a server can become very noticeable when they're consuming a user's personal resources.

Ironically, I've become one of the product's biggest users.

Whenever I need to compress UI videos, optimize SVG assets, remove image backgrounds, or upscale screenshots before publishing updates, I end up using Formatif first. It has completely replaced the collection of web-based tools that used to be part of my daily workflow.

Over to You

Building desktop applications with web technologies is more practical today than ever before, especially when computationally intensive work is properly isolated from the UI thread.

If you're interested in exploring a local-first approach to media processing, you can try Formatif for free at formatif.net — no registration or credit card required.

What are your thoughts on the local-first movement?

Have you experimented with ONNX Runtime, Electron, or local AI models in desktop applications? I'd love to hear about your experiences and discuss them in the comments.

Top comments (2)

Echo • Jun 2

Useful framing on the "everything-is-a-SaaS" problem. Three things I'd add from running a similar split (Tauri/Rust + TypeScript) in production:

1) The "Electron main process as offline job broker" pattern is right, but the trap is the IPC contract. The moment you let React send "do X with these options" free-form, the native binary side becomes a string-parser forever. Worth pinning the IPC contract to a typed schema (zod / protobuf / a single typed DTO) before the second feature ships. The Formatif article shows the cleanest version of this — pre-load bridge plus typed queues.

2) Local-first ML with ONNX is a real win for privacy, but the cold-start cost is the silent killer. First run after install can take 30+ seconds. The mitigation isn't a "loading spinner" — it's a one-time model fetch on first launch with a clear "this happens once" UI, plus a warm path on subsequent runs (keep the process alive or warm-up on idle). The article's "without blocking the UI" framing is the right priority; the cost is still real, just deferred.

3) The "no account, no upload" promise is the actual product. Worth surfacing in the README first, not at the bottom. The dev audience that finds this is allergic to "we'll keep your data private" claims that hide the upload step on page 2.

Curious how Formatif handles the cross-device sync problem — once you go local-first, the next user question is "how do I get my presets onto the new laptop".

Hadi Karaki • Jun 2

Thanks for the thoughtful insights! It's awesome to connect with someone who has run a similar native split in production.

To answer your points and your question:

1) Spot on about the IPC contract trap. Formatif enforces strict type-safety across the bridge precisely to prevent it from devolving into a string-parsing mess.
2) Totally agree on cold-starts. I handle this by keeping the engine process entirely persistent and warm rather than spinning it up per task. The model lifecycle control is definitely a delicate balance.
3) Great advice on moving the privacy promise to the absolute top of the README.

As for cross-device sync: Both apps are intentionally designed to be completely lightweight and stateless. Users just drop their media in, adjust their configuration on the fly, and process. Because we don't save states or profiles locally, there's actually nothing to sync or lose when you move to a new laptop. It's just ready to work out of the box anywhere.

Appreciate the feedback!