<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MaikiDev</title>
    <description>The latest articles on DEV Community by MaikiDev (@maikidev).</description>
    <link>https://dev.to/maikidev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3618814%2F779d304a-bb07-4911-8986-b75bd820843b.jpg</url>
      <title>DEV Community: MaikiDev</title>
      <link>https://dev.to/maikidev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maikidev"/>
    <language>en</language>
    <item>
      <title>Client side audio transcription using Parakeet v3 and WebGPU</title>
      <dc:creator>MaikiDev</dc:creator>
      <pubDate>Thu, 16 Apr 2026 10:08:58 +0000</pubDate>
      <link>https://dev.to/maikidev/client-side-audio-transcription-using-parakeet-v3-and-webgpu-1916</link>
      <guid>https://dev.to/maikidev/client-side-audio-transcription-using-parakeet-v3-and-webgpu-1916</guid>
      <description>&lt;p&gt;Processing audio files into text usually requires sending personal data to an external server. That approach always bothered me because of the privacy implications and the recurring API costs. As browser technologies advanced over the last few years, I started looking into ways to handle speech recognition locally without relying on external servers at all.&lt;/p&gt;

&lt;p&gt;OpenAI released Whisper a while ago and it quickly became the standard for open source transcription. Developers did incredible work porting it to run in the browser using WebAssembly and WebGPU. I initially tried building my project around Whisper. The accuracy is great, but the hardware demands are very high.&lt;/p&gt;

&lt;p&gt;Running Whisper locally in a browser tab often causes the entire page to freeze or lag. It essentially requires a dedicated GPU to perform at a reasonable speed. If you try to run a medium Whisper model on a standard laptop CPU, the transcription can easily take far longer than the audio itself. That makes for a frustrating user experience when someone just wants to transcribe a ten-minute meeting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88jhrt569w8bbmez2ab0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88jhrt569w8bbmez2ab0.png" alt="Transcrisper — Free Unlimited Audio &amp;amp; Video AI Transcription" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I started searching for lighter alternatives and discovered NVIDIA Parakeet v3. It is a highly optimized acoustic model designed specifically for speed and efficiency. To get it working in a web environment, I integrated a library called parakeet.js. This setup changed the performance profile of my project entirely.&lt;br&gt;
The most noticeable difference between Parakeet and Whisper in the browser is raw execution speed. Parakeet processes audio files significantly faster. Because the model architecture is far more efficient, it does not rely exclusively on heavy WebGPU compute pipelines; it runs at very respectable speed on a standard CPU.&lt;/p&gt;

&lt;p&gt;This is a massive benefit for web development. Most people browsing the web do not have a dedicated graphics card. Being able to transcribe an hour of audio on a basic office laptop using just the processor makes local machine learning much more accessible to the average person.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi0zjk19chlnl8qo8gzc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi0zjk19chlnl8qo8gzc.png" alt="Transcrisper — Free Unlimited Audio &amp;amp; Video AI Transcription" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The efficiency of parakeet.js also extends to mobile devices. Running Whisper on a phone browser usually crashes the tab immediately due to strict memory limits imposed by mobile operating systems. Parakeet has a much smaller memory footprint. I tested it on several recent mobile phones and the models load and run successfully. You can record a voice memo on your phone and transcribe it directly in your mobile browser without uploading anything to a cloud provider.&lt;/p&gt;

&lt;p&gt;I put this technology into a web application called Transcrisper. The goal was to make a simple interface where anyone can drop an audio or video file and get text back. The entire pipeline executes locally. Your media file never leaves your hard drive. No server uploads and no backend databases are storing your private conversations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtdfv7nvmzx02f7aoj4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtdfv7nvmzx02f7aoj4b.png" alt="Transcrisper — Free Unlimited Audio &amp;amp; Video AI Transcription" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I implemented speaker diarization so the output identifies exactly when different people are talking in the audio track. This feature is usually locked behind expensive subscription tiers on commercial platforms. The application also generates plain text files and SRT files for video subtitles. Since the heavy lifting happens on the user's device, I do not have to pay for server compute time. This means I can offer the tool completely free, with no artificial limits on file size or length.&lt;/p&gt;
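&lt;p&gt;SRT itself is just numbered text cues with start and end timestamps, so generating it from transcription segments is pure string formatting. Here is a minimal sketch; the &lt;code&gt;toSrt&lt;/code&gt; helper and the &lt;code&gt;{ start, end, text }&lt;/code&gt; segment shape are illustrative assumptions, not the exact structures Transcrisper or parakeet.js use.&lt;/p&gt;

```javascript
// Hypothetical helper: turn transcription segments into an SRT subtitle file.
// The { start, end, text } segment shape (times in seconds) is an assumption
// for illustration, not the exact output format of parakeet.js.
function formatTimestamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = String(Math.floor(ms / 3600000)).padStart(2, "0");
  const m = String(Math.floor(ms / 60000) % 60).padStart(2, "0");
  const s = String(Math.floor(ms / 1000) % 60).padStart(2, "0");
  const rest = String(ms % 1000).padStart(3, "0");
  return `${h}:${m}:${s},${rest}`; // SRT uses a comma before the milliseconds
}

function toSrt(segments) {
  return segments
    .map((seg, i) => `${i + 1}\n${formatTimestamp(seg.start)} --> ${formatTimestamp(seg.end)}\n${seg.text}\n`)
    .join("\n");
}
```

&lt;p&gt;Each cue is a number, a timestamp line, and the text, separated by blank lines, which is all the SRT container requires.&lt;/p&gt;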

&lt;p&gt;Managing large model files is still a central challenge when building client side tools. The browser has to download the model weights on the first visit. I used the Cache API to store these files locally on disk. Subsequent visits load the model directly from the browser cache, which makes the application ready to use instantly without downloading megabytes of data again.&lt;/p&gt;
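&lt;p&gt;The caching logic only takes a few lines. This is a sketch of the idea rather than Transcrisper's actual code: the cache name is made up, and the storage and fetch objects are injected so the flow is easy to test (in a browser you would simply pass the global &lt;code&gt;caches&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt;).&lt;/p&gt;

```javascript
// Sketch of model-weight caching with the Cache API. The cache name and URL
// are illustrative. cacheStorage and fetchFn are injected; in a browser you
// would pass the global caches object and fetch.
async function loadWithCache(url, cacheStorage, fetchFn) {
  const cache = await cacheStorage.open("model-weights-v1");
  const hit = await cache.match(url);
  if (hit) return hit; // served from disk, no network round trip

  const response = await fetchFn(url);
  // Store a clone so the caller can still consume the original body.
  await cache.put(url, response.clone());
  return response;
}
```

&lt;p&gt;The first call pays the network cost once; every later call is answered from local storage.&lt;/p&gt;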

&lt;p&gt;You also have to be careful with garbage collection in JavaScript when passing large audio buffers around. I spent a lot of time optimizing how the audio chunks are fed into the model so the tab does not run out of memory on long podcast episodes.&lt;/p&gt;
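&lt;p&gt;The core trick is to avoid ever holding the decoded file plus a fresh copy of every chunk in memory at once. A sketch of the windowing idea, with an illustrative 30 second window at 16 kHz (not Transcrisper's actual settings):&lt;/p&gt;

```javascript
// Feed long audio to the model in fixed-size windows. subarray() returns a
// view into the existing buffer, so iterating chunks allocates nothing new.
// The 16 kHz sample rate and 30 s window are illustrative values.
function* audioChunks(samples, sampleRate = 16000, windowSeconds = 30) {
  const size = sampleRate * windowSeconds;
  const count = Math.ceil(samples.length / size);
  for (let i = 0; i !== count; i += 1) {
    yield samples.subarray(i * size, Math.min((i + 1) * size, samples.length));
  }
}
```

&lt;p&gt;Because each chunk is a view rather than a copy, the garbage collector only ever has to track the one underlying buffer.&lt;/p&gt;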

&lt;p&gt;Moving machine learning to the client side solves major privacy concerns and eliminates expensive server costs. I think we will see many more applications adopt this local first approach as browser standards improve. You can try it out &lt;a href="https://transcrisper.com" rel="noopener noreferrer"&gt;here&lt;/a&gt;. I am very interested to hear how it performs on different hardware setups, especially older CPUs and mobile devices. Let me know your thoughts in the comments.&lt;/p&gt;

</description>
      <category>whisper</category>
      <category>productivity</category>
      <category>tooling</category>
      <category>ai</category>
    </item>
    <item>
      <title>JSON is Costing You Money: Enter TOON - the Format Built for LLMs</title>
      <dc:creator>MaikiDev</dc:creator>
      <pubDate>Wed, 19 Nov 2025 08:09:55 +0000</pubDate>
      <link>https://dev.to/maikidev/json-is-costing-you-money-enter-toon-the-format-built-for-llms-4ce8</link>
      <guid>https://dev.to/maikidev/json-is-costing-you-money-enter-toon-the-format-built-for-llms-4ce8</guid>
      <description>&lt;p&gt;If you are building with Large Language Models (LLMs), you are essentially a logistics manager. Your job is to ship information from your database to an AI’s brain and back again, as efficiently as possible.&lt;/p&gt;

&lt;p&gt;For years, we’ve defaulted to &lt;strong&gt;JSON&lt;/strong&gt; because it is the lingua franca of the web. But have you ever looked at a 50-item JSON list inside a prompt window? It’s a sea of repetitive keys, curly braces, and wasted tokens.&lt;/p&gt;

&lt;p&gt;In the world of GenAI, &lt;strong&gt;context is currency&lt;/strong&gt;. Every token you waste on syntax is a token you can't use for reasoning, history, or creativity.&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;TOON (Token-Oriented Object Notation)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It’s not just "another standard". It is a purpose-built syntax designed to fix the specific headaches of communicating with AI. Today, let's pop the hood, look at some complex examples, and see how TOON stacks up against the heavyweights: JSON, YAML, and CSV.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is TOON?
&lt;/h2&gt;

&lt;p&gt;TOON is a data format designed to be &lt;strong&gt;human-readable, machine-parsable, and token-cheap&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The philosophy is simple: &lt;strong&gt;"Sparse Trees, Dense Lists"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most data formats are either &lt;strong&gt;Trees&lt;/strong&gt; (JSON, YAML, XML) or &lt;strong&gt;Tables&lt;/strong&gt; (CSV, SQL). Real-world data is usually a mix of both. You have metadata (Tree) and lists of items (Table).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;JSON&lt;/strong&gt; forces tables to act like trees (repeating keys for every row).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CSV&lt;/strong&gt; forces trees to act like tables (flattening everything awkwardly).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;TOON&lt;/strong&gt; lets trees be trees and tables be tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Showdown: TOON vs. The World
&lt;/h2&gt;

&lt;p&gt;Let’s look at a realistic scenario: &lt;strong&gt;An E-commerce Order Receipt&lt;/strong&gt;.&lt;br&gt;
We have an Order ID, a Customer (nested object), and a list of Items (array).&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The JSON Way
&lt;/h3&gt;

&lt;p&gt;JSON is explicit, but it charges you a tax for that clarity. Notice how many times we have to write &lt;code&gt;"product_id"&lt;/code&gt;, &lt;code&gt;"name"&lt;/code&gt;, and &lt;code&gt;"price"&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"order_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ORD-123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-11-19"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jane@dev.to"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Wireless Mouse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;25.00&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"B2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mechanical Keyboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;120.00&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"C3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USB-C Hub"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;40.00&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; As the list grows, the token count grows linearly with it, and a large share of those tokens is the same schema names repeated on every row.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The YAML Way
&lt;/h3&gt;

&lt;p&gt;YAML removes the brackets, which helps. But for lists, it’s still repetitive. You still have to define the keys for every item, just with dashes instead of braces.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ORD-123&lt;/span&gt;
&lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2025-11-19&lt;/span&gt;
&lt;span class="na"&gt;customer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;99&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jane@dev.to&lt;/span&gt;
&lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;product_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A1&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Wireless Mouse&lt;/span&gt;
    &lt;span class="na"&gt;qty&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;25.00&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;product_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;B2&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Mechanical Keyboard&lt;/span&gt;
    &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Still verbose for lists. Also, LLMs notoriously struggle with deep indentation levels in YAML, sometimes losing track of which parent a property belongs to.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The TOON Way
&lt;/h3&gt;

&lt;p&gt;Here is that same order in TOON.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;order_id: ORD-123
date: 2025-11-19
customer:
  id: 99
  email: jane@dev.to

items[3]{product_id,name,qty,price}:
  A1,Wireless Mouse,1,25.00
  B2,Mechanical Keyboard,1,120.00
  C3,USB-C Hub,2,40.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Look at the difference.&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Top-level props:&lt;/strong&gt; Look like clean YAML.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Array (&lt;code&gt;items&lt;/code&gt;):&lt;/strong&gt; Instantly switches to a CSV-style table.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Header (&lt;code&gt;[3]{...}&lt;/code&gt;):&lt;/strong&gt; Tells the LLM exactly what is coming: "I am sending 3 items, and here is the map to read them."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We just saved the model from reading the word &lt;code&gt;"product_id"&lt;/code&gt; two extra times. If this were a list of 50 items, we would have saved 49 repetitions of every key. That is massive token efficiency.&lt;/p&gt;
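&lt;p&gt;The tabular part of the encoding is simple enough to sketch in a few lines. This toy encoder handles uniform arrays of flat objects only and exists just to show why the savings scale with row count; the real &lt;code&gt;@toon-format/toon&lt;/code&gt; library handles nesting, quoting, and delimiter options properly.&lt;/p&gt;

```javascript
// Toy encoder for the tabular part of TOON (uniform arrays of flat objects
// only). Purely illustrative; use the real toon library for actual data.
function encodeTable(name, rows) {
  const keys = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${keys.join(",")}}:`;
  const lines = rows.map((row) => "  " + keys.map((k) => String(row[k])).join(","));
  return [header, ...lines].join("\n");
}
```

&lt;p&gt;The keys are written once in the header, so the schema costs the same number of tokens no matter how many rows follow.&lt;/p&gt;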

&lt;h2&gt;
  
  
  Why Not Just Use CSV?
&lt;/h2&gt;

&lt;p&gt;You might ask, &lt;em&gt;"If the list part is just CSV, why not use CSV?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;CSV is flat. It fails the moment you need metadata. If you wanted to send that Order Receipt in CSV, you'd have to duplicate the &lt;code&gt;order_id&lt;/code&gt; on every single row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;order_id, date, product_id, name
ORD-123, 2025-11-19, A1, Mouse
ORD-123, 2025-11-19, B2, Keyboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is data redundancy—another waste of tokens. TOON gives you the hierarchy of JSON with the density of CSV.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Advantages
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Token Efficiency
&lt;/h3&gt;

&lt;p&gt;Benchmarks show TOON uses &lt;strong&gt;~30-50% fewer tokens&lt;/strong&gt; than standard JSON for mixed-structure data. If you are processing massive logs or long RAG context windows, the savings pay for themselves instantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Higher Accuracy
&lt;/h3&gt;

&lt;p&gt;This might sound counter-intuitive—isn't JSON the "native language" of code?&lt;br&gt;
Actually, LLMs are pattern matchers. When an LLM reads TOON, it sees a clear, predictable pattern (Headers -&amp;gt; Data). Benchmarks indicate that models like GPT-4 and Claude actually have &lt;strong&gt;higher retrieval accuracy (74% vs 70%)&lt;/strong&gt; when reading TOON compared to JSON. The clutter of JSON brackets can sometimes confuse the model's attention mechanism in deep context windows.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Streaming Friendly
&lt;/h3&gt;

&lt;p&gt;Because TOON is line-based, it is exceptionally easy to stream to a frontend. You don't have to wait for a closing &lt;code&gt;}&lt;/code&gt; or &lt;code&gt;]&lt;/code&gt; to know a row is finished.&lt;/p&gt;
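&lt;p&gt;A toy consumer makes this concrete: each complete line is either a table header or a finished row, so rows can be handed to the UI the moment their newline arrives. This is an illustration of the idea, not the library's actual streaming API, and it ignores quoting and values that contain commas.&lt;/p&gt;

```javascript
// Toy line-based consumer: the header defines the columns, every later line
// is a complete row. No closing bracket is needed before a row is usable.
// Quoting and embedded commas are ignored here for brevity.
function makeRowParser(onRow) {
  let keys = null;
  return function feedLine(line) {
    const header = line.match(/^\s*\w+\[\d+\]\{([^}]+)\}:\s*$/);
    if (header) {
      keys = header[1].split(",").map((k) => k.trim());
      return;
    }
    if (keys) {
      const values = line.trim().split(",");
      onRow(Object.fromEntries(keys.map((k, i) => [k, values[i]])));
    }
  };
}
```

&lt;p&gt;Feed it lines as they stream in and complete rows pop out immediately, with no buffering of the whole payload.&lt;/p&gt;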
&lt;h2&gt;
  
  
  How to Integrate TOON into Your Workflow
&lt;/h2&gt;

&lt;p&gt;You don't need to rewrite your entire backend. TOON is best used as the &lt;strong&gt;communication layer&lt;/strong&gt; between your code and the LLM.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: The Input
&lt;/h3&gt;

&lt;p&gt;When sending data &lt;em&gt;to&lt;/em&gt; the LLM (RAG contexts, few-shot examples), convert your JSON to TOON before inserting it into the prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;encode&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@toon-format/toon&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Compress your data before the LLM sees it&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;largeLogData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;indent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;delimiter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;keyFolding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;off&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;flattenDepth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Analyze the following logs: \n\n &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: The Output
&lt;/h3&gt;

&lt;p&gt;If you want the LLM to &lt;em&gt;reply&lt;/em&gt; in TOON, just ask for it (showing a short example of the shape you expect helps too). You don't need a complex system prompt.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; "Analyze the following logs and return the errors in TOON format."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;errors[3]{code,message,severity}:
500,Database connection failed,High
404,User avatar not found,Low
503,Service unavailable,Critical
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 3: Parsing
&lt;/h3&gt;

&lt;p&gt;Use the library to turn it back into objects your code can use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;decode&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@toon-format/toon&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;indent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;strict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;expandPaths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;off&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Want to try it right now?
&lt;/h2&gt;

&lt;p&gt;You don't need to install the CLI just to see how it looks. There is a &lt;a href="https://jsontotoonapp.com" rel="noopener noreferrer"&gt;&lt;strong&gt;free tool&lt;/strong&gt;&lt;/a&gt; available where you can paste your current JSON blobs and see them magically shrink into TOON. It’s a great way to visualize how much token "fat" you can trim from your prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;JSON isn't going anywhere—it’s the backbone of the web. But for the specific, high-cost, high-complexity world of Generative AI, it’s showing its age.&lt;/p&gt;

&lt;p&gt;TOON treats data the way LLMs treat text: as a structured, flowing stream of information, not a rigid tree of brackets. Give it a try in your next prompt; your token budget (and your sanity) will thank you.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>ai</category>
      <category>development</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
