DEV Community: ZeroGPU

💰Don’t Waste Tokens on Data Entry: Tag Customer Reviews Overnight with ZeroGPU Batch API

ZeroGPU — Tue, 16 Jun 2026 16:06:44 +0000

A new cookbook to demonstrate our new Batch API.

Most teams sit on massive backlogs of unstructured text—customer reviews, support tickets, and survey responses. They want to classify it, but doing it one synchronous API call at a time is painfully slow and wildly expensive.

Worse yet, they use over-engineered frontier models for the job. Tagging a review with a sentiment label and a few topics isn't a reasoning problem. It’s repeatable, high-volume work. Using a massive LLM for this is like hiring a rocket scientist to sort mail. You're bleeding budget.

ZeroGPU was built to solve exactly this. With our new Batch API, you hand ZeroGPU a single file of requests and get the results back within a completion window—at a fraction of the cost of synchronous calls.

Our new cookbook walks you through a complete, production-ready example of how to automate this overnight.

What it does

Starting from a raw reviews.csv, ZeroGPU returns a fully categorized tagged.csv with sentiment labels and key topics for every single row.

It runs as a single asynchronous job powered by LFM2.5-1.2B-Instruct—a small, lightning-fast model perfectly tuned for short-form text classification. Thousands of rows get tagged while you sleep, without hitting rate limits or draining your wallet.

How it works in 5 steps:

Prepare your raw CSV.
Build a JSONL file with one request per row.
Upload the file to ZeroGPU.
Create the batch.
Poll and download the results.

💡 Smart Error Handling & Merging: Every result is automatically keyed back to its source row by a custom_id, ensuring the output merges flawlessly back into your original database, no matter what order the API processes them. If a few rows fail? They’re isolated into a separate list so you can retry just those specific rows—no need to re-run the entire dataset.

Because our endpoint is OpenAI-compatible, swapping your current workflow takes minutes. Best of all, the entire guide runs end-to-end in Google Colab with zero local setup required.

The ZeroGPU Philosophy

Run the right model on the right compute. Save the frontier models for true reasoning, and let specialized, efficient small models handle the heavy lifting.

🚀 Run the cookbook in Colab: docs.zerogpu.ai/cookbook/batch-review-tagging
🌐 Learn more about ZeroGPU: zerogpu.ai

🔐 Sanitize a CSV of Customer Feedback with the ZeroGPU Router Plugin

ZeroGPU — Mon, 01 Jun 2026 14:14:30 +0000

Documentation Index

Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt
Use this file to discover all available pages before exploring further.

🔐 Sanitize a CSV of Customer Feedback with the ZeroGPU Router Plugin

This notebook demonstrates how to use the zerogpu-router plugin so that Claude Code can scrub personal data out of a raw CSV export, all from a single natural-language prompt. You hand Claude a feedback_export.csv whose free-text column is full of customer names, emails, and phone numbers, and you get back two files: a clean copy that is safe to share, and a PII audit log of exactly what was removed and where. By combining Claude Code's plugin system and ZeroGPU's PII-aware nano models, this notebook walks you through a practical pattern where Claude orchestrates the file work while ZeroGPU does the high-volume, well-defined redaction, so raw PII never has to live in your transcript.

For the full reference, see the Claude Code plugin integration guide.

In this notebook, you'll explore:

Claude Code: Anthropic's agentic coding tool that runs Claude directly in your terminal, with file editing, command execution, and a plugin system that extends sessions with custom slash commands and skills. Here it reads the CSV, loops over every row, and assembles the output files while routing the redaction work to ZeroGPU.
ZeroGPU: An ultra-fast, compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge-powered network for the high-volume, purpose-specific tasks your app or agent runs constantly. Plug in our OpenAI-compatible API and you're live - zero GPU infrastructure, serverless, auto-scaling by default.

This setup not only demonstrates a practical application of PII redaction at scale, but also provides a flexible framework that can be adapted to other real-world scenarios requiring consistent, auditable handling of sensitive free-text data.

🎥 Watch the Video Guide

</p> <h2> <a name="installation" href="#installation" class="anchor"> </a> 📦 Installation </h2> <p>First, install the ZeroGPU CLI, which is the binary every router skill wraps. You'll also need <a href="https://docs.claude.com/en/docs/claude-code">Claude Code</a> itself (<code>npm install -g @anthropic-ai/claude-code</code>) and Node.js 20 or newer.<br> </p> <p>```bash theme={null}<br> npm install -g zerogpu-cli<br> zerogpu --version</p> <div class="highlight"><pre class="highlight plaintext"><code> Next, start a Claude Code session by running `claude` in your terminal, then add the marketplace and install the `zerogpu-router` plugin. This is what exposes every ZeroGPU command as a Claude Code skill: ```text theme={null} /plugin marketplace add zerogpu/zerogpu-router /plugin install zerogpu-router@zerogpu /reload-plugins </code></pre></div> <p></p> <p>Confirm it's loaded with <code>/plugin</code>. You should see <code>zerogpu-router - enabled</code>. For the full setup, including CI-friendly flags, see the <a href="https://dev.to/integrations/claude-code-plugin">Claude Code plugin integration guide</a>.</p> <h2> <a name="setting-up-api-keys" href="#setting-up-api-keys" class="anchor"> </a> 🔑 Setting Up API Keys </h2> <p>You'll need to set up your ZeroGPU credentials so that every skill call works without re-prompting. This ensures Claude Code can reach ZeroGPU's inference API securely.</p> <p>You can go to <a href="https://platform.zerogpu.ai/dashboard">here</a> to get an API key and Project ID from ZeroGPU. The key starts with <code>zgpu-api-</code> and the Project ID (UUID) is on the project settings page.</p> <p>Sign in once from inside your Claude Code session. You'll be prompted for your API key and Project ID, and both are persisted to your config file:<br> </p> <p>```text theme={null}<br> /zerogpu-router:signin</p> <div class="highlight"><pre class="highlight plaintext"><code> Before you run anything, confirm the CLI is installed and you're signed in. `status` exits `0` and prints your masked API key when everything is wired up: ```bash theme={null} zerogpu --version # CLI is on your PATH zerogpu status # exits 0 and shows your masked API key when signed in </code></pre></div> <p></p> <p></p> <div class="highlight"><pre class="highlight plaintext"><code>ZeroGPU CLI 1.x.x Signed in as project 4ed3e5bb...fd1a API key: zgpu-api-************XXXX </code></pre></div> <p></p> <p>If <code>status</code> reports you're not signed in, run <code>/zerogpu-router:signin</code> again before continuing.</p> <h2> <a name="redact-pii-with-zerogpu" href="#redact-pii-with-zerogpu" class="anchor"> </a> 🔐 Redact PII with ZeroGPU </h2> <p>ZeroGPU is an ultra-fast, compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge-powered network for the high-volume, purpose-specific tasks your app or agent runs constantly. Plug in our OpenAI-compatible API and you're live - zero GPU infrastructure, serverless, auto-scaling by default. In this section, we will redact PII from a single support comment as an example, so you can see exactly what the model gives back before pointing it at a whole file.</p> <p>The <code>redact-pii</code> skill detects PII spans and replaces each one in-line with an uppercase <code>[LABEL]</code> placeholder. It routes to <code>gliner-multi-pii-v1</code> with <code>mask: "label"</code>.<br> </p> <p>```text theme={null}<br> /zerogpu-router:redact-pii "Spoke to Sarah Chen but my refund never came. Call me at +1 415-555-0182 or email <a href="mailto:dana.morris@gmail.com">dana.morris@gmail.com</a>."</p> <div class="highlight"><pre class="highlight plaintext"><code> ```plaintext Spoke to [PERSON] but my refund never came. Call me at [PHONE_NUMBER] or email [EMAIL]. </code></pre></div> <p></p> <p>Note that only spans the model recognizes as PII are replaced. Names, phone numbers, and emails come back masked; an order number or internal ticket ID would pass through untouched.</p> <p>🎉 <strong>ZeroGPU effortlessly strips the personal data out of free text in one call, providing a cheap, consistent redaction layer for AI integration!</strong></p> <h2> <a name="sanitize-a-csv-of-customer-feedback" href="#sanitize-a-csv-of-customer-feedback" class="anchor"> </a> 🧾 Sanitize a CSV of Customer Feedback </h2> <p><em>This section takes a raw CSV export whose free-text column is full of personal data and produces a clean copy plus a PII audit log, with Claude orchestrating the loop and ZeroGPU doing the redaction on every row.</em></p> <p>Your support tool exports <code>feedback_export.csv</code>. The <code>comment</code> column is open-ended text where customers typed whatever they wanted, including their names, emails, phone numbers, and sometimes billing addresses. Before this file can go to a dashboard, a Slack channel, or a Git fixture, the PII has to come out. Compliance also wants a record of what was scrubbed, not just a clean file.</p> <p>Doing this by hand is error-prone, and one missed phone-number format leaks a customer. Regex is brittle. This recipe does it with a PII-aware model, consistently, across every row.</p> <h3> <a name="step-1-prepare-the-input-csv" href="#step-1-prepare-the-input-csv" class="anchor"> </a> Step 1: Prepare the input CSV </h3> <p>Place your export in the working directory. The recipe assumes a CSV with at least one free-text column to sanitize; all other columns pass through untouched.<br> </p> <p>```csv theme={null}<br> id,date,rating,comment<br> 1001,2026-05-21,2,"Spoke to Sarah Chen but my refund never came. Call me at +1 415-555-0182 or email <a href="mailto:dana.morris@gmail.com">dana.morris@gmail.com</a>."<br> 1002,2026-05-22,5,"Marcus Rivera was super helpful, thanks!"<br> 1003,2026-05-22,1,"Double charged again. Billing email is <a href="mailto:priya.patel@northwind-labs.com">priya.patel@northwind-labs.com</a>, acct under James Okafor."</p> <div class="highlight"><pre class="highlight plaintext"><code> Keep a stable, unique `id` column. It's what links a redacted row back to its audit entries. The `date` and `rating` columns are copied verbatim, and `comment` is the only column the models touch. If you don't have an `id` column, ask Claude to add a row index first. ### Step 2: Kick off the workflow with one prompt In your Claude Code session, in the directory containing the CSV, paste this. That's the whole interaction; everything after it is what Claude does on your behalf. ```text theme={null} Sanitize feedback_export.csv: 1. Redact PII in the `comment` column and write the result to feedback_clean.csv, keeping id, date, and rating unchanged. 2. Produce pii_audit.csv listing every PII entity found, one row per entity, with columns: id, category, label, value. Leave all non-comment columns exactly as they are. </code></pre></div> <p></p> <h3> <a name="step-3-claude-reads-and-parses-the-csv" href="#step-3-claude-reads-and-parses-the-csv" class="anchor"> </a> Step 3: Claude reads and parses the CSV </h3> <p>First, Claude opens <code>feedback_export.csv</code>, identifies the header row, and isolates the <code>comment</code> column as the field to process. It holds the other columns aside to re-attach unchanged. No model calls happen yet; this is just file parsing.</p> <h3> <a name="step-4-per-row-redact-the-comment-with-raw-redactpii-endraw-" href="#step-4-per-row-redact-the-comment-with-raw-redactpii-endraw-" class="anchor"> </a> Step 4: Per row, redact the comment with <code>redact-pii</code> </h3> <p>For each row, Claude sends the <code>comment</code> value to <code>redact-pii</code>, which returns the masked text that goes into the clean sheet.<br> </p> <p>```text theme={null}<br> /zerogpu-router:redact-pii "Spoke to Sarah Chen but my refund never came. Call me at +1 415-555-0182 or email <a href="mailto:dana.morris@gmail.com">dana.morris@gmail.com</a>."</p> <div class="highlight"><pre class="highlight plaintext"><code> ```plaintext Spoke to [PERSON] but my refund never came. Call me at [PHONE_NUMBER] or email [EMAIL]. </code></pre></div> <p></p> <h3> <a name="step-5-per-row-inventory-the-pii-with-raw-extractpii-endraw-" href="#step-5-per-row-inventory-the-pii-with-raw-extractpii-endraw-" class="anchor"> </a> Step 5: Per row, inventory the PII with <code>extract-pii</code> </h3> <p>For the same comment, Claude also calls <code>extract-pii</code>, which returns the PII entities as structured JSON without modifying the text. This is what populates the audit log. Claude tags each returned entity with the row's <code>id</code> so it can be traced back.<br> </p> <p>```text theme={null}<br> /zerogpu-router:extract-pii "Spoke to Sarah Chen but my refund never came. Call me at +1 415-555-0182 or email <a href="mailto:dana.morris@gmail.com">dana.morris@gmail.com</a>." -c identity,contact</p> <div class="highlight"><pre class="highlight plaintext"><code> ```json theme={null} [ { "category": "identity", "label": "person", "text": "Sarah Chen", "score": 0.96 }, { "category": "contact", "label": "phone", "text": "+1 415-555-0182", "score": 0.95 }, { "category": "contact", "label": "email", "text": "dana.morris@gmail.com", "score": 0.99 } ] </code></pre></div> <p></p> <p>Why two calls per row? <code>redact-pii</code> gives you the masked text; <code>extract-pii</code> gives you the itemized list of what was masked. They run on the same PII model but serve different outputs: the shareable file versus the compliance trail. <code>extract-pii</code> defaults to <code>-t 0.5</code> and <code>-c identity,contact</code>; add <code>financial</code>, <code>medical</code>, or <code>credentials</code> if your text contains them, and raise <code>-t</code> to reduce false positives.</p> <h3> <a name="step-6-claude-assembles-the-two-output-files" href="#step-6-claude-assembles-the-two-output-files" class="anchor"> </a> Step 6: Claude assembles the two output files </h3> <p>Claude loops Steps 4 and 5 across every row, then writes both files.</p> <p><code>feedback_clean.csv</code> keeps the same schema as the input, with <code>comment</code> now masked:<br> </p> <p>```csv theme={null}<br> id,date,rating,comment<br> 1001,2026-05-21,2,"Spoke to [PERSON] but my refund never came. Call me at [PHONE_NUMBER] or email [EMAIL]."<br> 1002,2026-05-22,5,"[PERSON] was super helpful, thanks!"<br> 1003,2026-05-22,1,"Double charged again. Billing email is [EMAIL], acct under [PERSON]."</p> <div class="highlight"><pre class="highlight plaintext"><code> `pii_audit.csv` has one row per detected entity, joined to the source row by `id`: ```csv theme={null} id,category,label,value 1001,identity,person,Sarah Chen 1001,contact,phone,+1 415-555-0182 1001,contact,email,dana.morris@gmail.com 1002,identity,person,Marcus Rivera 1003,contact,email,priya.patel@northwind-labs.com 1003,identity,person,James Okafor </code></pre></div> <p></p> <h3> <a name="step-7-verify-before-you-share" href="#step-7-verify-before-you-share" class="anchor"> </a> Step 7: Verify before you share </h3> <p>Run a few quick sanity checks before the clean file leaves your machine:<br> </p> <p>```bash theme={null}</p> <h1> <a name="1-row-counts-match-header-same-number-of-data-rows" href="#1-row-counts-match-header-same-number-of-data-rows" class="anchor"> </a> 1. Row counts match (header + same number of data rows) </h1> <p>wc -l feedback_export.csv feedback_clean.csv</p> <h1> <a name="2-no-obvious-leftovers-should-print-nothing" href="#2-no-obvious-leftovers-should-print-nothing" class="anchor"> </a> 2. No obvious leftovers; should print nothing </h1> <p>grep -E '@|+?[0-9][0-9 ()-]{7,}' feedback_clean.csv</p> <h1> <a name="3-eyeball-the-beforeafter-diff" href="#3-eyeball-the-beforeafter-diff" class="anchor"> </a> 3. Eyeball the before/after diff </h1> <p>diff <(cut -d, -f4- feedback_export.csv) <(cut -d, -f4- feedback_clean.csv)</p> <div class="highlight"><pre class="highlight plaintext"><code> If check 2 surfaces anything, it's almost always a domain-specific identifier the standard PII model doesn't cover (internal hostnames, contract numbers, order IDs, card last-fours), not a missed name or email. For those, add an `extract-entities` pass with your own labels and mask those spans too: ```text theme={null} /zerogpu-router:extract-entities "Order #88231 for acct A-4471 failed." --labels order_id,account_id -t 0.4 </code></pre></div> <p></p> <p>You end up with three files. <code>feedback_export.csv</code> is the original raw PII and is not safe to share. <code>feedback_clean.csv</code> has the same rows with <code>comment</code> masked and is safe to share. <code>pii_audit.csv</code> deliberately contains the original PII values, so treat it as a sensitive artifact: store it like any other secret, and never commit it to a public repo or drop it next to the clean file.</p> <p>🎉 From a single prompt, Claude parsed the CSV, ran <code>redact-pii</code> and <code>extract-pii</code> on every row, and wrote both a shareable clean copy and an auditable PII log, all while the raw personal data stayed out of its reasoning context.</p> <h2> <a name="highlights" href="#highlights" class="anchor"> </a> 🌟 Highlights </h2> <p>This notebook has guided you through setting up and running a Claude Code workflow with ZeroGPU for sanitizing a CSV of customer feedback. You can adapt and expand this example for various other scenarios requiring consistent, auditable handling of sensitive free-text data.</p> <p>Key tools utilized in this notebook include:</p> <ul> <li><strong>Claude Code</strong>: Anthropic's agentic coding tool that runs Claude directly in your terminal, with file editing, command execution, and a plugin system that extends sessions with custom slash commands and skills. Here it reads the CSV, loops over every row, and assembles the output files while routing the redaction work to ZeroGPU.</li> <li><strong>ZeroGPU</strong>: An ultra-fast, compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge-powered network for the high-volume, purpose-specific tasks your app or agent runs constantly. Plug in our OpenAI-compatible API and you're live - zero GPU infrastructure, serverless, auto-scaling by default.</li> </ul> <p>This comprehensive setup allows you to adapt and expand the example for various scenarios requiring consistent, auditable handling of sensitive free-text data.</p>

Introducing Batch Processing for ZeroGPU

ZeroGPU — Thu, 28 May 2026 14:03:32 +0000

Running AI inference one request at a time works well for real-time product experiences. But many workloads do not need an immediate response. Data enrichment, classification, extraction, content moderation, summarization, and offline analytics often involve hundreds or thousands of requests that can be processed asynchronously.

That is where the ZeroGPU Batch API comes in.

With Batch Processing, you can upload a JSONL file, submit it as a batch job, and retrieve the results when processing is complete. It is designed for large asynchronous workloads where throughput, reliability, and simplicity matter more than instant response time.
Why Batch Processing?

Many AI workflows are naturally asynchronous.
For example, you might want to:

Classify thousands of documents.
Extract structured data from customer records.
Run content moderation over historical user-generated content.
Summarize support tickets, reviews, or research notes.
Process backfills or recurring data pipelines.

Sending each request individually can add unnecessary orchestration complexity. You need retry logic, request tracking, output matching, rate management, and failure handling.

The Batch API gives you a cleaner workflow.

How It Works
Batch Processing in ZeroGPU follows a simple file-based flow:

Create a JSONL input file.
Upload it using the Files API.
Create a batch using the returned file ID.
Poll the batch until it completes.
Download the output and error files.

Each line in the JSONL file represents one request. ZeroGPU processes those requests asynchronously and writes the results back to output files.

A minimal input line looks like this:

{“custom_id”:”request-1",”method”:”POST”,”url”:”/v1/chat/completions”,”body”:{“model”:”your-model-id”,”messages”:[{“role”:”user”,”content”:”Classify this text.”}]}}

The custom_id is returned in the output, so you can match every result back to your original input.

Built For AI Workloads At Scale

The Batch API is especially useful when you need to process a large amount of data without holding open client connections or building your own job orchestration layer.

ZeroGPU currently supports batch jobs for /v1/chat/completions, with JSONL files uploaded through /v1/files.

The core endpoints are:

POST /v1/files to upload input JSONL.
POST /v1/batches to create a batch job.
GET /v1/batches/{batch_id} to check status.
GET /v1/files/{file_id}/content to download results.

This makes batch processing easy to integrate into existing backend systems, cron jobs, data pipelines, and internal tools.

OpenAI-Compatible Shape
ZeroGPU’s Batch and Files APIs are wire-compatible with the OpenAI-style batch workflow, while using ZeroGPU authentication headers:

x-api-key: your-api-key
x-project-id: your-project-id

That means developers familiar with OpenAI batch jobs should feel at home, while still getting ZeroGPU’s routing, project isolation, logging, and model infrastructure.

When Should You Use Batch?
Use the real-time API when your user is waiting for a response.
Use the Batch API when the work can happen in the background.
Good fits include:

Nightly data processing.
Bulk document classification.
Large-scale extraction jobs.
Offline analytics.
Backfills.
Evaluation datasets.
Reprocessing historical data.

Batch jobs are also easier to audit because each request has a stable custom_id, and outputs are written to downloadable files.

Get Started

The fastest way to try it:

Prepare a JSONL file.
Upload it with POST /v1/files.
Create a batch with POST /v1/batches.
Poll for completion.
Download the output file.

You can try the new interactive playgrounds in the ZeroGPU docs:

Upload file: /api-reference/batch/upload-file
Create batch: /api-reference/batch/create-batch
Retrieve batch: /api-reference/batch/retrieve-batch
Download file: /api-reference/batch/download-file

Batch Processing makes it easier to run AI workloads at scale without managing queues, workers, retries, or GPU infrastructure.

ZeroGPU handles the execution. You focus on the data.