DEV Community: Jonathan Martin Paez

Introducing Fervon: a one-person software studio run by AI agent fleets

Jonathan Martin Paez — Sun, 14 Jun 2026 13:17:30 +0000

I've shipped a lot of small products this year — scattered across repos, names and landing pages with nothing tying them together. So I gave them a home: Fervon.

Fervon is a software studio. The unusual part is how it works: it's one builder running fleets of AI agents. I design the product and the architecture, then orchestrate agents to implement, review and ship — often several projects in parallel. Think of it as a small software factory with a single human at the helm.

What's in the forge

Trace — a local-first personal memory app. It captures lightweight signals from your browsing and activity and lets you search everything and scroll a timeline of your digital life: 100% on your machine, no screen recording, no cloud. (The "Rewind that Meta killed", minus the creepy part.)

The open-source tools:

inferbench — download, launch and benchmark local LLM engines from one desktop app. Real tok/s on your own GPU, no simulated numbers.
ClaudeScope — a local dashboard plus full-text search over your Claude Code sessions. Zero deps, zero network.
Lookspan — lightweight, local-first observability for AI agents (spans and traces).
Launchpad — a local launcher that discovers and runs all your projects on unique ports, no collisions.
Pregón — a cross-poster that adapts one update to every social channel. (This very article was published through it.)

Why "Fervon"?

From the Latin fervere — to burn, to boil, fervor. The whole brand is built around the forge: things come out fast, hot, and ready to use. The tagline is "Forged red-hot."

Where it's going

Fervon is a "house of brands": the studio is the label, and each product keeps its own identity (e.g. Trace by Fervon). Free and open tools stay free; a couple of products — starting with Trace — are paid and fully self-serve, no sales calls.

If you like local-first software, AI tooling, or watching someone build in public with an unusual workflow, come along:

👉 https://fervon.dev · code at github.com/fervon

Happy to go deep in the comments on how the agent-fleet workflow actually works.

Making a local-first tool's CSV export audit-ready (and why charts don't belong in a CSV)

Jonathan Martin Paez — Fri, 12 Jun 2026 12:01:00 +0000

"Just add a CSV export" is one of those tickets that sounds like an afternoon and turns into a week once someone says the word audit. I just shipped audit-grade exports across two local-first tools — Lookspan (observability + replay for LLM apps) and ClaudeScope (local analytics for your Claude Code sessions) — and "audit-ready" turned out to mean six concrete things in code. Here they are, with the gotchas.

1. CSV injection is the bug everyone forgets (CWE-1236)

A CSV is just text, so it feels safe. It isn't. If a cell value starts with =, +, -, @, a tab, or a carriage return, Excel and Google Sheets interpret it as a formula when the file is opened. A trace named =cmd|'/c calc'!A1 can become a live command on the reviewer's machine. This is formula injection, and an "audit" export that triggers it is worse than no export.

The OWASP-recommended fix is to prefix offending values with a single quote so the spreadsheet treats them as text:

function neutralize(value) {
  if (typeof value !== "string") return value;          // numbers stay numbers
  return /^[=+\-@\t\r]/.test(value) ? `'${value}` : value;
}

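Two things that bit me: only apply it to strings (otherwise -5 as a number gets mangled), and do it before RFC 4180 quoting, not after. Once a cell is wrapped in double quotes, its first character is the quote, so the check no longer sees the dangerous prefix.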
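A minimal sketch of that ordering: neutralize first, then quote. The neutralize function is the one shown above; toCsvCell and toCsv are hypothetical helper names for illustration, not the tools' actual code.

```javascript
// Same check as above: only strings are touched, numbers pass through.
function neutralize(value) {
  if (typeof value !== "string") return value;
  return /^[=+\-@\t\r]/.test(value) ? `'${value}` : value;
}

// Hypothetical helper: neutralize, then apply RFC 4180 quoting.
// A cell is quoted only if it contains a quote, comma, CR or LF,
// and embedded quotes are doubled.
function toCsvCell(value) {
  if (value === null || value === undefined) return "";
  const text = String(neutralize(value));
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: rows end with CRLF, the line break RFC 4180 specifies.
function toCsv(rows) {
  return rows.map((row) => row.map(toCsvCell).join(",")).join("\r\n") + "\r\n";
}
```

Calling toCsvCell("=SUM(A1)") yields '=SUM(A1) with the leading single quote, while the number -5 is written unchanged.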

2. The mojibake tax: prepend a UTF-8 BOM

Excel on Windows still assumes the system code page unless a file starts with a UTF-8 byte-order mark. Without it, café and niño arrive as garbage for exactly the audience (non-US, regulated) most likely to need an audit export. One \uFEFF at the front (bytes EF BB BF once encoded as UTF-8) fixes it. It's ugly; ship it anyway.
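As a small illustration, assuming the CSV is built as a string and encoded once at the end (withBom is a hypothetical helper name):

```javascript
const BOM = "\uFEFF";

// Hypothetical helper: add the byte-order mark exactly once.
function withBom(csvText) {
  return csvText.startsWith(BOM) ? csvText : BOM + csvText;
}

// Encoding to UTF-8 turns U+FEFF into the three bytes EF BB BF.
const bytes = Buffer.from(withBom("name\r\ncafé\r\n"), "utf8");
const firstThree = [bytes[0], bytes[1], bytes[2]].map((b) => b.toString(16));
```

The guard against a double BOM matters if exports are ever concatenated or re-exported.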

3. Provenance and integrity, or it isn't evidence

A bare table of rows proves nothing. An audit artifact needs to answer who/when/what/how-much and let a reviewer verify it wasn't altered. Both tools now emit:

exportedAt (ISO 8601, UTC), the filters that were applied, the row count
a SHA-256 of the exact CSV bytes (via the built-in node:crypto — no dependency)
an explicit truncation flag

That last one matters more than it looks. Both tools cap exports (10k rows). The old behavior silently returned a partial file — an incomplete export that looks complete is the most dangerous thing you can hand an auditor. Now the response carries truncated + totalAvailable, and the report shows it in red.
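A sketch of what such a provenance block could look like. The field names exportedAt, truncated and totalAvailable come from the text above; buildExport, rowCount, sha256 and the renderCsv callback are assumptions made for this example.

```javascript
const { createHash } = require("node:crypto");

const MAX_ROWS = 10_000; // the export cap mentioned above

// Hypothetical helper: renderCsv is any function that turns rows into CSV text.
function buildExport(allRows, filters, renderCsv) {
  const rows = allRows.slice(0, MAX_ROWS);
  const csvBytes = Buffer.from(renderCsv(rows), "utf8");
  return {
    csvBytes,
    provenance: {
      exportedAt: new Date().toISOString(), // ISO 8601, always UTC ("Z" suffix)
      filters,
      rowCount: rows.length,
      // Hash the exact bytes that get written, not the in-memory rows.
      sha256: createHash("sha256").update(csvBytes).digest("hex"),
      truncated: allRows.length > rows.length,
      totalAvailable: allRows.length,
    },
  };
}
```

Hashing the encoded bytes rather than the row objects means a reviewer can recompute the digest from the file alone.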

4. Determinism

Run the export twice on the same data, get a byte-identical file. That means a stable sort, not "whatever the DB returns":

ORDER BY started_at ASC, trace_id ASC   -- tiebreak, or ordering is non-deterministic

Without the secondary key, rows with equal timestamps shuffle between runs and your SHA-256 changes for no reason.
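The same rule applies if rows are ever sorted in application code instead of SQL. A minimal sketch, assuming started_at values are ISO 8601 UTC strings in one fixed format (so string comparison matches time order) and trace_id is unique:

```javascript
// Compare on started_at first, then on trace_id as the tiebreak.
function compareTraces(a, b) {
  if (a.started_at !== b.started_at) return a.started_at < b.started_at ? -1 : 1;
  if (a.trace_id !== b.trace_id) return a.trace_id < b.trace_id ? -1 : 1;
  return 0;
}

// Hypothetical helper: copy before sorting so the caller's array is untouched.
function sortForExport(rows) {
  return [...rows].sort(compareTraces);
}
```

Because the comparator never returns 0 for distinct rows, the result does not depend on the order the rows arrived in.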

5. Minimize PII by default (GDPR Art. 5)

LLM traces and CLI transcripts are full of personal data and secrets. An audit export should not casually copy raw prompt bodies into a file someone emails around. The default now ships metadata only (ids, timings, token counts, cost, status), and raw attributes require an explicit ?raw=1 / opt-in flag. ClaudeScope's audit CSV is aggregate-per-project by design; the raw bodies stay behind the existing --dump-sessions opt-in. Data minimization isn't a feature request; it's the baseline.
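One way to make "metadata only" the default is an allowlist rather than a blocklist. The sketch below uses hypothetical field names (id, startedAt, durationMs, tokens, cost, status, attributes); the real column set is not shown in this post.

```javascript
// Hypothetical allowlist of columns that are safe to export by default.
const METADATA_FIELDS = ["id", "startedAt", "durationMs", "tokens", "cost", "status"];

// Raw attributes are copied only when the caller explicitly opts in.
function toExportRow(span, { raw = false } = {}) {
  const row = {};
  for (const field of METADATA_FIELDS) row[field] = span[field];
  if (raw) row.attributes = JSON.stringify(span.attributes ?? {});
  return row;
}
```

With an allowlist, a new field added to a span later stays out of the export until someone deliberately adds it.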

6. "Can you put a chart in the CSV?"

This was a real ask, and the honest answer is no. A CSV is plain text — rows and commas, no presentation layer. Anyone who "sees charts in a CSV" is actually looking at XLSX (which can embed charts, but needs a library or hand-rolled OOXML) or a report.

Since both tools are zero-dependency and local-first, I went with a self-contained HTML report: one file, no CDN, hand-drawn inline SVG charts (traces/day, cost by framework, token mix), the provenance block from §3, and the data table. It opens in any browser and prints to a clean PDF for evidence. No library, no build step, and it respects the same redaction rules.

GET /api/export/traces?format=html      # Lookspan
claudescope --report audit.html         # ClaudeScope
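To give a feel for how little code an inline SVG chart needs, here is a rough sketch of a bar chart generator. It is not the report's actual renderer; barChartSvg, escapeXml and the layout constants are assumptions for illustration.

```javascript
// Escape text before placing it inside SVG/HTML markup.
function escapeXml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

// points: [{ label, value }]. The bottom 20 units are reserved for labels.
function barChartSvg(points, { width = 400, height = 120, gap = 4 } = {}) {
  const plotHeight = height - 20;
  const max = Math.max(1, ...points.map((p) => p.value));
  const barWidth = (width - gap * (points.length + 1)) / Math.max(1, points.length);
  const parts = points.map((p, i) => {
    const barHeight = (p.value / max) * plotHeight;
    const x = gap + i * (barWidth + gap);
    const y = plotHeight - barHeight;
    return (
      `<rect x="${x.toFixed(1)}" y="${y.toFixed(1)}" width="${barWidth.toFixed(1)}" height="${barHeight.toFixed(1)}"/>` +
      `<text x="${(x + barWidth / 2).toFixed(1)}" y="${height - 6}" text-anchor="middle" font-size="10">${escapeXml(p.label)}</text>`
    );
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${width} ${height}" role="img">${parts.join("")}</svg>`;
}
```

The returned string can be pasted straight into the HTML report, so the file stays self-contained.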

Takeaway

"Audit-ready" decomposes into boring, testable rules: neutralize formula injection, BOM for Excel, stamp provenance, hash for integrity, sort deterministically, minimize PII, and pick a real format for the visual layer instead of pretending a CSV can do it. None of it is hard — it's just the part that's easy to skip until someone asks you to prove the numbers.

Both tools are MIT, $0, and never phone home. I just opened GitHub Discussions on both — if you do compliance/observability work and have opinions on what an export like this should carry, I'd genuinely like to hear them:

Lookspan → https://github.com/JoniMartin27/lookspan/discussions
ClaudeScope → https://github.com/JoniMartin27/claudescope/discussions

Mission Control: one screen for the folder full of half-running dev servers

Jonathan Martin Paez — Tue, 09 Jun 2026 16:42:51 +0000

If you keep a dozen projects in one folder, you know the ritual. cd into a repo. Try to remember whether it's npm run dev or npm start. Launch it. Launch a second one — and watch both fight over port 5173. An hour later, find the stray node process still holding a port from yesterday.

I got tired of the wall of terminal tabs, so I built Mission Control: a local-only dashboard that auto-detects every dev project in a folder, figures out how to launch each one, and runs them all at once on collision-free ports.

Point it at your projects root. It figures out the rest.

There's no config file to write and no list of project names to maintain. Mission Control scans your projects folder and infers each project's type and dev command from its own files — package.json, framework configs, pyproject.toml. It recognizes Vite (React/Vue/Svelte/Phaser), Next, Astro, Electron, Express/Fastify/Koa, static HTML sites, Python/FastAPI, Telegram bots, npm-workspace monorepos, and backend/ + frontend/ splits.
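The detection rules themselves are not listed here, so the following is only a guess at their shape: a function that looks at root-level file names and a parsed package.json. detectProject, its return shape and the rule order are assumptions; next dev, astro dev and vite are the standard dev commands of those tools.

```javascript
// files: names found at the project root; packageJson: parsed manifest or null.
function detectProject(files, packageJson) {
  const deps = {
    ...(packageJson?.dependencies ?? {}),
    ...(packageJson?.devDependencies ?? {}),
  };
  if (deps.next) return { type: "next", command: "next dev" };
  if (deps.astro) return { type: "astro", command: "astro dev" };
  if (deps.vite) return { type: "vite", command: "vite" };
  if (deps.express || deps.fastify || deps.koa) {
    // Prefer an explicit dev script when the project defines one.
    const command = packageJson.scripts?.dev ? "npm run dev" : "npm start";
    return { type: "node-server", command };
  }
  if (files.includes("pyproject.toml")) return { type: "python", command: null };
  if (files.includes("index.html")) return { type: "static", command: null };
  return { type: "unknown", command: null };
}
```

Framework checks come before the generic server check because a Next or Astro project can also depend on Express.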

Run many at once, never a port clash

Every project gets a unique port from a configurable range (default 4000–4099), injected at launch via the right mechanism per framework — PORT env plus the correct CLI flag. Start five servers at the same time and they're all isolated. Pinning, seeding, and clash-reallocation are deterministic, so the same project lands on the same port each run.
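One way to get that kind of determinism is to seed each project's port from a hash of its name and probe forward on a clash. This is a sketch under that assumption, not Mission Control's actual algorithm; hashName, assignPorts and the FNV-1a-style hash are choices made for the example.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units (stable across runs).
function hashName(name) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < name.length; i++) {
    hash ^= name.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// pinned: { projectName: port } entries that must be honored as-is.
function assignPorts(projectNames, { start = 4000, end = 4099, pinned = {} } = {}) {
  const size = end - start + 1;
  const taken = new Set(Object.values(pinned));
  const result = { ...pinned };
  // Sorting makes the outcome independent of discovery order.
  for (const name of [...projectNames].sort()) {
    if (Object.hasOwn(result, name)) continue;
    let offset = hashName(name) % size;
    let tries = 0;
    while (taken.has(start + offset)) {
      offset = (offset + 1) % size; // linear probe to the next free port
      if (++tries >= size) throw new Error("port range exhausted");
    }
    result[name] = start + offset;
    taken.add(start + offset);
  }
  return result;
}
```

In this sketch a newly added project can still displace a neighbor that hashes to the same slot, which is the case pinning is meant to cover.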

And when you press stop, it means stopped: a Windows process-tree kill (taskkill /T /F) so nothing is left holding a port. It guards against double-starts and refuses to clobber a port already held by a foreign process.

Everything in one view

Live logs — ANSI-cleaned, streamed over WebSocket, with filter/follow/clear
Git at a glance — branch, dirty count, ahead/behind, last commit
Health — published npm/PyPI version and GitHub CI status via gh, cached
Friendly failures — missing node_modules? One-click install (npm or uv), streamed live. Drop a new folder in and it animates into the grid via a filesystem watcher, no restart.

What it is NOT

Honesty matters, so: it's not a cloud service (binds 127.0.0.1, never 0.0.0.0), not multi-user, not a deploy tool, and it sends no telemetry. It launches the same dev commands you'd run by hand. Built and tested primarily on Windows.

It's MIT on GitHub: https://github.com/JoniMartin27/launchpad

inferbench: download, launch & benchmark local LLM engines from one desktop app

Jonathan Martin Paez — Sun, 07 Jun 2026 18:54:28 +0000

If you run LLMs locally, you've probably bounced between half a dozen tools: one to download a model, another to launch the engine, a third to figure out how many tokens/sec you're actually getting on your GPU. inferbench collapses that into a single desktop app.

What it does

Download models and inference engines (llama.cpp & friends) from one place.
Launch an engine against a model with the right flags, no terminal archaeology.
Benchmark real throughput on your hardware — actual tok/s, not marketing numbers. No simulated data: if an engine isn't available, you get an error, not a guess.
Serve & expose over MCP — keep a model resident and expose it to any MCP client over stdio or HTTP. Works for text and image models (Stable Diffusion via sd.cpp).

Why local-first

No cloud, no API keys, no per-token bill, no data leaving your machine. You see exactly what your own GPU can do — useful when you're picking a model for a real workload and need honest numbers.

In a recent smoke test, Qwen2.5-7B hit ~75 tok/s on an RTX 3070 end-to-end through inferbench.

Stack

React + Vite + Electron on the front, Python 3.11 + FastAPI + SQLModel on the back, packaged with a PyInstaller sidecar. The model catalog (124 models) is cross-checked against Hugging Face.

Try it

https://github.com/JoniMartin27/inferbench

v0.1.1 is out now. Feedback and issues welcome — especially benchmark numbers from hardware I don't have. 🖥️

How to know which LLM fits on your GPU (and at how many tok/s) without guessing

Jonathan Martin Paez — Fri, 05 Jun 2026 15:09:11 +0000

I built InferBench, an open-source desktop app that, with one click, downloads the engine, pulls the model, launches it with the optimal config for your hardware, and actually measures TTFT, tok/s, VRAM and quality. No Docker, no CLI, 100% local.

The problem: too many variables

Running an LLM locally sounds easy until you face the real matrix:

Which model (Llama, Qwen, Gemma, Mistral, Phi…).
Which quantization (Q4_K_M, Q5_K_M, Q8_0, IQ2…). Each one has a different size and degrades quality differently.
Which engine (llama.cpp, Ollama, vLLM, SGLang, TGI). Each with its own flags.
Your hardware (how much VRAM is actually free? Is the GPU also driving your display?).

The question that matters ("will this fit, and how many tok/s will I get?") is usually answered by trial and error, multi-GB downloads and OOMs.

How it's actually calculated (not eyeballed)

The key is the KV cache, which grows with context and is often what pushes you out of VRAM. InferBench computes it exactly from the GGUF's own metadata:

kv_per_token = 2 · n_layer · n_head_kv · head_dim · 2 bytes (f16)

That handles GQA/MQA correctly (using n_head instead of n_head_kv inflates the figure several times over). With the exact KV size plus the model size at the actual quant, it knows the maximum context that fits and picks the best quantization that fits.
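A worked version of that formula, as a sketch. The numbers are illustrative values for a GQA model with 28 layers, 4 KV heads, 28 query heads and head_dim 128; maxContextTokens is a hypothetical helper that ignores compute buffers and other runtime overhead, so the real ceiling is lower.

```javascript
// kv_per_token = 2 (K and V) * n_layer * n_head_kv * head_dim * bytes per element
function kvBytesPerToken({ nLayer, nHeadKv, headDim, bytesPerElement = 2 }) {
  return 2 * nLayer * nHeadKv * headDim * bytesPerElement;
}

// Hypothetical helper: tokens of context that fit after the weights are loaded.
function maxContextTokens({ freeVramBytes, modelBytes, kvPerToken }) {
  return Math.max(0, Math.floor((freeVramBytes - modelBytes) / kvPerToken));
}

const GIB = 1024 ** 3;
const correct = kvBytesPerToken({ nLayer: 28, nHeadKv: 4, headDim: 128 });  // 57344 bytes (56 KiB)
const inflated = kvBytesPerToken({ nLayer: 28, nHeadKv: 28, headDim: 128 }); // 401408 bytes, 7x too large
const fits = maxContextTokens({ freeVramBytes: 8 * GIB, modelBytes: 5 * GIB, kvPerToken: correct }); // 56173
```

With these illustrative numbers, plugging in n_head instead of n_head_kv would shrink the estimated context by the same factor of 7.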

Why measuring beats estimating

Made-up numbers are useless. InferBench runs real inference and:

Discards a warmup pass and measures N samples (median + deviation), not just one.
Takes tok/s from the engine's internal timings (predicted_per_second), not from the client's clock.
Evaluates quality with verifiable scorers: for the code prompt, it runs what the model generates in a sandbox and counts how many tests pass.
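The warmup-then-median step can be sketched in a few lines. summarize is a hypothetical name, and using the sample standard deviation (n - 1 in the denominator) is an assumption; the post only says "median + deviation".

```javascript
// samples: tok/s readings in run order; the first `warmup` readings are discarded.
function summarize(samples, { warmup = 1 } = {}) {
  const kept = samples.slice(warmup);
  if (kept.length === 0) throw new Error("need at least one sample after warmup");
  const sorted = [...kept].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = kept.reduce((sum, x) => sum + x, 0) / kept.length;
  const variance = kept.length > 1
    ? kept.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (kept.length - 1)
    : 0;
  return { n: kept.length, median, stdDev: Math.sqrt(variance) };
}

const stats = summarize([10, 74, 76, 75]); // the slow first pass (10) is dropped
// stats → { n: 3, median: 75, stdDev: 1 }
```

The median keeps a single slow run from skewing the headline number.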

A real data point from my machine, measured with the app itself:

Hardware	Model	tok/s	TTFT	VRAM	Quality
RTX 3070 8GB	Qwen2.5 7B Q4_K_M	75	284 ms	7.96 GB	100/100

From click to benchmark

You pick a model and quantizations.
InferBench downloads the engine binary (official llama.cpp release, with SHA-256 verification) and the GGUF from Hugging Face.
It launches the engine with the optimal config for your hardware.
It runs the prompt suite, measuring TTFT, tok/s, VRAM and quality.
It saves the results and lets you compare them side by side.

llama.cpp runs natively, without Docker; Ollama / vLLM / SGLang / TGI run through Docker; and you can also benchmark cloud APIs (OpenAI, Anthropic, OpenRouter, NVIDIA) through the same interface.

Truly local-first

Your data never leaves your machine and local inference costs $0. InferBench is part of a local-first stack alongside an agent orchestrator and an observability tool, all without the cloud.

Repo: https://github.com/JoniMartin27/inferbench
Download (Win/macOS/Linux): https://github.com/JoniMartin27/inferbench/releases/latest

It's open source (MIT). If you try it, I'd love honest feedback: which engine or model is missing and what breaks.

Lookspan: local-first observability for AI agents

Jonathan Martin Paez — Wed, 03 Jun 2026 22:40:10 +0000

Most LLM observability tools are SaaS — your prompts leave your machine and you pay per event. Lookspan is the opposite: one command, runs locally, your data never leaves your box, infra cost zero.

npx lookspan   # → http://127.0.0.1:3100

It ingests spans/traces from your agents into a local SQLite database and shows them in a real-time dashboard:

a timeline (waterfall) of where time goes, plus a conversation transcript of each prompt/response
cost tracking per span and trace, latency p50/p95/p99
alerts on errors / cost / latency thresholds
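For reference, latency percentiles like these can be computed with the nearest-rank method. This is a generic sketch, not necessarily how Lookspan computes them; percentile and latencySummary are hypothetical names.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const rank = Math.ceil((p * sortedValues.length) / 100);
  return sortedValues[Math.max(0, rank - 1)];
}

function latencySummary(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

Nearest-rank always returns a value that was actually observed, which keeps the numbers easy to trace back to a specific span.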

It is MCP-native, with drop-in wrappers for the OpenAI and Anthropic SDKs (observeOpenAI / observeAnthropic) and an OpenTelemetry receiver — point any OTel exporter at it, no Lookspan SDK required.

Newer additions:

Replay a captured prompt against another model and diff cost / latency / output
LLM-as-judge scoring of a trace
Datasets to run a whole test set in batch and compare runs (model A vs B)

Local-first by design: binds to 127.0.0.1, redacts secret-looking values server-side, and your prompts/outputs never leave your machine.
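What counts as "secret-looking" is not spelled out here, so the following is only an illustration of server-side redaction. The key-name and value patterns are assumptions picked for the example, and a real rule set would need tuning to avoid false positives.

```javascript
// Assumed patterns: field names that suggest a credential, and token-shaped values.
const SECRET_KEY_NAME = /(api[_-]?key|secret|password|token|authorization)/i;
const SECRET_VALUE = /\b(sk-[A-Za-z0-9_-]{16,}|gh[pousr]_[A-Za-z0-9]{20,}|Bearer\s+[A-Za-z0-9._~+\/-]{16,})/g;

// Walk any JSON-like value and mask strings that look like secrets.
function redact(value, keyName = "") {
  if (typeof value === "string") {
    if (SECRET_KEY_NAME.test(keyName)) return "[REDACTED]";
    return value.replace(SECRET_VALUE, "[REDACTED]");
  }
  if (Array.isArray(value)) return value.map((item) => redact(item, keyName));
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, redact(inner, key)])
    );
  }
  return value; // numbers, booleans and null pass through unchanged
}
```

Non-string values are left alone, so a numeric field such as a token count survives even though its name matches the key pattern.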

MIT, TypeScript.

npx lookspan

Repo: https://github.com/JoniMartin27/lookspan

Building Lookspan: local-first observability & replay for LLM apps (v0.4.0)

Jonathan Martin Paez — Wed, 03 Jun 2026 18:03:15 +0000

I've been building Lookspan — a local-first observability and replay tool for apps that use LLMs — and wanted to share where it's at after the latest release.

The problem

When your app calls an LLM, what actually happened is mostly a black box: which prompt went out, what came back, which tools fired, and why the output changed between runs. Most observability stacks were built for plain HTTP services, not for the non-deterministic world of LLM calls.

What Lookspan does

Capture spans/traces of your LLM calls — prompts, responses, tool calls. It's MCP-native, so it plugs into the ecosystem instead of locking you in.
Replay & diff — re-run a captured trace and compare outputs side by side. Perfect for catching regressions when you tweak a prompt or swap a model.
LLM-as-judge — score outputs automatically instead of eyeballing them.
Local-first — your traces stay on your machine. No vendor, nothing leaves your laptop.

New in v0.4.0: datasets & experiments

The headline addition is a real evaluation loop:

Define a test set of inputs.
Run a batch through your app.
Judge the results (LLM-as-judge).
See the aggregates — pass rates, diffs, trends.
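The aggregate step at the end of that loop could look roughly like this. It assumes each judged result carries a boolean pass field; summarizeRun and compareRuns are hypothetical names, not part of Lookspan's API.

```javascript
// results: [{ pass: boolean }] produced by the judge for one run.
function summarizeRun(results) {
  const total = results.length;
  const passed = results.filter((r) => r.pass).length;
  return { total, passed, passRate: total === 0 ? 0 : passed / total };
}

// Compare a candidate run (new prompt or model) against a baseline.
function compareRuns(baseline, candidate) {
  const a = summarizeRun(baseline);
  const b = summarizeRun(candidate);
  return { baseline: a.passRate, candidate: b.passRate, delta: b.passRate - a.passRate };
}
```

A positive delta on the same test set is the comparable number the next line refers to.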

It turns "I think the new prompt is better" into a number you can actually compare.

The road here

0.2 — multi-agent capture
0.3 — replay/diff + LLM-as-judge
0.4 — datasets & experiments

Try it

npx lookspan

It's on npm: lookspan.

It's still early and I'd love feedback — what would you want from an LLM observability tool you can run entirely locally?

Local-first observability for AI agents, in one command

Jonathan Martin Paez — Tue, 02 Jun 2026 18:17:54 +0000

When an AI agent misbehaves (fails, stalls, or quietly burns tokens), you need to see the steps. But most observability tools are cloud-first: accounts, API keys, and shipping your prompts to someone else's servers.

I built Lookspan to be the opposite. It runs on your machine, stores everything in local SQLite, and starts with one command:

npx lookspan

Open http://127.0.0.1:3100 and you get a real-time dashboard: traces, a span graph, cost per model, latency percentiles, and alerts.

Send it data

Any language, raw HTTP:

curl -X POST http://127.0.0.1:3100/api/ingest -H "Content-Type: application/json" -d '{...}'

MCP (TypeScript):

npm i @lookspan/mcp

Python (LangGraph, CrewAI, or generic):

pip install lookspan

Already using OpenTelemetry? Point your OTLP exporter at it — no Lookspan SDK:

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:3100/v1/traces

[Demo GIF and dashboard screenshot]

It's MIT and early (v0.1). Repo + roadmap: https://github.com/JoniMartin27/lookspan