<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonathan Martin Paez</title>
    <description>The latest articles on DEV Community by Jonathan Martin Paez (@jonimatiin).</description>
    <link>https://dev.to/jonimatiin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3965166%2Fd5ef8ebc-fa26-4fc0-9b0a-856ce4c32540.png</url>
      <title>DEV Community: Jonathan Martin Paez</title>
      <link>https://dev.to/jonimatiin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jonimatiin"/>
    <language>en</language>
    <item>
      <title>Introducing Fervon: a one-person software studio run by AI agent fleets</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Sun, 14 Jun 2026 13:17:30 +0000</pubDate>
      <link>https://dev.to/jonimatiin/introducing-fervon-a-one-person-software-studio-run-by-ai-agent-fleets-2gf</link>
      <guid>https://dev.to/jonimatiin/introducing-fervon-a-one-person-software-studio-run-by-ai-agent-fleets-2gf</guid>
      <description>&lt;p&gt;I've shipped a lot of small products this year — scattered across repos, names and landing pages with nothing tying them together. So I gave them a home: &lt;strong&gt;Fervon&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Fervon is a software studio. The unusual part is how it works: it's &lt;strong&gt;one builder running fleets of AI agents&lt;/strong&gt;. I design the product and the architecture, then orchestrate agents to implement, review and ship — often several projects in parallel. Think of it as a small software factory with a single human at the helm.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in the forge
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trace&lt;/strong&gt; — a local-first personal memory app. It captures lightweight signals from your browsing and activity and lets you search everything and scroll a timeline of your digital life: 100% on your machine, no screen recording, no cloud. (The "Rewind that Meta killed", minus the creepy part.)&lt;/p&gt;

&lt;p&gt;The open-source tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;inferbench&lt;/strong&gt; — download, launch and benchmark local LLM engines from one desktop app. Real tok/s on your own GPU, no simulated numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClaudeScope&lt;/strong&gt; — a local dashboard plus full-text search over your Claude Code sessions. Zero deps, zero network.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lookspan&lt;/strong&gt; — lightweight, local-first observability for AI agents (spans and traces).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Launchpad&lt;/strong&gt; — a local launcher that discovers and runs all your projects on unique ports, no collisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pregón&lt;/strong&gt; — a cross-poster that adapts one update to every social channel. (This very article was published through it.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why "Fervon"?
&lt;/h2&gt;

&lt;p&gt;From the Latin &lt;em&gt;fervere&lt;/em&gt; — to burn, to boil, fervor. The whole brand is built around the forge: things come out fast, hot, and ready to use. The tagline is &lt;strong&gt;"Forged red-hot."&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it's going
&lt;/h2&gt;

&lt;p&gt;Fervon is a "house of brands": the studio is the label, and each product keeps its own identity (e.g. &lt;em&gt;Trace by Fervon&lt;/em&gt;). Free and open tools stay free; a couple of products — starting with Trace — are paid and fully self-serve, no sales calls.&lt;/p&gt;

&lt;p&gt;If you like local-first software, AI tooling, or watching someone build in public with an unusual workflow, come along:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://fervon.dev" rel="noopener noreferrer"&gt;https://fervon.dev&lt;/a&gt;&lt;/strong&gt; · code at &lt;strong&gt;github.com/fervon&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Happy to go deep in the comments on how the agent-fleet workflow actually works.&lt;/p&gt;

</description>
      <category>buildinpublic</category>
      <category>ai</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Making a local-first tool's CSV export audit-ready (and why charts don't belong in a CSV)</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Fri, 12 Jun 2026 12:01:00 +0000</pubDate>
      <link>https://dev.to/jonimatiin/making-a-local-first-tools-csv-export-audit-ready-and-why-charts-dont-belong-in-a-csv-4jla</link>
      <guid>https://dev.to/jonimatiin/making-a-local-first-tools-csv-export-audit-ready-and-why-charts-dont-belong-in-a-csv-4jla</guid>
      <description>&lt;p&gt;"Just add a CSV export" is one of those tickets that sounds like an afternoon and turns into a week once someone says the word &lt;em&gt;audit&lt;/em&gt;. I just shipped audit-grade exports across two local-first tools — &lt;a href="https://github.com/JoniMartin27/lookspan" rel="noopener noreferrer"&gt;Lookspan&lt;/a&gt; (observability + replay for LLM apps) and &lt;a href="https://github.com/JoniMartin27/claudescope" rel="noopener noreferrer"&gt;ClaudeScope&lt;/a&gt; (local analytics for your Claude Code sessions) — and "audit-ready" turned out to mean six concrete things in code. Here they are, with the gotchas.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. CSV injection is the bug everyone forgets (CWE-1236)
&lt;/h2&gt;

&lt;p&gt;A CSV is just text, so it feels safe. It isn't. If a cell value starts with &lt;code&gt;=&lt;/code&gt;, &lt;code&gt;+&lt;/code&gt;, &lt;code&gt;-&lt;/code&gt;, &lt;code&gt;@&lt;/code&gt;, a tab, or a carriage return, Excel and Google Sheets interpret it as a &lt;strong&gt;formula&lt;/strong&gt; when the file is opened. A trace named &lt;code&gt;=cmd|'/c calc'!A1&lt;/code&gt; can launch a command on the reviewer's machine through Excel's DDE mechanism if they click through the warnings. This is &lt;em&gt;formula injection&lt;/em&gt;, and an "audit" export that triggers it is worse than no export.&lt;/p&gt;

&lt;p&gt;The OWASP-recommended fix is to prefix offending values with a single quote so the spreadsheet treats them as text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;neutralize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// numbers stay numbers&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;=+&lt;/span&gt;&lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="sr"&gt;@&lt;/span&gt;&lt;span class="se"&gt;\t\r]&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`'&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things that bit me: only apply it to &lt;strong&gt;strings&lt;/strong&gt; (otherwise &lt;code&gt;-5&lt;/code&gt; as a number gets mangled), and do it &lt;em&gt;before&lt;/em&gt; RFC 4180 quoting, not after.&lt;/p&gt;
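
&lt;p&gt;To make that ordering concrete, here is a minimal sketch of a cell encoder that neutralizes first and quotes second. &lt;code&gt;quote&lt;/code&gt;, &lt;code&gt;toCsvCell&lt;/code&gt; and &lt;code&gt;toCsvRow&lt;/code&gt; are hypothetical names for illustration, not functions taken from either tool:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helpers for illustration; not the tools' actual code.
function neutralize(value) {
  if (typeof value !== "string") return value; // numbers stay numbers
  return /^[=+\-@\t\r]/.test(value) ? `'${value}` : value;
}

// RFC 4180: quote a field that contains a comma, a double quote, CR or LF,
// and double any embedded double quotes.
function quote(text) {
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsvCell(value) {
  if (value === null || value === undefined) return "";
  return quote(String(neutralize(value))); // neutralize first, quote second
}

function toCsvRow(values) {
  return values.map(toCsvCell).join(",");
}

console.log(toCsvRow(["=SUM(A1:A9)", -5, 'say "hi"', "a,b"]));
// prints: '=SUM(A1:A9),-5,"say ""hi""","a,b"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Quoting first would wrap &lt;code&gt;=1,2&lt;/code&gt; as &lt;code&gt;"=1,2"&lt;/code&gt;; the leading-character check would then see a double quote, and the formula would slip through untouched.&lt;/p&gt;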

&lt;h2&gt;
  
  
  2. The mojibake tax: prepend a UTF-8 BOM
&lt;/h2&gt;

&lt;p&gt;Excel on Windows still assumes the system code page unless a file starts with a UTF-8 byte-order mark. Without it, &lt;code&gt;café&lt;/code&gt; and &lt;code&gt;niño&lt;/code&gt; arrive as garbage in exactly the audience (non-US, regulated) most likely to need an audit export. One &lt;code&gt;\uFEFF&lt;/code&gt; at the front fixes it. It's ugly; ship it anyway.&lt;/p&gt;
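
&lt;p&gt;A minimal sketch of that step, assuming the CSV is assembled as a string before it is written. &lt;code&gt;withBom&lt;/code&gt; is a hypothetical helper name; the guard keeps the mark from being added twice if the text passes through the function again:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Prepend the byte-order mark exactly once so Excel detects UTF-8.
const BOM = "\uFEFF";

function withBom(csvText) {
  return csvText.startsWith(BOM) ? csvText : BOM + csvText;
}

const bytes = Buffer.from(withBom("name\ncafé\n"), "utf8");
console.log(bytes.subarray(0, 3)); // the three BOM bytes: ef bb bf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;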

&lt;h2&gt;
  
  
  3. Provenance and integrity, or it isn't evidence
&lt;/h2&gt;

&lt;p&gt;A bare table of rows proves nothing. An audit artifact needs to answer &lt;em&gt;who/when/what/how-much&lt;/em&gt; and let a reviewer verify it wasn't altered. Both tools now emit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;exportedAt&lt;/code&gt; (ISO 8601, UTC), the filters that were applied, the row count&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;SHA-256 of the exact CSV bytes&lt;/strong&gt; (via the built-in &lt;code&gt;node:crypto&lt;/code&gt; — no dependency)&lt;/li&gt;
&lt;li&gt;an explicit &lt;strong&gt;truncation flag&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one matters more than it looks. Both tools cap exports (10k rows). The old behavior silently returned a partial file — an incomplete export that &lt;em&gt;looks&lt;/em&gt; complete is the most dangerous thing you can hand an auditor. Now the response carries &lt;code&gt;truncated&lt;/code&gt; + &lt;code&gt;totalAvailable&lt;/code&gt;, and the report shows it in red.&lt;/p&gt;
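
&lt;p&gt;As a rough sketch of what such a provenance block could look like: &lt;code&gt;exportedAt&lt;/code&gt;, &lt;code&gt;truncated&lt;/code&gt; and &lt;code&gt;totalAvailable&lt;/code&gt; come from the description above, while &lt;code&gt;buildProvenance&lt;/code&gt; and the remaining field names are assumptions made for the example. The hash is taken over the exact bytes that get written, BOM included:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const { createHash } = require("node:crypto");

// Hypothetical shape for illustration; the tools' real field names may differ.
function buildProvenance(csvText, filters, rowCount, totalAvailable) {
  const bytes = Buffer.from(csvText, "utf8"); // hash the bytes, not the string
  return {
    exportedAt: new Date().toISOString(), // ISO 8601, always UTC
    filters,
    rowCount,
    totalAvailable,
    truncated: totalAvailable !== rowCount, // never hide a partial export
    sha256: createHash("sha256").update(bytes).digest("hex"),
  };
}

const csv = "\uFEFFtrace_id,status\nt-1,ok\n";
console.log(buildProvenance(csv, { status: "ok" }, 1, 1));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A reviewer can then recompute the SHA-256 of the file they received and compare it with the stamped value.&lt;/p&gt;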

&lt;h2&gt;
  
  
  4. Determinism
&lt;/h2&gt;

&lt;p&gt;Run the export twice on the same data, get a byte-identical file. That means a stable sort, not "whatever the DB returns":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace_id&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;   &lt;span class="c1"&gt;-- tiebreak, or ordering is non-deterministic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without the secondary key, rows with equal timestamps shuffle between runs and your SHA-256 changes for no reason.&lt;/p&gt;
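
&lt;p&gt;The same rule applies if rows are ever sorted in application code instead of SQL. A minimal sketch, assuming timestamps are stored as ISO 8601 strings in one consistent format so plain string comparison orders them correctly (&lt;code&gt;compareTraces&lt;/code&gt; is a hypothetical name):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Total order: primary key started_at, tiebreak on trace_id.
function compareTraces(a, b) {
  if (a.started_at !== b.started_at) return a.started_at > b.started_at ? 1 : -1;
  if (a.trace_id !== b.trace_id) return a.trace_id > b.trace_id ? 1 : -1;
  return 0;
}

const rows = [
  { trace_id: "t-b", started_at: "2026-06-01T10:00:00Z" },
  { trace_id: "t-a", started_at: "2026-06-01T10:00:00Z" },
  { trace_id: "t-c", started_at: "2026-05-31T09:00:00Z" },
];

console.log([...rows].sort(compareTraces).map((row) => row.trace_id));
// t-c, then t-a, then t-b, whatever order the rows arrived in
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;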

&lt;h2&gt;
  
  
  5. Minimize PII by default (GDPR Art. 5)
&lt;/h2&gt;

&lt;p&gt;LLM traces and CLI transcripts are full of personal data and secrets. An audit export should not casually copy raw prompt bodies into a file someone emails around. The default now ships &lt;strong&gt;metadata only&lt;/strong&gt; — ids, timings, token counts, cost, status — and raw attributes require an explicit &lt;code&gt;?raw=1&lt;/code&gt; / opt-in flag. ClaudeScope's audit CSV is aggregate-per-project by design; the raw bodies stay behind the existing &lt;code&gt;--dump-sessions&lt;/code&gt; opt-in. Privacy by default isn't a feature request, it's the safe default.&lt;/p&gt;
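
&lt;p&gt;One way to get that default is an allowlist of columns rather than a blocklist, so a newly added field stays out of the export until someone decides it belongs there. The sketch below is an assumption-heavy illustration: &lt;code&gt;METADATA_FIELDS&lt;/code&gt;, &lt;code&gt;toExportRow&lt;/code&gt; and the field names are hypothetical, not the tools' real schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical column allowlist; the real field names may differ.
const METADATA_FIELDS = ["trace_id", "started_at", "duration_ms", "tokens", "cost_usd", "status"];

function toExportRow(trace, options = {}) {
  const row = {};
  for (const field of METADATA_FIELDS) row[field] = trace[field];
  // Raw attributes are copied only on explicit opt-in.
  if (options.raw === true) row.attributes = JSON.stringify(trace.attributes);
  return row;
}

const trace = {
  trace_id: "t-1", started_at: "2026-06-01T10:00:00Z", duration_ms: 812,
  tokens: 431, cost_usd: 0.0021, status: "ok",
  attributes: { prompt: "my API key is ..." },
};

console.log(toExportRow(trace));                // metadata only
console.log(toExportRow(trace, { raw: true })); // includes attributes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;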

&lt;h2&gt;
  
  
  6. "Can you put a chart in the CSV?"
&lt;/h2&gt;

&lt;p&gt;This was a real ask, and the honest answer is &lt;strong&gt;no&lt;/strong&gt;. A CSV is plain text — rows and commas, no presentation layer. Anyone who "sees charts in a CSV" is actually looking at &lt;strong&gt;XLSX&lt;/strong&gt; (which can embed charts, but needs a library or hand-rolled OOXML) or a report.&lt;/p&gt;

&lt;p&gt;Since both tools are zero-dependency and local-first, I went with a &lt;strong&gt;self-contained HTML report&lt;/strong&gt;: one file, no CDN, hand-drawn inline &lt;strong&gt;SVG&lt;/strong&gt; charts (traces/day, cost by framework, token mix), the provenance block from §3, and the data table. It opens in any browser and prints to a clean PDF for evidence. No library, no build step, and it respects the same redaction rules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /api/export/traces?format=html      # Lookspan
claudescope --report audit.html         # ClaudeScope
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;"Audit-ready" decomposes into boring, testable rules: neutralize formula injection, BOM for Excel, stamp provenance, hash for integrity, sort deterministically, minimize PII, and pick a real format for the visual layer instead of pretending a CSV can do it. None of it is hard — it's just the part that's easy to skip until someone asks you to prove the numbers.&lt;/p&gt;

&lt;p&gt;Both tools are MIT, $0, and never phone home. I just opened &lt;strong&gt;GitHub Discussions&lt;/strong&gt; on both — if you do compliance/observability work and have opinions on what an export like this should carry, I'd genuinely like to hear them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lookspan → &lt;a href="https://github.com/JoniMartin27/lookspan/discussions" rel="noopener noreferrer"&gt;https://github.com/JoniMartin27/lookspan/discussions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ClaudeScope → &lt;a href="https://github.com/JoniMartin27/claudescope/discussions" rel="noopener noreferrer"&gt;https://github.com/JoniMartin27/claudescope/discussions&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>security</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Mission Control: one screen for the folder full of half-running dev servers</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Tue, 09 Jun 2026 16:42:51 +0000</pubDate>
      <link>https://dev.to/jonimatiin/mission-control-one-screen-for-the-folder-full-of-half-running-dev-servers-3il6</link>
      <guid>https://dev.to/jonimatiin/mission-control-one-screen-for-the-folder-full-of-half-running-dev-servers-3il6</guid>
      <description>&lt;p&gt;If you keep a dozen projects in one folder, you know the ritual. &lt;code&gt;cd&lt;/code&gt; into a repo. Try to remember whether it's &lt;code&gt;npm run dev&lt;/code&gt; or &lt;code&gt;npm start&lt;/code&gt;. Launch it. Launch a second one — and watch both fight over port 5173. An hour later, find the stray &lt;code&gt;node&lt;/code&gt; process still holding a port from yesterday.&lt;/p&gt;

&lt;p&gt;I got tired of the wall of terminal tabs, so I built &lt;strong&gt;Mission Control&lt;/strong&gt;: a local-only dashboard that auto-detects every dev project in a folder, figures out how to launch each one, and runs them all at once on collision-free ports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Point it at your projects root. It figures out the rest.
&lt;/h2&gt;

&lt;p&gt;There's no config file to write and no list of project names to maintain. Mission Control scans your projects folder and infers each project's type and dev command from its &lt;em&gt;own&lt;/em&gt; files — &lt;code&gt;package.json&lt;/code&gt;, framework configs, &lt;code&gt;pyproject.toml&lt;/code&gt;. It recognizes Vite (React/Vue/Svelte/Phaser), Next, Astro, Electron, Express/Fastify/Koa, static HTML sites, Python/FastAPI, Telegram bots, npm-workspace monorepos, and &lt;code&gt;backend/&lt;/code&gt; + &lt;code&gt;frontend/&lt;/code&gt; splits.&lt;/p&gt;
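
&lt;p&gt;To give a feel for this kind of inference, here is a deliberately small sketch that works from a parsed &lt;code&gt;package.json&lt;/code&gt;. The rule order, the &lt;code&gt;detect&lt;/code&gt; function and the returned shape are assumptions for illustration; Mission Control's real heuristics cover more cases and may work differently:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical detection rules, checked in list order.
const KNOWN = ["next", "astro", "electron", "vite", "fastify", "express", "koa"];

function detect(pkg) {
  const deps = Object.assign({}, pkg.dependencies, pkg.devDependencies);
  const scripts = pkg.scripts || {};
  const type = KNOWN.find((name) => Object.hasOwn(deps, name)) || "unknown";
  // Prefer a dev script, fall back to start, otherwise give up.
  const command = scripts.dev ? "npm run dev" : scripts.start ? "npm start" : null;
  return { type, command };
}

console.log(detect({ scripts: { dev: "vite" }, devDependencies: { vite: "^5.0.0" } }));
// { type: 'vite', command: 'npm run dev' }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;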

&lt;h2&gt;
  
  
  Run many at once, never a port clash
&lt;/h2&gt;

&lt;p&gt;Every project gets a unique port from a configurable range (default 4000–4099), injected at launch via the right mechanism per framework — &lt;code&gt;PORT&lt;/code&gt; env plus the correct CLI flag. Start five servers at the same time and they're all isolated. Pinning, seeding, and clash-reallocation are deterministic, so the same project lands on the same port each run.&lt;/p&gt;
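
&lt;p&gt;Deterministic allocation can be sketched as "hash the project name to a preferred port, then probe forward on a clash". The code below is one possible reading of that idea, not Mission Control's implementation; &lt;code&gt;seedPort&lt;/code&gt;, &lt;code&gt;allocate&lt;/code&gt; and the hash function are hypothetical, and pinned ports are assumed to lie inside the range:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const RANGE_START = 4000;
const RANGE_SIZE = 100; // ports 4000 through 4099

// Stable string hash, so a name always maps to the same preferred port.
function seedPort(name) {
  let hash = 0;
  for (const ch of name) hash = (hash * 31 + ch.codePointAt(0)) % RANGE_SIZE;
  return RANGE_START + hash;
}

// pinned maps a project name to a fixed port inside the range.
function allocate(names, pinned = {}) {
  const ports = new Map(Object.entries(pinned));
  const taken = new Set(ports.values());
  for (const name of [...names].sort()) { // sorted, so discovery order is irrelevant
    if (ports.has(name)) continue;
    if (taken.size >= RANGE_SIZE) throw new Error("port range exhausted");
    let port = seedPort(name);
    // Clash: step to the next port, wrapping around inside the range.
    while (taken.has(port)) port = RANGE_START + ((port - RANGE_START + 1) % RANGE_SIZE);
    taken.add(port);
    ports.set(name, port);
  }
  return Object.fromEntries(ports);
}

console.log(allocate(["shop-api", "blog", "shop-web"], { blog: 4010 }));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the input is sorted and the hash is stable, the same set of projects yields the same mapping on every run.&lt;/p&gt;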

&lt;p&gt;And when you press stop, it means stopped: a Windows process-tree kill (&lt;code&gt;taskkill /T /F&lt;/code&gt;) so nothing is left holding a port. It guards against double-starts and refuses to clobber a port already held by a foreign process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everything in one view
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live logs&lt;/strong&gt; — ANSI-cleaned, streamed over WebSocket, with filter/follow/clear&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git at a glance&lt;/strong&gt; — branch, dirty count, ahead/behind, last commit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health&lt;/strong&gt; — published npm/PyPI version and GitHub CI status via &lt;code&gt;gh&lt;/code&gt;, cached&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Friendly failures&lt;/strong&gt; — missing &lt;code&gt;node_modules&lt;/code&gt;? One-click install (npm or &lt;code&gt;uv&lt;/code&gt;), streamed live. Drop a new folder in and it animates into the grid via a filesystem watcher, no restart.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What it is NOT
&lt;/h2&gt;

&lt;p&gt;Honesty matters, so: it's not cloud (binds &lt;code&gt;127.0.0.1&lt;/code&gt;, never &lt;code&gt;0.0.0.0&lt;/code&gt;), not multi-user, not a deploy tool, and not telemetry-backed. It launches the same dev commands you'd run by hand. Built and tested primarily on Windows.&lt;/p&gt;

&lt;p&gt;It's MIT on GitHub: &lt;a href="https://github.com/JoniMartin27/launchpad" rel="noopener noreferrer"&gt;https://github.com/JoniMartin27/launchpad&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>tools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>inferbench: download, launch &amp; benchmark local LLM engines from one desktop app</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Sun, 07 Jun 2026 18:54:28 +0000</pubDate>
      <link>https://dev.to/jonimatiin/inferbench-download-launch-benchmark-local-llm-engines-from-one-desktop-app-h5a</link>
      <guid>https://dev.to/jonimatiin/inferbench-download-launch-benchmark-local-llm-engines-from-one-desktop-app-h5a</guid>
      <description>&lt;p&gt;If you run LLMs locally, you've probably bounced between half a dozen tools: one to download a model, another to launch the engine, a third to figure out how many tokens/sec you're &lt;em&gt;actually&lt;/em&gt; getting on your GPU. &lt;strong&gt;inferbench&lt;/strong&gt; collapses that into a single desktop app.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Download&lt;/strong&gt; models and inference engines (llama.cpp &amp;amp; friends) from one place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Launch&lt;/strong&gt; an engine against a model with the right flags, no terminal archaeology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark&lt;/strong&gt; real throughput on &lt;em&gt;your&lt;/em&gt; hardware — actual tok/s, not marketing numbers. No simulated data: if an engine isn't available, you get an error, not a guess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serve &amp;amp; expose over MCP&lt;/strong&gt; — keep a model resident and expose it to any MCP client over stdio or HTTP. Works for &lt;strong&gt;text and image&lt;/strong&gt; models (Stable Diffusion via sd.cpp).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why local-first
&lt;/h2&gt;

&lt;p&gt;No cloud, no API keys, no per-token bill, no data leaving your machine. You see exactly what your own GPU can do — useful when you're picking a model for a real workload and need honest numbers.&lt;/p&gt;

&lt;p&gt;In a recent smoke test, Qwen2.5-7B hit &lt;strong&gt;~75 tok/s on an RTX 3070&lt;/strong&gt; end-to-end through inferbench.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;p&gt;React + Vite + Electron on the front, Python 3.11 + FastAPI + SQLModel on the back, packaged with a PyInstaller sidecar. The model catalog (124 models) is cross-checked against Hugging Face.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/JoniMartin27/inferbench
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;v0.1.1 is out now. Feedback and issues welcome — especially benchmark numbers from hardware I don't have. 🖥️&lt;/p&gt;

</description>
      <category>llm</category>
      <category>localllm</category>
      <category>benchmark</category>
      <category>ai</category>
    </item>
    <item>
      <title>How to know which LLM fits on your GPU (and at how many tok/s) without guessing</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Fri, 05 Jun 2026 15:09:11 +0000</pubDate>
      <link>https://dev.to/jonimatiin/como-saber-que-llm-te-entra-en-tu-gpu-y-a-cuantos-toks-sin-adivinar-171</link>
      <guid>https://dev.to/jonimatiin/como-saber-que-llm-te-entra-en-tu-gpu-y-a-cuantos-toks-sin-adivinar-171</guid>
      <description>&lt;p&gt;I built &lt;a href="https://github.com/JoniMartin27/inferbench" rel="noopener noreferrer"&gt;InferBench&lt;/a&gt;, an open-source desktop app that, with one click, downloads the engine, fetches the model, launches it with the best config for your hardware, and &lt;strong&gt;actually measures&lt;/strong&gt; TTFT, tok/s, VRAM, and quality. No Docker, no CLI, 100% local.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem: too many variables
&lt;/h2&gt;

&lt;p&gt;Running an LLM locally sounds easy until you face the real matrix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Which model&lt;/strong&gt; (Llama, Qwen, Gemma, Mistral, Phi…).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Which quantization&lt;/strong&gt; (Q4_K_M, Q5_K_M, Q8_0, IQ2…). Each one has a different size and degrades quality differently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Which engine&lt;/strong&gt; (llama.cpp, Ollama, vLLM, SGLang, TGI). Each with its own flags.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your hardware&lt;/strong&gt; (how much VRAM do you really have free? Is the GPU also driving your display?).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question that matters, &lt;strong&gt;"will this fit, and how many tok/s will I get?"&lt;/strong&gt;, usually gets answered by trial and error, multi-GB downloads, and OOMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it's actually calculated (not eyeballed)
&lt;/h2&gt;

&lt;p&gt;The key is the &lt;strong&gt;KV cache&lt;/strong&gt;, which grows with context and is often what pushes you out of VRAM. InferBench computes it &lt;strong&gt;exactly&lt;/strong&gt; from the GGUF's own metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kv_per_token = 2 · n_layer · n_head_kv · head_dim · 2 bytes (f16)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That handles GQA/MQA correctly (using &lt;code&gt;n_head&lt;/code&gt; instead of &lt;code&gt;n_head_kv&lt;/code&gt; inflates the figure several times over). With the exact KV size plus the model size at the actual quant, it knows the maximum context that fits and picks the best quantization that fits.&lt;/p&gt;
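
&lt;p&gt;As a worked example of the formula, the sketch below plugs in illustrative metadata values (assumed for the example, not read from a real GGUF file) and compares the correct count with the inflated one that uses &lt;code&gt;n_head&lt;/code&gt;. &lt;code&gt;kvBytesPerToken&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// kv_per_token = 2 (K and V) * n_layer * n_head_kv * head_dim * bytes per element
function kvBytesPerToken(meta, bytesPerElement = 2) {
  return 2 * meta.n_layer * meta.n_head_kv * meta.head_dim * bytesPerElement;
}

// Illustrative, assumed metadata for a GQA model with 4 KV heads.
const meta = { n_layer: 28, n_head: 28, n_head_kv: 4, head_dim: 128 };

const perToken = kvBytesPerToken(meta);   // 57344 bytes, i.e. 56 KiB per token
const at8k = (perToken * 8192) / 2 ** 20; // 448 MiB for an 8192-token context
const naive = kvBytesPerToken({ ...meta, n_head_kv: meta.n_head }); // 401408 bytes

console.log(perToken, at8k, naive / perToken); // 57344 448 7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these numbers, counting query heads instead of KV heads overstates the cache sevenfold, which is the difference between "fits" and "out of memory" on an 8 GB card.&lt;/p&gt;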

&lt;h2&gt;
  
  
  Why measuring beats estimating
&lt;/h2&gt;

&lt;p&gt;Made-up numbers are useless. InferBench runs real inference and:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discards &lt;strong&gt;one warmup pass&lt;/strong&gt; and measures &lt;strong&gt;N samples&lt;/strong&gt; (median + deviation), not just one.&lt;/li&gt;
&lt;li&gt;Takes tok/s from the &lt;strong&gt;engine's internal timings&lt;/strong&gt; (&lt;code&gt;predicted_per_second&lt;/code&gt;), not from the client's clock.&lt;/li&gt;
&lt;li&gt;Evaluates &lt;strong&gt;quality with verifiable scorers&lt;/strong&gt;: for the coding prompt, it &lt;strong&gt;executes&lt;/strong&gt; what the model generates in a sandbox and counts how many tests pass.&lt;/li&gt;
&lt;/ul&gt;
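
&lt;p&gt;The warmup-then-sample idea from the first point can be sketched in a few lines. This is a simplified, synchronous illustration rather than InferBench's code: &lt;code&gt;benchmark&lt;/code&gt; and &lt;code&gt;runOnce&lt;/code&gt; are hypothetical names, and &lt;code&gt;runOnce&lt;/code&gt; stands for anything that performs one generation and returns the engine-reported tok/s. The deviation here is the population standard deviation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;function benchmark(runOnce, samples = 5) {
  runOnce(); // warmup pass, result discarded
  const values = [];
  for (let i = samples; i > 0; i -= 1) values.push(runOnce());

  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return { median, stddev: Math.sqrt(variance) };
}

// Fake engine for the demo: the slow first call plays the cold warmup run.
const fake = [40, 74, 76, 75, 77, 73];
let call = 0;
console.log(benchmark(() => fake[call++])); // median is 75; the 40 never counts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;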

&lt;p&gt;A real data point from my machine, measured with the app itself:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;tok/s&lt;/th&gt;
&lt;th&gt;TTFT&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3070 8GB&lt;/td&gt;
&lt;td&gt;Qwen2.5 7B Q4_K_M&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;75&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;284 ms&lt;/td&gt;
&lt;td&gt;7.96 GB&lt;/td&gt;
&lt;td&gt;100/100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  From click to benchmark
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww4q2uy7pv0hz7xyfwgy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww4q2uy7pv0hz7xyfwgy.gif" alt="demo" width="760" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You pick a model and quantizations.&lt;/li&gt;
&lt;li&gt;InferBench downloads the engine binary (official llama.cpp release, with SHA-256 verification) and the GGUF from Hugging Face.&lt;/li&gt;
&lt;li&gt;It launches the engine with the best config for your hardware.&lt;/li&gt;
&lt;li&gt;It runs the prompt suite, measuring TTFT, tok/s, VRAM, and quality.&lt;/li&gt;
&lt;li&gt;It saves the results and lets you compare them side by side.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;llama.cpp runs &lt;strong&gt;natively, without Docker&lt;/strong&gt;; Ollama / vLLM / SGLang / TGI run through Docker; and you can also benchmark cloud APIs (OpenAI, Anthropic, OpenRouter, NVIDIA) through the same interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Truly local-first
&lt;/h2&gt;

&lt;p&gt;Your data never leaves your machine, and local inference costs $0. InferBench is part of a local-first stack alongside an agent orchestrator and an observability tool, all without the cloud.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/JoniMartin27/inferbench" rel="noopener noreferrer"&gt;https://github.com/JoniMartin27/inferbench&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Download (Win/macOS/Linux):&lt;/strong&gt; &lt;a href="https://github.com/JoniMartin27/inferbench/releases/latest" rel="noopener noreferrer"&gt;https://github.com/JoniMartin27/inferbench/releases/latest&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's open source (MIT). If you try it, I'd love honest feedback: which engine or model is missing, and what breaks.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>localai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Lookspan: local-first observability for AI agents</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Wed, 03 Jun 2026 22:40:10 +0000</pubDate>
      <link>https://dev.to/jonimatiin/lookspan-local-first-observability-for-ai-agents-go3</link>
      <guid>https://dev.to/jonimatiin/lookspan-local-first-observability-for-ai-agents-go3</guid>
      <description>&lt;p&gt;Most LLM observability tools are SaaS — your prompts leave your machine and you pay per event. &lt;strong&gt;Lookspan&lt;/strong&gt; is the opposite: one command, runs locally, your data never leaves your box, infra cost zero.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx lookspan   &lt;span class="c"&gt;# → http://127.0.0.1:3100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It ingests spans/traces from your agents into a local SQLite database and shows them in a real-time dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;timeline (waterfall)&lt;/strong&gt; of where time goes, plus a &lt;strong&gt;conversation transcript&lt;/strong&gt; of each prompt/response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cost tracking&lt;/strong&gt; per span and trace, latency p50/p95/p99&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;alerts&lt;/strong&gt; on errors / cost / latency thresholds&lt;/li&gt;
&lt;/ul&gt;
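
&lt;p&gt;For readers less familiar with p50/p95/p99: the post does not say which percentile definition Lookspan uses, so purely as an illustration, here is the nearest-rank method over a list of span latencies. &lt;code&gt;percentile&lt;/code&gt; is a hypothetical helper and the sample latencies are made up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Nearest-rank percentile over latencies in milliseconds.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

const latencies = [120, 80, 95, 400, 110, 105, 90, 2000, 100, 115];
console.log(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99));
// 105 2000 2000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The median barely notices the one 2-second outlier, while p95 and p99 surface it immediately, which is why tail percentiles are the ones worth alerting on.&lt;/p&gt;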

&lt;p&gt;It is &lt;strong&gt;MCP-native&lt;/strong&gt;, with drop-in wrappers for the OpenAI and Anthropic SDKs (&lt;code&gt;observeOpenAI&lt;/code&gt; / &lt;code&gt;observeAnthropic&lt;/code&gt;) and an OpenTelemetry receiver — point any OTel exporter at it, no Lookspan SDK required.&lt;/p&gt;

&lt;p&gt;Newer additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replay&lt;/strong&gt; a captured prompt against another model and diff cost / latency / output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-as-judge&lt;/strong&gt; scoring of a trace&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Datasets&lt;/strong&gt; to run a whole test set in batch and compare runs (model A vs B)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Local-first by design: binds to 127.0.0.1, redacts secret-looking values server-side, and your prompts/outputs never leave your machine.&lt;/p&gt;

&lt;p&gt;MIT, TypeScript.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx lookspan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo: &lt;a href="https://github.com/JoniMartin27/lookspan" rel="noopener noreferrer"&gt;https://github.com/JoniMartin27/lookspan&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>observability</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building Lookspan: local-first observability &amp; replay for LLM apps (v0.4.0)</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Wed, 03 Jun 2026 18:03:15 +0000</pubDate>
      <link>https://dev.to/jonimatiin/building-lookspan-local-first-observability-replay-for-llm-apps-v040-441a</link>
      <guid>https://dev.to/jonimatiin/building-lookspan-local-first-observability-replay-for-llm-apps-v040-441a</guid>
      <description>&lt;p&gt;I've been building &lt;strong&gt;Lookspan&lt;/strong&gt; — a local-first observability and replay tool for apps that use LLMs — and wanted to share where it's at after the latest release.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;When your app calls an LLM, what actually happened is mostly a black box: which prompt went out, what came back, which tools fired, and why the output changed between runs. Most observability stacks were built for plain HTTP services, not for the non-deterministic world of LLM calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lookspan does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capture&lt;/strong&gt; spans/traces of your LLM calls — prompts, responses, tool calls. It's &lt;strong&gt;MCP-native&lt;/strong&gt;, so it plugs into the ecosystem instead of locking you in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replay &amp;amp; diff&lt;/strong&gt; — re-run a captured trace and compare outputs side by side. Perfect for catching regressions when you tweak a prompt or swap a model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-as-judge&lt;/strong&gt; — score outputs automatically instead of eyeballing them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local-first&lt;/strong&gt; — your traces stay on your machine. No vendor, nothing leaves your laptop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  New in v0.4.0: datasets &amp;amp; experiments
&lt;/h2&gt;

&lt;p&gt;The headline addition is a real evaluation loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define a &lt;strong&gt;test set&lt;/strong&gt; of inputs.&lt;/li&gt;
&lt;li&gt;Run a &lt;strong&gt;batch&lt;/strong&gt; through your app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Judge&lt;/strong&gt; the results (LLM-as-judge).&lt;/li&gt;
&lt;li&gt;See the &lt;strong&gt;aggregates&lt;/strong&gt; — pass rates, diffs, trends.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It turns "I think the new prompt is better" into a number you can actually compare.&lt;/p&gt;

&lt;h2&gt;
  
  
  The road here
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;0.2&lt;/strong&gt; — multi-agent capture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.3&lt;/strong&gt; — replay/diff + LLM-as-judge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.4&lt;/strong&gt; — datasets &amp;amp; experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx lookspan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's on npm: &lt;a href="https://www.npmjs.com/package/lookspan" rel="noopener noreferrer"&gt;lookspan&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's still early and I'd love feedback — what would you want from an LLM observability tool you can run entirely locally?&lt;/p&gt;

</description>
      <category>llm</category>
      <category>observability</category>
      <category>ai</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Local-first observability for AI agents, in one command</title>
      <dc:creator>Jonathan Martin Paez</dc:creator>
      <pubDate>Tue, 02 Jun 2026 18:17:54 +0000</pubDate>
      <link>https://dev.to/jonimatiin/local-first-observability-for-ai-agents-in-one-command-13hg</link>
      <guid>https://dev.to/jonimatiin/local-first-observability-for-ai-agents-in-one-command-13hg</guid>
      <description>&lt;p&gt;When an AI agent misbehaves (fails, stalls, or quietly burns tokens), you need to see the steps. But most observability tools are cloud-first: accounts, API keys, and shipping your prompts to someone else's servers.&lt;/p&gt;

&lt;p&gt;I built Lookspan to be the opposite. It runs on your machine, stores everything in local SQLite, and starts with one command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx lookspan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Open &lt;a href="http://127.0.0.1:3100" rel="noopener noreferrer"&gt;http://127.0.0.1:3100&lt;/a&gt; and you get a real-time dashboard: traces, a span graph, cost per model, latency percentiles, and alerts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Send it data
&lt;/h2&gt;

&lt;p&gt;Any language, raw HTTP:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://127.0.0.1:3100/api/ingest -H "Content-Type: application/json" -d '{...}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;MCP (TypeScript):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm i @lookspan/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Python (LangGraph, CrewAI, or generic):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install lookspan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Already using OpenTelemetry? Point your OTLP exporter at it — no Lookspan SDK:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:3100/v1/traces
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It's MIT and early (v0.1). Repo + roadmap: &lt;a href="https://github.com/JoniMartin27/lookspan" rel="noopener noreferrer"&gt;https://github.com/JoniMartin27/lookspan&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>observability</category>
    </item>
  </channel>
</rss>
