How Pixdom went from a frustrating gap in my workflow to a fully-shipped CLI + MCP server — and what the toolchain actually looked like from the inside.
There's a moment every developer using Claude has had.
You ask it to generate a LinkedIn post card. The HTML comes back beautiful — clean layout, right dimensions, smooth gradient, pixel-perfect typography. You stare at it in your terminal. Then you open a browser, paste the HTML into a file, open it, take a screenshot, crop it, resize it, convert it to JPEG, and finally — finally — have something you can actually post.
Every. Single. Time.
That friction was the entire reason I built Pixdom.
The Problem I Was Actually Solving
I kept running into the same loop: Claude generates rich HTML output → I manually screenshot it → I resize it → I convert it → I lose 10 minutes. Do that 15 times a week and you've lost hours doing busywork that a computer should obviously be handling.
The gap wasn't Claude's fault. Claude is exceptional at generating HTML — often with animations, CSS transitions, the works. The gap was that nothing closed the loop after the HTML existed. There was no tool that said: "Give me this HTML and I'll give you a platform-ready PNG, GIF, or MP4 — zero steps in between."
The manual path, if you've never clocked how bad it actually is, looks like this: open the file in Chrome → start a screen recording → wait for one full animation cycle → stop recording → open Canva → trim to exactly one loop → export as GIF → upload. That's eight steps and fifteen minutes of pure friction, every single time.
So I built Pixdom.
Pixdom is a CLI tool and MCP server that converts any HTML — whether Claude-generated, hand-written, or fetched from a live URL — into platform-ready images and animated assets. One command. No screenshots. No manual resizing. No format hunting.
pixdom convert --html "<h1>Hello</h1>" --profile linkedin-post --output launch.jpg
That's it. Done.
Or if you have an animated HTML file:
# Auto mode — Pixdom detects the element, duration, and FPS from the page itself
pixdom convert --file hero-animation.html --format gif --auto --output ./hero.gif
Before rendering, --auto prints a summary so you know exactly what it found:
Auto mode:
Element: #card (350×520)
Duration: 3500ms (CSS animation LCM)
FPS: 24 (ease-in-out detected)
Frames: 84
No guessing. No --duration 3500 flags you have to figure out yourself. The tool reads the CSS animation cycle and picks the right frame rate.
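For the curious, the loop-length arithmetic behind that summary is simple. Here's a minimal sketch of the idea, not Pixdom's actual implementation: if an element runs several CSS animations, one clean loop is the least common multiple of their durations, and the frame count follows from the FPS. The durations below are illustrative.

```typescript
// Hedged sketch: deriving one clean loop from multiple CSS animation durations.
const gcd = (a: number, b: number): number => (b === 0 ? a : gcd(b, a % b));
const lcm = (a: number, b: number): number => (a / gcd(a, b)) * b;

// Durations (ms) of each animation found on the target element -- illustrative values.
const durations = [700, 500];

// One full loop ends when every animation realigns: the LCM of the durations.
const loopMs = durations.reduce(lcm); // 3500

// Frame count for the chosen frame rate.
const fps = 24;
const frameCount = Math.round((loopMs / 1000) * fps); // 84
```

Those numbers line up with the auto-mode summary: a 3500ms loop rendered at 24 FPS is 84 frames.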
But Here's the Part Nobody Talks About: How I Actually Built It
Most project writeups skip the unsexy part — the workflow, the tooling, the decisions made at 11pm when nothing is working. This isn't that kind of writeup.
I want to talk about the actual development stack I assembled to build Pixdom using Claude Code as the primary engineer. Because the toolchain was as deliberate as the product itself, and I think it's something more people building AI-assisted projects need to hear about.
Here's what I used and — more importantly — why.
The Toolchain: Five Tools, One Coherent System
1. Claude Code — The "Engineer" on the Team
Claude Code was the primary implementation engine for Pixdom. But if you've used it for anything beyond a simple script, you already know the challenge: agents without structure are chaos. Give Claude Code a loose prompt and a big codebase, and it'll drift. It'll change things you didn't ask it to change. It'll forget decisions made three sessions ago.
That's not a bug in Claude Code — it's a constraint of how LLMs work. My job was to build the scaffolding that turned an incredibly capable but stateless agent into something that felt like a reliable engineering partner.
Everything else in this toolchain exists to solve that one problem.
2. OpenSpec — The Spec Layer That Kept Everything Sane
The first thing I did before writing a single line of product code was set up OpenSpec.
I evaluated two options: OpenSpec and SpecKit. Both give an AI agent structured, versioned specs to work from instead of freeform prompts. I chose OpenSpec because it's Node.js-native (zero friction with my pnpm monorepo), designed brownfield-first, and — crucially — generates native Claude Code slash commands directly into .claude/commands/. Claude Code understands the spec system natively. No translation layer needed.
The workflow it enables:
/opsx:propose "Add platform profile presets for LinkedIn, Twitter, Instagram"
That starts a structured spec proposal. OpenSpec creates a living document describing what to build, why, and the acceptance criteria. Claude Code reads that before touching any code. The agent isn't guessing. It has a brief.
When implementation is done: /opsx:apply. When everything passes: /opsx:archive. The change moves to the archive folder. The spec history is your audit trail.
Over the course of building Pixdom, I ran 13 change cycles this way — from initial type definitions through a full security audit. Every feature was spec'd, implemented, and archived.
That's what OpenSpec gave me: engineering memory for a stateless agent.
3. markdownlint-cli2 — The Agent's Proofreader
OpenSpec generates markdown. Claude Code writes to markdown files. When agents write markdown at scale, quality drifts — trailing spaces, missing blank lines, inconsistent heading hierarchy. These feel minor until they cause a parsing error at 1am.
I wired markdownlint-cli2 as a Stop hook — it runs automatically at the end of every Claude Code session on all spec files. Zero extra steps. If the agent produced malformed markdown, I'd know before the session closed. Small thing. Saved me from a genuinely annoying class of bugs.
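For anyone who wants the same setup: a Stop hook lives in Claude Code's settings file. This is the rough shape rather than my exact config (check the Claude Code hooks docs for the current schema; the globs are assumptions about where your spec files live):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "markdownlint-cli2 'openspec/**/*.md' '.claude/**/*.md'"
          }
        ]
      }
    ]
  }
}
```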
4. rtk — Token Budget as a First-Class Concern
Here's something most AI-assisted development guides don't mention: token consumption is a real engineering variable, not just a billing concern.
When Claude Code runs on a large codebase, it can consume enormous context windows — long git diffs, full file reads, repeated context reloading. Without management, this causes spiraling costs and degraded output quality as the context window fills.
rtk (Rust Token Killer) solves this. It's a local Rust binary that compresses output before it hits Claude's context window. When I run rtk git diff instead of git diff, the diff goes through rtk's compression pipeline first — preserving semantic meaning while dramatically reducing token footprint.
And critically: rtk is 100% local. No external server. No account. No telemetry. Your code never leaves your machine.
5. agentdiff — Verifying What the Agent Actually Did
This one I built as a custom Claude Code slash command, and it might be the most underrated piece of the whole stack.
The problem: you hand Claude Code a spec and it implements 8 tasks. But how do you actually verify it did what you asked and only what you asked?
/agentdiff runs a token-compressed git diff, passes it to Claude with the original spec tasks, and gets back a structured report:
- ✅ Tasks implemented as specced
- ⚠️ Changes not covered by the spec (untracked drift)
- ❌ Tasks in the spec that don't appear in the diff (missed work)
It caught spec drift on three separate occasions — including one where Claude Code refactored a utility function I explicitly said not to touch.
agentdiff is accountability for a coder who can't be held accountable through normal means.
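A custom slash command like this is just a markdown prompt file in .claude/commands/. Here's a simplified sketch of the idea, not my exact file; the rtk invocation and the OpenSpec path are assumptions:

```markdown
<!-- .claude/commands/agentdiff.md (illustrative sketch) -->
Run `rtk git diff HEAD` to get a token-compressed diff of the working tree.
Read the task list for the active change under `openspec/changes/`.
Compare the diff against the tasks and report three buckets:

- ✅ tasks implemented as specced
- ⚠️ changes in the diff not covered by any task (drift)
- ❌ tasks with no corresponding change in the diff (missed work)
```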
6. Git — The Unglamorous Foundation
Everything above only works because git is the source of truth beneath it. Every OpenSpec change ends in a commit. agentdiff compares against HEAD. The archive maps to git history.
My commit rhythm: one commit per /opsx:apply. No squashing. The history tells the story of what was built and why.
What Pixdom Actually Does (The Full Picture)
Since I'm writing the README and the post at the same time, here's the honest feature breakdown — not marketing copy, just what works.
Platform profiles are the feature I use most. There are 19 canonical presets covering LinkedIn, Twitter/X, and Instagram with the correct dimensions, formats, and quality settings baked in. Instead of remembering that a LinkedIn post should be 1200×1200 JPEG at quality 90, you just write --profile linkedin-post and move on.
# LinkedIn post from a live URL
pixdom convert \
--url https://your-portfolio.com/project \
--profile linkedin-post \
--output ./linkedin.jpeg
# Twitter header
pixdom convert \
--url https://myapp.com \
--profile twitter-header \
--output ./header.jpeg
Element-level capture is the other one I use constantly. If you have a dashboard HTML file and you only want the chart, you don't have to crop anything:
pixdom convert --file dashboard.html --selector "#chart" --format png --output ./chart.png
The MCP integration is what makes it genuinely useful inside a Claude Code workflow. After a one-time install:
pixdom mcp --install
You can hand it off entirely from inside a session:
Use pixdom to convert https://myapp.com to a linkedin-post JPEG. Save to ~/assets/linkedin.jpg.
Or with HTML generation — Pixdom's generate_and_convert tool asks Claude to write the HTML first, then renders it. One tool call. The loop closes entirely inside the terminal.
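Under MCP, that natural-language request ultimately becomes a plain JSON-RPC tools/call. A sketch of roughly what goes over the wire; the argument names here are my guesses, not Pixdom's documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "generate_and_convert",
    "arguments": {
      "prompt": "A LinkedIn launch card for Pixdom",
      "profile": "linkedin-post",
      "output": "~/assets/linkedin.jpg"
    }
  }
}
```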
The MCP server ships with real security defaults that I didn't have at the start and had to add in the security audit:

- Output is sandboxed to ~/pixdom-output/.
- File inputs are restricted to an allowlist of directories.
- SSRF protection is on by default.
- API keys go into the OS keychain (macOS Keychain, Linux Secret Service, Windows Credential Locker), falling back to plaintext only with a warning.

I'll get to why that security audit existed in a moment.
The Security Pass That Changed Everything
At spec #13, I ran a full security audit on the codebase. Four vulnerabilities came back — all caught and patched before the public release.
The critical one (CWE-22) was a path traversal vulnerability in the MCP server: there was no sandboxing on where it could write files. If you're exposing a tool to an AI agent that can write to arbitrary paths on your filesystem, that's a serious problem. The fix was the sandboxed output directory.
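If you're exposing your own file-writing tool to an agent, the remediation pattern is worth copying. A minimal TypeScript sketch of the idea, assuming Node's path and os modules; the function name is mine, not Pixdom's:

```typescript
import path from "node:path";
import os from "node:os";

// The sandbox root that every write must stay inside.
const SANDBOX = path.join(os.homedir(), "pixdom-output");

// Resolve the requested path, then verify it did not escape the sandbox.
function resolveOutputPath(requested: string): string {
  const resolved = path.resolve(SANDBOX, requested);
  const rel = path.relative(SANDBOX, resolved);
  // A relative path starting with ".." (or jumping drives on Windows)
  // means the resolved target sits outside SANDBOX.
  if (rel.startsWith("..") || path.isAbsolute(rel)) {
    throw new Error(`Refusing to write outside ${SANDBOX}: ${requested}`);
  }
  return resolved;
}
```

The key detail is checking the resolved path rather than the raw string: `../` segments and absolute inputs are normalized away before the comparison happens.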
The high-severity one (CWE-312) was the API key being stored in plaintext in ~/.claude.json — hence the OS keychain migration.
There were two more: a medium-severity issue where MCP file inputs could read from anywhere on the filesystem (fixed by the allowlist), and a low-severity listener leak where signal handlers were registered per-render, causing MaxListenersExceededWarning in long sessions.
The audit also caught a published CVE in Playwright itself — CVE-2025-59288 — which I patched by pinning to v1.55.1 or later. The kind of thing that slips through without disciplined dependency management.
The spec-driven process made remediation clean: 29 tasks, zero untracked changes, verified by agentdiff. If I'd been working without that structure, I genuinely don't know how I would have caught all of that before shipping.
Lessons That Actually Apply to Your Project
Specs are not bureaucracy — they're precision. Every time I skipped writing a spec and just asked Claude Code to "implement X", the output required more cleanup than if I'd written the spec first. The 20 minutes writing the spec saves 2 hours of diff-reading.
Local-first is a principle, not a preference. Every tool in your AI development stack that touches your code is a potential leak surface. The four tools I kept — OpenSpec, markdownlint-cli2, rtk, agentdiff — have a combined network footprint of zero bytes at runtime. That was the standard I held everything to. Understand exactly where your data goes before you integrate a tool.
Token budgets are architectural decisions. rtk isn't a cost-cutting measure — it's what lets Claude Code operate at full quality on a 5-package monorepo without hitting context limits. Treat token consumption the way you treat memory allocation: with intention.
Commit discipline is what makes agent work auditable. An AI agent without frequent commits is a black box. With commits, it's a partner. One commit per spec cycle, no squashing, and the history tells the story.
The gap after generation is where the value lives. Claude generates HTML. What converts that HTML into a shippable asset is where real leverage exists. Pixdom is that gap for me. Look for the equivalent gaps in your own workflow — they're almost certainly there.
What's Next
Pixdom is live on npm right now at npmjs.com/package/pixdom. Two commands and you're running:
npm install -g pixdom
pixdom --version
The CLI is shipped. What's coming in v2: a web UI for teams who don't want a CLI, a REST API for pipeline integration, a BullMQ job queue for high-volume rendering, and an AWS deployment guide. The CLI is the foundation. The service layer is next.
If you try it, I'd love to hear what you run into. And if any of the v2 roadmap items are useful to you now, open an issue on GitHub and say so — prioritization follows actual interest.
What's the hardest part of your AI-assisted development workflow right now? Drop it in the comments — I'm betting a lot of us are solving the same problems in isolation.
Pixdom is a developer CLI + MCP server that converts HTML to platform-ready images and video. Built with Claude Code, OpenSpec, rtk, agentdiff, and markdownlint-cli2. Source: github.com/sushilkulkarni1389/pixdom
Tags: #buildinpublic #claudecode #developertools #aiassisteddev #typescript #opensource #mcp #solodev
