
Sushil Kulkarni


I got tired of manually converting HTML to GIFs, so I built an open-source CLI to do it instantly

Converting HTML into an animated GIF or a perfectly sized social media image used to break my flow every single time.

The process was always the same: record my screen, drag the video into something like Canva, manually trim the timeline, export, realize the dimensions were wrong for the platform, repeat. Slow, manual, and completely disconnected from how I actually work.

What I wanted was something I could fire from my terminal — or hand off entirely to an AI coding agent — and just get the asset back. No GUI. No context-switching. No ceremony.

So I built Pixdom.


What is Pixdom?

Pixdom is a developer CLI tool and an MCP (Model Context Protocol) server. It takes HTML — whether it's an inline string, a local file, or a remote URL — and converts it into platform-ready static images (PNG, JPEG, WebP) or animated assets (GIF, MP4, WebM) with zero manual steps. It also accepts existing images directly via --image, running them through Sharp without spinning up a browser at all.

Under the hood, it runs a Playwright + Sharp + FFmpeg pipeline. I built it specifically for solo developers and AI-assisted workflows where you want rendering to be a step in the process, not an interruption to it.

npm install -g pixdom

One command. Both the pixdom CLI and the pixdom-mcp binary are installed and ready to go.


The features I'm most proud of

1. Smart Auto Mode

The thing I hated most was manually calculating duration and framerate for a CSS animation. Guess too low and the GIF cuts off. Guess too high and you get a 40MB file. The --auto flag removes that entirely.

It scores DOM elements to detect the main content, calculates the animation cycle length, and even parses CSS easing functions (ease-in-out vs linear) to set an appropriate FPS automatically.

# Before: manual flags, constant guesswork
pixdom convert --file page.html --selector "#card" --duration 3500 --fps 24 --output out.gif

# After: let the tool figure it out
pixdom convert --file page.html --format gif --auto --output out.gif

Before it renders, --auto prints exactly what it decided and why:

Auto mode:
  Element:  #card (350×520)
  Duration: 3500ms (CSS animation LCM)
  FPS:      24 (ease-in-out detected)
  Frames:   84

You can see the reasoning, override any value if you disagree, or just let it run.
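The "CSS animation LCM" in the report above can be sketched in a few lines. This is an illustrative Python model, not Pixdom's actual code: the shortest loop in which every animation on the page realigns is the least common multiple of the individual animation-duration values, and the frame count follows from that duration and the chosen FPS.

```python
from math import lcm

def cycle_length_ms(durations_ms):
    # Shortest duration after which all animations realign (LCM of durations).
    return lcm(*durations_ms)

def frame_count(duration_ms, fps):
    # Frames needed to cover one full cycle at the chosen FPS.
    return round(duration_ms / 1000 * fps)

# Animations of 700ms and 3500ms realign every 3500ms; at 24 FPS
# that is 84 frames, matching the auto-mode report above.
print(cycle_length_ms([700, 3500]))  # 3500
print(frame_count(3500, 24))         # 84
```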

2. 19 platform profile presets

I got tired of Googling "LinkedIn post dimensions 2024" every few weeks. So I baked 19 canonical platform profiles directly into the tool.

pixdom convert --file page.html --profile linkedin-background --output banner.png
pixdom convert --file page.html --profile twitter-video --format mp4 --output promo.mp4

Need a LinkedIn carousel background? --profile linkedin-background. Twitter video? --profile twitter-video. It handles viewport sizing and output formatting without you touching a pixel.
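Conceptually, a profile preset is just a named bundle of viewport size and output format that the converter resolves before rendering. A minimal sketch of that registry idea, with placeholder dimensions rather than Pixdom's actual presets (only the profile names come from the post):

```python
# Illustrative profile registry; dimensions are placeholder values,
# not Pixdom's real presets.
PROFILES = {
    "linkedin-background": {"width": 1584, "height": 396, "format": "png"},
    "twitter-video":       {"width": 1280, "height": 720, "format": "mp4"},
}

def resolve_profile(name):
    # Look up a profile by name, failing loudly on typos.
    if name not in PROFILES:
        raise KeyError(f"unknown profile: {name}")
    return PROFILES[name]

print(resolve_profile("linkedin-background"))
# → {'width': 1584, 'height': 396, 'format': 'png'}
```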

3. Full MCP server integration — built for AI agents

This is the part I'm most excited about, and honestly the reason the project exists in its current form.

Pixdom ships with a built-in MCP server that connects directly to Claude Code. Running pixdom mcp --install automatically writes the server config to ~/.claude.json. No manual JSON editing.

Once connected, you have two tools available to your AI:

  • convert_html_to_asset — takes HTML and renders it locally using Playwright
  • generate_and_convert — calls Claude to write the HTML first, then renders it

So you can literally prompt your agent:

"Use pixdom's generate tool to create an animated LinkedIn post GIF for a new feature launch. Profile: linkedin-post, format: gif, auto: true. Save to ~/out.gif."

The AI writes the HTML and renders the final animated asset. End to end, no manual steps.
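I haven't inspected exactly what pixdom mcp --install writes, so treat this as an assumption, but a typical Claude Code MCP server entry in ~/.claude.json looks roughly like:

```json
{
  "mcpServers": {
    "pixdom": {
      "command": "pixdom-mcp"
    }
  }
}
```

The --install flag just saves you from editing that block by hand.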

4. Security hardening

Since this tool renders arbitrary HTML and remote URLs, I treated security as a first-class concern, not an afterthought. I ran a 60-point security review before even thinking about publishing.

What that looks like in practice:

  • SSRF protection: blocks file:// schemes, private network ranges, and cloud metadata IPs
  • Chromium runs in a sandboxed mode by default
  • MCP file inputs and outputs are restricted to specific sandboxed directories (~/pixdom-output/)
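The SSRF guard described in the first bullet can be sketched like this. This is a minimal Python illustration of the technique, not Pixdom's actual implementation: reject blocked URL schemes, then check the resolved IP against private, loopback, and link-local ranges (the latter covers the 169.254.169.254 cloud metadata address).

```python
import ipaddress
from urllib.parse import urlparse

BLOCKED_SCHEMES = {"file"}

def is_url_allowed(url, resolved_ip):
    # Reject dangerous schemes outright.
    if urlparse(url).scheme in BLOCKED_SCHEMES:
        return False
    ip = ipaddress.ip_address(resolved_ip)
    # Private, loopback, and link-local ranges cover the
    # cloud metadata IP (169.254.169.254) as well.
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return False
    return True

print(is_url_allowed("file:///etc/passwd", "127.0.0.1"))      # False
print(is_url_allowed("http://example.com", "93.184.216.34"))  # True
print(is_url_allowed("http://metadata", "169.254.169.254"))   # False
```

A real implementation also has to resolve the hostname itself and guard against DNS rebinding, which this sketch leaves out.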

If you're building tools that render untrusted content, I'm happy to write up what I found and how I addressed it — drop a comment if that's useful.


How it's structured (the architecture)

I built this as a pnpm workspace monorepo. Each package has a narrow responsibility:

Package            What it does
@pixdom/core       Playwright + Sharp + FFmpeg rendering pipeline
@pixdom/detector   CSS animation cycle detection, auto-mode logic
@pixdom/profiles   Platform profile registry and resolution
@pixdom/types      Zod schemas, shared types, error codes
apps/cli           Commander.js CLI, progress reporting, shell autocomplete
apps/mcp-server    MCP tools for Claude Code integration

The CLI is built with Commander.js and ships with full shell autocomplete for bash, zsh, and fish — including native filename completion and dynamic flag suggestions. Small thing, but it makes the tool feel polished to use daily.


What I got wrong along the way

The first version of auto-detection was embarrassingly naive. I was just taking the longest animation-duration value I could find in the stylesheet and calling that the cycle length. It worked maybe 60% of the time.

The real fix was building a scoring model that weighs animation complexity, element hierarchy, and easing type together. It took a few iterations to get right, and there are still edge cases I'm working through — particularly with scroll-triggered animations that don't run on load.

I'm also still not happy with how I'm handling very large HTML files with external dependencies. The current approach works but it's not elegant. That's on the roadmap.


What's coming in v2

v1 is out. Here's what I'm focused on next:

  • Broader AI tool support — pixdom mcp --install --tool all to configure Gemini, Codex, and Cursor in one command
  • HTML input validation — content sniffing to warn before wasting render time on a non-HTML file
  • Web UI + REST API — for teams that want a shared rendering service rather than per-developer installs
  • BullMQ job queue — for scalable cloud deployments where you're processing assets at volume

One last thing worth saying

Pixdom was designed, written, and debugged with Claude Code — which is also what created the problem it solves. Claude generated the animated HTML. Claude helped build the tool to render it. There's a certain loop in there that felt worth acknowledging.


Try it now

Pixdom is live on npm and the repo is public.

npm install -g pixdom
pixdom --help

If this is solving a problem you've run into — CLI rendering, AI agent workflows, automated social asset generation — install it and tell me where it breaks. Bug reports, edge cases, and honest feedback are more useful to me right now than anything else.

If you find it useful, a ⭐ on the repo goes a long way. It's what gets the project in front of other developers who'd actually use it.

A few things I'd genuinely like your take on in the comments:

  • Does the MCP integration approach make sense to you, or is there a simpler interface you'd want?
  • Are there platform profiles missing from the 19 that you'd use regularly?
  • Is the auto-detection concept useful, or would you rather just control duration manually?

I read every comment and respond to every issue. If something doesn't work, open one.
