Mike Solomon

Autodogfooding Autodock on AWS

Yesterday, Autodock's environment provisioning time went down from ~30 seconds to 3 seconds 🎊

This was a messy feature, involving cron jobs, webhooks, and various third-party services. Autodock runs on EC2, and the basic idea is to use EC2 ASG lifecycle hooks to customize servers as they transition between states so that they're ready for users when env.launch is called.
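For context, a lifecycle hook is just a pause the ASG holds an instance in while it launches or terminates, which gives you a window to do the customization. Registering one looks roughly like the sketch below; the names, region, and ARNs are placeholders, not Autodock's real configuration.

```ts
import {
  AutoScalingClient,
  PutLifecycleHookCommand,
} from "@aws-sdk/client-auto-scaling";

const asg = new AutoScalingClient({ region: "us-east-1" });

// Hold launching instances in Pending:Wait so they can be customized
// before the ASG marks them InService and hands them to users.
await asg.send(
  new PutLifecycleHookCommand({
    AutoScalingGroupName: "autodock-pool",     // placeholder ASG name
    LifecycleHookName: "autodock-warmup",      // placeholder hook name
    LifecycleTransition: "autoscaling:EC2_INSTANCE_LAUNCHING",
    // Lifecycle notifications go to an SNS/SQS target, not straight to a URL.
    NotificationTargetARN:
      "arn:aws:sns:us-east-1:123456789012:autodock-lifecycle", // placeholder topic
    RoleARN: "arn:aws:iam::123456789012:role/autodock-lifecycle", // placeholder role
    HeartbeatTimeout: 300,    // seconds to finish customization
    DefaultResult: "ABANDON", // what happens if nothing completes the action
  })
);
```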

On top of that, I wanted to preview the change before merging it into main. Sounds like a job for... Autodock!

I booted up an Autodock dev server, synced my code, got Autodock up and running, and Claude used existing webhook conventions in the repo as a model for a new webhook to receive AWS lifecycle events.
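I won't reproduce the repo's webhook conventions here, but the general shape of such a handler is roughly the following sketch (Express purely for illustration; the route name and the customizeInstance helper are made up, not Autodock's code):

```ts
import express from "express";
import {
  AutoScalingClient,
  CompleteLifecycleActionCommand,
} from "@aws-sdk/client-auto-scaling";

const app = express();
const asg = new AutoScalingClient({ region: "us-east-1" });

// Hypothetical endpoint for ASG lifecycle notifications delivered via SNS.
// SNS posts JSON with a text/plain content type, so read the raw body.
app.post("/webhooks/aws-lifecycle", express.text({ type: "*/*" }), async (req, res) => {
  const envelope = JSON.parse(req.body); // SNS envelope

  if (envelope.Type !== "Notification") {
    // SubscriptionConfirmation handling elided for brevity.
    return res.sendStatus(200);
  }

  const event = JSON.parse(envelope.Message); // the lifecycle event itself

  if (event.LifecycleTransition === "autoscaling:EC2_INSTANCE_LAUNCHING") {
    await customizeInstance(event.EC2InstanceId); // placeholder for the warmup work

    // Tell the ASG the instance is ready so it transitions to InService.
    await asg.send(
      new CompleteLifecycleActionCommand({
        AutoScalingGroupName: event.AutoScalingGroupName,
        LifecycleHookName: event.LifecycleHookName,
        LifecycleActionToken: event.LifecycleActionToken,
        LifecycleActionResult: "CONTINUE",
      })
    );
  }

  res.sendStatus(200);
});

async function customizeInstance(instanceId: string): Promise<void> {
  // placeholder: sync code, warm caches, start services, etc.
}

app.listen(3000);
```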

Autodock's example gallery of thousands of real staging deployments has a Doppler section, and Claude used it to provision a temporary token for the Autodock box.

Claude was then able to create the lifecycle endpoint for the EC2 ASG and register it with AWS using Autodock's auto-exposed ports.
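Roughly, and simplifying the exact wiring: a lifecycle hook can't call an arbitrary URL directly, so the notification goes through an SNS topic with an HTTPS subscription pointing at the auto-exposed endpoint (the ARN and URL below are placeholders):

```ts
import { SNSClient, SubscribeCommand } from "@aws-sdk/client-sns";

const sns = new SNSClient({ region: "us-east-1" });

// Point the lifecycle topic at the URL Autodock exposes for the dev box.
// SNS will POST a SubscriptionConfirmation that the endpoint must confirm
// before real notifications start flowing.
await sns.send(
  new SubscribeCommand({
    TopicArn: "arn:aws:sns:us-east-1:123456789012:autodock-lifecycle", // placeholder
    Protocol: "https",
    Endpoint: "https://my-env.example-autodock.dev/webhooks/aws-lifecycle", // placeholder auto-exposed URL
  })
);
```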

Then, Claude ran a small PoC against AWS using Autodock (deployed on Autodock) to generate a comprehensive set of fixtures for lifecycle events. The AWS documentation isn't great here, so the agent needed to observe AWS and construct the webhook iteratively, locking in knowledge with tests.
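A simplified sketch of what "locking in knowledge with tests" can look like, assuming the fixtures are JSON dumps of the notifications the PoC observed (the fixture name and parseLifecycleEvent are stand-ins, not the actual test suite):

```ts
import { test } from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";

// Stand-in for the parser under test: extracts the fields the webhook relies on.
import { parseLifecycleEvent } from "../src/lifecycle";

test("parses a captured EC2_INSTANCE_LAUNCHING notification", () => {
  // Fixture captured by the PoC from a real scale-out event.
  const raw = readFileSync("fixtures/instance-launching.json", "utf8");
  const event = parseLifecycleEvent(raw);

  assert.equal(event.transition, "autoscaling:EC2_INSTANCE_LAUNCHING");
  assert.match(event.instanceId, /^i-[0-9a-f]+$/);
  assert.ok(event.lifecycleActionToken.length > 0);
});
```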

Claude got a bit too excited when the whole system finally worked, but hey, who can blame it?

It then tore down the PoC, used the fixtures to write a bunch of tests, and the PR was ready and bug-free.

Now, if you launch an env using Autodock, it'll take 3 seconds!

Autodock is now faster (and cooler) than GitHub Codespaces. Using its MCP to drive the full dev lifecycle, tests and all, took less than a half-hour and was mostly one-shotted by Opus 4.5.

I know this article sounds a bit market-y, but what can I say, I believe in the project. In addition to building Autodock as a product, I'm a power user on numerous production repos, including Autodock itself. Give it a shot!

Top comments (4)

Daniel Nwaneri

Using Autodock to build Autodock is the ultimate dogfooding test. If your
dev tool can't deploy itself, something's off.

The 30s → 3s improvement is huge. That's the difference between "wait for
environment" and "environment is just there."

Love that Claude got "a bit too excited" when it finally worked. Honestly,
same energy when my MCP deployments finally click after debugging CORS for
the third time.

The AWS lifecycle hooks iterative construction is interesting - agent
observes AWS, builds fixtures, locks in knowledge with tests. That's a
pattern I've been using for edge MCP servers too: let the agent explore
the API surface, capture what works, codify it.

One thing I'm curious about: with Opus 4.5 one-shotting most of this, how
much of the success is "Autodock's MCP abstractions" vs "Opus 4.5 being
really good"? Like, would this work with Sonnet or Haiku, or is the heavy
model critical?

Asking because I'm doing similar agent-driven deployments on Workers, and
I'm trying to figure out where the intelligence needs to live (protocol
design vs model capability vs explicit guardrails).

Great work. The "faster than GitHub Codespaces" line is a bold claim but
you're backing it up with real usage.

Mike Solomon

The heavy model is definitely not critical. I've tested with Sonnet (not Haiku) and get similar results.

Experience dictates that the more important thing is RAG, which is what I've spent the most time on.

Once a week(ish), I use GPT to analyze a corpus of complex production deployments and I have it note bullet points about failures. This knowledge base is what's used to construct the MCP responses.
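Conceptually, the weekly pass looks something like the sketch below. It's deliberately flattened: the model name, paths, and storage are stand-ins, and in practice the model is pointed at log files rather than fed a blob inline.

```ts
import OpenAI from "openai";
import { readFileSync, writeFileSync } from "node:fs";

const openai = new OpenAI();

// Stand-in: a week's worth of deployment transcripts gathered into one corpus.
const corpus = readFileSync("deployments/last-week.txt", "utf8");

const completion = await openai.chat.completions.create({
  model: "gpt-4o", // stand-in model name
  messages: [
    {
      role: "system",
      content:
        "You analyze deployment transcripts. Note each failure as a bullet point: " +
        "what broke, at which step, and what signal would have caught it earlier.",
    },
    { role: "user", content: corpus },
  ],
});

// The bullet points become the retrieval corpus behind the MCP responses.
writeFileSync("knowledge/deploy-failures.md", completion.choices[0].message.content ?? "");
```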

In that way, you can think of Autodock's knowledge base like context7, but for deployment failures. Luckily I've f'ed up deploying countless stacks, so it has good raw material :-D

For example, a classic one is CORS problems on POST requests. Even Opus 4.5 struggles to detect this, but if you tell a simple model where in the deployment process to keep an eye out for it, it will do so remarkably well.
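To make the "where to look" point concrete, a knowledge-base entry for that case is conceptually something like this (the shape is a simplified stand-in, not Autodock's actual schema):

```ts
// A failure hint gets surfaced in the MCP response at the step where the
// failure tends to show up, so a small model doesn't have to diagnose it
// from scratch.
interface FailureHint {
  step: string;    // deployment phase the hint applies to
  symptom: string; // what the agent will observe
  check: string;   // what to verify before trying anything else
}

const corsOnPost: FailureHint = {
  step: "post-deploy smoke test",
  symptom: "browser POST fails while GET and curl both succeed",
  check:
    "Confirm Access-Control-Allow-Origin/-Methods/-Headers are set on the " +
    "actual POST response, not only on the OPTIONS preflight.",
};
```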

Re Cloudflare, one thing to experiment with if you're not already is their AI tools. I've heard they're excellent, and they probably have built the product around solid assumptions about where agentic knowledge should live.

Thanks for your insightful comments! If you're able to share what you're working on, I'd love to take a look. And if you wind up trying Autodock and have feedback, I'd love to hear it 🙏

Daniel Nwaneri

Mike, the RAG approach makes so much sense. That's what I've been missing.

I've been building MCP servers with decent protocol design but zero systematic capture of failures. Your weekly GPT analysis → bullet points → knowledge base is exactly the missing piece.

The CORS example hits home because I see this constantly:

  • Local: works fine
  • Deploy to Workers: POST fails
  • Agent fixes CORS headers
  • Still fails (checking Origin on OPTIONS, not POST)
  • Wastes 3-4 iterations before getting it (sketch of the fix below)
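The eventual fix is always "same headers on the real response, not just the preflight." A minimal Workers-style sketch (the allowed origin is a placeholder):

```ts
// The browser checks CORS headers on the actual POST response, not just the
// OPTIONS preflight, so both paths need them.
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "https://app.example.com", // placeholder origin
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization",
};

export default {
  async fetch(request: Request): Promise<Response> {
    if (request.method === "OPTIONS") {
      // Preflight: necessary, but not sufficient on its own.
      return new Response(null, { status: 204, headers: CORS_HEADERS });
    }

    if (request.method === "POST") {
      const body = await request.json();
      return new Response(JSON.stringify({ ok: true, received: body }), {
        status: 200,
        // The same headers must also be on the real response.
        headers: { "Content-Type": "application/json", ...CORS_HEADERS },
      });
    }

    return new Response("Method not allowed", { status: 405, headers: CORS_HEADERS });
  },
};
```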

Your "tell the model WHERE to look" vs hoping it figures it out is way smarter.

I'm deep in Cloudflare already - Workers AI, Vectorize, D1. Running production MCP on FPL Hub (500K+ queries/day, $8/mo). The edge deployment model works great for API-first stuff but hits walls with anything needing a full Linux environment.

Been wrestling with this constraint. Autodock can run complex multi-service setups. Workers can't. But Workers deploy globally in seconds and cost almost nothing.

Feels like the answer is "both for different stages" - like developing on Mac, deploying to Linux. Autodock for dev/staging where you need full environment. Workers for production where edge performance and cost matter.

Quick questions:

  • Is your RAG knowledge base per-repo or universal?
  • What's your prompt structure for the weekly failure analysis?
  • Ever seen dev box β†’ production drift issues?

Planning an article comparing dev boxes vs edge for MCP. Would love to cite your work (and get your review if you're up for it).

Also I'm going to try Autodock on my repos this week. Curious if my "designed for Workers" code is actually portable or if I've baked in assumptions.

This conversation is reshaping how I'm thinking about failure capture. Really appreciate the depth here 🙏

Mike Solomon

That sounds great! Please respond with the link to the article when it's live!

  1. My RAG knowledge base is per-repo, and it only builds upon AUTODOCK.md if that already exists, so it's super isolated.
  2. It's really simple and it varies, but I've found that the most important thing is to make the prompt like a two-liner and then give the LLM access to log files. Not the logs, but log files, as it will then do all of the grepping based on common failure modes (rough sketch after this list).
  3. All the time :) No one has asked me to solve this with Autodock yet, and I haven't felt a need to solve it with Autodock as opposed to, say, using Doppler. But I'm thinking all the time about how the project can grow to help fix this as well. ATM, though, it's just in the staging/testing space.
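To make (2) concrete, the prompt is shaped roughly like this (paths and wording are made up for illustration):

```ts
// Two lines of instruction, then log *file paths* rather than log contents;
// given file access, the model does its own grepping for common failure modes.
const logFiles = [
  "deploys/week-48/api.log",
  "deploys/week-48/worker.log",
];

const weeklyPrompt = [
  "Review these deployment log files and note every failure as a bullet point:",
  "what broke, at which step, and what would have caught it earlier.",
  "",
  ...logFiles.map((path) => `- ${path}`),
].join("\n");
```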