The 7 things your indie-hacker AI agent product needs before you open the waitlist
If you spent the last 90 days building an AI agent product as a solo founder, you have a working demo, a Stripe test mode, a Gumroad listing, and a Twitter thread. The thing you don't have is a production-readiness checklist written for you — every other checklist on the internet assumes you have a platform team, a Datadog budget, and an SRE on call. You have a MacBook, a credit card, and 18 hours a week.
This is that checklist. It is the 7 things I check in 90 minutes on a $149 production-readiness review of an indie agent product, condensed.
I am going to skip the generic "use Langfuse" advice. If you have not instrumented anything, the list below is what to add first, in order, with the cheapest tool for each.
Why this list is different
Three things distinguish the indie-builder agent failure mode from the enterprise one:
- You ship Friday night. The customer who finds the bug is the one who paid you $29.
- You do not have a runbook. When the agent does the wrong thing once, you read 800 lines of stack trace at 11pm.
- You do not have a refund automation. A bad week-1 cohort can bury your App Store / Product Hunt / Indie Hackers reputation for months.
The enterprise checklist optimizes for "detect the failure in under 5 minutes." The indie checklist optimizes for "do not wake up to a Twitter shitstorm on day 6."
The 7 pre-launch checks (90 minutes total)
1. Idempotency on every side-effecting tool (15 min)
If your agent sends an email, charges a card, creates a file, or writes to a database, calling that tool again with the same input must not repeat the side effect. The outcome has to be the same whether the tool ran once or was called again after a retry, a timeout, or a manual re-run.
The cheapest check: search your code for the function names of your side-effecting tools (send_email, charge, create_*, update_*). For each one, ask: "if I call this twice with the same args, what happens?" If the answer is "I send the email twice," you have a 2 AM incident in your future.
The fix: add an idempotency key. The key is usually a hash of (user_id, intent, day_bucket). You check the key against a small Redis / SQLite table before executing. Rejected duplicates return the cached result.
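A minimal sketch of that check, assuming a local SQLite table and a tool whose result fits in a string. The names idempotency_key, IdempotencyStore, and run_once are made up for illustration, and the sketch is not safe against two processes racing on the same key.

```python
import hashlib
import sqlite3
from datetime import date
from typing import Callable


def idempotency_key(user_id: str, intent: str, day: date) -> str:
    """Hash (user_id, intent, day_bucket) into a stable key."""
    raw = f"{user_id}|{intent}|{day.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """Tiny SQLite-backed cache of side-effect results (hypothetical helper)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS idempotency "
            "(key TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def run_once(self, key: str, action: Callable[[], str]) -> str:
        """Run action only if key is new; otherwise return the cached result."""
        row = self.db.execute(
            "SELECT result FROM idempotency WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # duplicate call: skip the side effect
        result = action()
        self.db.execute(
            "INSERT INTO idempotency (key, result) VALUES (?, ?)", (key, result)
        )
        self.db.commit()
        return result
```

You would wrap each side-effecting tool as store.run_once(key, lambda: send_email(...)). If the payment or email provider you call supports its own idempotency keys, passing the same key through to it closes the gap this sketch leaves when the process dies between the side effect and the insert.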
I have written about this more in Why Your AI Agent Sent That Email Twice if you want the deeper read.
2. Per-session token cap (10 min)
Set a hard ceiling on tokens consumed per session. A solo builder running GPT-4-class models on a $29/month plan can be ruined by a single user who triggers a 50-step agent loop.
The cheapest check: find the place where you assemble the conversation history before each LLM call. Is there a max_tokens parameter on the API call? Is there a max_messages or max_steps parameter on your agent loop? If either is missing, you do not have a cap — you have a prayer.
The fix: a single MAX_TOKENS_PER_SESSION = 50_000 constant near your agent entry point, and a MAX_AGENT_STEPS = 12 constant. Exceeding either one raises a BudgetExceeded exception, which you catch and turn into a friendly error for the user.
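Here is one way the two constants and the exception could fit together. SessionBudget and run_session are hypothetical names, and step_token_costs stands in for the token counts your real LLM calls would report.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_TOKENS_PER_SESSION = 50_000
MAX_AGENT_STEPS = 12


class BudgetExceeded(Exception):
    """Raised when a session goes past its token or step ceiling."""


@dataclass
class SessionBudget:
    tokens_used: int = 0
    steps_taken: int = 0

    def charge(self, tokens: int) -> None:
        """Record one agent step and its token usage, then enforce both caps."""
        self.steps_taken += 1
        self.tokens_used += tokens
        if self.steps_taken > MAX_AGENT_STEPS:
            raise BudgetExceeded(f"step cap hit: {self.steps_taken}")
        if self.tokens_used > MAX_TOKENS_PER_SESSION:
            raise BudgetExceeded(f"token cap hit: {self.tokens_used}")


def run_session(step_token_costs: Iterable[int]) -> str:
    """Stand-in agent loop: each item is the token count of one step."""
    budget = SessionBudget()
    try:
        for tokens in step_token_costs:
            budget.charge(tokens)
    except BudgetExceeded:
        return "This session hit its usage limit. Please start a new one."
    return "done"
```

Note that this version checks after the step has run, so the session can overshoot the cap by one step's worth of tokens. If that matters for your pricing, estimate the cost before the call and check first.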
The deeper read on the cost-explosion shape is in Your AI Agent Bill Is Probably 10x-700x Higher Than It Needs To Be. 88% of indie agents in 2026 fail not because the model is bad, but because the bill kills the runway.
3. Three log lines per side effect (15 min)
Every time the agent sends an email / charges a card / writes a file, it must log three lines:
[intent] what the user asked for
[post-verify] what the world looks like AFTER the side effect
[outcome-assert] what you would check later to know it worked
Not three lines of structured JSON. Three grep-able log lines. You will read these at 2 AM from tail -f, not from a Grafana dashboard. The shape is documented in Your AI Agent Returns 200 and Is Wrong: The Silent-Success Drift Pattern. The summary: the dangerous agent failure is not the crash, it is the success that quietly does the wrong thing.
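A small sketch of what emitting those three lines could look like with the standard logging module. The helper names and the id= field are my additions; the id is there so one grep pulls all three lines for a single side effect.

```python
import logging

logger = logging.getLogger("agent.side_effects")


def side_effect_lines(
    effect_id: str, intent: str, post_verify: str, outcome_assert: str
) -> list[str]:
    """Build the three grep-able lines for one side effect."""
    return [
        f"[intent] id={effect_id} {intent}",
        f"[post-verify] id={effect_id} {post_verify}",
        f"[outcome-assert] id={effect_id} {outcome_assert}",
    ]


def log_side_effect(
    effect_id: str, intent: str, post_verify: str, outcome_assert: str
) -> None:
    """Write the three lines to the log, one record per line."""
    for line in side_effect_lines(effect_id, intent, post_verify, outcome_assert):
        logger.info("%s", line)
```

A call might look like log_side_effect("em-42", "user asked to email invoice to one customer", "provider reports 1 message queued", "recipient count equals 1"). At 2 AM, grep "id=em-42" on the log file shows the whole story in three lines.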
4. Manual kill switch (10 min)
You need a way to turn the agent off in under 60 seconds without a redeploy. The cheapest version is a feature flag in a JSON file on S3 / a Redis key / a Stripe subscription webhook. The point is: a customer DMs you at 6 PM saying "your agent just sent my entire customer list a marketing email," and you have 60 seconds to stop it.
A real production agent product has a status page. A solo-builder agent product has an IS_AGENT_ENABLED flag you can flip from your phone.
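As a sketch of the JSON-file variant: the agent reads the flag before every run, so flipping it takes effect without a redeploy. The file layout and the is_agent_enabled name are assumptions, and so is the choice to treat a missing or broken flag file as "off".

```python
import json
from pathlib import Path


def is_agent_enabled(flag_path: Path) -> bool:
    """Return True only if the flag file exists and explicitly enables the agent."""
    try:
        data = json.loads(flag_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Assumption: fail closed. An unreadable flag means the agent stays off.
        return False
    return isinstance(data, dict) and data.get("IS_AGENT_ENABLED") is True
```

Call it at the top of the agent entry point and return a "temporarily paused" message when it is False. Failing closed is a judgment call: it means a storage outage also pauses the agent, which is usually the cheaper failure for a product that sends email or charges cards.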
5. The 3 test inputs that always run before you ship (15 min)
Every indie agent product has 3 inputs that, if they break, break the whole product. They are different for every agent, but they always exist. Find them. Write them down. Run them before every deploy.
For a customer-support agent: (a) a refund request, (b) a request that should be escalated to a human, (c) a request that requires a tool the agent does not have.
For a research agent: (a) a single-source question, (b) a multi-source question, (c) a question with no good answer.
For a coding agent: (a) a one-line change, (b) a multi-file refactor, (c) a request that needs human judgment.
Put these in a file called PRE_PROD_SMOKE.md. Run them. Every. Single. Time.
6. Rate limit per user, not per IP (15 min)
A single power user will burn your API budget. If you rate-limit by IP, that user gets a VPN and burns it again. Rate limit by user_id (or api_key) and by cost (tokens spent), not by request count. One long agent loop = one "request" but $4 of API cost. You need a budget-shaped limit.
The cheapest check: do you have a rate limit at all? If you do, is it per-user or per-IP? If you do not, you are two weeks from a $4,000 OpenAI bill you cannot pay.
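A budget-shaped limit can be as small as a per-user, per-day token counter. This is an in-memory sketch with made-up names and an arbitrary DAILY_TOKEN_BUDGET; in a real deployment the counter would live in Redis or your database so it survives restarts.

```python
from datetime import date

DAILY_TOKEN_BUDGET = 200_000  # assumption: pick a number that matches your pricing


class UserBudgetLimiter:
    """Tracks tokens spent per (user_id, day) and blocks users over budget."""

    def __init__(self, daily_token_budget: int = DAILY_TOKEN_BUDGET) -> None:
        self.daily_token_budget = daily_token_budget
        self._spent: dict[tuple[str, date], int] = {}

    def record(self, user_id: str, tokens: int, day: date) -> None:
        """Add the tokens one LLM call consumed to the user's daily total."""
        key = (user_id, day)
        self._spent[key] = self._spent.get(key, 0) + tokens

    def allow(self, user_id: str, day: date) -> bool:
        """True while the user is still under today's token budget."""
        return self._spent.get((user_id, day), 0) < self.daily_token_budget
```

Check allow() before starting a session and call record() after every LLM call inside the loop, so a single long agent run counts for what it actually cost rather than as one request.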
7. The "I have been rate-limited" page (10 min)
When the rate limit fires, what does the user see? If the answer is "a 500 error from the OpenAI library," you are leaking platform internals to your customers. If the answer is "an empty page," you are losing them forever.
The cheapest version: a static HTML page at /rate-limited that says "you are doing this too fast, here's a 60-second countdown, here's what you can do in the meantime." Five minutes to write. Saves you from "the app just stopped working for me" tweets.
The week-1 check-in (5 things to look at on day 6)
You opened the waitlist. 40 people signed up. 12 of them ran the agent more than 3 times. 2 of them asked for a refund. Here is what to look at:
- The cost-per-user distribution. Is the median user costing you $0.05 and the 90th percentile costing you $4? If the tail is fat, you have a power-user problem and a pricing problem.
- The "completed but wrong" rate. For 10 random completed sessions, read the [outcome-assert] log line and verify it matches what the user got. If 3 out of 10 are wrong, you have a silent-success drift problem.
- The "tool call failure" rate. For 10 random sessions, count the tool calls that returned an error. If the agent is papering over tool errors with hallucinated results, you have a state-graph invention problem.
- The "I do not know" rate. How often does the agent say "I do not know" or escalate to a human? If it is below 2%, the agent is probably hallucinating. If it is above 30%, the agent is useless.
- The "first-session success" rate. Of the 40 signups, how many had a successful first session? If it is below 60%, the onboarding is broken. If it is above 90%, the agent is probably too conservative.
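For the first check in the list, the cost-per-user distribution, a few lines are enough once you have a total cost per user. This sketch assumes you can build a dict of user_id to dollars spent from your own logs; the function names and the nearest-rank percentile method are my choices.

```python
import statistics


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def cost_summary(cost_per_user: dict[str, float]) -> dict[str, float]:
    """Median, 90th percentile, and how many times the tail exceeds the median."""
    costs = list(cost_per_user.values())
    median = statistics.median(costs)
    p90 = percentile_nearest_rank(costs, 90)
    tail_ratio = p90 / median if median else float("inf")
    return {"median": median, "p90": p90, "tail_ratio": tail_ratio}
```

With 40 signups the numbers are noisy, so treat the ratio as a smell test rather than a metric: a p90 that is tens of times the median is the fat tail described above.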
The deeper read on the 3-layer observability model is in What Your AI Agent's Tool Calls Actually Look Like in Production. You need to see all 3 layers — the LLM call envelope, the tool attempt, and the side-effect verification — to debug anything in production.
The 3 misconfig patterns I keep seeing
These are the three things that look fine in development and burn you in production:
A. Retry-on-timeout without idempotency key
You added a retry decorator to your LLM call. The LLM call timed out. The retry succeeded. But the part that timed out was the tool call inside the LLM call (the one that charged the card or sent the email), and it had already gone through. The retry re-ran the tool call, and the customer got charged twice. This is the most common week-1 incident.
B. Streaming response with side effects before the stream completes
You stream the agent's response to the user. The stream is "Sure, I'll send that email to your customer list right now — sending now — done." But the done happens at the end of the stream. If the user closes the browser at "sending now," the email was already sent but they did not see the confirmation. You have a customer who thinks the email was not sent and an email that was. This is a chargeback waiting to happen.
C. Test mode is not actually test mode
Your Stripe is in test mode. Your agent code calls stripe.Charge.create in test mode. But your agent also calls sendgrid.send in production mode, and sendgrid.send is what fails. The 500 error you see in your logs is the Stripe test call. The actual production failure is the SendGrid call. You debug the wrong system for 6 hours.
What to do if you find a problem
If you run this list and find 2-3 things you do not have, you are in the same shape as 90% of indie agent builders in 2026. The fix is not "buy Langfuse." The fix is a 90-minute human read of your code, your logs, and your 3 most common user flows — exactly the shape of a production-readiness review.
The point of this article is not "you need a consultant." The point is "here is the checklist, here is the order, here are the 7 things that are actually load-bearing for an indie agent product." If you can run this list yourself and ship all 7, you are ahead of most teams with 5 engineers and a $50k Datadog bill.
If you cannot — if you find that you do not have time, or you are not sure which of the 7 you actually have, or you read the week-1 check-in section and realized you do not have the data to answer any of the 5 questions — that is exactly the moment a 90-minute read is cheaper than a week of debugging. The link at the top is the 90-minute read.
Good shipping.
Sources
- Master of Code, "45% of AI-Generated Code Ships With a Security Flaw," May 2026
- DigitalApplied, "88% of agent failure rate, $340K avg direct cost," 2026
- Wiz, "20% of vibe-coded apps in production have serious vulnerabilities," May 2026
- Predict/Medium, "5 mechanisms of LLM cost explosion, 717x worst case," May 2026
- Tom's Hardware, "Per-token prices fell 2 years straight, per-task cost is the only unit that moved," May 2026
- Datadog, "5% LLM call spans error, 60% caused by rate-limit exceedance," Feb 2026
- Oso, "Prompt injection is not the real risk; over-privileged actions are," 2026