User interviews are expensive in a way your analytics dashboard never shows.
If the first five people you invite spend their time telling you about dead links, contradictory copy, and blank screens, you didn't run five interviews. You ran five unpaid QA sessions.
That was the risk I was staring at.
Arc is a diary app built around an AI that reads your writing over time and reflects back patterns you can't see yourself. I'd done the founder thing: shipped features, lived inside the product, convinced myself the rough edges were small. But founder eyes are cooked. Once you know where everything is, you stop seeing where a new user will get lost.
So before I talked to anyone, I ran a frontend QA agent against the live product with one question: would a new user survive the first five minutes?
The setup
Not a code review. I didn't want lint. I wanted first-contact truth.
I pointed a QA agent at the live landing page and web app, gave it a thin test account, and told it to walk the product like a new user: land on the site, sign in, try to write, hit the Mirror, hit the graph, try the keyboard shortcuts, and tell me where friction shows up first.
The result:
- 31 screenshots
- 11 ranked issues
- 4 blockers that had to be fixed before interviews
- a same-day delta pass that came back INTERVIEW-READY
The interesting part wasn't that the agent found bugs. Of course it found bugs. The interesting part was what kind.
It didn't tell me "this React component is messy." It told me "your first 30 seconds are lying about what the product is."
What it found
1. The landing page and the app were selling two different products
The landing page said Arc was "An AI that reads your whole story and shows you who you're becoming."
The signed-out app said "Your Arc Journal, on any browser. Read every note, write new ones, export the lot."
That's not a copy inconsistency. That's a positioning break.
A first-time visitor clicking from the landing page into the app wasn't meeting The Mirror. They were meeting what sounded like a generic file viewer.
This is exactly the thing a founder stops seeing because both versions sound reasonable in isolation. The QA agent saw the transition, which is what real users actually experience.
2. Four core routes were dead
The QA brief told the agent to try Mirror, Constellation, River of Time, Compose, Focus Mode, and Cmd+K.
Four of those routes returned 404s: /app/river, /app/compose, /app/focus, and /app/insights.
That matters more than it sounds. Early users guess URLs. They click stale nav items. They paste links from memory. A dead route tells them the product is abandoned.
The agent didn't just say "some routes are broken." It gave exact paths, exact repro steps, exact screenshots. That turned the fix list into a shipping list.
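The eventual fix for dead routes like these (the delta pass later confirmed they redirect to live pages) can be sketched as Next.js redirects. This is a minimal sketch, assuming a recent Next.js version with `next.config.ts` support; the destination paths here are placeholders, not Arc's actual routes.

```typescript
// next.config.ts — a hedged sketch. The four source paths are the dead
// routes from the QA report; the destinations are hypothetical.
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  async redirects() {
    return [
      // Send each dead route somewhere alive rather than a 404.
      { source: '/app/river',    destination: '/app', permanent: false },
      { source: '/app/compose',  destination: '/app', permanent: false },
      { source: '/app/focus',    destination: '/app', permanent: false },
      { source: '/app/insights', destination: '/app', permanent: false },
    ];
  },
};

export default nextConfig;
```

Non-permanent redirects are the safer default here: if those surfaces ship later, you don't fight cached 308s.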
3. Interview analytics were silently dead
This one wasn't visual, but it was probably the highest-value catch.
PostHog was firing bad requests on every page load: config.js returning 404, /flags returning 401. I was about to run user interviews with broken telemetry.
If you care about learning velocity, that's brutal. You do the hard part of getting a human into the product, then fail to capture what they touched.
In the delta pass, the check got sharper: 197 network requests across both sites, zero PostHog failures, Vercel Analytics as the only telemetry left firing.
That's the difference between "I think the analytics bug is gone" and "the live site is clean."
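The delta-pass check above (every network request inspected, zero telemetry failures) reduces to a small filter over the network log a QA agent collects. A sketch, assuming a simple `{ url, status }` log shape, which is my assumption rather than any agent's actual output format:

```typescript
// Flag telemetry requests that failed: 4xx/5xx statuses, or status 0
// (the browser convention for a request that never completed).
type NetworkEntry = { url: string; status: number };

function failedTelemetry(log: NetworkEntry[], hosts: string[]): NetworkEntry[] {
  return log.filter(
    (req) =>
      hosts.some((h) => req.url.includes(h)) &&
      (req.status >= 400 || req.status === 0)
  );
}

// The pre-fix state from the article: config.js 404s and /flags 401s
// show up; a clean pass returns an empty array.
const log: NetworkEntry[] = [
  { url: 'https://us.posthog.com/array/key/config.js', status: 404 },
  { url: 'https://us.posthog.com/flags', status: 401 },
  { url: 'https://example.com/app', status: 200 },
];
console.log(failedTelemetry(log, ['posthog.com']).length); // 2
```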
4. Empty states made the product look dead
The test account was deliberately below the 10-entry threshold that makes Arc's graph and reflection surfaces interesting.
That was the right setup, because the agent found what an early user would actually see: a sparse graph with almost no visible structure, and a Mirror tail that felt like nothing was happening.
For a product whose promise is "your inner world, mapped in real time," that empty state is poisonous. Users don't infer the future product you're building. They judge the screen in front of them.
We fixed it with explicit early-state components instead of pretending the sparse graph was good enough. The graph now says the constellation is still forming. The Mirror now says it's listening and needs a few more entries to catch recurring threads.
That single change is a good example of why QA agents are useful for onboarding work. They're ruthless about the emotional read of a screen. Users won't say "your threshold logic needs a better intermediate state." They'll say "I opened the graph and it looked empty."
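The explicit early-state fix boils down to gating each surface on the entry threshold and swapping in copy that explains itself. A sketch under assumptions: the function names and copy strings are illustrative, not Arc's actual components; only the 10-entry threshold comes from the article.

```typescript
// Threshold-gated surface states: below the threshold, the graph and
// Mirror explain what they're waiting for instead of rendering sparse.
type SurfaceState = 'forming' | 'ready';

const ENTRY_THRESHOLD = 10; // the 10-entry threshold from the article

function surfaceState(entryCount: number): SurfaceState {
  return entryCount < ENTRY_THRESHOLD ? 'forming' : 'ready';
}

// Hypothetical copy for the graph's early state.
function graphCopy(entryCount: number): string {
  return surfaceState(entryCount) === 'forming'
    ? 'Your constellation is still forming. Keep writing.'
    : ''; // ready: render the real graph, no overlay copy
}
```

The design choice worth copying is the explicit `'forming'` state: a sparse graph and an intentional early state render identically in code unless you name the difference.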
5. The landing page had a proof gap right above pricing
The agent also caught something I'd mentally filed under "design polish" but that was really a trust problem.
Midway down the landing page, the "How it works" section had blank phone frames. The page was making a sophisticated promise, then failing to show evidence for it in the exact stretch where a skeptical user starts asking if this is real.
That one didn't block interviews the way the route failures did. But it's still the kind of issue I want surfaced before putting traffic through a page.
What it didn't catch
This matters.
The QA agent was excellent at first-contact friction. Dead routes, contradictory copy, quiet failures, empty-state reads.
It couldn't tell me whether the writing experience would make someone want to come back for 30 days. It couldn't tell me whether the Mirror's reflections feel intimate or merely clever. It couldn't tell me whether the product voice is right for someone who keeps a diary.
That still takes real users.
The point isn't to replace interviews. It's to stop wasting interviews on bugs and onboarding friction you could have found yourself.
Interview readiness changed in one afternoon
The first report's verdict: two focused hours of fixes, then go.
That was the right call.
The four blocker fixes shipped that afternoon. Then the QA agent ran a delta verification pass against the live site. The second verdict came back INTERVIEW-READY.
That pass confirmed four things:
- the signed-out hero now matched the Mirror framing
- the four dead routes redirected to live pages
- the broken PostHog traffic was gone
- the early-state graph and Mirror screens now explained themselves
That sequence is the whole pattern.
Don't run a QA agent so you can admire the report. Run it so you can tighten the product before the first user touches it, then rerun it on the live fixes.
The prompt template
This is the exact structure I used, with private details swapped for placeholders. It works against any deployed web product, Next.js or otherwise.
Context: <your product> is my long-term bet. Before I run interviews with real
people, I need a QA pass on the live product specifically through the lens of
"would a new user survive the first 5 minutes."
Your target: <YOUR_APP_URL>
Test account (if needed): <YOUR_TEST_EMAIL> / <YOUR_TEST_PASSWORD>
Landing page: <YOUR_LANDING_PAGE_URL>
Walk through as a first-time user would:
1. Land on the landing page. Does it tell me what the product is in under 10
seconds? Would I sign up?
2. Sign in. Walk through onboarding. Where is the first friction?
3. Try to complete the core action for the first time. Does the UI invite me
in, or feel empty?
4. Navigate the core routes: <LIST YOUR CORE ROUTES>. Does any feel broken
or empty-state-bad?
5. Check the main affordances and shortcuts: <LIST THEM>.
6. Does anything crash, error-toast, or quietly fail?
Specifically look for:
- Empty states that make the product feel dead
- Copy that talks at the user vs to the user
- CTAs or affordances that are unclear
- Dead links, broken redirects, 404s
- Console errors
- Page load latency above 2 seconds on any view
- Auth flow friction
Output: structured pass/fail report. For each issue:
- severity (critical / high / medium / low)
- exact URL + viewport
- what happened vs what should have happened
- repro steps
- a screenshot
End with a readiness verdict:
- are we ready to put this in front of 5 target users, or do we need to fix
X and Y first?
- if interviews should proceed, suggest 3-5 interview questions tailored to
what the product currently does well
Report under 1500 words. Facts + screenshots over prose.
I build production AI systems for founders and engineering teams. astraedus.dev