<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: toshipon</title>
    <description>The latest articles on DEV Community by toshipon (@toshipon).</description>
    <link>https://dev.to/toshipon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3844722%2F9122caf1-5bfe-4ed1-8170-0f93aab205c2.png</url>
      <title>DEV Community: toshipon</title>
      <link>https://dev.to/toshipon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/toshipon"/>
    <language>en</language>
    <item>
      <title>How I Built a Full-Stack Security Audit Skill for Claude Code</title>
      <dc:creator>toshipon</dc:creator>
      <pubDate>Sat, 11 Apr 2026 12:46:14 +0000</pubDate>
      <link>https://dev.to/toshipon/how-i-built-a-full-stack-security-audit-skill-for-claude-code-4nkk</link>
      <guid>https://dev.to/toshipon/how-i-built-a-full-stack-security-audit-skill-for-claude-code-4nkk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;"I want to run a security audit, but every time I have to start from zero."&lt;/p&gt;

&lt;p&gt;That feeling gets old fast when you're building across a full-stack setup like &lt;strong&gt;Vercel + Supabase + Next.js + iOS&lt;/strong&gt;. Each layer comes with its own security concerns, and just remembering what to check can be exhausting.&lt;/p&gt;

&lt;p&gt;OWASP guidelines are comprehensive, but they’re also huge. And some critical settings — especially in Vercel and Supabase dashboards — can’t be fully inspected from the CLI alone.&lt;/p&gt;

&lt;p&gt;So I built a Claude Code &lt;strong&gt;Custom Skill&lt;/strong&gt; called &lt;code&gt;security-audit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Claude Code Custom Skills let you package reusable procedures and domain knowledge for a specific task. Instead of reconstructing the audit process from memory every time, I can now run a reproducible &lt;strong&gt;6-phase full-stack security review&lt;/strong&gt; from Next.js to iOS with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/security-audit all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key difference is that it doesn’t stop at CLI and SQL checks. It also uses &lt;strong&gt;Chrome MCP&lt;/strong&gt; to inspect dashboard-only settings automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Finished Skill Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/security-audit              &lt;span class="c"&gt;# Choose target interactively&lt;/span&gt;
/security-audit all          &lt;span class="c"&gt;# Full-stack end-to-end audit (recommended)&lt;/span&gt;
/security-audit nextjs       &lt;span class="c"&gt;# Next.js application only&lt;/span&gt;
/security-audit vercel       &lt;span class="c"&gt;# Vercel infrastructure only&lt;/span&gt;
/security-audit supabase     &lt;span class="c"&gt;# Supabase backend only&lt;/span&gt;
/security-audit ios          &lt;span class="c"&gt;# iOS app only&lt;/span&gt;
/security-audit web          &lt;span class="c"&gt;# Next.js + Vercel + Supabase&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the part I like most: it turns a vague, easy-to-postpone task into something I can actually run on demand.&lt;br&gt;
Instead of thinking, "I should probably do a security review soon," I can just start from a standard entry point and let the process unfold.&lt;/p&gt;

&lt;p&gt;The Skill is structured into six phases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Main inspection methods&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Information Gathering&lt;/td&gt;
&lt;td&gt;Project structure, trust boundaries, data flows&lt;/td&gt;
&lt;td&gt;Grep / Glob&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Next.js Audit&lt;/td&gt;
&lt;td&gt;Server Actions, Middleware, CVEs&lt;/td&gt;
&lt;td&gt;Grep / Bash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Vercel Audit&lt;/td&gt;
&lt;td&gt;Env vars, Deployment Protection, WAF&lt;/td&gt;
&lt;td&gt;CLI + Chrome MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Supabase Audit&lt;/td&gt;
&lt;td&gt;RLS, function privileges, Auth settings&lt;/td&gt;
&lt;td&gt;SQL + Chrome MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. iOS Audit&lt;/td&gt;
&lt;td&gt;Keychain, ATS, biometrics&lt;/td&gt;
&lt;td&gt;Grep / Bash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6. Cross-Layer Analysis&lt;/td&gt;
&lt;td&gt;Auth flow consistency, token lifecycle&lt;/td&gt;
&lt;td&gt;Cross-cutting review&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Phase Structure
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Information Gathering
    │
    ├── Phase 2: Next.js Application
    │   Server Actions / Middleware / CSP / CVE
    │
    ├── Phase 3: Vercel Infrastructure
    │   Env vars (CLI) / Deployment Protection (Chrome MCP)
    │   / WAF (Chrome MCP) / Git Fork Protection (Chrome MCP)
    │
    ├── Phase 4: Supabase Backend
    │   RLS (SQL) / Function privileges (SQL) / Auth settings (Chrome MCP)
    │   / Security Advisor (Chrome MCP)
    │
    ├── Phase 5: iOS App
    │   Keychain / ATS / Biometrics / Privacy Manifest
    │
    └── Phase 6: Cross-Layer Analysis
        Auth flow consistency / token lifecycle
        / API transport security / continuity of data protection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  How I Built It in 3 Steps
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step 1: Research — learn from the best existing Skills
&lt;/h3&gt;

&lt;p&gt;I didn’t start by writing.&lt;br&gt;
I started by studying the best Skills I could find.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;What I learned&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Trail of Bits skills&lt;/strong&gt; (4.5k stars)&lt;/td&gt;
&lt;td&gt;A Skill should stay focused on one responsibility. Reference files should be separated out.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anthropic best practices&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keep &lt;code&gt;SKILL.md&lt;/code&gt; under ~500 lines, use progressive disclosure, write in imperative form&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SecOpsAgentKit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Organizing by domain makes complex security workflows easier to navigate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Step 2: Design — three core principles
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Progressive Disclosure&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; should contain only the overview and phase structure. Detailed inspection patterns belong in &lt;code&gt;references/&lt;/code&gt; and should be loaded only when needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Evidence-First&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Not "this might be vulnerable," but "this grep pattern found this code, and here is why it’s risky."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use OWASP directly&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Instead of inventing custom categories, I adopted &lt;strong&gt;OWASP Top 10:2025&lt;/strong&gt;, &lt;strong&gt;WSTG&lt;/strong&gt;, and &lt;strong&gt;MASVS v2&lt;/strong&gt; as-is.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
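&lt;p&gt;As a concrete sketch of principles 1 and 3, the top of &lt;code&gt;SKILL.md&lt;/code&gt; might look like this. The &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; fields follow Anthropic's Skill format; the trigger phrases and wording here are illustrative, not the Skill's exact text:&lt;/p&gt;

```markdown
---
name: security-audit
description: >
  Full-stack security audit for Vercel + Supabase + Next.js + iOS.
  Use when asked to "run a security audit", "check RLS",
  or "review deployment protection".
---

# Security Audit

Default target: all. Load references/ files only for the phases being run.
```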
&lt;h3&gt;
  
  
  Step 3: Implementation — a 6-file structure
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/skills/security-audit/
├── SKILL.md                              # Main entry (overview + phase structure)
└── references/
    ├── nextjs-security.md                # Next.js-specific inspection patterns
    ├── vercel-security.md                # Vercel CLI + Chrome MCP checks
    ├── supabase-security.md              # Supabase SQL + Chrome MCP checks
    ├── ios-testing.md                    # MASVS v2 categories
    └── web-testing.md                    # OWASP WSTG + Top 10:2025
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Why Chrome MCP Matters
&lt;/h2&gt;

&lt;p&gt;One of the biggest strengths of this Skill is that it uses &lt;strong&gt;Chrome MCP to inspect settings that the CLI can’t access&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, Vercel’s Deployment Protection and some Supabase Auth settings are only partially available via CLI or API. With Chrome MCP, the agent can navigate those dashboards, inspect toggle states, and capture screenshots as evidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Vercel example
navigate_page -&amp;gt; /settings/deployment-protection
take_screenshot -&amp;gt; record evidence
evaluate_script -&amp;gt; extract toggle states and protection scope

# Supabase example
navigate_page -&amp;gt; /database/security-advisor
take_screenshot -&amp;gt; record all findings
take_snapshot -&amp;gt; inspect details via accessibility tree
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Available via CLI/SQL&lt;/th&gt;
&lt;th&gt;Requires Chrome MCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vercel environment variable list&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vercel env ls&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment Protection&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Toggle state on settings page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supabase RLS state&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pg_class&lt;/code&gt; queries&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supabase Auth settings&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;MFA, email confirmation, rate limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supabase Security Advisor&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Full lint findings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why a Skill Instead of One Long Prompt?
&lt;/h2&gt;

&lt;p&gt;At first, I thought I could just write one long audit prompt.&lt;/p&gt;

&lt;p&gt;But in practice, turning it into a Skill was much more manageable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reusable&lt;/strong&gt;: &lt;code&gt;/security-audit all&lt;/code&gt; always starts from the same reliable entry point&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular&lt;/strong&gt;: &lt;code&gt;SKILL.md&lt;/code&gt; and &lt;code&gt;references/&lt;/code&gt; separate responsibilities cleanly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardized&lt;/strong&gt;: The order of inspection and evaluation criteria stays consistent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target-aware&lt;/strong&gt;: It can branch into &lt;code&gt;nextjs&lt;/code&gt;, &lt;code&gt;vercel&lt;/code&gt;, &lt;code&gt;supabase&lt;/code&gt;, or &lt;code&gt;ios&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher audit consistency&lt;/strong&gt;: It follows predefined criteria instead of improvising every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, I didn’t turn this into a Skill just for convenience.&lt;br&gt;
I did it to improve the &lt;strong&gt;reproducibility&lt;/strong&gt; of the audit itself.&lt;/p&gt;
&lt;h2&gt;
  
  
  Key Design Decisions
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Progressive Disclosure
&lt;/h3&gt;

&lt;p&gt;This is the pattern Anthropic emphasizes most strongly.&lt;br&gt;
Claude’s context window is a shared resource, so if you cram everything into &lt;code&gt;SKILL.md&lt;/code&gt;, it competes with the rest of the task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SKILL.md (always loaded)
  -&amp;gt; overview + phase structure + report format

references/ (loaded only when needed)
  -&amp;gt; nextjs-security.md: Next.js-specific inspection patterns
  -&amp;gt; vercel-security.md: Vercel dashboard inspection steps
  -&amp;gt; supabase-security.md: SQL queries + dashboard checks
  -&amp;gt; ios-testing.md: MASVS v2 commands
  -&amp;gt; web-testing.md: WSTG commands
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Evidence-First
&lt;/h3&gt;

&lt;p&gt;This came directly from studying the Trail of Bits Skills.&lt;br&gt;
Every inspection item should include concrete bash, grep, or SQL commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Tables in public schema with RLS disabled (Critical)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nspname&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;pg_namespace&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relnamespace&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relkind&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'r'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nspname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'public'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relrowsecurity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Direct OWASP Adoption
&lt;/h3&gt;

&lt;p&gt;I chose not to invent any custom taxonomy.&lt;br&gt;
Instead, I used the standard frameworks directly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Web&lt;/strong&gt;: OWASP Top 10:2025 + WSTG&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iOS&lt;/strong&gt;: MASVS v2 + MASTG&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared vocabulary&lt;/strong&gt;: CWE for vulnerability classification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest advantage is obvious: it gives your team and external reviewers a common language.&lt;/p&gt;
&lt;h2&gt;
  
  
  Anti-Patterns I Learned from Anthropic’s Best Practices
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Anti-pattern&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Better approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vague description&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"A security-related skill"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Include explicit trigger phrases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overloaded &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;One file with 3,000+ lines&lt;/td&gt;
&lt;td&gt;Split detailed content into &lt;code&gt;references/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Second-person instructions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"You should check..."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Use imperative form: &lt;code&gt;"Check..."&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep reference chains&lt;/td&gt;
&lt;td&gt;A -&amp;gt; B -&amp;gt; C -&amp;gt; D&lt;/td&gt;
&lt;td&gt;Keep references to one level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Choices with no default&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Choose one of the following..."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Recommend a default (&lt;code&gt;all&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No concrete inspection method&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Check RLS"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Attach exact SQL queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Turning full-stack security auditing into a Claude Code Custom Skill gave me several clear benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility&lt;/strong&gt;: the same six-phase audit runs with consistent quality every time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage&lt;/strong&gt;: OWASP categories are mapped across four layers — Next.js, Vercel, Supabase, and iOS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt;: &lt;code&gt;/security-audit all&lt;/code&gt; triggers a full-stack audit in one command&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: CLI/SQL for machine-readable checks, Chrome MCP for dashboard-only settings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared language&lt;/strong&gt;: OWASP-aligned findings are easier to discuss with other engineers and reviewers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The most important thing in Skill design is &lt;strong&gt;not to start writing too early&lt;/strong&gt;.&lt;br&gt;
Study the best existing examples first. Define your principles — especially Progressive Disclosure and Evidence-First — and only then implement.&lt;/p&gt;

&lt;p&gt;That’s what turns a one-off prompt into something you can actually keep using.&lt;/p&gt;
&lt;h2&gt;
  
  
  Skill Files
&lt;/h2&gt;

&lt;p&gt;I also published the full Skill files on GitHub:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/toshipon/claude-code-security-audit-skill" rel="noopener noreferrer"&gt;toshipon/claude-code-security-audit-skill&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article covers only the core ideas, but the full files are ready to drop into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/skills/security-audit/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're building with a similar stack, you can use it as-is or adapt the phase structure to your own environment.&lt;br&gt;
The main value is not the exact wording of the Skill — it's having a repeatable audit workflow that doesn't depend on memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/trailofbits/skills" rel="noopener noreferrer"&gt;Trail of Bits Security Skills&lt;/a&gt; — 16 security-oriented Skills (4.5k stars)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices" rel="noopener noreferrer"&gt;Anthropic Skill Best Practices&lt;/a&gt; — official guidance&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://mas.owasp.org/MASVS/" rel="noopener noreferrer"&gt;OWASP MASVS v2&lt;/a&gt; — mobile security verification standard&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://mas.owasp.org/MASTG/" rel="noopener noreferrer"&gt;OWASP MASTG&lt;/a&gt; — mobile security testing guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://owasp.org/www-project-web-security-testing-guide/" rel="noopener noreferrer"&gt;OWASP WSTG&lt;/a&gt; — web security testing guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://owasp.org/Top10/" rel="noopener noreferrer"&gt;OWASP Top 10:2025&lt;/a&gt; — latest web vulnerability ranking&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
      <category>claude</category>
      <category>owasp</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Have an AI Agent That Tests My Own Product Every 3 Hours</title>
      <dc:creator>toshipon</dc:creator>
      <pubDate>Wed, 08 Apr 2026 22:58:47 +0000</pubDate>
      <link>https://dev.to/toshipon/i-have-an-ai-agent-that-tests-my-own-product-every-3-hours-916</link>
      <guid>https://dev.to/toshipon/i-have-an-ai-agent-that-tests-my-own-product-every-3-hours-916</guid>
      <description>&lt;h2&gt;
  
  
  The Dogfooding Problem for Solo Developers
&lt;/h2&gt;

&lt;p&gt;"Eat your own dog food" is good advice. Use your own product. Find the bugs your users find. Feel the pain before they do.&lt;/p&gt;

&lt;p&gt;In practice, here's what actually happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You use it heavily right after launch&lt;/li&gt;
&lt;li&gt;Development takes over and you stop touching it&lt;/li&gt;
&lt;li&gt;You check it as a developer, not as a user — you know all the right paths&lt;/li&gt;
&lt;li&gt;"It works" becomes the bar, and rough UX slips through&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I build and maintain a web app solo. At some point I realized: I hadn't actually &lt;em&gt;used&lt;/em&gt; it as a user in weeks. I'd been shipping features, but not experiencing the product.&lt;/p&gt;

&lt;p&gt;So I did something that felt slightly absurd: I gave the job to an AI agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every 3 hours, an AI agent opens my product, checks if things work, and opens a PR if it finds something broken.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's how it works, what it found, and what it can't do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Three components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Agent (Claude)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decides what to check, interprets results, writes fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP Server&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exposes my app's API as callable functions for the AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Playwright&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lets the AI control a real browser to check the UI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The agent runs on a scheduled heartbeat. I define what to check in a markdown file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What to check (rotate through these):&lt;/span&gt;

&lt;span class="gu"&gt;## API checks&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Call list_projects, get_canvas, get_verification_status
&lt;span class="p"&gt;-&lt;/span&gt; Verify data integrity and response format

&lt;span class="gu"&gt;## UI checks  &lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Open the live site in a real browser
&lt;span class="p"&gt;-&lt;/span&gt; Check mobile viewport (375px)
&lt;span class="p"&gt;-&lt;/span&gt; Check dark mode
&lt;span class="p"&gt;-&lt;/span&gt; Check empty states (what does a new user see?)
&lt;span class="p"&gt;-&lt;/span&gt; Screenshot any anomalies

&lt;span class="gu"&gt;## Code quality&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Run tsc --noEmit, report TypeScript errors
&lt;span class="p"&gt;-&lt;/span&gt; Check for unused imports in recently changed files

&lt;span class="gu"&gt;## When you find something broken:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Create a branch
&lt;span class="p"&gt;-&lt;/span&gt; Fix it
&lt;span class="p"&gt;-&lt;/span&gt; Run vitest to confirm tests pass
&lt;span class="p"&gt;-&lt;/span&gt; Open a PR
&lt;span class="p"&gt;-&lt;/span&gt; Report to Discord
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire instruction set. The AI handles the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the API Integration Works
&lt;/h2&gt;

&lt;p&gt;By default, an AI can't interact with your app's internals. To fix this, I wrapped my API as an MCP (Model Context Protocol) server — basically a list of functions the AI can call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The AI can call these like tool calls&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;list_projects&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Get all projects&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;add_learning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Record a finding or bug&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;learning&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;get_verification_status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Check the status of all verifications&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;verification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets the AI do what a human user does — create records, read data, check states — but via API instead of clicking around.&lt;/p&gt;
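&lt;p&gt;To make the mechanics concrete, here is a minimal, self-contained sketch of how such a tool table can be dispatched. The in-memory array stands in for the real database client, and in the real setup these handlers sit behind the MCP server rather than being called directly:&lt;/p&gt;

```typescript
// Minimal sketch of tool dispatch; names mirror the tool table above.
// The in-memory "learnings" array stands in for the real database.
const learnings: { note: string }[] = [];

const tools: { [name: string]: { description: string; handler: (args?: unknown) => unknown } } = {
  list_projects: {
    description: "Get all projects",
    handler: async () => [{ id: 1, name: "demo" }], // fake db.project.findMany()
  },
  add_learning: {
    description: "Record a finding or bug",
    handler: async (args) => {
      learnings.push(args as { note: string });
      return learnings.length;
    },
  },
};

// What the agent runtime does when the model emits a tool call:
async function callTool(name: string, args?: unknown) {
  const tool = tools[name];
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.handler(args);
}

callTool("add_learning", { note: "empty state is blank" }).then((n) => console.log(n)); // logs 1
```

The model never imports this module; it sees only the tool names and descriptions, and the MCP server routes each tool call to the matching handler.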

&lt;h2&gt;
  
  
  What It Found
&lt;/h2&gt;

&lt;p&gt;Here are three real bugs the agent caught that I wouldn't have caught otherwise:&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 1: API and UI were out of sync
&lt;/h3&gt;

&lt;p&gt;When creating data through the API, the API response showed the data correctly. But the data didn't appear in the UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The data was stored in two separate database tables. The API wrote to one, the UI read from the other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why humans missed it:&lt;/strong&gt; Humans always use the UI. If you click "create" in the browser, both tables get written. The bug only appeared when creating via API — which humans never did, but the AI did on every check.&lt;/p&gt;
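&lt;p&gt;A minimal reconstruction of the mismatch (table and function names are illustrative, not the app's real schema):&lt;/p&gt;

```typescript
// Two write paths, one read path: the shape of Bug 1.
const apiTable: string[] = []; // what the API wrote to
const uiTable: string[] = [];  // what the UI read from

function createViaApi(item: string) {
  apiTable.push(item); // only one table updated
}

function createViaUi(item: string) {
  apiTable.push(item);
  uiTable.push(item); // the UI flow happened to keep both in sync
}

createViaUi("from-browser");
createViaApi("from-agent");

console.log(uiTable.includes("from-browser")); // true: humans never saw the bug
console.log(uiTable.includes("from-agent"));   // false: the agent's data vanished from the UI
```

The natural fix is a single write path that both the API route and the UI call, so one source of truth backs both.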

&lt;h3&gt;
  
  
  Bug 2: Mobile layout broken
&lt;/h3&gt;

&lt;p&gt;On desktop: fine. On mobile (375px): input fields overflowed horizontally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; One CSS change: &lt;code&gt;grid-cols-2&lt;/code&gt; → &lt;code&gt;grid-cols-1 md:grid-cols-2&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 3: Empty state was a white screen
&lt;/h3&gt;

&lt;p&gt;A new user opening their first project saw... nothing. No error, just blank. No guidance, no "create your first item" button.&lt;/p&gt;

&lt;p&gt;This one wasn't technically broken — it just made the product confusing for new users. The agent flagged it as a UX issue and suggested an empty state component.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dogfooding Alone Wasn't Enough
&lt;/h2&gt;

&lt;p&gt;Dogfooding catches a lot — especially broken flows, layout issues, and rough UX.&lt;/p&gt;

&lt;p&gt;But it doesn't catch everything.&lt;/p&gt;

&lt;p&gt;Some bugs only happen in production, under very specific conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a component crashes only after a rare user action&lt;/li&gt;
&lt;li&gt;an import mismatch breaks a route that manual testing doesn't hit&lt;/li&gt;
&lt;li&gt;an exception only appears with real data, real timing, or real browser state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those bugs are hard to find by manually using the product every few hours.&lt;/p&gt;

&lt;p&gt;So I ended up adding a second loop: &lt;strong&gt;error monitoring&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The dogfooding agent checks whether the product &lt;em&gt;works as a user experience&lt;/em&gt;.&lt;br&gt;
Error monitoring checks whether the product &lt;em&gt;is failing in the wild&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That combination turned out to be much stronger than either one alone.&lt;/p&gt;
&lt;h2&gt;
  
  
  Adding Sentry as a Second Feedback Loop
&lt;/h2&gt;

&lt;p&gt;Now the system has two complementary loops:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Loop&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dogfooding every 3 hours&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broken flows, visual issues, empty states, mobile regressions, rough UX&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sentry monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runtime exceptions, production-only bugs, hard-to-reproduce crashes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The dogfooding loop answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can a user actually move through the product?&lt;/li&gt;
&lt;li&gt;Does the UI make sense?&lt;/li&gt;
&lt;li&gt;Is anything visually broken?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Sentry loop answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did something crash in production?&lt;/li&gt;
&lt;li&gt;What stack trace and context came with it?&lt;/li&gt;
&lt;li&gt;Is there a fixable bug hidden behind low-frequency failures?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because not all quality issues look the same.&lt;/p&gt;

&lt;p&gt;Some problems are visible. Others only show up as stack traces.&lt;br&gt;
If you only rely on dogfooding, you miss production-only failures.&lt;br&gt;
If you only rely on Sentry, you miss awkward UX and broken but non-crashing flows.&lt;/p&gt;

&lt;p&gt;Together, they form a much more complete quality loop.&lt;/p&gt;
&lt;h2&gt;
  
  
  From Detection to Auto-Fix
&lt;/h2&gt;

&lt;p&gt;Once I added Sentry, the agent's job expanded.&lt;br&gt;
It no longer just looked for problems by using the product.&lt;br&gt;
It could also react to problems reported by the product itself.&lt;/p&gt;

&lt;p&gt;The flow now looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every 3 hours, the agent dogfoods the app&lt;/li&gt;
&lt;li&gt;On a separate schedule, it checks Sentry for unresolved issues&lt;/li&gt;
&lt;li&gt;If it finds a real bug, it analyzes the stack trace and source code&lt;/li&gt;
&lt;li&gt;It creates a branch, writes a fix, runs tests, and opens a PR&lt;/li&gt;
&lt;li&gt;Small safe fixes can be merged automatically after checks pass&lt;/li&gt;
&lt;/ol&gt;
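&lt;p&gt;As a rough illustration, step 2 of that flow can be sketched as a small triage helper. This is not my actual agent code: the event threshold and the sample issues are made up, and in real use the issue list would come from Sentry's issue-listing REST API with an auth token.&lt;/p&gt;

```typescript
// Sketch of the Sentry triage step (illustrative, not production agent code).
// In real use the issues array would come from Sentry's REST API, e.g.
// GET /api/0/projects/{org}/{project}/issues/?query=is:unresolved with a token.

interface SentryIssue {
  id: string;
  title: string;
  count: string; // Sentry reports event counts as strings
  status: "unresolved" | "resolved" | "ignored";
}

// Keep only unresolved issues seen often enough to justify an automated fix,
// most frequent first, so the agent picks the highest-impact bug.
function pickActionable(issues: SentryIssue[], minEvents = 3): SentryIssue[] {
  return issues
    .filter((i) => i.status === "unresolved")
    .filter((i) => Number(i.count) >= minEvents)
    .sort((a, b) => Number(b.count) - Number(a.count));
}

// Sample data shaped like the API response (the titles are invented):
const sample: SentryIssue[] = [
  { id: "1", title: "TypeError: t is not a function", count: "12", status: "unresolved" },
  { id: "2", title: "AbortError: fetch aborted", count: "1", status: "unresolved" },
  { id: "3", title: "Old resolved bug", count: "40", status: "resolved" },
];

console.log(pickActionable(sample).map((i) => i.title));
```

&lt;p&gt;The agent then feeds the top issue's stack trace and the matching source files into the branch-fix-test-PR steps.&lt;/p&gt;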

&lt;p&gt;One of the best examples was a page crash caused by the wrong i18n hook import.&lt;br&gt;
The error message itself was vague. Manual testing didn't catch it consistently.&lt;br&gt;
But Sentry provided enough context for the agent to trace the issue back to a bad import and generate a tiny fix.&lt;/p&gt;

&lt;p&gt;That was the moment this stopped feeling like "automated testing" and started feeling more like an &lt;strong&gt;automated maintenance loop&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the AI Can and Can't Do
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;The AI is good at&lt;/th&gt;
&lt;th&gt;The AI can't do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Checking if things &lt;em&gt;work&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Judging whether things &lt;em&gt;feel right&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Catching regressions automatically&lt;/td&gt;
&lt;td&gt;"This interaction is frustrating"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Covering edge cases humans skip&lt;/td&gt;
&lt;td&gt;Subjective UX judgment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opening PRs immediately on finding bugs&lt;/td&gt;
&lt;td&gt;Knowing if a feature is missing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running every 3 hours without fatigue&lt;/td&gt;
&lt;td&gt;Replacing actual user feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "can't do" column matters. &lt;strong&gt;The AI is a complement, not a replacement.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After the agent does its check, I still need to use the product myself and talk to users. The agent handles the objective, repeatable checks. I handle the subjective, experiential ones.&lt;/p&gt;
&lt;h2&gt;
  
  
  One More Honest Note
&lt;/h2&gt;

&lt;p&gt;About 30% of the time, the agent reports "fixed" when it hasn't fully fixed something. This was frustrating until I built in a hard requirement: &lt;strong&gt;tests must pass before marking anything as done.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rule: Before opening a PR, run `npx vitest run`.
If tests fail, do not open the PR.
Report the failure instead.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dropped false completions dramatically. The agent's confidence isn't reliable — test results are.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Build This
&lt;/h2&gt;

&lt;p&gt;You don't need my exact setup. The minimum viable version:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pick a scheduled runner&lt;/strong&gt; — GitHub Actions cron, a crontab, or any agent platform with scheduled tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expose one API endpoint the AI can call&lt;/strong&gt; — Start with just a health check&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write a simple check instruction&lt;/strong&gt; — "Call this endpoint and report if it fails"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add Playwright later&lt;/strong&gt; — Browser checks are optional but powerful for catching visual regressions&lt;/li&gt;
&lt;/ol&gt;
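&lt;p&gt;Steps 1 through 3 can fit in a few lines. The sketch below is illustrative: the endpoint path is a placeholder, and "report" is whatever your runner surfaces (a log line, a Slack webhook, a GitHub issue).&lt;/p&gt;

```typescript
// Minimal scheduled health check (sketch). The /api/health path is a
// placeholder; point it at any endpoint that returns 200 when your app is up.

// Turn an HTTP status into a report line the scheduled runner can surface.
function summarize(endpoint: string, status: number): string {
  const healthy = Math.floor(status / 100) === 2; // any 2xx counts as OK
  return healthy
    ? `OK ${endpoint} (${status})`
    : `FAIL ${endpoint} (${status}): investigate`;
}

// The actual check; the cast keeps this compiling on Node 18+ without DOM typings.
async function check(endpoint: string) {
  try {
    const res = await (globalThis as any).fetch(endpoint);
    return summarize(endpoint, res.status);
  } catch (err) {
    return `FAIL ${endpoint} (unreachable: ${String(err)})`;
  }
}

console.log(summarize("https://example.com/api/health", 200));
```

&lt;p&gt;A crontab entry or a GitHub Actions &lt;code&gt;schedule&lt;/code&gt; trigger runs this on an interval, and the check instruction tells the agent what to do when it sees a FAIL line.&lt;/p&gt;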

&lt;p&gt;The core insight isn't the tech stack. It's that &lt;strong&gt;dogfooding is a discipline problem, not a capability problem.&lt;/strong&gt; You know how to test your own product. You just don't do it consistently.&lt;/p&gt;

&lt;p&gt;Automating it removes the discipline requirement.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you built any automated quality loops into your side projects? Or does your testing start and end with "it worked on my machine"? I'm curious what others have tried; let me know in the comments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The product the agent keeps testing is &lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt;, my app for hypothesis validation and product learning.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That made this setup especially useful: the same system I use to organize product decisions is also what the agent keeps checking, stress-testing, and helping improve.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>indiehacking</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Used Hypothesis Validation to Shape My Go-to-Market Strategy</title>
      <dc:creator>toshipon</dc:creator>
      <pubDate>Mon, 06 Apr 2026 17:13:36 +0000</pubDate>
      <link>https://dev.to/toshipon/how-i-used-hypothesis-validation-to-shape-my-go-to-market-strategy-1567</link>
      <guid>https://dev.to/toshipon/how-i-used-hypothesis-validation-to-shape-my-go-to-market-strategy-1567</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Have you ever built an app and then realized you had no clear idea how to sell it?&lt;/p&gt;

&lt;p&gt;That’s a common trap in indie development. Building the product is hard, but figuring out &lt;strong&gt;who it’s for, what value it creates, and how to communicate that value&lt;/strong&gt; is often even harder.&lt;/p&gt;

&lt;p&gt;I recently launched &lt;a href="https://site.buildgeeks.dev/en/products/focusnest" rel="noopener noreferrer"&gt;Focusnest&lt;/a&gt;, an iOS ambient sound mixer designed for focus, relaxation, and sleep.&lt;/p&gt;

&lt;p&gt;On the product side, I felt pretty good about it. It offers multi-sound mixing, saved presets, built-in timers, 1/f fluctuation for more natural sound movement, and full offline use. On paper, it seemed strong enough to compete.&lt;/p&gt;

&lt;p&gt;But when I started thinking about go-to-market, I got stuck.&lt;/p&gt;

&lt;p&gt;The ambient sound market is crowded. I could explain the features, but I wasn’t convinced they gave people a compelling reason to care.&lt;/p&gt;

&lt;p&gt;So instead of jumping straight into promotion, I treated go-to-market itself as a set of hypotheses.&lt;/p&gt;

&lt;p&gt;That process helped me realize something important: Focusnest probably shouldn’t be positioned as just an “ambient sound app.” It should be positioned as a &lt;strong&gt;focus-switching app&lt;/strong&gt; — a tool that helps people enter deep work faster.&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk through how I organized those hypotheses, what changed in my thinking, and how that shaped the first version of my go-to-market strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The App I Built: Focusnest
&lt;/h2&gt;

&lt;p&gt;Focusnest is an iOS app for focus, relaxation, and sleep using customizable ambient soundscapes.&lt;/p&gt;

&lt;p&gt;Its main features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;White noise, brown noise, and pink noise&lt;/li&gt;
&lt;li&gt;Natural sounds like rain, rivers, fire, waves, birds, wind, and thunder&lt;/li&gt;
&lt;li&gt;Mixing multiple sounds at the same time&lt;/li&gt;
&lt;li&gt;Saving and restoring presets&lt;/li&gt;
&lt;li&gt;A built-in Pomodoro-style timer&lt;/li&gt;
&lt;li&gt;1/f fluctuation for more natural sound variation&lt;/li&gt;
&lt;li&gt;Full offline support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I felt confident about the product quality.&lt;/p&gt;

&lt;p&gt;But product quality and go-to-market are two different problems.&lt;/p&gt;

&lt;p&gt;That’s where many indie products get stuck: &lt;strong&gt;“I built it” does not automatically become “people want it.”&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Feeling That Something Was Off
&lt;/h2&gt;

&lt;p&gt;At first, I naturally tried to position it as an ambient sound app.&lt;/p&gt;

&lt;p&gt;But something felt off.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The ambient sound market is crowded&lt;/li&gt;
&lt;li&gt;Spotify and YouTube are viable substitutes&lt;/li&gt;
&lt;li&gt;Existing players like Noisli and Endel are already strong&lt;/li&gt;
&lt;li&gt;“This looks nice” didn’t feel like a strong enough reason to switch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, I had built the product, but I still hadn’t found a clear market context for it.&lt;/p&gt;

&lt;p&gt;That’s an easy place to stall.&lt;/p&gt;

&lt;p&gt;You can keep polishing features forever, but if you haven’t clarified &lt;strong&gt;who it’s for and what job it really does&lt;/strong&gt;, your messaging stays blurry.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Mistake
&lt;/h3&gt;

&lt;p&gt;The mistake was treating &lt;strong&gt;feature quality&lt;/strong&gt; and &lt;strong&gt;buying motivation&lt;/strong&gt; as if they were the same thing.&lt;/p&gt;

&lt;p&gt;I could explain things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can mix sounds&lt;/li&gt;
&lt;li&gt;There’s a timer&lt;/li&gt;
&lt;li&gt;It uses 1/f fluctuation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are features.&lt;/p&gt;

&lt;p&gt;But features alone don’t answer the question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why would someone want this now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What users actually want is not the feature itself, but the &lt;strong&gt;change in state&lt;/strong&gt; the product gives them.&lt;/p&gt;

&lt;h2&gt;
  
  
  So I Organized the Problem as Hypotheses
&lt;/h2&gt;

&lt;p&gt;At that point, I used &lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt;, a hypothesis validation tool, to structure my thinking.&lt;/p&gt;

&lt;p&gt;Instead of treating go-to-market like a vague marketing task, I treated it like a set of testable assumptions.&lt;/p&gt;

&lt;p&gt;Not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“How do I promote this?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is this most likely to resonate with?&lt;/li&gt;
&lt;li&gt;Are users really looking for “sound,” or are they looking for something else?&lt;/li&gt;
&lt;li&gt;What makes this meaningfully different from substitutes?&lt;/li&gt;
&lt;li&gt;Which channel is most likely to create early traction?&lt;/li&gt;
&lt;li&gt;What framing makes the value obvious?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A rough summary looked like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Hypothesis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Target&lt;/td&gt;
&lt;td&gt;Remote workers, students, and creators who struggle to switch into focus mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Core problem&lt;/td&gt;
&lt;td&gt;They are not looking for “nice sounds.” They are looking for a trigger to start work or study&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Differentiation&lt;/td&gt;
&lt;td&gt;Not the number of sounds, but the focus-onboarding experience created by mixing, presets, and timer integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td&gt;X, Zenn, Qiita, App Store ASO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messaging&lt;/td&gt;
&lt;td&gt;Use cases are stronger than feature lists&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once I made it visible, “the market feels crowded” stopped being a vague concern.&lt;/p&gt;

&lt;p&gt;It became a clearer strategic question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who is this for, what is it really helping them do, and through which channel should I explain that first?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Insight: It’s Not Really an Ambient Sound App
&lt;/h2&gt;

&lt;p&gt;This was the biggest shift.&lt;/p&gt;

&lt;p&gt;At first, I saw Focusnest as:&lt;/p&gt;

&lt;h3&gt;
  
  
  Before
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;An ambient sound app&lt;/li&gt;
&lt;li&gt;A relaxation app&lt;/li&gt;
&lt;li&gt;A noise playback app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But after organizing the hypotheses, a different framing emerged.&lt;/p&gt;

&lt;h3&gt;
  
  
  After
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A tool that reduces the friction of entering focus mode&lt;/li&gt;
&lt;li&gt;A portable deep work environment&lt;/li&gt;
&lt;li&gt;A personal switch for starting work or study&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That change in wording is not just branding.&lt;/p&gt;

&lt;p&gt;It changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who the competitors are,&lt;/li&gt;
&lt;li&gt;who the message resonates with,&lt;/li&gt;
&lt;li&gt;and what kinds of content I should create.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“You can mix 16 different sounds”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;is much weaker than saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Start coding faster with a rain + brown noise preset”&lt;/li&gt;
&lt;li&gt;“Recreate your ideal focus environment with one tap”&lt;/li&gt;
&lt;li&gt;“Shorten the ritual it takes to enter work mode”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second version gives people a concrete reason to care.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Marketing Hypotheses I Came Away With
&lt;/h2&gt;

&lt;p&gt;After organizing everything, a few practical hypotheses stood out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hypothesis 1: The first users are probably not “people who love ambient sound”
&lt;/h3&gt;

&lt;p&gt;The first people most likely to care may be knowledge workers who struggle with context switching.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remote workers&lt;/li&gt;
&lt;li&gt;Engineers&lt;/li&gt;
&lt;li&gt;Students&lt;/li&gt;
&lt;li&gt;Creators&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These people are less interested in sound for its own sake.&lt;/p&gt;

&lt;p&gt;They care about entering a focused state more easily.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hypothesis 2: Use-case messaging will outperform feature messaging
&lt;/h3&gt;

&lt;p&gt;Feature messaging still matters.&lt;/p&gt;

&lt;p&gt;Things like offline support, 1/f fluctuation, and the number of sound sources are useful.&lt;/p&gt;

&lt;p&gt;But on their own, they don’t create urgency.&lt;/p&gt;

&lt;p&gt;Use cases probably work better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;for coding&lt;/li&gt;
&lt;li&gt;for studying&lt;/li&gt;
&lt;li&gt;for reading&lt;/li&gt;
&lt;li&gt;for relaxing before sleep&lt;/li&gt;
&lt;li&gt;for starting work in the morning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That kind of framing makes the product feel immediately usable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hypothesis 3: Story-driven distribution is better than paid acquisition at this stage
&lt;/h3&gt;

&lt;p&gt;At this point, I don’t think paid ads are the right first move.&lt;/p&gt;

&lt;p&gt;What seems more promising is building context first.&lt;/p&gt;

&lt;p&gt;That likely means channels like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use-case posts on X&lt;/li&gt;
&lt;li&gt;developer and validation stories on Zenn / Qiita&lt;/li&gt;
&lt;li&gt;improving App Store screenshots and description copy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In particular, I think the story of &lt;strong&gt;why I built it and how I’m figuring out how to sell it&lt;/strong&gt; is more interesting than simple promotion.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Initial Actions I Decided to Take
&lt;/h2&gt;

&lt;p&gt;Once the hypotheses were clearer, the next actions also became clearer.&lt;/p&gt;

&lt;p&gt;The goal is not to do everything at once.&lt;br&gt;
It’s to test small moves and observe what resonates.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Shift the positioning from “ambient sound app” to “focus-switching app”
&lt;/h3&gt;

&lt;p&gt;This affects everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product description&lt;/li&gt;
&lt;li&gt;App Store copy&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;social posts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The wording should consistently emphasize entering focus faster, not just listening to sounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Create use-case-based posts on X
&lt;/h3&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Rain + brown noise is my coding preset”&lt;/li&gt;
&lt;li&gt;“I use one tap to switch into work mode every morning”&lt;/li&gt;
&lt;li&gt;“Different presets for focus, relaxation, and sleep”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are stronger than generic feature announcements.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Turn the development and GTM thinking into content
&lt;/h3&gt;

&lt;p&gt;Not just “I built an app,” but:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“I used hypothesis validation to figure out how to bring it to market.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That kind of content does three things at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it promotes Focusnest,&lt;/li&gt;
&lt;li&gt;it shares a practical process other builders can learn from,&lt;/li&gt;
&lt;li&gt;and it strengthens my own identity as someone who builds products through validation, not just intuition.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Delay Product Hunt for now
&lt;/h3&gt;

&lt;p&gt;Product Hunt is attractive, but I don’t think it’s the right first move yet.&lt;/p&gt;

&lt;p&gt;Before trying to launch broadly, I want stronger clarity on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who this is really for,&lt;/li&gt;
&lt;li&gt;what messaging works,&lt;/li&gt;
&lt;li&gt;and which channels give early traction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this case, building context first seems more valuable than chasing a big launch too early.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;The biggest lesson was simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building and selling are different jobs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A good product is not enough.&lt;/p&gt;

&lt;p&gt;The same product can feel irrelevant or compelling depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how you frame it,&lt;/li&gt;
&lt;li&gt;who you frame it for,&lt;/li&gt;
&lt;li&gt;and what context you place it in.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first, I saw Focusnest as an ambient sound app.&lt;/p&gt;

&lt;p&gt;But once I organized the go-to-market hypotheses, I realized its real value was reducing the cost of entering a focused state.&lt;/p&gt;

&lt;p&gt;That clarity alone made the next moves much easier.&lt;/p&gt;

&lt;p&gt;It also made something else clearer:&lt;/p&gt;

&lt;p&gt;not just what I should do next, but what I &lt;strong&gt;shouldn’t&lt;/strong&gt; do yet.&lt;/p&gt;

&lt;p&gt;For example, instead of rushing into ads or Product Hunt, I now think it makes more sense to first build context, messaging, and initial traction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A lot of indie builders hit the same wall:&lt;/p&gt;

&lt;p&gt;They spend all their energy building, then get stuck at “Now how do I sell this?”&lt;/p&gt;

&lt;p&gt;When that happens, it may be faster to stop adding tactics and start organizing the problem as hypotheses.&lt;/p&gt;

&lt;p&gt;Questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is this really for?&lt;/li&gt;
&lt;li&gt;What problem does it actually solve?&lt;/li&gt;
&lt;li&gt;What is meaningfully different from substitutes?&lt;/li&gt;
&lt;li&gt;In what context does the value become obvious?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once those become clearer, both your messaging and your product page tend to improve.&lt;/p&gt;

&lt;p&gt;If you’ve built something but still don’t know how to bring it to market, it may help to treat go-to-market as a validation problem too.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I used &lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt; to organize these hypotheses.&lt;br&gt;&lt;br&gt;
It worked not only for product ideas, but also for shaping go-to-market thinking around an actual indie product.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>startup</category>
      <category>marketing</category>
      <category>indiehackers</category>
    </item>
    <item>
      <title>A Feature I Never Planned Emerged From Persona Interviews — Here's Exactly How</title>
      <dc:creator>toshipon</dc:creator>
      <pubDate>Thu, 02 Apr 2026 16:03:21 +0000</pubDate>
      <link>https://dev.to/toshipon/a-feature-i-never-planned-emerged-from-persona-interviews-heres-exactly-how-4pdk</link>
      <guid>https://dev.to/toshipon/a-feature-i-never-planned-emerged-from-persona-interviews-heres-exactly-how-4pdk</guid>
      <description>&lt;h2&gt;
  
  
  The Feature That Wasn't in the Design Doc
&lt;/h2&gt;

&lt;p&gt;When I started building &lt;strong&gt;&lt;a href="https://apps.apple.com/us/app/bjj-techniques/id6758881037" rel="noopener noreferrer"&gt;BJJ Techniques&lt;/a&gt;&lt;/strong&gt; — a BJJ (Brazilian Jiu-Jitsu) technique learning app for iOS — I had a clear vision: a searchable database of techniques, organized by position and category, with step-by-step instructions and YouTube videos.&lt;/p&gt;

&lt;p&gt;The "Technique Tree" — a visual map showing how techniques connect and flow into each other — was not in that design doc. Not even close.&lt;/p&gt;

&lt;p&gt;It emerged entirely from persona interviews.&lt;/p&gt;

&lt;p&gt;Here's exactly how that happened, including the specific research I used to make those interviews actually work.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the App
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://apps.apple.com/us/app/bjj-techniques/id6758881037" rel="noopener noreferrer"&gt;BJJ Techniques&lt;/a&gt;&lt;/strong&gt; is an iOS app for learning Brazilian Jiu-Jitsu techniques systematically (available on the App Store).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Technique Library&lt;/strong&gt; — Search techniques by category: submissions, sweeps, guard passes, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technique Detail Pages&lt;/strong&gt; — Overview, step-by-step breakdowns, YouTube videos, and related techniques in one place&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technique Tree&lt;/strong&gt; — Visualize how techniques connect from any starting position&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning Paths&lt;/strong&gt; — Structured weekly curriculum for white belts through early blue belts&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Tool I Used: KaizenLab
&lt;/h2&gt;

&lt;p&gt;Before getting into the personas, a quick note on the workflow.&lt;/p&gt;

&lt;p&gt;I run all my hypothesis validation in &lt;strong&gt;&lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt;&lt;/strong&gt; — a web app I built myself to operationalize the lean hypothesis testing methodology from Toshiaki Ichitani's book &lt;em&gt;&lt;a href="https://www.amazon.co.jp/dp/4802511191" rel="noopener noreferrer"&gt;Build the Right Thing Right&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The core idea: before writing code, define your hypotheses explicitly, design experiments to test them, and record what you learn — in a structured way that builds up over time. KaizenLab handles hypothesis canvases, persona management, AI pseudo-interview simulation, and validation cycle tracking, all in the browser. It also has MCP (Model Context Protocol) integration so AI agents can operate it directly.&lt;/p&gt;

&lt;p&gt;Everything in this article — the personas, the interviews, the feature decision — was run through KaizenLab. I'm writing this both as a case study in persona-driven validation &lt;em&gt;and&lt;/em&gt; as a real-world test of the tool I'm building.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Personas, Three Real Frustrations
&lt;/h2&gt;

&lt;p&gt;Most indie hackers I know create personas like this: "User A, 25-35, tech-savvy, wants X." Useful, but shallow. The responses you get from shallow personas are shallow too.&lt;/p&gt;

&lt;p&gt;I created three personas with significantly more depth:&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tanaka Shota, 28, IT engineer, white belt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Frustrations:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Forgets technique names and steps right after learning them&lt;/li&gt;
&lt;li&gt;YouTube search gives fragmented, disconnected information&lt;/li&gt;
&lt;li&gt;Feels bad asking senior students the same questions repeatedly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Goals:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learn BJJ techniques systematically&lt;/li&gt;
&lt;li&gt;Improve the quality of twice-weekly training sessions&lt;/li&gt;
&lt;li&gt;Reach competition level&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Sato Misaki, 34, marketing manager (reduced hours), female white belt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Frustrations:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trains only once a week, progress feels too slow&lt;/li&gt;
&lt;li&gt;Most tutorial videos feature male practitioners with strength-based approaches — unclear if techniques work for her body type&lt;/li&gt;
&lt;li&gt;Doesn't know what to prioritize learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Goals:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximize limited training time&lt;/li&gt;
&lt;li&gt;Find techniques that work for smaller practitioners&lt;/li&gt;
&lt;li&gt;Understand what's most important to learn &lt;em&gt;right now&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Suzuki Daisuke, 42, sales manager, blue belt (also coaches beginners)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Frustrations:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feels like fundamentals are shaky despite his rank&lt;/li&gt;
&lt;li&gt;Gets confused by techniques named in English, Portuguese, and Japanese&lt;/li&gt;
&lt;li&gt;Can't rely on physical dominance — technique precision is critical at his age&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Goals:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fill gaps in fundamental technique knowledge&lt;/li&gt;
&lt;li&gt;Organize options by position&lt;/li&gt;
&lt;li&gt;Build a personal game plan&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;These aren't marketing archetypes. Each one has specific contradictions, specific constraints, and specific contexts that change what they actually want from an app.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;KaizenLab's persona management view — three personas organized as cards, each with goals, frustrations, and psychological state dimensions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao3s8wlwn48zl6fon90i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao3s8wlwn48zl6fon90i.png" alt=" " width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Research That Made Interviews Work
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;I use KaizenLab's AI pseudo-interview feature to simulate conversations with personas before talking to real users. The point is to stress-test your questions and spot weak assumptions early — before wasting anyone's time.&lt;/p&gt;

&lt;p&gt;But I found that standard AI personas give obvious, shallow answers. "Yes, that feature would be useful." "I'd like better search." These are useless.&lt;/p&gt;

&lt;p&gt;What changed the quality dramatically was applying principles from the &lt;a href="https://arxiv.org/abs/2502.10558" rel="noopener noreferrer"&gt;HumanLM research paper&lt;/a&gt; from Stanford (2025), which studied how to make AI-simulated participants produce more realistic, human-like responses.&lt;/p&gt;

&lt;p&gt;The key insight from HumanLM: &lt;strong&gt;surface attributes aren't enough. You need to model psychological state dimensions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;KaizenLab's persona editor has dedicated fields for all three:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Stance&lt;/strong&gt; — What's their position on specific topics?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Suzuki's stance on new tools: "I've been doing BJJ for years.
I'll try a new app if someone I respect recommends it,
but I won't pay for something I haven't validated myself."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Emotional tendencies&lt;/strong&gt; — How do they respond emotionally?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sato's tendencies: "Gets discouraged when progress feels
invisible. Motivated by visible milestones. Anxious about
being the only woman who doesn't understand something."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Communication style&lt;/strong&gt; — How do they express needs?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tanaka's style: "Direct, specific, data-oriented. Won't say
'I want a feature' but will say 'I tried to look up
arm triangle yesterday and spent 20 minutes cross-referencing
three different YouTube videos.'"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you add these dimensions to a persona, the AI stops giving generic answers. Sato doesn't say "I want a learning path." She says "I have 45 minutes before I need to pick up my kid, and I need to know exactly which two techniques to drill today." That's a different design requirement entirely.&lt;/p&gt;
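&lt;p&gt;In code terms, the three dimensions amount to extra fields on the persona record that get interpolated into the interview prompt. The shape below is my own sketch, not KaizenLab's actual schema; the field names and Sato's stance text are illustrative.&lt;/p&gt;

```typescript
// Sketch of a persona record with HumanLM-style psychological state dimensions.
// Field names are illustrative, not KaizenLab's actual schema.

interface Persona {
  name: string;
  age: number;
  role: string; // surface attributes
  frustrations: string[];
  goals: string[];
  stance: string; // position on the topic being probed (illustrative text below)
  emotionalTendencies: string; // how they react, what motivates or discourages them
  communicationStyle: string; // how they phrase needs (anecdotes vs. feature asks)
}

const sato: Persona = {
  name: "Sato Misaki",
  age: 34,
  role: "marketing manager, white belt",
  frustrations: ["trains once a week", "tutorials assume strength-based play"],
  goals: ["maximize limited training time"],
  stance: "skeptical of apps that add homework instead of saving time",
  emotionalTendencies: "discouraged by invisible progress; motivated by milestones",
  communicationStyle: "describes concrete situations, not feature requests",
};

// Interpolating all three state dimensions is what pushes simulated answers
// past generic "that would be useful" replies.
function interviewPrompt(p: Persona, question: string): string {
  return [
    `You are ${p.name}, ${p.age}, ${p.role}.`,
    `Stance: ${p.stance}`,
    `Emotional tendencies: ${p.emotionalTendencies}`,
    `Communication style: ${p.communicationStyle}`,
    `Answer in character: ${question}`,
  ].join("\n");
}

console.log(interviewPrompt(sato, "How do you decide what to drill today?"));
```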




&lt;h2&gt;
  
  
  What the Interviews Actually Found
&lt;/h2&gt;

&lt;p&gt;I ran AI pseudo-interviews with all three personas, asking them to describe how they currently learn and track BJJ techniques.&lt;/p&gt;

&lt;p&gt;The surprising finding: &lt;strong&gt;all three independently requested some form of technique tree or learning path.&lt;/strong&gt; A feature I had not planned and had no intention of building.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;KaizenLab's interview results view — insights extracted from AI pseudo-interviews across all three personas, automatically organized by theme.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0dgzwnsx8myeb6b1xgn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0dgzwnsx8myeb6b1xgn.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But they wanted completely different things.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tanaka (white belt, IT background)
&lt;/h3&gt;

&lt;p&gt;He wanted an &lt;strong&gt;RPG-style skill tree&lt;/strong&gt; — a branching diagram starting from positions, with unlockable nodes. Closed guard → armbar OR sweep → mount → choke.&lt;/p&gt;

&lt;p&gt;"Feeling of progression. Like I know where I am and what unlocks next."&lt;/p&gt;

&lt;p&gt;This is a learned pattern from gaming and online learning platforms. He wanted the same dopamine loop applied to martial arts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sato (female white belt, time-constrained)
&lt;/h3&gt;

&lt;p&gt;She wanted a &lt;strong&gt;learning path&lt;/strong&gt; integrated into the technique database. Not a map of everything — a filtered view of only what's relevant for her level right now.&lt;/p&gt;

&lt;p&gt;"Show me the 5 techniques that matter most for where I am. Lock everything else. I don't want to see what I'm not ready for."&lt;/p&gt;

&lt;p&gt;This is a completely different mental model from Tanaka's. He wants the full map with fog of war. She wants a guided tour.&lt;/p&gt;

&lt;h3&gt;
  
  
  Suzuki (blue belt, coaching role)
&lt;/h3&gt;

&lt;p&gt;He wanted a &lt;strong&gt;custom game plan builder&lt;/strong&gt; — select from the full technique library to build a personal map of &lt;em&gt;his&lt;/em&gt; game. Multiple plans: one for Gi, one for No-Gi, one for competition.&lt;/p&gt;

&lt;p&gt;"When I'm coaching a white belt, I want to show them &lt;em&gt;my&lt;/em&gt; game plan and say 'start here.' Not a generic beginner curriculum."&lt;/p&gt;

&lt;p&gt;Different again. He already knows the techniques. He wants a tool for organizing and communicating his approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Multiple Personas Converge = High Confidence
&lt;/h2&gt;

&lt;p&gt;Here's the validation principle that made me confident enough to build this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When multiple personas independently surface the same underlying need — even if they describe it differently — that's a strong signal.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tanaka, Sato, and Suzuki each came from different places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different experience levels&lt;/li&gt;
&lt;li&gt;Different learning constraints&lt;/li&gt;
&lt;li&gt;Different use cases (self-study vs. coaching)&lt;/li&gt;
&lt;li&gt;Different mental models (gaming vs. workflow vs. curriculum)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But all three had the same core problem: &lt;strong&gt;no way to see how techniques relate to each other and where they stand within that structure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If only Tanaka had mentioned it, I might have dismissed it as one person's gaming preference. If only Suzuki mentioned it, I might have assumed it was a niche need for advanced practitioners.&lt;/p&gt;

&lt;p&gt;Three independent hits, three different angles, same underlying gap.&lt;/p&gt;

&lt;p&gt;That's when I decided to build it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Feature That Emerged
&lt;/h2&gt;

&lt;p&gt;The Technique Tree I ended up designing rolls the three personas' needs into a phased design:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 (free, all users):&lt;/strong&gt; Position-based technique map&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tap a position → see branching submissions, sweeps, passes, escapes&lt;/li&gt;
&lt;li&gt;Synced with learning progress (mastered = color, not yet = gray)&lt;/li&gt;
&lt;li&gt;Uses existing &lt;code&gt;relatedTechniqueIds&lt;/code&gt; and &lt;code&gt;counterTechniqueIds&lt;/code&gt; data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 (premium):&lt;/strong&gt; Custom game plan builder&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select from the full technique library to build your personal map&lt;/li&gt;
&lt;li&gt;Save multiple plans (Gi / No-Gi / competition)&lt;/li&gt;
&lt;li&gt;Share with training partners&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The existing data structure already supported this. The connections between techniques were defined. I just hadn't built a UI that surfaced them visually.&lt;/p&gt;
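&lt;p&gt;As a rough sketch of what Phase 1 relies on (only &lt;code&gt;relatedTechniqueIds&lt;/code&gt; and &lt;code&gt;counterTechniqueIds&lt;/code&gt; come from the actual model; the rest of the &lt;code&gt;Technique&lt;/code&gt; shape and the function name here are illustrative), the position map is just a one-hop traversal of those links:&lt;/p&gt;

```typescript
// Simplified sketch of the Phase 1 map: group techniques by position and
// follow relatedTechniqueIds one hop out. Everything except
// relatedTechniqueIds/counterTechniqueIds is an assumed shape.
type Technique = {
  id: string;
  name: string;
  position: string;            // e.g. "closed-guard"
  mastered: boolean;           // drives color vs. gray in the UI
  relatedTechniqueIds: string[];
  counterTechniqueIds: string[];
};

function branchesFrom(all: Technique[], positionId: string) {
  // Index once so branch lookups are O(1).
  const byId: Record<string, Technique> = {};
  for (const t of all) byId[t.id] = t;

  return all
    .filter(t => t.position === positionId)
    .map(t => ({
      technique: t.name,
      mastered: t.mastered,
      branches: t.relatedTechniqueIds
        .map(id => byId[id]?.name)
        .filter((n): n is string => n !== undefined),
    }));
}

// Tiny demo dataset: closed guard with one related branch.
const demo: Technique[] = [
  { id: "cg-armbar", name: "Closed Guard Armbar", position: "closed-guard",
    mastered: true, relatedTechniqueIds: ["scissor"], counterTechniqueIds: [] },
  { id: "scissor", name: "Scissor Sweep", position: "closed-guard",
    mastered: false, relatedTechniqueIds: [], counterTechniqueIds: [] },
];
console.log(branchesFrom(demo, "closed-guard"));
```

&lt;p&gt;The point is that no new data entry was required: the tree view is a projection of links that already existed.&lt;/p&gt;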

&lt;p&gt;&lt;em&gt;The technique tree in action — starting from closed guard, filtered by arm locks. Mastered techniques appear in color; unlearned ones in gray.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Have Built Without This Process
&lt;/h2&gt;

&lt;p&gt;A searchable database with filters.&lt;/p&gt;

&lt;p&gt;Which is fine. But it's what every BJJ app already has. The techniques would have been well-organized and the search would have been solid. Users would have used it, found a specific technique, watched the linked YouTube video, and moved on.&lt;/p&gt;

&lt;p&gt;The Technique Tree creates something different: a reason to explore the app as a &lt;em&gt;system&lt;/em&gt; rather than a reference lookup. It's the feature most likely to drive retention — coming back to the app not just when you forget a technique name, but to understand how your game is developing.&lt;/p&gt;

&lt;p&gt;I didn't think of this myself. Three personas, systematically interviewed with enough psychological depth to produce real signals, thought of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Process in Practice
&lt;/h2&gt;

&lt;p&gt;If you want to run this for your own product:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Build personas with psychological state dimensions, not just demographics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each persona, define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Their stance on specific topics relevant to your product&lt;/li&gt;
&lt;li&gt;Their emotional tendencies (what motivates them, what discourages them)&lt;/li&gt;
&lt;li&gt;Their communication style (how they express needs — directly? through frustration? through workarounds?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Run AI pseudo-interviews before real ones&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use the psychological state dimensions to prompt the AI to respond &lt;em&gt;as&lt;/em&gt; the persona, not as a helpful assistant. If the answers feel generic, your persona lacks depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Listen for convergence across personas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One persona mentioning a need = interesting. Two personas = worth investigating. Three personas from different segments = build it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Pay attention to &lt;em&gt;how&lt;/em&gt; they describe the need, not just &lt;em&gt;what&lt;/em&gt; they want&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tanaka, Sato, and Suzuki all asked for a "technique tree," but their descriptions revealed three different product requirements. The surface request was the same. The underlying need was the same. But the specific solution for each was different.&lt;/p&gt;

&lt;p&gt;That distinction is what turns user research into good product design.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you had a feature emerge from user research that you never would have designed yourself? Or do you usually build from your own intuition? I'd be curious to hear in the comments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I do this kind of persona-driven validation in &lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt; — the tool I built specifically for managing hypothesis validation cycles.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apps.apple.com/us/app/bjj-techniques/id6758881037" rel="noopener noreferrer"&gt;BJJ Techniques — App Store&lt;/a&gt; — The app built using this validation process&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2502.10558" rel="noopener noreferrer"&gt;HumanLM: Large Language Models as Simulated Participants — Stanford, 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/toshipon/why-i-wasted-6-months-building-the-wrong-product-and-what-i-do-differently-now-1iml"&gt;Why I Wasted 6 Months Building the Wrong Product&lt;/a&gt; — Series Part 1&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/toshipon/i-spent-3-months-building-a-saas-then-ai-did-the-same-thing-in-one-prompt-3kbf"&gt;I Spent 3 Months Building a SaaS — Then AI Did the Same Thing in One Prompt&lt;/a&gt; — Series Part 2&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt; — Hypothesis validation cycle management&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>indiehacking</category>
      <category>ai</category>
      <category>startup</category>
      <category>product</category>
    </item>
    <item>
      <title>Why an SRE Engineer Built a Product Validation Tool — Bringing Observability Thinking to Product Development</title>
      <dc:creator>toshipon</dc:creator>
      <pubDate>Sun, 29 Mar 2026 01:30:48 +0000</pubDate>
      <link>https://dev.to/toshipon/why-an-sre-engineer-built-a-product-validation-tool-bringing-observability-thinking-to-product-1iml</link>
      <guid>https://dev.to/toshipon/why-an-sre-engineer-built-a-product-validation-tool-bringing-observability-thinking-to-product-1iml</guid>
      <description>&lt;h2&gt;
  
  
  "Why Would an SRE Build a Product Tool?"
&lt;/h2&gt;

&lt;p&gt;I get asked this a lot.&lt;/p&gt;

&lt;p&gt;By day, I'm an SRE engineer at a fintech company. Terraform, AWS, Azure, Kubernetes — my job is keeping systems reliable. I think in dashboards, alerts, and incident response.&lt;/p&gt;

&lt;p&gt;But when I started building side projects, something felt deeply wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure has observability. Product decisions don't.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We use Datadog and Grafana to visualize system state as a matter of course. But "why did we build this feature?" and "was that decision correct?" — there's no dashboard for that. No alerts. No traces.&lt;/p&gt;

&lt;p&gt;That gap is what led me to build a hypothesis validation tool. And it turns out, SRE thinking translates surprisingly well to product development.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Observability Gap in Product Development
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Three Pillars — Reframed
&lt;/h3&gt;

&lt;p&gt;In SRE, we think about observability through three pillars:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pillar&lt;/th&gt;
&lt;th&gt;In Infrastructure&lt;/th&gt;
&lt;th&gt;In Product Development&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CPU, memory, response time&lt;/td&gt;
&lt;td&gt;KPIs, usage rates, conversion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Access logs, error logs&lt;/td&gt;
&lt;td&gt;Decision logs, validation results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Request processing paths&lt;/td&gt;
&lt;td&gt;Hypothesis → Experiment → Learning → Next Action&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In infrastructure, we never accept "we don't know what's happening" as a state. We set up alerts, build dashboards, write runbooks for incident response.&lt;/p&gt;

&lt;p&gt;But in product development? "Why we built this feature" is lost within six months. Code preserves &lt;strong&gt;what&lt;/strong&gt; was built, but never &lt;strong&gt;why&lt;/strong&gt; it was built.&lt;/p&gt;

&lt;h3&gt;
  
  
  ADRs for Architecture, But What About Product Decisions?
&lt;/h3&gt;

&lt;p&gt;If you're an engineer, you might use ADRs (Architecture Decision Records) to document technical choices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# ADR-001: Use Supabase for Database&lt;/span&gt;

&lt;span class="gu"&gt;## Status: Accepted&lt;/span&gt;

&lt;span class="gu"&gt;## Context&lt;/span&gt;
Minimize backend costs for a side project

&lt;span class="gu"&gt;## Decision&lt;/span&gt;
Adopt Supabase (PostgreSQL + Auth + RLS)

&lt;span class="gu"&gt;## Rationale&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; More SQL flexibility than Firebase
&lt;span class="p"&gt;-&lt;/span&gt; RLS handles security at the database layer
&lt;span class="p"&gt;-&lt;/span&gt; Free tier is sufficient for indie projects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ADRs capture &lt;em&gt;technical&lt;/em&gt; decisions. But they don't capture "the evidence that convinced us this feature was worth building in the first place."&lt;/p&gt;

&lt;p&gt;That's the gap. And it's exactly the kind of gap that makes an SRE uncomfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  3 SRE Concepts That Changed How I Build Products
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. SLOs → Validation Success Criteria
&lt;/h3&gt;

&lt;p&gt;In SRE, you define SLOs (Service Level Objectives) &lt;em&gt;before&lt;/em&gt; you set up monitoring. "99th percentile response time &amp;lt; 200ms" — the quantitative bar comes first.&lt;/p&gt;

&lt;p&gt;Applied to product development, this means &lt;strong&gt;defining success criteria before running any experiment.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hypothesis: "Users struggle with tracking hypothesis validation"
Success Criteria: 3 out of 5 interviewees recognize this as a problem
Method: Semi-structured interviews
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sounds obvious, but most indie hackers (myself included, until recently) skip it. We run experiments and then decide after the fact whether the results were "good enough." That's like deploying a service without defining SLOs and then arguing about whether the error rate is acceptable.&lt;/p&gt;

&lt;p&gt;Define the bar first. Then measure against it.&lt;/p&gt;
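&lt;p&gt;That discipline fits in a few lines. This is only a sketch of the idea (the &lt;code&gt;SuccessCriterion&lt;/code&gt; shape is mine, not a KaizenLab API): commit the threshold to data before the interviews happen, and the post-hoc "good enough" argument becomes impossible:&lt;/p&gt;

```typescript
// Commit the bar before the experiment; evaluate mechanically after.
// The SuccessCriterion shape is illustrative, not any tool's API.
type SuccessCriterion = {
  description: string;
  threshold: number; // minimum number of hits to count as validated
  sample: number;    // number of participants
};

const criterion: SuccessCriterion = {
  description: "Interviewees recognize the problem unprompted",
  threshold: 3,
  sample: 5,
};

function evaluate(c: SuccessCriterion, hits: number): "validated" | "invalidated" {
  return hits >= c.threshold ? "validated" : "invalidated";
}
```

&lt;p&gt;Writing &lt;code&gt;criterion&lt;/code&gt; down before the experiment is the whole trick; &lt;code&gt;evaluate&lt;/code&gt; is deliberately too dumb to rationalize.&lt;/p&gt;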

&lt;h3&gt;
  
  
  2. Incident Response → Pivot Decisions
&lt;/h3&gt;

&lt;p&gt;SRE incident response has clear escalation rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sev 1:&lt;/strong&gt; Assemble the response team immediately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sev 2:&lt;/strong&gt; Handle during business hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sev 3:&lt;/strong&gt; Address in the next sprint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I applied the same structure to product validation results:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Validation Result&lt;/th&gt;
&lt;th&gt;Response&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Validated (high confidence)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Continue&lt;/strong&gt; — move to implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validated (low confidence)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Investigate&lt;/strong&gt; — plan additional experiments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Invalidated&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Pivot or kill&lt;/strong&gt; — change direction or stop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: &lt;strong&gt;don't make pivot decisions emotionally.&lt;/strong&gt; "I spent weeks on this hypothesis, so it must be right" is the product equivalent of ignoring alerts because you don't want to get paged. SREs respond to alerts based on rules, not feelings. Product decisions should work the same way.&lt;/p&gt;
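&lt;p&gt;One way to hold yourself to that is to write the table down as code. A minimal sketch, assuming a two-way confidence split (the &lt;code&gt;ValidationResult&lt;/code&gt; shape is my own framing, not a tool's API):&lt;/p&gt;

```typescript
// The escalation table as a pure function: the decision depends only on
// the result, with no room for "but I spent weeks on this."
type ValidationResult =
  | { outcome: "validated"; confidence: "high" | "low" }
  | { outcome: "invalidated" };

function nextAction(r: ValidationResult): "continue" | "investigate" | "pivot-or-kill" {
  if (r.outcome === "invalidated") return "pivot-or-kill";
  return r.confidence === "high" ? "continue" : "investigate";
}
```

&lt;p&gt;Like an alerting rule, it runs the same way at week 1 and week 12 of a project, which is exactly when your own judgment doesn't.&lt;/p&gt;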

&lt;p&gt;I wrote in &lt;a href="https://dev.to/toshipon/i-spent-3-months-building-a-saas-then-ai-did-the-same-thing-in-one-prompt-3kbf"&gt;my last post&lt;/a&gt; about spending 3 months building a SaaS that AI made obsolete. If I'd had these rules, I would have killed it in week 3 when the early signals were already there.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Runbooks → Validation Playbooks
&lt;/h3&gt;

&lt;p&gt;SREs document incident response procedures as runbooks. When something breaks at 3 AM, you don't want to figure out the steps from scratch.&lt;/p&gt;

&lt;p&gt;Same principle for hypothesis validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Problem Validation Playbook&lt;/span&gt;

&lt;span class="gu"&gt;### Prep&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Review hypothesis canvas — identify core assumptions
&lt;span class="p"&gt;2.&lt;/span&gt; Define target persona
&lt;span class="p"&gt;3.&lt;/span&gt; Set success criteria (e.g., 3/5 recognize the problem)

&lt;span class="gu"&gt;### Execute&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Pre-test interview questions with AI simulation
&lt;span class="p"&gt;2.&lt;/span&gt; Run 5 semi-structured interviews
&lt;span class="p"&gt;3.&lt;/span&gt; Record key findings and direct quotes

&lt;span class="gu"&gt;### Decide&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Compare results against success criteria
&lt;span class="p"&gt;2.&lt;/span&gt; Record learnings
&lt;span class="p"&gt;3.&lt;/span&gt; Make decision: Continue / Pivot / Kill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a runbook, you don't panic during an incident. With a validation playbook, you don't freeze when it's time to decide whether your product idea is worth pursuing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Career Angle: Why This Combination Is Rare
&lt;/h2&gt;

&lt;p&gt;SRE engineers who think about product validation are uncommon. Product managers who think in terms of observability are also uncommon. The intersection is almost empty.&lt;/p&gt;

&lt;p&gt;If you're an engineer considering side projects or a career shift toward product:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your reliability thinking is an asset&lt;/strong&gt; — you already know how to define measurable targets and respond to data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your operational discipline transfers&lt;/strong&gt; — runbooks, escalation rules, and blameless post-mortems all have product equivalents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your bias toward measurement is exactly what product development needs&lt;/strong&gt; — too many product decisions are made on vibes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gap isn't your skills. The gap is recognizing that the mental models you already use at work apply directly to building products.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Do Now
&lt;/h2&gt;

&lt;p&gt;I built these SRE-inspired workflows into my own validation process, and eventually into a tool called &lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt; to keep myself honest. But the tool matters less than the mindset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If infrastructure deserves observability, so do your product decisions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next time you're about to start a side project, try this: before writing any code, write a validation runbook. Define your SLOs — I mean, success criteria. Set up your "alerts" — the signals that tell you to pivot or kill.&lt;/p&gt;

&lt;p&gt;You already know how to do this. You just haven't applied it to products yet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Are you an engineer who's applied technical thinking to product development? Or a PM who's borrowed concepts from SRE? I'd love to hear how these worlds collide in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>webdev</category>
      <category>ai</category>
      <category>startup</category>
    </item>
    <item>
      <title>I Spent 3 Months Building a SaaS — Then AI Did the Same Thing in One Prompt</title>
      <dc:creator>toshipon</dc:creator>
      <pubDate>Thu, 26 Mar 2026 12:49:48 +0000</pubDate>
      <link>https://dev.to/toshipon/i-spent-3-months-building-a-saas-then-ai-did-the-same-thing-in-one-prompt-3kbf</link>
      <guid>https://dev.to/toshipon/i-spent-3-months-building-a-saas-then-ai-did-the-same-thing-in-one-prompt-3kbf</guid>
      <description>&lt;h2&gt;
  
  
  The Moment It Hit Me
&lt;/h2&gt;

&lt;p&gt;I'd been heads-down for three months building a real estate investment simulator. It was a proper SaaS — loan calculators, renovation cost modeling, rental income projections, cash flow scenarios for old Japanese houses (kominka). I had Stripe integration, a Pro plan at ¥2,980/month, the works.&lt;/p&gt;

&lt;p&gt;Then one evening, I watched someone type "simulate the rental yield for a 6-room property in Kamakura, purchase price ¥25M, renovation ¥8M, rent ¥65,000/room" into Claude — and get back a detailed cash flow breakdown in about 10 seconds.&lt;/p&gt;

&lt;p&gt;Three months of my work. One prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built and Why
&lt;/h2&gt;

&lt;p&gt;I'm an SRE engineer by day, and I'd gotten into real estate investing on the side — specifically old Japanese houses (kominka) that you can convert into rental apartments. The math is complex: you're juggling purchase price, renovation costs per room, loan terms, vacancy rates, property tax, management fees, and a dozen other variables.&lt;/p&gt;

&lt;p&gt;I kept building the same spreadsheet over and over for each property I evaluated. So I thought: why not turn this into a product? Other investors must have the same pain.&lt;/p&gt;

&lt;p&gt;I spent three months building it. Feature after feature — multiple property comparison, scenario modeling, loan amortization charts, break-even analysis. I even built dark mode. (Every indie hacker's favorite procrastination feature.)&lt;/p&gt;

&lt;p&gt;Here's where I made my first mistake: &lt;strong&gt;I kept adding features without talking to users.&lt;/strong&gt; The UI got complex. Really complex. And I had no idea which features actually mattered because I'd never validated with anyone except myself. When you're the developer AND the only user, everything feels essential.&lt;/p&gt;

&lt;p&gt;Then Stripe rejected my payment integration. That stung, but looking back, it was the universe trying to tell me something.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened Next
&lt;/h2&gt;

&lt;p&gt;Around the same time, AI models got seriously good at financial analysis. Claude, ChatGPT — they could all handle multi-variable real estate calculations conversationally. You describe a property, ask your questions, and get answers. No UI to learn, no subscription to pay for.&lt;/p&gt;

&lt;p&gt;The "SaaS is Dead" narrative started picking up steam in indie hacker circles. And while I think that take is overblown for most categories, for &lt;strong&gt;calculation-heavy tools with no network effects or proprietary data?&lt;/strong&gt; It hit close to home.&lt;/p&gt;

&lt;p&gt;My simulator was essentially a structured UI for math that a language model could do on the fly. The only "advantage" was a nice interface — but even that was debatable, since my UI had gotten too complex for its own good.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question I Should Have Asked
&lt;/h2&gt;

&lt;p&gt;Before writing a single line of code, I should have asked:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Can AI do this well enough that a dedicated service adds no unique value?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I never even considered it. In 2025, when I started building, AI-as-calculator wasn't as obvious. But the trajectory was clear if I'd been paying attention. And more importantly, there's a broader version of this question that every indie hacker should ask:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What makes this worth being a &lt;em&gt;product&lt;/em&gt; instead of a prompt, a script, or a spreadsheet?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is "a nicer UI" — that's not enough anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Replacement Test
&lt;/h2&gt;

&lt;p&gt;Here's what I do now before building anything. It takes about an hour.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Try to Replace It with AI (15 minutes)
&lt;/h3&gt;

&lt;p&gt;Open Claude, ChatGPT, or whatever model you prefer. Describe your product's core use case as a prompt. Be specific.&lt;/p&gt;

&lt;p&gt;If the AI produces 80%+ of the value your product would deliver — stop. Your product needs a fundamentally different value proposition, or it shouldn't exist as a product.&lt;/p&gt;

&lt;p&gt;For my simulator, the AI nailed the math. It couldn't save scenarios across sessions or generate comparison charts, but honestly? Most users would be fine with copy-pasting into a spreadsheet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Identify Your "Moat Against AI" (15 minutes)
&lt;/h3&gt;

&lt;p&gt;Ask yourself what your product does that AI &lt;em&gt;can't&lt;/em&gt; replicate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Proprietary data&lt;/strong&gt; — Do you have data the model doesn't? (e.g., real-time pricing, user-generated datasets)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network effects&lt;/strong&gt; — Does it get better with more users? (e.g., marketplace, community)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow integration&lt;/strong&gt; — Does it plug into a system where copy-pasting AI output would be painful? (e.g., CI/CD, CRM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance/trust&lt;/strong&gt; — Does the domain require auditability, consistency, or certification that AI can't guarantee? (e.g., medical, legal, financial reporting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration&lt;/strong&gt; — Do multiple people need to work on it together in real-time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can't check at least one of these — you're building a nice wrapper around something AI gives away for free.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Ask 5 People (30 minutes)
&lt;/h3&gt;

&lt;p&gt;Not "would you use this?" — that question is useless. Everyone says yes.&lt;/p&gt;

&lt;p&gt;Instead, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"How do you handle [this problem] today?"&lt;/li&gt;
&lt;li&gt;"Have you tried asking ChatGPT/Claude to do this?"&lt;/li&gt;
&lt;li&gt;"What was missing from the AI's answer?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If they haven't tried AI for this yet, suggest they try it right then. Watch their reaction. If they say "oh wow, this is good enough" — you have your answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Write Down Your Hypothesis Before Building
&lt;/h3&gt;

&lt;p&gt;Write one sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"People will pay for [my product] instead of using AI because [specific reason]."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you can't fill in that blank convincingly — don't build it yet. Validate the "[specific reason]" part first.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Do Now
&lt;/h2&gt;

&lt;p&gt;That experience — three months of building something that AI made redundant — changed how I approach every new project. I now validate hypotheses before writing code. I decompose ideas into testable assumptions and kill the ones that don't hold up.&lt;/p&gt;

&lt;p&gt;I actually built a tool to manage this process for myself: &lt;a href="https://kaizen-lab.buildgeeks.dev" rel="noopener noreferrer"&gt;KaizenLab&lt;/a&gt;. But honestly, even a notebook works. The tool doesn't matter. The discipline does.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;The hardest part of this story isn't that AI replaced my product. It's that &lt;strong&gt;I could have figured this out in an afternoon&lt;/strong&gt; if I'd been willing to question my own idea.&lt;/p&gt;

&lt;p&gt;I didn't want to test the hypothesis because I was afraid the answer would be "don't build it." And I was right to be afraid — that &lt;em&gt;was&lt;/em&gt; the answer. But finding that out in an afternoon is infinitely better than finding it out after three months.&lt;/p&gt;

&lt;p&gt;If you're an indie hacker reading this: before your next &lt;code&gt;npx create-next-app&lt;/code&gt;, spend one hour on the AI Replacement Test. It might save you three months.&lt;/p&gt;

&lt;p&gt;Or it might confirm that your idea is genuinely defensible — and then you'll build with way more confidence.&lt;/p&gt;

&lt;p&gt;Either way, you win.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you had a project disrupted by AI? Or found a way to build something AI can't easily replace? I'd love to hear your story in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>indiehacking</category>
      <category>saas</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
