<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gregory Potemkin</title>
    <description>The latest articles on DEV Community by Gregory Potemkin (@gregory_potemkin).</description>
    <link>https://dev.to/gregory_potemkin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842814%2Ff992442a-578e-433b-bd17-3aff1650d127.jpg</url>
      <title>DEV Community: Gregory Potemkin</title>
      <link>https://dev.to/gregory_potemkin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gregory_potemkin"/>
    <language>en</language>
    <item>
      <title>We pointed our chaos-QA agent at our own site. It found a shipped bug.</title>
      <dc:creator>Gregory Potemkin</dc:creator>
      <pubDate>Mon, 22 Jun 2026 10:43:55 +0000</pubDate>
      <link>https://dev.to/gregory_potemkin/we-pointed-our-chaos-qa-agent-at-our-own-site-it-found-a-shipped-bug-3nom</link>
      <guid>https://dev.to/gregory_potemkin/we-pointed-our-chaos-qa-agent-at-our-own-site-it-found-a-shipped-bug-3nom</guid>
      <description>&lt;p&gt;We build an AI QA engineer, so the fair test is the obvious one: point it at&lt;br&gt;
ourselves. On 15 June 2026 we ran &lt;strong&gt;Gremlin mode&lt;/strong&gt; — Prufa's chaos-testing&lt;br&gt;
modality — against our own marketing site, prufa.dev. It found a real,&lt;br&gt;
user-facing bug that had passed our CI and shipped to production that same day. Here is&lt;br&gt;
the whole run, including the parts where the tool was wrong about itself.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Gremlin mode actually does
&lt;/h2&gt;

&lt;p&gt;A normal Prufa flow checks a path you already know to check. Gremlin is for the&lt;br&gt;
paths you didn't. An LLM-backed agent drives a real browser as a deliberately&lt;br&gt;
difficult user — a confused newbie, an impatient double-clicker, a fat-finger&lt;br&gt;
typist, a back-button masher, a hostile poker — and chooses its own next action&lt;br&gt;
every step. It is the part of QA that needs a model: absorbing an unfamiliar UI&lt;br&gt;
and deciding what a frustrated human would try next.&lt;/p&gt;

&lt;p&gt;What the agent never does is decide whether anything broke. That is the same&lt;br&gt;
invariant as the rest of Prufa —&lt;br&gt;
&lt;a href="https://prufa.dev/blog/engineering/how-prufa-verifies-a-signup-flow/" rel="noopener noreferrer"&gt;the LLM navigates, plain code verifies&lt;/a&gt; —&lt;br&gt;
and it is the whole reason a finding from an LLM-driven tester can be trusted: a&lt;br&gt;
separate layer of deterministic detectors grades the run. A 500 response, an&lt;br&gt;
uncaught exception, a form that accepts invalid input, content wider than the&lt;br&gt;
viewport, two clickable elements overlapping — those are facts, read off the&lt;br&gt;
live page, not opinions.&lt;/p&gt;
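&lt;p&gt;To make "facts, not opinions" concrete, here is a minimal sketch of two such detectors written as pure functions over numbers a browser can report, such as &lt;code&gt;document.documentElement.scrollWidth&lt;/code&gt;, the viewport width, and element bounding boxes. The function names and the &lt;code&gt;Rect&lt;/code&gt; type are our own illustrative assumptions, not Prufa's detector code.&lt;/p&gt;

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Bounding box of a clickable element in CSS pixels (hypothetical type)."""
    name: str
    x: float
    y: float
    width: float
    height: float


def horizontal_overflow(scroll_width: float, viewport_width: float) -> float:
    """Pixels by which the document is wider than the viewport (0.0 if it fits)."""
    return max(0.0, float(scroll_width - viewport_width))


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two boxes share a region of positive area."""
    overlap_w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    overlap_h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    return overlap_w > 0 and overlap_h > 0


def overlapping_pairs(rects: list[Rect]) -> list[tuple[str, str]]:
    """Every pair of clickable elements whose boxes overlap."""
    return [(a.name, b.name) for a, b in combinations(rects, 2) if overlaps(a, b)]


# Hypothetical numbers: a 493px-wide document in a 390px viewport.
print(horizontal_overflow(493, 390))  # 103.0
```

&lt;p&gt;Neither function asks a model anything: the same measurements always produce the same verdict.&lt;/p&gt;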
&lt;h2&gt;
  
  
  The bug: a mobile overflow CI had just shipped
&lt;/h2&gt;

&lt;p&gt;Across three personas, every run reported the same verified finding at the&lt;br&gt;
390px mobile viewport: the page was &lt;strong&gt;103 pixels wider than the screen&lt;/strong&gt;, with&lt;br&gt;
the "Run a free audit" button in the header hanging off the right edge.&lt;/p&gt;

&lt;p&gt;Here is the part that makes the case for chaos QA. Earlier that same day, a&lt;br&gt;
commit titled "fix" had added exactly the rule meant to prevent this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="k"&gt;@media&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max-width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;520px&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;.header-cta&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;none&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It never applied. The button is styled by &lt;code&gt;a.btn-primary { display: inline-block }&lt;/code&gt;,&lt;br&gt;
whose selector specificity (0,1,1) outranks the bare &lt;code&gt;.header-cta&lt;/code&gt; (0,1,0), so&lt;br&gt;
the &lt;code&gt;display: none&lt;/code&gt; was silently overridden on every phone-width render. The CSS&lt;br&gt;
was valid. The build passed. The linter was happy. CI was green. And the bug&lt;br&gt;
shipped to production, where the page stayed 103px too wide until an agent that had never seen&lt;br&gt;
our codebase resized the viewport and measured the document.&lt;/p&gt;

&lt;p&gt;The fix was to out-specify the button:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="k"&gt;@media&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max-width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;520px&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;header&lt;/span&gt; &lt;span class="nt"&gt;a&lt;/span&gt;&lt;span class="nc"&gt;.header-cta&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;none&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;header a.header-cta&lt;/code&gt; is specificity (0,1,2), which beats &lt;code&gt;a.btn-primary&lt;/code&gt;&lt;br&gt;
regardless of source order. After the change, a fresh build measured 0px of&lt;br&gt;
horizontal overflow at 390px, with the button correctly hidden. The class of bug&lt;br&gt;
matters here: nothing &lt;em&gt;errored&lt;/em&gt;. A test that asserts on known selectors would have&lt;br&gt;
stayed green forever, because the breakage was in a layout dimension no one had&lt;br&gt;
written an assertion about. You catch that by measuring the rendered page, not by&lt;br&gt;
re-running the path you already trusted.&lt;/p&gt;
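&lt;p&gt;The specificity arithmetic behind this bug can be checked mechanically. The sketch below scores simple selectors (type, class, and ID parts joined by descendant combinators) as (IDs, classes, types) tuples. It deliberately ignores attribute selectors, pseudo-classes, and pseudo-elements, so treat it as an illustration of the ranking rule, not a full implementation of the CSS Selectors specification.&lt;/p&gt;

```python
import re

# One piece of a compound selector: an optional "#" or "." prefix plus a name.
# Simplified on purpose: no attribute selectors, pseudo-classes, or pseudo-elements.
_PIECE = re.compile(r"([#.]?)([A-Za-z_][\w-]*)")


def specificity(selector: str) -> tuple[int, int, int]:
    """Return (ids, classes, types) for a simple descendant-combinator selector."""
    ids = classes = types = 0
    for compound in selector.split():  # whitespace is the descendant combinator
        for prefix, _name in _PIECE.findall(compound):
            if prefix == "#":
                ids += 1
            elif prefix == ".":
                classes += 1
            else:
                types += 1
    return (ids, classes, types)


for rule in ["a.btn-primary", ".header-cta", "header a.header-cta"]:
    print(rule, specificity(rule))
# a.btn-primary (0, 1, 1)
# .header-cta (0, 1, 0)
# header a.header-cta (0, 1, 2)

# Python tuples compare element by element, which matches how CSS ranks specificity.
winner = max(["a.btn-primary", ".header-cta"], key=specificity)
print("winner:", winner)  # winner: a.btn-primary
```

&lt;p&gt;The media query contributes nothing to the score, which is why wrapping &lt;code&gt;.header-cta&lt;/code&gt; in &lt;code&gt;@media&lt;/code&gt; did not help.&lt;/p&gt;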

&lt;h2&gt;
  
  
  The safety guarantee, demonstrated on a live site
&lt;/h2&gt;

&lt;p&gt;A chaos tester loose on a real site is only acceptable if it cannot change&lt;br&gt;
anything. In Prufa, mutations are denied by default: the run is dry-run and a&lt;br&gt;
network-layer guard aborts every non-GET request before it leaves the browser. A&lt;br&gt;
destructive click becomes a "would have mutated" finding instead of an action.&lt;/p&gt;
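&lt;p&gt;Deny-by-default comes down to a small decision function in front of every outgoing request. Here is a minimal sketch, assuming a hypothetical &lt;code&gt;MutationGuard&lt;/code&gt; class and a request reduced to its method and URL; the authorised-domain set and submission cap mirror the opt-in described below, with a cap value we picked arbitrarily. In a Playwright-driven browser, a check like this could be wired into a &lt;code&gt;page.route&lt;/code&gt; handler that calls &lt;code&gt;route.abort()&lt;/code&gt; or &lt;code&gt;route.continue_()&lt;/code&gt;, but we make no claim that this is how Prufa implements it.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class MutationGuard:
    """Deny-by-default gate for outgoing requests (hypothetical sketch).

    Only GET passes, unless the request targets a domain the user explicitly
    authorised, and even then a hard cap bounds the number of submissions.
    """
    authorised_domains: frozenset[str] = frozenset()
    max_submissions: int = 3
    submissions: int = 0
    would_have_mutated: list[str] = field(default_factory=list)

    def allow(self, method: str, url: str) -> bool:
        """Return True to let the request through, False to abort it."""
        if method.upper() == "GET":
            return True
        host = urlsplit(url).hostname or ""
        if host in self.authorised_domains and self.max_submissions > self.submissions:
            self.submissions += 1
            return True
        # Blocked: record a "would have mutated" finding instead of acting.
        self.would_have_mutated.append(f"{method.upper()} {url}")
        return False


guard = MutationGuard()
print(guard.allow("GET", "https://example.com/pricing"))  # True
print(guard.allow("POST", "https://example.com/signup"))  # False
print(guard.would_have_mutated)  # ['POST https://example.com/signup']
```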

&lt;p&gt;We didn't have to take that on faith — the run logged it. Across the three&lt;br&gt;
personas the agent attempted between 0 and 4 mutations each; every one was&lt;br&gt;
blocked, and the run recorded which control it would have submitted. Real&lt;br&gt;
payment instruments are never used at all. To let Gremlin submit forms for real,&lt;br&gt;
you explicitly authorise a domain you own — and even then, hard caps bound how&lt;br&gt;
many submissions it can make.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the tool was wrong about itself
&lt;/h2&gt;

&lt;p&gt;The honest part. In an earlier run, two of the gremlin's &lt;em&gt;own&lt;/em&gt; detectors fired&lt;br&gt;
on things that were not bugs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A "dead-end / error page" detector matched the bare string &lt;code&gt;500&lt;/code&gt; in ordinary
marketing copy (think "save $500"), calling a healthy page an error page.&lt;/li&gt;
&lt;li&gt;A "bad input accepted" detector treated any navigation after a form fill as a
successful submission — so clicking a normal link after typing in a field
looked like the app had swallowed invalid input.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A verified finding that turns out to be noise costs more trust than a missed bug&lt;br&gt;
costs coverage, so we did not ship around it. We added a detector&lt;br&gt;
false-positive policy: the error-page check now requires a strong error phrase&lt;br&gt;
in the page's &lt;em&gt;prominent&lt;/em&gt; text (title or heading) on an error-shaped page, not a&lt;br&gt;
substring match anywhere in the body; the bad-input check now requires a real&lt;br&gt;
form submission — an actual non-GET request — before it fires. Both false&lt;br&gt;
positives are gone, and the genuine findings (the mobile overflow) still land.&lt;/p&gt;
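&lt;p&gt;The two tightened rules translate fairly directly into code. The sketch below is a hypothetical version, not Prufa's detector: the phrase list, the definition of an "error-shaped" page (here an HTTP status of 400 or above, or a page with at most 150 words of body text), and the input shapes are all our assumptions.&lt;/p&gt;

```python
import re

# Hypothetical phrase list; a real detector would maintain and test its own.
STRONG_ERROR_PHRASES = re.compile(
    r"\b(internal server error|page not found|something went wrong|50[0-4] error|404 not found)\b",
    re.IGNORECASE,
)


def looks_like_error_page(status: int, title: str, headings: list[str], body_text: str) -> bool:
    """Fire only on a strong error phrase in prominent text on an error-shaped page."""
    error_shaped = status >= 400 or 150 >= len(body_text.split())
    prominent = " ".join([title, *headings])  # title and headings only, never the body
    return error_shaped and bool(STRONG_ERROR_PHRASES.search(prominent))


def bad_input_accepted(filled_invalid_input: bool, requests_after_fill: list[tuple[str, str]]) -> bool:
    """Fire only if invalid input was typed and a real non-GET request followed."""
    submitted = any(method.upper() != "GET" for method, _url in requests_after_fill)
    return filled_invalid_input and submitted


# Marketing copy mentioning "$500" on a healthy, content-rich page no longer fires.
print(looks_like_error_page(200, "Pricing", ["Save $500 on annual plans"], "word " * 300))  # False
```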

&lt;p&gt;We also measured discovery quality directly. On a seeded-bug fixture with five&lt;br&gt;
planted bugs, the agent's first pass found four of five (0.80 coverage); after&lt;br&gt;
we gave it an exploration frontier — a running list of same-site pages it&lt;br&gt;
hasn't visited yet, fed back into each decision — it found all five (1.00),&lt;br&gt;
because it stopped looping in one corner and started covering the whole app. That&lt;br&gt;
number measures discovery quality on a fixture, not on your site; the point is&lt;br&gt;
that "does the chaos actually find the planted bugs" is something we measure&lt;br&gt;
with a number rather than assert.&lt;/p&gt;
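&lt;p&gt;An exploration frontier of this kind works like a crawler's to-visit set, restricted to one site and handed to the agent as a hint rather than a fixed order. A minimal sketch, assuming links arrive as raw &lt;code&gt;href&lt;/code&gt; strings from the current page; the &lt;code&gt;Frontier&lt;/code&gt; class, its normalisation rules (drop fragments, keep only http(s) URLs on the same hostname), and the prompt wording are our illustrative choices.&lt;/p&gt;

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlsplit


class Frontier:
    """Same-site pages seen in links but not yet visited (hypothetical sketch)."""

    def __init__(self, start_url: str):
        self.host = urlsplit(start_url).hostname
        self.visited: set[str] = set()
        self.pending: list[str] = []  # insertion-ordered, no duplicates

    def _normalise(self, base: str, href: str) -> Optional[str]:
        url, _fragment = urldefrag(urljoin(base, href))
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or parts.hostname != self.host:
            return None
        return url

    def mark_visited(self, url: str, links: list[str]) -> None:
        """Record a visit and queue the page's unseen same-site links."""
        url, _ = urldefrag(url)
        self.visited.add(url)
        if url in self.pending:
            self.pending.remove(url)
        for href in links:
            candidate = self._normalise(url, href)
            if candidate and candidate not in self.visited and candidate not in self.pending:
                self.pending.append(candidate)

    def prompt_hint(self, limit: int = 5) -> str:
        """Text one might append to the agent's next-step prompt."""
        return "Unvisited pages: " + ", ".join(self.pending[:limit])


frontier = Frontier("https://example.com/")
frontier.mark_visited("https://example.com/", ["/pricing", "/docs#intro", "https://other.site/"])
print(frontier.prompt_hint())  # Unvisited pages: https://example.com/pricing, https://example.com/docs
```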

&lt;h2&gt;
  
  
  Why we publish the misses
&lt;/h2&gt;

&lt;p&gt;A QA product that only tells flattering stories about itself is exactly the&lt;br&gt;
product you shouldn't trust to test you. The mobile bug is a good demo. The&lt;br&gt;
false positives are a better one: they show the failure mode that matters for an&lt;br&gt;
LLM-driven tester — a confident, wrong "this is broken" — and they show the line&lt;br&gt;
we hold against it. The model proposes; plain code disposes; and when plain code&lt;br&gt;
gets it wrong, we fix the plain code, in the open.&lt;/p&gt;

&lt;p&gt;Gremlin mode is available on &lt;a href="https://prufa.dev/pricing/" rel="noopener noreferrer"&gt;any paid plan&lt;/a&gt; — read how it works on&lt;br&gt;
the &lt;a href="https://prufa.dev/gremlin/" rel="noopener noreferrer"&gt;chaos-testing page&lt;/a&gt;, or&lt;br&gt;
&lt;a href="https://prufa.dev/" rel="noopener noreferrer"&gt;run a free audit&lt;/a&gt; to see the deterministic side of the same engine on your&lt;br&gt;
own URL first.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>webdev</category>
      <category>ai</category>
      <category>showdev</category>
    </item>
    <item>
      <title>We audited 14 side-project launches. Zero critical bugs, same quiet flaws.</title>
      <dc:creator>Gregory Potemkin</dc:creator>
      <pubDate>Tue, 16 Jun 2026 12:40:10 +0000</pubDate>
      <link>https://dev.to/gregory_potemkin/we-audited-14-side-project-launches-zero-critical-bugs-same-quiet-flaws-5a4j</link>
      <guid>https://dev.to/gregory_potemkin/we-audited-14-side-project-launches-zero-critical-bugs-same-quiet-flaws-5a4j</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prufa.dev/blog/engineering/we-audited-14-side-project-launches/" rel="noopener noreferrer"&gt;the Prufa blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Five days ago we &lt;a href="https://prufa.dev/blog/engineering/we-audited-49-show-hn-launches/" rel="noopener noreferrer"&gt;audited 49 Show HN launches&lt;/a&gt; and found that 78% had a critical bug on day one. This week we pointed the same free audit at a different cohort: 14 products freshly posted to r/SideProject. We expected more of the same.&lt;/p&gt;

&lt;p&gt;We got the opposite — and it turned out to be more interesting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not one of the 14 had a critical finding.&lt;/strong&gt; No broken signup flow, no canonical pointing at the wrong domain, no analytics tag silently swallowing every event. By the measure that matters most on launch day — does the core thing work — these builders shipped clean.&lt;/p&gt;

&lt;p&gt;And yet every single site had findings. They just all live one tier down, in a layer so consistent it reads like a shared checklist nobody handed out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;11 of 14&lt;/strong&gt; sent &lt;strong&gt;no analytics events&lt;/strong&gt; at all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;11 of 14&lt;/strong&gt; shipped with &lt;strong&gt;no Content-Security-Policy&lt;/strong&gt; and could be &lt;strong&gt;framed by any site&lt;/strong&gt; (no &lt;code&gt;X-Frame-Options&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;11 of 14&lt;/strong&gt; had &lt;strong&gt;serious accessibility violations&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;12 of 14&lt;/strong&gt; had &lt;strong&gt;tap targets smaller than 24px&lt;/strong&gt; on mobile.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9 of 14&lt;/strong&gt; took &lt;strong&gt;over four seconds&lt;/strong&gt; to paint their largest element on mobile.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 of 14&lt;/strong&gt; had &lt;strong&gt;no canonical link&lt;/strong&gt; on the entry page.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No site is named in this post. The point isn't to embarrass anyone; these are good builders who got a real product live. The point is that the same side-project launch mistakes show up again and again, and if 11 of 14 strangers have them, you probably have a few too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology, briefly
&lt;/h2&gt;

&lt;p&gt;We pulled 20 URLs from recent r/SideProject posts and ran each through the same audit a &lt;a href="https://prufa.dev/" rel="noopener noreferrer"&gt;free Prufa run&lt;/a&gt; does: a real browser loads the public pages and captures network traffic, console output, response codes, headers, and the rendered DOM, then a fixed suite of deterministic checks grades the evidence. Same input, same verdict.&lt;/p&gt;

&lt;p&gt;Of the 20: &lt;strong&gt;14 completed cleanly&lt;/strong&gt;, 4 were blocked by bot protection before our runner could load them, and 2 didn't finish inside our polling window. The numbers below are from the 14 that completed.&lt;/p&gt;

&lt;p&gt;Two honest caveats. First, &lt;strong&gt;14 is a small sample&lt;/strong&gt; — treat these as directional, not census. Second, &lt;strong&gt;every number below is from a code-verified check&lt;/strong&gt;; the audit also produces LLM-written UX observations (a hero that over-claims, a CTA with no clear primary action), but those are advisory and &lt;strong&gt;counted nowhere in this data&lt;/strong&gt;. The LLM in our pipeline never grades results — plain code does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually breaks on a side-project launch: the numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sites affected (of 14)&lt;/th&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Tap targets smaller than 24px (mobile)&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Slow Largest Contentful Paint (9 of them over 4 s)&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;No analytics events detected&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;No Content-Security-Policy header&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Page can be framed by any site (no &lt;code&gt;X-Frame-Options&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Serious accessibility violations&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;llms.txt&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Minor accessibility violations&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;X-Content-Type-Options: nosniff&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Text assets served without compression&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;No canonical link on entry page&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Unknown URLs return 200 instead of 404&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;No structured data on entry page&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;Strict-Transport-Security&lt;/code&gt; header&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Missing Open Graph tags&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Missing meta description&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;http://&lt;/code&gt; does not redirect to &lt;code&gt;https://&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Images missing alt text&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The most common mistake: flying blind on your own launch
&lt;/h2&gt;

&lt;p&gt;Eleven of the fourteen sites sent &lt;strong&gt;no analytics events whatsoever&lt;/strong&gt;. The page loads, the browser records every outbound request, and nothing resembling an analytics beacon ever leaves it.&lt;/p&gt;

&lt;p&gt;This was the single most common finding in the Show HN cohort too, and it stings more for a side project. You posted to r/SideProject for one reason — to find out if anyone wants this. The traffic from that post is the clearest signal you will get for weeks: which referrer converted, which screenshot made people click, how many visitors actually reached the signup. For 11 of these 14 builders, that data was never recorded. The launch happened; the evidence didn't.&lt;/p&gt;

&lt;p&gt;(We can only see a &lt;em&gt;recognized&lt;/em&gt; beacon — if you run a first-party collector we don't have a signature for, you'd show up here too. Worth a 30-second check of your own network tab either way.)&lt;/p&gt;

&lt;h2&gt;
  
  
  The security headers nobody adds
&lt;/h2&gt;

&lt;p&gt;Eleven sites had no Content-Security-Policy and could be &lt;strong&gt;embedded in an iframe by any website on the internet&lt;/strong&gt; — the setup behind clickjacking. Nine were missing &lt;code&gt;X-Content-Type-Options: nosniff&lt;/code&gt;; six had no HSTS; four served &lt;code&gt;http://&lt;/code&gt; without redirecting to &lt;code&gt;https://&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;None of these is exploitable on its own for most side projects, and none will page you. But each is a one-line fix in your host or framework config, and together they're the difference between "looks like a weekend hack" and "looks like someone who knows what they're doing" to anyone who checks. Half the cohort also served &lt;strong&gt;soft 404s&lt;/strong&gt;: 7 of 14 returned &lt;code&gt;200 OK&lt;/code&gt; for URLs that don't exist, which quietly pollutes search indexes and hides broken links from your own logs.&lt;/p&gt;
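&lt;p&gt;Each of these checks is a lookup on a captured response. A sketch under simple assumptions: headers arrive as a plain dict, the soft-404 probe requests a random path that should not exist, and the finding labels are ours. A Content-Security-Policy with a &lt;code&gt;frame-ancestors&lt;/code&gt; directive also prevents framing, so the sketch accepts either that or &lt;code&gt;X-Frame-Options&lt;/code&gt;.&lt;/p&gt;

```python
import uuid
from typing import Optional


def header_findings(headers: dict[str, str]) -> list[str]:
    """Missing security headers on one response (header names are case-insensitive)."""
    h = {name.lower(): value for name, value in headers.items()}
    findings = []
    csp = h.get("content-security-policy", "")
    if not csp:
        findings.append("No Content-Security-Policy header")
    if "frame-ancestors" not in csp.lower() and "x-frame-options" not in h:
        findings.append("Page can be framed by any site")
    if h.get("x-content-type-options", "").lower() != "nosniff":
        findings.append("No X-Content-Type-Options: nosniff")
    if "strict-transport-security" not in h:
        findings.append("No Strict-Transport-Security header")
    return findings


def redirects_to_https(status: int, location: Optional[str]) -> bool:
    """Did the plain http:// request answer with a redirect to an https:// URL?"""
    return status in (301, 302, 307, 308) and (location or "").startswith("https://")


def soft_404_probe_path() -> str:
    """A random path that almost certainly does not exist on the site."""
    return "/" + uuid.uuid4().hex


def is_soft_404(probe_status: int) -> bool:
    """A made-up URL answering 200 OK instead of 404 is a soft 404."""
    return probe_status == 200


print(header_findings({"X-Frame-Options": "DENY"}))
# ['No Content-Security-Policy header', 'No X-Content-Type-Options: nosniff', 'No Strict-Transport-Security header']
```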

&lt;h2&gt;
  
  
  The mobile and accessibility tax
&lt;/h2&gt;

&lt;p&gt;Twelve sites had &lt;strong&gt;tap targets under 24px&lt;/strong&gt; and nine took &lt;strong&gt;over four seconds&lt;/strong&gt; to paint their largest element on mobile; one took 18.5 seconds. Most launch traffic from a social post is mobile, and a hero that takes four seconds to appear loses a meaningful share of visitors before they ever see the product.&lt;/p&gt;
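&lt;p&gt;Both mobile numbers come from thresholds a script can apply to measurements. A sketch, assuming element sizes in CSS pixels and an LCP value in milliseconds: the 24px minimum matches WCAG 2.2's Target Size (Minimum) criterion, and 4 seconds is the boundary above which Core Web Vitals rates LCP as poor. The function names are hypothetical, and WCAG's spacing exception for small targets is ignored here.&lt;/p&gt;

```python
MIN_TARGET_PX = 24   # WCAG 2.2 SC 2.5.8 minimum target size, in CSS pixels
LCP_POOR_MS = 4000   # Core Web Vitals: LCP above 4 s is rated "poor"


def small_tap_targets(targets: dict[str, tuple[float, float]]) -> list[str]:
    """Names of tappable elements narrower or shorter than the minimum size."""
    return [
        name
        for name, (width, height) in targets.items()
        if MIN_TARGET_PX > min(width, height)
    ]


def lcp_is_poor(lcp_ms: float) -> bool:
    """True if Largest Contentful Paint exceeds the 'poor' threshold."""
    return lcp_ms > LCP_POOR_MS


print(small_tap_targets({"close": (16, 16), "cta": (120, 44), "footer-link": (60, 18)}))
# ['close', 'footer-link']
```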

&lt;p&gt;Eleven sites had &lt;strong&gt;serious accessibility violations&lt;/strong&gt; (the kind axe-core flags as &lt;code&gt;serious&lt;/code&gt; — missing form labels, insufficient contrast, controls with no accessible name). These aren't only a compliance question: a button a screen reader can't name is often a button that's confusing to everyone, and contrast failures are just hard-to-read text.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AEO gap: 11 of 14 have no llms.txt
&lt;/h2&gt;

&lt;p&gt;Eleven sites had no &lt;code&gt;llms.txt&lt;/code&gt; and seven had no structured data on the entry page. A year ago that was a non-issue. Now a real and growing share of "how do I…" and "what's the best tool for…" traffic resolves inside ChatGPT, Perplexity, and Google's AI overviews — and those engines lean on machine-readable signals to understand and cite you. A side project with no structured data and no &lt;code&gt;llms.txt&lt;/code&gt; is invisible to exactly the channel that's growing fastest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we take from this
&lt;/h2&gt;

&lt;p&gt;The Show HN cohort failed loudly: broken flows, dead analytics tags, canonical tags aimed at the wrong domain. This cohort failed quietly, and &lt;em&gt;uniformly&lt;/em&gt;. Zero criticals is genuinely good news; it means these builders shipped working products. But "nothing is broken" and "nothing is leaking" are different claims, and all 14 were leaking in the same handful of places: measurement and discoverability (analytics, SEO, AEO), trust (security headers), and keeping the visitors who do arrive (mobile speed, accessibility).&lt;/p&gt;

&lt;p&gt;None of it requires judgment to detect. Every finding above is a deterministic check against evidence a browser can capture — a request that did or didn't happen, a header that is or isn't present, a response code. Which is exactly why it should be automated instead of living on a checklist you mean to get to.&lt;/p&gt;

&lt;p&gt;That's the audit we ran on these 14 sites, and it's free: paste a URL on &lt;a href="https://prufa.dev/" rel="noopener noreferrer"&gt;the Prufa homepage&lt;/a&gt; and get the same machine-verified findings for your own launch in about a minute. Ideally before you post it.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>qa</category>
      <category>showdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>We audited 49 Show HN launches. 38 had a critical bug on day one.</title>
      <dc:creator>Gregory Potemkin</dc:creator>
      <pubDate>Fri, 12 Jun 2026 09:19:31 +0000</pubDate>
      <link>https://dev.to/gregory_potemkin/we-audited-49-show-hn-launches-38-had-a-critical-bug-on-day-one-1dk7</link>
      <guid>https://dev.to/gregory_potemkin/we-audited-49-show-hn-launches-38-had-a-critical-bug-on-day-one-1dk7</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on the &lt;a href="https://prufa.dev/blog/engineering/we-audited-49-show-hn-launches/" rel="noopener noreferrer"&gt;Prufa blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In June 2026 we pointed Prufa's free audit at 50 products that had just launched on Show HN — every launch from the previous 30 days that earned at least 10 points. These are products at their moment of maximum attention: front page, real traffic, founders watching the comments.&lt;/p&gt;

&lt;p&gt;The headline numbers, from the 49 audits that completed (one site couldn't be reached by our runner):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;100% of the 49 launches&lt;/strong&gt; had at least one machine-verified finding.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;78% — 38 of 49 — had at least one critical finding.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;40 critical and 61 warning findings&lt;/strong&gt; in total, every one verified by deterministic checks against captured browser evidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No site is named in this post. The point isn't to embarrass anyone — it's that these failures are systematic, and if these teams have them on launch day, you probably do too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology, briefly
&lt;/h2&gt;

&lt;p&gt;Each site got the same audit a free Prufa run does: a real browser loads the public pages, captures network traffic, console output, cookies, and response codes, and a fixed suite of deterministic checks grades the evidence. Same input, same verdict. &lt;strong&gt;Every number below is from a code-verified check&lt;/strong&gt; — no LLM opinions are counted anywhere in this data.&lt;/p&gt;

&lt;p&gt;One honest caveat: our export keeps only the top findings per site, so the per-issue counts below are &lt;strong&gt;floors&lt;/strong&gt;, not totals. The real numbers are equal or worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually breaks at website launch: the numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sites affected (of 49)&lt;/th&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;No analytics events detected&lt;/td&gt;
&lt;td&gt;critical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;No canonical link on entry page&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;Cookies set without the &lt;code&gt;Secure&lt;/code&gt; attribute&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Broken links&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;&amp;lt;h1&amp;gt;&lt;/code&gt; heading on entry page&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;No robots.txt&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;JavaScript console errors during page load&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Missing meta description&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Images missing alt text&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Missing Open Graph tags&lt;/td&gt;
&lt;td&gt;info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Tag container loads, but no analytics events fire&lt;/td&gt;
&lt;td&gt;warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Canonical URL pointing to a &lt;em&gt;different&lt;/em&gt; host&lt;/td&gt;
&lt;td&gt;critical&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The most common launch bug: analytics that record nothing
&lt;/h2&gt;

&lt;p&gt;The most common critical finding, by a wide margin: &lt;strong&gt;no analytics events detected&lt;/strong&gt;. The page loads, the browser captures every outgoing request — and nothing resembling an analytics event leaves the page.&lt;/p&gt;

&lt;p&gt;Think about what that means on launch day specifically. The front page of Hacker News is, for many of these products, the single largest traffic spike they will ever see. For 38 of these 49 teams, data on which referrers converted, which pages people actually read, and how many of those visitors signed up simply doesn't exist. Not sampled, not skewed: absent.&lt;/p&gt;

&lt;p&gt;Three more sites had a subtler version: the tag container loads (so a quick "view source" check looks fine), but &lt;strong&gt;no events ever fire&lt;/strong&gt;. That one is nasty precisely because it passes the eyeball test — the only way to catch it is to watch the network traffic, which is what our check does.&lt;/p&gt;
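&lt;p&gt;Separating "no analytics at all" from "the container loads, but events never fire" takes two sets of signatures instead of one. A minimal sketch, with severities taken from the table above; the URL fragments (Google Tag Manager's &lt;code&gt;gtm.js&lt;/code&gt; and gtag's &lt;code&gt;/gtag/js&lt;/code&gt; loaders, GA4's &lt;code&gt;/g/collect&lt;/code&gt; hits) are illustrative examples, not the audit's real signature set.&lt;/p&gt;

```python
from typing import Optional

# Illustrative URL fragments only.
CONTAINER_MARKERS = ("googletagmanager.com/gtm.js", "googletagmanager.com/gtag/js")
EVENT_MARKERS = ("google-analytics.com/g/collect",)


def analytics_finding(request_urls: list[str]) -> Optional[tuple[str, str]]:
    """Return (severity, finding), or None if an analytics event was observed."""

    def seen(markers: tuple[str, ...]) -> bool:
        return any(m in url for url in request_urls for m in markers)

    if seen(EVENT_MARKERS):
        return None
    if seen(CONTAINER_MARKERS):
        # Passes a "view source" check, but nothing is actually recorded.
        return ("warning", "Tag container loads, but no analytics events fire")
    return ("critical", "No analytics events detected")
```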

&lt;h2&gt;
  
  
  The rest of the list is the unglamorous stuff
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Broken links (14 sites).&lt;/strong&gt; Nobody clicks every link on their own site — especially footer links, docs links, and that one pricing anchor that moved two redesigns ago. Visitors do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Console errors at page load (10 sites).&lt;/strong&gt; Errors at load time often mean broken features visitors never report — they just leave. These ten sites shipped them to the HN front page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cookies without &lt;code&gt;Secure&lt;/code&gt; (22 sites).&lt;/strong&gt; A one-attribute fix, sitting on nearly half the cohort.&lt;/p&gt;
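&lt;p&gt;The &lt;code&gt;Secure&lt;/code&gt; check needs nothing more than the &lt;code&gt;Set-Cookie&lt;/code&gt; headers the browser captured, since cookie attributes are separated by semicolons. A sketch; the function name is ours, and a complete check would also need to look at cookies set from JavaScript via &lt;code&gt;document.cookie&lt;/code&gt;, which never appear in response headers.&lt;/p&gt;

```python
def cookies_without_secure(set_cookie_headers: list[str]) -> list[str]:
    """Names of cookies whose Set-Cookie header lacks the Secure attribute."""
    insecure = []
    for header in set_cookie_headers:
        # First part is name=value; the rest are attributes such as Path or Secure.
        name_value, *attributes = [part.strip() for part in header.split(";")]
        name = name_value.split("=", 1)[0]
        attribute_names = {attr.split("=", 1)[0].strip().lower() for attr in attributes}
        if "secure" not in attribute_names:
            insecure.append(name)
    return insecure


print(cookies_without_secure(["sid=abc123; Path=/; Secure; HttpOnly", "theme=dark; Path=/"]))
# ['theme']
```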

&lt;p&gt;&lt;strong&gt;The canonical-to-wrong-host pair (2 sites, critical).&lt;/strong&gt; Two sites shipped a &lt;code&gt;&amp;lt;link rel="canonical"&amp;gt;&lt;/code&gt; pointing at a &lt;em&gt;different domain&lt;/em&gt; — almost certainly a leftover from a template or staging config. That tag tells search engines "index that other site instead of me." On launch week.&lt;/p&gt;
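&lt;p&gt;Detecting a wrong-host canonical is a comparison of two hostnames once the canonical &lt;code&gt;href&lt;/code&gt; is resolved against the page URL. A sketch, assuming the &lt;code&gt;href&lt;/code&gt; has already been extracted from the DOM; treating a &lt;code&gt;www.&lt;/code&gt; prefix as the same host is our own judgment call, not necessarily what the audit does.&lt;/p&gt;

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def _bare_host(url: str) -> str:
    """Lower-cased hostname with a leading "www." removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def canonical_finding(page_url: str, canonical_href: Optional[str]) -> Optional[str]:
    """Return a finding string, or None if the canonical looks fine."""
    if not canonical_href:
        return "No canonical link on entry page"
    resolved = urljoin(page_url, canonical_href)  # relative hrefs resolve to the page's host
    if _bare_host(resolved) != _bare_host(page_url):
        return f"Canonical URL points to a different host: {urlsplit(resolved).hostname}"
    return None
```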

&lt;h2&gt;
  
  
  What we take from this
&lt;/h2&gt;

&lt;p&gt;These aren't careless teams. They got a product to Show HN and earned points doing it. The pattern says something else: &lt;strong&gt;the surface area that needs verifying grows faster than anyone's willingness to click through it&lt;/strong&gt; — especially in the week before a launch, when everything is on fire.&lt;/p&gt;

&lt;p&gt;None of the findings above require judgment to detect. Every one is a deterministic check against evidence a browser can capture: a response code, a network request that did or didn't happen, an attribute on a cookie. Which is exactly why this should be automated — and why &lt;a href="https://prufa.dev/blog/engineering/how-prufa-verifies-a-signup-flow/" rel="noopener noreferrer"&gt;the LLM in our pipeline never grades results&lt;/a&gt;; plain code does.&lt;/p&gt;

&lt;p&gt;We turned this dataset into a &lt;a href="https://prufa.dev/blog/guides/website-qa-checklist-before-launch/" rel="noopener noreferrer"&gt;pre-launch checklist ordered by these failure rates&lt;/a&gt;, if you want the actionable version.&lt;/p&gt;

&lt;p&gt;That's the audit we ran on these 49 sites, and it's free: paste a URL on &lt;a href="https://prufa.dev/" rel="noopener noreferrer"&gt;prufa.dev&lt;/a&gt;, get the same machine-verified findings for your own site in about a minute. Before your launch day, ideally.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>webdev</category>
      <category>startup</category>
      <category>qa</category>
    </item>
    <item>
      <title>How I Set Up OpenClaw: A Developer's Guide to Self-Hosted AI Assistant Infrastructure</title>
      <dc:creator>Gregory Potemkin</dc:creator>
      <pubDate>Wed, 25 Mar 2026 08:49:10 +0000</pubDate>
      <link>https://dev.to/gregory_potemkin/how-i-set-up-openclaw-a-developers-guide-to-self-hosted-ai-assistant-infrastructure-293i</link>
      <guid>https://dev.to/gregory_potemkin/how-i-set-up-openclaw-a-developers-guide-to-self-hosted-ai-assistant-infrastructure-293i</guid>
      <description>&lt;p&gt;I recently set up OpenClaw, the open-source AI assistant framework, and wanted to share my experience for anyone considering self-hosting vs managed options.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is OpenClaw?
&lt;/h2&gt;

&lt;p&gt;OpenClaw is an AI assistant framework that lets you run your own AI assistant with integrations for Telegram, Slack, WhatsApp, and a built-in web chat. Think of it as your own ChatGPT that you control completely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Self-Host?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data privacy&lt;/strong&gt;: Your conversations stay on your infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost control&lt;/strong&gt;: Use your own API keys, pay only for what you use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customization&lt;/strong&gt;: Full control over models, prompts, and integrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt;: Great way to understand AI infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Setup Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install OpenClaw
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;macOS/Linux/WSL2:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://openclaw.ai/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows PowerShell:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;iwr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-useb&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://openclaw.ai/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Run the Onboarding Wizard
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model authentication (OpenAI, Anthropic, Gemini, etc.)&lt;/li&gt;
&lt;li&gt;Workspace defaults&lt;/li&gt;
&lt;li&gt;Gateway settings&lt;/li&gt;
&lt;li&gt;Optional messaging channels&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Verify Everything Works
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway status
openclaw doctor
openclaw dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last command opens the Control UI at &lt;code&gt;http://127.0.0.1:18789/&lt;/code&gt;, where you can send your first message.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep the dashboard on localhost&lt;/strong&gt;: Never expose the Control UI to the public internet. Use Tailscale or SSH tunneling if you need remote access.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run doctor after updates&lt;/strong&gt;: Always run &lt;code&gt;openclaw doctor&lt;/code&gt; after setup and upgrades to catch issues early.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with built-in chat&lt;/strong&gt;: You don't need Telegram or Slack configured to get started. The Control UI works immediately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document your install method&lt;/strong&gt;: Whether you used the install script, npm, or a source build, keep track of how you installed it; you will need to know when troubleshooting.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When to Consider Managed Hosting
&lt;/h2&gt;

&lt;p&gt;Self-hosting is great for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers who want full control&lt;/li&gt;
&lt;li&gt;Teams with existing infrastructure&lt;/li&gt;
&lt;li&gt;Privacy-sensitive use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But if you want zero infrastructure work, there's &lt;a href="https://openclaw-setup.me" rel="noopener noreferrer"&gt;OpenClaw Setup&lt;/a&gt;, a managed hosting service that handles operations while you keep control of your credentials and config.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openclaw-setup.me/openclaw-setup/" rel="noopener noreferrer"&gt;Official Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openclaw-setup.me/install-openclaw/" rel="noopener noreferrer"&gt;Install Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openclaw-setup.me/openclaw-setup-troubleshooting/" rel="noopener noreferrer"&gt;Troubleshooting Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Have you tried self-hosting AI assistants? What's been your experience?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
