The thing that breaks before 1,000 users is almost never the code.
Not a race condition. Not a memory leak. Not a missing index. Founders assume their app will collapse because of a bug they could have caught in PR review. Engineers assume they'll get paged for a logic error at 2am. Neither is true.
In twelve startup audits, the first failure was always an infrastructure primitive. A limit someone didn't know existed. An environment that didn't exist at all. A default that worked perfectly at 50 users and silently died at 500.
The first thing I check on any stack is the Supabase project plan. If it is on the free tier with more than 20 active users, the connection pool is already at 60% capacity. Most founders have no idea. They only find out when `FATAL: sorry, too many clients already` appears in their logs at 3pm on a Tuesday — right when their biggest customer is running a demo.
That error message means "too many clients already." Not "your query is slow." Not "your code is wrong." Just: you hit a number you didn't know existed. That is what breaks before 1,000 users. Every single time.
Failure pattern 1 — The Supabase free tier connection cliff
You have 47 active users. Your Product Hunt launch is in 4 hours. Everything worked on staging. Then your Sentry dashboard turns red.
```
remaining connection slots are reserved for non-replication superuser connections
```
This is not a database crash. It is a configuration limit. Supabase free tier allows 60 direct connections. At 50 concurrent users — each holding a connection open while they browse, type, or leave a tab in the background — the pool is exhausted. Request 61 gets that error. Every user after that sees a 500 response.
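The cliff is easiest to see as a toy model. The sketch below is not Supabase's actual pooler — just a fixed-size pool with the free tier's 60-slot limit, showing that the 61st concurrent acquire is refused while the first 60 succeed:

```typescript
// Toy model of a fixed-size connection pool (hypothetical, for illustration;
// the limit of 60 matches the Supabase free tier's direct-connection cap).
class ConnectionPool {
  private inUse = 0;
  constructor(private readonly maxConnections: number) {}

  // Returns false when every slot is held — the "remaining connection
  // slots are reserved" failure mode, seen from the application side.
  acquire(): boolean {
    if (this.inUse >= this.maxConnections) return false;
    this.inUse += 1;
    return true;
  }

  release(): void {
    if (this.inUse > 0) this.inUse -= 1;
  }
}

const pool = new ConnectionPool(60);

// 61 browsing users, each holding a connection open without releasing it.
const results = Array.from({ length: 61 }, () => pool.acquire());

console.log(results[59]); // true  — user 60 gets the last slot
console.log(results[60]); // false — request 61 is refused
```

The point of the model: nothing is slow and nothing is broken. Slot 61 simply does not exist until the connections are pooled.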
Here is why you never saw this coming. Your local development environment had one user: you. Your staging environment had three test accounts. You never simulated 50 concurrent connections because that requires 50 concurrent humans, not 50 automated requests.
I have seen this exact failure at 3pm on a Tuesday. At 11am on a Monday. And once at 8:47am on the morning of a Product Hunt launch. Two hundred people clicked through. The first ten saw the site. The next 190 saw `remaining connection slots are reserved` — an error message that means nothing to a non-technical user except "this product is broken." The launch slot passed. There is no second first impression.
The fix takes four minutes once you know what it is:
- Enable pgBouncer in the Supabase dashboard (Settings → Database → Connection pooler)
- Change your connection string from port `5432` to port `6543`
- Deploy
That is it. The same free tier now handles 200 concurrent connections instead of 60. No code change. No migration. No additional cost.
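In practice the change is one character pair in the connection string. A minimal sketch (the host and credentials below are placeholders — use the exact pooled string from your Supabase dashboard):

```typescript
// Hypothetical helper: rewrite a direct Postgres connection string
// (port 5432) to target the pgBouncer pooler (port 6543).
function toPooledUrl(directUrl: string): string {
  return directUrl.replace(":5432/", ":6543/");
}

// Placeholder URL — not a real project.
const direct =
  "postgresql://postgres:password@db.example.supabase.co:5432/postgres";
const pooled = toPooledUrl(direct);

console.log(pooled);
// postgresql://postgres:password@db.example.supabase.co:6543/postgres
```

One caveat worth knowing: pgBouncer in transaction-pooling mode does not support session-level features such as prepared statements, so some ORMs need a flag on the pooled URL (Prisma, for example, documents a `?pgbouncer=true` parameter). Check your client's docs before switching.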
Failure pattern 2 — No staging environment
Without staging, every code change is tested on real users. Every deploy is a gamble. The developer pushes a change that works on their laptop and breaks in production because of one difference they could not see.
Here is exactly how that happens. It is 6:13pm on a Thursday. A developer pushes a one-line fix for a typo on the pricing page. The fix works locally. They run git push and vercel deploy --prod. The deploy completes in 47 seconds. The typo is fixed.
But the fix introduced a regression on the payment flow. Why? Because the local environment uses a Stripe test key starting with pk_test_. Production uses a live key starting with pk_live_. The developer never tested the payment flow after the change because the change was "just a typo."
At 9:04pm, a user emails: "Your checkout page gives me an error after I enter my card." At 9:12pm, a second user reports the same thing. At 9:23pm, the founder checks Stripe dashboard. Zero successful charges in the last three hours. Three hours of revenue lost. Twelve customers abandoned.
The developer now has to roll back manually. There is no automated rollback process. They run git revert HEAD, wait for the CI pipeline, manually verify the typo is back but the payment flow works, and redeploy. Total time: 47 minutes.
Here is what a correct staging environment looks like:
- Same infrastructure as production
- Same environment variable structure (different values)
- Same database schema — run `pg_dump --schema-only production_database > schema.sql` weekly
- Different data (seeded test accounts, no real emails)
The deployment workflow becomes: git push → deploy to staging (5 minutes) → run smoke tests (90 seconds) → deploy to production. Three hours of setup once. A 90-second automated check before every deploy. I have seen this gap in nine of twelve startup audits. The answer is always the same: "We meant to set up staging. We just haven't gotten to it yet."
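Beyond staging itself, the Stripe-key mismatch in the story above is cheap to catch at boot. A sketch of a fail-fast guard (the function name and env values are my assumptions, not from any particular codebase):

```typescript
// Hypothetical boot-time guard: refuse to start if the Stripe key prefix
// does not match the deploy target, so a pk_test_ key can never reach
// production unnoticed.
function assertStripeKeyMatchesEnv(env: string, stripeKey: string): void {
  const expectedPrefix = env === "production" ? "pk_live_" : "pk_test_";
  if (!stripeKey.startsWith(expectedPrefix)) {
    throw new Error(
      `Stripe key for "${env}" must start with "${expectedPrefix}"`
    );
  }
}

assertStripeKeyMatchesEnv("staging", "pk_test_abc123"); // passes silently
// assertStripeKeyMatchesEnv("production", "pk_test_abc123"); // would throw at boot
```

Run this once at startup, before the server accepts traffic. It turns a three-hour silent checkout outage into a deploy that fails in seconds.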
Failure pattern 3 — No error monitoring
Here is how a founder discovers a broken feature without error monitoring. A user emails: "Hey, the export button hasn't worked for the last two days. Love the product otherwise." The founder replies: "Thanks for letting us know — we'll look into it immediately."
What the founder does not know: that user is one in twenty who experienced the problem. The other nineteen left quietly. They did not email. They just stopped using the export feature, assumed the product was unreliable, and started looking for alternatives. By the time the founder knows, the feature has been broken for 47 hours. No one is tracking the revenue impact. No one knows when it started.
Now compare that to what Sentry tells you within 60 seconds of the first error occurring:
- The exact line of code: `src/components/ExportButton.tsx:47`
- The exact error message: `TypeError: Cannot read property 'map' of undefined`
- The browser and device: Chrome 122 on macOS, 2560x1440
- The user's email address (if logged in)
- How many times the error has occurred: 14 times, 9 unique users
A senior engineer looking at that alert can reproduce and fix most errors within 20 minutes. The same engineer without Sentry spends hours unable to reproduce an intermittent failure.
Here is the part that founders do not believe until they see it. Sentry takes 25 minutes to install in a Next.js or Node.js application:
```bash
npm install @sentry/nextjs
npx @sentry/wizard -i nextjs
```
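The wizard scaffolds the config files for you. For a sense of how little is involved, this is roughly what the generated client config looks like — the DSN variable name and sample rate here are placeholders, not prescribed values:

```typescript
// sentry.client.config.ts — a sketch of the file the wizard generates.
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  // Your project DSN, from the Sentry dashboard (placeholder env var name).
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  // Sample a fraction of performance traces to stay inside the free tier.
  tracesSampleRate: 0.1,
});
```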
The free tier covers 5,000 errors per month — enough for any startup under 10,000 active users. The paid tier starts at $26 per month. There is no technical reason not to have it. I have audited twelve stacks. Eight of them had no error monitoring. The founders all said the same thing: "We rely on users to tell us when something is broken." Users do not tell you. They leave.
Failure pattern 4 — The free tier that sleeps
Heroku Eco dynos and Render free tier web services spin down after a period of inactivity (roughly 30 minutes on Heroku, 15 on Render). No traffic for that long? The server goes to sleep. This is not a bug. It is the feature that makes the free tier free.
When a user hits the application after a period of inactivity, the server has to start up before it can respond. This cold start takes between 10 and 30 seconds. During those seconds, the browser shows a blank screen. No loading spinner. No progress bar. Just white. The user assumes the site is broken. Most leave before the server has finished starting.
A founder demos their product to a potential investor. They open their laptop. They navigate to their own URL. They have not visited the site themselves in the last 45 minutes because they have been preparing for this demo. The site takes 22 seconds to load. The investor sees a blank screen for 22 seconds. The founder refreshes. Another 22 seconds. The investor's first impression of the product's reliability is a 22-second blank screen. The signal sent is not about the product. It is about the team.
This is fixable for $7 per month. A Heroku Basic dyno or a Render paid web service does not sleep. The site loads in under one second every single time. The cost of not fixing it is incalculable. That investor meeting does not happen again. That user who saw a blank screen does not come back.
Here is the alternative if you genuinely cannot afford the paid tier. UptimeRobot on a free plan. Configure it to ping the server every 5 minutes: https://yourapp.com/health. The cold start never happens. This is not a production-grade solution. But it costs $0 and takes 4 minutes to set up. I have audited twelve stacks. Four of them were running on free tier dynos with no keep-alive mechanism. Every single founder said the same thing: "I didn't know it went to sleep."
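The ping target should be a route that returns instantly and touches nothing expensive. A minimal sketch of such an endpoint — the `/health` path and plain-Node wiring are assumptions; any framework's equivalent works:

```typescript
// Minimal /health endpoint for an uptime-monitor keep-alive ping.
import { createServer } from "node:http";

// Pure response logic, kept separate from the server so it is easy to test.
function healthResponse(path: string | undefined): { status: number; body: string } {
  if (path === "/health") {
    // Cheap on purpose: no database query, no auth, no rendering.
    return { status: 200, body: JSON.stringify({ ok: true }) };
  }
  return { status: 404, body: JSON.stringify({ error: "not found" }) };
}

const server = createServer((req, res) => {
  const { status, body } = healthResponse(req.url);
  res.writeHead(status, { "Content-Type": "application/json" }).end(body);
});

// server.listen(3000); // uncomment to run; the monitor pings GET /health
```

Keep the handler database-free: if the keep-alive ping itself opens a connection, it quietly eats into the same pool described in failure pattern 1.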
The pattern across all twelve audits
None of these failures were caused by bad engineering. Not one.
The Supabase free tier is the right choice for day one. The missing staging environment was a pragmatic trade-off when the team was two people and zero customers. The Heroku free tier that sleeps was a smart way to keep costs at zero during the three months before launch. These were rational decisions made by competent engineers who were optimizing for the constraints they had at the time.
The problem is that those constraints change. At zero users, a free tier connection pool of 60 is infinite. At 500 users, it is a wall. At zero users, a hardcoded API key in a private repository feels safe. At 500 users, that key has been cloned onto six laptops, three of which are no longer managed by the company. The gap between "the right choice for day one" and "the wrong choice for launch day" is rarely filled because no one is watching.
Here is the specific moment when these gaps become expensive:
- The launch: Product Hunt goes live. Two hundred people click through. The database connection pool exhausts at request 61. The remaining 140 users see a 500 error. The launch slot passes. There is no second first impression.
- The investor demo: The founder opens the laptop. The Render free tier has been idle for an hour. Cold start takes 22 seconds. The investor sees a blank screen. The signal sent is not about the product. It is about the team.
- The first enterprise prospect: The security questionnaire arrives. Question 17: "Do you store secrets in your source code repositories?" The founder hesitates. The answer is yes. The deal stalls.
A one-day audit run before any of these moments costs £1,500. Here is what happens in that day: a TruffleHog scan of the repository history (5 minutes), a Supabase project plan review, staging environment verification (a `pg_dump --schema-only` comparison), an error monitoring check, and a cold start audit (`curl -w "Time: %{time_total}s" https://yourapp.com/health`). The output is a two-page document: five specific gaps, the exact command to verify each one, and the fix time measured in minutes.
A launch failure costs more than £1,500. A failed investor demo costs more than £1,500. A stalled enterprise deal costs multiples of £1,500. The audit is not insurance. Insurance pays out after the loss. The audit prevents the loss from happening at all.
The thing that breaks before 1,000 users is almost never the code. The thing that fixes it is almost never more code. It is a one-day audit, a toggle in a dashboard, a port number change, a $7/month dyno, and a secret scanner that runs for five minutes.
Twelve startups. Every single one had at least three of these gaps. Every single one fixed them in less time than they spent reading this post.
If your stack is on this list, I run a one-day audit. Link in bio.