Stumbled across this piece about testnet vs mainnet the other day, and it got me thinking about all the times production has kicked my ass over the last decade.
So here's my version of that story. With more scars and fewer corporate platitudes.
Testnet is a comfortable lie we tell ourselves
You know what testnet is? It's a gym with rubber weights. You go through all the motions, feel strong, pat yourself on the back. Then mainnet hands you actual iron and you realize you've been fooling yourself.
I've shipped twenty-odd projects to mainnet over the years. Every single time, there's this moment — usually around 2-3 AM, usually a week after launch — where something breaks in a way testnet never warned you about.
Like that time in 2019 when our "perfectly tested" state management system cascaded into failure because ONE user decided to refresh their browser mid-transaction. Who tests for that? Nobody. Who does it in production? Everyone.
Testnet proves your code can work. Mainnet proves whether you can sleep at night.
The monitoring trap (or: how I learned to stop worrying and embrace alerts)
Here's what nobody tells you about monitoring: you will get it wrong the first time.
You'll either build a system that alerts for everything — every tiny hiccup triggers a notification, your phone becomes a vibrating brick of anxiety — or you'll tune it so conservatively that by the time it alerts, the system's already been dead for six hours.
I've done both. Multiple times. Still haven't perfected it.
At ETHGlobal Paris, met this team drowning in alerts. Hundreds per day. They'd trained themselves to ignore the noise. Classic boy-who-cried-wolf. Two weeks later, their database filled up and nobody noticed because the alert got lost in the spam.
The sweet spot? About one false alarm per week. Annoying enough to keep you sharp, rare enough to take seriously.
The 3 AM test: if your monitoring requires you to think clearly at 3 AM, you've already lost. Everything critical needs to be obvious. Red means bad. Green means good. Yellow means "check this tomorrow morning."
Think of monitoring like this: testnet monitoring is your friend telling you your fly is down. Mainnet monitoring is a very paranoid smoke detector that also checks for carbon monoxide, gas leaks, and suspicious activity three blocks away.
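The 3 AM test can be made literal in code: every raw metric collapses to one of exactly three states, so the half-asleep human on call never has to interpret a dashboard. Here's a minimal sketch; the metric names and thresholds are invented for illustration, not from any real system.

```python
# Minimal sketch of the "3 AM test": every metric reading collapses to
# one of three states a sleep-deprived human can act on.
# RED    -> page someone now
# YELLOW -> file it, look tomorrow morning
# GREEN  -> do nothing
# Metric names and thresholds below are hypothetical.

THRESHOLDS = {
    # metric name:      (red_at, yellow_at)
    "disk_used_pct":    (95.0, 80.0),
    "error_rate_pct":   (5.0, 1.0),
    "p99_latency_ms":   (2000.0, 500.0),
}

def triage(metric: str, value: float) -> str:
    """Map a raw metric reading to RED / YELLOW / GREEN."""
    red, yellow = THRESHOLDS[metric]
    if value >= red:
        return "RED"
    if value >= yellow:
        return "YELLOW"
    return "GREEN"

print(triage("disk_used_pct", 97.0))    # page now
print(triage("error_rate_pct", 2.5))    # check tomorrow
print(triage("p99_latency_ms", 120.0))  # all quiet
```

The point isn't the thresholds, it's the shape: three outcomes, zero interpretation required. Tuning the yellow band is where you chase that one-false-alarm-per-week sweet spot.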
Security is a process, not a certificate
Let's talk about the elephant in the room: security audits.
Everyone treats them like a magic shield. Get your audit, slap the badge on your website, job done. Sleep easy.
Then six months later, you add a feature. Or a dependency updates. Or someone leaves the team and their keys are still active. Or (true story) someone commits credentials to a public GitHub repo at a hackathon because they were rushing.
I consulted for a DeFi protocol in 2022. Beautiful audit report. Top firm. Cost them $50K. Made them feel invincible.
Then they added ONE new feature. Without re-auditing. That feature had a reentrancy bug. Lost $2.3M in about 45 minutes.
The audit wasn't wrong. The audit was a snapshot. Security isn't a moment — it's like going to the gym. One workout doesn't make you fit. One audit doesn't make you secure.
What you actually need:
Automated scanning that runs daily
Key rotation you've ACTUALLY practiced (not just documented in Notion)
Incident response that fits on one page (if it's longer, nobody reads it at 3 AM)
Someone actively trying to break your stuff on a regular schedule
Security is boring operational work. Not a certificate on your wall.
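"Key rotation you've actually practiced" is partly automatable: a boring daily job that flags any credential older than your rotation window. A sketch of the idea, where the key names, dates, and the 90-day policy are all invented for illustration:

```python
from datetime import date, timedelta

# Hypothetical daily job: flag credentials past their rotation window.
# Key names, dates, and the 90-day policy are invented for illustration.
ROTATION_WINDOW = timedelta(days=90)

keys = {
    "deployer-hot-wallet": date(2024, 1, 10),  # last rotated
    "rpc-provider-token":  date(2024, 6, 1),
    "ci-signing-key":      date(2023, 11, 2),
}

def stale_keys(keys: dict, today: date) -> list:
    """Return names of keys whose last rotation is older than the window."""
    return sorted(
        name for name, rotated in keys.items()
        if today - rotated > ROTATION_WINDOW
    )

print(stale_keys(keys, today=date(2024, 7, 1)))
# ['ci-signing-key', 'deployer-hot-wallet']
```

In a real setup this would page someone instead of printing, and it would cover ex-teammates' access too. The job is trivial; actually running it every day is the part most teams skip.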
Users are chaos engines wrapped in unpredictability
Your test cases are rational. Beautiful. Logical.
Users? Users are beautiful chaos.
Real examples from my hall of shame:
The penny spammer: User who sent 0.000000001 ETH over and over to "test the system." Generated 10,000 transactions. Our indexer couldn't handle it. Crashed spectacularly.
The emoji enthusiast: Put emojis in a transaction memo field. Our parser assumed ASCII. In 2023. Yeah, I know. Don't @ me.
The impatient clicker: Clicked "send" twice because the first click felt slow. Created duplicate transactions. Our deduplication logic didn't exist because "who would do that?" Turns out: everyone.
The creative networker: Found an RPC endpoint we used for fallback. Shared it publicly as a "free alternative." We got rate-limited by our own backup provider.
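The impatient clicker has a standard fix: idempotency keys. The client attaches a key to each logical action; the server executes it once and replays the stored result on every retry. A minimal in-memory sketch (a real service would persist the key store, and the tx-submission line is a stand-in, not a real chain call):

```python
import uuid

# Minimal idempotency sketch: the second click with the same key replays
# the first result instead of creating a duplicate transaction.
_results = {}  # idempotency_key -> tx id (in-memory stand-in for a real store)

def submit_tx(idempotency_key: str, payload: dict) -> str:
    """Execute the transaction once per key; replay the result on retries."""
    if idempotency_key in _results:
        return _results[idempotency_key]      # duplicate click: no new tx
    tx_id = f"tx-{uuid.uuid4().hex[:8]}"      # stand-in for real submission
    _results[idempotency_key] = tx_id
    return tx_id

key = "click-123"  # generated client-side, one per logical action
first = submit_tx(key, {"to": "0xabc", "amount": 1})
second = submit_tx(key, {"to": "0xabc", "amount": 1})  # impatient second click
print(first == second)  # True: one transaction, not two
```

The dedup logic we didn't have was about ten lines. The cleanup after the duplicates was not.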
Mainnet taught me something brutal: assume users will do the MOST unexpected thing at the WORST possible time.
External dependencies? Same story. That API endpoint that was fast and reliable during testing? Slows to a crawl during mainnet peak hours. Has undocumented rate limits that kick in randomly. Goes down for "scheduled maintenance" without warning.
Build for chaos. Assume everything external will fail. Because it will.
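"Assume everything external will fail" translates into three mechanical habits: a timeout on every call, bounded retries with backoff, and an explicit fallback. A sketch, with a fake flaky function standing in for a real RPC provider:

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.1, fallback=None):
    """Call fn with bounded retries and exponential backoff, then fall back."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i < attempts - 1:
                time.sleep(base_delay * (2 ** i))  # 0.1s, 0.2s, ...
    if fallback is not None:
        return fallback()  # e.g. a secondary RPC provider
    raise RuntimeError("all providers failed")

# Fake primary that dies twice before succeeding, standing in for a real RPC.
calls = {"n": 0}
def flaky_primary():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "block 19000000"

print(call_with_retries(flaky_primary))  # succeeds on the third attempt
```

Note the bounded retries: per the creative-networker story above, your fallback provider can get rate-limited too, so unbounded retry loops just turn one outage into two.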
Technical debt is just financial debt with extra steps
Every "we'll fix this later" comment in your codebase? That's a credit card with 30% interest.
That manual deployment step you keep meaning to automate? Costs engineer time every single release.
That hardcoded config value? Blocks every scaling attempt.
That "temporary" workaround from testnet? Still there two years later, preventing upgrades.
Met a team at DevCon who couldn't upgrade their Solidity compiler because they'd taken too many shortcuts. Stuck on 0.6.x while everyone else moved to 0.8.x. Every new feature required increasingly creative (read: hacky) workarounds.
Eventually? Complete rewrite. Six months. $400K.
The shortcuts that saved them two weeks in development cost them half a year in production.
Here's the thing about mainnet: it makes your technical debt visible. And expensive. And urgent.
The stuff I wish someone told me at my first hackathon
Mainnet is not the finish line. It's the starting line.
Testnet is practice. Mainnet is the game. And the game never ends.
You don't "ship to mainnet" and pop champagne. You ship to mainnet and start the marathon of:
Daily monitoring
Constant maintenance
Security updates
User support
Scaling challenges
Operational firefighting
Every. Single. Day.
The teams that win understand this from day one. They build systems that tired humans can operate at weird hours.
The teams that struggle? They think shipping is success. Then production teaches them otherwise.
Another thing: simplicity beats cleverness in production.
That elegant recursive solution that got you invited to a podcast? It'll keep you up at night when it breaks.
That boring, flat, predictable code? You'll sleep like a baby.
I've given talks about clever architectures. I sleep well because of boring code.
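"Boring beats clever" gets concrete fast. Here's a toy version of the trade: a compact recursive sum versus a plain loop. Both give the same answer on small input; the recursive one hits CPython's recursion limit (about 1000 frames by default) the first time someone feeds it a long list.

```python
# Clever: compact head/tail recursion. Reads nicely, dies on long input.
def clever_sum(xs):
    return 0 if not xs else xs[0] + clever_sum(xs[1:])

# Boring: a loop. Nothing to admire, nothing to page you at 3 AM.
def boring_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

small = [1, 2, 3]
print(clever_sum(small) == boring_sum(small))  # True: same answer

big = list(range(10_000))
print(boring_sum(big))  # 49995000: the loop doesn't care about length
try:
    clever_sum(big)
except RecursionError:
    print("clever_sum hit the recursion limit")  # the 3 AM surprise
```

A toy example, sure, but the pattern scales: the clever version's failure mode only shows up under production-sized input, which is exactly when you least want to learn about it.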
What actually matters (a checklist from the trenches)
Before you go to mainnet, you need:
Automation that actually works:
Zero manual deployment steps
Recovery procedures you've tested (at 3 AM, seriously)
Rollback that doesn't require three people in three timezones
Monitoring that warns, not eulogizes:
Alerts BEFORE things break, not after
Signal-to-noise ratio that doesn't drive you insane
Clear escalation paths
Security that's operational:
Key rotation you've practiced
Access control that survives team changes
Continuous scanning, not one-time audits
Processes that handle chaos:
Incident response on one page
On-call rotation (you need sleep)
Communication plan for when shit hits the fan
Missing more than two? You're not ready for mainnet. You're ready for expensive lessons.
The mindset shift nobody talks about
Testnet optimizes for demos. For hackathon judges. For investor pitches.
Mainnet optimizes for responsibility.
Testnet lets you be clever, creative, experimental.
Mainnet demands you be reliable, boring, predictable.
Testnet forgives mistakes. Mainnet remembers them forever. With screenshots. And angry users.
When building for testnet, you're building for yourself.
When building for mainnet, you're building for users who don't care about your elegant architecture, don't understand your optimizations, and just want it to work.
Every. Single. Time.
My production philosophy (after 10+ years of 3 AM pages)
Build systems you can operate when you're stupid.
Because you WILL be stupid. At 3 AM. After two beers at a conference. During a vacation. When you're sick.
If your system requires you to be smart to keep it running, you've built it wrong.
Make recovery boring. Make deployment boring. Make operations boring.
Save your creativity for features. Your operational infrastructure should be so predictable it's almost embarrassing.
I've won awards for creative solutions. I sleep well because my infrastructure is boring as hell.
The bottom line
After a decade of shipping to mainnet, surviving hacks, handling incidents, and learning expensive lessons:
Testnet success feels good.
Mainnet stability feels better.
Sleeping through the night because your monitoring, automation, and processes actually work? That's the real victory.
Build for that.
Stay decentralized, friends.
— Alex "ChainBreaker" Morrison
P.S. If you're at ETHDenver this month, I'll be the one in the Ethereum hoodie with too many war stories. Find me, I'll share the time our monitoring system went down while monitoring the monitoring system. Yes, really.