Everyone in tech is making the same bold claim right now: "AI is 10x-ing our team." We've heard it at conferences, read it in Medium posts, and had it pitched to us by vendors. So at Gerus-lab, we decided to stop taking other people's word for it and run our own experiment.
For six months, we systematically integrated AI coding tools — Copilot, Cursor, Claude — into our workflow across Web3, AI SaaS, and GameFi projects. We tracked metrics, debated internally, and came to conclusions that surprised even us.
This is the honest post-mortem.
The Setup: What We Measured
We did not want vibes — we wanted data. So we tracked:
- Lines of meaningful code per developer per day (not raw output — we filtered boilerplate)
- Bug rate per feature shipped
- Time to first working prototype for new client projects
- Developer satisfaction (weekly 1-10 scores)
- Client revision cycles — how many rounds of changes before sign-off
Our team at Gerus-lab builds across a wide stack: Solana and TON smart contracts, AI-powered SaaS backends, mobile GameFi apps. Not a homogeneous codebase — which made this a real stress test.
What Went Better Than Expected
Boilerplate Is Gone. Forever.
The single biggest win was eliminating the cognitive tax of boilerplate. Setting up a new Next.js project with authentication, database schema, and API structure used to eat 2-3 days. Now it is a morning.
For one of our GameFi projects, we used Claude to scaffold the entire reward system architecture in a day — something that would have taken a senior dev a full sprint to design and implement from scratch. The AI did not get the game mechanics right immediately, but it gave us 80% of the structure to argue about and refine.
That is the key insight we keep returning to: AI is incredible at giving you something to react to.
Smart Contract Auditing Got Cheaper
On the Web3 side, we have been using AI to do preliminary audits of Solana programs before sending them to external auditors. It catches the obvious stuff — reentrancy risks, integer overflow patterns, missing signer checks.
This alone saved us roughly $8,000 in audit costs over six months by catching fixable issues before they reached the professional auditors' billable hours.
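To make "missing signer checks" concrete, here is a minimal, hypothetical sketch of the bug class. Real Solana programs are written in Rust and check this on the program's account inputs; the TypeScript below, and every name in it, is invented purely to illustrate the pattern an AI pre-audit pass can flag:

```typescript
// Hypothetical sketch of a missing-signer-check bug. The Account shape is
// simplified for illustration; this is not real Solana SDK code.
interface Account {
  pubkey: string;
  isSigner: boolean; // true only if this key actually signed the transaction
}

// Vulnerable: verifies the *identity* of the authority but never that the
// authority signed. Public keys are public, so anyone who knows the owner's
// key can call this and drain the balance.
function withdrawVulnerable(vaultOwner: string, authority: Account, balance: number, amount: number): number {
  if (authority.pubkey !== vaultOwner) throw new Error("wrong authority");
  return balance - amount;
}

// Fixed: also require that the authority is a transaction signer.
function withdrawChecked(vaultOwner: string, authority: Account, balance: number, amount: number): number {
  if (authority.pubkey !== vaultOwner) throw new Error("wrong authority");
  if (!authority.isSigner) throw new Error("authority must sign");
  return balance - amount;
}
```

The vulnerable version passes any unit test that happens to construct the authority with `isSigner: true`, which is exactly why this class of bug survives basic testing.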
Junior Devs Leveled Up Faster
Here is something we did not expect: AI tools dramatically compressed the learning curve for junior developers. Instead of a junior spending three days stuck on an async bug, they could get unstuck in an hour — and then actually understand why the fix worked.
Our two junior hires from the past year are performing at a level it would normally take 18 months to reach.
What Went Worse Than Expected
AI Confidently Writes Broken Web3 Code
This is the part nobody puts in their blog posts: AI tools are genuinely dangerous in specialized domains.
We had a near-miss with a TON smart contract where Copilot generated code that looked syntactically perfect but had a logical flaw in the fee calculation. It would have passed basic testing. A senior dev caught it in code review because he recognized the pattern from a real exploit he had read about.
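As a hedged illustration of that class of bug (a reconstruction of the shape of the flaw, not the actual contract code), consider integer division applied before the fee rate. It agrees with the correct formula on round test amounts but silently truncates to zero on small ones:

```typescript
// Hypothetical reconstruction of the bug class, not our actual TON contract.
// Fees in basis points: 30 bps = 0.30%. bigint mimics on-chain integer math.
const FEE_BPS = 30n;
const BPS_DENOM = 10_000n;

// Looks syntactically perfect: divide first, then multiply. Truncating
// integer division discards the remainder *before* the rate is applied,
// so small transfers pay zero fee.
function feeBuggy(amount: bigint): bigint {
  return (amount / BPS_DENOM) * FEE_BPS;
}

// Correct order of operations: multiply first, divide last.
function feeCorrect(amount: bigint): bigint {
  return (amount * FEE_BPS) / BPS_DENOM;
}
```

On a round test amount like 1,000,000 both functions return 3,000, so a basic unit test passes; on 9,999 the buggy version charges 0 while the correct one charges 29. That kind of gap is what the code review caught.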
General-purpose AI models are trained on general-purpose code. Solana's ownership model, TON's actor-based architecture, the EVM's quirks — the training data is thin, and the models fill the gaps with confident hallucinations.
Rule we implemented: No AI-generated smart contract code ships without a human senior review. Full stop.
The 40% Number Is Misleading
Here is the uncomfortable truth about productivity claims: the 40% speed-up we measured in raw code generation was partially offset by new costs:
- Prompt engineering time — learning to get good output takes real skill
- Review overhead — AI code needs more careful review, not less, because it looks so plausible
- Debugging AI mistakes — fixing an AI's confidently wrong answer can take longer than writing the code correctly the first time
Net productivity gain for our team: closer to 20-25% on standard web development work. Still significant. Just not the 10x the hype promises.
Client Communication Did Not Get Faster
We tried using AI to draft client-facing reports and technical specs. The output was technically accurate. Completely soulless.
Clients noticed. One actually asked if we had changed our communication style. We had — and not in a good direction. We pulled back and kept client communication human.
The Honest Breakdown by Project Type
- SaaS web apps — +35% productivity gain, low risk. Use heavily.
- Mobile apps — +30% gain, low risk. Use heavily.
- Smart contracts — +15% gain, HIGH risk. Use cautiously with mandatory human review.
- AI/ML pipelines — +10% gain, medium risk. Use for scaffolding only.
- Client communication — negative impact. Keep human.
For our AI SaaS projects, AI tools are now a genuine multiplier. For blockchain work, they are a junior assistant with a tendency to hallucinate at the worst possible moments.
What This Means for Hiring
We have not reduced headcount. We have changed what we hire for.
We are less interested in developers who can write lots of correct syntax. We are more interested in developers who can:
- Recognize when AI output is wrong — domain knowledge, not just syntax checking
- Architect systems that AI can then help implement
- Communicate with clients and explain tradeoffs
- Debug deeply — when AI-generated code fails in production, you need someone who understands what is actually happening
The bar for senior engineers at Gerus-lab is actually higher now. We expect them to be good at the things AI cannot do.
The Bigger Picture
The developers panicking about AI replacing them are looking at the wrong threat model. The threat is not AI — it is developers who use AI effectively competing with developers who do not.
In six months of real usage, here is what we are certain about:
- AI is a real productivity multiplier for standard web development
- AI is dangerous without domain expertise as a guardrail
- The human things — architecture, judgment, client trust — matter more, not less
- The productivity gains are real but the 10x claims are mostly marketing
We are building with AI, not building with the fantasy version of AI. There is a difference.
Building something in Web3, AI, or SaaS? We have done the hard work of figuring out where AI tools actually help and where they need guardrails. Talk to us at Gerus-lab — we will build it right.