2008 had the mobile revolution. Within three years, every company needed a mobile strategy. Entire industries (taxis, hotels, banking) were rebuilt mobile-first. Not "mobile-added." Mobile-FIRST.
2026 has the AI-native revolution. But most companies are doing the equivalent of 2009 "mobile strategy," bolting a chatbot sidebar onto an existing product. A summary button. An autocomplete. That is not AI-native. That is AI-decorated.
I built a 29-module ERP system to test a thesis: when AI becomes the primary implementation layer, the entire SaaS model changes. Not just how code gets written, but how software gets architected, priced, delivered, and maintained. The system covers general ledger, inventory, manufacturing, payroll, CRM, AI analytics, and four regional compliance overlays.
612 actions. 191 database tables. 1,839 automated tests. Running on a $20/month server.
This article is the playbook for what I learned. ERPClaw is the proof, not the point.
1. Why the Current SaaS Model Is Fragile
The economics of traditional SaaS are straightforward. Hire engineers at $150K-$250K per year, build for 18 months, charge per-seat to recover costs. A typical ERP vendor charges $10K-$50K annually because that is what the model costs to run: 200-person engineering teams, multi-year roadmaps, legacy code maintenance, sales teams, and implementation consultants.
The pricing is a function of build cost, not value delivered. A 40-person manufacturing shop pays $50K per year for SAP but uses perhaps 15% of the features. They need inventory, purchasing, invoicing, and payroll. They are paying for multi-currency consolidation across 40 subsidiaries, a feature they will never touch.
Now ask the question that should keep SaaS executives awake at night: what happens when the build cost drops 10x?
The pricing model does not decline gracefully. It collapses. Not because AI products are better, but because the cost of building equivalent software approaches zero. This is the Kodak parallel. Kodak did not fail because digital cameras took better photos. They failed because the cost of taking a photo went to zero, which destroyed the business model of selling film. SaaS incumbents face the same dynamic. Open-source alternatives built at AI speed will make per-seat pricing indefensible for commodity software.
ERP, CRM, project management, HR, invoicing: these are commodity problems with well-understood business rules. The code is not the moat. It never was. The moat was the cost of writing it.
That moat is gone.
2. The AI-Native Playbook
These seven principles are not specific to ERP. They are the architectural patterns that work when AI is your primary implementation tool. I learned them by building ERPClaw, a 29-module ERP with 612 actions, but they apply equally to healthcare scheduling, logistics management, property management, or any domain-specific SaaS.
Principle 1: Spec-First, Not Code-First
AI is an implementation tool, not an architect. It will happily build you a GL posting function that uses floating point arithmetic. Your trial balance will be off by $0.01 after a thousand transactions, and nobody will notice until the auditor does.
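A minimal sketch of that failure mode, with hypothetical amounts. The point is the representation: binary floats cannot represent most decimal amounts exactly, so a ledger that should net to zero quietly drifts. The fix, exact decimal arithmetic, has to be mandated in the spec before the AI writes a line.

```python
from decimal import Decimal

# Debits and credits that should net to exactly zero.
float_entries = [0.10, 0.20, -0.30] * 1000
float_net = sum(float_entries)      # binary floats accumulate representation error

decimal_entries = [Decimal("0.10"), Decimal("0.20"), Decimal("-0.30")] * 1000
decimal_net = sum(decimal_entries)  # exact decimal arithmetic

print(float_net == 0)    # False: the books no longer balance
print(decimal_net == 0)  # True
```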
I spent the first full day writing a master specification. No Python. No SQL. Just a document: 33 sections, 9,766 lines. Every table (191 of them), every action (612), every naming convention, every validation rule, every test scenario defined before a single line of code existed.
I studied ERPNext's source code to understand real-world edge cases: double-entry reversals, FIFO stock valuation, partial payment allocation, multi-currency revaluation. That domain research went into the spec, not into prompts.
The plan quality directly determines the output quality. A well-specified action produces working code on the first generation 90% of the time. An underspecified action produces plausible code that fails on edge cases 90% of the time.
The rule: Spend 20% of your time on the specification. It saves 80% of debugging.
Principle 2: Metadata-Driven Everything
In ERPClaw, every skill has a SKILL.md file. That one file serves four audiences simultaneously:
- AI instruction: tells the AI what actions exist and how to execute them
- API documentation: defines parameters, types, and return values
- Web form specification: auto-generates the web UI from the same metadata
- User manual: progressive disclosure (basic, intermediate, advanced)
One file. Four surfaces. When I built the web dashboard (Webclaw), I wrote a UI.yaml auto-generator that scanned 24 skills and produced form specifications directly from SKILL.md metadata. All 612 actions became accessible through a web UI with zero per-action custom code. A 4,651-check validation suite verified every auto-generated form.
A traditional ERP frontend needs roughly 150 lines of form code per action. Across 612 actions, that is approximately 92,000 lines. The metadata-driven approach replaced all of it with one generic form renderer and 24 YAML files.
The rule: If your metadata serves only one purpose (API docs OR UI OR AI instructions), you are maintaining the same information in multiple places. Design one source of truth that drives all surfaces.
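The article does not reproduce the SKILL.md format itself, so here is a hypothetical sketch of the pattern: one action definition (names, types, and layout are invented for illustration) driving both API-side parameter validation and an auto-generated web form spec.

```python
# Hypothetical action metadata — the real SKILL.md format is richer,
# but the pattern is the same: one definition, many surfaces.
CREATE_INVOICE = {
    "action": "create_invoice",
    "params": [
        {"name": "customer", "type": "str", "required": True},
        {"name": "amount", "type": "float", "required": True},
        {"name": "due_days", "type": "int", "required": False, "default": 30},
    ],
}

def validate(meta, payload):
    """API surface: check a request payload against the metadata."""
    errors = []
    for p in meta["params"]:
        if p["required"] and p["name"] not in payload:
            errors.append(f"missing required param: {p['name']}")
    return errors

def form_spec(meta):
    """UI surface: derive web form fields from the same metadata."""
    widget = {"str": "text", "float": "number", "int": "number"}
    return [
        {"label": p["name"].replace("_", " ").title(),
         "widget": widget[p["type"]],
         "required": p["required"]}
        for p in meta["params"]
    ]

print(validate(CREATE_INVOICE, {"customer": "Acme"}))
# ['missing required param: amount']
print(form_spec(CREATE_INVOICE)[0])
# {'label': 'Customer', 'widget': 'text', 'required': True}
```

Change the metadata and both surfaces update together; there is no second copy to fall out of sync.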
Principle 3: Modular by Default
Not microservices. The operational overhead of service mesh, distributed transactions, and container orchestration is unjustifiable at AI development speed. Not a monolith either; too coupled for independent evolution. The middle ground is independent modules with clear ownership boundaries sharing a single database.
ERPClaw's 29 skills each have their own repo, their own tests, their own SKILL.md. They share one SQLite database (191 tables, 535 indexes). The ownership rule is simple: only the owning skill can WRITE to its tables, while any skill can READ any table. Cross-skill writes happen via subprocess calls. A shared library of 16 modules provides common plumbing: GL posting, stock posting, tax calculation, naming, and encryption.
One database file. One server. Zero DBA. Zero network hops between modules.
The rule: Choose the simplest architecture that supports independent module evolution. For most SaaS products, that is a shared database with clear ownership boundaries, not microservices.
Principle 4: Test Pyramid with Systemic Invariants
AI-generated code passes spot checks but fails systemic properties. Traditional unit tests are necessary but not sufficient. You need invariant checks: properties that must hold across the entire system after every change.
ERPClaw has 18 accounting invariant checks that run automatically after every test touching the general ledger:
- Total debits equal total credits across all GL entries
- Balance sheet equation holds (Assets = Liabilities + Equity)
- GL chain hash integrity (SHA-256 sequential hashing)
- Every submitted voucher has at least 2 GL entries (double-entry enforcement)
- No NaN or Infinity values in any financial column
- Cancelled vouchers have matching reversals with swapped debit/credit
If any single invariant fails, every GL-touching test in that run fails. You cannot accidentally break double-entry bookkeeping and have green tests.
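Three of those invariants, sketched against a hypothetical entry layout (field names and the hash input format are assumptions for illustration; amounts are ints for brevity, where the real checks would run on exact decimals):

```python
import hashlib
import math

def check_debits_equal_credits(entries):
    """Invariant: total debits equal total credits across all GL entries."""
    return sum(e["debit"] for e in entries) == sum(e["credit"] for e in entries)

def check_no_nan_or_inf(entries):
    """Invariant: no NaN or Infinity in any financial column."""
    return all(math.isfinite(e["debit"]) and math.isfinite(e["credit"])
               for e in entries)

def check_chain_hash(entries):
    """Invariant: each entry's hash covers the previous hash, so editing
    any historical entry invalidates every hash after it."""
    prev = ""
    for e in entries:
        expected = hashlib.sha256(
            f"{prev}|{e['account']}|{e['debit']}|{e['credit']}".encode()
        ).hexdigest()
        if e["hash"] != expected:
            return False
        prev = expected
    return True
```

Wired into a shared pytest fixture's teardown, every GL-touching test inherits these checks without opting in.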
On top of the invariants: 1,530 pytest tests, 168 Playwright browser E2E tests, 60 Telegram E2E tests on the production server. Total: 1,839 automated tests.
The rule: Define the systemic properties your software must maintain. Test those properties after every change, not just the individual functions.
Principle 5: Clean-Install as a Gate
"Works on my machine" is the number one failure mode of AI-assisted development. The AI optimises for the current environment. It does not think about first-time setup on a blank server.
I did a full server wipe: deleted all 30 skills, both databases, the shared library. Then reinstalled from published packages. It broke immediately.
- Stale user sessions persisting across database wipes because the web UI had its own session database that was never cleaned
- 10 missing tables in the publish schema, out of sync with the development schema
- Seed data creating UNIQUE constraint collisions across regional skills
- `$(whoami)` returning "root" under sudo, causing services to launch as the wrong user
None of these were caught by 1,530 unit tests. All of them would have hit every single new user.
I ran three full clean-install rounds before it was stable. 49 E2E tests across five phases.
The rule: If it does not work on a blank server, it does not work. Gate every release on a clean-install test.
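The gate itself can be a small script. This is a minimal sketch, not the actual ERPClaw gate: a fresh temporary directory stands in for the blank server, and every install step must succeed from scratch or the release fails.

```python
import subprocess
import sys
import tempfile

def clean_install_gate(steps):
    """Run install steps in a fresh directory; return the first failure, or None.

    `steps` is a list of argv lists — e.g. package installs followed by a
    smoke test. In a real gate the environment would be a wiped server or
    a fresh container, not just an empty directory.
    """
    with tempfile.TemporaryDirectory() as blank:
        for argv in steps:
            result = subprocess.run(argv, cwd=blank, capture_output=True, text=True)
            if result.returncode != 0:
                return {"step": argv, "stderr": result.stderr.strip()}
    return None

# Hypothetical usage: trivial placeholder steps standing in for real installs.
failure = clean_install_gate([
    [sys.executable, "-c", "print('install ok')"],
    [sys.executable, "-c", "import sqlite3"],  # smoke test: dependency importable
])
print("gate passed" if failure is None else f"gate FAILED at {failure['step']}")
```

The important property is that the gate starts from nothing: no cached sessions, no leftover schema, no environment the AI silently depended on.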
Principle 6: Security Audit the Output, Not the Process
AI does not think about what should NOT be in production. It generates functionally correct code that leaks development context everywhere.
I ran a security audit across 30 published packages, roughly 220 files. 21 findings: 3 HIGH, 2 MEDIUM, 7 LOW, 9 Open Source Readiness issues.
The HIGH findings were embarrassing in their simplicity: my local dev repository path hardcoded in a meta-package, .DS_Store files included in published packages, systemd configs with real server paths. The MEDIUM findings were worse. A real Indian taxpayer ID (GSTIN) was embedded in test seed data. Development paths appeared in user-facing error messages.
The AI-generated code passed every test. It also shipped my home directory path and a real person's tax ID.
The rule: Treat AI output like code from a brilliant but careless junior developer. Review for what is present that should not be, not just whether the logic is correct.
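Much of that review can be automated with a leakage scan over the published artifacts. A hedged sketch, with patterns modelled on the findings above (the GSTIN regex is simplified, and real audits would cover far more than three patterns):

```python
import re
from pathlib import Path

# Hypothetical leakage patterns, based on the classes of findings described above.
LEAK_PATTERNS = {
    "home directory path": re.compile(r"/(?:home|Users)/\w+"),
    "macOS artifact": re.compile(r"\.DS_Store"),
    # Simplified GSTIN shape: 2-digit state code, PAN, entity code, 'Z', checksum.
    "Indian GSTIN": re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][0-9A-Z]Z[0-9A-Z]\b"),
}

def audit_package(root):
    """Scan published files for development context that should not ship."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for label, pattern in LEAK_PATTERNS.items():
            for match in pattern.finditer(text):
                findings.append((str(path), label, match.group()))
    return findings
```

Run against every package before publish, a scan like this catches exactly the class of finding that functional tests never will.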
Principle 7: Ship Scope, Not Features
The old SaaS playbook says ship one feature, get feedback, iterate for 18 months. The AI-native playbook says ship the entire scope and let users install what they need.
ERPClaw shipped 29 modules and 612 actions. Not because shipping fast is impressive, but because modular architecture makes it possible to ship broad scope without shipping complexity. Each skill is independent. Install erpclaw-selling without erpclaw-manufacturing. Install erpclaw-payroll without erpclaw-crm. The system degrades gracefully when optional skills are absent.
Traditional per-feature development made this impossible: no one could justify a 200-person team spending 18 months to build 29 modules up front. With AI handling implementation, the constraint becomes specification quality, not engineering hours.
The rule: When build cost drops 10x, the strategy shifts from "build fewer things better" to "build everything and let users choose." Modular architecture is the prerequisite.
3. The Proof
ERPClaw is not the point of this article. It is the evidence.
Scope: 29 skills covering general ledger, journals, payments, tax, reports, inventory, selling, buying, manufacturing, HR, payroll, CRM, projects, assets, quality, support, billing, AI engine, analytics, and four regional compliance overlays (India GST/TDS, Canada GST/HST/CPP/EI, UK VAT/PAYE/NI, EU 27-state VAT/OSS/SAF-T).
Architecture: Built on the OpenClaw platform. Each skill is a self-contained folder: SKILL.md metadata, scripts/db_query.py for logic, tests/ for pytest. Single SQLite database, 191 tables, 535 indexes. Shared library with 16 modules.
Web UI: Webclaw, built with FastAPI gateway plus Next.js 16 plus shadcn/ui. Auto-generated from SKILL.md metadata. 168 Playwright E2E tests.
Numbers: 612 actions. 1,839 automated tests. 18 accounting invariant checks. 33 GitHub repos, all MIT licensed. $0 software cost, $20/month server.
4. What Two Weeks Looks Like
This timeline is not the flex. It is the point. If one developer can do this, the question every SaaS company must answer is: what does your 200-person engineering team do for 18 months?
Day 1: Master plan. 9,766 lines, zero code. The highest-leverage day of the entire project.
Days 2-3: Foundation. Setup, General Ledger, Journals, Payments, Tax, Reports. Six skills. The 12-step GL validation was the hardest part.
Days 4-5: Supply chain. Inventory with FIFO valuation, Selling with the full quote-to-cash pipeline, Buying with three-way matching (PO, receipt, invoice).
Days 6-7: Operations. Manufacturing (BOMs, work orders, WIP accounting), HR, Projects, Assets, Quality.
Days 8-9: People and growth. Payroll (FICA, federal progressive tax, state tax, 401k, garnishments, W-2), CRM, Support, Billing.
Days 10-11: Intelligence and compliance. AI anomaly detection, analytics dashboards, four regional tax overlays.
Days 12-14: Testing overhaul, v2 features, clean-install testing, security audit.
Roughly two skills per day, each tested before moving on.
The answer to "what does a 200-person team do for 18 months" is, for the most part, coordination. Meetings about meetings. Sprint planning for the sprint planning. Cross-team dependency resolution. Code review chains five people deep. AI eliminates the coding bottleneck. Small teams eliminate the coordination bottleneck. Together, that is the 10x.
5. What Broke
Every failure I encountered was a systemic property, not a local bug. Traditional testing catches local bugs. AI-native development needs systemic gates.
The Clean-Install Disaster. Full server wipe, reinstall from packages. Five immediate failures: stale sessions, missing tables, seed collisions, path leakage, sudo detection. Unit tests caught zero of these. A clean-install gate would have caught all five.
The Security Audit. 21 findings across 220 files. Real taxpayer IDs in test data. Build artifacts in packages. Development paths in error messages. The code was functionally correct and contextually careless. A security review stage would have caught all 21.
The Schema Drift. Four regional skills built in separate AI sessions. 39 cross-skill inconsistencies: total_tax vs tax_amount, company_setting vs regional_settings, employee_name vs full_name. Each skill passed its own tests. Cross-skill queries returned wrong data. A schema alignment check would have caught all 39.
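A schema alignment check does not need to be clever to catch this class of bug. A sketch, seeded with the synonym pairs named above (a real check would read table definitions from `sqlite_master` and would likely detect near-duplicate names heuristically rather than from a hand-written list):

```python
# Hypothetical cross-skill schema snapshot keyed by skill, then table.
SKILL_COLUMNS = {
    "erpclaw-gst-india": {"invoice": ["id", "total_tax", "employee_name"]},
    "erpclaw-vat-uk": {"invoice": ["id", "tax_amount", "full_name"]},
}

# Synonym groups that must collapse to one canonical column name.
SYNONYMS = [
    {"total_tax", "tax_amount"},
    {"employee_name", "full_name"},
    {"company_setting", "regional_settings"},
]

def find_drift(skill_columns):
    """Flag concepts where different skills used different column names."""
    drift = []
    for group in SYNONYMS:
        used = {
            name
            for tables in skill_columns.values()
            for cols in tables.values()
            for name in cols
            if name in group
        }
        if len(used) > 1:
            drift.append(sorted(used))
    return drift

print(find_drift(SKILL_COLUMNS))
# [['tax_amount', 'total_tax'], ['employee_name', 'full_name']]
```

Run after every session that touches schema, this turns "forgotten three sessions later" into a failing check the same day.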
The pattern is clear. AI-generated code fails at system boundaries, not within modules. Your testing strategy must match: invariant checks, clean-install gates, security audits, and schema alignment tests. These are the new mandatory layers.
6. The Honest AI Assessment
Where AI excels: CRUD implementation, SQL schema generation, test scaffolding, maintaining consistent patterns across 29 skills (it does not get bored on skill 27), and translating well-specified business rules into working code.
Where AI fails: Cross-module dependencies (intercompany invoicing needed heavy manual correction), edge cases not covered in the spec (GL reversals with partial payments, garnishment priority ordering), security awareness (zero instinct for what should not ship), and cross-session consistency (a table rename in one session is forgotten three sessions later).
The pattern that works: Spec-first. One module at a time. Test immediately. Never move on with failing tests. Give the AI a narrow, well-defined task and it executes with remarkable speed.
The pattern that fails: "Build everything at once." Context overflow, compounding bugs, inconsistent assumptions across modules. The AI is a sprinter, not a marathoner.
The role split: The human provides architecture, domain expertise, validation logic, and the questions the AI does not know to ask. The AI provides implementation speed, consistency, tirelessness, and test generation. Neither role is optional. The CTO who thinks "AI replaces my engineers" will ship broken software. The CTO who thinks "AI is just autocomplete" will ship too slowly.
7. What This Means for CTOs
These are uncomfortable questions. They are also unavoidable.
Headcount. If a solo developer ships 612 actions in two weeks, what is the right team size for a SaaS product? Not zero; architecture, domain expertise, security, and infrastructure still require humans. But not 200 either. The AI-native team is 3-5 people: one architect, one domain expert, one infrastructure engineer, one QA/security person. The "20 backend engineers" model belongs to 2010-2025.
Pricing. If build cost drops 10x, per-seat pricing becomes indefensible for commodity software. ERP, CRM, project management, HR: these are well-understood domains with public specifications. The value lies in domain expertise and data, not in the code. Open-source alternatives will eat commodity SaaS the way Linux ate proprietary Unix.
Build vs Buy. The calculus changes fundamentally. "Build" used to mean 18 months and $2M. Now it means 2 weeks and $20/month hosting. For domain-specific software where your industry has unique rules that off-the-shelf products handle poorly, building is now cheaper than buying and customising.
Moats. The code moat is gone. Anyone can build equivalent software with AI. The remaining moats are proprietary data, regulatory expertise, distribution, and trust. If your SaaS company's primary asset is "we wrote a lot of code over 10 years," that asset is depreciating fast.
Architecture. Every new SaaS product should be metadata-driven. If your metadata serves only one purpose, you are doing 3x the work and creating 3x the maintenance burden. Single-source-of-truth metadata that drives all surfaces (AI, API, UI, docs) is the new baseline.
8. The Playbook
If you are starting a new SaaS product:
- Write the full specification first. Tables, actions, validations, edge cases. Spend 20% of your time here.
- Design metadata-driven architecture. One definition file per module that drives AI, API, UI, and docs.
- Choose boring infrastructure. SQLite or Postgres. Monorepo or simple multi-repo. Subprocess communication. Microservices are for Google-scale problems.
- Implement systemic testing. Invariant checks that verify global properties, not just unit assertions.
- Gate on clean-install. Every release must work on a blank server.
- Security audit the output. Treat AI code like junior developer code; review for context leakage.
- Ship scope, then polish. 29 modules at 90% is more useful than 3 modules at 99%.
If you are running an existing SaaS company:
- Ask yourself: what is our moat if someone rebuilds our product scope in two weeks with AI?
- If the answer is "our code," that is not a moat anymore. Invest in data, domain expertise, and distribution.
- Evaluate your per-seat pricing model. The pressure is coming from open-source alternatives built at AI speed.
- Remember: your customers do not want software. They want their business operations to work. The delivery mechanism is irrelevant to them.
9. Honest Tradeoffs
AI-native development is not magic. Here is what it cannot do yet.
SQLite is single-server, single-writer. Sufficient for 95% of SMBs. Not for Fortune 500 with 40 countries and 100,000 transactions per day.
Bus factor of one. Open source mitigates this (MIT licence, anyone can fork) but it is not the same as having a team.
No SOX or ISO certification. 1,839 automated tests is not a formal audit. If your auditor requires compliance documentation, you will need to produce it yourself.
The web UI is auto-generated from metadata. It is functional, not beautiful. There is no design team behind it.
Cross-module complexity still requires human judgment. Intercompany invoicing, multi-currency revaluation, payroll garnishment priority ordering: these need a domain expert, not a prompt.
If you have 10,000 employees and operate in 40 countries, use SAP. Genuinely. This playbook is for the other 95%.
10. The Endgame
The $50K/year ERP licence becomes indefensible for most SMBs within two to three years. Not because ERPClaw is better than SAP at Fortune 500 scale (it is not), but because the cost of building equivalent scope for SMB needs approaches zero.
Every vertical will get the same treatment. Healthcare scheduling. Property management. Logistics. Legal practice management. Education administration. The formula is the same: domain expert plus AI plus metadata-driven architecture equals full-scope software at near-zero cost.
The surviving SaaS companies will sell domain expertise, regulatory compliance, and data network effects. Not code. The code becomes a commodity. The knowledge of what the code should do remains scarce.
Open-source AI-native software becomes the Linux of business applications. Not glamorous. Not venture-scale. But everywhere.
The CTO's job shifts from "managing engineering teams that write code" to "defining specifications that AI implements and humans validate." The best CTOs in 2030 will not be the ones who managed the largest teams. They will be the ones who wrote the best specs.
This is not a prediction. ERPClaw exists. 29 modules, 612 actions, 1,839 tests, four countries, running on a $20 server. The future is not coming. It shipped.
ERPClaw is free, open-source, and MIT licensed. But the point of this article is not ERPClaw; it is the playbook. These principles apply whether you are building ERP, CRM, healthcare scheduling, or logistics management.
If you are a CTO evaluating what AI changes about your business, I would rather hear your skepticism than your applause. Comments are open.