DEV Community

Hao Wang

I Built a Full SaaS Finance App Solo Using AI — Here's My Honest Take on Every Model I Used

I'm a DevOps engineer. Not a "real" fullstack developer by training. But over the past year I've been building YourFinanceWORKS — a self-hosted, AI-powered finance platform with invoicing, OCR receipt processing, bank reconciliation, multi-tenant architecture, MCP integration, and a full business intelligence layer.

Python/FastAPI backend. React + TypeScript + Vite frontend. 670+ commits. One person.

AI made this possible. And I went through a lot of models to get here.

This is my honest, unsponsored breakdown of every model I used — what worked, what didn't, and what surprised me.


🧪 Phase 1: The Free Model Rotation

Before paying for anything, I treated free models like a buffet. Here's what I found:

Windsurf SWE-1.5

Genuinely the best free option for coding tasks. Fast responses, decent output quality, and it handles real engineering problems — not just toy examples. I've half-jokingly wondered if Windsurf has some Google collaboration going on given how polished it feels for a free tier.

If you're a developer on a budget, start here.

Grok Code

Worth keeping in your rotation. Solid for dev tasks and it's free. Not my first call for complex refactoring, but for quick code generation and boilerplate it holds its own.

Ollama (Local Models)

I really wanted this to work. The appeal of running everything locally — privacy, no API costs, full control — is real. And the output quality was okay. But the speed killed my flow completely. When you're deep in debugging a multi-tenant isolation issue, waiting 30+ seconds per response breaks the mental thread entirely. Local models will get there, but they're not there yet for serious daily use on consumer hardware.
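If you want to try the local route anyway, Ollama exposes a REST API on localhost by default. Here's a minimal sketch (stdlib only; the model name is whatever you've pulled locally, and timing the call makes the latency problem visible):

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> bytes:
    # stream=False asks Ollama for one complete JSON response instead of chunks
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str, timeout: float = 120.0) -> tuple[str, float]:
    """Send a prompt to a locally running Ollama model; returns (text, seconds)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        text = json.loads(resp.read())["response"]
    return text, time.monotonic() - start

# Example (requires `ollama serve` and a pulled model, e.g. "llama3"):
# text, secs = generate("llama3", "Explain tenant isolation in one sentence.")
```

Print `secs` on every call for a week and you'll see exactly why the flow breaks.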

Free OCR Models

This one matters a lot for YourFinanceWORKS since OCR is central to the receipt and invoice processing features. My honest experience: roughly 80% accuracy. That sounds acceptable until you're reconciling actual financial data and realize that a 20% error rate means manual correction on 1 in 5 documents. Not viable at scale.
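To put numbers on that, here's a back-of-the-envelope helper (the monthly volume and the 95% figure are illustrative assumptions, not measurements):

```python
def manual_review_load(docs_per_month: int, accuracy: float) -> int:
    """Expected number of documents needing manual correction per month,
    assuming a flat per-document accuracy rate."""
    return round(docs_per_month * (1 - accuracy))

# At 80% accuracy, 1,000 receipts a month means ~200 manual fixes;
# even a (hypothetical) jump to 95% would cut that to ~50.
```

That correction queue is what turns "good enough accuracy" into a part-time data-entry job.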

For comparison, Claude Haiku — Anthropic's smallest paid model — made noticeably fewer mistakes. Not zero, but the gap was meaningful.

Gemini Free Tier + Mistral Le Chat

Both solid for research, long-context reading, and general questions. Mistral is underrated for reasoning tasks. Neither replaced a proper coding assistant for me, but they filled gaps well.


📅 Phase 2: Gemini Advanced (A Couple of Months)

I paid for Gemini Advanced for a couple of months and it had genuine strengths.

Where it excelled: UI work. Gemini has a strong instinct for layout and component design. For the React/TypeScript frontend — building out dashboards, data tables, form flows — it consistently produced clean, well-structured components. If your work is frontend-heavy, Gemini is worth serious consideration.

Where it fell short (for me): Deep backend logic. When I was debugging gnarly FastAPI issues, refactoring complex multi-tenant database isolation, or reasoning through Kafka event flow — the responses felt like they understood the surface of the problem but not the intent behind it. I kept having to over-explain context that a more senior-feeling model would have inferred.

Good product. Just not the right fit for the kind of work I was doing most.


✅ Phase 3: Claude Pro (This Month — Where I Landed)

This is where bug fixing and refactoring finally felt right.

The difference I kept noticing: Claude understands the intent behind code, not just the syntax. When I'm tracking down a subtle multi-tenant isolation bug, it doesn't just look at the function I paste — it reasons about what I'm trying to do and where the breakdown is likely happening. When I ask for a refactor, it asks clarifying questions that show it understood the architecture, not just the snippet.
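For concreteness, here's the classic shape of that class of bug, sketched with an in-memory table (all names are hypothetical, not YourFinanceWORKS code): a query that forgets the tenant filter silently leaks rows across tenants.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    id: int
    tenant_id: int
    total: float

INVOICES = [
    Invoice(id=1, tenant_id=1, total=100.0),
    Invoice(id=2, tenant_id=2, total=250.0),
]

def list_invoices_buggy(tenant_id: int) -> list[Invoice]:
    # Bug: the tenant filter is missing, so every tenant sees all rows
    return [inv for inv in INVOICES]

def list_invoices_fixed(tenant_id: int) -> list[Invoice]:
    # Fix: scope every query to the caller's tenant, no exceptions
    return [inv for inv in INVOICES if inv.tenant_id == tenant_id]
```

The fix is one clause, but finding it means reasoning about which code path dropped the tenant context, and that's exactly the kind of intent-level question where the model's depth shows.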

For a project like YourFinanceWORKS — with real financial logic, security-sensitive multi-tenant patterns, and a lot of interconnected moving parts — that difference matters a lot.

Is it perfect? No. It makes mistakes. It sometimes over-explains. But for the specific task of "help me fix this real bug in a real codebase," it's the best I've used.


🤔 The Rate Limit: An Unpopular Opinion

Here's something I didn't expect to say: Claude's rate limit has been good for me.

I was using AI as a pure reflex. Every half-formed thought, every minor question, every moment of uncertainty — I'd just fire it off immediately without thinking first. It became a dependency loop. The question would pop into my head and I'd be typing before I even finished forming the thought.

The rate limit broke that loop.

Suddenly I had to triage. Is this question actually worth using a message on? Can I think through this myself first? Can I look it up? The friction forced me back into the habit of actually thinking — and I started retaining more because I wasn't just offloading everything.

Long term, I genuinely believe the rate limit is helping me avoid a real AI dependency problem. I came to Claude a bit addicted to the instant-answer reflex that unlimited free models had built in me. The constraint is doing what I couldn't do on my own: making me more intentional.

I know that's not a popular take. But it's been my experience.


🗺️ My Model-to-Task Map (After All This)

| Task | My Go-To |
| --- | --- |
| Bug fixing & refactoring | Claude (not close) |
| UI / frontend components | Gemini |
| Quick code generation | Windsurf SWE-1.5 or Grok Code |
| Local / private work | Ollama (if speed isn't critical) |
| OCR at scale | Don't cheap out — the accuracy gap is real |
| Research & long-context | Gemini or Mistral |

💭 The Bigger Question

The README for YourFinanceWORKS ends with a question I've been sitting with:

Is the SaaS business model really dying? With AI drastically lowering the barrier to building sophisticated software, what can one person with DevOps skills and AI assistance actually ship?

I don't have a clean answer yet. But I built a multi-tenant finance SaaS with OCR, banking reconciliation, MCP integration, and business intelligence — solo, while working a day job — and it runs in production.

That used to require a team. Now it requires the right AI stack and the discipline to use it well.


The project is open source (AGPL-3.0 for the core): github.com/snowsky/yourfinanceworks. Happy to discuss the architecture, the AI workflow, or anything else in the comments.
