Richard Zampieri
Web/Mobile Personal AI CFO

How I Built an AI-Powered Finance App and Distributed Its Services to Support Volume and Concurrency

FinTrack

A high-level technical walkthrough of building a personal finance application that combines traditional CRUD with AI features—and why I split the services


The Problem

I wanted to build a personal finance app that could do more than track transactions. I wanted it to understand the user's finances: answer questions in natural language, categorize expenses automatically, and even take voice input, like a real personal assistant, so users don't have to type everything; they can just ask Finny. The challenge? Most AI work happens in Python, while typical web backends live in Node.js. How do you get the best of both worlds without a tangled mess?

This is how I did it.
Note: there's more than one way to skin this cat, and the right choice comes down to concurrency, volume, cost, and the context in which your app will run.


High-Level Architecture

I chose a dual-backend architecture: one service for the core application, another dedicated entirely to AI. Here's the flow:

[Diagram: high-level architecture]

Frontend: React with TypeScript and Vite. Pages, components, contexts, and hooks all feed into a single API abstraction layer.

ExpressoTS API: Handles all non-AI operations—transactions, accounts, goals, budgets, auth. We use ExpressoTS—a type-safe TypeScript framework I created about five years ago for building scalable Node.js APIs. Data lives in Postgres.

Python AI Service: A separate service for everything AI: RAG (retrieval-augmented generation) for chat, transaction classification, voice transcript parsing, and document extraction. It calls out to an LLM provider for embeddings and completions. A good tip here: OpenRouter, which gives you one gateway to many models.


Why Split the Backend?

1. Keep API Keys Server-Side

AI providers require API keys. We never expose them in the frontend. All AI requests flow through our Python service, which holds the credentials and proxies requests. The browser never talks to the AI provider directly.
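
Here's a minimal sketch of that proxy pattern, assuming FastAPI and OpenRouter as the provider; the endpoint, model, and names are illustrative, not FinTrack's actual code:

```python
# Minimal sketch of the server-side proxy pattern (illustrative).
# Assumes FastAPI and an OpenAI-compatible provider (OpenRouter here).
import os

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# The key lives only in this service's environment, never in the browser.
AI_API_KEY = os.environ["AI_API_KEY"]

class ChatRequest(BaseModel):
    message: str

@app.post("/ai/chat")
async def chat(req: ChatRequest) -> dict:
    # The browser only ever calls /ai/chat; this service alone
    # talks to the AI provider.
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {AI_API_KEY}"},
            json={
                "model": "openai/gpt-4o-mini",
                "messages": [{"role": "user", "content": req.message}],
            },
        )
    data = resp.json()
    return {"answer": data["choices"][0]["message"]["content"]}
```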

2. Python's Strengths for AI

Transaction classification, embeddings, RAG pipelines—these are natural fits for Python. The ecosystem (NumPy, pandas, OpenAI SDK, etc.) is mature. Trying to replicate that in Node would mean more custom code and more maintenance.

3. Independent Scaling and Concurrency

CRUD traffic and AI traffic have different profiles. With two services, we can scale each independently. A spike in chat usage doesn't affect the core transaction APIs. The ExpressoTS API handles many concurrent requests efficiently—it's built for speed with minimal overhead—while the Python service processes AI workloads in isolated batches. That separation means we're not blocking CRUD requests while classification jobs run, and vice versa.

4. Clear Separation of Concerns

Developers know exactly where logic lives: business data and CRUD in ExpressoTS, AI and ML in Python. That separation makes onboarding and refactoring much easier.


Frontend Structure

The app follows a clean layering pattern:

| Layer | Location             | Responsibility                   |
| ----- | -------------------- | -------------------------------- |
| UI    | components/, pages/  | Presentational components only   |
| State | contexts/, hooks/    | Global state and reusable logic  |
| Data  | api/                 | All data-fetching and API calls  |
| Utils | utils/               | Pure functions, no side effects  |

The api/ folder is the single entry point for backend communication. It doesn't care whether a request goes to ExpressoTS or Python—it just exposes functions like fetchTransactions(), classifyTransactions(), or askFinny(). The rest of the app stays unaware of the dual-backend setup.


The Python AI Service: What It Does

RAG for Financial Q&A

Users ask questions like "How much did I spend on groceries last month?" or "What's left in my dining budget?" The RAG service:

  1. Generates an embedding for the user's question
  2. Searches a vector store for relevant transactions, goals, and spending caps
  3. Injects that context into the LLM prompt
  4. Returns an answer grounded in the user's actual data

We store embeddings for transactions, categories, goals, and budgets. Vector search returns the most relevant slices of data, and the model answers from that—no hallucinations about numbers.
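
Here's a condensed sketch of those four steps, assuming the openai Python SDK and a small in-memory list of pre-embedded slices; a real deployment would use a proper vector store:

```python
# Condensed sketch of the four RAG steps (illustrative; model names and
# the in-memory "store" are assumptions, not FinTrack's implementation).
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def answer(question: str, store: list[tuple[str, np.ndarray]]) -> str:
    q = embed(question)                            # 1. embed the question
    ranked = sorted(                               # 2. vector search
        store,
        key=lambda item: -float(q @ item[1]),      # embeddings are unit-norm,
    )                                              #    so dot product = cosine
    context = "\n".join(text for text, _ in ranked[:5])
    resp = client.chat.completions.create(         # 3. inject context
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Answer using only this user's data:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content         # 4. grounded answer
```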

Batch Transaction Classification

When users upload a CSV from their bank, we have dozens or hundreds of uncategorized transactions. The classifier:

  • Uses the user's existing category hierarchy
  • Learns from past classifications (e.g., "Starbucks" → Coffee)
  • Returns structured JSON with category and subcategory IDs

We batch transactions to stay within token limits and process them asynchronously for speed. The frontend shows progress and then refreshes the list when done.
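
A simplified version of that batching loop might look like this; the batch size, model, and JSON shape are assumptions, and it runs synchronously here for brevity:

```python
# Illustrative batching loop (batch size, model, and JSON shape are
# assumptions, not FinTrack's production values).
import json

from openai import OpenAI

client = OpenAI()
BATCH_SIZE = 25  # sized to stay under the provider's token limit

def classify_batch(transactions: list[dict], categories: list[dict]) -> list[dict]:
    prompt = (
        "Classify each transaction into the user's categories. Return JSON: "
        '{"results": [{"id": ..., "category_id": ..., "subcategory_id": ...}]}\n'
        f"Categories: {json.dumps(categories)}\n"
        f"Transactions: {json.dumps(transactions)}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["results"]

def classify_all(transactions: list[dict], categories: list[dict]) -> list[dict]:
    results = []
    for start in range(0, len(transactions), BATCH_SIZE):
        results += classify_batch(transactions[start:start + BATCH_SIZE], categories)
    return results
```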

Voice Input

Users can speak a transaction: "Fifty dollars at the gas station for fuel." The voice parser turns that into structured fields:

  • Amount: 50
  • Description: Gas station
  • Category: Transportation
  • Subcategory: Fuel

The transcript is sent to the Python service, which returns a transaction object ready for the user to confirm or edit before saving.
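
A stripped-down parser for that flow could look like the following; the JSON shape and model are illustrative:

```python
# Sketch of transcript -> structured transaction (field names are
# illustrative). The model fills a fixed JSON shape that the frontend
# shows for confirmation before saving.
import json

from openai import OpenAI

client = OpenAI()

def parse_transcript(transcript: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                'Extract a transaction as JSON: {"amount": number, '
                '"description": string, "category": string, '
                '"subcategory": string}.'
            )},
            {"role": "user", "content": transcript},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# parse_transcript("Fifty dollars at the gas station for fuel")
# -> {"amount": 50, "description": "Gas station", ...}
```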


Patterns That Worked

1. Context-Aware AI

The RAG service receives user_id and household_id from the auth token. Every query is scoped to that user's data. There's no risk of leaking one household's finances into another's answers.
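
In code, that scoping is just a hard filter applied on every retrieval. A sketch, assuming psycopg 3 and pgvector, neither of which the article confirms:

```python
# Sketch: retrieval is always filtered by the IDs taken from the auth
# token, never from the request body (schema and operator assume pgvector).
def search_household_slices(conn, query_embedding, user_id: str, household_id: str):
    return conn.execute(
        """
        SELECT content FROM embeddings
        WHERE household_id = %s AND user_id = %s
        ORDER BY embedding <=> %s   -- pgvector cosine distance
        LIMIT 5
        """,
        (household_id, user_id, query_embedding),
    ).fetchall()
```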

2. Rate Limiting and Usage Tiers

AI calls are expensive. We enforce per-user limits: X chat queries, Y voice sessions, Z classifications per month. Free users get a taste; paid users get more. Limits are checked in middleware before any AI logic runs.
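
As a sketch, that middleware gate can be quite simple; the tiers, numbers, and usage helper below are made up for illustration and assume FastAPI:

```python
# Quota gate that runs before any AI logic (all limits and helpers here
# are illustrative, not FinTrack's real values).
from fastapi import FastAPI, Request
from starlette.responses import JSONResponse

app = FastAPI()

MONTHLY_LIMITS = {
    "free": {"chat": 50, "voice": 10, "classify": 100},
    "paid": {"chat": 1000, "voice": 200, "classify": 5000},
}

async def get_monthly_usage(user_id: str, feature: str) -> int:
    # Stub: the real service would count this month's calls in the database.
    return 0

@app.middleware("http")
async def enforce_quota(request: Request, call_next):
    user = request.state.user  # set earlier by the auth layer (assumed)
    feature = request.url.path.lstrip("/").split("/")[0]  # e.g. "chat"
    limit = MONTHLY_LIMITS.get(user.tier, {}).get(feature)
    if limit is not None:
        used = await get_monthly_usage(user.id, feature)
        if used >= limit:
            return JSONResponse({"detail": "Monthly AI quota reached"},
                                status_code=429)
    return await call_next(request)
```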

3. Caching Where It Makes Sense

Stateless questions like "What's my total income this year?" can be cached. We cache by query + user_id with a short TTL. Conversational threads aren't cached—they're too dynamic.
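
A minimal in-process version of that cache; a real deployment would more likely use Redis:

```python
# Short-TTL cache keyed by (user_id, normalized query). A dict stands in
# for a shared cache here, purely for illustration.
import time
from collections.abc import Callable

_cache: dict[tuple[str, str], tuple[float, str]] = {}
TTL_SECONDS = 300  # short TTL: balances freshness against repeat LLM calls

def cached_answer(user_id: str, query: str, compute: Callable[[], str]) -> str:
    key = (user_id, query.strip().lower())
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    answer = compute()  # cache miss: fall through to the RAG pipeline
    _cache[key] = (time.time(), answer)
    return answer
```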

4. Tool Calling for Actions

The chat isn't just Q&A. Users can say "Add twenty dollars for lunch" and the AI triggers a tool that creates a transaction. We define tool schemas (add_transaction, get_balance, etc.) and the LLM decides when to call them. The Python service executes the call and returns the result for the model to summarize.
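
Here's a trimmed sketch of that loop in the OpenAI tool-calling format; create_transaction is a hypothetical stand-in for the real service call:

```python
# Sketch of the tool-calling loop (tool schema and handler are illustrative).
import json

from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "add_transaction",
        "description": "Create a transaction for the current user",
        "parameters": {
            "type": "object",
            "properties": {
                "amount": {"type": "number"},
                "description": {"type": "string"},
            },
            "required": ["amount", "description"],
        },
    },
}]

def create_transaction(amount: float, description: str) -> dict:
    # Stub: the real service writes through the CRUD layer.
    return {"status": "created", "amount": amount, "description": description}

def chat_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if msg.tool_calls:  # the model decided to act, not just answer
        call = msg.tool_calls[0]
        result = create_transaction(**json.loads(call.function.arguments))
        messages += [msg, {"role": "tool", "tool_call_id": call.id,
                           "content": json.dumps(result)}]
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages)  # model summarizes result
    return resp.choices[0].message.content
```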


What I'd Do Differently

  • Start with the dual-backend split from day one. I flirted with a monolith and then split. Migrating mid-build was painful.
  • Standardize error handling early. ExpressoTS and Python return errors in different shapes. I added a shared error format layer later—would've saved time to do it upfront.
  • Invest in embedding pipelines sooner. RAG quality depends on good embeddings. We iterated on what to embed (full transaction text vs. summaries) and when to re-index. Doing that earlier would have improved the chat experience faster. There's more to it; obviously I can't reveal every strategy we used, but this is a good start. :)

Takeaways

Building an AI-powered app doesn't mean throwing everything into one stack, especially if you're vibe coding (be cautious). Splitting the backend—ExpressoTS for CRUD, Python for AI—gave me:

  • Clear ownership of API keys and AI logic
  • The right tool for each job
  • Independent scaling, concurrency, and deployment
  • A maintainable codebase

If you're considering adding AI to an existing app, or starting fresh with AI in mind, a dedicated Python, Rust, or C++ service for all AI workloads is a pattern that scales well.


Performance and Cost: Why This Matters When You're Starting Out

When you're building your first product or bootstrapping a side project, every dollar counts. Hosting an app and running AI can get expensive fast. This architecture helps on both fronts.

Concurrency without over-provisioning. The dual-backend split means CRUD and AI run on different processes. The ExpressoTS API can handle thousands of lightweight requests (transactions, account fetches, auth checks) on a modest instance, while the Python service runs AI jobs asynchronously. We're not paying for one beefy server that sits idle waiting for AI calls—we right-size each service. A small Node instance for the API and a small Python instance for AI often cost less than a single large monolith trying to do both.

Lower infrastructure cost. By isolating AI workloads, we avoid scaling the entire app when chat usage spikes. A free-tier or low-cost Python host can handle the AI service early on. As usage grows, we scale only that service. The CRUD API stays stable and cheap.

Cost-effective from day one. I built this with the understanding that building and hosting have to be cost-effective, especially for developers starting their own business. You don't need Kubernetes or a fleet of containers to launch. Two small services, clear boundaries, and the ability to scale each independently—that's a path that keeps bills low while you validate your product.


I'm the developer behind FinTrack, a personal finance app with an AI advisor called Finny. If you're building something similar, I'd love to hear how you approached it—drop a comment below.
