DEV Community

Bhavya Sree


Aura Financial Agent


AI Engineering · Financial Agents · LLM Architecture

How I Cut AI API Costs by 80% Without Sacrificing Response Quality

Building Aura — a multi-LLM financial agent that routes every query to the right model in real time, remembers everything across sessions, and never sends sensitive data to the cloud.

✦ May 2026 ✦ 12 min read ✦ Live Demo ↗

Introduction

Every AI application I've worked on runs into the same wall eventually: your cloud LLM bill. You start with GPT-4o because the quality is great, you ship fast, and then suddenly you're paying $300/month for an app where 60% of the queries are "what's the difference between SIP and lump sum?" That's not sustainable.

The obvious solution — swap to a cheaper model — breaks the use cases that actually matter. Financial advice is not a uniform task. Routing "Hi, how are you?" and "Draft a SEBI-compliant risk stress analysis for an equity options portfolio under high interest rate environments" to the same model is either wasteful or dangerous depending on which model you pick.

I built Aura Financial Agent to solve this properly. It's a financial intelligence platform powered by two core engines:

  • Cascadeflow — a real-time multi-LLM routing matrix that scores each prompt for complexity and sends it to the cheapest model capable of handling it.
  • Hindsight — a persistent memory layer that extracts user facts, financial goals, and risk profiles across sessions, so the agent actually knows who it's talking to.

Combined, they cut API costs by 75–80% compared to a single-model setup — without any perceptible drop in quality for users.
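To see where a number like that can come from, treat cost as a traffic-weighted average: blended cost per query is the sum, over tiers, of each tier's traffic share times its per-query cost. The sketch below computes that relative to sending everything to a frontier model. The tier names, traffic shares, and relative prices are hypothetical placeholders, not Aura's measured figures; substitute real provider pricing and your own traffic mix.

```typescript
// Hypothetical cost model: each tier carries a share of traffic and a
// per-query cost expressed relative to the frontier model (frontier = 1.0).
interface TierMix {
  name: string;
  share: number;        // fraction of queries routed to this tier (0..1)
  relativeCost: number; // cost per query relative to the frontier model
}

// Blended cost per query = sum over tiers of share * relativeCost.
function blendedCost(mix: TierMix[]): number {
  const totalShare = mix.reduce((sum, t) => sum + t.share, 0);
  if (Math.abs(totalShare - 1) > 1e-9) {
    throw new Error(`Traffic shares must sum to 1, got ${totalShare}`);
  }
  return mix.reduce((sum, t) => sum + t.share * t.relativeCost, 0);
}

// Savings versus routing every query to the frontier model (cost 1.0).
function savingsVsFrontier(mix: TierMix[]): number {
  return 1 - blendedCost(mix);
}

// Illustrative numbers only (assumed, not measured):
const exampleMix: TierMix[] = [
  { name: "fast-8b", share: 0.6, relativeCost: 0.02 },
  { name: "mid-tier", share: 0.3, relativeCost: 0.2 },
  { name: "frontier", share: 0.1, relativeCost: 1.0 },
];

console.log(`Blended cost: ${blendedCost(exampleMix).toFixed(3)}`);
console.log(`Savings: ${(savingsVsFrontier(exampleMix) * 100).toFixed(1)}%`);
```

Under these assumed prices, sending only the 60% of simple queries to the cheap tier and everything else to the frontier model would cap savings near 59% (0.6 × 0.02 + 0.4 × 1.0 = 0.412). A middle tier that absorbs most of the remaining traffic is what pushes the result past 80%.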

Problem Statement

There are three structural problems that break almost every production AI financial tool I've seen:

1 — The Cost-Cognition Paradox

Financial queries span an enormous complexity range. A basic compound interest calculation needs nothing more than a fast 8B model. A DCF valuation with SEBI compliance constraints requires frontier-level reasoning. Using one model for both is either expensive or unreliable. There's no middle ground with a static setup.

2 — Context Amnesia

Standard stateless LLMs forget the user's risk profile, investment goals, and personal financial context the moment a session ends. Users end up pasting the same background every single time, which destroys the experience and means the agent never actually learns anything about the person it's advising.

3 — Privacy Leakage

Financial planning involves sensitive data — portfolio credentials, tax IDs, bank account details. Routing any of this to public cloud APIs is a compliance and security risk. There's no good reason sensitive identifiers should leave a user's machine.

Proposed Solution

Core Insight: Not all prompts are equal. If you can score a query's complexity before sending it, you can route it to the cheapest model that can handle it — and only escalate when you need to.
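The "only escalate when you need to" half of this idea could look roughly like the sketch below: start at the tier the score picked and move up one tier whenever an answer fails an acceptance check. It assumes each tier is wrapped as an async function and that some acceptance check exists. `TierStep`, `answerWithEscalation`, and the toy `acceptable` keyword check are hypothetical, not the Cascadeflow library's API; a real acceptance check would need to be far stronger than a keyword test.

```typescript
// Hypothetical model wrapper; not the Cascadeflow library's API.
type ModelCall = (prompt: string) => Promise<string>;

interface TierStep {
  id: string;
  call: ModelCall;
}

// Start at the tier chosen by the complexity score and move up one tier
// whenever the answer fails the acceptance check. The top tier's answer is
// returned even if it fails, because there is nothing left to escalate to.
async function answerWithEscalation(
  tiers: TierStep[],
  startIndex: number,
  prompt: string,
  isAcceptable: (answer: string) => boolean,
): Promise<{ tierId: string; answer: string; escalations: number }> {
  if (tiers.length === 0) throw new Error("no tiers configured");
  // Clamp the starting tier into the valid range.
  const first = Math.min(Math.max(startIndex, 0), tiers.length - 1);
  for (let i = first; i < tiers.length; i++) {
    const answer = await tiers[i].call(prompt);
    if (isAcceptable(answer) || i === tiers.length - 1) {
      return { tierId: tiers[i].id, answer, escalations: i - first };
    }
  }
  throw new Error("unreachable: the loop always returns at the top tier");
}

// Usage with stub tiers, cheapest first:
const tiers: TierStep[] = [
  { id: "fast", call: async () => "I'm not sure." },
  { id: "balanced", call: async (p) => `Detailed answer for: ${p}` },
  { id: "frontier", call: async (p) => `Expert answer for: ${p}` },
];

// Toy acceptance check: reject hedged non-answers.
const acceptable = (answer: string): boolean => !/not sure/i.test(answer);

answerWithEscalation(tiers, 0, "Explain XIRR", acceptable).then((r) => console.log(r));
```

Escalation trades a second call on some queries for a cheaper first attempt on all of them, so it only pays off when the cheap tier's acceptance rate is high.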

Aura addresses each problem directly:

Dynamic query profiling (Cascadeflow) analyzes every prompt against a multidimensional complexity heuristic before it leaves the browser. The score determines which tier the request goes to — no manual configuration needed.
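The post doesn't spell out the heuristic itself, so here is a minimal sketch of multidimensional scoring under assumed signals: prompt length, hits against a small list of domain keywords, and the number of questions asked. `scoreComplexity`, `pickTier`, the keyword list, the weights, and the thresholds are all hypothetical and would need tuning against real traffic; this is not the Cascadeflow library's API.

```typescript
// Hypothetical tiers, cheapest first.
type Tier = "fast" | "balanced" | "frontier";

// Assumed domain keywords that tend to signal multi-step financial reasoning.
const COMPLEX_TERMS = ["dcf", "valuation", "sebi", "stress", "options", "derivative", "compliance", "hedge"];

// Returns a score in [0, 1]; higher means more demanding.
function scoreComplexity(prompt: string): number {
  const text = prompt.toLowerCase();
  const words = text.split(/\s+/).filter(Boolean);

  // Signal 1: length, saturating at 120 words.
  const lengthSignal = Math.min(words.length / 120, 1);

  // Signal 2: assumed domain terms present, saturating at 3 hits.
  const hits = COMPLEX_TERMS.filter((term) => text.includes(term)).length;
  const termSignal = Math.min(hits / 3, 1);

  // Signal 3: several questions in one prompt, saturating at 3 questions.
  const questions = (text.match(/\?/g) ?? []).length;
  const questionSignal = Math.min(Math.max(questions - 1, 0) / 2, 1);

  // Weighted sum; the weights are illustrative, not tuned.
  return 0.3 * lengthSignal + 0.5 * termSignal + 0.2 * questionSignal;
}

// Map the score to the cheapest tier assumed to handle it.
function pickTier(score: number): Tier {
  if (score < 0.2) return "fast";
  if (score < 0.5) return "balanced";
  return "frontier";
}

console.log(pickTier(scoreComplexity("Hi, how are you?")));
console.log(
  pickTier(
    scoreComplexity(
      "Draft a SEBI-compliant risk stress analysis for an equity options portfolio under high interest rate environments",
    ),
  ),
);
```

With these made-up weights, the greeting from the introduction scores near zero and lands on the fast tier, while the SEBI stress-analysis prompt hits three domain terms and goes to the frontier tier. Because the scoring is pure string work, it is cheap enough to rerun on every keystroke.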

Semantic context hydration (Hindsight) extracts structured facts from the conversation — names, goals, risk preferences, financial context — and persists them to MongoDB. Every subsequent prompt is hydrated with this memory block, giving the model a growing picture of the user.
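A minimal sketch of hydration, assuming facts are stored as small categorized records and rendered into a block appended to the system prompt. The `MemoryFact` shape, the category names, `renderMemoryBlock`, and `buildSystemPrompt` are hypothetical; they do not reflect Hindsight's actual data model or prompt format.

```typescript
// Hypothetical record for one remembered fact (not Hindsight's data model).
interface MemoryFact {
  category: "identity" | "goal" | "risk" | "context";
  value: string;
  updatedAt: number; // epoch millis; newer facts are listed first
}

// Render the newest facts as a compact block grouped by category.
function renderMemoryBlock(facts: MemoryFact[], maxFacts = 20): string {
  if (facts.length === 0) return "";
  const recent = facts
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.updatedAt - a.updatedAt)
    .slice(0, maxFacts);

  // Map preserves insertion order, so categories appear in recency order.
  const grouped = new Map<string, string[]>();
  for (const fact of recent) {
    const values = grouped.get(fact.category) ?? [];
    values.push(fact.value);
    grouped.set(fact.category, values);
  }

  const lines: string[] = ["Known facts about this user:"];
  grouped.forEach((values, category) => {
    lines.push(`- ${category}: ${values.join("; ")}`);
  });
  return lines.join("\n");
}

// Append the memory block to a base system prompt (unchanged if there is no memory).
function buildSystemPrompt(base: string, facts: MemoryFact[]): string {
  const memory = renderMemoryBlock(facts);
  return memory ? `${base}\n\n${memory}` : base;
}

const systemPrompt = buildSystemPrompt("You are Aura, a financial assistant.", [
  { category: "goal", value: "retire by 55", updatedAt: 2 },
  { category: "risk", value: "moderate risk tolerance", updatedAt: 3 },
  { category: "goal", value: "buy a house in 2028", updatedAt: 1 },
]);
console.log(systemPrompt);
```

Capping the block (here with a hypothetical `maxFacts` limit, newest first) keeps the prompt from growing without bound as memory accumulates across sessions.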

Privacy-tier routing detects sensitive identifiers (passwords, account numbers, API keys) and bypasses all cloud providers entirely, routing to a locally running Ollama instance. That data never touches an external API.
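A sketch of the privacy check, assuming a regex pass is an acceptable first filter. The patterns (including the Indian PAN format as one example of a tax ID), `detectSensitive`, and `routeForPrivacy` are hypothetical. Only the decision itself, sending flagged prompts to a local Ollama instance, comes from the post.

```typescript
// Hypothetical detector definitions; tune and extend for real use.
interface SensitivePattern {
  label: string;
  pattern: RegExp; // no `g` flag, so .test() keeps no lastIndex state between calls
}

const SENSITIVE_PATTERNS: SensitivePattern[] = [
  // "password: hunter2" / "pwd = ..." style disclosures
  { label: "password", pattern: /\b(?:password|passwd|pwd)\b\s*[:=]/i },
  // 9-18 digit runs (spaces or dashes allowed), typical of account or card numbers
  { label: "account-number", pattern: /\b\d(?:[ -]?\d){8,17}\b/ },
  // Indian PAN (tax ID) format: 5 letters, 4 digits, 1 letter
  { label: "pan", pattern: /\b[A-Z]{5}\d{4}[A-Z]\b/ },
  // Secret-key style tokens such as "sk-..." or "gsk_..." (illustrative prefixes)
  { label: "api-key", pattern: /\b(?:sk|gsk)[-_][A-Za-z0-9_-]{16,}/ },
];

type PrivacyRoute = { provider: "ollama-local" | "cloud"; reasons: string[] };

// Return the labels of every pattern that matches the text.
function detectSensitive(text: string): string[] {
  return SENSITIVE_PATTERNS.filter(({ pattern }) => pattern.test(text)).map(({ label }) => label);
}

// Any match forces the local model; otherwise the prompt may use cloud routing.
function routeForPrivacy(text: string): PrivacyRoute {
  const reasons = detectSensitive(text);
  return { provider: reasons.length > 0 ? "ollama-local" : "cloud", reasons };
}

console.log(routeForPrivacy("My password: hunter2, is my SIP on track?"));
console.log(routeForPrivacy("What is the difference between SIP and lump sum?"));
```

The filter deliberately errs toward false positives: a 10-digit phone number also matches the account-number pattern, which only means that prompt goes to the local model. Regexes will still miss paraphrased secrets, so treat this as a floor rather than a guarantee.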

Project Objectives

  • Achieve real-time model routing based on dynamic prompt evaluation, with no extra network round trip and no separate pre-classification step
  • Implement a persistent memory cycle using Hindsight that survives page refreshes and re-logins
  • Guarantee local-first processing for any prompt containing sensitive credentials
  • Build a resilient fallback chain so the agent never goes dark — even if multiple providers are down simultaneously
  • Deliver a real-time intelligence HUD that makes the routing decisions visible to the user
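The fallback objective maps naturally onto an ordered chain: try each provider in turn, give each a timeout, and return the first success. This sketch assumes every provider is wrapped behind one hypothetical `ProviderCall` signature; `Provider`, `withTimeout`, `completeWithFallback`, the 15-second default, and the provider ordering in the example are illustrative, not Aura's actual configuration.

```typescript
// Hypothetical uniform wrapper around any provider SDK call.
type ProviderCall = (prompt: string) => Promise<string>;

interface Provider {
  id: string;
  call: ProviderCall;
}

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Try providers in order and return the first successful answer.
async function completeWithFallback(
  providers: Provider[],
  prompt: string,
  timeoutMs = 15_000,
): Promise<{ providerId: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await withTimeout(provider.call(prompt), timeoutMs);
      return { providerId: provider.id, text };
    } catch (err) {
      // Record why this provider failed, then fall through to the next one.
      failures.push(`${provider.id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed (${failures.join(" | ")})`);
}

// Usage with stub providers: the first is "down", the local fallback answers.
const flaky: Provider = { id: "groq", call: async () => { throw new Error("503 Service Unavailable"); } };
const localBackup: Provider = { id: "ollama-local", call: async (p) => `local answer to: ${p}` };

completeWithFallback([flaky, localBackup], "What is a SIP?").then((res) => console.log(res));
```

Putting the local model last means the chain can still answer when every cloud provider is down, at the cost of slower or less capable responses.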

Tech Stack

  • Frontend: React 19 + Vite 8
  • Styling: Vanilla CSS + Glassmorphism
  • Backend: Node.js + Express 5
  • Auth: Passport.js + JWT
  • Database: MongoDB Atlas
  • Memory: Hindsight Engine
  • Routing: Cascadeflow
  • AI Cloud: Groq / OpenAI / Gemini / Claude
  • AI Local: Ollama (Llama 3)

System Architecture

Aura is structured as a client-orchestrated, server-synchronized platform. The key architectural decision was to run the Cascadeflow scoring logic directly in the browser rather than on the server. This keeps routing instantaneous — the user sees model selection update in real time as they type, before they even send the message.

The overall data flow looks like this:

  1. User enters a prompt → Cascadeflow scores it client-side
  2. Hindsight's recall() hydrates the prompt with the user's persistent memory
  3. The compiled system prompt + user message goes to the selected AI provider
  4. On success, retain() extracts new facts and updates memory state
  5. Session state (facts, spend, audit log, conversations) syncs to MongoDB via POST /api/user/sync
  6. On reload, GET /api/user/me rehydrates the entire local state instantly
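The six steps can be sketched as a single turn handler. The post names `recall()`, `retain()`, `POST /api/user/sync`, and `GET /api/user/me` but not their signatures, so the `SessionState` and `TurnDeps` types below are hypothetical stand-ins, not Hindsight's API. Injecting every dependency keeps the flow testable without a network or database.

```typescript
// Hypothetical session shape; Aura's real schema is not shown in the post.
interface SessionState {
  facts: string[];
  spendUsd: number;
  auditLog: string[];
}

// Every step is injected so the flow can run without a network or database.
interface TurnDeps {
  score: (userPrompt: string) => string;                   // 1. pick a model id client-side
  recall: (state: SessionState) => string;                 // 2. memory block for the prompt
  callModel: (model: string, system: string, user: string) =>
    Promise<{ text: string; costUsd: number }>;            // 3. provider call
  retain: (userPrompt: string, reply: string) => string[]; // 4. extract candidate facts
  sync: (state: SessionState) => Promise<void>;            // 5. e.g. POST /api/user/sync
}

// Step 6 (GET /api/user/me on reload) would supply the initial `state` argument.
async function handleTurn(
  userPrompt: string,
  state: SessionState,
  deps: TurnDeps,
): Promise<{ reply: string; state: SessionState }> {
  const model = deps.score(userPrompt);
  const system = `You are Aura, a financial assistant.\n${deps.recall(state)}`;
  const { text, costUsd } = await deps.callModel(model, system, userPrompt);

  // Keep only facts that are not stored yet.
  const newFacts = deps
    .retain(userPrompt, text)
    .filter((fact) => state.facts.indexOf(fact) < 0);

  // Build a new state object instead of mutating the caller's copy.
  const next: SessionState = {
    facts: state.facts.concat(newFacts),
    spendUsd: state.spendUsd + costUsd,
    auditLog: state.auditLog.concat(`model=${model} cost=${costUsd}`),
  };
  await deps.sync(next);
  return { reply: text, state: next };
}

// Usage with stub dependencies:
const stubDeps: TurnDeps = {
  score: (p) => (p.length > 80 ? "frontier" : "fast"),
  recall: (s) => s.facts.map((f) => `- ${f}`).join("\n"),
  callModel: async (model, _system, user) => ({
    text: `[${model}] answer to: ${user}`,
    costUsd: model === "fast" ? 0.0001 : 0.01,
  }),
  retain: (user) => (/retire/i.test(user) ? ["goal: early retirement"] : []),
  sync: async () => {},
};

handleTurn("I want to retire early. How much should I save?", { facts: [], spendUsd: 0, auditLog: [] }, stubDeps)
  .then(({ reply, state }) => console.log(reply, state));
```

Because the new state is built before `sync` runs, a failed sync rejects the turn while leaving the caller's previous state untouched.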

Why client-side routing? A server round trip adds 80–200ms before the AI call even starts. For simple queries routed to Groq (which itself responds in ~200ms), that's a 40–100% latency increase for no benefit. Scoring stays in the browser.

Workflow & Methodology

Every single user interaction follows this lifecycle:
