<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bhavya Sree</title>
    <description>The latest articles on DEV Community by Bhavya Sree (@bhavya_sree_).</description>
    <link>https://dev.to/bhavya_sree_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938650%2F4b4fc579-4526-42d5-a910-f83512559cbb.png</url>
      <title>DEV Community: Bhavya Sree</title>
      <link>https://dev.to/bhavya_sree_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bhavya_sree_"/>
    <language>en</language>
    <item>
      <title>Aura Financial Agent</title>
      <dc:creator>Bhavya Sree</dc:creator>
      <pubDate>Mon, 18 May 2026 17:45:44 +0000</pubDate>
      <link>https://dev.to/bhavya_sree_/aura-financial-agent-f3f</link>
      <guid>https://dev.to/bhavya_sree_/aura-financial-agent-f3f</guid>
      <description>

&lt;p&gt;AI Engineering · Financial Agents · LLM Architecture&lt;/p&gt;

&lt;h1&gt;
  
  
  How I Cut AI API Costs by 80% Without Sacrificing Response Quality
&lt;/h1&gt;

&lt;p&gt;Building Aura — a multi-LLM financial agent that routes every query to the right model in real time, remembers everything across sessions, and never sends sensitive data to the cloud.&lt;/p&gt;

&lt;p&gt;✦ May 2026 ✦ 12 min read ✦ &lt;a href="https://adaptive-fin-agent.vercel.app/" rel="noopener noreferrer"&gt;Live Demo ↗&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Every AI application I've worked on runs into the same wall eventually: your cloud LLM bill. You start with GPT-4o because the quality is great, you ship fast, and then suddenly you're paying $300/month for an app where 60% of the queries are "what's the difference between SIP and lumpsum." That's not sustainable.&lt;/p&gt;

&lt;p&gt;The obvious solution — swap to a cheaper model — breaks the use cases that actually matter. Financial advice is not a uniform task. Routing "Hi, how are you?" and "Draft a SEBI-compliant risk stress analysis for an equity options portfolio under high interest rate environments" to the same model is either wasteful or dangerous depending on which model you pick.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;Aura Financial Agent&lt;/strong&gt; to solve this properly. It's a financial intelligence platform powered by two core engines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Cascadeflow&lt;/strong&gt; — a real-time multi-LLM routing matrix that scores each prompt for complexity and sends it to the cheapest model capable of handling it.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt;&lt;/strong&gt; — a persistent memory layer that extracts user facts, financial goals, and risk profiles across sessions, so the agent actually knows who it's talking to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Combined, they cut API costs by 75–80% compared to a single-model setup — without any perceptible drop in quality for users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;There are three structural problems that break almost every production AI financial tool I've seen:&lt;/p&gt;

&lt;h3&gt;
  
  
  1 — The Cost-Cognition Paradox
&lt;/h3&gt;

&lt;p&gt;Financial queries span an enormous complexity range. A basic compound interest calculation needs nothing more than a fast 8B model. A DCF valuation with SEBI compliance constraints requires frontier-level reasoning. Using one model for both is either expensive or unreliable. There's no middle ground with a static setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  2 — Context Amnesia
&lt;/h3&gt;

&lt;p&gt;Standard stateless LLMs forget the user's risk profile, investment goals, and personal financial context the moment a session ends. Users end up pasting the same background every single time, which destroys the experience and means the agent never actually learns anything about the person it's advising.&lt;/p&gt;

&lt;h3&gt;
  
  
  3 — Privacy Leakage
&lt;/h3&gt;

&lt;p&gt;Financial planning involves sensitive data — portfolio credentials, tax IDs, bank account details. Routing any of this to public cloud APIs is a compliance and security risk. There's no good reason sensitive identifiers should leave a user's machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proposed Solution
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Core Insight:&lt;/strong&gt; Not all prompts are equal. If you can score a query's complexity before sending it, you can route it to the cheapest model that can handle it — and only escalate when you need to.&lt;/p&gt;

&lt;p&gt;Aura addresses each problem directly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic query profiling (Cascadeflow)&lt;/strong&gt; analyzes every prompt against a multidimensional complexity heuristic before it leaves the browser. The score determines which tier the request goes to — no manual configuration needed.&lt;/p&gt;
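
&lt;p&gt;To make the scoring idea concrete, here is a minimal sketch of a keyword-and-length heuristic. The signal patterns, weights, thresholds, and tier names are all illustrative assumptions, not Cascadeflow's actual rules.&lt;/p&gt;

```typescript
// Hypothetical sketch: signals, weights, and thresholds are illustrative only.
type Tier = "fast" | "balanced" | "frontier";

interface Signal {
  pattern: RegExp;
  weight: number;
}

const SIGNALS: Signal[] = [
  { pattern: /\b(sebi|compliance|regulat\w*)\b/i, weight: 3 },
  { pattern: /\b(dcf|valuation|stress|options|derivativ\w*)\b/i, weight: 3 },
  { pattern: /\b(portfolio|tax|rebalanc\w*)\b/i, weight: 2 },
  { pattern: /\b(draft|analy[sz]e|analysis|compare)\b/i, weight: 1 },
];

function scoreComplexity(prompt: string): number {
  // Long prompts earn up to 3 points: one per 20 words.
  const words = prompt.trim().split(/\s+/).filter((w) => w.length > 0).length;
  const lengthScore = Math.min(3, Math.floor(words / 20));
  // Each signal group counts at most once, however many of its keywords appear.
  const signalScore = SIGNALS.reduce(
    (sum, s) => (s.pattern.test(prompt) ? sum + s.weight : sum),
    0,
  );
  return lengthScore + signalScore;
}

function pickTier(score: number): Tier {
  if (score >= 6) return "frontier";
  if (score >= 2) return "balanced";
  return "fast";
}
```

&lt;p&gt;With these assumed weights, "Hi, how are you?" scores 0 and lands on the fast tier, while the SEBI stress-analysis prompt from the introduction scores 9 and goes to the frontier tier.&lt;/p&gt;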

&lt;p&gt;&lt;strong&gt;Semantic context hydration (&lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt;)&lt;/strong&gt; extracts structured facts from the conversation — names, goals, risk preferences, financial context — and persists them to MongoDB. Every subsequent prompt is hydrated with this memory block, giving the model a growing picture of the user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy-tier routing&lt;/strong&gt; detects sensitive identifiers (passwords, account numbers, API keys) and bypasses all cloud providers entirely, routing to a locally running Ollama instance. That data never touches an external API.&lt;/p&gt;
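
&lt;p&gt;A rough sketch of what such a detector might look like is shown below. The regular expressions and the &lt;code&gt;chooseRoute&lt;/code&gt; helper are assumptions made for illustration; a production detector would need broader patterns and tuning against false positives.&lt;/p&gt;

```typescript
// Hypothetical sketch: patterns and route names are illustrative assumptions.
type Route = "local" | "cloud";

const SENSITIVE_PATTERNS: RegExp[] = [
  /\b(password|passcode|pin)\b\s*[:=]/i, // inline credentials such as "password: ..."
  /\b(api[_-]?key|secret|token)\b/i,     // mentions of keys or secrets
  /\b\d{9,18}\b/,                        // long digit runs, e.g. account numbers
  /\bsk-[A-Za-z0-9]{16,}\b/,             // strings shaped like provider API keys
];

function containsSensitiveData(prompt: string): boolean {
  return SENSITIVE_PATTERNS.some((p) => p.test(prompt));
}

function chooseRoute(prompt: string): Route {
  // Anything flagged as sensitive stays on the local model.
  return containsSensitiveData(prompt) ? "local" : "cloud";
}
```

&lt;p&gt;The check errs on the side of caution: a false positive only costs a slower local response, while a false negative would send the data to a cloud provider.&lt;/p&gt;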

&lt;h2&gt;
  
  
  Project Objectives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Achieve real-time model routing with no added network latency by scoring each prompt in the browser, with no server-side pre-classification step&lt;/li&gt;
&lt;li&gt;  Implement a persistent memory cycle using &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt; that survives page refreshes and re-logins&lt;/li&gt;
&lt;li&gt;  Guarantee local-first processing for any prompt containing sensitive credentials&lt;/li&gt;
&lt;li&gt;  Build a resilient fallback chain so the agent never goes dark — even if multiple providers are down simultaneously&lt;/li&gt;
&lt;li&gt;  Deliver a real-time intelligence HUD that makes the routing decisions visible to the user&lt;/li&gt;
&lt;/ul&gt;
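
&lt;p&gt;The fallback-chain objective can be sketched as a loop that tries providers in order and records each failure. The &lt;code&gt;Provider&lt;/code&gt; shape and provider names here are hypothetical, and calls are modelled as synchronous functions to keep the example short; real provider calls would be asynchronous.&lt;/p&gt;

```typescript
// Hypothetical sketch: the Provider shape and names are assumptions.
interface Provider {
  name: string;
  call: (prompt: string) => string; // real provider calls would be async
}

interface ChainResult {
  provider: string;
  reply: string;
  failures: string[];
}

function runWithFallback(chain: Provider[], prompt: string): ChainResult {
  const failures: string[] = [];
  for (const provider of chain) {
    try {
      const reply = provider.call(prompt);
      return { provider: provider.name, reply, failures };
    } catch (err) {
      // Record the failure and fall through to the next provider.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${failures.join("; ")}`);
}
```

&lt;p&gt;Ending the chain with a local model is one way to keep the agent responsive when every cloud provider is unreachable, though that ordering is an assumption rather than something the objectives spell out.&lt;/p&gt;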

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Frontend:&lt;/strong&gt; React 19 + Vite 8&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Styling:&lt;/strong&gt; Vanilla CSS + Glassmorphism&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Backend:&lt;/strong&gt; Node.js + Express 5&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Auth:&lt;/strong&gt; Passport.js + JWT&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Database:&lt;/strong&gt; MongoDB Atlas&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Memory:&lt;/strong&gt; Hindsight Engine&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Routing:&lt;/strong&gt; Cascadeflow&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI Cloud:&lt;/strong&gt; Groq / OpenAI / Gemini / Claude&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI Local:&lt;/strong&gt; Ollama (Llama 3)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;Aura is structured as a &lt;em&gt;client-orchestrated, server-synchronized&lt;/em&gt; platform. The key architectural decision was to run the Cascadeflow scoring logic directly in the browser rather than on the server. This keeps routing instantaneous — the user sees model selection update in real time as they type, before they even send the message.&lt;/p&gt;

&lt;p&gt;The overall data flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; User enters a prompt → Cascadeflow scores it client-side&lt;/li&gt;
&lt;li&gt; Hindsight's &lt;code&gt;recall()&lt;/code&gt; hydrates the prompt with the user's persistent memory&lt;/li&gt;
&lt;li&gt; The compiled system prompt + user message goes to the selected AI provider&lt;/li&gt;
&lt;li&gt; On success, &lt;code&gt;retain()&lt;/code&gt; extracts new facts and updates memory state&lt;/li&gt;
&lt;li&gt; Session state (facts, spend, audit log, conversations) syncs to MongoDB via &lt;code&gt;POST /api/user/sync&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; On reload, &lt;code&gt;GET /api/user/me&lt;/code&gt; rehydrates the entire local state instantly&lt;/li&gt;
&lt;/ol&gt;
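
&lt;p&gt;Steps 2 and 4 above can be illustrated with a toy in-memory store. This is a hypothetical stand-in that only borrows the &lt;code&gt;recall&lt;/code&gt; and &lt;code&gt;retain&lt;/code&gt; names from the flow; it is not the Hindsight API, and the regex-based fact extraction is a deliberately naive assumption.&lt;/p&gt;

```typescript
// Hypothetical stand-in for the memory layer; this is not the Hindsight API.
class MemoryStore {
  private facts: string[] = [];

  // Step 4: keep simple first-person statements as facts (toy extraction rule).
  retain(message: string): void {
    const matches = message.match(/\b(my|i am|i'm|i have|i want)\b[^.!?]*/gi) || [];
    for (const raw of matches) {
      const fact = raw.trim();
      // Skip facts that are already stored.
      if (this.facts.indexOf(fact) === -1) this.facts.push(fact);
    }
  }

  // Step 2: build the memory block that is prepended to the system prompt.
  recall(): string {
    if (this.facts.length === 0) return "No stored facts yet.";
    return this.facts.map((f) => `- ${f}`).join("\n");
  }
}

function buildSystemPrompt(memory: MemoryStore): string {
  return `You are a financial assistant.\nKnown user facts:\n${memory.recall()}`;
}
```

&lt;p&gt;In the flow described above, the stored facts would also be synced to the server after each turn so that a reload can restore them.&lt;/p&gt;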

&lt;p&gt;&lt;strong&gt;Why client-side routing?&lt;/strong&gt; A server round trip adds 80–200ms before the AI call even starts. For simple queries routed to Groq (which itself responds in ~200ms), that's a 40–100% latency increase for no benefit, so scoring stays in the browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow &amp;amp; Methodology
&lt;/h2&gt;

&lt;p&gt;Every single user interaction follows this lifecycle:&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>api</category>
      <category>showdev</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Bhavya Sree</dc:creator>
      <pubDate>Mon, 18 May 2026 17:19:15 +0000</pubDate>
      <link>https://dev.to/bhavya_sree_/-5273</link>
      <guid>https://dev.to/bhavya_sree_/-5273</guid>
      <description></description>
    </item>
  </channel>
</rss>
