DEV Community: Shivani Saraf

Building a Multi-Persona Finance Agent with Persistent Memory: Inside Vorniq

Shivani Saraf — Sun, 28 Jun 2026 10:34:04 +0000

A technical deep-dive into five expert AI agents, one shared memory layer, and what it takes to make them feel like a single coherent financial advisor.

The Problem: Expert Amnesia
Most AI finance tools have a blind spot that nobody talks about: they are brilliant in isolation and amnesiac across sessions. You tell your investment assistant you have a moderate risk appetite and a five-year horizon, and three days later, when you switch to a tax question, it has no idea who you are. The expertise is there. The continuity is not.
That is the gap I built Vorniq to close. It is a personal finance intelligence system built around five specialized expert personas (a Controller, a Financial Analyst, an FP&A Analyst, an Investment Researcher, and a Tax Strategist), all sharing a single persistent memory layer powered by Hindsight by Vectorize. When Quinn, the Investment Researcher, learns you have a five-lakh corpus with moderate risk tolerance, Cassandra, the Tax Strategist, knows it in your very next session, without you repeating a word.

What Vorniq Does and How It Hangs Together
At its core, Vorniq is a Next.js 14 frontend talking to an Express/TypeScript backend. The backend handles three things: recalling financial context from Hindsight before each LLM call, routing the message to the correct persona's system prompt, and retaining new facts back to memory after the response is generated. The LLM itself is Groq running qwen/qwen3-32b — fast enough that the recall-generate-retain loop does not introduce perceptible latency.
The five personas are not cosmetic aliases for the same prompt. Each one has a distinct system instruction tuned to its domain:
• Controller: Month-end close, GAAP reconciliation, internal controls, audit readiness
• Financial Analyst: DCF modeling, scenario analysis, valuation, capital structure
• FP&A Analyst: Rolling forecasts, variance analysis, driver-based budgeting
• Investment Researcher: Portfolio due diligence, equity research, risk assessment
• Tax Strategist: Section 80C deductions, LTCG optimization, multi-jurisdiction compliance
The user selects a persona per session through a command palette in the sidebar. Switching personas mid-conversation is a first-class action — not a workaround. And because Hindsight memory persists across all five, switching from the Analyst to the Tax Strategist is less like opening a new app and more like walking from one specialist's office to the next in the same building.

The Core Technical Story: Memory as First-Class Infrastructure

The single architectural decision that made everything else possible was treating memory not as a feature bolted onto the chat layer, but as the primary infrastructure concern — resolved before a single token of generation begins.
Here is the sequence inside backend/src/agent/core.ts on every request:
    try {
      // Memory is optional: if Hindsight is down, chat still works
      try {
        await ensureBank(bankId);
      } catch (bankErr: unknown) {
        console.warn(`[core] ensureBank failed (Hindsight may be offline): ${(bankErr as Error).message}`);
      }
      // Recall runs before generation so the persona prompt can include stored context
      const memories = await recall(bankId, message);
      const systemPrompt = getPersonaPrompt(persona, formatMemories(memories));
      const completion = await groq.chat.completions.create({
        model: settings.GROQ_MODEL,
        messages: [
          { role: "system", content: systemPrompt },
          ...history.slice(-10), // only the last 10 turns are forwarded
          { role: "user", content: message },
        ],
        temperature: 0.7,
        max_tokens: 1024,
      });
      // (excerpt ends here; the retain step and the outer catch are not shown)
The Recall step is where Hindsight earns its keep. It runs a TEMPR search — a combination of semantic vector similarity, BM25 keyword retrieval, graph traversal across related facts, and temporal weighting that ranks recent information higher. A single query for "what should I invest in?" returns not just investment preferences, but linked debt obligations, income constraints, and tax-bracket context that were stored in completely separate sessions.
The Retain step is equally deliberate. Rather than dumping the entire conversation back into the memory store, the system extracts discrete financial facts — income figures, debt balances, risk tolerances, stated goals — and stores them as structured entries keyed to the user's bankId. This keeps retrieval clean and prevents the memory store from becoming a noisy transcript archive.
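The extraction code itself is not shown in this post, so the following is only a minimal sketch of what "discrete financial facts" could look like as data. The `FinancialFact` type, the `extractFacts` helper, and the regex rules are all hypothetical. A real system would more plausibly delegate extraction to Hindsight or to an LLM call than to hand-written patterns.

```typescript
// Hypothetical shape for one retained fact; this is not Hindsight's schema.
interface FinancialFact {
  bankId: string;
  key: "investable_capital" | "risk" | "horizon";
  value: string;
}

// Toy rule-based extractor, for illustration only.
function extractFacts(bankId: string, message: string): FinancialFact[] {
  const facts: FinancialFact[] = [];

  // Amounts written in lakh shorthand, such as "₹5L" or "₹1.5L".
  const capital = message.match(/₹\s?(\d+(?:\.\d+)?)\s?L\b/);
  if (capital) {
    facts.push({ bankId, key: "investable_capital", value: `₹${capital[1]}L` });
  }

  // Stated risk appetite, such as "moderate risk".
  const risk = message.match(/\b(low|moderate|high)\s+risk\b/i);
  if (risk) {
    facts.push({ bankId, key: "risk", value: risk[1].toLowerCase() });
  }

  // Investment horizon, such as "5-year horizon".
  const horizon = message.match(/\b(\d+)[-\s]year\s+horizon\b/i);
  if (horizon) {
    facts.push({ bankId, key: "horizon", value: `${horizon[1]}yr` });
  }

  return facts;
}

const extracted = extractFacts(
  "finsight-bank",
  "I have ₹5L to invest, moderate risk, 5-year horizon."
);
```

Storing three small keyed entries instead of the raw sentence is what keeps later retrieval clean: each fact can be matched, updated, or superseded on its own.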
Cross-Persona Memory in Practice
The most revealing test of the system is what happens when you switch personas mid-workflow. Here is a real session sequence:
// Session 1 — Investment Researcher
User: "I have ₹5L to invest, moderate risk, 5-year horizon."
Investment Researcher: [provides allocation across equity, debt, hybrid funds]
// Retained: investable_capital=₹5L, risk=moderate, horizon=5yr

// Session 2 — Tax Strategist
User: "What's the most tax-efficient way to invest?"
Tax Strategist: "Given your ₹5L corpus and 5-year horizon Quinn noted,
ELSS funds give you Sec 80C deduction on ₹1.5L
and LTCG exemption after 1 year."
The Tax Strategist did not ask the Investment Researcher what the user's situation was. The memory system bridged the two sessions invisibly. That is the behavior the architecture was designed to produce, and in practice it holds up across session counts well into the double digits, with up to 105 retained memories in a single user's bank.

Why Hindsight, and What It Actually Does
I evaluated a few memory backends before landing on Hindsight. The deciding factor was TEMPR, its hybrid retrieval strategy. Pure semantic search is unreliable for exact figures: it treats "5L" and "five lakh" as close, but keyword retrieval is more dependable for number-heavy financial data. Pure BM25, on the other hand, misses conceptually related facts. Hindsight runs both, then applies graph traversal to pull in linked entities and temporal weighting to surface recent updates over stale entries.
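To make the trade-off concrete, here is a toy ranking function that blends a semantic score, a keyword score, and a recency weight. It is not Hindsight's TEMPR implementation: the 50/50 weights, the 30-day half-life, the term-overlap stand-in for BM25, and every name in the block are assumptions for illustration, and graph traversal is left out entirely.

```typescript
// Toy hybrid ranking. NOT Hindsight's TEMPR implementation.
interface MemoryCandidate {
  text: string;
  semantic: number; // assumed precomputed embedding similarity in [0, 1]
  ageDays: number;  // days since the fact was retained
}

// Fraction of query terms that literally appear in the text
// (a crude stand-in for BM25, which also weights term rarity and length).
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.indexOf(t) !== -1).length;
  return hits / terms.length;
}

// Exponential decay: a fact loses half its weight every `halfLifeDays`.
function recencyWeight(ageDays: number, halfLifeDays: number = 30): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Sort candidates by blended score, highest first, without mutating the input.
function rank(query: string, candidates: MemoryCandidate[]): MemoryCandidate[] {
  const score = (c: MemoryCandidate): number =>
    (0.5 * c.semantic + 0.5 * keywordScore(query, c.text)) * recencyWeight(c.ageDays);
  return candidates.slice().sort((a, b) => score(b) - score(a));
}

const ranked = rank("corpus 5L", [
  { text: "investable corpus is five lakh", semantic: 0.9, ageDays: 90 },
  { text: "corpus updated to 5L", semantic: 0.8, ageDays: 0 },
]);
```

In this made-up example the older entry has the higher semantic score, but the fresh entry with the literal "5L" match ranks first. That is the behavior I wanted from a hybrid backend.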
The Reflect endpoint is a useful addition for session hygiene. Calling POST /reflect at session end triggers Hindsight to consolidate the session's retained facts into a coherent profile update — deduplicating, resolving conflicts, and pruning superseded data. Without it, a user who updates their income figure across three sessions would accumulate three separate income entries rather than a single current value.
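The sketch below illustrates only the outcome of that consolidation, using the income example: three entries for the same key collapse into the latest one. Hindsight's actual Reflect logic is not shown in this post, so the `RetainedFact` shape, the `consolidate` helper, and the sample values are hypothetical.

```typescript
// Illustration of the consolidation outcome only; not Hindsight's Reflect logic.
interface RetainedFact {
  key: string;        // e.g. "monthly_income"
  value: string;
  retainedAt: string; // ISO 8601 timestamp
}

// Keep only the most recent entry per key, dropping superseded values.
function consolidate(entries: RetainedFact[]): RetainedFact[] {
  const latest: Record<string, RetainedFact> = {};
  for (const entry of entries) {
    const current = latest[entry.key];
    if (!current || Date.parse(entry.retainedAt) > Date.parse(current.retainedAt)) {
      latest[entry.key] = entry;
    }
  }
  return Object.keys(latest).map((key) => latest[key]);
}

// Made-up sample data: one income figure updated across three sessions.
const consolidated = consolidate([
  { key: "monthly_income", value: "₹1,00,000", retainedAt: "2026-01-05T10:00:00Z" },
  { key: "monthly_income", value: "₹1,10,000", retainedAt: "2026-03-12T10:00:00Z" },
  { key: "monthly_income", value: "₹1,20,000", retainedAt: "2026-06-01T10:00:00Z" },
  { key: "risk", value: "moderate", retainedAt: "2026-01-05T10:00:00Z" },
]);
```

After consolidation the bank holds one income entry and one risk entry, so a recall for income returns a single current value rather than three competing ones.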
Memory is stored per user in a named bankId. The backend/.env sets this as HINDSIGHT_BANK_ID=finsight-bank for local development, but in production this would be a per-user identifier so financial data is properly scoped and isolated.
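One way to move from the shared development bank to per-user scoping is a small resolver like the one below. This is a sketch under assumptions: the `resolveBankId` helper and the `user-` prefix are hypothetical, and only the `HINDSIGHT_BANK_ID` variable comes from the project's configuration.

```typescript
// Hypothetical helper: derive a per-user bank ID, falling back to the shared dev bank.
function resolveBankId(
  userId: string | undefined,
  env: Record<string, string | undefined>
): string {
  if (userId !== undefined && userId.trim() !== "") {
    // Restrict to a conservative character set so the ID is safe in URLs and logs.
    const safe = userId.trim().toLowerCase().replace(/[^a-z0-9_-]/g, "-");
    return `user-${safe}`;
  }
  // Local development path: use the bank configured in backend/.env.
  const fallback = env["HINDSIGHT_BANK_ID"];
  if (!fallback) {
    throw new Error("No userId given and HINDSIGHT_BANK_ID is not set");
  }
  return fallback;
}

const devBank = resolveBankId(undefined, { HINDSIGHT_BANK_ID: "finsight-bank" });
const userBank = resolveBankId("Alice@Example.com", {});
```

Sanitizing can map two different inputs to the same ID, so in production the `userId` should be an opaque identifier from the auth layer rather than an email address.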
Persona Prompts: Where Domain Expertise Lives
Each persona's system prompt in backend/src/agent/personas.ts does two things. It establishes the expert identity (a Controller who cares about GAAP compliance and segregation of duties, a Financial Analyst who defaults to DCF and scenario analysis), and it specifies how recalled memories should be integrated into responses.
The memory injection is not naive concatenation. The system prompt instructs each persona to surface recalled context only when it is relevant to the current query, and to cite the source: "as you mentioned in a previous session" or "based on the income figure the Investment Researcher noted." This keeps responses grounded without turning every reply into a recitation of everything the system remembers.
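The real prompts are longer and are not reproduced in this post, so treat the following as a minimal sketch of the pattern. The function names `getPersonaPrompt` and `formatMemories` match the calls in core.ts, but their bodies, the persona keys, the one-line placeholder prompts, and the assumption that memories arrive as plain strings are all mine.

```typescript
// Hypothetical persona keys; the real identifiers are not listed in the post.
type Persona =
  | "controller"
  | "financial_analyst"
  | "fpa_analyst"
  | "investment_researcher"
  | "tax_strategist";

// One-line placeholders standing in for the real, much longer system prompts.
const PERSONA_PROMPTS: Record<Persona, string> = {
  controller: "You are a Controller focused on GAAP compliance and segregation of duties.",
  financial_analyst: "You are a Financial Analyst who defaults to DCF and scenario analysis.",
  fpa_analyst: "You are an FP&A Analyst focused on rolling forecasts and variance analysis.",
  investment_researcher: "You are an Investment Researcher focused on due diligence and risk.",
  tax_strategist: "You are a Tax Strategist focused on deductions and LTCG optimization.",
};

// Render recalled memories as a bullet list (assumes each memory is a plain string).
function formatMemories(memories: string[]): string {
  if (memories.length === 0) return "No stored context for this user yet.";
  return memories.map((m) => `- ${m}`).join("\n");
}

// Combine the persona identity, the recalled context, and the usage rules.
function getPersonaPrompt(persona: Persona, memoryBlock: string): string {
  return [
    PERSONA_PROMPTS[persona],
    "Known context from earlier sessions:",
    memoryBlock,
    "Use this context only when it is relevant to the current question, " +
      'and say where it came from (for example, "as you mentioned in a previous session").',
  ].join("\n\n");
}

const taxPrompt = getPersonaPrompt(
  "tax_strategist",
  formatMemories(["investable_capital=₹5L", "risk=moderate", "horizon=5yr"])
);
```

Keeping the relevance and citation rules in a shared trailer, rather than repeating them in each persona's text, means all five personas treat memory the same way.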
One design decision I am glad I made early: keeping persona switching stateless on the backend. The frontend sends { persona, bankId, message, history } on every request. The backend does not maintain session state between calls. This means the frontend can switch personas mid-conversation without any server-side coordination — the correct persona prompt is just selected from the map and injected fresh on each call.
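Because every request carries the full `{ persona, bankId, message, history }` payload, the handler's first job is simply to validate it. The sketch below shows one way to do that. The `ChatTurn` and `ChatRequest` types, the persona keys, and the `parseChatRequest` helper are hypothetical, and the 10-turn cap mirrors the `history.slice(-10)` call in core.ts.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

interface ChatRequest {
  persona: string;
  bankId: string;
  message: string;
  history: ChatTurn[];
}

// Hypothetical persona keys; the post does not list the real identifiers.
const KNOWN_PERSONAS = [
  "controller",
  "financial_analyst",
  "fpa_analyst",
  "investment_researcher",
  "tax_strategist",
];

function isChatTurn(value: unknown): value is ChatTurn {
  if (typeof value !== "object" || value === null) return false;
  const turn = value as Record<string, unknown>;
  return (
    (turn["role"] === "user" || turn["role"] === "assistant") &&
    typeof turn["content"] === "string"
  );
}

// Validate an untrusted body. Everything the handler needs arrives in the
// payload, so there is no server-side session to look up.
function parseChatRequest(body: unknown): ChatRequest {
  if (typeof body !== "object" || body === null) throw new Error("body must be an object");
  const fields = body as Record<string, unknown>;
  const persona = fields["persona"];
  const bankId = fields["bankId"];
  const message = fields["message"];
  const rawHistory = fields["history"];

  if (typeof persona !== "string" || KNOWN_PERSONAS.indexOf(persona) === -1) {
    throw new Error("unknown persona");
  }
  if (typeof bankId !== "string" || bankId === "") throw new Error("bankId is required");
  if (typeof message !== "string" || message === "") throw new Error("message is required");

  // Drop malformed turns and keep only the 10 most recent.
  const turns: ChatTurn[] = Array.isArray(rawHistory) ? rawHistory.filter(isChatTurn) : [];
  return { persona, bankId, message, history: turns.slice(-10) };
}
```

Switching personas is then nothing more than the frontend sending a different `persona` value with the same `bankId`.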

Results: What the System Actually Produces

The concrete outputs are what matter most, so here is what the system produces in practice. When a user with a financial profile of ₹1,20,000/month income, ₹5L in mutual funds, a ₹5L car loan, and a ₹50,000 monthly budget queries the Financial Analyst, they receive a structured quick-look summary breaking down gross income, the stated surplus of ₹70,000 (income minus budget), current investments, and debt obligations, with comments flagging unknowns such as the car loan interest rate.
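The arithmetic behind that summary is simple enough to sketch. The figures below come from the profile just described. The `FinancialProfile` type, its field names, and the `quickLook` helper are hypothetical, since in Vorniq the summary is produced by the LLM rather than by code like this.

```typescript
// Hypothetical profile shape; amounts are in rupees.
interface FinancialProfile {
  monthlyIncome: number;
  monthlyBudget: number;
  mutualFunds: number;
  carLoan: number;
  carLoanRate?: number; // annual interest rate, if the user has stated it
}

interface QuickLook {
  monthlySurplus: number;
  investments: number;
  debt: number;
  unknowns: string[];
}

function quickLook(p: FinancialProfile): QuickLook {
  const unknowns: string[] = [];
  // Flag missing inputs instead of guessing them.
  if (p.carLoanRate === undefined) unknowns.push("car loan interest rate");
  return {
    monthlySurplus: p.monthlyIncome - p.monthlyBudget,
    investments: p.mutualFunds,
    debt: p.carLoan,
    unknowns,
  };
}

const quickLookResult = quickLook({
  monthlyIncome: 120000,
  monthlyBudget: 50000,
  mutualFunds: 500000,
  carLoan: 500000,
});
// quickLookResult.monthlySurplus is 70000 (1,20,000 - 50,000)
```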
That same user, switching to the Tax Strategist in a later session, receives a four-column table covering ELSS vs. balanced fund allocation: what deduction each provides, the LTCG tax treatment, and post-tax CAGR projections of 10–12% and 7–9% respectively. The response references the ₹5L corpus and 5-year horizon from the Investment Researcher session without prompting.
The Controller handles requests in a different register entirely. It responds to "What can I do today?" with a structured to-do list covering journal entry review, high-risk account reconciliation, AP/AR aging, close-calendar status, and team process improvement, each item with specific sub-tasks rather than generic advice.
Lessons Learned
Five takeaways worth carrying into your next agent build:

Memory before generation, always. The recall step must happen before the LLM call, not after. Any architecture that tries to "stitch in" context during or after generation produces inconsistent results. Treat memory retrieval as a hard dependency, not an optimization.
Stateless persona routing scales better. Sending the full context on every request means you get horizontal scalability for free. The backend can be replicated without any session affinity requirements.
Reflect is not optional. Without periodic consolidation, memory stores accumulate contradictory facts. A user who updates their income three times ends up with three entries that confuse retrieval. The Reflect step keeps the profile coherent.
Domain expertise lives in the prompt, not the model. Qwen3-32b is competent across all five domains. The differentiation between the GAAP focus of Dana, the Controller, and the tax optimization of Cassandra, the Tax Strategist, comes entirely from the persona system prompt. Investing in prompt quality compounds over time.
Hybrid retrieval is non-negotiable for financial data. Financial facts are a mix of semantically meaningful context and precise numerical values. Semantic-only retrieval loses exact figures. Keyword-only retrieval misses conceptual connections. You need both, plus temporal weighting to surface the most recent version of any given fact.

What This Opens Up
The most interesting thing about Vorniq is not the five personas — it is the memory architecture that makes them feel coherent. The same pattern applies to any domain where a user interacts with multiple specialized agents over time: healthcare, legal research, engineering design review. The challenge in each case is identical: how do you ensure that what one agent learns is available to all others without rebuilding it from scratch each session?
Persistent, structured, cross-session memory is the answer. And tools like Hindsight make it operationally straightforward enough that you can focus on what your personas actually know — not on plumbing the infrastructure to make them remember it.
The full backend is three files: core.ts for the recall-generate-retain loop, memory.ts for the Hindsight client wrapper, and personas.ts for the five system prompts. If you are building something similar, that is the surface area you need to own. Everything else is standard web infrastructure.

Resources: Hindsight GitHub | Hindsight Docs | Vectorize Agent Memory Overview | Groq Console
References:
The Hindsight GitHub repository: https://github.com/vectorize-io/hindsight
The documentation for Hindsight: https://hindsight.vectorize.io/
GitHub Link of project: https://github.com/Durgaprasad044/VORNIQ
Built with Hindsight for persistent agent memory, Groq for inference, and Next.js + Express for the stack
