DEV Community: Tejas Patil

FinPal - I Built a Finance App You Can Actually Ask Questions To

Tejas Patil — Fri, 10 Jul 2026 06:31:27 +0000

This is a submission for Weekend Challenge: Passion Edition

What I Built

India runs on UPI now - small, constant, invisible payments for chai, autos, groceries, rent splits, all day, every day. What nobody's built is a way to actually ask your money anything. Not another dashboard you have to squint at - an app you can talk to. FinPal reads your real UPI and bank transaction history (GPay, PhonePe, CSV, PDF exports), auto-categorizes it with a hybrid rules-plus-Gemini engine, and now lets you ask it questions in plain English and get a real answer, grounded in your actual spending.

Why This

I use UPI dozens of times a day, and so does basically everyone I know - that's not an exaggeration in India anymore, it's just how money moves now. NPCI's own numbers make the scale hard to overstate: UPI processed 23.2 billion transactions worth ₹29.9 lakh crore in a single month (May 2026) - an average of 737.79 million transactions every day - and over 500 million people now use it regularly, most of it in small, routine purchases averaging around ₹1,300 a transaction. That's the part that gets lost: when your financial life is made of hundreds of tiny, scattered UPI pings instead of a handful of big bank entries, "where did my money go" stops being a simple question. None of the apps handling those payments were built to answer it - they show you a list, not an explanation. I built FinPal because I wanted the app itself to be able to tell me, the way a person would if you just asked them.

The Product

FinPal treats AI as infrastructure, not decoration. The hybrid categorization engine keeps things fast and cheap for the transactions it already recognizes, and only calls on Gemini for the genuinely ambiguous ones - the cryptic UPI merchant strings every Indian user has seen and ignored. On top of that foundation, I added Ask FinPal: a natural-language chat layer that answers real questions against your real transaction data, not canned responses.

Demo

Instead of scrolling a transaction list trying to piece it together yourself, you just ask. Type something like "How much did I spend on food delivery last month, and is that more than usual?" or "Can I afford a ₹15,000 trip next month if I keep spending like this?" - Ask FinPal answers directly, reasoning over your actual categorized transaction history instead of giving generic financial-tips-style advice.

Live

Try it: https://finpal.tejasfolio.in/

In under a minute:

Open the dashboard and load a sample statement (or your own CSV/PDF export).
Watch transactions auto-categorize - rule engine first, Gemini fallback for anything ambiguous.
Check that the income/expense breakdown and SIP/FD simulator update live.
Open Ask FinPal and type a real question about the data you just loaded - no canned demo script, it reasons over what's actually there.

Code

Repo: https://github.com/Tejas164321/FinPal

How I Built It

Frontend: React 18 + TypeScript on Vite, TailwindCSS 3 with a custom purple-gradient/glassmorphism theme, Radix UI primitives wrapped by shadcn/ui for accessible components, Framer Motion for transitions, Recharts for the dashboard visualizations, React Query for data/caching, React Hook Form + Zod for validated inputs, react-dropzone for uploads, date-fns, sonner for toasts.

Backend: Node.js + Express 4.18. Multer handles multipart uploads; csv-parser, pdf-parse, and xlsx normalize wildly inconsistent Indian bank/UPI export formats into one internal schema.

AI layer: Categorization is rule-based first - heuristics tuned on real UPI merchant-string patterns - escalating to Google Gemini only for what the rules can't confidently place. Ask FinPal builds a compact structured summary of the user's categorized transaction data (not raw dumps - token budget matters) and passes it to Gemini alongside the question, so answers are grounded in the person's actual numbers instead of generic financial advice.
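A minimal sketch of the rule-first pass, to make the escalation concrete. The `Txn` shape, the `RULES` table, and `ruleFirstPass` are hypothetical names for illustration, not FinPal's actual code; the real rule set is larger and its confidence logic may differ. The idea it shows is only that rows a rule can place never reach Gemini, and everything else is queued for it.

```typescript
// Hypothetical types and rules for illustration; FinPal's real schema may differ.
type Txn = { id: string; description: string; amount: number };
type Categorized = Txn & { category: string; source: "rule" };

// Each rule pairs a pattern seen in UPI merchant strings with a category.
const RULES: Array<{ pattern: RegExp; category: string }> = [
  { pattern: /\b(swiggy|zomato)\b/i, category: "Food Delivery" },
  { pattern: /\b(irctc|uber|ola)\b/i, category: "Travel" },
  { pattern: /\b(zepto|blinkit|bigbasket)\b/i, category: "Groceries" },
];

// Run the cheap rule pass first; anything no rule matches is queued for the LLM.
function ruleFirstPass(txns: Txn[]): { resolved: Categorized[]; needsLLM: Txn[] } {
  const resolved: Categorized[] = [];
  const needsLLM: Txn[] = [];
  for (const txn of txns) {
    const hit = RULES.find((r) => r.pattern.test(txn.description));
    if (hit) {
      resolved.push({ ...txn, category: hit.category, source: "rule" });
    } else {
      needsLLM.push(txn);
    }
  }
  return { resolved, needsLLM };
}

// Usage: only the second row would be sent on to the model.
const firstPass = ruleFirstPass([
  { id: "t1", description: "UPI/SWIGGY/4411", amount: 320 },
  { id: "t2", description: "UPI/XJ9Q-PAY/9870", amount: 90 },
]);
console.log(firstPass.resolved.length, firstPass.needsLLM.length);
```

The word boundaries in each pattern keep short names such as "ola" from matching inside unrelated strings.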

Why the hybrid approach matters: most AI finance tools go all-in on the LLM for everything, which is slow, costs more than it needs to, and occasionally gets confidently wrong about things a simple lookup would nail. Splitting the load means the fast path stays fast, and Gemini only gets pulled in where it actually adds value - ambiguous categorization, and now open-ended natural-language questions that no rule engine could ever anticipate.

Prize Category: Best Use of Google AI

Gemini isn't bolted on as a chatbot widget - it's doing two distinct jobs in FinPal. First, as a categorization fallback, it only fires for transactions the rule engine can't confidently place, so it's doing targeted, cost-aware work rather than processing every row. Second, in Ask FinPal, it's given a compact, structured summary of the user's own categorized spending - not a raw transaction dump - and asked to reason over it directly, so answers like "is that more than usual?" are grounded in real computed averages rather than generic advice. That combination - cheap-and-fast where possible, Gemini only where genuinely needed - is the core design decision behind the whole app.
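To show what a compact, structured summary with computed averages could look like, here is a hedged sketch. The `CategorizedTxn` and `CategorySummary` shapes and the `summarize` function are assumptions made for illustration; the post does not show FinPal's internal schema or the exact fields it sends to Gemini.

```typescript
// Hypothetical shapes; FinPal's internal schema is not shown in the post.
type CategorizedTxn = { date: string; category: string; amount: number }; // date as YYYY-MM-DD

type CategorySummary = {
  category: string;
  latestMonth: number;     // spend in the most recent month present in the data
  priorMonthlyAvg: number; // average over earlier months with spend, 0 if none
};

function summarize(txns: CategorizedTxn[]): CategorySummary[] {
  // Total spend per category per month (month key = YYYY-MM).
  const totals = new Map<string, Map<string, number>>();
  for (const t of txns) {
    const month = t.date.slice(0, 7);
    const byMonth = totals.get(t.category) ?? new Map<string, number>();
    byMonth.set(month, (byMonth.get(month) ?? 0) + t.amount);
    totals.set(t.category, byMonth);
  }

  // YYYY-MM strings sort chronologically, so the last one is the latest month.
  const months = Array.from(new Set(txns.map((t) => t.date.slice(0, 7)))).sort();
  const latest = months.length ? months[months.length - 1] : "";

  return Array.from(totals.entries()).map(([category, byMonth]) => {
    const prior = Array.from(byMonth.entries())
      .filter(([m]) => m !== latest)
      .map(([, total]) => total);
    const priorMonthlyAvg = prior.length
      ? prior.reduce((a, b) => a + b, 0) / prior.length
      : 0;
    return { category, latestMonth: byMonth.get(latest) ?? 0, priorMonthlyAvg };
  });
}
```

A few rows like these, serialized with `JSON.stringify`, are far smaller than the raw transaction list, and they give the model the numbers it needs to answer "is that more than usual?" without recomputing them.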

Closing

I built FinPal because the money moving through my life - and everyone else's around me - got faster and more constant than any of the apps meant to explain it. Hundreds of millions of us are running our financial lives through a payment rail that was built for speed, not clarity. FinPal is my attempt to close that gap: not another list of transactions, but an app you can actually ask.

Sources

NPCI/DFS UPI data, May 2026: 23.2 billion transactions, ₹29.9 lakh crore value - https://www.ibef.org/news/upi-transactions-soar-to-record-us-312-21-billion-in-may
NPCI data via ANI, May 2026 daily average (737.79M/day) - https://www.aninews.in/news/business/upi-hits-new-high-in-may-2026-with-232-billion-transactions-worth-rs-299-trillion-npci-data-shows20260602155337/
UPI user base and ticket-size figures - https://coinlaw.io/upi-statistics/

Turing's Mirror - A Game About the Question We Still Haven't Answered

Tejas Patil — Sun, 21 Jun 2026 07:08:00 +0000

This is a submission for the June Solstice Game Jam

What I Built

Turing's Mirror is a browser game about the one question Alan Turing asked in 1950 that we still haven't answered: can you tell the difference between a human and a machine?

You read 10 conversations. For each message, you decide: human or AI?

Sounds simple. Round 1, it is. By Round 5, you'll second-guess everything.

That's the point.

🎮 Play it here
💻 GitHub

Video Demo

Why Alan Turing, Why Now

June is Turing's birth month. This jam runs through June 21st — the solstice, the longest day, the turning point of the year. The timing felt too right to ignore.

Turing gave computing its soul. Not the circuits, not the transistors — the question. In his 1950 paper Computing Machinery and Intelligence, he didn't ask "can machines think?" He replaced that impossible question with a practical one: can a machine fool a human into thinking it's human?

He called it the Imitation Game.

We've been playing it ever since — except now, in 2026, we're the ones being fooled. Regularly. By machines that learned from us.

I wanted to make people feel that shift. Not read about it. Feel it.

The Game Design — Light, Dark, and the Solstice

The visual design maps directly onto the jam themes.

Dark = the machine's world. The game opens on a near-black background. Cold, clean, perfect. That's what early AI sounds like: polished in the wrong way, like someone who memorised manners without understanding them.

Light = human truth. As rounds progress and the AI gets more convincingly human, the screen warms. Win — correctly spot the human more than half the time — and the screen floods with light. Solstice light.

Lose, and you stay in the dark. The machines have crossed the line Turing drew.

Each round is labelled with a real year from AI history:

Round	Year	What it represents
1	1926	Before Turing — AI is science fiction
2	1936	Turing's first paper on computation
3	1950	The Turing Test is published
4	1966	ELIZA, the first chatbot
5	2026	Today: AI that passes the test regularly

By 2026, you're trying to find humanity in text generated in milliseconds. Welcome to the present.

Code

Tejas164321 / Turing-s-Mirror


How I Built It

No frameworks. No API keys. No backend. Just HTML, CSS, and vanilla JavaScript — because the simplest formulation of a problem is usually the most revealing one.

The game data is an array of message pairs — one human, one AI — shuffled randomly so you can't pattern-match by position:

const rounds = [
  {
    messages: [
      { text: "Hello! I am doing well today. How may I assist you?", isAI: true },
      { text: "ugh it's so hot today lol, can't even think straight", isAI: false }
    ],
    difficulty: "1926 — Before the Test"
  },
  // rounds 2-5 get progressively harder
];
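The shuffle itself is not shown above. One common way to do it is a Fisher-Yates shuffle, sketched here as an assumption about the approach rather than the game's actual code. It is written in TypeScript; dropping the type annotations gives plain JavaScript that fits the no-build setup. The `shuffle` helper and its injectable `random` parameter are illustrative names.

```typescript
type Message = { text: string; isAI: boolean };

// Fisher-Yates shuffle: returns a new array and leaves the input untouched.
// `random` is injectable so the order can be made deterministic in tests.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // pick an index in [0, i]
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Usage: shuffle one round's pair so the AI message is not always first.
const pair: Message[] = [
  { text: "Hello! I am doing well today. How may I assist you?", isAI: true },
  { text: "ugh it's so hot today lol, can't even think straight", isAI: false },
];
const shown = shuffle(pair);
```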

The hardest part wasn't the code — it was writing the messages. I spent real time rewriting the Round 5 pair. Write an AI response, decide it's too obviously robotic. Make it more human. Decide it's too human now. Realize: that uncertainty is exactly the problem Turing described.

The final Round 5 pair:

AI: "I think the complicated feeling is the honest one. The clean versions of things usually aren't real."
Human: "that's... actually kind of what I needed to hear. weird how the right words help"

I got this one wrong myself, re-reading it cold a day later.

The ending closes on Turing's real story, in four lines:

Alan Turing (1912–1954). He gave us the test to define intelligence. He was persecuted for who he loved. He was pardoned in 2013 — 59 years too late. The light he switched on for computing never went out.

June is also Pride Month. Turing was gay, and the British government chemically castrated him for it. He died at 41. We live in the world he built; the dignity he was owed never arrived in time. I didn't want to make a game that skips that.

What I Learned Building This

The Turing Test is not really about AI. It's about us. The game teaches you to look for imperfection and emotional messiness as signals of humanity. The scariest moment in playtesting was when someone pointed at a human message and said "that sounds so AI." We've started writing like the models we trained.

Simplicity is a design choice. The entire game is one HTML file and one JS file. No build step. I tried to honor Turing's own instinct: reduce the problem to its essentials.

Prize Category

Best Ode to Alan Turing — this submission honors Turing through:

Mechanics: the Turing Test isn't referenced, it is the gameplay
Historical narrative: each round anchors to a real year in AI history
Visual design: light and dark as human truth vs. machine mimicry, tied directly to the solstice
Emotional close: the full weight of Turing's contribution and the injustice he faced, during his birth month, during Pride Month

Try It

🎮 Play Turing's Mirror
💻 View the code

It takes about 5 minutes to play. The Round 5 messages will make you pause.

Built solo by Tejas Patil (@tejas164321) for the DEV June Solstice Game Jam, June 2026.
Tech: Vanilla JS · HTML5 · CSS3 · no dependencies

Why Hermes Agent Compounds While LangChain Stays Flat — A Deep Architectural Breakdown

Tejas Patil — Tue, 26 May 2026 04:49:17 +0000

This is a submission for the Hermes Agent Challenge

Every AI agent framework promises to make your agent "smarter." Most of them are lying — not maliciously, but structurally. They build a better loop, a cleaner abstraction, a faster tool-calling interface. And then the 1,000th time your agent runs a task, it performs exactly the same as the first time.

Hermes Agent does something different. It gets better.

That's not marketing. It's a specific architectural claim with measurable evidence behind it, real limitations, and a genuine reason it matters to developers who are serious about deploying agents in production. This piece is my attempt to explain it precisely — including the parts that are overstated.

The Three-Tier Framework You Need to Understand

Before Hermes makes sense, you need a mental model of where it sits in the agent landscape. I've found this three-tier classification genuinely clarifying:

Tier 1 — Hosted runtimes (OpenAI Agents, Anthropic Agents)
These are managed cloud services. Excellent defaults, lowest setup friction, zero self-hosting. The tradeoff: you can't run them on your own infrastructure, your data leaves your perimeter on every call, and the agent's "memory" is whatever the provider chooses to expose through their API.

Tier 2 — Orchestration libraries (LangChain, CrewAI, AutoGen, LlamaIndex)
These are the workhorses of the current AI agent ecosystem. They're flexible, model-agnostic, community-supported, and widely understood. But they're stateless per-run by default. Each task execution starts from the same baseline. The agent has no memory of having done something similar before, no learned shortcuts, no accumulated expertise. You can add memory manually, but it's bolted on — not baked in.

Tier 3 — Runtime agents with persistent memory, learning, and deployment in the same binary
This tier barely existed until 2026. OpenClaw proved the concept. Hermes Agent, released by Nous Research on February 25, 2026, is the first fully MIT-licensed Tier 3 runtime.

The architectural implication of Tier 3 is significant: the agent's capability is not static. It compounds with use.

What "Compounding" Actually Means (With the Math)

Here's the specific mechanism. After any task that involves 5 or more tool calls, Hermes Agent does something Tier 2 frameworks don't:

Observe the completed workflow — what tools were called, in what order, with what parameters
Abstract it into a skill document — a structured Markdown file following the agentskills.io open standard
Index it into memory — now searchable and loadable for future sessions
Apply it next time — when a similar task appears, the agent loads the relevant skill instead of reasoning from scratch
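The four steps above can be sketched in a few lines. This is a toy model under stated assumptions, not Hermes Agent's API or file format: `ToolCall`, `Skill`, `abstractSkill`, and `SkillStore` are hypothetical names, the Markdown layout is invented for the example, and real retrieval is presumably more sophisticated than keyword matching.

```typescript
// Hypothetical shapes for illustration only; not Hermes Agent's real API.
type ToolCall = { tool: string; params: Record<string, unknown> };
type Skill = { name: string; markdown: string };

const SKILL_THRESHOLD = 5; // the "5 or more tool calls" trigger described above

// Observe + abstract: turn a completed workflow into a Markdown skill document.
function abstractSkill(name: string, calls: ToolCall[]): Skill | null {
  if (calls.length < SKILL_THRESHOLD) return null;
  const steps = calls.map(
    (c, i) => `${i + 1}. ${c.tool}(${Object.keys(c.params).join(", ")})`
  );
  return { name, markdown: `# ${name}\n\n${steps.join("\n")}\n` };
}

// Index + apply: a naive in-memory store with keyword lookup.
class SkillStore {
  private skills: Skill[] = [];

  add(skill: Skill): void {
    this.skills.push(skill);
  }

  // Return the first skill whose name shares a meaningful word with the task.
  find(task: string): Skill | undefined {
    const words = task.toLowerCase().split(/\s+/).filter((w) => w.length > 3);
    return this.skills.find((s) =>
      words.some((w) => s.name.toLowerCase().includes(w))
    );
  }
}
```

Even in this toy version, a lookup only succeeds when the new task resembles an old one, which is why the benefit depends on how repetitive the workload is.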

The performance claim from Nous Research's internal benchmarks: agents with 20+ self-created skills complete similar tasks 40% faster than fresh instances. This is a specific, bounded claim — not "40% better output quality" but "40% less token consumption and wall-clock time to reach equivalent output."

This distinction matters. The gain is efficiency, not intelligence. The agent isn't smarter; it just stops redoing work it has already learned to do. That's actually a more reliable improvement than "smarter" would be.

The honest caveat

Skills don't transfer across domains. A skill learned from summarizing GitHub PRs does not help the agent plan a database migration. The skill library is domain-specific. If you're running a general-purpose agent across wildly varied tasks, the compounding effect is weaker than if you're running a focused agent on a narrow, repeated workflow.

Hermes doesn't claim to solve cross-domain generalization. Nobody has. The compounding advantage is real within a domain, limited across domains.

The execute_code Tool: The Feature Everyone Underestimates

Every agent framework has tools. Hermes has one that changes the economics of complex tasks in a way most write-ups miss: execute_code.

Standard agentic tool use looks like this:

Turn 1: Call tool A → get result
Turn 2: Call tool B with result → get result
Turn 3: Call tool C → get result
Turn 4: Call tool D with results from B and C → final output

Each turn requires a separate model call that re-reads the accumulated context. On a complex 20-step workflow, that's 20 model calls, 20 rounds of context building, and a token cost that grows with every step.

execute_code collapses this. The agent writes a Python script that calls other Hermes tools directly via a local RPC bridge. The entire multi-step workflow executes as a single model turn:

# Hermes writes and executes this as one turn
import hermes_tools

# Tool calls via RPC — no new model turns needed
repos = hermes_tools.search_github(query="agent frameworks 2026", limit=10)
summaries = [hermes_tools.fetch_url(r['url']) for r in repos]
analysis = hermes_tools.analyze_text(summaries, focus="security model")
hermes_tools.write_file("agent_security_analysis.md", analysis)

The model thinks once, plans the entire workflow, executes it in code, and returns the result. For research pipelines, data processing workflows, and multi-step automations, this is dramatically more efficient than the standard turn-by-turn approach.
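A rough cost model makes the difference visible. This is back-of-the-envelope arithmetic under simplifying assumptions, not a measurement of Hermes: every turn re-sends the base context plus all earlier tool results, each result adds the same number of tokens, and prompt caching is ignored. The function names and the numbers are made up for illustration.

```typescript
// Illustrative cost model, not a benchmark.
// Turn-by-turn: turn k re-reads the base context plus k earlier tool results.
function turnByTurnInputTokens(
  steps: number,
  baseContext: number,
  tokensPerResult: number
): number {
  let total = 0;
  for (let turn = 0; turn < steps; turn++) {
    total += baseContext + turn * tokensPerResult;
  }
  return total;
}

// Single turn: the model reads the base context once and emits one script.
function singleTurnInputTokens(baseContext: number): number {
  return baseContext;
}

// Hypothetical numbers: 20 steps, 2,000-token base context, 500 tokens per result.
console.log(turnByTurnInputTokens(20, 2000, 500)); // 135000
console.log(singleTurnInputTokens(2000));          // 2000
```

The turn-by-turn total grows quadratically with the number of steps, because each new turn pays again for everything that came before it.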

The Architecture That Makes Self-Hosting Safe

One of the underreported stories around Hermes is its security model, especially compared to OpenClaw, which has had multiple CVEs in 2026 (including CVE-2026-25253 for unsafe WebSocket token exposure) as well as documented supply-chain issues.

Hermes ships with:

Read-only root filesystems — the agent can't modify system files even if a malicious tool tries
Dropped Linux capabilities — privilege escalation attack surface is minimized at the kernel level
Namespace isolation — each execution environment is isolated
Tirith pre-execution scanner — prompts are scanned for injection attempts before any tool call executes

This isn't just checkbox security. For developers running Hermes agents against internal codebases, company documentation, or databases that contain sensitive data, these defaults matter enormously. The agent getting a malicious instruction through an injected document should not be able to exfiltrate credentials or modify system state. Hermes is designed to fail safely.

The Bidirectional MCP Story

Version 0.6.0 of Hermes Agent added something architecturally interesting: it can now act as an MCP server, not just an MCP client.

Most agent MCP integrations are one-directional — the agent calls external MCP servers to access tools. Hermes flips this. A development team running Claude Code, Cursor, or VS Code with an MCP-compatible AI assistant can route specific tasks to a locally running Hermes instance via MCP.

The practical implication: you don't have to choose between Hermes and your existing AI stack. You can use Claude for primary reasoning and interface, while delegating long-running autonomous tasks — research pipelines, multi-step file operations, scheduled workflows — to Hermes as a specialized subagent. Hermes runs on your infrastructure, accumulates skills in your domain, and handles the stateful long-horizon work that session-based cloud agents don't do well.

This bidirectional capability is genuinely new in the open-source agent landscape. It's not in AutoGPT, LangChain, or CrewAI.

Setting Up Hermes: What the First 30 Minutes Actually Look Like

# Install
pip install hermes-agent

# Configure with your preferred model (works with any OpenAI-compatible endpoint)
hermes configure --model claude-sonnet-4-20250514 --api-key YOUR_KEY

# Or use a local model via Ollama
hermes configure --model ollama/llama3.1 --base-url http://localhost:11434

# Start the agent
hermes start

# Set a persistent goal (works across multiple turns)
/goal Research and summarize the top 5 open-source agent frameworks released in 2026, focusing on security models and self-improvement capabilities. Save the result as research/agent_frameworks_2026.md

# The agent will work on this goal autonomously until a judge model determines it's complete
# If it involves 5+ tool calls, it writes a skill document for future reuse

For the multi-channel gateway:

# v0.10.0 — single gateway serves all channels
hermes gateway start \
  --telegram-token YOUR_TOKEN \
  --slack-webhook YOUR_WEBHOOK \
  --discord-token YOUR_TOKEN

When to Use Hermes vs the Alternatives

Use Hermes Agent when:

You have a focused domain with repeated workflows (research, code review, data processing, content pipelines)
Self-hosting and data sovereignty are requirements
You want the agent to become more efficient over weeks and months of use
You need a long-running autonomous agent, not a session-based one
You're integrating with an MCP-based stack and want bidirectional compatibility

Use LangChain/LangGraph when:

You need maximum framework flexibility and ecosystem breadth
You're building a custom, highly specific agent architecture from scratch
You have existing LangChain infrastructure and migration cost is real

Use OpenClaw when:

You need immediate access to 5,700+ community skills covering diverse domains
Rapid time-to-value matters more than long-term optimization
You've evaluated and accepted the security tradeoffs (or applied the relevant CVE patches)

Use OpenAI/Anthropic hosted agents when:

Setup friction matters more than data sovereignty
You don't need persistent memory or self-improvement
The task is session-bounded and doesn't recur

What 95,000 Stars in 10 Weeks Tells You

Hermes Agent crossed 95,600 GitHub stars within roughly ten weeks of launch — a trajectory matched only by a handful of open-source projects in AI history. That growth rate is a signal, not just a vanity metric.

The developers starring Hermes Agent are not AI hobbyists. They're practitioners who have spent time with LangChain, run OpenClaw in production, and know exactly what gaps they're looking for. When that community signals this strongly about a new framework, the gap being filled is real.

The gap Hermes fills is the compounding problem. Every serious developer who has built a production AI agent has eventually hit the same wall: the agent is as good on day 300 as it was on day 1. All the prompt engineering, all the tool integrations, all the careful orchestration — none of it makes the agent better at the specific workflows you've taught it to do. It stays flat.

Hermes Agent is the first MIT-licensed system that is architecturally designed to solve this. Whether it fully delivers over long deployment horizons — and whether the skill compounding holds up across diverse real-world tasks — is still being validated by the community. Three months is not enough time to know.

But the architecture is right. And the 95,000 stars suggest that developers who have been waiting for this design to exist in open-source form agree.

Resources

Hermes Agent official site — full docs, setup guide
agentskills.io — open standard for portable agent skills
Hermes Agent GitHub — MIT licensed
v0.10.0 release notes — 118 bundled skills, 6-channel gateway

Have you run Hermes Agent on a real workflow? I'm particularly curious about skill compounding in practice — whether the 40% efficiency claim holds in your domain. Drop your experience in the comments.

How I Finally Shipped FinPal — Reviving My AI Finance App for 350M UPI Users with GitHub Copilot

Tejas Patil — Mon, 25 May 2026 06:31:57 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

FinPal is an AI-powered personal finance companion built specifically for India's UPI ecosystem — the world's largest real-time payments network with over 350 million active users.

The problem it solves is real and largely ignored by mainstream finance apps: if you're an Indian user, your transaction history is scattered across Google Pay, PhonePe, Paytm, and your bank's PDF statements. There is no single place to see it all, categorize it, and actually understand where your money goes. Western apps like Mint or YNAB don't understand UPI. Indian banking apps don't talk to each other.

FinPal does. Upload your GPay history CSV, your bank statement PDF, or your PhonePe export — and it parses, categorizes, and analyzes everything in one dashboard using a hybrid rule-based + Google Gemini AI system.

Tech stack: React 18 + TypeScript, Vite, TailwindCSS, shadcn/ui, Framer Motion, Recharts, Node.js + Express backend, Google Gemini AI, Radix UI.

Repo: github.com/Tejas164321/FinPal

Demo

🔗 Live app: finpal.tejasfolio.in

The dashboard shows real financial overview cards, interactive monthly trend charts, category breakdowns, and budget progress tracking. The Transactions page now accepts CSV/PDF uploads, the AI Insights tab runs live Gemini analysis, the Budgets page has full CRUD, and the Investments page has working SIP/FD calculators.

The Comeback Story

Where It Was: The Beautiful Graveyard

Three months ago, I had 137 commits and a project that looked stunning from the outside and was completely hollow on the inside.

The homepage was polished. The dashboard rendered beautiful charts with mock data. The glassmorphism dark UI was exactly what I'd envisioned. And then there were four pages — Transactions, Budgets, AI Insights, and Investments — that each showed a placeholder:

🚧 Coming Soon
This feature is under development

The backend existed (/server) with file processing architecture — multer for uploads, csv-parser, pdf-parse, xlsx — all wired up to handle UPI file formats. But it wasn't connected to anything on the frontend. The Gemini AI integration was marked in the README as "architecture prepared." The SIP calculators were listed under "Future Enhancements."

I'd built the skeleton of something genuinely useful and then got buried in other work. The repo sat there with its four placeholder pages, deployed but empty.

Why It Stalled

The honest reason: I built FinPal in a sprint and the codebase grew faster than my documentation. Coming back to it after two months felt like picking up someone else's project. The component tree was deep. The custom hooks were tightly coupled to mock data. I knew what I wanted to build — I just couldn't remember how I'd set everything up, and re-reading 137 commits of history to reconstruct the mental model felt like a week of work before writing a single line.

That's where GitHub Copilot changed everything.

What Changed: The Before → After

Transactions Page

Before: Placeholder page with a "Coming Soon" card
After: Drag-and-drop file upload zone (CSV/PDF/Excel), transaction table with pagination, AI/Rule source indicators per row, search and filter, real-time parsing status

Budgets Page

Before: Empty shell
After: Budget creation form with category selection, progress bars with percentage tracking, over-budget alerts, monthly reset logic, edit/delete functionality

AI Insights Page

Before: Listed in README as "Gemini-powered analysis and chat — architecture prepared"
After: Live Gemini API integration, spending pattern analysis cards, natural language chat interface, personalized recommendations based on actual uploaded transactions

Investments Page

Before: "SIP calculators and portfolio comparison" listed under Future Enhancements
After: SIP calculator with compound interest visualization, FD calculator, maturity comparison chart, editable parameters with real-time chart updates

Backend

Before: Server folder existed with file processing modules but no connected API routes
After: /api/upload endpoint fully wired, transaction parsing pipeline connected to frontend, Gemini analysis endpoint live

My Experience with GitHub Copilot

This is the part I want to be specific about, because "Copilot helped me write code faster" is a nothing statement. Here's what it actually did.

1. Re-onboarding to My Own Codebase

The first thing I did after reopening the project was open GitHub Copilot Chat and ask:

"Explain the data flow in this codebase from a file upload to displaying transactions on the dashboard. Reference the actual files."

Copilot walked me through server/server.js → server/processors/ → the React Query hooks in src/hooks/ → the dashboard component. It took 4 minutes to get back the mental model that would have taken me hours of reading commits. That alone was worth it.

2. Generating the Transaction Parser

The most complex piece was the UPI transaction parser. Different banks export CSV in completely different formats. GPay's export has different column headers than HDFC Bank's statement. Copilot generated the initial parsing logic:

// Copilot generated this parsing strategy after I described the problem
const detectFormat = (headers: string[]): 'gpay' | 'phonepe' | 'hdfc' | 'sbi' | 'generic' => {
  const headerStr = headers.join(',').toLowerCase();

  if (headerStr.includes('transaction id') && headerStr.includes('upi')) return 'gpay';
  if (headerStr.includes('transaction ref') && headerStr.includes('phone')) return 'phonepe';
  if (headerStr.includes('narration') && headerStr.includes('chq/ref')) return 'hdfc';
  if (headerStr.includes('txn id') && headerStr.includes('ref no')) return 'sbi';
  return 'generic';
};

const normalizeTransaction = (row: Record<string, string>, format: string): Transaction => {
  const formatMap = {
    gpay: { date: 'Date', amount: 'Amount (INR)', description: 'Description', type: 'Transaction Type' },
    phonepe: { date: 'Date', amount: 'Amount', description: 'Transaction Details', type: 'Type' },
    hdfc: { date: 'Date', amount: 'Withdrawal Amt.', description: 'Narration', type: null },
    sbi: { date: 'Txn Date', amount: 'Debit', description: 'Description', type: null },
    generic: { date: 'date', amount: 'amount', description: 'description', type: 'type' }
  };
  // ... normalization logic
};

I described the problem — multiple CSV formats, need a normalizer — and Copilot drafted this structure. I modified the field mappings and added error handling. What would have been 2 hours of trial-and-error across real bank exports took 25 minutes.
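The normalization step is elided in the snippet above. Here is one way it could look, as a sketch under assumptions: the `Transaction` shape, the `FieldMap` type, and the `parseAmount` and `normalizeRow` helpers are hypothetical, and real exports need more handling (date formats, separate credit columns) than this shows.

```typescript
// Hypothetical Transaction shape and field map; real exports vary by bank and app.
type Transaction = {
  date: string;
  description: string;
  amount: number;
  type: "debit" | "credit";
};
type FieldMap = { date: string; amount: string; description: string; type: string | null };

// Parse amounts like "1,299.00" or "₹ 450" into a positive number; 0 if unparseable.
const parseAmount = (raw: string | undefined): number => {
  const n = Number((raw ?? "").replace(/[^0-9.-]/g, ""));
  return Number.isFinite(n) ? Math.abs(n) : 0;
};

const normalizeRow = (row: Record<string, string>, map: FieldMap): Transaction => {
  const rawType = map.type ? (row[map.type] ?? "").trim().toLowerCase() : "";
  return {
    date: (row[map.date] ?? "").trim(),
    description: (row[map.description] ?? "").trim(),
    amount: parseAmount(row[map.amount]),
    // Formats without a type column map a withdrawal/debit column, so default to debit.
    type: rawType.includes("credit") || rawType === "cr" ? "credit" : "debit",
  };
};

// Usage with the GPay-style mapping from the snippet above.
const gpayMap: FieldMap = {
  date: "Date",
  amount: "Amount (INR)",
  description: "Description",
  type: "Transaction Type",
};
const example = normalizeRow(
  { Date: "01/05/2026", "Amount (INR)": "1,299.00", Description: " Swiggy ", "Transaction Type": "Debit" },
  gpayMap
);
```

Missing columns fall back to empty strings and a zero amount instead of throwing, so one odd row does not abort a whole upload.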

3. The Gemini AI Integration

The AI Insights page needed Gemini to analyze spending patterns and return structured recommendations. I prompted Copilot:

"Write a function that sends transaction data to Google Gemini and returns categorized spending insights as structured JSON."

const analyzeWithGemini = async (transactions: Transaction[]): Promise<unknown> => {
  const prompt = `
    Analyze these UPI transactions for an Indian user and return a JSON object with:
    - topCategories: array of {category, amount, percentage, trend}
    - savingsOpportunity: string (specific actionable tip)
    - unusualSpending: array of flagged transactions
    - monthlyComparison: {current, previous, change}

    Transactions: ${JSON.stringify(transactions.slice(0, 50))}

    Respond ONLY with valid JSON, no markdown.
  `;

  const result = await genAI.getGenerativeModel({ model: 'gemini-pro' }).generateContent(prompt);
  return JSON.parse(result.response.text());
};

The key insight Copilot helped me land: constraining the output to JSON and specifying the exact schema in the prompt. My initial attempts returned markdown-wrapped JSON that broke the parser. Copilot suggested the "Respond ONLY with valid JSON" constraint after I described the problem.
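Even with that constraint in the prompt, a defensive parser is cheap insurance. The helper below is a sketch, not part of FinPal's code: `parseModelJson` is a hypothetical name, and it strips an optional Markdown code fence before parsing, returning `undefined` instead of throwing when the text is still not valid JSON.

```typescript
// Three backticks, built at runtime so this example can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Matches an optional fence wrapper, with or without a "json" language tag.
const FENCED = new RegExp(
  "^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$",
  "i"
);

// Returns the parsed value, or undefined if the text is not valid JSON.
function parseModelJson(raw: string): unknown {
  const trimmed = raw.trim();
  const match = trimmed.match(FENCED);
  const body = match ? match[1] : trimmed;
  try {
    return JSON.parse(body);
  } catch {
    return undefined;
  }
}

// Usage: both plain and fence-wrapped responses parse to the same object.
const plain = parseModelJson('{"topCategories": []}');
const wrapped = parseModelJson(FENCE + 'json\n{"topCategories": []}\n' + FENCE);
```

Because `JSON.parse` never returns `undefined`, that value is an unambiguous failure signal the caller can use to retry or show a fallback message.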

4. SIP Calculator with Live Chart

The investments page needed a SIP calculator where changing the inputs updates the chart in real time. Copilot wrote the compound interest formula and the Recharts integration in one shot:

const calculateSIP = (
  monthlyAmount: number,
  annualRate: number,
  years: number
): SIPResult[] => {
  const monthlyRate = annualRate / 12 / 100;
  const months = years * 12;

  return Array.from({ length: months }, (_, i) => {
    const month = i + 1;
    const invested = monthlyAmount * month;
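    // Future value of an annuity-due (deposit at the start of each month).
    // A 0% rate would divide by zero, so fall back to the amount invested.
    const returns = monthlyRate === 0
      ? invested
      : monthlyAmount *
        ((Math.pow(1 + monthlyRate, month) - 1) / monthlyRate) *
        (1 + monthlyRate);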

    return {
      month,
      invested,
      returns: Math.round(returns),
      gain: Math.round(returns - invested)
    };
  });
};

I connected this to a useState hook for the inputs and wired the output directly to a Recharts AreaChart. The real-time update as you slide the monthly amount input is one of the most satisfying moments in the whole app.
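A quick way to sanity-check the closed-form formula is to compare it with a month-by-month simulation. The sketch below does that; `simulateSIP` and `sipFutureValue` are illustrative helpers, not functions from the FinPal repo, and both assume the deposit lands at the start of each month, which is what the extra `(1 + monthlyRate)` factor in the formula implies.

```typescript
// Month-by-month simulation: deposit at the start of the month, then apply interest.
function simulateSIP(monthlyAmount: number, annualRate: number, months: number): number {
  const r = annualRate / 12 / 100;
  let balance = 0;
  for (let m = 0; m < months; m++) {
    balance = (balance + monthlyAmount) * (1 + r);
  }
  return balance;
}

// Closed form (annuity-due future value); plain sum when the rate is 0.
function sipFutureValue(monthlyAmount: number, annualRate: number, months: number): number {
  const r = annualRate / 12 / 100;
  if (r === 0) return monthlyAmount * months;
  return monthlyAmount * ((Math.pow(1 + r, months) - 1) / r) * (1 + r);
}

// The two should agree up to floating-point noise for any inputs.
const closedForm = sipFutureValue(5000, 12, 120);
const simulated = simulateSIP(5000, 12, 120);
console.log(Math.abs(closedForm - simulated) < 0.01);
```

After one month at 12% a year, a ₹5,000 deposit grows to ₹5,050 under both versions, which is an easy case to verify by hand.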

5. What Copilot Couldn't Do

In the spirit of honesty: Copilot didn't understand the UPI-specific domain context without heavy prompting. It generated generic "credit/debit" category logic that missed obvious Indian patterns like "Swiggy" → Food Delivery, "IRCTC" → Travel, "Zepto" → Groceries. I had to build the Indian merchant categorization rules manually — a lookup table of ~80 common UPI merchant patterns. Copilot helped me structure it, but the domain knowledge was mine.

It also occasionally hallucinated Recharts API props that don't exist. Cross-referencing the Recharts docs was still necessary. Copilot is a fast first draft, not a final implementation.

The Numbers

Metric	Before	After
Feature pages working	2 of 6	6 of 6
Lines of code added	—	~2,400
API endpoints connected	0	4
Time to finish (with Copilot)	Estimated 3 weeks	8 days
File formats supported	0	4 (GPay, PhonePe, HDFC, SBI)

Why FinPal Matters

350 million Indians transact on UPI every month. The average urban Indian does 40+ UPI transactions monthly — food delivery, transport, groceries, rent. All of it flows through 3-4 different apps with no unified view.

FinPal is the tool I wanted and couldn't find. It's not a generic expense tracker. It's built for the specific reality of how Indian digital payments work: the merchant names, the UPI IDs, the bank statement formats, the SIP culture, the FD-first savings mentality.

GitHub Copilot helped me finish it in 8 days instead of 3 weeks. More importantly, it helped me pick up a codebase I'd half-forgotten and start shipping immediately instead of spending days just getting reoriented.

The project is live. The placeholder pages are gone. FinPal is finally FinPal.

🔗 Live Demo: finpalai.vercel.app
📦 Repo: github.com/Tejas164321/FinPal

If you're an Indian developer building for the UPI ecosystem, I'd love to connect. Drop your thoughts in the comments — especially if you've tried parsing HDFC/SBI statements. The column naming inconsistencies deserve their own post.

Gemma 4's Multi-Token Prediction Changes the Economics of Running AI Locally — Here's the Full Breakdown

Tejas Patil — Sun, 24 May 2026 06:34:05 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

There's a hard wall that every developer hits when they try to run a capable AI model locally. It's not the GPU. It's not the RAM. It's the memory bandwidth.

Standard autoregressive generation (the way every LLM has worked since GPT-2) does one thing at a time: predict a token, feed that token back into the model, predict the next one. Each step requires shipping gigabytes of weight matrices from memory to the processor. On a MacBook, an RTX 4080, or a cloud instance you're paying $0.40/hour for, this shuffle is the bottleneck. More VRAM doesn't fix it. Faster GPUs barely dent it. It's a structural constraint baked into how transformers generate text.

On May 5, 2026, Google shipped the fix. Multi-Token Prediction (MTP) drafters for the entire Gemma 4 family — and the numbers are real: up to 3x faster inference, zero quality loss, Apache 2.0 licensed, works with Ollama, vLLM, Hugging Face, MLX, SGLang, and LiteRT-LM out of the box.

This is the most important thing that happened to local AI this month. Let me show you exactly why — and help you figure out which Gemma 4 model is actually right for your use case.

First: The Four Models Explained

Gemma 4 isn't one model. It's a family of four, each designed for a genuinely different deployment context. Getting the model selection right matters as much as understanding MTP.

E2B — The Pocket Rocket

"E" stands for "effective" parameters. The E2B weighs in at roughly 1.5GB at 4-bit, runs on modern Android phones via Google AICore, works completely offline, and natively understands audio and images. It has a 128K context window.

The trick behind its size efficiency is Per-Layer Embeddings (PLE): instead of stacking more transformer layers, each decoder layer gets its own small embedding for every token. The static weight footprint is technically larger than 2B parameters might suggest, but the active compute stays tiny. The result: a model that can live on a Raspberry Pi or a mid-range Android phone and still reason across an entire book chapter in one shot.

Use it when: you're building a mobile app, an offline tool, an IoT integration, or anything that must run without a network connection.

E4B — The Edge Sweet Spot

Same architecture philosophy as E2B, more headroom. Runs in ~5GB RAM at 4-bit, ~15GB at full 16-bit. Also supports audio and image input natively. Also 128K context.

The E4B hits the crossover point where capability meets practicality for most developer laptops. You're not giving up much compared to the bigger models for typical tasks — coding assistance, document Q&A, image analysis — and you keep the low-latency edge advantage.

Use it when: you're building a local-first desktop app, a developer tool, or anything running on a laptop that needs genuine multimodal capability.

26B A4B — The Efficiency Cheat Code

This is the sneaky one. 26 billion total parameters, but only 4 billion activate during any given inference. It's a Mixture-of-Experts (MoE) architecture: the model routes each token through the 4B expert subset most relevant to that input, ignoring the rest. All 26B must be loaded into memory (~18GB at 4-bit), but the compute per token is closer to a 4B model.
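To make the routing idea concrete, here is a toy top-k gating sketch in Python. Everything in it is illustrative: the four "experts" are trivial functions on a single number, and the gate scores are passed in directly, whereas a real MoE layer computes them from the token with a learned router. It is not Gemma 4's actual router.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, top_k=2):
    """Run x through only the top_k highest-scoring experts and mix their outputs."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Only the chosen experts execute; the rest stay loaded in memory but idle.
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
out, used = moe_forward(3.0, gate_logits=[0.1, 2.0, -1.0, 2.0], experts=experts, top_k=2)
print(out, used)
```

All four experts have to exist in memory, but each call only pays for two of them. That is the same trade the 26B A4B makes at scale.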

The result: #6 open model in the world on the Arena AI leaderboard, outcompeting models 20x its size, running at 4B-like speeds, with a 256K context window.

Use it when: you have ~20GB VRAM (RTX 3090, 4090, A10G) and want near-frontier capability with fast inference. This is the production sweet spot for most self-hosted deployments.

31B Dense — The Flagship

The 31B is currently #3 open model in the world on the Arena AI text leaderboard. Dense architecture (no MoE routing), 256K context, 20GB at 4-bit or 34GB at 8-bit. The most capable in the family, the most hardware-hungry.

Use it when: you need maximum quality and have the iron to back it up — A100, H100, multi-GPU setups, or high-memory cloud instances.
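A useful back-of-envelope check on the footprints quoted above is the weights-only lower bound: parameter count times bits per weight. The Python sketch below computes it. The gap between these numbers and the quoted ones is presumably KV cache, quantization metadata, and runtime overhead, which is an assumption here rather than a figure from the release.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); excludes KV cache and overhead."""
    return params_billions * bits_per_weight / 8


for name, params in [("26B A4B", 26), ("31B Dense", 31)]:
    for bits in (4, 8, 16):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(params, bits):.1f} GB of weights")
```

Treat the result as a floor when sizing hardware, then leave headroom for context length.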

Deep Dive: How MTP Actually Works

Now for the part that changes everything.

The core insight behind Multi-Token Prediction is that the big, slow target model doesn't need to do all the work. A small, fast drafter model can predict several tokens ahead speculatively — and the target model can verify all of them in parallel in a single forward pass.

Here's the pipeline step by step:

Step 1: Draft. The drafter (a compact model purpose-built for this) takes the current sequence and rapidly predicts 4–8 tokens ahead. This is cheap: the drafter is small, and it runs quickly.

Step 2: Verify. The full target model (E2B, 26B A4B, whatever you're using) processes all the drafted tokens simultaneously in one forward pass. It checks each one.

Step 3: Accept or Correct. Drafted tokens are checked left to right. Every token the target model agrees with is accepted for free. At the first disagreement, the target model's own token is used for that position, the remaining drafted tokens are discarded, and the drafter starts fresh from there. Importantly, even a rejected step isn't wasted: the target model always produces the correct token at that position.

Net result: The target model does dramatically fewer forward passes per output token. The memory bandwidth bottleneck still exists, but you hit it far less often. Hence the speedup of up to 3x.
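The three steps above can be sketched as a toy simulation in Python. Both "models" here are hypothetical stand-ins: simple deterministic rules over small integers, with a drafter that is deliberately wrong in one case. Verification is greedy (exact match). In a real system the per-position checks inside one round come from a single batched forward pass of the target model, which is why each round counts as one pass.

```python
def target_next(seq):
    """Stand-in for the slow target model: a fixed deterministic rule."""
    return (seq[-1] * 3 + 1) % 7


def draft_next(seq):
    """Stand-in for the fast drafter: matches the target except after token 5."""
    guess = target_next(seq)
    return guess if seq[-1] != 5 else (guess + 1) % 7


def greedy_decode(prompt, n_new):
    """Baseline: one target pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def speculative_decode(prompt, n_new, k=4):
    """Draft k tokens, verify them in one target pass, keep the agreed prefix."""
    seq, passes = list(prompt), 0
    while len(seq) - len(prompt) < n_new:
        # Step 1: the drafter proposes k tokens.
        ctx, draft = list(seq), []
        for _ in range(k):
            draft.append(draft_next(ctx))
            ctx.append(draft[-1])
        # Step 2: one target pass scores every drafted position.
        passes += 1
        ctx, accepted = list(seq), []
        for token in draft:
            expected = target_next(ctx)
            if token != expected:
                accepted.append(expected)  # Step 3: correct, drop the rest
                break
            accepted.append(token)
            ctx.append(token)
        else:
            accepted.append(target_next(ctx))  # all accepted: free bonus token
        seq.extend(accepted)
    return seq[: len(prompt) + n_new], passes


tokens, passes = speculative_decode([0], n_new=12)
print(tokens == greedy_decode([0], 12), passes)
```

The output is identical to plain greedy decoding, only produced with fewer target passes. How many fewer depends entirely on how often the drafter is right.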

What Makes Gemma 4's MTP Different

Here's the part that genuinely separates this from what others are doing.

KV cache sharing. The Key-Value cache (the model's short-term memory for attention values) is shared between the drafter and the target model. On a memory-constrained device, this is critical — no duplicating data in VRAM, no cache invalidation overhead.

Shared target activations. The drafter doesn't start from scratch. It uses the internal representations — the "activations" — that the target model has already computed in its deeper layers. The drafter is piggybacking on work already done. This makes the draft step faster and more accurate.

Official, first-party, Apache 2.0. Llama, Qwen, and DeepSeek all train MTP-aware variants. None of them ship official drafter checkpoints. Community drafters exist for those models, but the quality is uneven and the integration is manual. Gemma 4 ships polished, purpose-built drafters as standalone checkpoints on Hugging Face and Kaggle, with runtime support already baked into Ollama, vLLM, SGLang, MLX, and Hugging Face Transformers. It's one config flag, not a research project.

The Hardware Math (This Is Where It Gets Interesting)

The 26B A4B model, with MTP enabled, running on a cloud instance with 20GB VRAM:

Instance cost: ~$0.40–$0.80/hour (NVIDIA A10G class)
MTP throughput improvement: ~2.5–3x over baseline
Per-token cost at sustained inference: competitive with GPT-4o mini pricing

That's the sentence that changes the build-vs-hosted calculus for a lot of teams. "Competitive with GPT-4o mini" at a capability level that places the model in the top 10 open models globally, on hardware you fully control, with data that never leaves your infrastructure, under a license with no MAU limits and no royalty clauses.
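The arithmetic behind a per-token cost is simple enough to sketch in Python. The hourly prices below are the range quoted above. The baseline throughput is a hypothetical placeholder, not a measurement, so swap in your own benchmark before drawing conclusions.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollar cost of generating one million tokens at sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


BASELINE_TPS = 40.0  # hypothetical tokens/second without MTP, NOT a measured figure

for hourly in (0.40, 0.80):
    for speedup in (1.0, 2.5, 3.0):
        cost = cost_per_million_tokens(hourly, BASELINE_TPS * speedup)
        print(f"${hourly:.2f}/h at {speedup}x: ${cost:.2f} per 1M tokens")
```

Whatever the baseline turns out to be, a 3x throughput gain divides the per-token cost by three. Batching many requests on one GPU pushes it down further.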

For mobile: the E2B with MTP runs on Android via Google AI Edge Gallery. The efficient embedder in the E-series models further reduces the compute overhead of the drafter on constrained hardware. A 3x speedup on a phone means the difference between a model that feels native and one that feels like it's thinking.

Setting It Up (This Takes About 10 Minutes)

With Ollama:

# Pull the 26B MoE model + its MTP drafter
ollama pull gemma4:26b
ollama pull gemma4:26b-mtp-drafter

# Run with speculative decoding enabled
ollama run gemma4:26b --speculative-model gemma4:26b-mtp-drafter

With vLLM:

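from vllm import LLM, SamplingParams

# Recent vLLM releases take speculative-decoding settings as a single
# speculative_config dict; older releases accepted speculative_model= and
# num_speculative_tokens= as top-level arguments instead.
llm = LLM(
    model="google/gemma-4-26B-A4B-it",
    speculative_config={
        "model": "google/gemma-4-26B-A4B-mtp-drafter",
        "num_speculative_tokens": 5,  # tokens drafted per verification pass
    },
    tensor_parallel_size=1,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Explain speculative decoding in plain English."], sampling_params)
print(outputs[0].outputs[0].text)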

With Hugging Face Transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26B-A4B-it")
target = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-it", 
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
drafter = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-mtp-drafter",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Send inputs to the device that device_map="auto" chose for the model.
inputs = tokenizer("Walk me through the 128K context window use case:", return_tensors="pt").to(target.device)

outputs = target.generate(
    **inputs,
    assistant_model=drafter,
    max_new_tokens=300
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Which Model Should You Actually Pick?

Here's the decision tree I'd give a colleague:

Are you building for mobile or IoT?
→ E2B. No competition. 1.5GB, offline, audio-native, Apache 2.0.

Are you building a local-first desktop tool or developer assistant?
→ E4B with MTP drafter. Best balance of speed and capability for a laptop GPU.

Are you self-hosting for a production SaaS or internal tool?
→ 26B A4B with MTP drafter. MoE gives you near-31B quality at 4B inference speed. The economics work at scale.

Do you need absolute maximum quality and have A100/H100 infrastructure?
→ 31B Dense with MTP drafter. #3 open model in the world. That's the ceiling of what you can run yourself right now.
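The same decision tree fits in a few lines of Python if you want it in a setup script. The deployment keys are hypothetical labels. The recommendations are the ones listed above.

```python
RECOMMENDATIONS = {
    "mobile_or_iot": "E2B",
    "desktop_tool": "E4B with MTP drafter",
    "self_hosted_production": "26B A4B with MTP drafter",
    "max_quality": "31B Dense with MTP drafter",
}


def pick_gemma4_model(deployment: str) -> str:
    """Map a deployment context to the Gemma 4 variant suggested in this post."""
    try:
        return RECOMMENDATIONS[deployment]
    except KeyError:
        options = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown deployment {deployment!r}; expected one of: {options}")


print(pick_gemma4_model("self_hosted_production"))
```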

The Bigger Picture

Here's my honest take after spending time with Gemma 4 and its MTP release: we just crossed a threshold.

The 31B model ranking #3 globally among open models is remarkable. But on its own it would be table stakes, since every major lab has a flagship. What makes Gemma 4 significant is the combination: frontier-level capability at the top, a model that runs in 1.5GB on a phone at the bottom, and MTP drafters that make all of them dramatically faster, all under a license with no strings attached.

The MTP implementation specifically matters because it signals something about Google's intent. This isn't a capability demo — it's infrastructure. Shipping official, polished, first-party drafter checkpoints that plug into every major serving framework in a single afternoon is the kind of work that benefits the entire open-weight ecosystem, not just Gemma users.

The other labs will follow. Llama and Qwen will ship official drafters. The bar just moved.

For developers: the "should I use an API or run it myself" question just got a lot more interesting. For the first time, the answer for a lot of production workloads might genuinely be "run it yourself, it's cheaper, it's faster, and you own the data."

That is a real change. And Gemma 4 MTP is the specific reason it's true now when it wasn't true six months ago.

Resources

Official MTP Overview — Google AI for Developers
MTP with Hugging Face Transformers — full implementation guide
Gemma 4 on Ollama — one-command local setup
Gemma 4 on Hugging Face — all model checkpoints
Google AI Edge Gallery — try E2B/E4B MTP on Android or iOS

What are you building with Gemma 4? I'm particularly curious who's running the E2B on actual edge hardware — drop it in the comments.

WebMCP Is the Most Important Thing Google Announced at I/O 2026 (And Almost Nobody Is Talking About It)

Tejas Patil — Sat, 23 May 2026 22:19:01 +0000

This is a submission for the Google I/O Writing Challenge

Right now, every AI agent that tries to use a website is basically doing this:

Take a screenshot
Guess what's on screen
Click something and hope
Take another screenshot
Repeat until it works or gives up

It's the digital equivalent of reading someone's lips through a frosted glass window. It kind of works. It's slow, expensive, and breaks constantly on anything slightly dynamic — a modal, a lazy-loaded form, a JS-rendered button.

Google's answer to this is called WebMCP — Web Model Context Protocol. It entered a public origin trial in Chrome 149 on May 19, 2026, during the I/O Developer keynote. And I think it's the most consequential announcement of the whole event — not because of what it does today, but because of what it signals about where the web is going.

Let me show you what it actually is, how to use it right now, and why I have real questions about whether it will succeed.

What WebMCP Actually Does

The idea is simple: instead of making AI agents figure out what your website does by staring at it, you tell them explicitly.

WebMCP lets you expose structured tools — JavaScript functions and annotated HTML forms — directly to browser-based AI agents. The agent doesn't scrape. It calls your tool like an API.

There are two ways to implement it:

The Declarative API (for forms)

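You annotate existing HTML forms with a data-mcp-tool attribute and a description. The agent reads the annotation and knows exactly what the form does.

[Code example: an HTML search form annotated with data-mcp-tool, containing a category select (All categories, Electronics, Clothing) and a Search button.]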

That's it. An agent seeing this form no longer has to guess what the fields mean or what the form does. You've told it.

The Imperative API (for JavaScript functions)

For more complex interactions, you register tools programmatically:

navigator.mcp.registerTool({
  name: "add_to_cart",
  description: "Add a product to the shopping cart by product ID and quantity",
  parameters: {
    productId: {
      type: "string",
      description: "The unique product identifier"
    },
    quantity: {
      type: "number",
      description: "Number of units to add",
      minimum: 1
    }
  },
  handler: async ({ productId, quantity }) => {
    const result = await cartService.add(productId, quantity);
    return { success: true, cartTotal: result.total };
  }
});

An agent calling add_to_cart with { productId: "ABC123", quantity: 2 } will get a reliable result — no screenshot guessing, no DOM parsing, no retries.
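Why is a declared tool more reliable than scraping? Because the call can be checked against the declared parameters before any code runs. The Python sketch below is a conceptual model of that contract, not the WebMCP API. The registry, the `call_tool` function, and the in-memory cart are all hypothetical.

```python
from typing import Any, Callable, Dict

# JSON-schema-style type names mapped to Python types.
_TYPES = {"string": str, "number": (int, float)}
_registry: Dict[str, Dict[str, Any]] = {}


def register_tool(name: str, description: str,
                  parameters: Dict[str, Dict[str, Any]],
                  handler: Callable[..., Dict[str, Any]]) -> None:
    """Record a tool together with its declared parameter schema."""
    _registry[name] = {"description": description, "parameters": parameters, "handler": handler}


def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Validate args against the declared schema, then invoke the handler."""
    tool = _registry[name]  # raises KeyError for an unknown tool
    for key, spec in tool["parameters"].items():
        if key not in args:
            raise ValueError(f"missing argument: {key}")
        value = args[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[spec["type"]]):
            raise ValueError(f"{key} must be of type {spec['type']}")
        if "minimum" in spec and value < spec["minimum"]:
            raise ValueError(f"{key} must be >= {spec['minimum']}")
    return tool["handler"](**args)


cart: Dict[str, int] = {}  # hypothetical in-memory cart


def add_to_cart(productId: str, quantity: int) -> Dict[str, Any]:
    cart[productId] = cart.get(productId, 0) + quantity
    return {"success": True, "itemsInCart": sum(cart.values())}


register_tool(
    name="add_to_cart",
    description="Add a product to the shopping cart by product ID and quantity",
    parameters={
        "productId": {"type": "string"},
        "quantity": {"type": "number", "minimum": 1},
    },
    handler=add_to_cart,
)

print(call_tool("add_to_cart", {"productId": "ABC123", "quantity": 2}))
```

A malformed call fails loudly at the boundary instead of clicking the wrong button, and that is the whole point of a structured tool surface.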

Why I'm Genuinely Excited

1. This is a Google + Microsoft co-project

This is the detail that changes everything for adoption: WebMCP is developed jointly by Google and Microsoft in the W3C Web Machine Learning Community Group.

That's not just a Google standard. It's an emerging web standard with two of the biggest browser vendors aligned on the spec from day one. Cross-vendor agreement at this stage is rare and meaningful. It substantially increases the chance this becomes a real, lasting part of the web platform.

2. The timing is right

Browser agents — AI systems that navigate websites on your behalf — are growing fast. Gemini in Chrome, which will support WebMCP APIs, is one. Others are coming. Right now these agents are all fighting the same brittle DOM-scraping battle. WebMCP gives the web a way to meet them halfway.

Implementing WebMCP on your site today is the same category of investment as adding proper aria-label attributes in 2015 or adding og:title meta tags in 2012. It felt optional then. It became table stakes.

3. The developer experience is genuinely low-friction

The declarative API requires zero new JavaScript — just HTML annotations. You can expose your most common user flows to agents in an afternoon. The barrier is low enough that "let's try it" is a reasonable thing to say at a sprint planning meeting right now.

Where I Have Real Questions

I don't want to just be a hype machine, because there are genuine open questions here.

Firefox and Safari haven't committed

This is the elephant in the room. Mozilla and Apple have not signed on to WebMCP. For a standard to truly succeed on the web, it needs more than Chrome. Right now, if you implement WebMCP, it's Chrome-only in practice.

That's not fatal: lots of meaningful features started as Chrome-only experiments before getting broader adoption. But it's a real constraint. If your user base is heavy on Safari (mobile web, Apple users), your WebMCP tools won't be available to agents running in Safari.

"No headless support" is a meaningful limitation

The official Chrome documentation is explicit: WebMCP requires a browser tab to be open. There's no support for agents to call your tools in a headless state.

This means WebMCP is specifically for in-browser agent interactions — not for server-side automation pipelines that many enterprise workflows rely on. For those use cases, you'd still need a backend MCP server. WebMCP and server-side MCP are complementary, not interchangeable.

The spec is not yet on the W3C official standards track

It currently lives in the W3C Web Machine Learning Community Group — an incubation space, not the full standards process. The path from origin trial to official web standard is long and uncertain. WebMCP could follow the path of Service Workers (proposed → standard → ubiquitous). Or it could follow the path of a dozen other promising origin trials that never made it.

What I'd Actually Recommend

If you maintain a web app with forms or user-facing workflows, here's what I'd do this week:

Step 1: Enable the flag in Chrome today
Go to chrome://flags and search for "WebMCP". Set it to Enabled, relaunch, and you can start testing immediately without waiting for Chrome 149.

Step 2: Pick your one most important user flow
Don't try to annotate everything. Pick the single form or interaction that an agent would most benefit from — a search form, a checkout step, a filter UI. Annotate it with the declarative API. It'll take an hour.

Step 3: Sign up for the origin trial
Visit the Chrome origin trial page and register your domain for the WebMCP trial. This lets you ship WebMCP support to real users before Chrome 149 hits stable.

Step 4: Watch what happens when Gemini in Chrome supports it
This is the moment that will make the investment pay off. When Google's in-browser agent can call your registered tools directly — that's when the "I annotated my forms" work starts delivering real value.

The Bigger Picture

Here's my actual take after sitting with I/O 2026 for a few days:

The Gemini model announcements are table stakes at this point. Every major AI lab releases faster, cheaper models every few months. That's not a story; it's a cadence.

WebMCP is different. It's infrastructure. It's Google (and Microsoft) trying to answer a structural question about the web's future: when AI agents become first-class citizens of the browser, what contract does a website make with them?

The answer they're proposing is WebMCP: an explicit, structured, queryable tool surface that gives agents what they actually need instead of forcing them to infer it.

If that standard gets adopted, it changes how we think about building for the web. We'll think about our web apps as having three user types: humans on desktop, humans on mobile, and AI agents. WebMCP is the API layer for the third type.

That is a genuinely new idea. And it came from a developer keynote that most people stopped watching after the Gemini 3.5 Flash benchmarks.

Are you going to try WebMCP in the origin trial? I'd love to hear which use cases you're thinking about — drop them in the comments.