
💀 The Hidden Cost of “AI Features”: What I Learned Turning a Messaging App Into an Expense Tracker

I created a Telegram bot and treated it as my personal ledger. It's a bit addictive because "I can type anything and it just works." So I figured: just ship it!

Then my chats started speaking for themselves:

  • “Makan soto 20k.”
  • “Indihome 350k”
  • “Isi bensin 100k, tambah angin 5k, cilok 10k”
  • upload Indomaret photo receipt
  • a voice note after leaving a café

So I pivoted the product into what felt obvious: a chat-first expense tracker.

The killer feature was "auto-save expenses" from text, images, and audio. And once data was captured, we added analytics: weekly and monthly breakdowns, categories, trends, and a "you're going to overshoot" signal (a fancy word for a forecast, wkwkw).

The product felt magical:

You don't "log expenses." You just send messages the way you always do.
The bot does the bookkeeping.

The beta version was cheap because the demo was polite

In demos, users send one clean message. One receipt. One voice note. Everything is short and predictable.

In real production, a chat UI is basically a cost amplifier:

  • users spam messages
  • they resend when it feels slow
  • they paste long text
  • they upload huge images
  • they record 2-minute audio notes when 10 seconds would do
  • they ask for “weekly summary” five times because it looks fun

A messaging interface doesn't behave like a form; it behaves like a stream. And AI loves a stream… because it can charge you for every token spent :D

My original pipeline (aka “how I accidentally built a money burner”)

For each message, I did something like this:

If it’s text

  1. detect if it’s an expense
  2. extract amount, merchant, and category, or infer them
  3. save transaction
  4. respond with a confirmation + a friendly explanation

If it’s an image

  1. OCR/vision extraction
  2. parse line items or totals
  3. extract amount, merchant, and category, or infer them
  4. save transaction
  5. respond with what it found and ask for confirmation

If it's audio

  1. transcribe
  2. run the same extraction as text
  3. save transaction
  4. respond with a summary

Then on top of that, I had analytics:

  • weekly/monthly charts
  • “top categories”
  • “spikes” (“why is transport higher this week?”)
  • forecasting / overspend alerts (“you’re trending 18% above your normal pace”)

So the app wasn’t “AI + database.”

It was a distributed system with metered inference.

Hidden cost #1: tokens compound in a chat product

I didn’t feel it immediately because my brain was still in “API calls are roughly the same cost” mode.

But LLM usage isn’t flat. It scales with:

  • prompt length (system instructions + rules + examples)
  • user message length
  • retrieved context (history, past expenses, category list)
  • output length (confirmations, explanations, summaries)

And chat systems naturally push you toward more context:

  • “Use the last 20 messages so it understands the user”
  • “Include last month’s spending so it can categorize smarter”
  • “Include the user’s recurring merchants”
  • “Include the budget settings”
  • “Include the currency rules”

Suddenly, the simplest “makan soto 20k” message is riding inside a prompt suitcase filled with your entire product.

The cheapest token is the one you never send.

Chat UX makes you forget that.
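Two small guards help here, sketched below under assumptions (the pattern and the character budget are illustrative, not the production values): a fast path that handles trivial messages without any model call, and a hard budget on how much history rides along in the prompt.

```python
import re

# Fast path: if the message already matches a trivial "<thing> <amount>k"
# shape, skip the model entirely (hypothetical heuristic).
SIMPLE = re.compile(r"^(.{1,40}?)\s+(\d+)k$", re.IGNORECASE)

def try_fast_path(text: str):
    m = SIMPLE.match(text.strip())
    if m:
        return {"merchant": m.group(1), "amount": int(m.group(2)) * 1000}
    return None  # fall through to the (expensive) LLM call

def build_context(history: list[str], max_chars: int = 500) -> list[str]:
    # Keep only the most recent messages that fit a hard budget,
    # instead of "the last 20 messages" by default.
    kept, used = [], 0
    for msg in reversed(history):
        if used + len(msg) > max_chars:
            break
        kept.append(msg)
        used += len(msg)
    return list(reversed(kept))
```

"Makan soto 20k" never reaches the model at all; vague messages still get the full pipeline, but with bounded context.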


Hidden cost #2: multimodal is not one cost—it's three

Text-only expense parsing was manageable.

Images and audio were where the bill started growing teeth.

Images (receipts)

A single receipt can mean:

  • image upload bandwidth + storage
  • vision/OCR inference cost
  • parsing errors → retries
  • confirmation flows (extra model calls)
  • edge cases (blur, shadows, multiple totals, currencies)

And users don’t upload “small receipts.”

They upload full-res photos straight off their camera roll.
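One cheap mitigation is downscaling before the vision call. Receipts are mostly text, so a cap on the long side is usually enough for OCR; the 1024px threshold below is an assumption to tune per provider, and the function only computes target dimensions (the actual resize would use an image library).

```python
def downscale_dims(w: int, h: int, max_side: int = 1024) -> tuple[int, int]:
    # Cap the longest side; a 4000px camera photo costs far more to
    # process than a 1024px version that OCRs just as well.
    if max(w, h) <= max_side:
        return w, h
    scale = max_side / max(w, h)
    return round(w * scale), round(h * scale)
```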

Audio (voice notes)

Audio is sneaky because it’s time-based. People talk longer than they type.

A “quick note” becomes:

  • transcription compute
  • then LLM extraction
  • then confirmation response

So one voice note can be more expensive than ten text messages—without looking scary in the UI.
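Since audio cost scales with duration, it pays to gate on length *before* transcribing. A minimal sketch, with assumed numbers: the 60-second cap and the per-minute rate are placeholders for your provider's real limits and pricing.

```python
def audio_gate(duration_s: float, max_s: float = 60.0,
               price_per_min: float = 0.006):
    # Estimate the transcription cost up front; reject overly long
    # notes instead of discovering the cost after the fact.
    est_cost = (duration_s / 60.0) * price_per_min
    if duration_s > max_s:
        return {"accept": False, "reason": "too long", "est_cost": est_cost}
    return {"accept": True, "est_cost": est_cost}
```

Rejecting a 2-minute voice note with "mind keeping it under a minute?" is a UX decision, but as argued above, in a chat UI every UX decision is a cost decision.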


Hidden cost #3: analytics turns “data” into “context inflation”

The analytics requests were where I got hit hardest.

Because analytics queries are vague by nature:

  • “How am I doing this month?”
  • “Why am I spending more than usual?”
  • “What should I cut?”
  • “Give me a weekly summary”
  • “Predict next month”

To answer these, my first instinct was: send more context.

So I started retrieving lots of transactions, bundling them into the prompt, and asking the model to reason over it.

That’s the trap.

It works… until it scales.

Because now every analytics request is:

  • a database read (sometimes large)
  • plus prompt assembly
  • plus a big generation (users love long insights)
  • and it repeats weekly/monthly for every active user

This is where “AI features” quietly become “cloud bill features.”
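The fix that worked for me conceptually: aggregate in code (or SQL) first, and let the model see only the small summary, never the raw transaction list. A sketch, assuming transactions are simple dicts:

```python
from collections import defaultdict

def summarize(transactions: list[dict]) -> dict:
    # Pre-aggregate so the prompt carries a handful of numbers
    # instead of hundreds of rows.
    totals: dict[str, int] = defaultdict(int)
    for t in transactions:
        totals[t["category"]] += t["amount"]
    return {
        "total": sum(totals.values()),
        "by_category": dict(sorted(totals.items(), key=lambda kv: -kv[1])),
        "count": len(transactions),
    }
```

A month of data collapses into a dozen tokens of context, and the model still has everything it needs to explain "why is transport higher this week?"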


Hidden cost #4: latency causes retries, retries cause duplicates, duplicates cause chaos

When the AI response took too long, users did what users do:

  • send the same message again
  • refresh
  • tap the button twice
  • re-upload the receipt
  • re-record the voice note

And if your system isn’t aggressively idempotent, you get:

  • duplicate transactions
  • angry users (“why is lunch logged twice?”)
  • extra model calls
  • extra cleanup tools you now must build

So the cost problem turned into a data integrity problem.

And the data integrity problem turned back into a cost problem.
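"Aggressively idempotent" can be as simple as fingerprinting each incoming message and remembering it for a short window, so a resend never triggers a second model call or a duplicate transaction. A minimal in-memory sketch (a real deployment would likely use Redis or the database; the 5-minute TTL is an assumption):

```python
import hashlib
import time

class Deduper:
    # Remember message fingerprints for a short window so identical
    # resends are dropped before any inference happens.
    def __init__(self, ttl_s: float = 300.0):
        self.ttl_s = ttl_s
        self.seen: dict[str, float] = {}

    def key(self, user_id: str, payload: bytes) -> str:
        return hashlib.sha256(user_id.encode() + b"|" + payload).hexdigest()

    def is_duplicate(self, user_id: str, payload: bytes, now=None) -> bool:
        now = now if now is not None else time.time()
        # Lazily drop expired fingerprints.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t < self.ttl_s}
        k = self.key(user_id, payload)
        if k in self.seen:
            return True
        self.seen[k] = now
        return False
```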

The builder takeaway: AI features are product architecture now

Turning a messaging app into an expense tracker taught me something simple:

In a chat UI, every UX decision is a cost decision.

  • auto-run analytics = cost spike
  • long explanations by default = cost spike
  • “just include more context” = cost spike
  • retries without idempotency = cost spike and bad data
  • multimodal without guardrails = cost spike and latency pain

But you don’t need to fear it.

You just need to design AI like you design infra:

  • budgets
  • fast paths
  • async heavy work
  • caching
  • observability
  • graceful fallbacks
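The first two items can live in one tiny component: a per-user budget that routes requests to the expensive path until a cap is hit, then degrades to a cheap fallback instead of failing. A sketch with an assumed cap (the 10-cent daily figure is illustrative):

```python
class BudgetGuard:
    # Track estimated spend per user; once the daily cap is hit,
    # route to a cheap path (regex parser, cached summary) instead
    # of refusing the request outright.
    def __init__(self, daily_cap_usd: float = 0.10):
        self.cap = daily_cap_usd
        self.spent: dict[str, float] = {}

    def charge(self, user_id: str, est_cost_usd: float) -> str:
        total = self.spent.get(user_id, 0.0) + est_cost_usd
        if total > self.cap:
            return "fallback"
        self.spent[user_id] = total
        return "llm"
```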

Because once AI is inside your product, you’re not only building features anymore.

You’re operating a meter.

Thank you for reading this article, hope it’s helpful 📖. See you in the next article 🙌
