
💀 The Hidden Cost of “AI Features”: What I Learned Turning a Messaging App Into an Expense Tracker

I created a Telegram bot and treated it as my personal ledger. It's a bit addictive because "I can type anything and it just works." So I figured: just ship it!

Then my chats started speaking for themselves:

  • “Makan soto 20k.”
  • “Indihome 350k”
  • “Isi bensin 100k, tambah angin 5k, cilok 10k”
  • upload Indomaret photo receipt
  • a voice note after leaving a café

So I pivoted the product into what felt obvious: a chat-first expense tracker.

The killer feature was "auto-save expenses" from text, images, and audio. And once data was captured, we added analytics: weekly and monthly breakdowns, categories, trends, and a "you're going to overshoot" signal (a fancy word for a forecast, wkwkw).

The product felt magical:

You don't "log expenses." You just send messages the way you always do.
The bot does the bookkeeping.

The beta version was cheap because the demo was polite

In demos, users send one clean message. One receipt. One voice note. Everything is short and predictable.

In real production, a chat UI is basically a cost amplifier:

  • users spam messages
  • they resend when it feels slow
  • they paste long text
  • they upload huge images
  • they record 2-minute audio notes when 10 seconds would do
  • they ask for “weekly summary” five times because it looks fun

A messaging interface doesn't behave like a form; it behaves like a stream. And AI loves a stream… because it can charge you for every token spent :D

My original pipeline (aka “how I accidentally built a money burner”)

For each message, I did something like this:

If it’s text

  1. detect if it’s an expense
  2. extract amount, merchant, and category, or infer them
  3. save transaction
  4. respond with a confirmation + a friendly explanation

If it’s an image

  1. OCR/vision extraction
  2. parse line items or totals
  3. extract amount, merchant, and category, or infer them
  4. save transaction
  5. respond with what it found and ask for confirmation

If it's audio

  1. transcribe
  2. run the same extraction as text
  3. save transaction
  4. respond with a summary

Then on top of that, I had analytics:

  • weekly/monthly charts
  • “top categories”
  • “spikes” (“why is transport higher this week?”)
  • forecasting / overspend alerts (“you’re trending 18% above your normal pace”)

So the app wasn’t “AI + database.”

It was a distributed system with metered inference.

Hidden cost #1: tokens compound in a chat product

I didn’t feel it immediately because my brain was still in “API calls are roughly the same cost” mode.

But LLM usage isn’t flat. It scales with:

  • prompt length (system instructions + rules + examples)
  • user message length
  • retrieved context (history, past expenses, category list)
  • output length (confirmations, explanations, summaries)

And chat systems naturally push you toward more context:

  • “Use the last 20 messages so it understands the user”
  • “Include last month’s spending so it can categorize smarter”
  • “Include the user’s recurring merchants”
  • “Include the budget settings”
  • “Include the currency rules”

Suddenly, the simplest “makan soto 20k” message is riding inside a prompt suitcase filled with your entire product.

The cheapest token is the one you never send.

Chat UX makes you forget that.
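Two small guards help here, sketched below under assumptions (the pattern and the character budget are illustrative, not the production values): a fast path that handles trivial messages without any model call, and a hard budget on how much history rides along in the prompt.

```python
import re

# Fast path: if the message already matches a trivial "<thing> <amount>k"
# shape, skip the model entirely (hypothetical heuristic).
SIMPLE = re.compile(r"^(.{1,40}?)\s+(\d+)k$", re.IGNORECASE)

def try_fast_path(text: str):
    m = SIMPLE.match(text.strip())
    if m:
        return {"merchant": m.group(1), "amount": int(m.group(2)) * 1000}
    return None  # fall through to the (expensive) LLM call

def build_context(history: list[str], max_chars: int = 500) -> list[str]:
    # Keep only the most recent messages that fit a hard budget,
    # instead of "the last 20 messages" by default.
    kept, used = [], 0
    for msg in reversed(history):
        if used + len(msg) > max_chars:
            break
        kept.append(msg)
        used += len(msg)
    return list(reversed(kept))
```

"Makan soto 20k" never reaches the model at all; vague messages still get the full pipeline, but with bounded context.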


Hidden cost #2: multimodal is not one cost—it's three

Text-only expense parsing was manageable.

Images and audio were where the bill started growing teeth.

Images (receipts)

A single receipt can mean:

  • image upload bandwidth + storage
  • vision/OCR inference cost
  • parsing errors → retries
  • confirmation flows (extra model calls)
  • edge cases (blur, shadows, multiple totals, currencies)

And users don’t upload “small receipts.”

They upload full-res photos straight off their camera roll.
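One cheap mitigation is downscaling before the vision call. Receipts are mostly text, so a cap on the long side is usually enough for OCR; the 1024px threshold below is an assumption to tune per provider, and the function only computes target dimensions (the actual resize would use an image library).

```python
def downscale_dims(w: int, h: int, max_side: int = 1024) -> tuple[int, int]:
    # Cap the longest side; a 4000px camera photo costs far more to
    # process than a 1024px version that OCRs just as well.
    if max(w, h) <= max_side:
        return w, h
    scale = max_side / max(w, h)
    return round(w * scale), round(h * scale)
```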

Audio (voice notes)

Audio is sneaky because it’s time-based. People talk longer than they type.

A “quick note” becomes:

  • transcription compute
  • then LLM extraction
  • then confirmation response

So one voice note can be more expensive than ten text messages—without looking scary in the UI.
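Since audio cost scales with duration, it pays to gate on length *before* transcribing. A minimal sketch, with assumed numbers: the 60-second cap and the per-minute rate are placeholders for your provider's real limits and pricing.

```python
def audio_gate(duration_s: float, max_s: float = 60.0,
               price_per_min: float = 0.006):
    # Estimate the transcription cost up front; reject overly long
    # notes instead of discovering the cost after the fact.
    est_cost = (duration_s / 60.0) * price_per_min
    if duration_s > max_s:
        return {"accept": False, "reason": "too long", "est_cost": est_cost}
    return {"accept": True, "est_cost": est_cost}
```

Rejecting a 2-minute voice note with "mind keeping it under a minute?" is a UX decision, but as argued above, in a chat UI every UX decision is a cost decision.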


Hidden cost #3: analytics turns “data” into “context inflation”

The analytics requests were where I got hit hardest.

Because analytics queries are vague by nature:

  • “How am I doing this month?”
  • “Why am I spending more than usual?”
  • “What should I cut?”
  • “Give me a weekly summary”
  • “Predict next month”

To answer these, my first instinct was: send more context.

So I started retrieving lots of transactions, bundling them into the prompt, and asking the model to reason over it.

That’s the trap.

It works… until it scales.

Because now every analytics request is:

  • a database read (sometimes large)
  • plus prompt assembly
  • plus a big generation (users love long insights)
  • and it repeats weekly/monthly for every active user

This is where “AI features” quietly become “cloud bill features.”
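The fix that worked for me conceptually: aggregate in code (or SQL) first, and let the model see only the small summary, never the raw transaction list. A sketch, assuming transactions are simple dicts:

```python
from collections import defaultdict

def summarize(transactions: list[dict]) -> dict:
    # Pre-aggregate so the prompt carries a handful of numbers
    # instead of hundreds of rows.
    totals: dict[str, int] = defaultdict(int)
    for t in transactions:
        totals[t["category"]] += t["amount"]
    return {
        "total": sum(totals.values()),
        "by_category": dict(sorted(totals.items(), key=lambda kv: -kv[1])),
        "count": len(transactions),
    }
```

A month of data collapses into a dozen tokens of context, and the model still has everything it needs to explain "why is transport higher this week?"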


Hidden cost #4: latency causes retries, retries cause duplicates, duplicates cause chaos

When the AI response took too long, users did what users do:

  • send the same message again
  • refresh
  • tap the button twice
  • re-upload the receipt
  • re-record the voice note

And if your system isn’t aggressively idempotent, you get:

  • duplicate transactions
  • angry users (“why is lunch logged twice?”)
  • extra model calls
  • extra cleanup tools you now must build

So the cost problem turned into a data integrity problem.

And the data integrity problem turned back into a cost problem.
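"Aggressively idempotent" can be as simple as fingerprinting each incoming message and remembering it for a short window, so a resend never triggers a second model call or a duplicate transaction. A minimal in-memory sketch (a real deployment would likely use Redis or the database; the 5-minute TTL is an assumption):

```python
import hashlib
import time

class Deduper:
    # Remember message fingerprints for a short window so identical
    # resends are dropped before any inference happens.
    def __init__(self, ttl_s: float = 300.0):
        self.ttl_s = ttl_s
        self.seen: dict[str, float] = {}

    def key(self, user_id: str, payload: bytes) -> str:
        return hashlib.sha256(user_id.encode() + b"|" + payload).hexdigest()

    def is_duplicate(self, user_id: str, payload: bytes, now=None) -> bool:
        now = now if now is not None else time.time()
        # Lazily drop expired fingerprints.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t < self.ttl_s}
        k = self.key(user_id, payload)
        if k in self.seen:
            return True
        self.seen[k] = now
        return False
```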

The builder takeaway: AI features are product architecture now

Turning a messaging app into an expense tracker taught me something simple:

In a chat UI, every UX decision is a cost decision.

  • auto-run analytics = cost spike
  • long explanations by default = cost spike
  • “just include more context” = cost spike
  • retries without idempotency = cost spike and bad data
  • multimodal without guardrails = cost spike and latency pain

But you don’t need to fear it.

You just need to design AI like you design infra:

  • budgets
  • fast paths
  • async heavy work
  • caching
  • observability
  • graceful fallbacks
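The first two items can live in one tiny component: a per-user budget that routes requests to the expensive path until a cap is hit, then degrades to a cheap fallback instead of failing. A sketch with an assumed cap (the 10-cent daily figure is illustrative):

```python
class BudgetGuard:
    # Track estimated spend per user; once the daily cap is hit,
    # route to a cheap path (regex parser, cached summary) instead
    # of refusing the request outright.
    def __init__(self, daily_cap_usd: float = 0.10):
        self.cap = daily_cap_usd
        self.spent: dict[str, float] = {}

    def charge(self, user_id: str, est_cost_usd: float) -> str:
        total = self.spent.get(user_id, 0.0) + est_cost_usd
        if total > self.cap:
            return "fallback"
        self.spent[user_id] = total
        return "llm"
```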

Because once AI is inside your product, you’re not only building features anymore.

You’re operating a meter.

Thank you for reading this article, hope it’s helpful 📖. See you in the next article 🙌
