DEV Community: Udit Jain

How WhatsApp Makes Money Without Charging You a Rupee

Udit Jain — Tue, 16 Jun 2026 08:14:06 +0000

You've never seen an ad inside a chat. You've never paid a fee. Meta shelled out $19 billion for the app anyway. Here's the story of how that math actually works.

Act 1: The $1 app that hated ads

WhatsApp wasn't always free. When Jan Koum and Brian Acton launched it in 2009, the deal was simple: free for your first year, then $0.99 a year after that. One dollar. Less than a chai. Koum grew up in Soviet Ukraine, where every wall had ears. For him, building an app without surveillance wasn't a product strategy — it was something closer to a personal conviction.

Brian Acton put it bluntly on Twitter in 2012: "No ads, no games, no gimmicks." Both founders had spent years at Yahoo watching ads slowly rot a product they cared about. They weren't going to do that to WhatsApp.

[Figure: WhatsApp monetization timeline. $0.99/year launch → $19B Facebook acquisition (2014) → fee dropped, businesses hinted (2016) → Business App + API launch (2018–2020) → CTWA ads hit $10B run-rate (2023–2024) → per-message pricing + AI agents (2025–2026)]

Then in February 2014, Facebook bought WhatsApp for roughly $19 billion — a mix of cash, stock, and restricted shares. Most people thought Zuckerberg had lost the plot. The app had 400 million users and was losing $138 million a year. There was no obvious path to profit.

Two years later, in January 2016, Koum stood up at DLD Munich and announced the $1 fee was gone. His reason: too many users in India and Brazil didn't have credit cards. But tucked into the same announcement was a line about "communicating with businesses." Nobody made much of it at the time. That line was the entire future of WhatsApp's business model.

The whole pivot, one sentence: Stop charging the person getting the message. Start charging the business that desperately wants to send it.

Act 2: Two Products, One Revenue Engine

People mix these up constantly, and it actually matters. There are two different "WhatsApp Business" products and only one of them makes Meta money.

The WhatsApp Business App is the free one — the green app your neighbourhood pharmacy, local gym, or kurti boutique uses. Business profile, product catalog, quick replies, away messages. Works on one phone, maybe a couple of linked devices. Meta doesn't really earn from this directly. It's table stakes to keep small businesses on the platform.

The WhatsApp Business Platform — the API — is the other one. When HDFC Bank sends you a transaction alert, when Flipkart tells you your parcel is out for delivery, when an airline shoots you a check-in reminder, none of that comes from someone typing on a phone. It goes through an API. That API access is what Meta charges for, and that's where the actual money is.

[Figure: WhatsApp Business App (free, for small businesses: profile, catalog, quick replies, one phone) vs WhatsApp Business Platform / API (paid, for enterprises: banks, airlines, e-commerce sending at scale via API calls). Stat cards: 200M+ Business App users, $2B paid messaging run-rate, 60% CTWA ad growth YoY, 10M AI conversations/week]

Act 3: How the Billing Actually Works

Most articles about WhatsApp's business model breeze past this part. They shouldn't. The billing mechanics changed fundamentally in 2025, and if you want to understand where Meta's money actually comes from, you need to sit with this for a minute.

Until mid-2025, Meta billed businesses per conversation. You paid one flat fee to open a 24-hour window with a customer. Inside that window, you could send as many messages as you wanted for that single fee. Simple, but imprecise: a business that sent a single message paid the same as one running a full-blown chat flow.

On July 1, 2025, that changed to per-message pricing. Every delivered template message now gets its own bill. What you pay depends on two variables: the message category, and the country where the recipient's number is registered.

The four categories:

Marketing — promos, sale alerts, "your cart misses you" nudges. Most expensive, no bulk discount.
Utility — order confirmations, delivery updates, payment reminders. Cheap, and often free if it's a reply inside an active conversation.
Authentication — your OTPs. Always charged, even mid-conversation, because these are high-volume and high-value to businesses.
Service — a human or bot replying to something you asked. Free.
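The category rules above can be sketched in a few lines of Python. This is a minimal illustration, not Meta's billing logic: the rates in `RATES` are made-up placeholders (real rates come from Meta's rate card and vary by market), the class and function names are hypothetical, and "often free" for utility replies is simplified to "always free inside an active conversation".

```python
from dataclasses import dataclass

# Hypothetical per-message rates in USD, keyed by (country, category).
# Placeholder values only; real rates differ by market.
RATES = {
    ("IN", "marketing"): 0.0100,
    ("IN", "utility"): 0.0015,
    ("IN", "authentication"): 0.0015,
}


@dataclass
class TemplateMessage:
    country: str                        # country where the recipient's number is registered
    category: str                       # marketing | utility | authentication | service
    in_open_conversation: bool = False  # sent as a reply inside an active conversation


def message_fee(msg: TemplateMessage) -> float:
    """Fee for one delivered message under per-message pricing."""
    if msg.category == "service":
        return 0.0  # replies to something the customer asked are free
    if msg.category == "utility" and msg.in_open_conversation:
        return 0.0  # simplification: utility replies in an active conversation are free
    # Marketing and authentication are charged even mid-conversation.
    return RATES[(msg.country, msg.category)]


def per_conversation_fee(flat_fee: float, messages_sent: int) -> float:
    """Old model: one flat fee per 24-hour window, however many messages were sent."""
    return flat_fee if messages_sent > 0 else 0.0
```

Under the old model, `per_conversation_fee(0.01, 1)` and `per_conversation_fee(0.01, 50)` come out the same, which is exactly the imprecision that per-message pricing removed.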

Act 4: The Real Money Is One Click Away

Here's the part that genuinely surprised me when I dug into this. The biggest pile of revenue connected to WhatsApp doesn't show up in WhatsApp's numbers at all. It gets booked as Facebook and Instagram ad revenue.

These are called Click-to-WhatsApp ads. You're scrolling Instagram, an ad catches your eye, you tap it — and instead of a landing page, a WhatsApp chat opens. The business's number, a pre-filled message. You're already in the funnel. For the business, this is a dramatically warmer lead than a website click where the user might close the tab in ten seconds and never come back.

[Figure: Click-to-WhatsApp flow. User sees ad on Instagram/Facebook → taps CTWA ad (Meta earns ad revenue, $10B+ run-rate) → WhatsApp chat opens with pre-filled message → business replies free for 72 hours → conversion happens inside WhatsApp or on website]

Susan Li, Meta's CFO, said on the Q3 2025 earnings call that click-to-WhatsApp ads grew 60% year-on-year. In Q4 2025, US click-to-message ads alone grew over 50% year-on-year. And this came off a base that was already running at $10 billion a year back in early 2023.

WhatsApp doesn't pocket this ad money directly. What it does is make Meta's ad business worth more — by giving advertisers somewhere high-intent to land users. That's a better deal for Meta than charging users a dollar a year ever could have been.

Act 5: The Middlemen Nobody Talks About

Most large companies don't connect straight to Meta's raw API. Getting that to work requires engineers who can handle webhooks, message template approvals, quality score monitoring, and tier-based sending limits. That's weeks of work, not an afternoon.

So a whole category of companies called Business Solution Providers (BSPs) lives between the business and Meta. They handle the technical side and hand businesses a cleaner interface — a shared inbox, a dashboard, pre-built templates, analytics. Meta still collects its per-message wholesale fee. The BSP adds a small markup on top, usually a fraction of a cent per message, and that's their cut.

Gupshup is the dominant BSP in India — reportedly pushing over 10 billion messages a month for companies like Flipkart, Netflix, and Ola. It reported about $360 million in revenue for 2024. Tiger Global backed it to a $1.4 billion valuation. The entire BSP industry exists because Meta built a toll road and decided it didn't want to staff every booth itself.
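To see how the split works on an invoice, here is a rough sketch. The Meta fee of $0.01 per message is a made-up placeholder, the $0.003 markup is an assumed value for "a fraction of a cent", and `monthly_invoice` is a hypothetical helper, not anything a BSP actually exposes.

```python
def monthly_invoice(messages: int, meta_fee: float, bsp_markup: float) -> dict:
    """Split a month's bill into Meta's wholesale fee and the BSP's markup."""
    meta_total = messages * meta_fee
    bsp_total = messages * bsp_markup
    return {
        "meta": round(meta_total, 2),
        "bsp": round(bsp_total, 2),
        "business_pays": round(meta_total + bsp_total, 2),
    }


# One million messages in a month, with assumed per-message prices.
invoice = monthly_invoice(messages=1_000_000, meta_fee=0.01, bsp_markup=0.003)
```

With these assumed numbers the business pays $13,000, of which $10,000 goes to Meta and $3,000 stays with the BSP. The markup looks tiny per message and only becomes a business at very high volume.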

[Figure: Message path. Business (e.g. e-commerce, airline) → BSP like Gupshup/Twilio/Infobip (adds $0.002–$0.005/msg markup, provides dashboard + inbox) → Meta Cloud API (delivers to 2B users). BSPs shown: Gupshup (10B+ msgs/mo), Twilio, Infobip, 360dialog, Wati, Haptik, AiSensy, MessageBird]

Act 6: The side quest that didn't quite work — WhatsApp Pay

If you'd told me in 2020 that WhatsApp Pay would have under 0.4% of India's UPI market in 2025, I'd have called that an embarrassment. WhatsApp had 500 million users in India. The UPI rails were already built. NPCI eventually removed the user cap entirely in December 2024. Everything looked set up for a quick win.

It didn't play out that way. By June 2025, WhatsApp Pay was processing about 67 million transactions a month. PhonePe was doing 10.5 billion. Google Pay, 7.5 billion. The gap isn't close — it's not even in the same conversation.

[Figure: UPI market share as of June 2025. PhonePe ~48%, Google Pay ~37%, Paytm ~10%, WhatsApp Pay under 0.4%. Source: NPCI data. WhatsApp Pay processed 67M transactions vs PhonePe's 10.5B+]

The reasons aren't complicated. WhatsApp came in late when PhonePe and Google Pay had already made UPI feel obvious and natural. Early versions of WhatsApp Pay had reliability problems. And there's a subtler issue: people don't naturally trust their messaging app with money. WhatsApp is where you send memes. PhonePe is where you pay rent. Those are different mental buckets, and habits don't move easily.

Meta isn't really trying to win the payments race at this point. WhatsApp Pay exists so that when a customer buys something inside a WhatsApp chat, the payment step doesn't kick them out of the app. It's infrastructure for in-chat commerce, not a standalone product competing with PhonePe.

Act 7: What's Coming Next: AI That Runs Your Storefront

In early June 2026, Meta launched something called Meta Business Agent globally. It's an AI that sits inside WhatsApp and can answer customer questions, suggest products, book appointments, and triage support tickets before handing off to a human. Not a bot with buttons — an actual LLM-based agent.

The pricing works like OpenAI's API: you pay by token usage. Small businesses get a subscription plan; larger ones pay based on how many AI interactions they run. Every token still touches Meta's servers, which means every conversation is a compute cost that Meta can now bill for. That's a genuinely new revenue stream — B2B AI infrastructure — layered on top of the existing messaging business.

By late March 2026, before the global launch, these agents were already handling around 10 million conversations a week. Zuckerberg's line from the earnings call: "Business AIs will enable tens of millions of businesses to scale these conversations." I think he means it.

One move that's worth paying attention to: in January 2026, Meta blocked third-party AI chatbots — OpenAI's ChatGPT, Perplexity — from the WhatsApp Business API. The official reason was volume issues. The practical result: Meta AI is now the only general-purpose AI assistant that works inside WhatsApp. Italy's competition regulator opened an inquiry. The European Commission noticed. Whether this holds up is a separate question, but the intent is clear.

So, the full picture

Step back and the whole model comes into focus. WhatsApp is free for users, always will be, no ads inside chats. But it's not a charity. Every bank OTP, every delivery update, every "your order is confirmed" message is a small fee landing in Meta's account. Multiply that across a billion daily active users and thousands of businesses sending millions of messages a day, and the numbers start making sense.

[Figure: Summary of the four streams. (1) WhatsApp Business Platform: direct revenue, $2B annual run-rate, paid by enterprises via BSPs. (2) Click-to-WhatsApp Ads: indirect, $10B+ run-rate, +60% YoY, paid by advertisers as Meta ad revenue. (3) WhatsApp Pay: strategic play, under 0.4% UPI share, minimal direct revenue. (4) Meta Business Agent: emerging, token-based AI pricing, 10M conversations/week as of March 2026]

The $19 billion wasn't a bet on ads inside WhatsApp chats. It was a bet on WhatsApp becoming unavoidable infrastructure — the pipe that every business eventually has to pay to use, just to reach its own customers. A dollar a year from users was never going to get there. A fraction of a cent per message, at a billion messages a day, actually does.

This is part of my "Architecture Behind Everyday Apps" series, where I dig into the engineering and business mechanics behind apps we use every day without thinking twice.

The 320-Millisecond Conversation

Udit Jain — Mon, 15 Jun 2026 07:46:39 +0000

How AI voice mode went from a five-second Q&A bot to something that interrupts you mid‑sentence — and everything that had to be rebuilt to get there.

Open ChatGPT's voice mode, or Gemini Live, and try talking over it mid-sentence. The AI just stops. Not after it finishes the sentence it was on, not after an awkward half-second lag. It stops the way a person stops when they realize you've started talking over them.

A couple of years ago, this same interaction was painful. You'd ask a question, watch a spinner, wait three to five seconds, and get back a flat, oddly-paced reading of a paragraph. Now it feels closer to a phone call. Same companies, same basic idea, "talk to an AI", but somewhere along the way it stopped feeling like using a tool and started feeling like talking to something.

So what actually changed? Not "the model got smarter" — though it did, a bit. The bigger story is a near-total rebuild of how audio moves between your phone and the model, touching everything from how these models are trained down to which network protocol your browser happens to be using.

Three Models Playing Telephone

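Until recently, every AI voice assistant, including the first version of ChatGPT's voice mode, worked the same way under the hood: three separate models bolted together in a row, each one handing its output to the next. A speech-to-text model turned your voice into a transcript, an LLM wrote a reply to that transcript, and a text-to-speech model read the reply back.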

Each handoff adds latency, and it adds up fast — roughly 2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4 to respond in voice mode. But honestly, the delay wasn't even the worst part. The real loss was information. The moment your voice became text, everything that wasn't literally the words — your tone, your pacing, whether you sounded annoyed or amused — was gone. The LLM never heard any of it. And the TTS model at the very end had no idea what emotion the reply should carry, because all it ever got was a string of text too.
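A toy version of that cascade makes the information loss easy to see. Everything here is a stand-in: the three functions are hypothetical stubs rather than real models, and `tone` is a single field standing in for everything in your voice that isn't the words.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    words: str
    tone: str  # e.g. "annoyed" or "amused"; stands in for all non-verbal signal


def speech_to_text(utterance: Utterance) -> str:
    # The transcript keeps the words and drops everything else.
    return utterance.words


def llm_reply(transcript: str) -> str:
    # Stub for the LLM stage: it only ever sees a string.
    return f"You said: {transcript}"


def text_to_speech(reply: str) -> dict:
    # The TTS stage also receives a bare string, so it has no tone to match.
    return {"spoken_text": reply, "tone": "neutral"}


def cascaded_voice_turn(utterance: Utterance) -> dict:
    """Run one turn through the three-stage pipeline."""
    return text_to_speech(llm_reply(speech_to_text(utterance)))


annoyed = cascaded_voice_turn(Utterance("where is my order", tone="annoyed"))
amused = cascaded_voice_turn(Utterance("where is my order", tone="amused"))
```

Both calls produce identical output, because the only thing that crosses each boundary is text.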

It's the classic telephone-game problem: three models that don't share a brain, passing notes to each other through a narrow, text-shaped slot.

One Model, Audio In, Audio Out

The fix — which OpenAI shipped first with GPT-4o in mid-2024, and Google followed not long after with Gemini 2.5 Flash's native-audio models — collapses all three stages into a single model.

OpenAI's own system card describes GPT-4o as "an autoregressive omni model, which accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs… trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network." In plain terms: there's no handoff anymore. The network that hears you is the same one that decides what to say, and the same one that says it.

How the Audio Actually Moves

Making this work in real time meant OpenAI had to build something new: the Realtime API. Instead of the usual request-response pattern, where every message is its own fresh API call, it opens one persistent connection — either a WebSocket or a WebRTC connection — and keeps it open for the whole conversation.

The audio itself usually moves as 16-bit PCM at 16kHz or 24kHz (Gemini's Live API specifically wants 16kHz PCM in, 24kHz PCM out). G.711 works too, for telephony, but there's a catch: converting its 8kHz audio up to the model's native 24kHz tacks on another 50–100ms and makes it sound worse.
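For a sense of scale, the raw PCM numbers are easy to work out. This sketch assumes mono audio and 20ms frames; the function names are illustrative only.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM, assuming a single (mono) channel


def pcm_bytes_per_second(sample_rate_hz: int) -> int:
    """Raw data rate of uncompressed 16-bit mono PCM."""
    return sample_rate_hz * BYTES_PER_SAMPLE


def pcm_frame_bytes(sample_rate_hz: int, frame_ms: int = 20) -> int:
    """Size in bytes of one frame of the given duration."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * BYTES_PER_SAMPLE


mic_frame = pcm_frame_bytes(16_000)    # 16kHz input
reply_frame = pcm_frame_bytes(24_000)  # 24kHz output
```

A 20ms frame is 640 bytes at 16kHz and 960 bytes at 24kHz, so one second of 24kHz audio is 48,000 bytes before any encoding overhead.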

Knowing When to Stop Talking

That "stops the instant you talk" behavior comes down to Voice Activity Detection (VAD). The basic version, server VAD, just watches for silence — pause for about 500ms and it assumes you're done. Semantic VAD is the newer, more interesting one: it's a classifier that judges whether what you just said sounds finished. Trail off with "ummm…" and it waits longer. End on something that sounds like a complete thought, and it jumps in almost immediately.
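The silence-watching version can be sketched with a simple energy threshold. This is an assumption about how such a detector might look, not how any provider implements server VAD: the RMS threshold of 500 and the 20ms frame size are arbitrary illustrative values, and semantic VAD would need a trained classifier that is not shown here.

```python
from typing import Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def end_of_turn_index(
    frames: Sequence[Sequence[int]],
    frame_ms: int = 20,
    silence_ms: int = 500,
    threshold: float = 500.0,
) -> Optional[int]:
    """Index of the frame where the speaker is judged done, or None."""
    frames_needed = silence_ms // frame_ms
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0  # any speech resets the silence counter
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= frames_needed:
                return i
    return None


loud = [1000] * 320  # one 20ms frame at 16kHz, clearly above the threshold
quiet = [0] * 320
```

With 500ms of required silence and 20ms frames, the detector needs 25 quiet frames in a row. A short pause in the middle of a sentence resets nothing permanently; it just never reaches the count.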

When you actually interrupt, the system fires response.cancelled, and whatever audio the model hadn't finished playing yet gets truncated out of the conversation history — so the model only "remembers" what it actually said out loud, not what it was about to say. WebRTC handles this part automatically, since the server already knows exactly how much audio has played. Over a plain WebSocket, your client has to track that itself and tell the server where the cut happened.
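Over a WebSocket, that bookkeeping might look like the sketch below. It assumes 16-bit mono PCM at 24kHz and builds a `conversation.item.truncate` event as I understand it from OpenAI's Realtime API reference; check the field names against the current docs before relying on them. The item id is a placeholder and sending the message over the socket is left out.

```python
import json

# 16-bit mono PCM at 24kHz: 24,000 samples/s * 2 bytes = 48 bytes per millisecond.
BYTES_PER_MS = 24_000 * 2 // 1000


def played_ms(bytes_played: int) -> int:
    """How many milliseconds of audio the client has actually played."""
    return bytes_played // BYTES_PER_MS


def truncate_event(item_id: str, bytes_played: int) -> str:
    """JSON message telling the server where playback was cut off."""
    return json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,      # id of the assistant audio item being cut
        "content_index": 0,
        "audio_end_ms": played_ms(bytes_played),
    })


# The user interrupted after exactly one second of audio had played.
message = truncate_event("item_placeholder", bytes_played=48_000)
```

The key design point is that the client counts bytes it has played, not bytes it has received, since the model usually generates audio faster than real time.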

The Trade-offs

First trade-off: you lose the transcript you could inspect. When everything ran through text, moderation was simple — you just read the text. With audio-native models, OpenAI still runs that old text-based check on a transcription, but adds a second, real-time classifier that watches the audio as it's being generated and checks whether the voice still matches one of the approved presets, shutting things down if it drifts. OpenAI says this setup "catches 100% of meaningful deviations from the system voice" in their internal tests — mostly there to stop the model from ever accidentally cloning a user's own voice.

Second trade-off, and it's bigger than most people realize: voice is just expensive. Here's what the original Realtime API pricing looked like, token for token:

That's roughly 20x the price of text input and 10x the price of text output. This is the actual reason voice features sit behind paid tiers with daily limits while text chat feels basically free: every second you spend talking has a meter running on it in a way that typing never did.
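Those multiples can be checked with quick arithmetic. The prices below are the launch-time figures as I recall them from OpenAI's "Introducing the Realtime API" announcement listed in the sources; treat them as a point-in-time snapshot and an assumption, since pricing has changed since.

```python
# Launch-time Realtime API prices in USD per 1M tokens (assumed snapshot).
PRICE_PER_1M_TOKENS = {
    "text_in": 5.00,
    "text_out": 20.00,
    "audio_in": 100.00,
    "audio_out": 200.00,
}


def audio_multiple(direction: str) -> float:
    """How many times more an audio token costs than a text token ('in' or 'out')."""
    return PRICE_PER_1M_TOKENS[f"audio_{direction}"] / PRICE_PER_1M_TOKENS[f"text_{direction}"]


def cost_usd(kind: str, tokens: int) -> float:
    """Cost of a given number of tokens of one kind."""
    return PRICE_PER_1M_TOKENS[kind] * tokens / 1_000_000


input_multiple = audio_multiple("in")
output_multiple = audio_multiple("out")
```

Under these assumed prices, audio input costs 20 times text input and audio output costs 10 times text output.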

Third one: switching to WebRTC isn't automatically a win either. Some teams have swapped a working WebSocket pipeline for WebRTC expecting better audio by default, and gotten the opposite. WebRTC's jitter buffer is built to smooth out unpredictable timing between two network peers, but AI-generated audio doesn't arrive on that kind of schedule. Without rebuilding the same frame-pacing logic (one 20ms frame at a time, with prebuffering), the audio actually sounds worse over WebRTC than it did over a plain socket.
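Frame pacing with prebuffering can be sketched as a scheduling function. This is a simplified model, not code from any of the write-ups: it assumes arrival times are known and sorted, uses a prebuffer of three frames as an arbitrary choice, and only computes when each frame should play rather than doing real-time I/O.

```python
from typing import List, Sequence


def playout_schedule(
    arrivals_ms: Sequence[int],
    frame_ms: int = 20,
    prebuffer_frames: int = 3,
) -> List[int]:
    """Play-out time (ms) for each frame, given when each frame arrived."""
    if len(arrivals_ms) < prebuffer_frames:
        return []  # never received enough audio to start playback
    schedule = []
    # Playback starts once the prebuffer is full.
    next_slot = arrivals_ms[prebuffer_frames - 1]
    for arrival in arrivals_ms:
        # A frame cannot play before it arrives; a late frame stalls playback.
        play_at = max(next_slot, arrival)
        schedule.append(play_at)
        next_slot = play_at + frame_ms
    return schedule


# Model output arrives in a burst, far faster than real time.
burst = playout_schedule([0, 1, 2, 3, 4, 5])
# The fourth frame arrives late, after the buffer has drained.
late = playout_schedule([0, 1, 2, 100])
```

In the burst case the frames still go out exactly 20ms apart, which is the whole point of pacing. In the late case playback stalls until the frame shows up, and a larger prebuffer is the usual way to trade a little startup delay for fewer of those stalls.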

The Philosophical Close

Here's the thing that gets me about all this: almost none of it is really about the model getting "smarter." Better reasoning, more facts, a bigger context window — that's not really the story here. This is about closing the gap between thinking and speaking until you can't feel it anymore, and then rebuilding an entire safety and pricing model around the fact that text, the one format these companies could cheaply moderate, log, and price, might not exist as a middle step anymore.

It's a strange trade when you sit with it: betting that feeling present in a conversation — interruptible, emotionally responsive, instant — is worth more than being able to check what the AI actually said.

232 milliseconds, the fastest response OpenAI reports for GPT-4o against the 320-millisecond average in this post's title, isn't really the achievement, if you think about it. The achievement is that getting there meant these companies had to make their own systems harder to inspect on purpose, then build a whole new layer of classifiers just to make up for it. Talking to an AI that actually feels like a person turns out to be inseparable from a much harder problem: trusting a system you can no longer just read.

Sources

OpenAI — GPT-4o System Card (arXiv:2410.21276)
OpenAI — "Introducing the Realtime API"
OpenAI — "Introducing gpt-realtime and Realtime API updates for production voice agents"
OpenAI Platform Docs — Realtime conversations, Voice Activity Detection, Managing costs
Google AI for Developers — Gemini Live API overview & Gemini 2.5 Flash Native Audio
Latent Space — "OpenAI Realtime API: The Missing Manual"
Production engineering write-ups: Skywork, Effloow, Forasoft, DEV Community (WebRTC/WebSocket pacing)

You're Not Bad at AI. You're Just Prompting It Wrong.

Udit Jain — Thu, 11 Jun 2026 06:13:05 +0000

A friend of mine spent 20 minutes complaining that ChatGPT was "useless" last week.

I asked to see his prompt.

"Give me some marketing ideas."

Five words. He expected magic.

The model wasn't broken. The prompt was. And honestly, this is the case for most developers I see reaching for AI and getting frustrated — not because the models are bad, but because the input was garbage.

Here are 10 techniques that actually change what you get back. Been using these for a while and the difference is real.

The Core Problem First

Before the list — you need to understand why bad prompts produce bad output.

Think about this:

I describe a world to you. Dry and sandy. Warring clans fighting over a rare resource buried in the dunes. People wearing moisture-absorbing suits to survive. Now I ask — what are the giant creatures in this world?

You'd say sandworms. Dune.

But if I'd just said "there's a world, what are the creatures?" — you'd have said trolls. Cyclops. Something completely generic.

That's exactly what's happening with AI. The puzzle pieces you give it shape what comes back. Give it nothing, get the most statistically average response possible. That's why everyone's AI output is starting to look identical.

It's not a model problem. It's an input problem.

The 10 Techniques

1. World Building — Give It Context Before the Question

Before you send any prompt, ask yourself: how much context have I actually given?

Who are you in this situation?
What's the real goal?
Who's the audience?
What have you already tried?
What does a good answer actually look like?

Compare these two:

Write me a README for my project.

I'm a solo dev building a CLI tool in Go that converts markdown to PDF.
Target audience is developers who want quick doc generation without a
browser dependency. Write a README with quick start, installation, and
usage sections. Keep it under 200 lines. No badges.

Same task. Completely different outputs.
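If you send prompts from code, the checklist above turns into a small helper. This is one possible sketch: `build_prompt` and its labels are made up for illustration, and the wording of each line is up to you.

```python
def build_prompt(
    task: str,
    who: str = "",
    goal: str = "",
    audience: str = "",
    tried: str = "",
    good_answer: str = "",
) -> str:
    """Put whatever context is available in front of the task."""
    context = [
        ("Who I am", who),
        ("Goal", goal),
        ("Audience", audience),
        ("Already tried", tried),
        ("A good answer looks like", good_answer),
    ]
    # Skip any field that was left empty.
    lines = [f"{label}: {value}" for label, value in context if value]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


bare = build_prompt("Write me a README for my project.")
rich = build_prompt(
    "Write a README with quick start, installation, and usage sections.",
    who="Solo dev building a CLI tool in Go that converts markdown to PDF",
    audience="Developers who want quick doc generation without a browser dependency",
    good_answer="Under 200 lines, no badges",
)
```

The bare version is a single line. The rich version carries the same context as the second prompt above, and an empty field is a visible reminder of what you haven't told the model yet.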

System prompts at real AI products work the same way. Harvey (the legal AI) works not because of smarter code — someone built a world into its prompt. Long lists of examples. Verbal if-else statements. That's the whole trick.

2. Stop Asking for Summaries. Ask for What Most People Get Wrong.

The default prompt, "Summarize this book", gets you obvious facts you already half-knew.

Better structure:

1. Summarize the book briefly
2. Give me red pill insights — things most people miss or have backwards
3. Give me actionable evidence from those insights
4. Give me things the book believes that most of the world doesn't.

Most summaries converge on consensus. The useful stuff lives in the gap between what most people believe and what the evidence actually says. Ask for that gap directly.

3. Meta-Prompting — Let AI Write Better Prompts for Itself

You don't have to be good at prompting. Just ask the model to help you.

For image generation:

Here's my idea: [describe your concept].
Now write me a Midjourney prompt for this.
Focus only on what matters visually. Don't over-specify background elements.

GPT knows what diffusion models tend to over-bake. Its prompt will be cleaner than yours on a first pass.

For UI or frontend work, instead of "build me a page like Stripe's" — ask GPT to break down every visual component of Stripe's landing page first. Get the full spec list: colors, typography, layout logic, spacing, interactions. Then use that list as your actual prompt.

Soft multi-stop gradient. Teal to desaturated yellow. Pale off-white radial center. Blurred edges.

Specific specs beat vague aesthetic references every time.

4. Multi-Level Teaching — Ask for Three Explanations at Once

This one changed how fast I pick up new concepts.

Teach me [concept] in three modes:
- Mode 1: I'm 5 years old, no jargon at all
- Mode 2: I'm a CS undergrad with basic knowledge  
- Mode 3: I'm a senior engineer who wants the nuances and edge cases

Three explanations, one response. Read the simple one first, get the rough shape of the idea, then go deeper when it clicks.

Documentation can't do this. Stack Overflow can't do this. With AI, "I still don't get it, go simpler" is just one sentence in the same window.

5. Gap Finder — Ask What You're Missing

Probably the most underused technique on this list.

Based on everything I've told you about [topic / project / code],
what are the gaps in my understanding?
What am I missing?
Where is my reasoning weak?

Why it works: there's no ego in the room. When a colleague points out your blind spots, there's social friction — tone, politics, people softening the feedback. With AI, you just get the gaps listed out.

Practical version for code review:

Here's my architecture and approach: [paste it]

What's weak, missing, or wrong here?
What would a senior engineer push back on?

You'll almost always find something you didn't see.

6. Confidence Scores — Stop Trusting Everything It Says

AI models are built to be helpful. The side effect is they lean toward confirming what you want to hear — wrong info gets the same confident tone as correct info.

One line changes this:

For each fact or claim you make, add a confidence score (1–10).
If you're below 7 on anything, flag it so I know what to verify.

Once you ask for a number, the model shifts. You'll start seeing "I'm about 60% confident in this" instead of flat assertions. Look up anything below 7 before it goes anywhere important.

This matters most for security advice, library versioning, performance benchmarks, anything going into production docs.
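If you ask the model to tag each claim in a fixed format, you can pull out the shaky ones automatically. This sketch assumes you told the model to end each claim with "(confidence: N/10)"; that format, the function name, and the sample response are all made up for illustration.

```python
import re
from typing import List

# Matches a tag like "(confidence: 6/10)" at any position in a line.
CONFIDENCE_TAG = re.compile(r"\(confidence:\s*(\d{1,2})/10\)", re.IGNORECASE)


def claims_to_verify(response: str, threshold: int = 7) -> List[str]:
    """Return the lines whose confidence score is below the threshold."""
    flagged = []
    for line in response.splitlines():
        match = CONFIDENCE_TAG.search(line)
        if match and int(match.group(1)) < threshold:
            flagged.append(line.strip())
    return flagged


# Invented sample response, not real model output.
sample = """\
- bcrypt is a password hashing function (confidence: 9/10)
- Library X dropped Python 3.8 support in v2.1 (confidence: 5/10)
- This query runs in O(log n) (confidence: 6/10)
"""

to_check = claims_to_verify(sample)
```

The score itself is the model's own guess, so treat this as a triage list for what to look up first, not as proof that the high-scoring claims are right.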

7. Voice-to-Text Prompts

Most devs write short prompts because typing feels like friction. Result: 5-word prompt, 5-sentence generic response, frustration.

Simple fix: record it instead of typing it.

Open your phone's voice recorder. Talk through the whole thing — what you're building, what you've tried, what broke, what a good answer looks like, edge cases you're worried about. 5 minutes of talking gets you more usable context than most people type in 30 minutes.

Transcribe it, paste it in as the prompt.

One thing worth knowing — this is voice to transcription to paste, not the live AI voice conversation mode. Live voice runs a smaller, faster model to keep latency low. For a detailed technical prompt, you want the full model. Record, transcribe, paste. Different workflow.

8. How to Stop Your AI Writing From Looking Like AI Writing

Relevant if you write docs, READMEs, blog posts, or any external content with AI help.

Kill this sentence pattern first:

"X isn't just about Y"
"X goes beyond Y"  
"This is more than just Y"

These are textbook LLM constructions. Rewrite them as direct affirmative statements. That single change removes most of the obvious tells.
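A quick way to catch these before publishing is a small pattern check over your draft. This sketch covers only the three constructions listed above; the regular expressions are my own rough approximations and will miss rephrasings.

```python
import re
from typing import List, Tuple

# Rough patterns for the three constructions; both apostrophe styles are handled.
TELLS = [
    re.compile(r"\b(?:isn['’]t|is not) just about\b", re.IGNORECASE),
    re.compile(r"\bgoes beyond\b", re.IGNORECASE),
    re.compile(r"\bmore than just\b", re.IGNORECASE),
]


def find_tells(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for every line containing one of the patterns."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in TELLS):
            hits.append((number, line.strip()))
    return hits


draft = (
    "Testing isn't just about coverage.\n"
    "We ship on Fridays.\n"
    "This is more than just a linter."
)

flagged_lines = find_tells(draft)
```

Lines 1 and 3 of the draft get flagged. The rewrite is still on you, but at least you know where to look.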

Then use your own writing as a style reference:

Here are some things I've written before: [paste 3–5 paragraphs of your own writing]

Now help me write [new thing] in this same voice.
Avoid filler phrases and generic AI sentence structures.

It won't perfectly replicate your voice, but it'll sound like a person wrote it rather than a shared template.

9. What Should I Learn Next?

Take the gap-finding idea one step further:

Based on what you know about my interests and what I've been working on,
what should I learn next and why?
What concept or skill would give me the most leverage right now?

The most useful things you'll learn aren't things you knew to search for. They come from connections between things you already know — concepts from adjacent fields that reframe something familiar.

I ran a version of this prompt and it surfaced "costly signaling" — an evolutionary psychology concept that explains a lot about how attention and respect actually work in practice. Never would have Googled it. Didn't know it existed. Ended up being genuinely useful for thinking about product decisions.

The more context AI has about your work over time, the better these get.

10. Emotional Framing Actually Affects Output

A Google DeepMind paper found that telling AI to "take a deep breath" before solving math problems improved accuracy. LLMs trained on human writing absorbed traces of how humans respond to emotional language, and those traces affect output.

In practice:

# More thorough responses
"Take your time on this."

# Less hallucination on complex tasks
"This is important. Think through it step by step before answering."

# More careful with numbers and stats
"If you give me inaccurate data here, it will cause real problems downstream."

That last one sounds strange. It works. Try it on any prompt where accuracy matters and compare to a version without it.

The Bigger Picture

Most devs use AI as a task machine. Write this function. Explain this error. Generate this boilerplate. And the output feels hollow because the input was hollow.

The devs getting real value are mostly using it differently — asking AI to stress-test architecture decisions, surface gaps in their understanding, review their reasoning before they commit to an approach. The code output is almost secondary. What changes is how clearly they understand the problem before they start.

Writing a good prompt forces you to know what you actually want. That clarity, more than any model update or clever technique, is what changes what comes back.

Which of these do you already use? Any prompting patterns I missed? Drop them in the comments.

Inspired by Varun Mayya — his original video goes deeper on several of these with solid examples.