DEV Community: MemeChatAI

Real-time subscriptions with Firebase and React Native

MemeChatAI — Mon, 08 Jun 2026 18:21:57 +0000

Every chat app works fine until two people use it at once. You send a message, it saves, and the other person sees nothing until they close the screen and open it again. So you add a refresh button. Then a timer that re-fetches every few seconds. Now you're hammering the database, burning reads, and the conversation still feels a half second behind real life.

We hit this building Meme Chat AI, an app where you trade memes back and forth with a bot. Messages were a read you triggered, so the screen only knew what was true the last time you asked. Firestore's real-time listeners are how we stopped asking and let the data come to us.

What onSnapshot actually does

A normal Firestore read with get() gives you the data once, like taking a photo. onSnapshot opens a live connection instead. You hand it a query, and Firestore calls you back the moment anything matching that query changes, with the new state already in hand.

firestore()
  .collection('chats')
  .doc(chatId)
  .collection('messages')
  .orderBy('createdAt', 'asc')
  .onSnapshot(snapshot => {
    const next = snapshot.docs.map(d => ({ id: d.id, ...d.data() }));
    setMessages(next);
  });

That callback fires on the first load, and then again on every new message, every edit, every delete. You stop writing fetch logic and start reacting to changes.

Wiring it into a component

The listener has to live for as long as the screen does and then go away cleanly. In a component that means opening it in useEffect and returning the unsubscribe function so React tears it down on unmount.

useEffect(() => {
  const unsubscribe = firestore()
    .collection('chats')
    .doc(chatId)
    .collection('messages')
    .orderBy('createdAt', 'asc')
    .onSnapshot(snapshot => {
      setMessages(snapshot.docs.map(d => ({ id: d.id, ...d.data() })));
    });

  return unsubscribe;
}, [chatId]);

onSnapshot hands you back its own unsubscribe function, so returning it from the effect is the whole cleanup story. When chatId changes, React runs the cleanup and opens a fresh listener for the new chat.

The bug you will write at least once

Forget that return unsubscribe and the listener keeps running after the screen is gone. Switch chats a few times and you have four listeners alive at once, all calling setMessages on a component that no longer exists. It shows up as memory creeping, duplicate updates, and the occasional warning about setting state on something unmounted. The fix is always the same line you skipped.
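To make the leak concrete, here is a small simulation rather than real Firestore code. The `subscribe` function is a hypothetical stand-in for onSnapshot (it registers a callback and returns an unsubscribe function), and `switchChats` imitates an effect re-running as chatId changes, with and without the cleanup.

```typescript
// Stand-in for onSnapshot: registers a callback and returns an unsubscribe function.
// Hypothetical helper for illustration only, not a Firestore API.
const activeListeners = new Set<() => void>();

function subscribe(onChange: () => void): () => void {
  activeListeners.add(onChange);
  return () => {
    activeListeners.delete(onChange);
  };
}

// Simulates opening a chat `times` times. With cleanup, the previous listener is
// torn down before the next one opens; without it, every old listener stays alive.
function switchChats(times: number, withCleanup: boolean): number {
  activeListeners.clear();
  let cleanup: (() => void) | undefined;
  for (let i = 0; i < times; i++) {
    if (withCleanup && cleanup) cleanup();
    cleanup = subscribe(() => {});
  }
  return activeListeners.size;
}

const leaked = switchChats(4, false); // four listeners still registered
const clean = switchChats(4, true); // only the latest listener remains
```

The only difference between the two runs is whether the previous unsubscribe gets called, which is exactly what returning it from the effect does for you.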

Writes feel instant for free

The part I didn't expect: the sender doesn't wait on the server. Firestore applies your write to the local cache first and fires your own listener immediately, then syncs to the backend in the background. So the person typing sees their message land right away, and the person across the room sees it a moment later when the server confirms. You get optimistic updates without writing a single line of optimistic update code.
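If you want the UI to show which messages are still only in the local cache, each Firestore document snapshot carries a metadata.hasPendingWrites flag. The sketch below is one way to surface it, not code from the app: `DocLike` is a trimmed-down type that mirrors only the snapshot fields used here, and the `pending` field on the result is a hypothetical UI flag.

```typescript
// Minimal shape of the parts of a Firestore document snapshot used here.
interface DocLike {
  id: string;
  data(): Record<string, unknown>;
  metadata: { hasPendingWrites: boolean };
}

interface UiMessage {
  id: string;
  pending: boolean; // hypothetical UI flag, not a Firestore field
  [field: string]: unknown;
}

// Maps snapshot docs to UI messages, marking writes the server has not confirmed yet.
function toUiMessages(docs: DocLike[]): UiMessage[] {
  return docs.map(d => ({
    ...d.data(),
    id: d.id,
    pending: d.metadata.hasPendingWrites,
  }));
}

// Fake snapshot docs so the sketch runs without Firebase.
const sample: DocLike[] = [
  { id: "m1", data: () => ({ text: "hi" }), metadata: { hasPendingWrites: false } },
  { id: "m2", data: () => ({ text: "lol" }), metadata: { hasPendingWrites: true } },
];
const ui = toUiMessages(sample);
```

A message with `pending: true` could render slightly dimmed until the listener fires again with the confirmed version.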

What it bought us

No refresh button. No polling timer quietly draining the read quota. Type on one device and it shows up on another without anyone asking the database whether anything changed. New features that touch messages inherit the live behavior automatically, because they read from the same listener instead of rolling their own fetch.

None of this is special to a meme app. It's one listener per screen, opened in an effect and cleaned up on the way out, with the cache making your own writes feel instant. The win was deciding the screen should react to the data instead of going out and asking for it.

If you want to see the live updates in a shipped app, Meme Chat AI is on the App Store.

Theming a React Native app in one place with NativeWind

MemeChatAI — Mon, 08 Jun 2026 14:49:34 +0000

Every app starts with one brand color in one file. Then there are nine copies of #6C4DFF scattered across buttons, a header, a loading spinner, and a settings row nobody has opened in months. The day someone asks to nudge the brand a little warmer, you are grepping for hex codes and hoping you caught them all.

We hit this building Meme Chat AI. Styles were defined per component, so the design lived in forty StyleSheet.create blocks instead of in one place. NativeWind is how we pulled it back into a single source.

What NativeWind actually is

It's Tailwind for React Native. You write utility classes in a className prop and they compile to native styles, so there are no runtime stylesheet objects to maintain by hand.

<Pressable className="bg-primary rounded-2xl px-4 py-3">
  <Text className="text-on-primary font-semibold">Send</Text>
</Pressable>

That's the surface change. The part that mattered more for us was where those names like bg-primary come from.

One config, every screen

The token names resolve from a single Tailwind config. Colors, spacing, radius, and font sizes all live there, and every component reads the same definitions.

// tailwind.config.js
module.exports = {
  theme: {
    extend: {
      colors: {
        primary: "#6C4DFF",
        "on-primary": "#FFFFFF",
        surface: "#0E0E12",
        muted: "#9A9AA8",
      },
      borderRadius: { card: "20px" },
    },
  },
};

Now bg-primary means the same purple in every file. When that purple changes, it changes in one line and the whole app moves with it. No component owns its own copy of the brand, so there is nothing to hunt down later.

Dark mode stops being a project

Because the theme is centralized, light and dark are two values of the same token instead of two parallel stylesheets. You mark the variant inline and NativeWind picks the right one from the system setting.

<View className="bg-white dark:bg-surface">
  <Text className="text-black dark:text-white">Memes incoming</Text>
</View>

There's no theme-switch plumbing threaded through every component and no second set of styles to keep in sync with the first. The thing that usually turns into a multi-day retrofit became a prop.

What it bought us

Design changes that used to touch dozens of files now touch the config. New screens inherit the brand for free because they're built from the same tokens as everything else, so they look consistent without anyone enforcing it by hand. And the styles sit next to the markup they apply to, which made the components easier to read than a class name pointing off to a stylesheet elsewhere in the file.

None of this is special to a meme app. It's one config holding the design, utility classes reading from it, and dark mode falling out of the same tokens for free. The win was deciding the theme lives in exactly one place, and letting every screen borrow from it instead of keeping its own copy.

You can see where it landed in Meme Chat AI.

Keeping a chat app's token bill flat as conversations grow

MemeChatAI — Mon, 08 Jun 2026 02:59:37 +0000

Every chat feature has the same quiet problem. The first message costs almost nothing. The hundredth message costs a fortune, because by then you are re-sending the entire backlog on every single turn.

We hit this building Meme Chat AI, a chat app where the assistant talks back in memes. A conversation that ran long enough would start sending five, ten, twenty thousand tokens of history with each reply, most of it old and irrelevant to what the user just typed. The model still has to read all of it, you still pay for all of it, and latency creeps up the whole time. Here is what we did about it, and the rate limiter we put in front of it so a single client can't run the bill up on its own.

The shape of the fix

The naive options are both bad. You can send the whole transcript (cost grows without bound) or send only the last few messages (the model forgets what happened earlier in the chat). We wanted neither.

The pattern we landed on is a rolling summary plus a verbatim window. Every prompt looks like this:

[ stable system / persona prompt ]
[ summary of older turns ]
[ last N turns, word for word ]
[ the current user message ]

Older turns don't get dropped. They get folded into a running summary. Recent turns stay exactly as written, because that's the part the model actually needs at full fidelity to answer the next message. Nothing is ever silently lost: a message is either inside the verbatim window or inside the summary.
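As a rough sketch of that layout (not the app's actual assembly code), the four blocks can be stitched together like this. The `ChatMessage` shape, the `assemblePrompt` name, and the wording of the summary header are all assumptions for illustration.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Builds the prompt in the fixed order: persona, summary, recent turns, current message.
function assemblePrompt(
  persona: string,
  summary: string,
  recent: ChatMessage[],
  current: string,
): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: "system", content: persona }];
  // The summary slot is skipped while the conversation is still short enough
  // that nothing has aged out of the verbatim window.
  if (summary.trim().length > 0) {
    messages.push({
      role: "system",
      content: `Summary of earlier conversation:\n${summary}`,
    });
  }
  messages.push(...recent);
  messages.push({ role: "user", content: current });
  return messages;
}

const prompt = assemblePrompt(
  "You answer in memes.",
  "User asked for cat memes earlier.",
  [
    { role: "user", content: "more cats" },
    { role: "assistant", content: "*sends cat*" },
  ],
  "now dogs",
);
```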

Sizing the window by tokens, not message count

Our first version capped the window at a flat message count. That turned out to be the wrong knob.

A flat count punishes everyone equally, which means it punishes the wrong people. A user on a higher tier has a much larger input budget to work with, so there's no reason to start summarizing their conversation as aggressively as a free user's. But a fixed "keep the last 12 messages" rule did exactly that.

So we size the window from the token budget instead. Take the plan's input allowance, subtract the fixed overhead that rides along in every prompt (the persona prompt, the summary slot, the current turn), and let the verbatim tail fill most of what's left:

function verbatimBudgetTokens(maxInputTokens: number): number {
  const headroom = maxInputTokens - PROMPT_OVERHEAD_TOKENS;
  if (headroom <= 0) return 0;
  return Math.round(headroom * 0.85);
}

That 0.85 is deliberate. Our token count is an estimate, and the provider's count is the one that bills you. Leaving a margin means a small drift between the two counts never pushes the assembled prompt over the model's actual input limit. There's also a hard ceiling on message count sitting on top of the token budget, purely as a safety bound so a flood of tiny one-word turns can't balloon the prompt or the database reads. In normal use the token budget is what gates; the count cap almost never bites.
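Here is a minimal sketch of filling the window from that budget, walking backwards from the newest turn. It is an illustration under assumptions: `estimateTokens` uses a crude four-characters-per-token ratio as a placeholder for a real tokenizer, and `selectVerbatimWindow` and its parameter names are hypothetical.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Crude estimate: about 4 characters per token. The ratio is an assumption,
// a stand-in for whatever tokenizer-based count you actually use.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the newest turns that fit in the token budget, with a hard cap on count.
function selectVerbatimWindow(
  history: Turn[],
  budgetTokens: number,
  maxMessages: number,
): Turn[] {
  const window: Turn[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const turn = history[i];
    const cost = estimateTokens(turn.content);
    // Stop at the first turn that would break either bound, so the window
    // stays a contiguous tail of the conversation.
    if (window.length >= maxMessages || used + cost > budgetTokens) break;
    window.unshift(turn);
    used += cost;
  }
  return window;
}

// Five turns of 40 characters each, so roughly 10 estimated tokens per turn.
const history: Turn[] = Array.from({ length: 5 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: String(i).repeat(40),
}));
const byTokens = selectVerbatimWindow(history, 25, 50); // token budget gates
const byCount = selectVerbatimWindow(history, 1000, 1); // count cap gates
```

Everything that does not make it into the returned window is what the summarizer would be responsible for.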

Truncation is a fallback, not the main mechanism

The summary handles the long-term growth. But assembly still does a final check before anything goes to the model: build the prompt, count it, and if it's somehow over budget, drop the oldest verbatim message and recount. Repeat until it fits.

let current = recent.slice();
let messages = build(current);
let inputTokens = countMessagesTokens(messages);

while (inputTokens > maxInputTokens && current.length > 0) {
  current = current.slice(1);
  messages = build(current);
  inputTokens = countMessagesTokens(messages);
}

The system prompt, the summary, and the current turn are never candidates for dropping. They're load-bearing. Only the recent-history tail gets trimmed, oldest first. In practice this loop rarely does anything, because the window was already sized to fit. It exists for the edge case where a single pasted wall of text blows past the estimate, and it guarantees we never hand the API a prompt it will reject.

The cheapest token is the one you stop re-sending

A subtle source of bloat was attachments. When a user sends an image or a GIF, that turn is expensive. The image parts alone can be a couple hundred tokens for one still and several times that for a GIF that gets sampled into frames. The model needs all of that on the turn the image arrives. It does not need it five turns later.

So once an attachment turn ages into history, we collapse it to a short text placeholder instead of re-sending the pixels:

// historical turn that once carried an image
"[User sent an image]"

The model keeps the thread of "the user showed me something here" without paying the visual token cost on every subsequent turn. Only the current turn is ever allowed to carry real image data.
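A sketch of that collapse step, assuming a message made of typed parts. The `Part` and `RichTurn` shapes and the `collapseAttachments` name are hypothetical; only the placeholder string comes from the example above. The function is meant to be applied to history only, never to the current turn.

```typescript
type Part =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

interface RichTurn {
  role: "user" | "assistant";
  parts: Part[];
}

const IMAGE_PLACEHOLDER = "[User sent an image]";

// Replaces image parts in historical turns with a short text placeholder.
// Returns new objects and leaves the input untouched.
function collapseAttachments(history: RichTurn[]): RichTurn[] {
  return history.map(turn => ({
    role: turn.role,
    parts: turn.parts.map((p): Part =>
      p.type === "image" ? { type: "text", text: IMAGE_PLACEHOLDER } : p,
    ),
  }));
}

const past: RichTurn[] = [
  {
    role: "user",
    parts: [
      { type: "text", text: "look at this" },
      { type: "image", data: "base64-bytes-go-here" },
    ],
  },
  { role: "assistant", parts: [{ type: "text", text: "incredible" }] },
];
const slim = collapseAttachments(past);
```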

Two things worth knowing about caching

Two design choices are really about the prompt cache, which most providers now price at a steep discount for tokens they've seen before.

First, the big static persona prompt goes first and stays byte-identical across every turn and every user. Anything user-specific (their name, their language, any per-user memory) lives in a second block after it, so the expensive cacheable prefix never changes shape from one user to the next.

Second, the summary only changes when we actually re-summarize. As long as it's stable, the [persona][summary] prefix stays cacheable between turns. That's also why we don't re-summarize on every message. We batch it: the background summarizer only folds aged-out turns into the summary once enough of them have accumulated, by count or by token volume. Re-summarizing constantly would churn the prefix and throw away cache hits to save a trivial amount of summary length, which is a bad trade.

The summarizer itself runs as a background job on a cheaper utility model, decoupled from the request path. The user's reply never waits on it.
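The batching rule can be as small as one predicate. This is a sketch with made-up thresholds: `MIN_PENDING_TURNS`, `MIN_PENDING_TOKENS`, and the `PendingTurn` shape are assumptions, not the values the app uses.

```typescript
// A turn that has aged out of the verbatim window but is not summarized yet.
interface PendingTurn {
  tokens: number;
}

// Hypothetical thresholds; tune them against your own cache hit rate.
const MIN_PENDING_TURNS = 8;
const MIN_PENDING_TOKENS = 2000;

// True once enough aged-out turns have accumulated, by count or by token volume.
function shouldResummarize(pending: PendingTurn[]): boolean {
  const tokens = pending.reduce((sum, t) => sum + t.tokens, 0);
  return pending.length >= MIN_PENDING_TURNS || tokens >= MIN_PENDING_TOKENS;
}

const fewSmall = shouldResummarize([{ tokens: 50 }, { tokens: 80 }]);
const manySmall = shouldResummarize(Array.from({ length: 8 }, () => ({ tokens: 10 })));
const oneHuge = shouldResummarize([{ tokens: 2500 }]);
```

Until the predicate returns true, the summary text stays byte-identical, so the cached prefix keeps matching.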

Rate limiting, kept boring

Token discipline controls cost per conversation. It does nothing about a client hammering the endpoint. For that we put a small per-IP limiter in front of the streaming function, backed by the database we already had rather than a new piece of infrastructure.

It's a fixed window: one document per IP per hour, an atomic increment, reject once the count crosses the threshold.

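// One bucket per hashed IP per window; WINDOW_MS is one hour in milliseconds.
const hourBucket = Math.floor(Date.now() / WINDOW_MS);
const docId = `${ipKey(ip)}_${hourBucket}`;
// The collection name here is illustrative; use whichever collection holds your buckets.
const ref = db.collection('rateLimits').doc(docId);

return db.runTransaction(async (tx) => {
  const snap = await tx.get(ref);
  const count = snap.data()?.count ?? 0;
  // Reject without writing once the threshold is reached.
  if (count >= REQUESTS_PER_HOUR) return false;
  tx.set(ref, {
    count: FieldValue.increment(1),
    // Expire two windows out so a TTL policy can sweep the document.
    expireAt: Timestamp.fromMillis((hourBucket + 2) * WINDOW_MS),
  }, { merge: true });
  return true;
});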

A few details that matter more than the algorithm:

The IP is hashed before it ever touches storage, so we're not keeping a log of raw client addresses. The bucket carries an expireAt, so a TTL policy sweeps old documents and the collection doesn't grow forever. And the limiter fails open when there's no IP to key on or when it's running locally, so development against a single localhost address doesn't trip the cap every few minutes. The cost is one read and one write per request, which is cheap next to an LLM call.
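For the hashing and fail-open pieces, a sketch along these lines would do, using Node's built-in crypto module. The body of `ipKey` is an assumption (the limiter above only shows it being called), and the salt and the `limiterKey` wrapper are hypothetical additions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical salt; in practice this would come from configuration or a secret.
const IP_HASH_SALT = "replace-me";

// Hashes the IP so the raw address never reaches storage.
function ipKey(ip: string): string {
  return createHash("sha256").update(`${IP_HASH_SALT}:${ip}`).digest("hex");
}

// Fail open: returns null when the limiter should be skipped entirely,
// either because there is no IP to key on or because we are running locally.
function limiterKey(ip: string | undefined, isLocal: boolean): string | null {
  if (isLocal || !ip) return null;
  return ipKey(ip);
}

const key = limiterKey("203.0.113.7", false);
const skippedLocal = limiterKey("127.0.0.1", true);
const skippedNoIp = limiterKey(undefined, false);
```

A caller would treat a null key as "allow the request" and only run the transaction when a key comes back.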

A fixed window has a known weakness: a client can fire a full window's worth of requests at 1:59 and another full window at 2:00. A sliding window or token bucket smooths that out. For our traffic the simple version was the right amount of engineering, and you can always tighten it later without touching anything upstream.
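For comparison, here is what the token bucket alternative looks like in its simplest form. To be clear, this is not what the app runs: it is an in-memory sketch with hypothetical names, and a production version would need the bucket state stored somewhere shared, like the fixed-window counter is.

```typescript
// In-memory token bucket: holds up to `capacity` tokens and regains one token
// every `refillIntervalMs`. Each request spends one token.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillIntervalMs: number;

  constructor(capacity: number, refillIntervalMs: number, now: number) {
    this.capacity = capacity;
    this.refillIntervalMs = refillIntervalMs;
    this.tokens = capacity;
    this.last = now;
  }

  // Returns true if the request is allowed at time `now` (milliseconds).
  tryTake(now: number): boolean {
    const elapsed = Math.max(0, now - this.last);
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed / this.refillIntervalMs,
    );
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Capacity of 2, one token back per second.
const bucket = new TokenBucket(2, 1000, 0);
const results = [
  bucket.tryTake(0),
  bucket.tryTake(0),
  bucket.tryTake(0), // bucket is empty, rejected
  bucket.tryTake(1000), // one token has refilled
  bucket.tryTake(1000), // empty again
];
```

Because tokens refill continuously, there is no boundary at which a client suddenly gets a full fresh allowance, which is the burst the fixed window permits.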

What it bought us

Long conversations stopped getting linearly more expensive. Cost per turn flattened into a band set by the plan's budget instead of climbing with the message count. Older context survives as a summary rather than vanishing, recent context stays exact, and the persona prompt stays cached across turns. The rate limiter caps the blast radius of any single client for the price of one extra read and write.

None of this is exotic. It's a summary buffer, a token budget, a placeholder for old attachments, and a counter in a database. The useful part was picking the token budget as the thing to scale on, and treating the cache prefix as something to protect rather than an afterthought.

All of it runs in production behind Meme Chat AI if you want to see where it ended up.

Adding subscriptions to a React Native meme bot with RevenueCat

MemeChatAI — Sun, 07 Jun 2026 13:35:01 +0000

I built a chatbot that sends you memes, digs up gifs, and roasts you. It's called MemeChatAI, and it is, very much on purpose, brainrot. You talk to it, it talks back in the dumbest, funniest way it can manage. That's the whole pitch.

So here's the thing nobody warns you about when you build a joke app: the joke is free, but charging money for it is a real engineering project. The bot was the fun weekend part. Billing was the part that could actually lose people's money if I got it wrong, and that's the part I want to talk about, because RevenueCat ate most of it for me.

The problem I did not want to own

If you've never shipped in-app subscriptions before, here's the short version of what you're signing up for. Apple has its own receipts and renewal rules. Google has different ones. Both need server-side validation if you want to trust anything. Then there's restoring purchases, upgrades, downgrades, free trials, grace periods when someone's card bounces, and the lovely edge case where a user buys on their iPhone and then logs in on an Android tablet and expects their stuff to be there.

I did the math on building all of that myself and decided I'd rather not. For a roast bot. I'm not proud, but I'm also not sorry.

RevenueCat is basically the layer that sits between your app and both stores and turns all of that into one SDK and one webhook. That's the elevator pitch, and in my case it mostly held up.

One SDK, both stores

The app is React Native on Expo, using react-native-purchases (v10). The same code path runs on iOS and Android. Apple's StoreKit and Google's Billing Client are both hiding behind one API, so I'm not maintaining two native billing integrations that drift apart over time.

Prices, products, and the trial all live in the RevenueCat dashboard, not in my code. The app pulls them down as "offerings" and renders whatever comes back:

const offerings = await Purchases.getOfferings();
// `current` can be null when no offering is configured, so guard before indexing.
const pkg = offerings.current?.availablePackages[0];

if (pkg) await Purchases.purchasePackage(pkg);

Nothing is hardcoded. Pricing localizes per region automatically, and "Restore Purchases" is one line:

await Purchases.restorePurchases();

The quiet win here is that I can change a price or tweak the trial from a dashboard without shipping an app update or sitting in a review queue for two days. Pricing is config now, not code. That alone has saved me more than once.

Entitlements are the only source of truth

This is the part that made the rest sane. Instead of my app trying to reason about raw receipts, everything resolves to a single entitlement called pro. From there I map RevenueCat's products onto four internal tiers: free, basic, plus, and power. That mapping lives in one place and is shared between the client and the server, so the two can't disagree about what a "plus" user is allowed to do.

There's also a listener that fires the moment anything changes:

Purchases.addCustomerInfoUpdateListener((info) => {
  updatePlanFromEntitlements(info.entitlements.active);
});

Buy, renew, upgrade, cancel: the UI flips over without a refresh or a manual poll. The first time I tested an upgrade and watched the higher tier just appear, I'll admit it felt a little magic.
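The product-to-tier mapping itself is just a lookup. Here is a minimal sketch of one, assuming the active entitlements are keyed by name and each carries a productIdentifier. The product identifiers below are made up, and falling back to free for an unknown product is a design choice of this sketch, not necessarily the app's.

```typescript
type Plan = "free" | "basic" | "plus" | "power";

// Hypothetical product identifiers; the real ones live in the RevenueCat dashboard.
const PRODUCT_TO_PLAN: Record<string, Plan> = {
  memechat_basic_monthly: "basic",
  memechat_plus_monthly: "plus",
  memechat_power_monthly: "power",
};

// Trimmed-down shape of an active entitlement: only the field used here.
interface ActiveEntitlement {
  productIdentifier: string;
}

// Resolves the internal tier from the single "pro" entitlement.
function planFromEntitlements(active: Record<string, ActiveEntitlement>): Plan {
  const pro = active["pro"];
  if (!pro) return "free";
  // Unknown products fall back to free rather than guessing a paid tier.
  return PRODUCT_TO_PLAN[pro.productIdentifier] ?? "free";
}

const none = planFromEntitlements({});
const plus = planFromEntitlements({ pro: { productIdentifier: "memechat_plus_monthly" } });
const unknown = planFromEntitlements({ pro: { productIdentifier: "something_else" } });
```

Keeping this in one shared module is what lets the client and the webhook handler agree on what each product means.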

Purchases follow the account, not the phone

On login I tie the RevenueCat customer to the Firebase user:

await Purchases.logIn(firebaseUid);

That one call is why "buy it on your iPhone, use it on your Android" works at all, and why my backend can always trace a purchase back to the right profile. I genuinely did not appreciate how annoying this is to do by hand until I didn't have to.

The webhook is where I stopped trusting the client

The client is fast but it's also a liar sometimes. Networks drop. People background the app mid-purchase. So the actual billing record lives server-side.

A Cloud Function listens for RevenueCat webhook events (INITIAL_PURCHASE, RENEWAL, PRODUCT_CHANGE, EXPIRATION, CANCELLATION, BILLING_ISSUE, TRANSFER, and friends) and writes the user's plan to the database. RevenueCat is the source of truth, full stop.

A few things I made sure of, because billing bugs are the worst kind of bug:

Events are idempotent. I dedupe by event ID, so a retry can't apply the same upgrade twice.
Writes are transactional.
Sandbox and test events are gated out of production, so my own testing never touches a real user's plan.
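Stripped of the database, the dedupe and sandbox checks look roughly like this. It is a sketch, not the actual Cloud Function: the in-memory `processed` set stands in for a collection of handled event IDs, the `WebhookEvent` fields reflect my reading of RevenueCat's webhook payload, and `handleEvent` is a hypothetical name. In a real handler the dedupe check and the plan write would share one transaction.

```typescript
// Trimmed-down webhook event: only the fields this sketch needs.
interface WebhookEvent {
  id: string;
  type: string;
  environment: "SANDBOX" | "PRODUCTION";
  app_user_id: string;
}

type Outcome = "applied" | "duplicate" | "ignored_sandbox";

// Stand-in for a database collection of already-processed event IDs.
const processed = new Set<string>();

function handleEvent(
  event: WebhookEvent,
  isProduction: boolean,
  apply: (event: WebhookEvent) => void,
): Outcome {
  // Sandbox events never touch production data.
  if (isProduction && event.environment === "SANDBOX") return "ignored_sandbox";
  // A retried delivery carries the same event ID, so it is applied at most once.
  if (processed.has(event.id)) return "duplicate";
  processed.add(event.id);
  apply(event);
  return "applied";
}

const applied: string[] = [];
const record = (e: WebhookEvent) => { applied.push(e.id); };
const renewal: WebhookEvent = {
  id: "evt_1",
  type: "RENEWAL",
  environment: "PRODUCTION",
  app_user_id: "user_1",
};
const firstDelivery = handleEvent(renewal, true, record);
const retry = handleEvent(renewal, true, record);
const sandbox = handleEvent({ ...renewal, id: "evt_2", environment: "SANDBOX" }, true, record);
```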

The client still does an optimistic "you're upgraded, go enjoy it" write so the app feels instant. But there's a rank guard: a stale client can never downgrade a plan that the webhook authoritatively set. The webhook always gets the last word and reconciles. I lost an afternoon to a race condition before I added that guard, and I'd rather you not.
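The rank guard reduces to a comparison between the stored plan and the incoming write. A sketch under assumptions: the `source` field on the stored record, the `RANK` ordering, and the `resolveWrite` name are all hypothetical.

```typescript
type Tier = "free" | "basic" | "plus" | "power";

const RANK: Record<Tier, number> = { free: 0, basic: 1, plus: 2, power: 3 };

interface PlanRecord {
  tier: Tier;
  source: "client" | "webhook"; // who wrote this value last
}

// Decides which record survives when a new write arrives.
function resolveWrite(current: PlanRecord, incoming: PlanRecord): PlanRecord {
  // The webhook is authoritative and always gets the last word.
  if (incoming.source === "webhook") return incoming;
  // A client write may never lower a plan that the webhook set.
  if (current.source === "webhook" && RANK[incoming.tier] < RANK[current.tier]) {
    return current;
  }
  return incoming;
}

const webhookPlus: PlanRecord = { tier: "plus", source: "webhook" };
const staleClient = resolveWrite(webhookPlus, { tier: "free", source: "client" });
const optimistic = resolveWrite(webhookPlus, { tier: "power", source: "client" });
const downgrade = resolveWrite(webhookPlus, { tier: "basic", source: "webhook" });
```

The stale client write is discarded, the optimistic upgrade goes through, and a webhook downgrade is honored because it comes from the authoritative side.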

The stuff I got for free

A pile of things I would have built badly came included:

The 7-day free trial is tracked off RevenueCat's trial signals, so I know the difference between someone in trial and someone who actually converted. Upgrades grant the new tier immediately; downgrades wait politely until the current billing cycle ends. Billing issues and grace periods are RevenueCat's problem to manage, and my app just logs and waits. When someone wants to cancel, I hand them the native Apple or Google "manage subscription" screen instead of pretending I should be in the middle of that.

There's also a test store for local dev, so I can run the whole purchase flow without standing up real App Store and Play products first. For iteration speed that's huge.

Is there a catch? Sort of.

I want to be honest, because I find "this tool is perfect" posts useless. Handing your billing source-of-truth to a third party is a real dependency, and I thought hard about it. If RevenueCat has a bad day, my purchases have a bad day. There's a cost once you cross their free tier, and you're trading some control for all this convenience.

For a solo-ish project shipping a meme bot to two stores, that trade was obviously worth it. For a company whose entire business is subscriptions at massive scale, I'd at least want to think about it longer. Your call.

One accuracy note while I'm here: RevenueCat doesn't see or store your card. Apple and Google run the actual transaction. RevenueCat manages entitlements on top of that. I'd been fuzzy on this myself before I read the docs, so I'm spelling it out.

So

The dumb part of MemeChatAI took a weekend. The billing took real care, and most of that care went into the half-page of webhook logic above, not into reimplementing two stores' worth of receipt validation. That's the trade I'd make again.

If you want to get roasted by a bot, it's on the App Store, and there's more at meme-chat-ai.com. Fair warning: it has no manners. That's a feature.

I built an AI chat app because I was tired of AI sounding like a corporate memo

MemeChatAI — Sat, 06 Jun 2026 05:00:31 +0000

Every AI assistant I tried gave me useful answers, but the writing always felt like it came from HR. Four paragraphs, a bullet list, and "I hope this helps!" tacked on at the end. I'd ask something simple and get back a wall of text that read like a policy document.

The thing is, nobody actually talks like that. I don't, my friends don't, and when I want a quick answer I don't want to feel like I'm reading a memo. So I decided to build one that talks the way people actually do. An assistant that gives you real answers but sounds like a person. Specifically the kind of person who lives in the replies, sends you a meme when you're being dramatic, and still somehow knows the answer to your question.

That became Meme Chat AI. The assistant is called Brainrot Bot. It rewrites dry texts so they don't sound flat, explains things without the textbook fog, gives honest feedback on half-baked ideas, and finds the right caption or angle for whatever you're working on.

The whole bet behind it was that being useful and having a personality are not a tradeoff. Most AI products act like they are, like the only way to be taken seriously is to sound serious. I don't think that's true, and building this app has mostly confirmed it.

I'm building this in public and I'll keep posting here about what's working, what broke, and what I picked up along the way. Follow along if you're into the indie app process or just curious where this goes.