
Seb Hoek

How I Cut Costs and Improved Performance in My AI-Built Game Portal

My AI-built browser game portal was growing. That was good news - until Firebase bills started rising and performance got worse.

This was the moment when I, as a software engineer, had to step in.

With an increasing user base playing more and more games every day, I slipped out of the free usage tier of Firebase Storage, which I use for persistence.

Costs are increasing slightly every month

Image: The costs increased to about 10 USD per month with a clear upward trend

Also, I noticed that the perceived performance of my HTTP services degraded over time.

This is less than optimal. What happened? It probably had to do with how the AI had set up the HTTP requests and database queries.

How can I investigate the causes and what can I do to fix it? Let's dive into it!

Step 1: Finding the real bottlenecks

For me, observability is one of the most useful tools for finding and understanding problems.

Rather than guessing, I used three sources of data:

  • Google Cloud Billing Reports to see where costs came from
  • Firestore Query Insights to identify expensive collections
  • API latency metrics in Google Cloud Observability to spot slow endpoints

What the metrics revealed

The billing report identified Firestore as the only contributor to the costs. And within Firestore, it was clear that some collections had too many reads for the daily active users of my gaming portal.

With up to 1 million reads per day, my small system exceeded the free tier threshold of 50k reads per day by more than a factor of 10.

Firestore Billing

Image: Firestore usage showed too many reads and writes before the optimization

The Firestore Query Insights indicated that some collections like the profile, game completion and highscores were the main source of the reads.

After setting up the HTTP API metrics in Google Cloud Observability, I could see that the profile resource was queried too many times and had high latency, and that the same applied to the random seed generator resource.

With this information, I could challenge my coding assistant:

  • Which parts of the code read those collections too often?
  • Why is the profile resource slow and called so frequently?
  • Why is the random seed generator so slow?

Three areas stood out immediately:

  • a nightly cleanup job,
  • the profile endpoint,
  • and the random seed generator.

Together they were driving most of the cost and latency.

Problem #1: A nightly job burning reads

The first surprise came from a background job that users never even saw.

The nightly cleanup function, written by the coding assistant, had an N+1 read pattern that scaled poorly with the number of profiles. At small scale I didn't notice it, but with real usage it became a major cost driver.

What was going wrong

The job iterated over every profile document and then ran a subquery per profile to find old game starts:

// loads ALL profiles
const profilesSnapshot = await db.collection(PROFILE_COLLECTION).get(); 

for (const profileDoc of profilesSnapshot.docs) {
    const oldGameStarts = await profileDoc.ref
        .collection(GAMESTATS_COLLECTION)
        .where("startedAt", "<", cutoffDate)
        .get(); // separate query per profile
    // ...
}

This means the job always performed one collection scan plus N additional subqueries, where N is the number of profiles — even if most profiles had nothing to clean up.

In practice, with ~500 profiles but only ~30 containing stale data, the job still executed ~501 reads instead of ~30 relevant reads.

How we fixed it

We replaced the per-profile loop with a collection group query that directly targets only the documents that need cleanup:

const oldGameStarts = await db
    .collectionGroup(GAMESTATS_COLLECTION) // same subcollection as in the loop above
    .where("startedAt", "<", cutoffDate)
    .get();

// Firestore limits a write batch to 500 operations, so delete in chunks
const docs = oldGameStarts.docs;
for (let i = 0; i < docs.length; i += 500) {
    const batch = db.batch();
    for (const doc of docs.slice(i, i + 500)) {
        batch.delete(doc.ref);
    }
    await batch.commit();
}

This shifts the cost from being proportional to the number of profiles to being proportional to the number of matching documents.

In the same example, that reduced the work from ~501 reads down to ~30.

Result

This single change removed a large portion of the Firestore cost baseline. It also made the cleanup job scale with actual data size instead of user count, which was the underlying issue.

Fixing the cleanup job removed a major source of waste, but the profile endpoint was still dragging both cost and latency.

Problem #2: One endpoint doing too much

The second hotspot was the profile endpoint which was heavily used throughout the portal.

The profile endpoint had become one of the slowest and most expensive parts of the system. It was queried frequently, responded too slowly, and generated far too many database reads.

The analysis revealed that the real issue was not one single bug, but several small inefficiencies that had accumulated over time.

What was going wrong

Several small inefficiencies compounded into one expensive endpoint.

1. Too many duplicate requests

When the profile page opened, multiple React components requested the same profile data at nearly the same time. Because there was no deduplication, several identical requests were sent in parallel.

2. Each request loaded more data than necessary

The backend always loaded additional subcollections such as game stats and recent completed games, even though most components that requested profile data did not need them.

3. Maintenance tasks ran during normal user requests

The endpoint also triggered cleanup jobs and daily event generation. Some of this work only needed to run once per day, but it was being checked on every request.

4. Extra network overhead on every call

The frontend forced a fresh Firebase auth token before each API request, creating an unnecessary extra roundtrip to third-party services.

5. No effective response caching

Even if nothing had changed, the browser still downloaded the full profile response again.

How we fixed it

Together with the AI assistant, I optimized the endpoint in several layers:

1. Reuse cached auth tokens

I replaced getIdToken(true) with getIdToken(), allowing Firebase to use cached tokens until they actually expire.
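As a small sketch of the idea (the helper name `buildAuthHeaders` is illustrative, not the portal's actual code): with the Firebase Web SDK, `getIdToken()` returns the SDK's cached token until it is close to expiry, while `getIdToken(true)` forces a refresh roundtrip on every call.

```javascript
// Hypothetical helper: build auth headers for an API call, reusing the
// SDK's cached ID token. getIdToken() returns the cached token until it
// expires; getIdToken(true) would force an extra roundtrip per request.
async function buildAuthHeaders(user) {
    const token = await user.getIdToken(); // cached unless expired
    return { Authorization: `Bearer ${token}` };
}
```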

2. Lazy loading

Game stats and completed games were removed from the default profile response and moved to separate endpoints. They are now only fetched when the user opens those sections in the profile view.

3. Move maintenance off the hot path

A lastMaintenanceAt timestamp now ensures cleanup and daily event generation only run once per day.
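A minimal sketch of such a gate, assuming a simple calendar-day comparison (the function name is illustrative; the post does not show the actual implementation):

```javascript
// Hypothetical once-per-day gate: run maintenance only if the stored
// lastMaintenanceAt timestamp falls on a different UTC calendar day.
function shouldRunMaintenance(lastMaintenanceAt, now = new Date()) {
    if (!lastMaintenanceAt) return true; // never ran before
    const lastDay = new Date(lastMaintenanceAt).toISOString().slice(0, 10);
    const today = now.toISOString().slice(0, 10);
    return lastDay !== today;
}
```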

4. Request deduplication and caching with ETags

I added a short-lived in-memory cache on the frontend so simultaneous requests could reuse the first response instead of hitting the backend multiple times.
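The core of such a cache can be sketched as follows (names and the 2-second TTL are assumptions for illustration): concurrent callers asking for the same key share a single in-flight promise instead of each issuing their own request.

```javascript
// Hypothetical in-flight request deduplication with a short TTL:
// bursts of identical requests resolve from one shared promise.
const inflight = new Map();

function dedupedFetch(key, fetcher, ttlMs = 2000) {
    const cached = inflight.get(key);
    if (cached && Date.now() - cached.at < ttlMs) {
        return cached.promise; // reuse the first request's promise
    }
    const promise = fetcher(key);
    inflight.set(key, { promise, at: Date.now() });
    return promise;
}
```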

Additionally, if the profile has not changed, the server now returns 304 Not Modified, so the browser can reuse its cached version.

Result

The profile page became noticeably faster, backend latency dropped, and Firestore reads were reduced significantly.

Instead of one endpoint doing five jobs on every request, it now does only the work that is actually needed.

Reduced profile reads

Image: Reduced reads on the profile collection after applying multiple improvements

After the profile endpoint, one last expensive pattern remained: seed generation.

Problem #3: Random Seeds Were Surprisingly Expensive

The final issue came from a feature that seemed harmless: random seed generation.

A seed is a number used to initialize a game so that players share the same world state. The system organizes seeds into hourly, daily, and weekly pools.

What was going wrong

Every backend request to retrieve a seed called getActivityWeights(), which computed selection weights based on multiple Firestore documents. Each seed in the pool was stored as a separate document.

Depending on the pool size, this resulted in 8 to 50 Firestore reads per request.

With ~200 daily users requesting seeds, this alone produced roughly 50k reads per day — effectively consuming the entire free tier budget.

How we fixed it

The issue wasn’t the weighting logic itself, but how it was stored.

Instead of computing weights by reading multiple documents on every request, we moved the computed state into the existing seedPools/{poolType} document, which was already being updated whenever a game finished.

Now the system maintains a seedWeights map directly inside that document.

When a seed is requested, the backend only reads this single document instead of fetching multiple entries from the pool.
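Selecting a seed from such a map can be sketched like this (the `pickSeed` helper and the map shape are assumptions; the post does not show the actual weighting code):

```javascript
// Hypothetical weighted pick from a seedWeights map stored in a single
// seedPools/{poolType} document, e.g. { "seed-a": 3, "seed-b": 1 }.
function pickSeed(seedWeights, rand = Math.random()) {
    const entries = Object.entries(seedWeights);
    const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
    let threshold = rand * total; // position within the cumulative weights
    for (const [seed, weight] of entries) {
        threshold -= weight;
        if (threshold < 0) return seed;
    }
    return entries[entries.length - 1][0]; // guard against rounding at rand ≈ 1
}
```

The key point is that the whole map lives in one document, so this selection costs a single Firestore read regardless of pool size.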

Result

This reduced seed-related usage from ~50k reads per day down to ~2k reads per day.

The logic stayed the same, but the read pattern collapsed from N documents per request to 1.

After fixing all three issues, Firestore usage dropped back into free-tier limits.

Back to free tier

The optimized database reads brought the project straight back into the Firebase free tier, as the image below shows.

Billing report: Daily costs went to zero

Image: The billing report shows that the daily costs went to zero after the optimizations

In addition, the improved perceived performance is also visible in the HTTP API performance metrics: most services respond within 100 ms to 500 ms, and the number of requests to the profile resource was significantly reduced after the optimizations.

I am very happy that costs dropped back into the free tier and the system feels fast again. And I believe my users can feel the difference as well.

Conclusion

As discussed in earlier posts, AI code assistants help to ship and validate ideas fast. It is possible to create functioning and maintainable software at a speed never seen before.

However, it seems that AI-generated code often prioritizes working solutions over efficient ones. Human review is still needed to optimize resource consumption (and therefore cost), scaling, and performance - ideally before costs explode or performance degrades.

For me, AI coding assistance paired with human software engineering expertise is a game changer for the speed of shipping features and maintaining software systems.
