John Carroll

Posted on May 27, 2021 • Edited on Feb 3, 2023

Surprisingly simple method to increase Firestore performance & reduce cost

#firebase #rxjs

This post is intended for people already familiar with Firebase's Firestore as well as RxJS observables.

Recently, I realized that I could increase performance and reduce the cost of Firebase's Firestore using a simple query cache.

Edit: having used this in production for a while, I find this technique is mainly a performance improvement and does not have a large impact on cost (possibly because I've also enabled Firestore persistence, which already reduces cost).

The problem

Firebase's Firestore is pretty fast to begin with, but it doesn't support automatically joining data together (i.e. SQL joins). A common workaround is to use observables to do something like the following (example taken from the RxFire docs):

import firebase from 'firebase/app';
import 'firebase/firestore';
import 'firebase/storage';
import { collectionData } from 'rxfire/firestore';
import { getDownloadURL } from 'rxfire/storage';
import { combineLatest } from 'rxjs';
import { map, switchMap } from 'rxjs/operators';

const app = firebase.initializeApp({ /* config */ });
const citiesRef = app.firestore().collection('cities');
const storage = app.storage();

collectionData(citiesRef, 'id')
  .pipe(
    switchMap(cities => {
      return combineLatest(...cities.map(c => {
        const ref = storage.ref(`/cities/${c.id}.png`);
        return getDownloadURL(ref).pipe(map(imageURL => ({ imageURL, ...c })));
      }));
    })
  )
  .subscribe(cities => {
    cities.forEach(c => console.log(c.imageURL));
  });

In this example, a collection of cities is being loaded and then image data needs to be separately fetched for each city returned.

Using this method, queries can quickly slow down as one query turns into dozens of queries. In a small app I made for my nonprofit, all these queries added up over time (as new features were added) and all of a sudden you'd be waiting 5-30 seconds for certain pages to load. Any delay is especially annoying if you navigate back and forth between pages quickly.

"I just loaded this data a second ago, why does it need to load everything again?"

What I wanted to do was cache query data for a period of time so that it could be quickly reused if someone navigates back and forth between a few pages. However, without giving it much thought, it seemed like implementing such a cache would take time and add a fair amount of complexity. I tried using Firestore persistence with the hope that this would automatically deduplicate queries, cache data, and increase performance, but it didn't have as much of an impact as I'd hoped (it did reduce costs somewhat, but it also unexpectedly reduced performance).

Turns out it was really easy to create a better cache.

The solution

I implemented a simple query cache that maintains a subscription to a query for some configurable amount of time, even after all observers have unsubscribed from the data. When a component executes a new query, instead of immediately calling Firestore, I check the query cache to see if the relevant query was already created. If it was, I reuse the existing query. Else, I create a new query and cache it for the future.

The code:

import { defer, MonoTypeOperatorFunction, Observable, Subject } from 'rxjs';
import stringify from 'fast-json-stable-stringify';
import { delay, finalize, shareReplay, takeUntil } from 'rxjs/operators';

/** Amount of milliseconds to hold onto cached queries */
const HOLD_CACHED_QUERIES_DURATION = 1000 * 60 * 3; // 3 minutes

export class QueryCacheService {
  private readonly cache = new Map<string, Observable<unknown>>();

  resolve<T>(
    service: string,
    method: string,
    args: unknown[],
    queryFactory: () => Observable<T>,
  ): Observable<T> {
    const key = stringify({ service, method, args });

    let query = this.cache.get(key) as Observable<T> | undefined;

    if (query) return query;

    const destroy$ = new Subject<void>();
    let subscriberCount = 0;
    let timeout: ReturnType<typeof setTimeout> | undefined;

    query = queryFactory().pipe(
      takeUntil(destroy$),
      shareReplay(1),
      tapOnSubscribe(() => {
        // since there is now a subscriber, don't cleanup the query
        // if we were previously planning on cleaning it up
        if (timeout) clearTimeout(timeout);
        subscriberCount++;
      }),
      finalize(() => { // triggers on unsubscribe
        subscriberCount--;

        if (subscriberCount === 0) {
          // If there are no subscribers, hold onto any cached queries
          // for `HOLD_CACHED_QUERIES_DURATION` milliseconds and then
          // clean them up if there still aren't any new
          // subscribers
          timeout = setTimeout(() => {
            destroy$.next();
            destroy$.complete();
            this.cache.delete(key);
          }, HOLD_CACHED_QUERIES_DURATION);
        }
      }),
      // Without this delay, very large queries are executed synchronously
      // which can introduce some pauses/jank in the UI. 
      // Using the `async` scheduler keeps UI performance speedy. 
      // I also tried the `asap` scheduler but it still had jank.
      delay(0),
    );

    this.cache.set(key, query);

    return query;
  }
}

/** 
 * Triggers callback every time a new observer 
 * subscribes to this chain. 
 */
function tapOnSubscribe<T>(
  callback: () => void,
): MonoTypeOperatorFunction<T> {
  return (source: Observable<T>): Observable<T> =>
    defer(() => {
      callback();
      return source;
    });
}

I can use this cache like so:

export class ClientService {
  constructor(
    private fstore: AngularFirestore,
    private queryCache: QueryCacheService,
  ) {}

  getClient(id: string) {
    const query =
      () => this.fstore
        .doc<IClient>(`clients/${id}`)
        .valueChanges();

    return this.queryCache.resolve(
      'ClientService', 
      'getClient', 
      [id], 
      query
    );
  }
}

Now, when the ClientService#getClient() method is called, the method arguments and identifiers are passed to the query cache service along with a query factory function. The query cache service uses the fast-json-stable-stringify library to stringify the query's identifying information and use this string as a key to cache the query's observable. Before caching the query, the observable is modified in the following ways:

shareReplay(1) is added so that future subscribers get the most recent results immediately and also so that a subscription to the underlying Firestore data is maintained even after the last subscriber to this query unsubscribes.
Subscribers to the query are tracked so that, after the last subscriber unsubscribes, a timer is set to automatically unsubscribe from the underlying Firestore data and clear the cache after a user-defined period of time (I'm currently using 3 minutes).
delay(0) is used to force subscribers to use the asyncScheduler. I find this helps keep the UI snappy when loading a large dataset that has been cached (otherwise, the UI attempts to synchronously load the large dataset, which can cause stutter/jank).
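
To illustrate why the cache key is built with a stable stringify rather than plain JSON.stringify, here is a minimal sketch. Note that `stableKey` is a hypothetical, simplified stand-in written for this example (it only handles plain JSON-like values and ignores things like `toJSON`); it is not the fast-json-stable-stringify implementation.

```typescript
// Hypothetical, simplified stand-in for a stable stringify function.
// It sorts object keys so that property order does not change the output.
function stableKey(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableKey(v)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableKey(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  // Primitives (string, number, boolean, null) serialize as in JSON.
  return JSON.stringify(value);
}

// The same logical query, described with properties in a different order.
const a = { service: 'ClientService', method: 'getClient', args: [{ id: '1', active: true }] };
const b = { args: [{ active: true, id: '1' }], method: 'getClient', service: 'ClientService' };

// Plain JSON.stringify preserves insertion order, so the keys differ
// and the second call would miss the cache.
const plainKeysMatch = JSON.stringify(a) === JSON.stringify(b);

// With sorted keys, both descriptions map to the same cache entry.
const stableKeysMatch = stableKey(a) === stableKey(b);
```

This only matters when query arguments are objects; for primitive arguments such as the `id` string above, either approach would produce the same key.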

This cache could be further updated to allow configuring the HOLD_CACHED_QUERIES_DURATION on a per-query basis.
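
One way a per-query hold duration might look is sketched below. To keep it self-contained, this is a framework-free stand-in for the subscribe/unsubscribe bookkeeping rather than a change to the service above; `HoldTracker`, `HoldOptions`, and `holdMs` are all hypothetical names. It also assumes that a non-finite duration should mean "never clean up", since passing `Infinity` to `setTimeout` does not wait forever.

```typescript
/** Default hold time, matching the 3 minutes used above. */
const DEFAULT_HOLD_MS = 1000 * 60 * 3;

/** Hypothetical per-query options; `holdMs` overrides the default. */
interface HoldOptions {
  holdMs?: number;
}

/** Tracks subscribers for one cached query and schedules its cleanup. */
class HoldTracker {
  readonly holdMs: number;
  private readonly cleanup: () => void;
  private subscriberCount = 0;
  private timeout: ReturnType<typeof setTimeout> | undefined;

  constructor(cleanup: () => void, options: HoldOptions = {}) {
    this.cleanup = cleanup;
    this.holdMs = options.holdMs ?? DEFAULT_HOLD_MS;
  }

  /** True while a cleanup timer is waiting to fire. */
  get cleanupPending(): boolean {
    return this.timeout !== undefined;
  }

  onSubscribe(): void {
    // A new subscriber arrived, so cancel any planned cleanup.
    if (this.timeout !== undefined) {
      clearTimeout(this.timeout);
      this.timeout = undefined;
    }
    this.subscriberCount++;
  }

  onUnsubscribe(): void {
    this.subscriberCount--;
    // Keep the query while it has subscribers, or forever if the
    // hold duration is not finite.
    if (this.subscriberCount > 0 || !Number.isFinite(this.holdMs)) return;
    this.timeout = setTimeout(() => {
      this.timeout = undefined;
      this.cleanup();
    }, this.holdMs);
  }
}

// A query held for 30 seconds instead of the default.
const clientTracker = new HoldTracker(() => console.log('client query released'), { holdMs: 30000 });
clientTracker.onSubscribe();
clientTracker.onUnsubscribe(); // cleanup scheduled in 30 seconds
clientTracker.onSubscribe(); // a new subscriber cancels the cleanup

// A query held for the lifetime of the application.
const settingsTracker = new HoldTracker(() => console.log('never printed'), { holdMs: Infinity });
settingsTracker.onSubscribe();
settingsTracker.onUnsubscribe(); // nothing is scheduled
```

In the service itself, the same idea would amount to accepting an optional duration in `resolve()` and using it in place of `HOLD_CACHED_QUERIES_DURATION` when setting the timer.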

Conclusion

This simple cache greatly increases performance and potentially reduces costs if it prevents the same documents from being reloaded again and again in rapid succession. The one potential "gotcha" is if a query is built using Date arguments. In this case, you need to be careful about using new Date() as an argument to a query since this would change the cache key associated with the query on every call (basically, this would prevent the cache from ever being used). You can fix this issue by normalizing Date creation (e.g. startOfDay(new Date()) using date-fns).
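
The Date gotcha can be sketched as follows. The `cacheKey` helper and the `VisitService`/`getVisitsSince` names are hypothetical; `JSON.stringify` stands in for the stable stringify used by the cache, and `startOfLocalDay` is a hand-written local-time equivalent of date-fns' `startOfDay`, used only to keep the example dependency-free.

```typescript
// Hypothetical key builder mirroring the shape of the key used by the cache.
// Date values are serialized as ISO strings by JSON.stringify.
function cacheKey(service: string, method: string, args: unknown[]): string {
  return JSON.stringify({ service, method, args });
}

// Truncates a date to midnight in local time.
function startOfLocalDay(date: Date): Date {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate());
}

// Two calls made one second apart, as `new Date()` would produce.
const firstCall = new Date(2021, 4, 27, 9, 0, 0);
const secondCall = new Date(2021, 4, 27, 9, 0, 1);

// Raw dates differ, so each call gets its own key and the cache is never hit.
const rawKeysMatch =
  cacheKey('VisitService', 'getVisitsSince', [firstCall]) ===
  cacheKey('VisitService', 'getVisitsSince', [secondCall]);

// Normalized dates are identical for the whole day, so the key is reused.
const normalizedKeysMatch =
  cacheKey('VisitService', 'getVisitsSince', [startOfLocalDay(firstCall)]) ===
  cacheKey('VisitService', 'getVisitsSince', [startOfLocalDay(secondCall)]);
```

How coarsely to normalize (day, hour, minute) depends on how fresh the query's time window needs to be.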

Hope this is helpful.

Top comments (10)

Sam Corcos - in Tahoe • Jan 25 '22

This post is super helpful!

John - are you free to chat sometime? I'm playing around with a Firebase project and I'm looking for a (compensated) consultant to help guide me through a few complicated bits.

John Carroll • Feb 8 '22

Back from my travels. If you'd still like to chat, DM me on Twitter and I'll send you my email address.

Sam Corcos - in Tahoe • Feb 17 '22

DM sent :)

John Carroll • Feb 2 '22

I’m currently out of town and away from a computer. I’ll endeavor to respond in a week or two.

Maslow • Dec 28 '21

Thanks for this solution. I was trying to implement a similar thing but I was afraid the queries would keep on watching data that's no longer relevant. I guess the timeout solves this issue. Are you still using this solution as of today?

John Carroll • Dec 29 '21 • Edited

Are you still using this solution as of today?

Yup. Though I've found that any impact on cost is limited (at least compared to enabling Firestore persistence). It's mainly a performance improvement. I also added the ability to configure the HOLD_CACHED_QUERIES_DURATION on a query by query basis. For my app, there are a few queries that make sense to hold for the lifetime of the application since they are used so often.

Maslow • Nov 16 '22

I've recently jumped back into this code again and it's interesting because in the end it didn't reduce the cost that much. But now I understand something a little bit better. Keeping a query in cache reduces the cost if this query is called several times from several different places or if you do a lot of back and forth in the app. But when you leave a page which needs a query results, you'll still get charged for every update to this query for 3min (or the duration you chose) even if you don't need them anymore. Does that sound right @johncarroll ?

John Carroll • Nov 16 '22

But when you leave a page which needs a query results, you'll still get charged for every update to this query for 3min (or the duration you chose) even if you don't need them anymore. Does that sound right @johncarroll ?

Yes

Maslow • Dec 30 '21

I haven't pushed to prod yet but from what I've tested I'm expecting a huge cost reduction. If not then it means that Firebase already aggregates those queries in the SDK. I'm expecting ~70% fewer reads. (FYI I'm not using persistence since I had huge performance issues with it).

Grandschtroumpf • Aug 18 '21 • Edited

Be careful with shareReplay(1). It never unsubscribes in version 6. You need to specify refCount: true as a param to unsubscribe when there are no subscribers left. As you can see in the code, refCount is false by default:
github.com/ReactiveX/rxjs/blob/6.6...
To avoid this pitfall you can do shareReplay({ bufferSize: 1, refCount: true })

Note: I think that this is working as expected in version 7 though