
How to count documents in Google Cloud Firestore?

malaniuk ・ Originally published at Medium ・ 6 min read

Issue Description

Almost every relational database, and many NoSQL databases, offers a simple interface for getting the total number of documents, rows, or tables. Such interfaces are typically supported by the DB engine out of the box.

Most developers who start working with Firestore expect the same from the Cloud Firestore SDK. But at the time of writing there is no built-in function for it.


The official Firebase GitHub repository has a few feature requests for a 'count() documents' function that were created a few years ago. Judging by the comments, the team did not plan to implement the feature in upcoming releases.
https://github.com/firebase/firebase-js-sdk/issues/236


If you search for a way to count the total number of documents in a Firestore collection, you will find a huge number of StackOverflow questions. The answers offer various hacks and workarounds, with a lot of limitations and bugs.

Possible solutions

I will go through all the possible solutions that I found and analyze their weaknesses.

Snapshot Size

The first solution is simple and straightforward: get all documents and count them.

db.collection('collectionName').get()
  .then(snapshot => console.log(snapshot.size));

✅ Simple to implement.
✅ Fine for small collections (10–250 documents).
❌ Returns the wrong value for huge collections (1000+ documents).
❌ Increases latency for a simple operation (all documents must be fetched before counting).
❌ Overuses the Firestore read quota: each count uses N read operations instead of ~1, which will quickly increase your bill.
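
To make the last point concrete, here is a rough back-of-the-envelope sketch. The function names and the workload numbers are made up for illustration; the only assumption taken from Firestore itself is that fetching a document is billed as one read.

```javascript
// Rough read-cost estimate for the "Snapshot Size" approach.
// Every count fetches the whole collection: one billed read per document.
function readsForSnapshotCount(collectionSize, countsPerDay) {
  return collectionSize * countsPerDay;
}

// A dedicated counter document costs about one read per count.
function readsForCounterDoc(countsPerDay) {
  return countsPerDay;
}

// Hypothetical workload: 10,000 documents, counted 500 times a day.
console.log(readsForSnapshotCount(10000, 500)); // 5000000
console.log(readsForCounterDoc(500));           // 500
```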


Write On Create

Probably the first idea that comes to mind is to store the count in a separate collection and increase it each time we create a new item.

import * as admin from 'firebase-admin';

const docCollection = admin.firestore().collection('collectionName');
// 'counters' is a placeholder ID for the document that stores the counter
const statisticDoc = admin.firestore().collection('statCollectionName').doc('counters');

const createDocument = (newDocData) => {
  // create new document in collection
  return docCollection.doc().set(newDocData).then(() => {
    // increase collection counter value (the counter document must already exist)
    return statisticDoc.update({
      docCounter: admin.firestore.FieldValue.increment(1)
    });
  });
};

✅ No need to fetch all documents to count them, which saves Firestore budget.
❌ You need to add counter-updating code everywhere you create or delete docs. Errors in transactions or batch creates/deletes are hard to handle.
❌ Items created or removed from the Firestore Web UI or another Firestore client cannot be tracked.
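
One way to soften the error-handling problem is to put the create and the counter change into a single write batch, so both succeed or fail together. This is only a sketch: `createWithCounter` is a hypothetical helper, `statCollectionName/counters` is an assumed path for the counter document, and `db` and `FieldValue` are expected to be `admin.firestore()` and `admin.firestore.FieldValue`. It does not help with documents changed outside your own code.

```javascript
// Sketch: create a document and bump its counter in one atomic batch.
// `db` and `FieldValue` are passed in so the function is easy to test.
function createWithCounter(db, FieldValue, data) {
  const batch = db.batch();
  // doc() without arguments creates a reference with an auto-generated ID
  const newDocRef = db.collection('collectionName').doc();
  // hypothetical location of the counter document
  const counterRef = db.doc('statCollectionName/counters');

  batch.set(newDocRef, data);
  // merge: true creates the counter document if it does not exist yet
  batch.set(counterRef, { docCounter: FieldValue.increment(1) }, { merge: true });

  // both writes are applied together, or neither is
  return batch.commit();
}
```

With the Admin SDK it could be called as `createWithCounter(admin.firestore(), admin.firestore.FieldValue, { title: 'Hello' })`.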


Write Listener

Google Cloud Functions / Firebase Functions let us create Lambda-style functions that are triggered by specific events.

Firestore has events for tracking write operations on collections and documents, so a trigger-based implementation looks like a natural fit for this problem.

There are a lot of references to this solution across the internet.

import * as admin from 'firebase-admin';
// on recent firebase-functions releases this 1st-gen API may need to be
// imported from 'firebase-functions/v1' instead
import * as functions from 'firebase-functions';

// 'counters' is a placeholder ID for the document that stores the counter
const statisticDoc = admin.firestore().collection('statCollectionName').doc('counters');
// set up the Cloud Function listener
export const documentWriteListener = functions.firestore
  .document('collectionName/{id}')
  .onWrite((change, context) => {
    if (!change.before.exists) { // new document created
      return statisticDoc.update({
        docCounter: admin.firestore.FieldValue.increment(1)
      });
    }
    if (!change.after.exists) { // document deleted
      return statisticDoc.update({
        docCounter: admin.firestore.FieldValue.increment(-1)
      });
    }
    // document updated: do nothing
    return null;
  });

❌ It looks like a perfect solution, but it does not work properly. If you run this function and then create documents (100, for example), the final counter value will be more than 100.

Let's investigate what's wrong with this solution and why it does not work as expected.


Firestore Trigger Limitations

[Image: list of limitations from the Cloud Firestore triggers documentation]

The last point tells us that each trigger function will be executed at least once. This means it can be triggered several times for the same event in case of some issues, instance replication, etc.

This is the main point we need to keep in mind to create a perfect solution.


Final Solution

The final solution is based on the Write Listener solution, but we need to fix the duplicate counter writes and extend the solution to support multiple counters.

Each Firestore event has an ID in its context (context.eventId). This ID is guaranteed to be unique for each create/delete operation.

Let's first create a separate collection to store events by ID. Each event should be a separate document with a few fields: timestamp, collection (the collection name), and value.

// a list of collections names
const collectionsToSave = [
    COLLECTIONS.USER,
    COLLECTIONS.POST,
    COLLECTIONS.TAG,
    COLLECTIONS.COMMENTS,
];

const docEventsTrigger = () => {
  // trigger on all collections and documents
  return functions.firestore.document('{collectionId}/{docId}')
    .onWrite((change, context) => {
      // skip all events that are not related to our collections
      if (!collectionsToSave.includes(context.params.collectionId))
        return Promise.resolve();
      // skip all update events
      if (change.before.exists && change.after.exists)
        return Promise.resolve();
      // store event and collection id
      const id = context.eventId;
      const collection = context.params.collectionId;
      // create a server timestamp value
      const timestamp = admin.firestore.FieldValue.serverTimestamp();
      // value is +1 if a new document was created, -1 if a document was deleted
      const value = !change.before.exists ? 1 : -1;
      // the event ID is the document ID, so a repeated delivery overwrites the same document
      const newEventRef = admin.firestore().collection(COLLECTIONS.ADMIN_EVENTS).doc(id);
      // set data to new event and save
      return newEventRef.set({ collection, timestamp, value });
    });
};
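
The trigger above uses the event ID as the document ID, so a repeated delivery overwrites the same event instead of adding a second one. The following in-memory simulation (no Firestore involved, and the `deliveries` data is made up) contrasts that with the naive counter from the Write Listener section:

```javascript
// In-memory simulation of at-least-once delivery.
// Event "evt-2" is delivered twice.
const deliveries = [
  { eventId: 'evt-1', value: 1 },
  { eventId: 'evt-2', value: 1 },
  { eventId: 'evt-2', value: 1 }, // duplicate delivery of the same event
  { eventId: 'evt-3', value: 1 },
];

// Naive listener: every delivery changes the counter.
let naiveCounter = 0;
deliveries.forEach(d => { naiveCounter += d.value; });

// Event store keyed by event ID: a duplicate overwrites the same entry,
// much like calling set() twice on the same document.
const eventStore = new Map();
deliveries.forEach(d => eventStore.set(d.eventId, { value: d.value }));
const dedupedCounter = [...eventStore.values()]
  .reduce((sum, e) => sum + e.value, 0);

console.log(naiveCounter);   // 4 (one too many)
console.log(dedupedCounter); // 3
```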

Now deploy this trigger and create an item to check that events are created correctly.


The next step is to count these events and write the result to a separate collection. As an improvement, we also clean up the events collection, because the counted events are no longer needed. (The clean-up can be skipped for a low-load system, with fewer than 100 events per day.)

// a separate function to count events values
const calcCollectionIncrease = (docs, collectionName) => { 
  return docs
    // get only events of current collection
    .filter(d => d.collection === collectionName)
    // calc total sum of event values
    .reduce((res, d) => (res + d.value), 0);
};

const collectionsToCheck = [
    COLLECTIONS.USER,
    COLLECTIONS.POST,
    COLLECTIONS.TAG,
    COLLECTIONS.COMMENTS,
];

const docEventsCleanUp = () => {
  // scheduled run every 5 minutes; the interval can be adjusted to your system load
  return functions.pubsub.schedule('every 5 minutes')
    .onRun((context) => {
      // take only old events, created more than 5 minutes ago
      const limitDate = new Date(Date.now() - (1000 * 60 * 5));
      // query for the 250 oldest events, sorted from old to new
      const lastEvents = admin.firestore()
        .collection(COLLECTIONS.ADMIN_EVENTS)
        .where('timestamp', '<', limitDate)
        .orderBy('timestamp', 'asc').limit(250);
      // ref to statistic document
      const statDocRef = admin.firestore().doc(COLLECTIONS.ADMIN_STAT_DATA_COUNT);

      return admin.firestore()
        .runTransaction(t => t.get(lastEvents).then(snap => {
          // if there are no events, do nothing
          if (snap.empty) return 0;

          const size = snap.size;
          // map data for all docs to reuse it later
          const snapData = snap.docs.map(d => d.data());
          // dictionary to store counter increments
          const updateCountersDict = {};
          // sum event values per collection
          collectionsToCheck.forEach(collection => {
            updateCountersDict[collection] = admin.firestore.FieldValue
              .increment(calcCollectionIncrease(snapData, collection));
          });
          // update counters (the statistic document must already exist)
          t.update(statDocRef, updateCountersDict);
          // delete the counted events in the same transaction
          snap.docs.forEach(d => t.delete(d.ref));
          return size;
        }))
        // log result to Google Cloud log for debugging
        .then(result => console.log('Transaction success', result))
        .catch(err => console.log('Transaction failure:', err));
    });
};

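In the clean-up function, limitDate is set to (current time - 5 minutes). This is because of the first point in the limitations list: each trigger can take up to 10 seconds to execute. Counting only older events leaves time for every delivery of an event, including repeated ones, to be written before the event is counted and deleted.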
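
To see what one clean-up pass does with that cut-off, here is an in-memory walk-through. The sample events and the fixed "current time" are made up, and `sumEventsFor` is a hypothetical stand-in that mirrors `calcCollectionIncrease` so the snippet runs on its own.

```javascript
const FIVE_MINUTES_MS = 1000 * 60 * 5;

// Sum the event values that belong to one collection.
const sumEventsFor = (docs, collectionName) => docs
  .filter(d => d.collection === collectionName)
  .reduce((res, d) => res + d.value, 0);

// Fixed "current time" so the example is reproducible.
const now = new Date('2020-01-01T12:00:00Z').getTime();
const limitDate = new Date(now - FIVE_MINUTES_MS); // 11:55:00

const events = [
  { collection: 'posts', value: 1,  timestamp: new Date('2020-01-01T11:50:00Z') },
  { collection: 'posts', value: 1,  timestamp: new Date('2020-01-01T11:52:00Z') },
  { collection: 'tags',  value: -1, timestamp: new Date('2020-01-01T11:54:00Z') },
  { collection: 'posts', value: 1,  timestamp: new Date('2020-01-01T11:58:00Z') }, // too recent
];

// Only events older than the cut-off are counted (and later deleted) in this pass.
const oldEvents = events.filter(e => e.timestamp < limitDate);
const increments = {
  posts: sumEventsFor(oldEvents, 'posts'),
  tags: sumEventsFor(oldEvents, 'tags'),
};

console.log(oldEvents.length); // 3
console.log(increments);       // { posts: 2, tags: -1 }
```

The event from 11:58 stays in the events collection and is picked up by a later run.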


Finally, we need to get the exact number of documents, even if some events have not yet been moved to the counter.

We can do it with a simple script that takes the last saved counter and adds the pending events for the current collection.

const collectionToCheck = COLLECTIONS.TAG;
// promise for the statistic document
const keyStatCount = admin.firestore().doc(COLLECTIONS.ADMIN_STAT_DATA_COUNT).get();
// promise for the pending events of one collection
const keyEvents = admin.firestore().collection(COLLECTIONS.ADMIN_EVENTS)
    .where('collection', '==', collectionToCheck).get();
// run both queries in parallel
Promise
  .all([keyStatCount, keyEvents])
  .then(([doc, eventsSnap]) => {
    // last saved counter value (0 if nothing has been counted yet)
    const statCount = (doc.data() || {})[collectionToCheck] || 0;
    // sum of pending event values
    const eventsSum = eventsSnap.docs.map(d => d.data().value).reduce((res, val) => res + val, 0);

    return statCount + eventsSum;
  });
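
Stripped of the Firestore calls, the calculation is just "saved counter plus pending events". A tiny worked example, where `currentCount` is a hypothetical helper and the numbers are invented:

```javascript
// Last saved counter plus the values of events not yet folded into it.
function currentCount(savedCounter, pendingEventValues) {
  return pendingEventValues.reduce((sum, v) => sum + v, savedCounter);
}

// Counter document says 120; two creates and one delete are still pending.
console.log(currentCount(120, [1, 1, -1])); // 121
// Nothing pending: the saved counter is already exact.
console.log(currentCount(120, []));         // 120
```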

✅ Works properly.
❌ Doubles the write operations (2N) and adds N delete operations. In return, reading a counter takes ~1 read operation (the first solution takes N read operations for each count).
❌ Complex setup. A simpler solution would be preferable.



I have tested this solution by bulk-creating and removing over 2k documents within a few seconds. It has been working properly for me for a long time.


👌 Thank you for reading. I hope my article was helpful to somebody facing the same problem.

🙌 Share your own experience, so we can discuss it and improve the solution.


🏗️Feel free to check my side projects:

dummyapi.io
rgbtohex.page
