Jonathan Gamble

Posted on May 28, 2021 • Edited on Mar 8, 2023 • Originally published at code.build

How to Build a Scalable Follower Feed in Firestore

#firebase #firestore #twitter #relational

UPDATE 3/7/23

Here is my final article on this subject with all versions:

https://code.build/p/GNWh51AdUxUd3B8vEnSMam/building-a-scalable-follower-feed-with-firestore

As the author of adv-firestore-functions, I feel like I have figured out how to hack every problem the Firestore Team refuses to solve internally, except how to connect the relational data of a follower feed.

I have stared at this Stack Overflow question for hours (I will update my answer on it after I post this) and filled pages of Microsoft OneNote on my iPad with conceptual ideas. I think I have finally found a workable, scalable solution.

SPOILER: this is conceptual and has not been tested...

The Problem

Allowing users to have followers and to follow other users is not a problem. You could easily add and remove a user with the collections users/followers and users/following... but here is the kicker: how do you pull only the posts by users you are following and sort them by createdAt in descending order? Keep in mind a user may have 2 million followers but may only follow, say, 1,000 people.

Following Angle: Pulling 1,000 different users' latest posts to populate my feed will not only cost me extraneous reads, it will also be slow to sort, and I don't need all of those posts. I only want 5 or 10 at a time. Also, the 10 latest posts may be by only 3 users, so it gets even more complicated.

Followers Angle: The NoSQL way of doing things would theoretically be to update every follower's feed with a copy of my post every time a new post is created. If a user has 2 million followers, I need to make 2 million copies. This is a crap load of writes, and it could time out a Firestore function. Again, this is very slow as well.

Problem Specifics

So we basically need to build an index without timing out the Firestore function, keep the number of writes down, and still be able to sort the results when reading without slowing down the client.

~Easy Peasy~

Non-Scalable Method

One method you may see is to use arrays and array-contains, like in Fireship.io's Data Modeling Course. Documents have size limits (1 MiB per document), so my estimate is a max of about 10,000 items in an array for this data. Jeff (from Fireship) also puts the latest 3 posts on a user document, which makes it a problem to get anything beyond 3 posts.
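For a rough sense of where a number like 10,000 comes from, here is a back-of-the-envelope size check in TypeScript. It is only a sketch: it assumes follower IDs are 28-character ASCII strings (the length of a Firebase Auth UID) and applies Firestore's storage-size rule that a string costs its UTF-8 byte length plus 1. The helper names are hypothetical, and the estimate ignores the document name and any other fields.

```typescript
// Firestore's maximum document size: 1 MiB.
const MAX_DOC_BYTES = 1_048_576;

// Firestore counts a string as its UTF-8 byte length + 1.
// Assumption: IDs are ASCII, so byte length equals string length.
function stringSize(value: string): number {
  return value.length + 1;
}

// Approximate size of a `followers` array field:
// the field name plus every string in the array.
function followersFieldSize(ids: string[]): number {
  return stringSize("followers") + ids.reduce((sum, id) => sum + stringSize(id), 0);
}

// Hypothetical 28-character follower IDs.
function fakeIds(count: number): string[] {
  return Array.from({ length: count }, (_, i) => String(i).padStart(28, "u"));
}

const bytesFor10k = followersFieldSize(fakeIds(10_000));
const share = ((bytesFor10k / MAX_DOC_BYTES) * 100).toFixed(1);
console.log(`${bytesFor10k} bytes, ${share}% of the 1 MiB limit`);
// prints "290010 bytes, 27.7% of the 1 MiB limit"
```

By this estimate, 10,000 IDs fill roughly a quarter of a document, so 10,000 is a comfortable cap that leaves headroom for longer IDs and the other fields on the doc.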

Theoretically you could copy the user data into a second (or third, fourth, etc.) document for every 10,000 users... but then you still have an inexact user count if you use my colCounter() function, for example, plus all the other problems from above.

But Fireship did get me going in the right direction... (if you're reading this, check out his courses regardless; they cover all kinds of topics outside of Firebase: fireship.io).

Problem 1: Solving relations...

However, arrays are the key here...

I believe a better way to model the data is like so:

users/$userId/followers

users/$userId/followers_index

posts/$postId

_relations/_relationId

Each relation document contains:

{
  userId: '12ksl2123k',
  postId: '12sk2skeeiwo2',
  createdAt: '5/2/21', // a Firestore Timestamp in practice, used for sorting
  followers: [
    '3k2l12k3ls',
    'g2lss9837ie',
    'titsiel22',
    // ...
  ]
}

And you have a copy of this for each group of 10,000 followers a user has. I will get into data consistency in a bit, so hold tight.

The key here is that if a user has 20,000,000 followers, I only need 2,000 copies of each post (20,000,000 / 10,000). This is HUGE!
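The write savings come down to simple division. This TypeScript sketch (helper names are hypothetical) compares the naive approach of one copy per follower against one relation doc per group of 10,000 followers:

```typescript
// The article's estimate of how many follower IDs fit in one relation document.
const FOLLOWERS_PER_RELATION_DOC = 10_000;

// Naive fan-out: one feed copy per follower for every new post.
function naiveWritesPerPost(followerCount: number): number {
  return followerCount;
}

// Chunked fan-out: one relation document per group of up to 10,000 followers.
function relationDocsPerPost(
  followerCount: number,
  chunkSize: number = FOLLOWERS_PER_RELATION_DOC
): number {
  return Math.ceil(followerCount / chunkSize);
}

for (const followers of [52, 10_000, 10_001, 2_000_000, 20_000_000]) {
  console.log(
    `${followers} followers: ${naiveWritesPerPost(followers)} naive writes vs ` +
      `${relationDocsPerPost(followers)} relation docs`
  );
}
```

The 2 million follower case from the problem statement drops from 2,000,000 writes per post to 200.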

Sidenote

users/userId/followers is a collection with all followers. users/userId/followers_index is a collection of documents, each holding a followers array that is ready to be copied. That way you never have to read followers one by one. Again, that is 2,000 docs for 20,000,000 followers...

Creating the relation index...

My goal was to write something in my adv-firestore-functions package that does this automatically, like so, but sadly I may never return to Firestore development due to the reasons here.

It would run on a postWrite trigger and look like this:

await relationIndex(change, context, {
  fields: ['createdAt'],
  array_collection: `users/${author$}/followers_index`,
  array_name: 'followers'
});

(just like my search functions...)

I would have added options, but generally speaking it would automatically have created 2,000 relation documents per post for a user with 20 million followers, for example. It would also add the createdAt field for sorting (or whatever fields from the post document are necessary for your use case). This assumes the document IDs in the followers collection are the followers' userIds. Like the rest of my search indexes, if any of the fields in posts changed or the post was deleted, it would auto-update these documents. Here are some ideas from my package on how to do that if you decide to implement this.

I would have written a second function for data consistency. To keep the data consistent, you need to update every document created by a user whenever someone follows or unfollows that user. So, if user 123 unsubscribes from user 456, 123 needs to be removed from the followers array for every post user 456 has ever created. If this is just posts, it may only be dozens or hundreds of documents. If this is videos it could be thousands, and tweets may be tens of thousands, but I believe that is even rarer. Most cases will be 1-30 documents, not a big deal.
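Here is a minimal in-memory sketch of that unfollow cleanup in TypeScript. The RelationDoc shape mirrors the relation document above, with createdAt as epoch milliseconds standing in for a Timestamp; the function name is hypothetical. In a real trigger, each changed document would be one update, for example using Firestore's arrayRemove on the followers field.

```typescript
// In-memory stand-in for a _relations document.
interface RelationDoc {
  userId: string; // author of the post
  postId: string;
  createdAt: number; // epoch millis instead of a Firestore Timestamp
  followers: string[];
}

// Remove `followerId` from every relation doc belonging to `authorId`.
// `changed` counts the documents a real trigger would have to write.
function removeFollowerFromRelations(
  docs: RelationDoc[],
  authorId: string,
  followerId: string
): { docs: RelationDoc[]; changed: number } {
  let changed = 0;
  const updated = docs.map((doc) => {
    if (doc.userId !== authorId || !doc.followers.includes(followerId)) {
      return doc; // untouched: another author's post, or follower not present
    }
    changed++;
    return { ...doc, followers: doc.followers.filter((id) => id !== followerId) };
  });
  return { docs: updated, changed };
}

// User 123 unsubscribes from user 456, who has three posts.
const relations: RelationDoc[] = [
  { userId: "456", postId: "p1", createdAt: 1, followers: ["123", "789"] },
  { userId: "456", postId: "p2", createdAt: 2, followers: ["123", "789"] },
  { userId: "456", postId: "p3", createdAt: 3, followers: ["123"] },
  { userId: "999", postId: "p4", createdAt: 4, followers: ["123"] },
];
const result = removeFollowerFromRelations(relations, "456", "123");
console.log(result.changed); // 3
```

Returning new objects instead of mutating keeps the "before" and "after" states easy to compare, which is roughly what a trigger sees in its change.before and change.after snapshots.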

If a follower is removed, that followers_index document will end up with 9,999 items in the array (or fewer). Rather than writing more complex functions to always keep exactly 10,000 users on each document, it makes more sense to accept the gaps, since users don't unsubscribe that often.

This would all be done on a users/followers write trigger (which would also add the user to users/followers_index). That followers_index document would look like:

{
  count: 52,
  followers: [
    '123ksl2',
    '2k3l22l',
    '3920132',
    's2l2235',
    // ...
  ]
}

...an array of follower IDs along with the total count. The doc ID can be anything.
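A one-time backfill of users/{userId}/followers_index from an existing follower list boils down to chunking. Here is a TypeScript sketch under the article's 10,000-per-doc assumption; the function name is hypothetical, and the docs are plain objects rather than Firestore writes.

```typescript
// In-memory stand-in for a users/{userId}/followers_index document.
interface FollowersIndexDoc {
  count: number;
  followers: string[];
}

// Split a user's full follower list into followers_index documents of at
// most `chunkSize` IDs each, keeping `count` in sync with the array length.
function buildFollowersIndex(followerIds: string[], chunkSize = 10_000): FollowersIndexDoc[] {
  const docs: FollowersIndexDoc[] = [];
  for (let i = 0; i < followerIds.length; i += chunkSize) {
    const followers = followerIds.slice(i, i + chunkSize);
    docs.push({ count: followers.length, followers });
  }
  return docs;
}

const ids = Array.from({ length: 25_000 }, (_, i) => `follower${i}`);
const index = buildFollowersIndex(ids);
console.log(index.map((d) => d.count)); // [ 10000, 10000, 5000 ]
```

Only the last doc is partially filled, which is the one new followers would be appended to.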

You would also need a third trigger for when users delete their accounts, but you may not want to allow that option at all (just disable it).

Finally, you get the user feed on the front end like so:

db.collection('_relations')
  .where('followers', 'array-contains', CURRENT_USER)
  .orderBy('createdAt', 'desc') // needs a composite index (followers + createdAt)
  .limit(10); // only pull 5 or 10 posts at a time

You could index all the post info on that document, or you could just pull the post document using the postId on the relation doc (with pipes, for example). The point is, you have options...
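To make the semantics of the feed query concrete, here is an in-memory TypeScript sketch of what it returns: filter on array membership, sort newest first, and take one page. The names are hypothetical, createdAt is epoch milliseconds, and the "cursor" is just the last createdAt seen (which assumes unique timestamps); with the real SDK you would page with startAfter on the last document snapshot instead.

```typescript
// In-memory stand-in for a _relations document.
interface RelationDoc {
  userId: string;
  postId: string;
  createdAt: number; // epoch millis instead of a Firestore Timestamp
  followers: string[];
}

// Equivalent of:
//   where('followers', 'array-contains', currentUser)
//   .orderBy('createdAt', 'desc').limit(pageSize)
// optionally continuing after the last createdAt of the previous page.
function feedPage(
  docs: RelationDoc[],
  currentUser: string,
  pageSize: number,
  afterCreatedAt?: number
): RelationDoc[] {
  return docs
    .filter((d) => d.followers.includes(currentUser))
    .filter((d) => afterCreatedAt === undefined || d.createdAt < afterCreatedAt)
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, pageSize);
}

const docs: RelationDoc[] = [
  { userId: "alice", postId: "a1", createdAt: 100, followers: ["me", "x"] },
  { userId: "bob", postId: "b1", createdAt: 300, followers: ["me"] },
  { userId: "carol", postId: "c1", createdAt: 200, followers: ["x"] }, // not followed
  { userId: "alice", postId: "a2", createdAt: 400, followers: ["me", "x"] },
];

const page1 = feedPage(docs, "me", 2);
const page2 = feedPage(docs, "me", 2, page1[page1.length - 1].createdAt);
console.log(page1.map((d) => d.postId), page2.map((d) => d.postId));
```

Because each follower sits in exactly one followers_index chunk per author, a given post shows up at most once in the results, so no de-duplication step is needed.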

Problem 2: Firestore Function Limits

The next problem I believe I solved is Firebase function limits. Cloud Functions have time and memory limits, so my theory is simple: run multiple functions for chunking. The same function triggers itself until the updates are completed.

Internally, my package would have created _functions/${eventId} using the event ID that Firestore passes to the function. I do similar things in my package with _events; you just never needed to understand it.

The postWrite trigger from above would basically create a new document from the eventId like so:

{
  lastDoc: (reference to first follower document),
  collection: 'users/followers_index',
  field: 'followers',
  chunk: 500
}

And the _functions collection would have another trigger that keeps updating the lastDoc document reference until all documents have been read...

The function would read the users/{userId}/followers_index collection with startAfter(lastDoc) in chunks of 500 and add each chunk's followers to the post's relation index documents.

After there are no more followers left, the trigger loop ends...
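The self-retriggering loop can be simulated without Firestore. In this TypeScript sketch (all names hypothetical), a numeric lastIndex stands in for the lastDoc reference, and each call to runChunk plays the role of one trigger invocation; in the design above, writing the updated progress doc back to _functions/{eventId} is what would fire the next invocation.

```typescript
// Stand-in for a _functions/{eventId} progress document.
interface ProgressDoc {
  lastIndex: number | null; // position of the last processed item (null = not started)
  chunk: number;
  done: boolean;
}

// One invocation: process the next chunk after the cursor and return the new state.
function runChunk<T>(items: T[], progress: ProgressDoc, handle: (batch: T[]) => void): ProgressDoc {
  const start = progress.lastIndex === null ? 0 : progress.lastIndex + 1;
  const batch = items.slice(start, start + progress.chunk);
  if (batch.length === 0) {
    return { ...progress, done: true }; // nothing left: the trigger loop ends
  }
  handle(batch);
  return { ...progress, lastIndex: start + batch.length - 1 };
}

// Drive the loop the way repeated trigger invocations would.
function runUntilDone<T>(items: T[], chunk: number, handle: (batch: T[]) => void): number {
  let progress: ProgressDoc = { lastIndex: null, chunk, done: false };
  let invocations = 0;
  while (!progress.done) {
    progress = runChunk(items, progress, handle);
    invocations++;
  }
  return invocations;
}

const indexDocIds = Array.from({ length: 2_000 }, (_, i) => `index-doc-${i}`);
let processed = 0;
const invocations = runUntilDone(indexDocIds, 500, (batch) => {
  processed += batch.length;
});
console.log(invocations, processed); // 5 2000 (4 full chunks + 1 empty read that ends the loop)
```

Since all state lives in the progress doc, any single invocation can fail and be retried without losing track of where the job was.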

Is your head exploding yet?!

The point here is not the followers, but the concept of Bulk Reads and Writes: save your chunk size and your place (the cursor) in a separate document. As a side note, I would probably have updated my bulk delete and bulk update functions to do this.

This concept would also have been used for unfollowing, etc... You may not even need it, since even 2,000 documents can be handled easily by Firestore functions... and a single batched write can handle 500.

This thing is freakin scalable...

Conclusion

I am writing this article for two reasons.

1.) To show proof of concept and get it out of my head
2.) To hope someone someday uses some of these ideas

I would love it if someone wrote this, as I probably never will. I am tired of NoSQL in general, but I love challenges. I am currently developing in DGraph, but Supabase.io seems really interesting, as well as NHost.io. They all solve problems I never want to solve again using NoSQL, especially in Firestore, which is perhaps the weakest in features.

If anyone wants to write this, feel free to send me a pull request. In fact, any updates to my package are welcomed.

Keep ideas flowing, and keep speech free...

Update 8/12/21

I thought I would give a little more specifics on how this theoretically works:

users/{userId}

{
  // ...user data...
  followers_index: [
    'slejf',
    '23k2l2',
    // ...
  ],
  latestFollowersIndex: 'slejf'
}

The followers_index field is a list of all doc IDs in the followers_index collection (each doc holding up to 10,000 users), and latestFollowersIndex is the latest one. This is NOT an array of follower IDs, but an array of followers_index doc IDs, each of which is a doc with an array of followers...

A user follows another user
- Your client adds a new doc to users/{userId}/followers/{followerId}
- the followers collection triggers an onWrite function that:
  - gets latestFollowersIndex from user doc
  - if count is >= 10000 on the latestFollowersIndex doc, creates a new followers_index doc and sets latestFollowersIndex to the new doc
  - adds the followerId to the followers field in users/{userId}/followers_index/{latestFollowersIndex}
  - increases count field
  - gets all postIds by that user: collection('posts').where('userId', '==', userId)
  - for each postId, creates the _relations/_relationId doc for this followers_index doc if it does not exist yet (with whatever post doc data you want)
  - copies the followers array from users/{userId}/followers_index/{latestFollowersIndex} into each of those relations docs (there should be one relations doc for each post by each user for each 10,000 followers), so a user with 10 followers and 5 posts = only 5 relation docs
A user unfollows another user
- users/{userId}/followers/{followerId} is deleted
- the onWrite trigger removes followerId from the relevant followers_index doc and decrements its count, which in turn updates the relation docs with the new array (without that user)
A user adds a new post
- the posts onWrite trigger creates one new _relations doc for each followers_index doc (i.e., for every 10,000 followers), so < 10,000 followers === 1 doc
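The follow and new-post steps above can be sketched in memory with TypeScript. This is a minimal model under stated assumptions: the UserState shape, the idx ID generator, and the function names are hypothetical; the per-doc cap is a parameter so the rollover shows up with tiny numbers; and relation docs are plain objects rather than Firestore writes. Updating existing relation docs when a follower joins an existing chunk is left out.

```typescript
// Hypothetical users/{userId}/followers_index/{id} document.
interface FollowersIndexDoc {
  id: string;
  count: number;
  followers: string[];
}

// Hypothetical in-memory view of users/{userId} plus its followers_index docs.
interface UserState {
  userId: string;
  followers_index: string[]; // ids of followers_index docs, oldest first
  latestFollowersIndex: string | null;
  indexDocs: Map<string, FollowersIndexDoc>;
}

// Hypothetical _relations document: one per post per followers_index doc.
interface RelationDoc {
  postId: string;
  userId: string;
  createdAt: number;
  followers: string[];
}

let idCounter = 0;
const newIndexId = (): string => `idx${++idCounter}`;

// "A user follows another user": append to the latest followers_index doc,
// opening a new one once the latest holds `max` followers.
function follow(user: UserState, followerId: string, max = 10_000): FollowersIndexDoc {
  let latest: FollowersIndexDoc | undefined =
    user.latestFollowersIndex === null ? undefined : user.indexDocs.get(user.latestFollowersIndex);
  if (latest === undefined || latest.count >= max) {
    latest = { id: newIndexId(), count: 0, followers: [] };
    user.indexDocs.set(latest.id, latest);
    user.followers_index.push(latest.id);
    user.latestFollowersIndex = latest.id;
  }
  latest.followers.push(followerId);
  latest.count++;
  return latest;
}

// "A user adds a new post": one relation doc per followers_index doc.
function relationsForNewPost(user: UserState, postId: string, createdAt: number): RelationDoc[] {
  return user.followers_index.map((indexId) => {
    const indexDoc = user.indexDocs.get(indexId);
    return {
      postId,
      userId: user.userId,
      createdAt,
      followers: indexDoc ? [...indexDoc.followers] : [],
    };
  });
}

// Seven followers with a cap of 3 per index doc -> 3 index docs -> 3 relation docs.
const author: UserState = {
  userId: "456",
  followers_index: [],
  latestFollowersIndex: null,
  indexDocs: new Map(),
};
for (const f of ["a", "b", "c", "d", "e", "f", "g"]) follow(author, f, 3);
const relations = relationsForNewPost(author, "post1", Date.now());
console.log(author.followers_index.length, relations.map((r) => r.followers.length)); // 3 [ 3, 3, 1 ]
```

Copying the array (rather than sharing a reference) mirrors the fact that each relation doc in Firestore holds its own snapshot of the followers list.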

Hope this helps give a little more information,

Oldest comments (4)

Pinkovai Krisztian • Aug 9 '21 • Edited

Hello! Great Article!

I would like to know a little more detail on this technique because I myself am trying to do a posts feed with firestore where people's posts are either public, follow or private type. (public - everyone can see the post, follow - only followers of the post creator can see the post, private - only the post creator can see the post).
Do you think that this kind of posts feed makes things even harder? Also, maybe we could talk somewhere so I can get a better grasp about your technique that you described above, in this article?

Thanks!

Jonathan Gamble • Aug 10 '21 • Edited

PM me on Fireship.io slack: fireship.page.link/slack - jdgamble555

ThanHtutZaw • Aug 31 '23

When displaying a news feed of friends, should I store post IDs in a feed collection, or can I query for this specific news feed? The reason I ask is that I can't use multiple "in" filters in one Firestore query.

ThanHtutZaw • Aug 31 '23 • Edited

const postQuery = query(
  collectionGroup(db, "posts"),
  where("authorId", "in", friendsList),
  where("visibility", "in", ["Friend", "Public"]), // this will not work
  where("createdAt", ">=", new Timestamp(1693409835, 2000000)),
  orderBy("createdAt", "desc"),
  limit(10)
);