Jonathan Gamble

Posted on Sep 18, 2021 • Edited on Mar 8, 2023 • Originally published at code.build

Firestore Many-to-Many: Part 6 - The Final Backend Solution (Follower Feed)

#firebase #firestore #twitter #rxjs

UPDATE 3/7/23

Here is my final article on this subject, with all versions:

https://code.build/p/GNWh51AdUxUd3B8vEnSMam/building-a-scalable-follower-feed-with-firestore

Follower Feed Finale

I have written so many posts about a Follower Feed that I got burned out on the subject. However, here is the finale, which shows you how to do it. I am finally turning my theory into practice. Okay, not completely: I don't want to build everything, but I do point out the problems in all my options.

Schema / Data Model

Whether you're using a GraphDB, noSQL, or SQL, the data model is still generally the same. You're going to have posts, users, and follows.

Posts
  id
  title
  createdAt
  userId
  ...
Users
  id
  username
  ...
Followers
  follower
  following

The Followers Table in SQL will have a unique key based on the composite index of the two fields.

In noSQL, since we do not need to query based on one user's followers, but based on who one user is following, we have two possible models:

Subcollection - users/{userId}/followers
Root Collection - followers

And, we will create some index collections using Firebase Functions later.

Now, we need to think about what we are trying to accomplish as an end goal.

Queries

These are generally the same, although I have seen more complicated versions using EXISTS, etc., for better indexing. The GraphQL will obviously be a little different depending on the framework.

SQL 1

SELECT * FROM Posts p WHERE userId IN
(SELECT following FROM Followers WHERE follower = $UID) 
ORDER BY createdAt DESC

SQL 2

SELECT * FROM Posts p
JOIN Followers f ON f.following = p.userId
WHERE f.follower = $UID 
ORDER BY createdAt DESC

GraphQL

query {
  queryPost(
    where: { userId: { follower: { id: $UID } } },
    order: { desc: createdAt }
  ) {
    id
    title
    createdAt
    ...
  }
}

So, what does this mean? Well, we know we really have two many-to-manys, and one connection.

Posts <- Followers -> Users

But, we can't go backwards trying to query the users first, since we need to sort by createdAt, which is on the post document / table / node.

So, we know we have to, and can only, query the posts... which is ultimately what we are trying to get, just a filtered, sorted version.

Technically we can do something similar to this:

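// Query 1: find everyone that $UID follows
const followersSnap = await db.collection('followers')
  .where('follower', '==', $UID)
  .get();

// Collect the followed user IDs from the matching documents
const following = followersSnap.docs.map((doc) => doc.data().following);

// Query 2: fetch the posts written by those users
const postsSnap = await db.collection('posts')
  .where('userId', 'in', following)
  .get();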

But the in operator accepted at most 10 values at the time of writing, so we are limited to following 10 people, and we are really doing two queries on the frontend instead of a backend join.
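A common workaround for that cap is to split the list of followed IDs into groups, run one query per group, and merge the results on the client. The sketch below shows only the splitting and merging as plain functions. The names chunk and mergeNewestFirst are assumptions, and createdAt is assumed to be a number. It does not remove the second problem: this is still many frontend queries instead of one backend join.

```javascript
// Split a list into groups no larger than `size`. The default of 10
// matches the cap discussed above; adjust it to your Firestore limit.
function chunk(items, size = 10) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Merge the per-group results and sort newest first. `createdAt` is
// assumed to be epoch milliseconds; a Firestore Timestamp would need
// converting with .toMillis() first.
function mergeNewestFirst(resultSets) {
  return resultSets.flat().sort((a, b) => b.createdAt - a.createdAt);
}

// Intended use (not run here): one 'in' query per group, then merge.
// const sets = await Promise.all(chunk(following).map((ids) =>
//   db.collection('posts').where('userId', 'in', ids).get()
//     .then((snap) => snap.docs.map((d) => d.data()))));
// const posts = mergeNewestFirst(sets);
```

Following 25 people means 3 queries, and pagination across the merged results gets awkward quickly, which is why the rest of this post moves the work to the backend.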

So, let's add some automatic indexes:

Indexes

First, we are going to add a new collection called feed to hold our main indexes, then:

1.) Create followers_index

- Followers onWrite
- arrayIndex function

2.) Update followers_index

- Users onWrite
- updateJoinData function

See Scalable Arrays

3.) Create feed, which is a special type of post index

    a. Post onWrite - CREATE
    b. get userId from doc as $UID
    c. grab all documents from users/$UID/followers_index
    d. foreach document DOC, create feed/DOC
    e. DOC contains postId, userId, createdAt, followers array
    f. Post onWrite - DELETE - make sure to handle deleting all
       these feed posts when a post is deleted by searching
       for all feed/docs where postId = id
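Steps c through f can be sketched as plain functions over in-memory data. This is a hedged illustration, not the actual trigger code: buildFeedDocs and feedDocsForPost are hypothetical names, and the feed document ID scheme is an assumption.

```javascript
// post: { id, userId, createdAt }
// indexDocs: [{ id, followers: [...] }], as read from
//   users/{userId}/followers_index (step c).
// Returns one feed document per index document (steps d and e).
function buildFeedDocs(post, indexDocs) {
  return indexDocs.map((indexDoc) => ({
    // Hypothetical ID scheme: post ID plus index document ID.
    id: `${post.id}_${indexDoc.id}`,
    postId: post.id,
    userId: post.userId,
    createdAt: post.createdAt,
    followers: [...indexDoc.followers],
  }));
}

// Step f: select the feed documents that belong to a deleted post.
// In Firestore this would be a query on feed where postId == id.
function feedDocsForPost(feedDocs, postId) {
  return feedDocs.filter((doc) => doc.postId === postId);
}
```

In a real function the returned objects would be written to the feed collection, and the selected documents deleted, in batches.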

This is pretty basic code. I may go back and add this later, but nothing special here. Your first question may be about step c. Keep this in mind:

If each followers_index document holds 10,000 users, then we only grab 100 documents for 1 million followers. If you're lucky enough to have 100 million followers, then yes, you would need to create 10,000 feed documents per post. In theory, that is actually doable in a Firestore function with my Bulk Update Function. The largest number of followers ever recorded on any platform is under 200 million.
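The arithmetic is just a ceiling division. A small sketch, where indexDocCount is a hypothetical helper and 10,000 is the per-document capacity assumed in the paragraph above:

```javascript
const FOLLOWERS_PER_INDEX_DOC = 10000;

// Number of followers_index documents (and therefore feed documents
// written per post) for a given follower count.
function indexDocCount(followerCount, perDoc = FOLLOWERS_PER_INDEX_DOC) {
  return Math.ceil(followerCount / perDoc);
}

console.log(indexDocCount(1000000));   // 100
console.log(indexDocCount(100000000)); // 10000
console.log(indexDocCount(200000000)); // 20000
```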

My Bulk Update Function is far from perfect, but since a Firestore batch caps out at 500 operations, you need something like it: it basically loops through the writes one batch at a time. Yes, bulk-update uses set, so it can create documents.
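To make the looping concrete, here is a minimal sketch of the idea. It is not the Bulk Update Function itself: buildBatches is a hypothetical helper, and db only needs to expose batch() the way the Firestore SDK does.

```javascript
const BATCH_LIMIT = 500;

// writes: [{ ref, data }] where `ref` is a document reference.
// db: any object exposing batch(), like a Firestore instance.
// Returns the filled batches; the caller commits them one by one.
function buildBatches(db, writes, limit = BATCH_LIMIT) {
  const batches = [];
  for (let i = 0; i < writes.length; i += limit) {
    const batch = db.batch();
    for (const { ref, data } of writes.slice(i, i + limit)) {
      batch.set(ref, data); // set creates the document if it is missing
    }
    batches.push(batch);
  }
  return batches;
}

// Intended use (not run here):
// for (const batch of buildBatches(db, writes)) {
//   await batch.commit();
// }
```

Committing sequentially keeps memory and write rates predictable, at the cost of a longer run time inside the function.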

The other problem with the Bulk Update is that Firebase Functions time out: the limit can be raised to 9 minutes, but you have to configure that manually. You can also speed up the functions in general if you write them in Go.

You can use the Bulk Delete for deleting the feed documents.

If the Post document or User Document gets updated in this case, you don't need to do anything. You're just storing their IDs.

COMPLEXITY N = # followers_index documents
(between 1 and 20,000)

4.) Update the connection when a user follows / unfollows

This is just as bad as step 3. If a user follows or unfollows another user, which is actually a rare action, all of the followed user's feed documents need to be updated.

    a. users/{userId}/followers_index - onWrite
    b. also add / remove user from all feed docs:
       .where('userId', '==', userId)
       .where('followers', 'array-contains', followerID)
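The unfollow half of step b can be sketched over in-memory data. This is an illustration under stated assumptions: removeFollower is a hypothetical name, and feedDocs stands in for documents from the feed collection.

```javascript
// Mirrors the query
//   .where('userId', '==', authorId)
//   .where('followers', 'array-contains', followerId)
// and returns updated copies of only the documents that matched.
// Against Firestore itself, each matched document could instead be
// updated with FieldValue.arrayRemove(followerId).
function removeFollower(feedDocs, authorId, followerId) {
  return feedDocs
    .filter((doc) => doc.userId === authorId && doc.followers.includes(followerId))
    .map((doc) => ({
      ...doc,
      followers: doc.followers.filter((id) => id !== followerId),
    }));
}
```

The number of matched documents grows with the number of posts the followed user has written, which is where the complexity below comes from.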

COMPLEXITY N = # Posts / Tweets
YouTube max videos (1.4 million), Twitter max tweets (37 million)

So, herein lies the real problem with this model: following and unfollowing.

However, I think that in most situations people are creating blogs where most users never make it to 1,000 posts.

So, how do we solve this problem?

I talked about a theoretical self-triggering function in my other post. I think this is still possible. I may write it one day, but you really need to look at your data model.

The Query

db.collection('feed')
.where('followers', 'array-contains', userId)
.orderBy('createdAt', 'desc');

You then need to use rxjs functions to join the user document and the post document on the frontend with either:

A custom pipe function, like the one from Fireship.io
Saving the reference docs instead of the docIds and using my expandRef functions
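The join itself is simple once the documents are loaded. The sketch below uses plain JavaScript rather than rxjs, and it is neither the Fireship pipe nor my expandRef functions: joinFeed is a hypothetical helper that assumes the post and user documents were already fetched into Maps keyed by ID.

```javascript
// feedDocs: results of the feed query, each with postId and userId.
// postsById / usersById: Maps of already-fetched documents keyed by ID.
// Rows whose post or user is missing are dropped; order is preserved,
// so the createdAt ordering from the query carries over.
function joinFeed(feedDocs, postsById, usersById) {
  return feedDocs
    .map((feedDoc) => ({
      createdAt: feedDoc.createdAt,
      post: postsById.get(feedDoc.postId),
      user: usersById.get(feedDoc.userId),
    }))
    .filter((row) => row.post && row.user);
}
```

With rxjs the same mapping would sit inside a pipe that first resolves the post and user documents for each feed entry.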

Final Thoughts

So what do you do if you MUST use Firestore for a follower feed? It depends on what you want to and can limit.

LIMITS

Fireship Model - Limit to only each user's last few posts (5-20ish)
Version 2 - Limit the number of tweets altogether, so not scalable at all
Version 3 - Limit the number of people a user can subscribe to but not subscribers (my favorite)
This Version - Limit the complexity of following / unfollowing, but overall scalable
Use Extra Database - No limits, just not technically using Firestore directly (Redis Graph, for example). This is the safest and best option if you are building a serious app.

I believe Firestore is made to scale, but not made for complex apps. This is a complex app. So, with a little help from one of the options above, you can still stick with Firestore.

What would I honestly do?

I would use Version 3 for a simple app that is scalable and limit the number of people you can follow. I would not use this version personally, but again, it depends on your needs. Redis Graph is a good option too.

What am I honestly doing?

I am not using Firestore for any serious app. For play apps, yes; for serious ones, no, for the reasons given here. I love Firestore as a hobby, but IMHO that is all it is.

I am about to switch all my production level apps to another database platform... that subject is TO BE CONTINUED IN A FUTURE POST...

Let me know about any ideas I may have missed, as I tried to include every possible detail in this post.