TL;DR
If you've been reading along you'll know I'm building a widget to provide some fun interactions in blogging posts to support the 4C community.
In this article I cover building out the data model in Firebase using Firestore. I'll touch on the security rules and the use of Cloud Functions to create an API.
Motivation
I'm describing the process of building the interactive widget below. Vote and see how other people are thinking about serverless:
Requirements
Our widget requires the following:
- A user can create an account as a content creator
- With an account a user can provide a display name, an avatar, an HTML personal biography and a URL for their profile site
- With an account a user can create an "article" or a "comment"
- An article allows the user to specify the URL of one of their posts and have that tracked and enhanced by the widget. Articles will be recommended on other instances of the widget
- A comment allows the user to create a unique configuration of the widget that they can embed in comments or other parts of a post
- Comments and articles allow the content creator to configure the widgets to be shown
- When a widget is shown the system will track the number of views and unique visiting users for that configuration
- Widgets are able to provide the reader with achievements and points for interacting with the content
- Widgets may provide additional responsive and interactive capabilities that are used by plugin developers to create great experiences. For instance performing polls or providing quizzes. A robust and secure method of handling these responses will be provided by the widget framework.
Architecture
I decided to build the widget backend framework using only Firebase: Firebase Authentication, Firestore as a database, Firebase Storage, and Firebase Functions to provide an API.
I host the widget using Firebase Hosting.
Firebase Authentication
All users of the widget are signed in, but unless you are a content creator this is an anonymous login. It's used to track your points and the answers you provide in response to the plugins creating the widget experience.
Content creators sign in using Email, GitHub or Google to create an account that is allowed to access the admin area of the website. These users can create configurations of the widget to fit with the content they are creating.
Firestore
All of the data is stored in Firestore; a description of the structure, security and collections follows below. Firestore is easy to use but can rapidly become costly as you pay for each read of data. My usage has exceeded the free quota of 50k reads per day on most days I've published content using the widget. I'll go into further detail about how I've addressed this as best I could.
It's important to note that Firestore does not have any built-in aggregation queries, which is pretty limiting for a widget that wants to perform reporting. Aggregations mostly have to be created by updating counters as the data is written; reading volumes of data for reporting would become very expensive, very quickly.
Firebase Functions
The Functions feature of Firebase allows you to create an API and also to create "triggers" that perform operations as data is updated. I've used both of these techniques to create the widget.
Firebase Storage
I don't need to store much, but I do allow users to upload an avatar and I store this in Firebase Storage (in a file keyed by their user id). That's all.
Firebase Hosting
The widget framework is built as a React app and deployed to Firebase Hosting, which serves it for both the admin and the runtime interfaces. There's not much to say here except that I've used the rewrite rules to ensure that it works well as a SPA, by rewriting every sub path to serve index.html.
// firebase.json
{
  ...
  "hosting": {
    "public": "build",
    "ignore": [
      "firebase.json",
      "**/.*",
      "**/node_modules/**"
    ],
    "rewrites": [
      {
        "source": "**",
        "destination": "/index.html"
      }
    ]
  }
}
Data Model
To support the requirements I came up with this data model:
User Writable Collections
At the core of this model are the collections that a content creator can write to: userarticles and userprofiles.
All of the other collections require a logged in user (anonymous is fine) and are read only.
IDs
There are only 3 ID types used in the collections. The articleId is generated by nanoid whenever a new article is added, the user.uid comes from Firebase Auth and the tag is a text string. There are some special tags that start with __ but otherwise they come from the user specification.
Users
The user record generated by Firebase is also used to populate a record of my own in the userprofiles collection. The data for displayName, photoURL and email are copied across every time that they change.
In addition, entries in this collection include a description for the biography and a profileURL to optionally contain somewhere to link to if the user's avatar is clicked when it is shown in the widget.
Articles
A user can create articles. Comments are articles with a comment field set to true.
The user can only create, update and delete articles inside their own userarticles sub collection of articles.
When a userarticles/article is saved a Firebase Function Trigger copies the record to the main articles table. For security purposes it is possible for a system admin to ban an article in the main articles collection and the function ensures that this cannot be overwritten by the user. In addition, when a user deletes an article it is not deleted in the main collection, but the enabled flag is set to false.
An article comprises some meta information about the original post (if it isn't a comment) so that this may be used to recommend the article when other users display the widget.
We'll look in detail at the trigger in a moment as it:
- sanitizes all HTML content
- creates other entries in the "counts" and "responses" collections and keeps core fields in these up to date.
Article Response information
When I first put together the data model I had the "count" information and the "responses" in a single collection, however, this proved costly as it caused all currently running instances of the widget to redraw whenever anyone viewed an article.
What I want to happen is that, when you are viewing the result of a poll, your screen updates immediately if another user votes. There is no point doing this update, though, if another user only saw the poll and didn't interact yet. By separating out the "counts" and the "responses" I was able to significantly reduce the number of reads and so the cost of the system.
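To make the saving concrete, here is a small in-memory simulation. It is not Firestore: the FakeDoc class and its subscribe and update methods are made up for illustration, and each listener delivery simply stands in for one billed read. The numbers (100 open widgets, 50 views, 5 votes) are arbitrary.

```javascript
// In-memory stand-in for a document with real-time listeners.
// FakeDoc, subscribe and update are hypothetical names, not Firestore APIs.
class FakeDoc {
  constructor() {
    this.data = {}
    this.listeners = []
    this.deliveries = 0 // each delivery stands in for one billed read
  }
  subscribe(listener) {
    this.listeners.push(listener)
  }
  update(patch) {
    this.data = { ...this.data, ...patch }
    for (const listener of this.listeners) {
      this.deliveries++
      listener(this.data)
    }
  }
}

// Simulate `widgets` open widgets watching a poll while `views` page views
// and `votes` votes arrive. Returns the number of snapshot deliveries.
function simulate({ widgets, views, votes, split }) {
  const responses = new FakeDoc()
  // With the split, view counters live in their own document that the
  // running widgets do not listen to.
  const counts = split ? new FakeDoc() : responses
  for (let i = 0; i < widgets; i++) responses.subscribe(() => {})
  for (let i = 0; i < views; i++) counts.update({ visits: i + 1 })
  for (let i = 0; i < votes; i++) responses.update({ votes: i + 1 })
  return responses.deliveries
}

const combined = simulate({ widgets: 100, views: 50, votes: 5, split: false })
const separated = simulate({ widgets: 100, views: 50, votes: 5, split: true })
console.log({ combined, separated }) // { combined: 5500, separated: 500 }
```

In this toy setup only the votes reach the open widgets once the records are split, which is the effect described above.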
Firebase has the excellent onSnapshot function to notify you of table writes in real time. This provides for an exciting score update animation as you interact and the pleasure of watching the results of a poll change as others vote. onSnapshot works with individual records and collections.
Below you can see the various tables that track interactions with an article. The clouds show the Functions API calls that are writing to these tables:
Counts
Counts contains a list of all of the unique visitor ids and uses this to track a unique visitor count in addition to a total number of views.
Counts also contains a copy of the responseCount so that it can be reported to the content creator by reading a single record.
The trick to saving reads in Firebase is to synchronise data so that you can read it all back in one go.
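As a rough sketch of that bookkeeping, the function below updates one counts record for a single view. The visits and uniqueVisits fields appear in the trigger's defaults later in this post; visitorIds is an assumed name for the list of unique visitor ids, and the real API function may well be structured differently.

```javascript
// Sketch of the bookkeeping for one counts record. `visits` and
// `uniqueVisits` match the trigger's defaults; `visitorIds` is an assumed
// field name for the list of unique visitor ids.
function recordVisit(counts, visitorId) {
  const visitorIds = counts.visitorIds || []
  const isNew = !visitorIds.includes(visitorId)
  return {
    ...counts,
    visits: (counts.visits || 0) + 1,
    uniqueVisits: (counts.uniqueVisits || 0) + (isNew ? 1 : 0),
    visitorIds: isNew ? [...visitorIds, visitorId] : visitorIds
  }
}

let record = { visits: 0, uniqueVisits: 0 }
for (const uid of ["anna", "ben", "anna"]) {
  record = recordVisit(record, uid)
}
console.log(record.visits, record.uniqueVisits) // 3 2
```

Because the totals are kept up to date as each view is written, a report only has to read this one record. Against Firestore the same update would presumably run in a transaction or use FieldValue.increment and FieldValue.arrayUnion so that concurrent views don't overwrite each other.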
Responses
The contents of the responses in the responses collection are down to the author of the plugin. Only interactive plugins like polls and quizzes need to use these features. The responses collection has a number of API calls that ensure the responses of individual users are kept separate, providing a very robust way to interact.
Plugin authors use this data to render their user interfaces and update it using the respond and respondUnique methods.
Tags
The tags table is a collection of counters. They are used to track the popularity of tags associated with articles and comments and to track other things like the total number of views for all 4C content managed by the widget.
Firestore has some pretty heavy limits on concurrency and write speed (1 update per second to a single record). For this reason fast-moving counters end up being 'sharded' across a number of entries. In the case of the widget, we shard total views into 20 separate keys and then add up the values in all 20 to get the total. A shard in this case is just a tag name with a random number between 0 and 19 added to the end of it.
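A minimal sketch of that sharding scheme, using a plain Map in place of the tags collection. The function names and the exact key format (tag name followed directly by the shard number) are assumptions for illustration.

```javascript
const SHARD_COUNT = 20

// Pick one of the shard keys for a tag at random, e.g. "__views7".
// The exact key format is an assumption.
function shardKey(tag, random = Math.random) {
  return `${tag}${Math.floor(random() * SHARD_COUNT)}`
}

// `store` is a plain Map standing in for the tags collection.
function incrementTag(store, tag, random = Math.random) {
  const key = shardKey(tag, random)
  store.set(key, (store.get(key) || 0) + 1)
}

// Reading the total means adding up every shard.
function totalForTag(store, tag) {
  let total = 0
  for (let i = 0; i < SHARD_COUNT; i++) {
    total += store.get(`${tag}${i}`) || 0
  }
  return total
}

const tags = new Map()
for (let i = 0; i < 1000; i++) incrementTag(tags, "__views")
console.log(totalForTag(tags, "__views")) // 1000
```

Each write touches only one of the 20 entries, so the per-record write limit is spread across all of them, at the price of 20 reads to get a total. With a key format this simple, a tag whose name ends in a digit could collide with another tag's shard, so a separator between name and number may be worth considering.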
User Scores
The only other collection contains a score for the user. It also contains a list of the achievements they have earned.
Scores are automatically awarded for viewing and interacting with content. A plugin author may also add additional items based on their design - for instance quizzes award points for correct answers.
Enforcing Security
A number of methods are used to enforce security in the app. An integration of App Check and reCAPTCHA v3 attempts to stop illegitimate calls to the API functions, and the Firestore access rules stop a malicious user writing things that they shouldn't.
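Firestore rules are not applied in sequence: a request is allowed if any matching allow statement evaluates to true. The final catch-all rule therefore never grants anything, and every path not explicitly allowed above it stays denied: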
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /responses/{document=**} {
      allow read: if request.auth != null;
      allow write: if false;
    }
    match /counts/{document=**} {
      allow read: if request.auth != null;
      allow write: if false;
    }
    match /tags/{document=**} {
      allow read: if request.auth != null;
      allow write: if false;
    }
    match /articles/{document=**} {
      allow read: if request.auth != null;
      allow write: if false;
    }
    match /userarticles/{userId}/{document=**} {
      allow read: if request.auth != null;
      allow update, delete: if request.auth != null && request.auth.uid == userId;
      allow create: if request.auth != null && request.auth.uid == userId;
    }
    match /scores/{userId} {
      allow read: if request.auth != null;
      allow write: if false;
    }
    match /userprofiles/{userId} {
      allow read: if request.auth != null;
      allow update, delete: if request.auth != null && request.auth.uid == userId;
      allow create: if request.auth != null;
    }
    match /{document=**} {
      allow read, write: if false;
    }
  }
}
Cloud Functions do not have these rules applied and hence they can be used to write to the read only tables.
Triggers
The source code (which is available on GitHub) applies a number of trigger functions, but the most interesting one is the creation or update of an article. The Firestore Function onWrite is a catch all for create, update and delete:
exports.createArticle = functions.firestore
  .document("userarticles/{userId}/articles/{articleId}")
  .onWrite(async (change, context) => {
Here we say we want to run this function every time a user writes an article.
    if (!change.after.exists) {
      const id = change.before.data().uid
      await db
        .collection("responses")
        .doc(id)
        .set({ enabled: false }, { merge: true })
      await db
        .collection("counts")
        .doc(id)
        .set({ enabled: false }, { merge: true })
      return
    }
If the after snapshot does not exist the record has been deleted, so we mark the matching entries in both the responses and the counts collections as disabled.
    const data = change.after.data()
    sanitizeAll(data)
    data.comment = data.comment || false
    delete data.banned
    await change.after.ref.set(data)
Here we are sanitizing the HTML and setting the comment flag (null is not good enough as a false for Firestore queries, it must be explicit). We also don't allow the incoming record to change the banned property of the master article.
The last line above writes the data back into the user's copy of the record.
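The body of sanitizeAll isn't shown in this post. As a guess at its general shape, the sketch below walks a record in place and cleans every string it finds. The escapeHtml function is only a stand-in: it escapes all markup, whereas a real sanitizer for the HTML biography would keep an allow-list of safe tags.

```javascript
// Escape the characters that let text break out into markup. This is a
// stand-in: a real sanitizer would keep an allow-list of safe tags.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
}

// Walk a record in place and clean every string it contains, however deeply
// nested. The name mirrors the post's helper, but this body is a guess.
function sanitizeAll(value, clean = escapeHtml) {
  for (const key of Object.keys(value)) {
    const item = value[key]
    if (typeof item === "string") {
      value[key] = clean(item)
    } else if (item && typeof item === "object") {
      sanitizeAll(item, clean)
    }
  }
  return value
}

const article = {
  title: "Hello <script>alert(1)</script>",
  tags: ["a", "<b>"],
  meta: { enabled: true, note: 'say "hi"' }
}
sanitizeAll(article)
console.log(article.title) // Hello &lt;script&gt;alert(1)&lt;/script&gt;
```

Passing the cleaning function as a parameter means the same walker could be paired with a proper HTML sanitizing library.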
    await db
      .collection("articles")
      .doc(data.uid)
      .set(data, { merge: true })
This is now writing the master article record.
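The reason a ban survives this write is the combination of delete data.banned and { merge: true }: a merged write leaves alone any field the incoming data omits. The sketch below imitates that with plain objects. mergeSet is a shallow stand-in for Firestore's merge and writeMaster is a made-up name.

```javascript
// Shallow stand-in for Firestore's set(data, { merge: true }): fields in the
// incoming data overwrite the stored ones, fields it omits are left alone.
function mergeSet(stored, incoming) {
  return { ...stored, ...incoming }
}

// Mirrors the trigger: drop any `banned` value sent by the user, then merge.
function writeMaster(master, userCopy) {
  const data = { ...userCopy }
  delete data.banned
  return mergeSet(master, data)
}

const master = { uid: "abc123", title: "Old title", banned: true }
const fromUser = { uid: "abc123", title: "New title", banned: false }
const result = writeMaster(master, fromUser)
console.log(result) // { uid: 'abc123', title: 'New title', banned: true }
```

Even though the user's copy tried to set banned to false, the master record keeps the admin's value while still picking up the new title.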
Next we set up the response and count records, or update them if they already exist:
    const responseRef = db.collection("responses").doc(data.uid)
    const responseSnap = await responseRef.get()
    if (responseSnap.exists) {
      await responseRef.set(
        {
          processedTags: data.processedTags || [],
          author: data.author,
          enabled: data.enabled,
          comment: data.comment || false
        },
        { merge: true }
      )
    } else {
      await responseRef.set({
        types: [],
        enabled: data.enabled,
        created: Date.now(),
        author: data.author,
        comment: data.comment || false,
        responses: {},
        processedTags: data.processedTags || []
      })
    }
    const countRef = db.collection("counts").doc(data.uid)
    const countSnap = await countRef.get()
    if (countSnap.exists) {
      await countRef.set(
        {
          processedTags: data.processedTags || [],
          author: data.author,
          enabled: data.enabled,
          comment: data.comment || false
        },
        { merge: true }
      )
    } else {
      await countRef.set({
        enabled: data.enabled,
        created: Date.now(),
        author: data.author,
        visits: 0,
        comment: data.comment || false,
        uniqueVisits: 0,
        lastUniqueVisit: 0,
        lastUniqueDay: 0,
        recommends: 0,
        clicks: 0,
        processedTags: data.processedTags || []
      })
    }
  })
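The responses and counts halves of the trigger follow the same pattern: merge the shared fields if the record exists, otherwise create it with defaults. One possible way to factor that out is sketched below. The upsert helper is hypothetical and is not in the widget's source, and fakeRef is an in-memory stand-in so the sketch runs without Firebase.

```javascript
// Update `shared` fields if the document exists, otherwise create it with
// `defaults` plus the shared fields. `ref` only needs get() and set(), the
// two DocumentReference methods used in the trigger.
async function upsert(ref, shared, defaults) {
  const snap = await ref.get()
  if (snap.exists) {
    await ref.set(shared, { merge: true })
  } else {
    await ref.set({ ...defaults, ...shared })
  }
}

// Minimal in-memory reference so the sketch runs without Firebase.
function fakeRef() {
  let stored
  return {
    get: async () => ({ exists: stored !== undefined, data: () => stored }),
    set: async (data, options = {}) => {
      stored = options.merge ? { ...stored, ...data } : data
    }
  }
}

async function demo() {
  const countRef = fakeRef()
  const shared = { author: "someone", enabled: true, comment: false, processedTags: [] }
  const defaults = { visits: 0, uniqueVisits: 0, created: Date.now() }
  await upsert(countRef, shared, defaults) // first save creates the record
  await countRef.set({ visits: 7 }, { merge: true }) // views recorded elsewhere
  await upsert(countRef, { ...shared, enabled: false }, defaults) // later save
  return (await countRef.get()).data()
}

demo().then((doc) => console.log(doc.visits, doc.enabled)) // 7 false
```

The second save only touches the shared fields, so the visit count accumulated in between is preserved, which is the behaviour the two branches in the trigger are there to guarantee.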
Conclusion
Firebase turned out to be flexible enough to build the widget, but it is very limited on reporting and has to be carefully watched to avoid costs associated with reading lots of data. The article "recommendation" will feature next time, but this was a serious cause of read usage.
Top comments (5)
Nice work, the only thing I would change is the backend service. Using Functions is not the cheapest way to host a serverless backend. You should use Cloud Run, it can handle multiple simultaneous requests, which Functions can't.
Ah interesting, thanks for the info! I think Cloud Functions scale to multiple "servers" when there are many calls and that each instance runs only one call at a time. I'll check out Cloud Run :)
Exactly, Cloud Functions can only handle one request at a time. If one instance gets 2 simultaneous requests then another instance will be created, and you will be charged double. With Cloud Run you can handle up to 250 simultaneous requests in a single instance, and it scales nicely as well. It also has a free tier that you can take more advantage of because of this capacity for handling multiple requests. Also I forgot to tell you: you should use the GCP API Gateway service as middleware between your frontend app and your backend (Cloud Run or Cloud Functions, it doesn't matter) to protect against unauthenticated calls and calls to your backend's undefined paths. It also has a good free tier.
That sounds good. So I'm currently using App Check with reCAPTCHA v3 and protecting each individual call, but this sounds like a good way to go.
And here is an embed in a comment :)