We've been working on covidfyi.in for almost two months now, and the stack has already gone through significant changes.
We're a platform where the data for all the government-released resources (hospitals, helplines, doctors, and so on) is aggregated by our dedicated data team and presented on our frontend in an easy-to-find manner.
We started with a backend built with Django and PostgreSQL, hosted on Heroku.
But soon enough, we started running into limitations.
First, a Heroku dyno sleeps after 30 minutes of inactivity, so the first request after the server has gone to sleep takes too long to respond, upwards of 10 seconds.
The second limitation is more niche. Our data team uses a Google Sheet to manage and collate the data; they felt a form-based admin dashboard would be too restrictive. So whenever the data team updated the Google Sheet, we in the tech team downloaded the CSV and manually updated the database by running a pre-defined utility command. This was limiting in another sense too: whenever the data team wanted to add a new column, we had to manually add it to the database schema in our code, which was too tedious in my opinion.
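To give you an idea of the old workflow, the import step looked roughly like this. This is just an illustrative sketch, not our exact utility command; the column names are made up, and in the real command each parsed row was then upserted into the Django models:

```python
import csv
import io

def rows_from_csv(text):
    """Parse the sheet's CSV export into a list of dicts,
    stripping stray whitespace from headers and cells."""
    return [
        {k.strip(): (v or "").strip() for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]
```

Every time the sheet changed, someone had to download the CSV, run the command, and hope no new column had appeared in the meantime.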
We searched, but we couldn't figure out how to add new columns to our schema at runtime; at least within the scope of an RDBMS, that research wasn't fruitful.
Now, how did we overcome these limitations? Keep reading to find out :)
The first step was to shift from Django to Flask. Flask is lightweight while Django is an elephant, and we didn't want a huge framework standing in the way of what we wanted to accomplish.
In the second step, we shifted from PostgreSQL to MongoDB. This is the change that brought the MOST return on investment. Why? It simplified our workflow at least a hundredfold.
What we did is simple, and it makes me wonder how we didn't think of it in the first place.
I wrote a script that extracts the JSON from our Google Sheet, sanitizes the response, and inserts it into our MongoDB database as is; this script runs every 12 hours to update our DB.
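The core of that script looks roughly like this. It's a minimal sketch, not the production code: the sheet-fetching helper, the connection string, and the collection names are illustrative assumptions. The key point is that each row goes in as a whole document, so the sheet itself defines the schema:

```python
def sanitize(record):
    """Strip whitespace and drop empty cells so sheet noise stays
    out of the DB. No fixed schema: whatever columns the sheet
    has, the document gets."""
    return {
        k.strip(): v.strip()
        for k, v in record.items()
        if isinstance(v, str) and v.strip()
    }

def sync(rows, collection):
    """Upsert each sheet row as a document keyed by its row index."""
    for i, row in enumerate(rows):
        collection.replace_one({"_id": i}, {"_id": i, **sanitize(row)}, upsert=True)

if __name__ == "__main__":
    from pymongo import MongoClient
    rows = fetch_sheet_rows()  # hypothetical helper that pulls the sheet's JSON export
    sync(rows, MongoClient().covidfyi.resources)
    # scheduled via cron, e.g.: 0 */12 * * *  /usr/bin/python3 sync_sheet.py
```

When the data team adds a new column, the next sync just picks it up; no migration, no code change.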
This does two things: it simplifies the workflow and, at the same time, allows for a dynamic schema.
Thirdly, we shifted from Heroku to a VM on GCP, which eliminated most of our performance issues.
And did I mention we have a caching layer set up using Redis? It caches most of our API responses, making them super quick.
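The caching idea is roughly the following. The decorator name and key scheme here are my illustration rather than our exact production code; in the real app, `client` would be a `redis.Redis()` instance, but anything exposing `get`/`setex` works:

```python
import functools
import json

def cached(client, ttl=300):
    """Serve a handler's JSON-serializable result from the cache when
    present; otherwise compute it, store it for `ttl` seconds, return it."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            key = f"{fn.__name__}:{args}"
            hit = client.get(key)
            if hit is not None:
                return json.loads(hit)  # cache hit: skip the DB entirely
            result = fn(*args)
            client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator
```

Each Flask view that reads from MongoDB gets wrapped this way, so repeat requests within the TTL never touch the database.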
In short, our current architecture is Flask + MongoDB + Redis + GCP.
If you've got any suggestions for how we could improve our architecture, please drop a comment.