Migrating a Monolithic SaaS App to Serverless — A Decision Journal (6 Part Series)
If you've read any of my articles, you probably know I’m a big advocate of building greenfield apps on serverless. There are countless tutorials and case studies of people who have done this, but there aren't many stories out there of legacy production apps (with real paying users) being migrated to serverless, along with all the challenges this entails.
To help address this, I’ve decided to publicly document my journey of migrating a production app from a server-based architecture to serverless on AWS.
Here's what I hope you’ll get from this series:
- A warts-and-all account of converting a monolithic server-based system to a serverless microservices architecture.
- A deeper understanding of key services in the AWS serverless suite, and their strengths and weaknesses.
- An opportunity to ask questions or make suggestions on approaches to solve a particular problem.
- Insights into my decision reasoning, my SaaS business and some figures.
- Links to resources that helped me understand or implement a part of the migration.
- Lots of code samples.
Firstly, some personal background. Though my main job is as a full-stack developer/consultant, I’ve also been running my own bootstrapped SaaS product for over 5 years now. Autochart is a website visitor analytics/lead management app for automotive dealer websites. Its customer base has grown slowly to the point where it now has hundreds of users and provides a significant portion of my income.
Autochart has gone through a few architectural iterations over the years and I have introduced a few serverless microservices around the edges over the last 18 months (using API Gateway, Lambda and Kinesis). However, downstream of these microservices the core of the system is still a web portal/API built as a monolithic Express.js app running on containers in AWS ECS, with an mLab MongoDB database.
Migrating a stable production system to a new architecture is not something you should do lightly. Migrating to serverless in particular will almost certainly involve a major rewrite of your existing codebase, unlike, say, migrating a traditional server-based app to run inside containers, where the changes are generally limited to the infrastructure level. You should do your own cost-benefit analysis before proceeding.
My main motivations for doing this are:
- to reduce costs and resources (mLab is a great service but expensive).
- to be able to deploy new features independently with less risk. The current codebase has accrued some tech debt which makes it difficult to make changes in certain areas.
- to reduce/remove intermittent errors when customer websites get large traffic bursts.
- to reduce security liability by cutting usage of scores of third-party libraries (which an Express.js monolith requires).
- to stop spending time patching EC2 instances.
- to make it easier to move from the N. Virginia AWS region to Ireland, as it’s closer to my customer base and also reduces my GDPR liability.
- to build a deeper real-world expertise in AWS serverless technologies that I can apply to future products of mine or my clients.
I will be working on this migration just a few hours each week over the next several months. I have no hard deadline to meet. There are a few constraints however:
- Must be seamless to customers: no downtime or degradation of service. I care about my customers; they provide me with an income and I don’t want to betray their trust and support. It’s my own skin in the game here, so I need to keep risks low.
- Migrate in chunks, avoiding large big-bang cutovers as far as possible. This means less stress for me worrying about something breaking. It also allows me to bank small wins as I go along and to easily pause work if something more important comes up (or if I change my mind altogether on the benefits outweighing the investment of my time).
- The REST API contract cannot change as it's used by the front-end SPA (which I don't want to touch) and also by a few inbound integrations from third parties.
- Ensure rollbacks are possible if something does go wrong.
- mLab were recently acquired by MongoDB Inc, so I’ll be forced to migrate my database to their Atlas service within the next 8 or so months.
At the moment I only have the above high-level goals and constraints in mind. I haven’t yet done any detailed migration planning and there are still many unanswered questions and decisions I need to make.
I will be using this series of posts to discuss these with you before I make and act on them. To give you a taster, here are some questions I expect I’ll need to answer:
- What should I replace MongoDB with as my main database? Probably DynamoDB, but what alternatives should I consider?
- How will I map and migrate my existing data from MongoDB into DynamoDB/alternative?
- Will I need to synchronise data changes to both MongoDB and DynamoDB during a cutover period where both are still in use?
- What service boundaries can I identify within my monolith API that I can separate out into microservices?
- Should I use a mono-repo or separate repos for each microservice?
- How can I reroute single API requests at a time to my new APIGW/Lambda functions (e.g. using a strangler pattern)?
- How will I test the new APIGW+Lambda API endpoints?
- How should I move authentication and authorisation from my existing PassportJS implementation inside the Express app to API Gateway? Should I use Cognito, a custom authoriser or something else?
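For the strangler-pattern question, the core idea is a routing layer in front of both backends that sends each API route either to the new code or to the monolith. On AWS this might be API Gateway with explicit routes for migrated endpoints plus a greedy `{proxy+}` catch-all forwarding everything else to the existing load balancer, but that is an assumption, not a decision. The TypeScript below is a framework-free sketch of the routing decision only; the route table, the `Backend` labels and `resolveBackend` are hypothetical names, not part of Autochart or any AWS API.

```typescript
// Hypothetical labels for the two places a request can be served from.
type Backend = "legacy-express" | "apigw-lambda";

interface RouteRule {
  method: string;     // upper-case HTTP method, e.g. "GET"
  pathPrefix: string; // path (without query string) the rule covers
  backend: Backend;
}

// Illustrative routing table: these paths are made up for the example.
const rules: RouteRule[] = [
  { method: "GET", pathPrefix: "/api/leads", backend: "apigw-lambda" },
  // A more specific rule can keep one sub-route on the monolith for now.
  { method: "GET", pathPrefix: "/api/leads/export", backend: "legacy-express" },
];

// Decide which backend serves a request. Anything without a matching rule
// falls through to the monolith, so routes can move over one at a time.
function resolveBackend(method: string, url: string, table: RouteRule[]): Backend {
  const path = url.split("?")[0]; // ignore the query string
  let best: RouteRule | undefined = undefined;
  for (const rule of table) {
    // Match the prefix exactly or on a path-segment boundary ("/api/leads/..."),
    // so "/api/leadsreport" does not match "/api/leads".
    const pathMatches =
      path === rule.pathPrefix || path.indexOf(rule.pathPrefix + "/") === 0;
    if (rule.method === method.toUpperCase() && pathMatches) {
      // Longest matching prefix wins, so specific rules override general ones.
      if (!best || rule.pathPrefix.length > best.pathPrefix.length) {
        best = rule;
      }
    }
  }
  return best ? best.backend : "legacy-express";
}

console.log(resolveBackend("GET", "/api/leads/123?page=2", rules)); // apigw-lambda
console.log(resolveBackend("GET", "/api/leads/export", rules));     // legacy-express
console.log(resolveBackend("POST", "/api/leads", rules));           // legacy-express
```

Because unmatched requests default to the monolith, rolling a single route back is a matter of deleting one rule, which fits the rollback constraint listed earlier.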
In software design (and possibly also in life), I prefer deferring a big decision until a time at which I can’t proceed on a critical path without making it. So although I have thoughts and leanings on all of the questions listed above, I haven’t yet made a definite decision and am not going to do so just yet.
The first question I need to make a decision on is:
Do I start with the MongoDB to DynamoDB migration or with the monolith Express to APIGW+Lambda code rewrite?
Either choice would take me down a very different route so I need to think this through. Let’s dive into the pros (✅) and cons (❌) of both…
Option 1: Start with the MongoDB to DynamoDB migration
- ✅ MongoDB is the main bottleneck whenever the system is under load.
- ✅ mLab is my single biggest cloud bill item, so the sooner it’s removed, the sooner I can realise these cost savings.
- ✅ I have a deadline of roughly 8 months to migrate from mLab to Atlas. I can avoid this altogether if I have MongoDB out of the picture by then.
- ❌ DynamoDB’s indexing and query model is quite different to MongoDB’s, so it would require a lot of analysis to determine the best design.
- ❌ Managing the cutover period, when two databases are running side by side, is complex and risks getting data out of sync.
- ❌ I need to make across-the-board updates to the legacy monolith codebase in order to replace Mongoose (a Node.js MongoDB data mapper library) calls with DynamoDB API calls. This code may need to be refactored again when it’s moved to Lambda functions.
- ❌ There are a lot of steps (and risks) involved in getting the first piece of DynamoDB code into production, e.g. designing the target schema and indexes, writing a script to migrate the data, coming up with a side-by-side running plan and updating the app code to change an API endpoint to use the new db.
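To make the query-model con concrete: in MongoDB I can filter and sort on any indexed field, whereas a DynamoDB `Query` must supply an exact partition key and can only range over the sort key, so each access pattern has to be designed into the keys up front. The sketch below is purely illustrative and assumes a hypothetical `Lead` collection and a single-table design with generic `PK`/`SK` attributes; none of these names come from Autochart's real schema, and I haven't settled on this (or any) design. The query parameters are shaped for the DocumentClient style of the AWS SDK for JavaScript, which takes plain values rather than typed attribute values.

```typescript
// Hypothetical MongoDB document, as a Mongoose query might return it.
interface LeadDoc {
  _id: string;
  dealerId: string;
  createdAt: Date;
  email: string;
}

// Hypothetical single-table DynamoDB item using generic PK/SK key attributes.
interface LeadItem {
  PK: string;
  SK: string;
  entityType: "Lead";
  leadId: string;
  dealerId: string;
  createdAt: string; // ISO-8601, which sorts correctly as a string
  email: string;
}

// Map one document to an item: partition by dealer, sort by creation time.
// Appending _id keeps the sort key unique if two leads share a timestamp.
function toLeadItem(doc: LeadDoc): LeadItem {
  const createdAt = doc.createdAt.toISOString();
  return {
    PK: `DEALER#${doc.dealerId}`,
    SK: `LEAD#${createdAt}#${doc._id}`,
    entityType: "Lead",
    leadId: doc._id,
    dealerId: doc.dealerId,
    createdAt,
    email: doc.email,
  };
}

interface LeadQueryParams {
  TableName: string;
  KeyConditionExpression: string;
  ExpressionAttributeValues: Record<string, string>;
  ScanIndexForward: boolean;
}

// Rough equivalent of the (hypothetical) Mongoose call
//   Lead.find({ dealerId, createdAt: { $gte: from } }).sort({ createdAt: -1 })
// expressed as DynamoDB Query parameters against the keys above.
function leadsSinceQuery(tableName: string, dealerId: string, from: Date): LeadQueryParams {
  return {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk AND SK BETWEEN :from AND :upper",
    ExpressionAttributeValues: {
      ":pk": `DEALER#${dealerId}`,
      ":from": `LEAD#${from.toISOString()}`,
      // "$" sorts immediately after "#", so this bound stops after the last
      // LEAD# key and excludes other item types stored in the same partition.
      ":upper": "LEAD$",
    },
    ScanIndexForward: false, // newest first, like sort({ createdAt: -1 })
  };
}
```

Every additional access pattern (say, leads by email address) would need its own key design or a global secondary index, which is exactly the up-front analysis this con refers to.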
Option 2: Start with the Express to API Gateway+Lambda code rewrite
- ✅ The Express app is already almost stateless, so I have minimal concerns about inconsistent state when routing API requests over to the new code.
- ✅ I won’t have to patch the EC2 app server instances once this is complete (mLab handle this for the MongoDB replica set).
- ✅ There aren’t many steps involved in getting the first API Gateway+Lambda code into production usage, so I can get a quick win sooner.
- ✅ The Express app is running on Node.js v6, which reaches end-of-life in April 2019. The upgrade to v8 will come automatically as I incrementally move the code to Lambda.
- ❌ Given MongoDB is within a VPC, I’ll need to VPC enable my Lambda functions and put up with the performance/scaling limitations that this incurs.
- ❌ I won’t realise any significant cloud bill savings until all API endpoints have been migrated over and I can turn off the EC2 instances and load balancer. Even then, the cost of the EC2 instances is low compared to the mLab bill.
Weighing up the pros and cons of both paths, I’ve decided to go with option 2 — start with the code rewrite.
This will allow me to get code into production faster and in smaller chunks. Another reassuring factor for taking this path is that it’s similar to the path that AWS Serverless Hero Yan Cui took on Yubl’s road to Serverless architecture. I’m hoping to use many of the techniques I learned from taking Yan’s excellent Production-Ready Serverless course as part of this migration.
Before I get started on the migration proper, I’m going to set up two new AWS accounts (dev/staging and production) for the resources I’ll be creating as part of the migration. Currently my staging and production environments are in a single account, along with a few unrelated static websites (don’t do this at home, kids). However, I want to get my house in order and isolate these going forward, so I’ll use AWS Organizations to help structure my accounts.
After this, I’ll look at identifying service boundaries within the monolith API with a view to coming up with a sequence of microservices that I can extract one by one in a logical order.
In my next post, I’ll share the findings of these investigations with you, along with more information on the “as-is” architecture.
Do you have questions or suggestions, or disagree with any of my reasoning?
Can you see something obvious that I'm missing? If so, great! That's why I'm doing this 🙂. Please tell me in a comment below.
✉️ If you'd like to get future updates in this series as soon as they're ready, you can subscribe here.
Originally published at winterwindsoftware.com.