How is large-scale zero-downtime migration achieved?
Ankush Thakur Aug 9
My recent post seeking help on SPA UX! was very well received on dev.to and served to clear my doubts and calm my fears. Since then, I've come to realize just how useful a platform like dev.to can be, where you are free to ask 'softer' questions and not get, ahem, downvoted.
So here's something I, as a mid-level dev, worry a lot about: zero-downtime migrations. Imagine you're running a fairly busy e-commerce store. The front-end is a single-page application, the back-end is a REST API with WebSockets and the usual bells and whistles, and, just to make it concrete, the database is MySQL.
Now, there's been a system-wide refactor that has brought breaking changes to the front-end, the API, and the database. What would the steps be, in general, to perform a zero-downtime migration here? Is it even possible?
My biggest nightmare is dealing with two things:
- The people already connected, and new people constantly connecting, to the older version of the database, which is accruing now-wrong data (so to speak) every second.
- The requests being handled by the web server, and the WebSocket data being transported, which doesn't let up for a second. If we simply switch web servers to the new back-end, all these in-flight requests will be killed, and people who are, say, in the middle of making a transaction or submitting a long complaint will be screwed.
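To make that second worry concrete, here's a toy TypeScript model of the behavior I'd *hope* a switchover has: the old server stops accepting new work (so the load balancer sends it to the new back-end), lets in-flight requests finish, and only then gets killed. All the names here (`DrainingServer`, etc.) are made up by me for illustration; I assume real servers expose something like this (e.g. Node's `server.close()`), but that's exactly what I'm asking about.

```typescript
// Toy model of "draining" a server during a switchover:
// stop taking new work, let in-flight requests finish, only then shut down.
// Hypothetical class, purely to illustrate the question.
class DrainingServer {
  private inFlight = 0;
  private draining = false;

  // Returns false once draining: the load balancer would route
  // this request to the new back-end instead.
  accept(): boolean {
    if (this.draining) return false;
    this.inFlight++;
    return true;
  }

  // Called when an in-flight request (or WebSocket session) completes.
  finish(): void {
    if (this.inFlight > 0) this.inFlight--;
  }

  // Begin the switchover: refuse new traffic, keep serving existing traffic.
  startDrain(): void {
    this.draining = true;
  }

  // Only when this is true is it safe to kill the old process.
  isIdle(): boolean {
    return this.draining && this.inFlight === 0;
  }
}
```

Is this roughly how people avoid killing someone mid-transaction, or is there more to it?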
If it helps, please feel free to break down your answer/insights into different categories:
- Front-end migration
- API migration
- Database (schema) migration
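For the database part, the only pattern I've come across so far is "expand/contract": add the new schema alongside the old one, have the app write to both for a while, backfill the old rows, and only then switch reads and drop the old schema. Here's a toy TypeScript sketch of what I *understand* that to mean, with in-memory `Map`s standing in for MySQL tables and an invented `fullName` → `first`/`last` split as the breaking change; please correct me if I've got the idea wrong.

```typescript
// Toy "expand/contract" dual-write. During the migration window the app
// writes to BOTH the old and new representations, so either version of
// the code reads consistent data. Maps stand in for MySQL tables.
type OldUser = { fullName: string };            // old schema: one column
type NewUser = { first: string; last: string }; // new schema: split columns

const oldTable = new Map<number, OldUser>();
const newTable = new Map<number, NewUser>();

// Dual write: used while both schemas are live.
function saveUser(id: number, first: string, last: string): void {
  oldTable.set(id, { fullName: `${first} ${last}` }); // keeps old readers working
  newTable.set(id, { first, last });                  // populates the new schema
}

// Backfill: copy rows written by the old code before dual-writing began.
function backfill(): void {
  for (const [id, u] of oldTable) {
    if (!newTable.has(id)) {
      const [first, ...rest] = u.fullName.split(" ");
      newTable.set(id, { first, last: rest.join(" ") });
    }
  }
}
```

Once the backfill has run and all app servers read from the new schema, the old column could presumably be dropped — but how this interacts with the "people still connected to the old version" problem above is exactly what I'd love to hear about.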
Finally, you're welcome to point me towards other blogs and links, but please do summarize the strategy/advice in a couple of lines at least.
Thank you so much, folks! This is something I've been meaning to ask somebody for a long, long time, but there was no platform (or the people were too haughty to indulge 'noobs'). I really look forward to learning something awesome! 🙏 🙏 🙏