DEV Community

Andrei Dascalu


Refactoring - Migrating to a cloud provider

Strangely, this challenge proved to be the most straightforward bit.

The customer imposed Azure as the cloud provider, so that was that.

Our requirements for this:

  • shared (not dedicated) application infrastructure. To control costs, we don't want to dedicate infrastructure to customers. A full-pipeline production customer may have 5 base environments, each with some number of running instances. We don't want to automatically add another 32 GB VM when a customer needs one more instance while there may be unused resources on an existing one. We also don't want to manually provision smaller VMs or maintain a gazillion different VM pools.
  • easy way for developers to cough up a new environment without micromanaging routes
  • a customer will have the following environment levels: dev (automatically or manually deployed)/auto (for automated testing)/test (our acceptance and some manual testing)/accept (customer acceptance)/prod
  • each given environment could be scaled up to a number of running instances, automatically or manually

On our side, we looked at traditional deployment pipelines that would take the code and script a delivery process onto a VM belonging to the Azure equivalent of AWS auto-scaling groups (VM scale sets). Operationally, that would mean maintaining routing lists at the external load balancer level.

However, this would mean that the load balancer would have to route a given domain or path rule to a given VM (or to all VMs in a group), so we would have had to provision and configure a separate local proxy on each VM if we wanted to host multiple environments on it.

For example, our load balancer would need to route each customer environment's domain somewhere. But where? We don't know which VM a running instance may land on. We could label VMs, but then, when scaling happens, we would need to make sure an instance only carries the labels for the customers it serves. Also, we don't have different load balancers per customer.

The existing system was sort of configured like this, except that the local proxy was a single Apache instance that also handled the PHP interpretation. Multi-tenancy done properly (with shared infrastructure) would mean dedicated web servers that could be restarted individually, with the common routing done at the proxy level.

Too complicated to do manually ...

But fortunately most of us were versed in the art of containers and we managed to cook up a Dockerized development environment in a couple of days. It was a no-brainer then to decide to use Kubernetes in Azure.

The system went like this:

  • Azure AKS with nginx-ingress and a couple of static IPs (both outgoing and incoming)
  • configmaps would hold the per-customer configuration
  • a build would create and push a container to a registry
  • a daemon inside the AKS cluster itself would poll the registry and deploy new builds automatically to QA environments
  • HPAs (Horizontal Pod Autoscalers) would enable some basic autoscaling based on memory/CPU usage; later we would add more interesting rules
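Pieced together, the routing and autoscaling parts could be sketched roughly like this (all names, hostnames, and thresholds here are illustrative, not our actual manifests):

```yaml
# Illustrative nginx-ingress rule: route one customer environment's
# hostname to the Service fronting that environment's pods.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: customer1-test
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: customer1-test.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: customer1-test     # Service selecting that environment's pods
                port:
                  number: 80
---
# Illustrative HPA: scale one environment between 1 and 5 replicas on CPU usage.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: customer1-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: customer1-test
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

One Deployment/Service/Ingress/HPA set per customer environment keeps everything on shared nodes, while nginx-ingress handles the per-domain routing that made the VM approach painful.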

Changes done to the application:

  • make it stateless (this was very time-consuming): since containers are disposable, the application must not write files that are needed later (for example: file uploads) to local paths, or even to shared paths if multiple instances are expected to run when scaled up.
  • logging to stdout: AKS collects stdout/stderr from containers, so the application should not write logs to files, but directly to output. Fortunately, there's Monolog!
  • use Azure storage for customer uploads: there's a library called Flysystem which provides a filesystem abstraction, allowing seamless movement between the local filesystem (like copying from a local tmp path) and various cloud storage systems.

Developer experience:

  • a developer would need to copy/adjust a deployment/configmap/service and ultimately an ingress, usually by editing the relevant labels
  • we ended up scripting this with yq (a CLI YAML find/replace tool) and later on packaging with helm
  • much later, the configmaps were encrypted with sops and Azure Key Vault and kept in the codebase.
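With helm in the picture, the per-environment copy/adjust step could shrink to a small values file; a hypothetical sketch (chart structure, registry path, and hostname are made-up names):

```yaml
# Hypothetical helm values for one customer environment; the chart's
# templates would stamp these into the deployment/configmap/service/ingress.
customer: customer1
environment: test        # dev / auto / test / accept / prod
image:
  repository: ourregistry.azurecr.io/app   # hypothetical registry path
  tag: "1.2.3"
replicas: 2
ingress:
  host: customer1-test.example.com         # hypothetical hostname
config:                  # rendered into the per-customer configmap
  FEATURE_X: "enabled"
```

A single chart holds the templates; running `helm install` with a different values file stamps out a new environment without hand-editing labels.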

Phew! This was by far the fastest bit. Two days to make enough changes to create a local docker-compose setup, two more for the initial setup in Azure ... but quite some time to make the application stateless. Uploads were a fairly quick thing to do, but for some time afterwards we would keep discovering unexpected places where the application relied on locally produced files. Of course, frequently used features were quickly discovered and fixed, but the more obscure ones came back with a vengeance (then again, obscure features were always a pain, since they never found a place in test suites).

Onwards, to glory!
