At Datawire we help many organisations deploy applications to Kubernetes. Often our most important contribution is working closely alongside development teams, helping them build effective continuous integration and continuous delivery (CI/CD) pipelines. This is primarily because creating an effective developer workflow on Kubernetes can be challenging -- the ecosystem is still evolving, and not all the platform components are plug-and-play -- and also because many engineering teams fail to realise that in order to "close the loop" on business ideas and hypotheses, you also need to instrument applications for observability. We often argue that the first deployment of an application into production through the pipeline is only the start of the continuous delivery process, not the end, as some assume.
All of us are creating software to support the delivery of value to our customers and to the business, and therefore the "developer experience" (DevEx) -- from idea generation to running (and observing) in production -- must be fast, reliable and provide good feedback. As we have helped our customers create effective continuous delivery pipelines for Kubernetes (and the associated workflows), we have seen several patterns emerge. We are keen to share our observations on these patterns, and also to explain how we have captured some of the best patterns within a collection of open source tools for deploying applications to Kubernetes.
Everything we do as engineers begins with an idea. From this idea a hypothesis emerges -- for example, modifying the layout of a web form will improve conversion, or improving the site's p99 latency will result in more revenue -- and we can extract appropriate metrics for observation -- conversion and page load latency in our example.
Once we have agreed on our hypothesis and metrics, we can begin to write code and package it ready for deployment on Kubernetes. We have created the open source Forge framework to assist with the entire development process, from automatically creating and managing boilerplate Kubernetes configuration, to parameterising runtime properties and resources, to deploying an application to Kubernetes with a single CLI command.
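As a sketch of what this looks like in practice, a Forge-managed service carries a small per-service descriptor alongside its source code; the field values below (service name, resource sizes) are illustrative rather than a definitive schema, so consult the Forge documentation for the exact format:

```yaml
# service.yaml -- a minimal, illustrative Forge service descriptor.
# The service name and resource values here are hypothetical examples.
name: payment-service
memory: 256M
cpu: 0.25
```

With a descriptor like this in place, running `forge deploy` from the service directory builds the container image, fills in the templated Kubernetes manifests, and applies them to the target cluster in one step.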
If we are working on a hypothesis that requires "exploration" -- for example, refactoring existing functionality, or solving a technical integration issue -- we often whiteboard ideas and begin coding using techniques like Test-Driven Development (TDD), taking care to design observability (business metrics, monitoring, logging, etc.) in as we go. If we are working on a hypothesis that requires "experimentation" -- for example, a new business feature -- we typically define Behaviour-Driven Development (BDD)-style tests to keep us focused on building functionality "outside-in".
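To make the "outside-in" BDD style concrete, here is a sketch of a Gherkin scenario tied to the web-form conversion hypothesis mentioned earlier; the feature, step wording and metric name are all hypothetical examples, not part of any specific project:

```gherkin
# Illustrative BDD-style specification (Gherkin syntax).
# All names below are hypothetical.
Feature: Simplified checkout form
  In order to improve conversion
  As a shopper
  I want a shorter checkout form

  Scenario: Submitting the redesigned form
    Given I am on the checkout page with the new form layout
    When I complete the required fields and submit the form
    Then my order is confirmed
    And a "checkout_conversion" metric event is recorded
```

Starting from a scenario like this keeps the team building only what the hypothesis demands, and the final step is a useful reminder that the observability signal is part of the acceptance criteria, not an afterthought.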
We attempt to develop within environments that are as production-like as possible, and so we frequently run local services that interact with a more complete remote Kubernetes cluster deployment. We have created the open source tool Telepresence, which lets us execute and debug a local service as if it were part of the remote environment (effectively two-way proxying between our local development machine and the remote Kubernetes cluster).
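A typical Telepresence invocation looks like the sketch below, which swaps a Deployment in the remote cluster for a process running locally; the deployment name and local command are hypothetical:

```
# Swap the remote 'payment-service' Deployment for a local process
# (deployment name and command are illustrative examples).
# Cluster traffic to the service is proxied to the local process, and
# the local process can resolve cluster DNS names and see the remote
# environment variables.
telepresence --swap-deployment payment-service \
             --run python3 payment_service.py
```

This lets us set breakpoints and edit code locally while exercising the service against real upstream and downstream dependencies in the cluster.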
We like to "release early, and release often", and so favour running tests in production using canary and dark launches. This way we can expose new functionality to a small percentage of real users and observe their behaviour in relation to our hypothesis. As with any deployment to a production environment, there is obviously a certain amount of risk, and we mitigate this by introducing alerting and automated rollbacks when serious issues are detected. We have created the open source API gateway Ambassador for this purpose.
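As a sketch of how a canary release can be expressed, early versions of Ambassador are configured through annotations on Kubernetes Services; the service names and traffic weight below are hypothetical, and the exact configuration schema may differ between Ambassador versions:

```yaml
# Illustrative canary routing with Ambassador (names are hypothetical).
# A weighted Mapping on the canary Service sends roughly 10% of the
# traffic for the /payment/ prefix to the canary release; the remainder
# continues to flow to the stable service's own Mapping.
apiVersion: v1
kind: Service
metadata:
  name: payment-service-canary
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: payment_service_canary
      prefix: /payment/
      service: payment-service-canary
      weight: 10
spec:
  selector:
    app: payment-service-canary
  ports:
    - port: 80
```

Adjusting the weight (and eventually removing the canary Mapping) lets us ramp traffic up or roll it back as the observed metrics confirm or refute the hypothesis.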
Ambassador is built on the popular Envoy proxy that emerged from the work by Matt Klein and his team at Lyft. Ambassador enables "smart routing" of traffic when deploying applications to Kubernetes, and the underlying technology has been proven to operate at scale within Lyft. Once an application is receiving production traffic we can observe metrics based on our earlier hypothesis. We typically use Prometheus to collect data and Grafana to display the results via dashboards. We've created the open source prometheus-ambassador project to enable the easy export of metrics from Ambassador, for example latency and the number of 5xx HTTP response codes returned.
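Once these metrics land in Prometheus, queries along the following lines can drive Grafana panels or alerting rules; note that the exact metric names depend on how the Envoy statistics are exported, so the names below should be treated as illustrative:

```
# Illustrative PromQL; actual metric names depend on the exporter setup.
# Rate of 5xx responses over the last five minutes:
sum(rate(envoy_http_downstream_rq_xx{envoy_response_code_class="5"}[5m]))

# Approximate p99 request latency from a latency histogram:
histogram_quantile(0.99,
  sum(rate(envoy_http_downstream_rq_time_bucket[5m])) by (le))
```

Queries like these map directly back to the example hypotheses earlier in the article: the 5xx rate feeds rollback alerts for a canary, and the p99 latency figure is exactly the metric needed to test a latency-revenue hypothesis.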
Once we have analysed our metrics, the development cycle can begin again: either iterating on our existing solution and running additional canary experiments, or, if we have proven (or disproven) our original hypothesis, generating a new idea and hypothesis.