Let's discuss an extremely common anti-pattern I've noticed with teams that are relatively new to containers, Kubernetes, and cloud-native development in general. Cloud-native applications can be far more complex than traditional monoliths and, as a result, need a relatively sophisticated development environment. Unfortunately, this need often isn't evident at the beginning of the cloud-native journey. Development environments end up as an afterthought: a cumbersome, heavy, brittle drag on productivity.
The best teams treat development environments as a priority and devote significant DevOps/SRE time to perfecting them. In doing so, they end up with development environments that "just work" for every developer, not just those who are experienced with containers and Kubernetes. On these teams, every developer has a fast, easy-to-use development environment that works every time.
Before we go further, let's get on the same page about what we mean by a development environment in this context. When working with cloud-native applications, each service depends on numerous containers, serverless functions, and cloud services to operate. For this post, a development environment is a sandbox in which developers can run their code and dependencies for testing. It's not the IDE, compiler, debugger, or any of those other tools.
You're working on a new project or planning to modernize an old one. The team has read all about the whiz-bang nifty new cloud-native technologies, like containers, Kubernetes, etc. So, you decide to take the plunge and build a cloud-native app.
The team realizes that a core group of DevOps/SREs will be necessary to get everything running in a scalable, reliable, and automated setup. Site reliability engineers are hired or trained and get to work. They set up Kubernetes, CI/CD, monitoring, logging, and all of the other tools we've learned are critical for a modern application.
Everyone knows that it's the DevOps/SRE team's job to get all of this stuff up and running. However, development environments aren't top of mind. The site reliability team considers it their duty to focus on production and CI/CD; development is the developers' job. At the same time, the developers think it's their job to deliver application features, not to maintain infrastructure. It's not really anyone's responsibility to focus on the developer experience before CI/CD, so it's neglected.
Unfortunately, an ad hoc approach to development environments tends to emerge. Whenever there's a new service, whichever developer happens to be working on it realizes they need some way to boot their dependencies and test their code. They Google around and figure that Docker Compose is a reasonable way to do this. They copy and paste an example, tweak it through trial and error until it's working, and move on. The quality of this initial compose file varies widely depending on the DevOps knowledge of the engineer who happened to write it. Sometimes it's pretty solid; sometimes it's brittle and slow.
Worse, this process repeats. Every time there's a new service, it gets a new git repository, and some new engineer finds themselves writing a compose file. Perhaps this new file is copied from an existing project. Perhaps it's developed from scratch. Either way, now we have two compose files that need to be maintained and updated as the app changes over time. This process repeats and repeats until all services have their own ever so slightly different configuration files that are a nightmare to maintain.
As a result of this (all too common) process, we see several typical issues:
- Development environments are unmanageable. They spread across dozens of repositories in dozens of subtly different copy-and-pasted docker-compose files. Keeping these up to date in a fast-changing application is impossible.
- Development environments are incomplete. They only deal with containers because they are the easiest for an individual developer to get up and running with docker-compose. Everything else developers need to test (serverless functions, databases, specialty cloud services) requires manual effort.
- Developers waste time focusing on things that aren't their specialization. Just as most backend engineers can't CSS their way out of a paper bag, there's no reason for every frontend/AI/data engineer to be an expert on the current DevOps trends. Developers shouldn't spend time configuring and debugging development environments; they should spend time building features.
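To make the maintenance problem concrete, here is a minimal sketch of how copy-and-pasted compose files drift apart. It assumes the compose files have already been parsed into dictionaries; the repository names, services, and image tags are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_image_drift(compose_by_repo):
    """Return services that are pinned to different images in different repos.

    compose_by_repo maps a repository name to its parsed compose file.
    The result maps each drifting service to {image: [repos using it]}.
    """
    images = defaultdict(lambda: defaultdict(list))
    for repo, compose in compose_by_repo.items():
        for name, spec in compose.get("services", {}).items():
            image = spec.get("image")
            if image:
                images[name][image].append(repo)
    # Keep only services that appear with more than one distinct image.
    return {
        name: {image: sorted(repos) for image, repos in by_image.items()}
        for name, by_image in images.items()
        if len(by_image) > 1
    }


# Hypothetical, already-parsed compose files from two repositories.
compose_by_repo = {
    "orders": {
        "services": {
            "postgres": {"image": "postgres:13"},
            "redis": {"image": "redis:6"},
        }
    },
    "billing": {
        "services": {
            "postgres": {"image": "postgres:11"},
            "redis": {"image": "redis:6"},
        }
    },
}

drift = find_image_drift(compose_by_repo)
print(drift)
```

With only two repositories the mismatch is easy to spot by eye. With dozens, a check like this is about the only way to notice it, which is a good sign that the configuration belongs in one place.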
So how do we avoid this all-too-common scenario? The good news is that it's not particularly challenging to do so if you're intentional and proactive. The best teams tend to follow a few principles to ensure a great experience.
There's a team that is explicitly responsible for providing development environments for all developers. That team can be the DevOps/SRE team, or a dedicated developer productivity team. The key is that it's someone's job to focus on this issue. Furthermore, that someone is likely to have a large amount of DevOps expertise, and so will produce better outcomes more efficiently.
The development environment must be managed centrally by the site reliability team responsible for it. A single git repository contains all of the configuration and scripts necessary for a developer to get going. When the site reliability team changes something, they do so once in that central repository, and all developers benefit. Typically, the development environments also run in a centrally managed cluster in the cloud. As a result, it's easy for the site reliability team to ensure things work consistently for everyone, and to debug problems when they do arise.
The development environments are fully automated. A single command brings up everything a developer needs to test their code. Developers need to do almost no manual setup work beyond the code changes they're actively working on.
Achieving these goals isn't easy. It requires a significant and sustained investment from the site reliability team, and buy-in from developers and management to succeed. However, while the cost can be significant, it's small relative to the time and effort saved by giving every developer a fast environment that just works every time. At Kelda, we're working hard to make this dream attainable for every developer.
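As a rough illustration of the "single command" idea, here is a minimal sketch of a wrapper script that a site reliability team might keep in the central repository. It assumes the environment is described by one Docker Compose file per service. The repository location, directory layout, and function names are hypothetical, and a team running environments in a shared cloud cluster would swap in its own tooling.

```python
import subprocess
from pathlib import Path

# Hypothetical location of the centrally managed repository on a developer's machine.
CENTRAL_REPO = Path.home() / "dev-environments"


def build_up_command(service, repo_root=CENTRAL_REPO):
    """Build the `docker compose` command that boots a service and its dependencies."""
    # Assumed layout: <repo_root>/<service>/docker-compose.yml
    compose_file = Path(repo_root) / service / "docker-compose.yml"
    return ["docker", "compose", "-f", str(compose_file), "up", "--detach", "--build"]


def dev_up(service, repo_root=CENTRAL_REPO, dry_run=False):
    """Bring up the development environment for one service.

    With dry_run=True the command is returned without being executed.
    """
    command = build_up_command(service, repo_root)
    if not dry_run:
        # check=True raises CalledProcessError if docker exits with an error.
        subprocess.run(command, check=True)
    return command


# Show what would run for a hypothetical "orders" service, without executing it.
print(dev_up("orders", dry_run=True))
```

Because the script and the compose files live in the same central repository, a fix made there reaches every developer the next time they pull.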
Try Blimp to see how you can improve development speed
By: Ethan Jackson