Motivation
Two years ago I switched from software development to DevOps at a Danish high-end technology company. The team I joined consisted of 5 very talented engineers who delivered build agents, pipelines, automation scripts etc. to an R&D organization of more than 400 hardware and software engineers. Having joined this talented and ambitious DevOps team, it seemed to me that there was no limit to what we could achieve. We could see so many good new solutions out there that would ease our customers' day-to-day hurdles. Unfortunately, we noticed quite quickly that adding more technologies to the stack left us with a low bus factor and stretched us quite thin. In addition, we had no consensus on where to keep secrets, which made them hard to manage; the shared pipelines we implemented were hard to debug or run on a dev machine; and the number of self-hosted solutions grew rapidly, requiring more and more time to look after them. So we started working on development guidelines for the team, which we called Katas back then.
Development guidelines
Here are some of the guidelines that we would like to share with you. Their purpose is to minimize the impact of our changes on the organization and to deliver high-quality tools and infrastructure while keeping a high pace. Please leave a comment if you have similar experiences or something to add.
The development guidelines are not listed in a prioritized order.
Programming languages
Limit the number of programming languages
It is great fun to solve a specific problem with the best tool there is. It is no fun at all to maintain solutions that use technologies you do not know. In the beginning we used Groovy, Node.js, PowerShell (Core), .NET (Core), Bash, Java and Python. Unfortunately, we now have some solutions that we have stopped maintaining, and we reject any new feature requests for them, because we do not have the right competencies and no time to acquire them.
How to choose programming languages
For us, the deciding factor is our approach of inner-sourcing everything we develop. We want to use the same programming languages as the other engineers in the company. All our scripts, tools etc. are open for contributions from everyone inside the company. We want people to help us out instead of being blocked by us. And that works quite well! :)
Design principles
Always prefer a cloud-based solution
It is so easy to download a Docker image, create a VM for it, deploy it and use it right away, especially if the solution is free of charge: no budget negotiations, no approvals needed. Unfortunately, the maintenance of such a VM/Docker combo should not be underestimated. Your users will ask for new versions of the software, and your IT department will require you to comply with the company's security policy. And it will happen when you have the least time for it.
CI/CD must be able to push/release/execute what you are building; it is not enough that it works locally
How many times have I heard: "It works on my computer, so I am done, aren't I?" NO. You are done when you have a pipeline for your project.
Design so that local builds and CI builds work the same way
This one is tricky. The more patches and hacks you add to the pipeline to compensate for faulty tooling, the more time you will spend debugging local builds that do not work.
The image above shows the build stage of a .NET project in a Jenkins pipeline. A developer will have to spend time and effort configuring her local IDE to work the same way as the pipeline.
All in all, you want to develop your pipelines and tooling so that they are easy to set up in the IDE. In the example above, simplifying the stage and integrating it into dotnet build would be the desired solution.
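One way to keep local and CI builds aligned is to put the build steps in a single script that both the developer and the pipeline call. The sketch below is only an illustration under assumptions: the function names `build_steps` and `run` are hypothetical, and the chosen `dotnet` commands are an example set, not the stage from the pipeline described above.

```python
import subprocess


def build_steps(configuration="Release"):
    """Return the commands that make up one build (illustrative selection)."""
    return [
        ["dotnet", "restore"],
        ["dotnet", "build", "--no-restore", "--configuration", configuration],
        ["dotnet", "test", "--no-build", "--configuration", configuration],
    ]


def run(steps, runner=subprocess.run):
    """Execute each step in order and stop at the first failure.

    `runner` is injectable so the script can be exercised without
    the dotnet CLI being installed.
    """
    for step in steps:
        print("+", " ".join(step))
        runner(step, check=True)
```

A developer would call `run(build_steps())` from a terminal or an IDE task, and the pipeline stage would do exactly the same, so any fix to the build lands in one place.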
Deployment strategies
This section is mostly relevant to organizations that are the same size as or larger than the one I work for. Updating 2 build agents can be challenging too, but updating 40+ build agents has a much larger impact. In the latter case, experience suggests using a canary deployment strategy when introducing changes to the VM templates for your build agents. Normally you do not know how people are using the build machines you provide or what kinds of projects they are building. Introducing a change gradually minimizes the impact on the organization.
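A canary rollout of this kind can be sketched as splitting the agents into growing waves and stopping as soon as a wave looks unhealthy. The code below is a minimal illustration, not our actual tooling: the percentages, the `update` and `is_healthy` callbacks and the agent names are all assumptions.

```python
def canary_waves(agents, percentages=(5, 25, 100)):
    """Split agents into waves that cumulatively cover the given percentages."""
    waves = []
    done = 0
    for percent in percentages:
        # Round up, and always move at least one agent forward per wave.
        target = (len(agents) * percent + 99) // 100
        target = min(len(agents), max(done + 1, target))
        if target > done:
            waves.append(agents[done:target])
            done = target
    return waves


def rollout(waves, update, is_healthy):
    """Update wave by wave; stop after the first wave with an unhealthy agent."""
    updated = []
    for wave in waves:
        for agent in wave:
            update(agent)
            updated.append(agent)
        if not all(is_healthy(agent) for agent in wave):
            break
    return updated
```

With 40 agents and the default percentages, the first wave touches only 2 agents, so a broken VM template is caught before it reaches the remaining 38.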
Open source vs Inner source
Choose to contribute to an open source project instead of developing your own custom solution. Maintenance is a killer, and your team will have to learn your solution. Normally the community of an open source project is much larger than your team. Naturally, there are several risks (worth considering!) associated with this approach:
- Your pull requests can get stuck for a year or longer
- It can take longer to develop a contribution than a local hack. But you will get your code reviewed by someone knowledgeable in the field and you will learn something, no doubt!
Managing secrets
Do not distribute your secrets across build systems, environment variables on deployment machines and other places. It is a mess! GitHub does not even let you read a secret back once it has been stored there. Keep the secrets in one place (or a few) and manage access to them.
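The idea of one home for secrets with managed access can be illustrated with a tiny in-memory stand-in. This is only a sketch: `SecretStore` and the reader and secret names are hypothetical, and in practice a dedicated secret manager would take its place.

```python
class SecretStore:
    """Hypothetical single home for secrets with per-reader access control."""

    def __init__(self):
        self._secrets = {}
        self._readers = {}

    def put(self, name, value, readers):
        """Store a secret together with the set of readers allowed to fetch it."""
        self._secrets[name] = value
        self._readers[name] = set(readers)

    def get(self, name, reader):
        """Return the secret, or raise PermissionError if access was not granted."""
        if reader not in self._readers.get(name, set()):
            raise PermissionError(f"{reader} may not read {name}")
        return self._secrets[name]
```

Every pipeline and script asks the same store, so rotating a secret or revoking access happens in one place instead of on every build system and deployment machine.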
Monitoring and telemetry
Working without monitoring and telemetry is like walking through a minefield in the dark. You want to stay ahead of issues, such as servers running out of disk space, before they hit the organization. In addition, you want to spend your time and energy on the tools and pipelines that are used most. Rich telemetry and monitoring data can help you decide where to put your focus.
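As a small example of staying ahead of the disk space issue, a check like the one below could run on each server and feed an alert. It is a sketch under assumptions: the 10 percent threshold and the function names are made up for illustration, and a real setup would more likely rely on an existing monitoring agent.

```python
import shutil


def is_low_on_disk(total_bytes, free_bytes, min_free_percent=10):
    """True when less than min_free_percent of the disk is free."""
    return free_bytes * 100 < total_bytes * min_free_percent


def check_path(path, min_free_percent=10):
    """Check the disk that holds `path` using the standard library."""
    usage = shutil.disk_usage(path)
    return is_low_on_disk(usage.total, usage.free, min_free_percent)
```

Calling `check_path("/")` on a schedule and alerting when it returns `True` gives the team a warning before builds start failing.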
Acknowledgements
I would like to thank Rasmus Jelsgaard and Vaida Laganeckiene for their input and ideas when shaping these development guidelines, and for proofreading this post.