Michael Zion for MeteorOps

Originally published at meteorops.com

The CTO DevOps Handbook: Simple Principles and Examples

Nail the DevOps part as your company's CTO


The goal of this handbook is to give you clarity on DevOps:

  1. Understand what DevOps is (in simple words)
  2. Know what’s possible with DevOps (in simple goals)
  3. Get simple “when-to-do-what” DevOps guidelines

I added a bonus at the bottom of the article.
It's a production-ready setup example you could take inspiration from.

Who this article is for

You might be a founder who wishes to get started with DevOps the right way.

You might be the CTO of a 1,000-employee company who wants simple principles.

Or, maybe you’re a Software Engineer, and you want to understand if your company’s DevOps approach is good.

If you’re looking for a simple DevOps playbook, this is it.

Understand the desired result

Two things your company needs to be able to do

  1. Serve its product to customers
  2. Build and improve the product

Abilities you need to build, improve, and serve software

  1. Run experiments and test changes

DevOps has a simple meaning

Developers and Operators have shared responsibility for building and improving the system.

In practice:

  1. Developers are responsible for “Operating” the system
  2. DevOps Engineers are responsible for enabling developers to “Operate” AND for doing some of it themselves

Operate = provision, monitor, secure, configure, deploy, scale.

Choose a balance: Enabler, Doer, or Automator

The DevOps role will end up as a balance between:

  1. Enabler: Provides the tools and knowledge to fulfill the DevOps goals
  2. Doer: Does the tasks that fulfill the DevOps goals
  3. Automator: Automates any repeating operation

Know what things you should enable, do, or automate

  • Provision infrastructure
  • Secure the system
  • Deploy workloads
  • Monitor the system
  • Recover from issues
  • Scale up or down
  • Track & test changes
  • Automate processes

Choose the right tools

  • Has state management = Saves time automating state-aware processes (e.g., Terraform)
  • Has a big community & good docs = Saves time dealing with common issues (e.g., Kubernetes)
  • Has multiple interface types: API, CLI, UI = Saves time integrating with the existing system (e.g., Vault)

You can also read about choosing tools here.

Set useful goals

Adopting the following DevOps goals will point you in the right direction:

  1. One-Click Environments: makes e2e tests easy and quick
  2. Atomic Commits: provides confidence that a tested change will work in production
  3. Separate the Shared & Env-Specific Parts: enables e2e tests as the company scales up

If you want to learn about more useful DevOps goals, feel free to book a free consultation here.

Enablers: Choose the Tools-to-Knowledge Balance

Developers can either have the knowledge or the tools to do something.

  • More knowledge-reliance: if you want the developers to contribute to the DevOps efforts
  • More tools-reliance: if you want to abstract the operations from the developers

If the balance between the two is not intentional, it’s accidental.

Doers: Have a good reason to do it

  1. Is it a one-time task?
  2. Does it teach you how the developers work?
  3. Are you directly accountable for the results of the task?

If you answered “no” to all of the above questions, enable or automate it instead.

Doing more = Learning the system's use-cases

Doing too much = Not scalable, too much knowledge-reliance

Automators: Have a good reason to automate it

  1. Did it happen before?
  2. Is it likely to happen again?
  3. Will automating it take less time than doing it?
  4. Will automating it teach you an important company process?

If you answered “yes” to at least 2 of the 4 questions, automate it!
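
The checklist above can be read as a small decision function. The sketch below is only an illustration: the `should_automate` name and its parameters are hypothetical, and it assumes “2 out of the 4” means “2 or more”.

```python
def should_automate(happened_before: bool,
                    likely_to_repeat: bool,
                    cheaper_than_doing: bool,
                    teaches_company_process: bool) -> bool:
    """Return True when enough checklist questions are answered "yes"."""
    answers = [happened_before, likely_to_repeat,
               cheaper_than_doing, teaches_company_process]
    # Assumption: "2 out of the 4" is interpreted as "2 or more".
    return sum(answers) >= 2


# Example: a task that happened before and is likely to happen again.
print(should_automate(True, True, False, False))
```

Writing the rule down like this mostly helps a team agree on the threshold; the number itself is a judgment call.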

More automations = Less reliance on knowledge to operate the system.

Too many automations = No system awareness.

P.S. - you can also enable developers to automate it.

Create available DevOps Capacity

The DevOps needs of a company have spikes.

One month you need 2 DevOps Engineers, and half of that the next month.

Switchovers between big efforts and small tasks are common.

This is especially true for new companies.

Break the assumption: “DevOps tasks must be done by a DevOps Engineer”.

There are 3 types of DevOps capacity

  1. Non-Flexible: A full-time DevOps Engineer on the team
  2. Semi-Flexible: Key developers that can contribute to the DevOps goals
  3. Fully-Flexible: A flexible DevOps Services company or freelancer

You can read more about calculating the DevOps capacity your company needs here.

When to focus on what: Common Dilemmas

When: You work alone, and the system is simple

Focus: On simplifying the development - Dockerize your apps, Create a post-commit pipeline that runs tests

When: You need to be able to create new environments quickly (for development, or for clients)

Focus: On implementing “One-Click Environments”: Using IaC (e.g., Terraform) + Deployment tool (Depends on the platform).
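
As a rough sketch of what “one click” could look like with Terraform, the script below builds the CLI commands that create an isolated environment from a single name. The function names, the `envs/<name>.tfvars` layout, and the choice of one Terraform workspace per environment are assumptions for illustration; the deployment step is left out because it depends on the platform.

```python
import subprocess
from typing import List


def one_click_env_commands(env_name: str) -> List[List[str]]:
    """Build the Terraform commands that provision one environment.

    Assumes one workspace per environment and a hypothetical
    envs/<env_name>.tfvars file holding that environment's settings.
    """
    var_file = f"envs/{env_name}.tfvars"
    return [
        ["terraform", "init", "-input=false"],
        # The -or-create flag requires Terraform 1.4 or newer.
        ["terraform", "workspace", "select", "-or-create", env_name],
        ["terraform", "apply", "-auto-approve", "-input=false",
         f"-var-file={var_file}"],
    ]


def create_env(env_name: str, dry_run: bool = True) -> None:
    """Print the commands, and run them only when dry_run is False."""
    for command in one_click_env_commands(env_name):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


create_env("feature-123")  # dry run: only prints the commands
```

Keeping the environment name as the only input is what makes the flow “one click”: everything else comes from files that live in Git.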

When: You want to e2e test every code modification, but there are many code modifications

Focus: On splitting the “One-Click Env” into a “base” with shared resources, and “env” with env-specific resources

When: You want to unify & standardize how you deploy, monitor, scale, configure, and secure your workloads

Focus: On implementing an orchestrator such as Kubernetes

When: You have many moving parts and want to be certain a tested change will work

Focus: On implementing GitOps and considering a Monorepo (the sooner the better)

When: You want the DevOps efforts to be done by the dev team

Focus: On using “actual” IaC tools (e.g., Pulumi with TypeScript/Python) and on writing full “how to operate” documentation (see “Operate” above)

Never: Invest lots of time in new tech without a strong reason

Always:

  • Have your code in Git
  • Monitor the basic stuff: CPU, Memory, Disk, Network, App Logs, Cloud Costs
  • Architect for high-availability
  • Test before you deploy
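
To make “monitor the basic stuff” concrete, here is a minimal sketch of a threshold check that uses only the Python standard library. It measures disk usage only; the metric name and the 90% threshold are assumptions, and a real setup would more likely rely on a dedicated monitoring tool.

```python
import shutil
from typing import Dict, List


def disk_usage_percent(path: str = "/") -> float:
    """Return the used share of the disk holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # guard against unusual filesystems
        return 0.0
    return 100.0 * usage.used / usage.total


def find_alerts(metrics: Dict[str, float],
                thresholds: Dict[str, float]) -> List[str]:
    """List every metric whose value exceeds its threshold."""
    return [
        f"{name} is at {value:.1f} (threshold {thresholds[name]:.1f})"
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]


metrics = {"disk_percent": disk_usage_percent()}
for alert in find_alerts(metrics, {"disk_percent": 90.0}):
    print("ALERT:", alert)
```

The same `find_alerts` shape works for CPU, memory, or cloud costs once something collects those numbers.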

BONUS: An example setup for a CTO approaching Production


2 AWS Accounts

  • One for development and staging
  • Another for production

Monorepo in GitHub

  • Docker-Compose for local development

2 Infrastructure-as-Code projects: 'base' & 'apps'

  • base = shared resources (e.g., VPC, RDS, ECS Cluster, EKS Cluster)
  • apps = env-specific resources (e.g., Lambda Functions, ECS Services, Kubernetes Namespaces)
  • config file per environment

GitHub Actions Workflow: Development workflow

  • Check out a branch, then develop and test changes locally
  • Create a Pull Request: Deploys a Pull-Request ‘apps’ environment on the ‘development’ environment ‘base’
  • On merge to main: Deploys from the ‘main’ branch an ‘apps’ environment onto the ‘development’ environment ‘base’
  • Manual: Deploy from the ‘main’ branch onto the ‘staging’ / ‘production’ environment ‘base’
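
The workflow above boils down to a mapping from a trigger to a pair: which ‘apps’ environment gets deployed, and which ‘base’ it lands on. Here is a minimal sketch of that mapping. The event names, the `pr-<number>` naming scheme, and naming the ‘apps’ environment after its target are all assumptions, not something GitHub Actions prescribes.

```python
from typing import Optional, Tuple


def deploy_target(event: str,
                  pr_number: Optional[int] = None,
                  manual_target: Optional[str] = None) -> Tuple[str, str]:
    """Return (apps_environment, base_environment) for a trigger.

    Hypothetical event names: "pull_request", "merge_to_main", "manual".
    """
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request deployments need a PR number")
        # Each Pull Request gets its own 'apps' env on the development 'base'.
        return (f"pr-{pr_number}", "development")
    if event == "merge_to_main":
        return ("main", "development")
    if event == "manual":
        if manual_target not in ("staging", "production"):
            raise ValueError("manual deployments target staging or production")
        return (manual_target, manual_target)
    raise ValueError(f"unknown event: {event}")


print(deploy_target("pull_request", pr_number=42))
print(deploy_target("manual", manual_target="production"))
```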

Setup Notes

  • Avoid hardcoding an environment's name in the code to decide whether a resource gets deployed
  • Use each environment’s config file to declare if a resource should be created
  • Could be implemented using Terraform, Terragrunt, Pulumi, CDK, and other IaC tools
  • Production should have 2 instances of every workload for high-availability
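
The first two notes can be sketched as follows: the code never checks which environment it runs in and only reads flags from that environment's config. The config keys and resource names here are hypothetical, and in practice the resulting plan would be expressed in an IaC tool rather than as plain strings.

```python
from typing import Dict, List

# Hypothetical per-environment config files, shown inline as dicts.
CONFIGS: Dict[str, Dict[str, object]] = {
    "development": {"create_cdn": False, "workload_instances": 1},
    "production": {"create_cdn": True, "workload_instances": 2},
}


def plan_resources(config: Dict[str, object]) -> List[str]:
    """Decide what to create from config flags only, never from the env name."""
    resources = [f"api-service x{config['workload_instances']}"]
    if config["create_cdn"]:  # instead of: if env_name == "production"
        resources.append("cdn")
    return resources


for env_name, config in CONFIGS.items():
    print(env_name, plan_resources(config))
```

With this shape, adding a new environment means adding a config file, not touching the code.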

If you’d like to see this setup in your startup, click here to book a call 👈🏼

P.S. - I'll be updating this page occasionally, so you might want to visit again


Another Bonus: DevOps Dictionary for Human Beings

| Term | Definition | Tools |
| --- | --- | --- |
| Environment | A working instance of the entire system | |
| CI (Continuous Integration) | Enable developers to collaborate by agreeing on a single source-of-truth (master/main) | Jenkins, GitHub Actions, GitLab CI |
| CD (Continuous Delivery) | Create an artifact that’s ready for production (tested, tagged) | JFrog Artifactory, Nexus, AWS ECR |
| CD (Continuous Deployment) | Every available deliverable (artifact) gets deployed automatically | ArgoCD, Jenkins, AWS CodeDeploy |
| Monitoring / Observability | Collect metrics/traces/logs from apps and infrastructure, analyze and display them, and set up alerts | Prometheus, Jaeger, Elasticsearch, Fluentd, OpenTelemetry |
| Infrastructure | The resources on which the workloads run, in which the data is stored, and through which the network flows | Servers, Databases, Network Routers & Switches |
| Cloud Infrastructure | Same as the above, but specifically in the cloud | AWS EC2, AWS RDS, GCP Compute Engine, Azure Virtual Machines |
| Cloud | Computing & Data services served from remote locations for you to build your system | AWS, Azure, GCP |
| Containerization & Virtualization | Technologies utilizing Kernel & OS features to create virtual machines, or isolate processes (AKA run containers) | Docker, vSphere, KVM |
| Secrets Management | Storing and retrieving sensitive configurations (e.g., tokens, passwords) | HashiCorp Vault, AWS Secrets Manager, SealedSecrets |
| Configuration Management | Usually refers to preparing servers for workloads (e.g., creating directories & files, starting processes) | Ansible, Chef, Puppet |
| Version Control | Saving the code in a versioned way (Git) | GitHub, GitLab |
| GitOps | Making sure the system is the same as it’s described in Git | Flux, ArgoCD, Jenkins |
| Monorepo | All of the company’s code is in one Git Repository | Nx, Turborepo |
| Polyrepo | Multiple Git repositories for different components | |
| IaC (Infrastructure-as-Code) | Creating Cloud infrastructure with idempotent code and state management | Terraform, Pulumi, CDK, Crossplane |
| Deployment | Execute, serve, or install the artifacts | ArgoCD, Jenkins, AWS CodeDeploy, Scripts (Bash, Python, etc.) |
| Orchestrator | Dynamically allocating workloads to a pool of nodes | Kubernetes, Nomad, AWS ECS |
| Authentication & Authorization | Making sure each person, workload, or resource has access only to what’s necessary (other workloads and resources) | AWS IAM, OpenID, OpenVPN, Twingate, Istio |
| Service Discovery | Exposing available workloads using DNS | Consul, CoreDNS |
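
The GitOps entry above is easiest to picture as a reconcile step: compare what Git declares with what is actually running, and compute the actions that close the gap. The sketch below is a simplified illustration with hypothetical workload names and image tags; GitOps tools run this kind of comparison continuously against a real system.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return the actions that make `actual` match `desired`.

    Both dicts map a workload name to an image tag.
    """
    actions = []
    for name, tag in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name}:{tag}")
        elif actual[name] != tag:
            actions.append(f"update {name}: {actual[name]} -> {tag}")
    # Anything running that Git does not describe gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


desired_in_git = {"api": "1.4.0", "worker": "2.0.1"}
running = {"api": "1.3.9", "legacy-cron": "0.9.0"}
print(reconcile(desired_in_git, running))
# → ['update api: 1.3.9 -> 1.4.0', 'create worker:2.0.1', 'delete legacy-cron']
```

Running `reconcile` again after applying the actions returns an empty list, which is the idempotency that the IaC entry refers to.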

Get more practical advice

I post small nuggets of practical advice on the "MeteorOps Newsletter".
You can subscribe here 👈🏼

Top comments (6)

sreejinsreenivasan

Great article! Which hat do you wear professionally: CTO or DevOps?

Michael Zion

Thank you, and I'm wearing a DevOps hat!
The background is that I spoke to many CTOs and found myself repeating some advice to most of them.
Figured I'd compile it into a handbook :)

Alex M. Schapelle

Great article @michaelzion !

Michael Zion

Thank you Alex!

Nevo David

Great article Michael!
Thank you for posting!

Michael Zion

Thank you for the feedback!