James Lee

DevOps Implementation Checklist: From Code Management to CI Culture

1. Code Management & Branching Strategy

Before setting up any CI pipeline, answer these foundational questions:

| Question | Options / Considerations |
| --- | --- |
| Where is code hosted? | GitHub / GitLab / Bitbucket / self-hosted |
| VCS choice | Git (recommended) vs SVN |
| Branching model | Git Flow / GitHub Flow / Trunk-Based Development |
| CI access to repo | Does your CI server have read access? Webhook configured? |
| Repo structure | Monorepo vs multi-repo; impacts pipeline complexity |
| Version numbering | Semantic versioning (MAJOR.MINOR.PATCH); define upfront |
| Dependency management | npm / pip / Maven / Go modules; naming conventions? |
| Code review | PR-based review required before merge to main? |

Branching Model Comparison

Git Flow (release-heavy teams):
main ──────────────────────────────────────▶
      \──── develop ────────────────────────▶
                \── feature/x ──/
                \── feature/y ──/
                      \── release/1.0 ──/
                                  \── hotfix ──/

GitHub Flow (continuous delivery teams):
main ──────────────────────────────────────▶
      \── feature/x ──PR──/
      \── feature/y ──PR──/

Trunk-Based (high-frequency CI teams):
main ──────────────────────────────────────▶
      \─ short-lived branch (< 1 day) ─PR─/

Rule of thumb: the longer a branch lives without merging, the higher the conflict risk and the wider the conflict scope.


2. CI Server Setup

Choosing a CI Server

| Tool | Notes |
| --- | --- |
| Jenkins | Most widely adopted; rich plugin ecosystem; self-hosted |
| GitLab CI | Tightly integrated with GitLab; YAML-based pipelines |
| GitHub Actions | Native to GitHub; excellent for open source projects |
| TeamCity | JetBrains product; strong .NET support |
| GoCD | Pipeline-first design; good for complex workflows |
| Azure DevOps | Microsoft ecosystem; strong enterprise support |

Setup Checklist

CI Server Setup:
├── Resource planning
│     ├── How many concurrent jobs needed?
│     ├── CPU/memory per job (build + test)
│     └── Agent/runner count
├── Pipeline design
│     ├── Standardized pipeline template
│     ├── Parameterized builds (branch, env, version)
│     └── Multi-branch / multi-repo support
├── Integrations
│     ├── Code repository (webhook on push/PR)
│     ├── Artifact repository (push built packages)
│     └── Notification (Slack / email on failure)
└── Code quality
      └── SonarQube integration (static analysis)

Standard Pipeline Structure

Code Push / PR
     │
     ▼
┌─────────────────────────────────────────────────────┐
│  Stage 1: Checkout + Dependency Install             │
├─────────────────────────────────────────────────────┤
│  Stage 2: Static Code Analysis (SonarQube / lint)   │
├─────────────────────────────────────────────────────┤
│  Stage 3: Build / Compile                           │
├─────────────────────────────────────────────────────┤
│  Stage 4: Unit Tests + Coverage Report              │
├─────────────────────────────────────────────────────┤
│  Stage 5: Package Artifact                          │
├─────────────────────────────────────────────────────┤
│  Stage 6: Push to Artifact Repository               │
└─────────────────────────────────────────────────────┘
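
This stage order maps onto any of the CI tools listed earlier. As a rough, tool-agnostic sketch, here it is as a plain Python driver; the per-stage commands are placeholders for a Node.js project, and a real setup would declare these stages in your CI server's native config (Jenkinsfile, .gitlab-ci.yml, workflow YAML):

```python
# pipeline_sketch.py -- illustrative stage order only; a real pipeline
# belongs in your CI tool's native config.
import subprocess
import sys

# Placeholder commands for a Node.js project -- adapt to your stack.
STAGES = [
    ("checkout + dependencies", "npm ci"),
    ("static analysis",         "npm run lint"),
    ("build",                   "npm run build"),
    ("unit tests + coverage",   "npm test -- --coverage"),
    ("package artifact",        "tar czf app.tar.gz dist/"),
    ("push artifact",           "curl -fT app.tar.gz $ARTIFACT_REPO_URL"),  # hypothetical repo URL
]

for name, cmd in STAGES:
    print(f"--- Stage: {name} ---")
    if subprocess.run(cmd, shell=True).returncode != 0:
        # Fail fast: a red stage stops the whole pipeline immediately.
        sys.exit(f"Stage '{name}' failed")

print("Pipeline passed.")
```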

3. Artifact Repository & Build-Once Principle

The Build-Once Principle

A core DevOps practice: compile once, deploy everywhere.

❌ Without Build-Once:
Dev env:   source → build → artifact A → deploy
Test env:  source → build → artifact B → deploy
Prod env:  source → build → artifact C → deploy
Problem:   "It worked in test!" — because artifact B ≠ artifact C

✅ With Build-Once:
CI:        source → build → artifact (+ MD5) → artifact store
Test env:  artifact store → verify MD5 → deploy
UAT env:   artifact store → verify MD5 → deploy
Prod env:  artifact store → verify MD5 → deploy
Guarantee: every environment runs the exact same binary

Two key benefits:

  • Saves build time — large codebases can take minutes to compile; rebuilding per environment wastes CI resources
  • Guarantees consistency — what you tested is exactly what you ship to production
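
Verifying the checksum takes only a few lines. A minimal sketch of the compute-and-verify steps using Python's standard library (file names are illustrative):

```python
import hashlib

def md5_of(path: str) -> str:
    """Stream the file so large artifacts don't blow up memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

ARTIFACT = "order-service-1.4.2-a3f9c12.jar"   # illustrative file name

# CI side: record the digest next to the artifact in the store.
with open(ARTIFACT + ".md5", "w") as f:
    f.write(md5_of(ARTIFACT))

# Deploy side: refuse to deploy anything that doesn't match the recorded digest.
expected = open(ARTIFACT + ".md5").read().strip()
if md5_of(ARTIFACT) != expected:
    raise SystemExit("Checksum mismatch: aborting deploy")
```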

Artifact Promotion Flow (JFrog Artifactory)

Build completes
     │
     ▼
Upload to Artifactory
+ attach metadata:
  ├── repo / branch
  ├── version (semver)
  └── CommitId
     │
     ▼
Deploy to SIT ──▶ tests pass ──▶ tag: sit-passed
     │
     ▼
Deploy to UAT ──▶ tests pass ──▶ tag: uat-passed
     │
     ▼
Deploy to Prod ──▶ tag: prod-released

Tags act as promotion gates — a stage only proceeds if the required tag is present on the artifact. No tag = no promotion.
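
A gate check can be a short script that reads the artifact's properties before promoting. The sketch below assumes Artifactory's item-properties REST endpoint (GET /api/storage/{path}?properties=key, per its REST docs) and a hypothetical qa.tag property; host, path, and credentials are placeholders:

```python
# promotion_gate.py -- require the sit-passed tag before promoting to UAT.
import os
import sys
import requests

ARTIFACTORY = "https://artifactory.example.com/artifactory"   # placeholder host
ARTIFACT_PATH = "libs-release/order-service/order-service-1.4.2-a3f9c12.jar"

# Item Properties API (qa.tag is a hypothetical property key).
resp = requests.get(
    f"{ARTIFACTORY}/api/storage/{ARTIFACT_PATH}",
    params={"properties": "qa.tag"},
    auth=(os.environ["ART_USER"], os.environ["ART_TOKEN"]),
)
tags = resp.json().get("properties", {}).get("qa.tag", []) if resp.ok else []

if "sit-passed" not in tags:
    sys.exit("No sit-passed tag on artifact: refusing to promote to UAT")
print("Gate open: promoting to UAT")
```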

Choosing an Artifact Store

| Package Type | Recommended Tool |
| --- | --- |
| Java (JAR/WAR) | Nexus Repository / JFrog Artifactory |
| Node.js (npm) | Nexus / Verdaccio / GitHub Packages |
| .NET (NuGet) | Nexus / Azure Artifacts |
| Docker images | Harbor / ECR / Docker Hub |
| Python (pip) | Nexus / private PyPI mirror |
| Generic files | Nexus / MinIO / S3 |

Artifact Versioning Strategy

Recommended format:
{service-name}-{semver}-{git-short-sha}
e.g. order-service-1.4.2-a3f9c12

This enables:
├── Exact code → artifact traceability
├── Rollback to any previous version
└── Audit: who built what, from which commit
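
Generating that name in CI takes one call to git. A minimal sketch (the service name and semver would come from your own build metadata):

```python
import subprocess

SERVICE = "order-service"   # from your build metadata
VERSION = "1.4.2"           # from your semver source (git tag, version file, ...)

# The short commit SHA ties the artifact back to the exact source revision.
sha = subprocess.check_output(
    ["git", "rev-parse", "--short", "HEAD"], text=True
).strip()

artifact_name = f"{SERVICE}-{VERSION}-{sha}"
print(artifact_name)   # e.g. order-service-1.4.2-a3f9c12
```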

Retention Policy

| Branch Type | Retention |
| --- | --- |
| Feature branch builds | Keep last 5 builds |
| Release candidates | Keep last 10 builds |
| Production releases | Keep indefinitely (or 1 year) |
| Tagged releases | Never auto-delete |

4. Test Strategy & Automation

The Four Automation Layers

Before writing a single test, define what gets automated at each layer:

┌──────────────────────────────────────────────────────────────┐
│  Layer 4: Infrastructure                                     │
│  Prepare test data and environments                          │
│  Tools: Ansible / Chef / Puppet / Jenkins / Docker           │
├──────────────────────────────────────────────────────────────┤
│  Layer 3: UI Tests                                           │
│  End-to-end functional tests across one or more apps         │
│  Focus on critical user journeys only                        │
│  Tools: Selenium / Appium                                    │
├──────────────────────────────────────────────────────────────┤
│  Layer 2: Service / API Tests                                │
│  Interface interaction tests between services                │
│  Tools: Postman / SoapUI / REST-assured                      │
├──────────────────────────────────────────────────────────────┤
│  Layer 1: Unit Tests                                         │
│  Method / class / package level; integrated into CI          │
│  Tools: xUnit / JUnit / pytest / Jest                        │
└──────────────────────────────────────────────────────────────┘

The Test Pyramid: Effort Distribution

                  ▲
                 /UI\          10%  — few, slow, expensive
                /─────\
               / Service\      20%  — moderate, API-level
              /───────────\
             /  Unit Tests  \  70%  — many, fast, cheap ✅
            /─────────────────\

Invest 70% of testing effort in unit tests, 20% in service/API tests, and only 10% in UI tests. The pyramid shape reflects both quantity and cost — invert it and your pipeline becomes slow and brittle.
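
To make the base of the pyramid concrete, here is what a Layer 1 unit test looks like in pytest; calculate_discount is a made-up business rule, and the point is that such tests need no network or database and run in milliseconds:

```python
# test_pricing.py -- run with pytest; no network, no database, fast feedback.

def calculate_discount(total: float, is_member: bool) -> float:
    """Made-up business rule: members get 10% off orders over 100."""
    if is_member and total > 100:
        return round(total * 0.9, 2)
    return total

def test_member_discount_applies_over_threshold():
    assert calculate_discount(200.0, is_member=True) == 180.0

def test_no_discount_at_or_below_threshold():
    assert calculate_discount(80.0, is_member=True) == 80.0

def test_non_members_never_discounted():
    assert calculate_discount(500.0, is_member=False) == 500.0
```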

Four-Level Environment Test Matrix (L0–L3)

┌─────────────────────────────────────────────────────────────┐
│  L3  Production / Pre-prod                                  │
│      UI automation / performance / exploratory              │
│      Manual + automated    Slow, expensive                  │
├─────────────────────────────────────────────────────────────┤
│  L2  UAT / Test environment (full system + test data)       │
│      UI / performance / acceptance / API tests              │
│      Manual + automated                                     │
├─────────────────────────────────────────────────────────────┤
│  L1  Test / Debug environment (current deploy unit only)    │
│      API tests / component tests                            │
│      Manual + automated                                     │
├─────────────────────────────────────────────────────────────┤
│  L0  CI environment (no external dependencies)              │
│      Unit tests + static code analysis                      │
│      Fully automated    Fast, cheap ✅                      │
└─────────────────────────────────────────────────────────────┘

CI Pipeline Integration: Four-Stage Flow

Stage 1 — Code Commit / PR
─────────────────────────────────────────────────────────────
Developer pushes to Git or opens a PR
     │
     ▼
CI server triggers: compile → static analysis → unit tests (L0)
     │
     ▼
Results notified to developer immediately

Stage 2 — Deploy to SIT (System Integration Test)
─────────────────────────────────────────────────────────────
L0 checks pass
     │
     ▼
Verify artifact MD5 → auto-deploy to SIT environment
(all dependent services available)
     │
     ▼
Register APIs with the API management platform
Run service-to-service API tests (L1)
     │
     ▼
Tests pass → tag artifact: sit-passed

Stage 3 — Deploy to UAT (User Acceptance Test)
─────────────────────────────────────────────────────────────
sit-passed tag confirmed on artifact
     │
     ▼
Verify artifact MD5 → auto-deploy to UAT environment
     │
     ▼
Automated test platform runs:
├── Requirement-linked test cases (targeted)
├── Nightly full regression suite
└── Test case + test data management via platform
     │
     ▼
Tests pass → tag artifact: uat-passed

Stage 4 — Environment Management
─────────────────────────────────────────────────────────────
SIT + UAT environments managed by Environment Deployment Platform:
├── Customized deployment per automation test requirements
├── Pre-condition checks before test execution
└── Environment health validation before each test run

Test Type Comparison

| Test Type | Scope | Speed | Cost | When to Run |
| --- | --- | --- | --- | --- |
| Unit tests | Single method/function | Fast (seconds) | Low | Every commit |
| Integration tests | Multiple modules/services | Medium | Medium | Every PR merge |
| Acceptance tests | Business scenarios end-to-end | Slow | High | Pre-release |
| UI tests | Full browser/mobile simulation | Very slow | Very high | Scheduled / pre-release |

Code Coverage Guidelines

Target: ≥ 80% code coverage

⚠️ Important distinctions:
High coverage ≠ good tests
Coverage tools find UNTESTED code — they don't validate test quality

Practical approach:
1. Don't chase 100% from day one
2. Use coverage reports to identify critical untested paths
3. Prioritize coverage for core business logic first
4. Add tests during refactoring — before changing code, not after
5. Write tests for every bug fix — cover the exact failure scenario

The Refactoring Rule

Before refactoring:
┌──────────────────────────────────────────────┐
│  1. Write acceptance tests around the        │
│     features that will be affected           │
│  2. Run tests → all green ✅                 │
│  3. Refactor                                 │
│  4. Run tests again → still green ✅         │
│  5. Safe to merge                            │
└──────────────────────────────────────────────┘

5. Test & Deployment Environment Preparation

Environment Planning

Typical environment chain:

Dev (local) ──▶ CI ──▶ Test/Debug ──▶ UAT ──▶ Pre-prod ──▶ Production
    L0            L0       L1           L2        L3           L3

One Pipeline, Every Environment

Use the same pipeline and same deployment scripts for every environment — including production.

❌ Anti-pattern:
deploy-to-test.sh    (different script)
deploy-to-uat.sh     (different script)
deploy-to-prod.sh    (different script)
→ Scripts diverge over time → production surprises

✅ Best practice:
deploy.sh --env=test
deploy.sh --env=uat
deploy.sh --env=prod
→ One script, environment-specific config injected at runtime

Config / Script Separation

Different environments have different IPs, OS configs, and middleware settings — but that doesn't mean different scripts:

├── deploy.sh                  # single deployment script (env-agnostic)
└── config/
      ├── base.yml             # shared across all envs
      ├── test.yml             # test env: IPs, ports, DB endpoints
      ├── uat.yml              # UAT env overrides
      └── production.yml       # prod env overrides

Secrets managed separately via:
Vault / AWS Secrets Manager / K8s Secrets / .env (never committed)
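
The merge logic itself is small: load base.yml, overlay the environment file, and run one code path for every environment. A sketch using PyYAML with the layout above (the shallow merge and the db_endpoint key are assumptions; nested configs may need a deep merge):

```python
# deploy.py --env=uat  -> one code path, environment-specific config injected.
import argparse
import yaml   # PyYAML: pip install pyyaml

def load_config(env: str) -> dict:
    with open("config/base.yml") as f:
        config = yaml.safe_load(f) or {}
    with open(f"config/{env}.yml") as f:
        overrides = yaml.safe_load(f) or {}
    config.update(overrides)   # shallow merge: env values win over base values
    return config

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", required=True, choices=["test", "uat", "prod"])
    args = parser.parse_args()
    cfg = load_config("production" if args.env == "prod" else args.env)
    print(f"Deploying to {args.env} (DB endpoint: {cfg.get('db_endpoint')})")
    # ... identical deployment steps for every environment from here on
```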

Smoke Testing After Every Deploy

After deployment completes, always run a smoke test before marking the deploy successful:

Deploy completes
     │
     ▼
Smoke test script runs automatically:
├── Health check endpoint:  GET /health → 200 OK ✅
├── Core API check:         GET /api/v1/ping → expected response ✅
└── Dependency checks:
      ├── Database:  query returns (empty is OK, connection must succeed) ✅
      └── Cache:     Redis ping → PONG ✅
     │
     ▼
All checks pass → deploy marked successful → pipeline proceeds
Any check fails → deploy marked failed → alert sent → rollback triggered
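
A smoke test script in this spirit can be a few dozen lines. A sketch using the requests and redis-py clients; the URLs, hostnames, and endpoints are illustrative:

```python
# smoke_test.py -- exits non-zero so the pipeline marks the deploy as failed.
import sys
import requests
import redis   # redis-py: pip install redis

BASE_URL = "https://uat.example.com"     # injected per environment in practice
REDIS_HOST = "uat-redis.example.com"     # illustrative hostname

def check(name: str, fn) -> bool:
    try:
        ok = fn()
    except Exception as exc:             # connection refused, timeout, ...
        print(f"FAIL  {name}: {exc}")
        return False
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
    return ok

results = [
    check("health endpoint", lambda: requests.get(f"{BASE_URL}/health", timeout=5).status_code == 200),
    check("core API ping",   lambda: requests.get(f"{BASE_URL}/api/v1/ping", timeout=5).status_code == 200),
    check("redis ping",      lambda: redis.Redis(host=REDIS_HOST, socket_timeout=5).ping()),
]

sys.exit(0 if all(results) else 1)
```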

Why smoke tests matter:

| Without smoke test | With smoke test |
| --- | --- |
| Deploy "succeeds" but app is broken | Broken deploy detected immediately |
| Team discovers issue via user reports | Pipeline catches it before users do |
| Hard to tell: app bug or env issue? | Clear diagnosis: which dependency failed |

Smoke tests are also called deployment verification tests — they confirm the app is alive and its dependencies are reachable, not that every feature works correctly. That's what the full test suite is for.

Deployment Automation Checklist

| Item | Options |
| --- | --- |
| Scripting language | Python / Shell / PowerShell / Ansible |
| Multi-environment config | Environment-specific config files; secrets via vault |
| Multi-branch deployment | Branch → environment mapping (e.g. develop → test, main → prod) |
| Rollback mechanism | Re-deploy previous artifact version |
| Health check after deploy | Smoke test script before marking deploy successful |
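
Since every artifact is versioned and kept in the artifact store (section 3), rollback is just a deploy of the previous version. A sketch with hypothetical, stubbed helpers standing in for your artifact-store query and your deploy entry point:

```python
# rollback.py -- roll back by re-deploying the previous artifact version.
import sys

def list_versions(service: str) -> list[str]:
    """Hypothetical stub: in practice, query your artifact store, newest first."""
    return ["1.4.2-a3f9c12", "1.4.1-98d0b44"]

def deploy(service: str, version: str, env: str) -> None:
    """Hypothetical stub: in practice, invoke the same deploy script as always."""
    print(f"deploy.sh --env={env} {service}:{version}")

def rollback(service: str, env: str) -> None:
    versions = list_versions(service)
    if len(versions) < 2:
        sys.exit("No previous version to roll back to")
    # versions[0] is the broken current release; redeploy the one before it.
    deploy(service, version=versions[1], env=env)

rollback("order-service", "prod")
```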

6. Building a CI Culture

Tools alone don't make CI work. The team's habits and mindset matter just as much.

Core CI Principles

① Integrate early and often

Time without merging ──▶ conflict risk grows exponentially

Day 1:  small diff, easy merge ✅
Day 5:  medium diff, some conflicts ⚠️
Day 14: large diff, painful merge, possible rework ❌

If a branch might conflict with another in-flight branch, add a feature flag or release toggle to prevent them from interfering at deploy time.
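
A feature flag can be as small as a config lookup wrapped around the new code path. A minimal sketch with an environment-variable flag source (teams often use a dedicated flag service instead); the checkout flows are stand-ins:

```python
import os

def feature_enabled(name: str) -> bool:
    """Minimal flag source: an env var such as FEATURE_NEW_CHECKOUT=1."""
    return os.environ.get(f"FEATURE_{name.upper()}", "0") == "1"

def legacy_checkout_flow(cart):   # stand-in for the existing code path
    return "legacy"

def new_checkout_flow(cart):      # stand-in for the in-flight branch's code path
    return "new"

def checkout(cart):
    # Both paths are merged to main; the new one stays dark until the flag flips.
    if feature_enabled("new_checkout"):
        return new_checkout_flow(cart)
    return legacy_checkout_flow(cart)
```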

② Keep the build green at all times

Build breaks → fix immediately, not "later"

Why:
├── A broken build blocks everyone downstream
├── The longer it stays broken, the more errors accumulate
└── Developers lose trust in the CI system

Fast feedback is critical — unit tests should return results in minutes. If the full test suite is slow, run unit tests first and return results to the developer before they context-switch to another task.

③ Make build status visible

Options:
├── Dashboard / build radiator (visible to whole team)
├── Slack / Teams notifications on failure
├── Email to the committer who broke the build
└── PR status checks (block merge if CI fails)

④ Tests are part of the definition of done

Feature complete ≠ done
Feature complete + tests passing = done ✅

Benefits:
├── Reduces regression time per iteration
├── Gives confidence when merging to main
└── New features don't break existing behavior

⑤ Write tests when fixing bugs

Bug reported
     │
     ▼
Reproduce the bug → write a failing test that captures it
     │
     ▼
Fix the bug → test passes ✅
     │
     ▼
Bug can never silently reappear
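
In pytest terms, the failing-test-first step might look like this; the bug ID and the parse_amount function are illustrative:

```python
# test_bug_1234_regression.py -- written before the fix, so it fails first.

def parse_amount(raw: str) -> float:
    """The fixed function: it previously raised ValueError on '1,234.50'."""
    return float(raw.replace(",", ""))

def test_bug_1234_thousands_separator():
    # The exact failure scenario from the bug report.
    assert parse_amount("1,234.50") == 1234.50
```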

CI Culture Maturity Model

Level 0: No CI — manual builds, manual testing
Level 1: CI exists — automated build on commit
Level 2: CI + tests — automated build + unit tests on every commit
Level 3: CI + full pipeline — build + test + artifact + deploy to test env
Level 4: CI/CD — automated deployment to production with confidence
