James Lee

Posted on May 17

CMDB Design: From Business-Centric Modeling to Intelligent Ops

#architecture #devops #infrastructure #systemdesign

The Biggest CMDB Design Mistake

The most common CMDB anti-pattern is trying to build a massive, all-encompassing attribute table — attempting to capture every attribute of every ops object upfront.

This approach fails because it starts from scattered ops objects rather than from the business. The result: enormous effort, poor adoption, and a system nobody trusts.

The right approach: start from business relationships, not from attributes.

The Three-Layer Business Model

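A well-designed CMDB is built around three core layers (Business, Cluster, Module), read top-down, with the machines that run each module and their attributes and relations beneath them: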

┌─────────────────────────────────────────────┐
│              Business / Project             │  ← e.g. "Payment Service"
├─────────────────────────────────────────────┤
│              Cluster / Zone                 │  ← e.g. "Production Cluster A"
├─────────────────────────────────────────────┤
│              Module / Service               │  ← e.g. "Order API", "Auth Service"
├─────────────────────────────────────────────┤
│              Machines / Hosts               │  ← physical or virtual servers
├─────────────────────────────────────────────┤
│              Attributes & Relations         │  ← properties, dependencies
└─────────────────────────────────────────────┘

In the gaming industry, these layers are typically called: Project → Zone → Service.

The right design question to ask:

"I operate a business. What clusters does it have? What modules are in each cluster? What machines run each module? What attributes do those machines have? How do those attributes relate to each other?"

Build your CMDB by answering these questions — not by collecting every possible attribute from scratch.
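As a minimal sketch, the questions above map onto a nested data model. The class names, field names, and host names here are illustrative assumptions, not a schema from any particular CMDB product:

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    hostname: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Module:
    name: str
    hosts: list[Host] = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    modules: list[Module] = field(default_factory=list)


@dataclass
class Business:
    name: str
    clusters: list[Cluster] = field(default_factory=list)

    def hosts(self) -> list[str]:
        """Answer 'what machines run this business?' by walking top-down."""
        return [
            host.hostname
            for cluster in self.clusters
            for module in cluster.modules
            for host in module.hosts
        ]


# Hypothetical example data, reusing the names from the diagram above.
payment = Business(
    name="Payment Service",
    clusters=[
        Cluster(
            name="Production Cluster A",
            modules=[
                Module("Order API", [Host("host-01", {"cpu_cores": "8"})]),
                Module("Auth Service", [Host("host-02"), Host("host-03")]),
            ],
        )
    ],
)
print(payment.hosts())
```

Attributes only appear at the bottom of the tree, attached to a host that already has a place in the business hierarchy.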

Configuration Items (CIs)

A Configuration Item (CI) is any managed object in the CMDB — a host, a domain name, an IP address, a service, etc.

Three Ways to Populate CI Attributes

Using a host as an example:

┌──────────────────────────────────────────────────────────────┐
│  CI Attribute Sources                                        │
│                                                              │
│  1. Agent Auto-Discovery                                     │
│     cpu, memory, disk, network interfaces                    │
│     Tool: Python psutil, puppet facts, ansible setup         │
│                                                              │
│  2. External System APIs                                     │
│     EIP, Region, Zone (cloud providers)                      │
│     Sources: AWS/Aliyun API, Zabbix API, K8s API             │
│              other business system APIs                      │
│                                                              │
│  3. Manual Entry                                             │
│     Business-Cluster-Module relationships                    │
│     Which services run on which hosts                        │
│     Goal: minimize manual fields as much as possible         │
└──────────────────────────────────────────────────────────────┘

Source	Examples	Automation Level
Agent	CPU, memory, disk, NICs	Fully automatic
Cloud/External API	EIP, Region, Zone, K8s pods	Automatic via API
Manual Entry	Business relationships, service assignments	Human input
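
One way to combine the three sources is to merge them per CI and record where each value came from. The sketch below is illustrative only: the priority order (agent over external API over manual entry) is an assumption, the region and zone values are placeholders, and the standard library stands in for a real agent such as psutil:

```python
import os
import platform
import shutil

# Higher number wins when two sources report the same attribute.
# This ordering is an assumption for illustration, not a rule from the article.
SOURCE_PRIORITY = {"manual": 1, "external_api": 2, "agent": 3}


def collect_agent_facts() -> dict:
    """Stand-in for an agent: gathers a few host facts using only the stdlib."""
    return {
        "hostname": platform.node(),
        "cpu_cores": os.cpu_count(),
        "disk_total_bytes": shutil.disk_usage(os.getcwd()).total,
    }


def merge_attributes(reports: dict[str, dict]) -> dict[str, dict]:
    """Merge per-source attribute dicts, keeping the source of each value."""
    merged = {}
    # Apply low-priority sources first so higher-priority ones overwrite them.
    for source in sorted(reports, key=SOURCE_PRIORITY.__getitem__):
        for key, value in reports[source].items():
            merged[key] = {"value": value, "source": source}
    return merged


ci = merge_attributes({
    "agent": collect_agent_facts(),
    "external_api": {"region": "example-region-1", "zone": "example-zone-a"},
    # The manually entered cpu_cores is overridden by the agent's measurement.
    "manual": {"business": "Payment Service", "module": "Order API", "cpu_cores": 4},
})
for key, entry in ci.items():
    print(f"{key}: {entry['value']} (from {entry['source']})")
```

Keeping the source next to each value makes it easy to see how many fields still depend on manual entry.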

Three Evolutionary Stages of CMDB

Stage 1 — CMDB 1.0: The Pain Points

Early CMDB systems (e.g. based on OneCMDB) used a CI model with key-value storage. This worked for small-scale infrastructure but broke down as the business grew.
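
To make "key-value storage" concrete, here is a rough sketch of what such a layout could look like. The table and column names are hypothetical and do not reflect OneCMDB's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (CI, attribute) pair; every value is stored as free-form text.
conn.execute("CREATE TABLE ci_attribute (ci_id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO ci_attribute VALUES (?, ?, ?)",
    [
        ("host-01", "cpu_cores", "8"),
        ("host-01", "region", "example-region-1"),
        ("host-02", "cpu_cores", "eight"),  # nothing in the schema rejects this
    ],
)


def load_ci(ci_id: str) -> dict[str, str]:
    """Reassemble one CI from its key-value rows."""
    rows = conn.execute(
        "SELECT key, value FROM ci_attribute WHERE ci_id = ?", (ci_id,)
    )
    return dict(rows.fetchall())


print(load_ci("host-01"))
print(load_ci("host-02"))
```

A layout like this accepts any attribute for any CI, which is flexible, but it also means types, required fields, and relationships are not enforced by the store itself.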

Core problems:

Pain Point	Description
Incomplete model	Config scope and coverage were insufficient; relationships and attributes were poorly defined
High maintenance cost	No lifecycle management; data relied entirely on manual updates; always lagged behind reality
Poor data quality	No validation rules or sync checks; ops teams stopped trusting the data

CMDB 1.0 failure pattern:

Manual updates ──▶ data lag ──▶ ops teams distrust data
                                      │
                                      ▼
                            teams maintain shadow spreadsheets
                                      │
                                      ▼
                            CMDB becomes irrelevant

Stage 2 — CMDB 2.0: Application-Centric Design

Starting in 2016, the focus shifted to building an application-centric CMDB with full lifecycle management and flexible API integration.

Two core principles:

① Application-centric modeling

Three-layer model:

┌──────────────────────────────────────┐
│  Application Layer                   │  ← services, APIs, business apps
├──────────────────────────────────────┤
│  Logical Layer                       │  ← clusters, modules, deployments
├──────────────────────────────────────┤
│  Physical Layer                      │  ← hosts, networks, storage
└──────────────────────────────────────┘

This layered model enables ops teams to quickly query the full resource topology for any application — critical for change management and incident diagnosis.
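
A topology query of this kind can be sketched as a breadth-first walk over CI relations. The relation data and the "type:name" identifiers below are made up for illustration:

```python
from collections import deque

# Hypothetical relation data: each CI maps to the CIs it directly depends on.
RELATIONS = {
    "app:order-api": ["cluster:prod-a"],
    "cluster:prod-a": ["host:host-01", "host:host-02"],
    "host:host-01": ["network:vlan-10"],
    "host:host-02": ["network:vlan-10", "storage:vol-7"],
}


def resource_topology(root: str, relations: dict[str, list[str]]) -> list[str]:
    """Return every CI reachable from `root`, nearest layers first."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in relations.get(current, []):
            if child not in seen:  # shared resources are listed only once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(resource_topology("app:order-api", RELATIONS))
```

Starting from the application, the walk passes through the logical layer and ends at the physical resources, which is the view a change reviewer or on-call engineer typically needs.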

② Flexibility and extensibility

CMDB 2.0 introduced six key capabilities:

Capability	Description
Dynamic model extension	Define CIs, attributes, relationships, data types, and uniqueness constraints online — no code changes
Multi-dimensional queries	Custom multi-field combined queries; full-text search across all CIs
Dynamic API generation	Define and test REST API endpoints online; no deployment required
Fine-grained permissions	Row-level and column-level data access control
Audit logs	Full history of all data changes across the platform
Version baseline & rollback	Compare and roll back both model versions and data versions
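
As a rough illustration of dynamic model extension, the sketch below treats CI models as data, so a new CI type with its own data types and uniqueness constraints can be registered at runtime. `ModelRegistry` and its methods are hypothetical names, and a real platform would persist models rather than keep them in memory:

```python
class ModelRegistry:
    """In-memory CI model registry; models are data, so adding one needs no code change."""

    def __init__(self):
        self.models = {}   # model name -> {"fields": {...}, "unique": [...]}
        self.records = {}  # model name -> list of stored CI dicts

    def define_model(self, name, fields, unique=()):
        self.models[name] = {"fields": dict(fields), "unique": list(unique)}
        self.records.setdefault(name, [])

    def create_ci(self, model_name, data):
        model = self.models[model_name]
        # Check presence and data type of every declared field.
        for field_name, field_type in model["fields"].items():
            if field_name not in data:
                raise ValueError(f"missing field: {field_name}")
            if not isinstance(data[field_name], field_type):
                raise TypeError(f"{field_name} must be {field_type.__name__}")
        # Enforce uniqueness constraints against already stored CIs.
        for field_name in model["unique"]:
            existing = self.records[model_name]
            if any(r[field_name] == data[field_name] for r in existing):
                raise ValueError(f"duplicate value for unique field: {field_name}")
        self.records[model_name].append(dict(data))


registry = ModelRegistry()
# Register a new CI type at runtime, with a uniqueness constraint on "name".
registry.define_model("domain", {"name": str, "ttl": int}, unique=["name"])
registry.create_ci("domain", {"name": "pay.example.com", "ttl": 300})

try:
    registry.create_ci("domain", {"name": "pay.example.com", "ttl": 60})
except ValueError as err:
    print(f"rejected: {err}")
```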

Stage 3 — CMDB 3.0: Microservices Architecture

As more systems depended on CMDB 2.0, the monolithic architecture became a liability. A single component failure (rules engine, audit, reporting, API) could bring down the entire platform.

CMDB 3.0 solution: microservices decomposition

CMDB 2.0 (monolith):
┌──────────────────────────────────────────┐
│  Rules │ Audit │ Reports │ API │ Web UI  │  ← one failure = full outage
└──────────────────────────────────────────┘

CMDB 3.0 (microservices):
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│  Rules   │ │  Audit   │ │ Reports  │ │   API    │ │  Web UI  │
│ Service  │ │ Service  │ │ Service  │ │ Gateway  │ │  (Vue)   │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
     │            │            │            │            │
     └────────────┴────────────┴────────────┴────────────┘
                            Dubbo RPC

Key upgrades:

Each functional module isolated as an independent microservice
Dubbo framework for service governance
Vue.js frontend — improved UX, better team collaboration, lower development risk

Ensuring Data Accuracy

Data accuracy is the lifeblood of CMDB. A CMDB with stale or incorrect data is worse than no CMDB — it actively misleads ops teams.

Strategy 1: Lifecycle Management + Automated Workflows

Every CI has a defined lifecycle:

CI Lifecycle:

[Provisioning] ──▶ [In Service] ──▶ [Maintenance] ──▶ [Decommission]
      │                 │                  │                  │
      ▼                 ▼                  ▼                  ▼
 ITSM flow        auto-update         change flow        cleanup flow
 creates CI       via agent/API       updates attrs       removes CI

ITSM process automation drives state transitions — data updates happen as a side effect of normal ops workflows, not as separate manual tasks.
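
The lifecycle can be sketched as a small state machine in which every transition is tied to the workflow that caused it. The transitions follow the diagram above; the path from Maintenance back to In Service and the workflow IDs are assumptions added for illustration:

```python
from enum import Enum


class State(Enum):
    PROVISIONING = "provisioning"
    IN_SERVICE = "in_service"
    MAINTENANCE = "maintenance"
    DECOMMISSIONED = "decommissioned"


# Allowed transitions. MAINTENANCE -> IN_SERVICE is an assumption; the
# diagram only shows the forward path.
ALLOWED = {
    State.PROVISIONING: {State.IN_SERVICE},
    State.IN_SERVICE: {State.MAINTENANCE},
    State.MAINTENANCE: {State.IN_SERVICE, State.DECOMMISSIONED},
    State.DECOMMISSIONED: set(),
}


def transition(ci: dict, target: State, workflow_id: str) -> dict:
    """Return an updated copy of the CI, recording which workflow moved it."""
    current = ci["state"]
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    history = ci["history"] + [(current.value, target.value, workflow_id)]
    return {**ci, "state": target, "history": history}


host = {"id": "host-01", "state": State.PROVISIONING, "history": []}
host = transition(host, State.IN_SERVICE, workflow_id="REQ-1001")
host = transition(host, State.MAINTENANCE, workflow_id="CHG-2002")
print(host["state"].value, host["history"])
```

Because the only way to change state is through `transition`, the CMDB record is updated as part of the workflow itself.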

Strategy 2: Drive Data Consumption

CMDB data stays accurate when it's actively used:

CMDB ──▶ ITSM (200+ integrated workflows)
     ──▶ Monitoring Platform (topology-aware alerting)
     ──▶ Capacity Management (resource utilization reports)
     ──▶ Release Platform (deployment targeting)
     ──▶ Intelligent Ops Platform (root cause analysis)

Like a pond: water stays clean only when it flows. CMDB data stays accurate only when it's actively consumed.

The more systems that read from and write to CMDB, the more incentive teams have to keep it accurate.

Strategy 3: Validation, Sync Checks & Audits

Data quality assurance:

┌──────────────────────────────────────────────────────┐
│  Rule Validation                                     │
│  ← logical consistency checks on write               │
├──────────────────────────────────────────────────────┤
│  Cross-system Sync Comparison                        │
│  ← compare CMDB data vs actual system state          │
│    (agent reports, cloud API, monitoring data)       │
├──────────────────────────────────────────────────────┤
│  Manual Sampling Audit                               │
│  ← periodic human review of random CI samples        │
└──────────────────────────────────────────────────────┘
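
A cross-system sync comparison can be as simple as diffing what the CMDB has recorded against what an agent or cloud API currently reports. This is a minimal sketch with made-up field names and values:

```python
def find_drift(recorded: dict, observed: dict) -> dict:
    """Compare CMDB values against observed system state.

    Only keys present in `observed` are checked: fields the external
    source cannot see (e.g. manually entered ones) are skipped.
    """
    drift = {}
    for key in sorted(observed):
        cmdb_value = recorded.get(key)
        if cmdb_value != observed[key]:
            drift[key] = {"cmdb": cmdb_value, "actual": observed[key]}
    return drift


# Hypothetical CMDB record and agent report for the same host.
cmdb_record = {"hostname": "host-01", "cpu_cores": 8, "memory_gb": 32,
               "business": "Payment Service"}
agent_report = {"hostname": "host-01", "cpu_cores": 8, "memory_gb": 64,
                "kernel": "5.15"}

print(find_drift(cmdb_record, agent_report))
```

Run on a schedule, a report like this turns "the data lags behind reality" into a concrete list of fields to fix or to automate.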

CMDB as the Foundation of Intelligent Ops

A mature CMDB becomes the central nervous system of the entire ops platform:

┌──────────────────────────────────────────────────────────────┐
│                         CMDB                                 │
│              (source of truth for all ops data)              │
└──────┬───────────────┬───────────────┬───────────────┬───────┘
       │               │               │               │
       ▼               ▼               ▼               ▼
┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│   ITSM     │  │ Monitoring │  │  Capacity  │  │  Release   │
│ (200+ flows│  │ (topology- │  │ Management │  │  Platform  │
│  use CMDB) │  │  aware     │  │ (cost/util)│  │(targeting) │
└────────────┘  └────────────┘  └────────────┘  └────────────┘

Three value dimensions:

Dimension	How CMDB Contributes
Business process driver	All resource requests, deployments, and change workflows are driven by CMDB relationships
Data-driven operations	Capacity planning, cost accounting, and business analysis are powered by CMDB data
Intelligent ops enablement	End-to-end topology views enable automated root cause analysis and faster incident recovery
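
To show how topology data can feed root cause analysis, the sketch below applies one simple heuristic: among the CIs that are currently alerting, the likely root causes are those with no alerting dependency beneath them. The dependency map, CI names, and the heuristic itself are illustrative assumptions, not the article's algorithm:

```python
# Hypothetical dependency map: each CI lists what it directly depends on.
DEPENDS_ON = {
    "app:order-api": ["host:host-01", "db:orders"],
    "db:orders": ["host:host-02"],
    "host:host-01": [],
    "host:host-02": [],
}


def downstream(ci: str, depends_on: dict[str, list[str]]) -> set[str]:
    """All CIs that `ci` depends on, directly or transitively."""
    seen = set()
    stack = list(depends_on.get(ci, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return seen


def root_cause_candidates(alerting: set[str], depends_on: dict) -> list[str]:
    """Alerting CIs that have no alerting CI among their dependencies."""
    return sorted(
        ci for ci in alerting if not (downstream(ci, depends_on) & alerting)
    )


alerts = {"app:order-api", "db:orders", "host:host-02"}
print(root_cause_candidates(alerts, DEPENDS_ON))
```

Three alerts collapse into one candidate here because the CMDB knows that the application depends on the database, which in turn depends on the failing host.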
