MournfulCord

Posted on Apr 28 • Edited on May 3

How a Simple Python Validator Prevents Config Outages

#python #tutorial #devops #sre

Not all outages I’ve seen were caused by exotic bugs or failing hardware. A majority were caused by simple mistakes: broken configs, malformed YAML, wrong types, or a missing field that nobody noticed before pushing.

I built this tool after hearing how a config push took down a service because someone missed a single field. It wasn’t a devastating failure, just a small oversight that slipped through, but it still cost hours to unwind. That stuck with me. A lightweight validator can prevent the kind of mistakes that quietly pile up and turn into outages. Not every incident is that forgiving, so catching mistakes early matters.

And the thing is, you don’t always need a full schema system to avoid that, either. Sometimes a small, focused validator is enough to catch obvious problems before they become real ones.

This post walks through a Python tool that: loads YAML or JSON configs, checks for required keys and types, and prints clear, actionable errors.

If you work with configs at all, this is one more tool you can keep in your belt.

Why a config validator matters

Configs break more often than hardware. A single wrong indent, a missing key, or a string where an integer should be is all that it takes.

A lightweight validator gives you:

Fast feedback: Catch errors before they hit CI.
Consistent checks: Ensure every config follows the same rules.
Fewer surprises: Reduce the chance of deployment-day outages.

It’s not meant to replace full schema validation, but to prevent the obvious stuff from becoming outages. The idea behind the tool is simple: load the config, check the structure, and fail fast if something looks wrong.

Loading YAML or JSON in Python

Here’s the loader function:

import json
from pathlib import Path
import yaml  # pip install pyyaml

def load_config(path: str) -> dict:
    file = Path(path)

    if not file.exists():
        raise FileNotFoundError(f"Config file not found: {path}")

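    text = file.read_text(encoding="utf-8")
    suffix = file.suffix.lower()  # treat .YAML and .Json like their lowercase forms

    if suffix in (".yaml", ".yml"):
        data = yaml.safe_load(text)  # None for an empty document
    elif suffix == ".json":
        data = json.loads(text)
    else:
        raise ValueError(f"Unsupported file type: {file.suffix} (use YAML or JSON)")

    # The checks that follow expect a mapping, so reject empty files, lists, and scalars here.
    if not isinstance(data, dict):
        raise ValueError(f"Config root must be a mapping, got {type(data).__name__}")
    return data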
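Malformed input deserves the same kind of clear message as a missing file. As a minimal sketch (the helper name `parse_json` and its `source` label are illustrative and not part of the original tool), the standard library's `json.JSONDecodeError` exposes `lineno`, `colno`, and `msg`, which is enough to point at the spot that broke:

```python
import json


def parse_json(text: str, source: str = "<config>") -> object:
    """Parse JSON, re-raising syntax errors with a source:line:col prefix.

    Hypothetical helper for illustration; `source` is only a label for messages.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are 1-based positions reported by the json module.
        raise ValueError(f"{source}:{exc.lineno}:{exc.colno}: {exc.msg}") from exc
```

PyYAML raises `yaml.YAMLError` for malformed YAML, so the YAML branch could be wrapped in the same way.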

Defining required keys and expected types

This is the simplest form of validation: “Does this key exist, and is it the right type?”

REQUIRED_KEYS = {
    "service_name": str,
    "port": int,
    "debug": bool,
    "allowed_hosts": list,
}

Validating the config

This checks for missing keys and wrong types:

from typing import List

def validate_config(cfg: dict) -> List[str]:
    errors: List[str] = []

    for key, expected_type in REQUIRED_KEYS.items():
        if key not in cfg:
            errors.append(f"Missing required key: '{key}'")
            continue

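        value = cfg[key]
        # bool is a subclass of int, so reject True/False where another type is expected.
        is_stray_bool = isinstance(value, bool) and expected_type is not bool
        if is_stray_bool or not isinstance(value, expected_type):
            errors.append(
                f"Invalid type for '{key}': expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )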

    return errors

Example: good vs. bad config

Good YAML:

service_name: api-gateway
port: 8080
debug: false
allowed_hosts:
- "example.com"
- "api.example.com"

Bad YAML:

service_name: api-gateway
port: "8080"   # wrong type
debug: "false" # wrong type
# allowed_hosts missing entirely

Output

Invalid type for 'port': expected int, got str
Invalid type for 'debug': expected bool, got str
Missing required key: 'allowed_hosts'
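To reproduce that output without a YAML file on disk, here is a self-contained sketch. It uses a condensed copy of the required-key check and a plain dict standing in for what `yaml.safe_load` returns for the bad YAML (quoted scalars stay strings); the variable name `bad_cfg` is just for illustration.

```python
from typing import List

REQUIRED_KEYS = {
    "service_name": str,
    "port": int,
    "debug": bool,
    "allowed_hosts": list,
}


def validate_config(cfg: dict) -> List[str]:
    """Condensed copy of the required-key and type check."""
    errors: List[str] = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in cfg:
            errors.append(f"Missing required key: '{key}'")
            continue
        value = cfg[key]
        # bool is a subclass of int, so reject True/False where int is expected.
        is_stray_bool = isinstance(value, bool) and expected_type is not bool
        if is_stray_bool or not isinstance(value, expected_type):
            errors.append(
                f"Invalid type for '{key}': expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


# Parsed form of the bad YAML: "8080" and "false" are strings, allowed_hosts is absent.
bad_cfg = {"service_name": "api-gateway", "port": "8080", "debug": "false"}

for error in validate_config(bad_cfg):
    print(error)
```

Errors come out in the order the keys are declared in `REQUIRED_KEYS`, which is why the missing `allowed_hosts` is reported last.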

Hostname regex validation

import re

HOST_REGEX = re.compile(r"^[a-zA-Z0-9.-]+$")

def validate_hosts(cfg: dict, errors: List[str]) -> None:
    hosts = cfg.get("allowed_hosts")
    if not isinstance(hosts, list):
        return

    # fullmatch rejects a trailing newline, which match() with "$" would let through.
    invalid = [
        h for h in hosts
        if not isinstance(h, str) or not HOST_REGEX.fullmatch(h)
    ]

    for host in invalid:
        errors.append(f"Invalid host value: '{host}'")
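The character-class regex above is deliberately loose: it accepts values such as `...` or `-` that are made of allowed characters but are not usable hostnames. If that matters for your configs, one possible tightening is to check each dot-separated label. This is a sketch under the usual DNS limits (labels of 1 to 63 characters, 253 characters overall), and `is_valid_hostname` is a hypothetical name, not part of the original tool:

```python
import re

# One DNS label: 1-63 characters, alphanumeric at both ends, hyphens allowed inside.
LABEL_REGEX = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(host: object) -> bool:
    """Return True if host is a string made of valid dot-separated labels."""
    if not isinstance(host, str) or not host or len(host) > 253:
        return False
    # An empty label (leading, trailing, or doubled dot) fails fullmatch.
    return all(LABEL_REGEX.fullmatch(label) for label in host.split("."))
```

Note that this version also rejects a trailing dot (`example.com.`), so relax it if your configs use fully qualified names in that form.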

Nested key validation (for example: database config)

def validate_database(cfg: dict, errors: List[str]) -> None:
    db = cfg.get("database")
    if db is None:
        return

    if not isinstance(db, dict):
        errors.append(
            f"Invalid type for 'database': expected dict, got {type(db).__name__}"
        )
        return

    if "host" not in db:
        errors.append("Missing 'database.host'")

    # bool is a subclass of int, so it is excluded explicitly below.
    if "port" not in db:
        errors.append("Missing 'database.port'")
    elif isinstance(db["port"], bool) or not isinstance(db["port"], int):
        errors.append(
            f"Invalid type for 'database.port': expected int, got {type(db['port']).__name__}"
        )

Run it with:

python validator.py config.yaml
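That command needs an entry point that ties the pieces together. The post does not show one, so the following is a sketch of how it might look. To keep it runnable with the standard library alone, `load_config` and `validate_config` here are small stand-ins (JSON only, one required key); in the real script they would be the functions defined earlier, followed by calls to `validate_hosts` and `validate_database`. The exit codes are an assumption as well.

```python
import json
import sys
from pathlib import Path
from typing import List


def load_config(path: str) -> dict:
    """Stand-in loader (JSON only) so the sketch runs without PyYAML."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def validate_config(cfg: dict) -> List[str]:
    """Stand-in check; the real script would run the validators from this post."""
    if not isinstance(cfg, dict) or "service_name" not in cfg:
        return ["Missing required key: 'service_name'"]
    return []


def main(argv: List[str]) -> int:
    """Return 0 for a valid config, 1 for validation errors, 2 for usage or load errors."""
    if len(argv) != 2:
        print("Usage: python validator.py <config-file>", file=sys.stderr)
        return 2

    try:
        cfg = load_config(argv[1])
    except (OSError, ValueError) as exc:
        # FileNotFoundError is an OSError; json.JSONDecodeError is a ValueError.
        print(f"Could not load config: {exc}", file=sys.stderr)
        return 2

    errors = validate_config(cfg)
    for error in errors:
        print(f"ERROR: {error}", file=sys.stderr)
    if errors:
        return 1

    print("Config OK")
    return 0
```

Ending `validator.py` with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard turns the return value into the process exit code, which is what lets a CI job or pre-commit hook fail the build when the config is invalid.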

Why this helps in real environments

In the field, the smallest mistakes cause the biggest (and most unnecessary) headaches. A missing key, a wrong type, a bad indent: none of it looks serious until it takes down something that people rely on.

Tools like this don’t replace experience, but they make your experience count. They catch the boring failures before they turn into outages that take up your entire day.

What’s next

In a follow‑up post, I’ll show how I turned the same idea into a small Java CLI for environments where Python isn’t available or where teams prefer a single binary.

If you’ve got ideas for other checks you’d want to see in a validator, I’d love to hear them. The best tools come from real problems.

Top comments (2)

Rahul Joshi • May 1

Config-as-Code is only as strong as its validation layer; using a Python validator to catch schema drifts before they hit production is a textbook example of a proactive 'Shift Left' strategy. I especially appreciate how you’ve highlighted that preventing a single outage through automated checks is worth more than hours of reactive debugging!

MournfulCord • May 1

Thanks, I appreciate that. Shift‑Left only works when the validation layer is treated as part of the system, not an afterthought. Catching drift early has saved me more than once, so I’m glad that came through in the post!