Last month I pushed a Kubernetes deployment manifest that looked perfectly fine. The CI pipeline failed silently, the deployment never rolled out, and I spent forty minutes reading logs before I realized the problem was a single tab character in my YAML file. YAML doesn't allow tabs for indentation. Only spaces. And the error message from kubectl was completely unhelpful.
YAML is everywhere in modern development. Docker Compose, GitHub Actions, Kubernetes, Ansible, CloudFormation, Helm charts, Swagger/OpenAPI specs. If you work with infrastructure or CI/CD, you write YAML daily. And YAML has a talent for failing in ways that are hard to diagnose.
Here's why, and what to watch for.
Indentation is structure, not style
In JSON, indentation is cosmetic. You can minify an entire JSON object onto one line and it parses identically. In YAML, indentation is the syntax. It defines the hierarchy of your data. And the rules are strict.
# This is valid
server:
  port: 8080
  host: localhost
# This is broken (tab character before port)
server:
	port: 8080
The second example uses a tab. Most editors display tabs and spaces identically at default settings. You cannot see the difference without enabling whitespace rendering. This is the single most common source of YAML parsing errors, and the reason experienced developers configure their editors to insert spaces when the tab key is pressed for any YAML file.
Your .editorconfig should include:
[*.{yml,yaml}]
indent_style = space
indent_size = 2
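If you want to hunt for tabs without relying on editor settings, a few lines of Python will do it. This is a minimal sketch that uses only the standard library; the function name find_tab_indentation and the sample text are my own, chosen for illustration.

```python
import re


def find_tab_indentation(text):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Look only at the leading whitespace, not at tabs inside values.
        leading = re.match(r"[ \t]*", line).group()
        if "\t" in leading:
            hits.append(number)
    return hits


# Hypothetical sample: line 3 is indented with a tab instead of spaces.
sample = "server:\n  port: 8080\n\thost: localhost\n"
print(find_tab_indentation(sample))  # [3]
```

It does not parse YAML at all, so it works even on files that are too broken to load.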
The boolean trap
YAML has implicit typing. It tries to infer the data type of unquoted values. This leads to a well-known class of bugs.
country: NO
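This is not the string "NO". A parser that follows YAML 1.1, which includes PyYAML, reads it as the boolean false. The YAML 1.1 spec gives the same treatment to yes, no, on, off, true, false, y, n, and their capitalized variants, although parsers differ on the single-letter forms (PyYAML, for instance, leaves y and n as strings). YAML 1.2 narrowed booleans to true and false, but many widely used parsers still apply the 1.1 rules.
This is known as the Norway problem, because it famously corrupted lists of country codes. It also bites you with values like off for configuration toggles that you expect to be strings.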
The fix is quoting:
country: "NO"
enabled: "off"
If you want a string, quote it. Always. This is the YAML equivalent of defensive programming.
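A rough way to find candidates for quoting is to scan for unquoted values from the YAML 1.1 boolean list. The sketch below is a line-based heuristic, not a YAML parser: it only understands simple "key: value" lines, it deliberately skips true and false because those are usually intentional, and the names AMBIGUOUS_BOOL, SIMPLE_PAIR and risky_values are hypothetical.

```python
import re

# Unquoted scalars that YAML 1.1 resolves to booleans, minus true/false.
AMBIGUOUS_BOOL = re.compile(
    r"^(y|Y|yes|Yes|YES|n|N|no|No|NO|on|On|ON|off|Off|OFF)$"
)
# Simple "key: value" lines only (assumption: block style, no flow mappings).
SIMPLE_PAIR = re.compile(r"^\s*(?:-\s+)?([^\s#:][^:]*):\s+(.+)$")


def risky_values(text):
    """Return (line, key, value) for unquoted values that may become booleans."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SIMPLE_PAIR.match(line)
        if not match:
            continue
        # Drop a trailing comment, then surrounding whitespace.
        value = match.group(2).split(" #", 1)[0].strip()
        if AMBIGUOUS_BOOL.match(value):
            findings.append((number, match.group(1).strip(), value))
    return findings


sample = 'country: NO\nenabled: "off"\nfeature: on  # toggle\nname: Norway\n'
print(risky_values(sample))  # [(1, 'country', 'NO'), (3, 'feature', 'on')]
```

Quoted values such as "off" pass through untouched, which is exactly the fix described above.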
Multiline strings are four different syntaxes
YAML supports multiline strings through block scalars, and the four variants you will meet most often behave differently:
# Literal block (preserves newlines)
description: |
  Line one
  Line two
# Folded block (joins lines with spaces)
description: >
  This becomes
  a single line
# Literal block, strip trailing newline
description: |-
  Line one
  Line two
# Folded block, strip trailing newline
description: >-
  Single line
  no trailing newline
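The pipe | preserves newlines. The angle bracket > folds single line breaks into spaces, while blank lines still produce a newline. With no suffix, the value ends in exactly one newline. The minus - suffix strips every trailing newline, and the plus + suffix keeps all of them, including trailing blank lines.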
Getting these wrong means your configuration values have unexpected whitespace. I've seen this cause issues with SQL queries embedded in Ansible playbooks and shell commands in CI pipelines where a trailing newline changed the behavior of a downstream tool.
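To make the trailing-newline rules concrete, here is a small sketch that mimics them in plain Python. It is not a YAML parser and it ignores folding entirely; the function name chomp and its indicator argument ("" for no suffix, "-" or "+") are my own.

```python
def chomp(block_text, indicator=""):
    """Mimic how YAML treats the trailing newlines of a block scalar.

    indicator "" keeps one trailing newline, "-" strips them all,
    and "+" keeps every one of them.
    """
    body = block_text.rstrip("\n")
    if indicator == "-":
        return body
    if indicator == "+":
        return block_text
    # Default behavior: a single final newline, unless the block is empty.
    return body + "\n" if body else ""


# Block content followed by one blank line.
raw = "Line one\nLine two\n\n"
print(repr(chomp(raw)))       # 'Line one\nLine two\n'
print(repr(chomp(raw, "-")))  # 'Line one\nLine two'
print(repr(chomp(raw, "+")))  # 'Line one\nLine two\n\n'
```

The difference between the first two results is the single newline that tends to surprise downstream tools.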
Duplicate keys are silently allowed
database:
  host: prod-db.example.com
  port: 5432
  host: staging-db.example.com
Many YAML parsers will accept this without error. The second host value silently overwrites the first. In a large configuration file, especially one built by merging multiple YAML files, this is a real risk.
The YAML 1.2 spec requires mapping keys to be unique, but plenty of parsers don't enforce it. Some offer a strict mode that rejects duplicates. In Python, yaml.safe_load() from PyYAML silently takes the last value. The strictyaml library rejects duplicates.
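If your parser has no strict mode, a crude pre-check can still catch the obvious cases. The sketch below is a heuristic I would only trust on simple block-style mappings: it tracks indentation levels line by line, and it will be confused by flow style, lists of mappings, and multiline scalars. The names KEY_LINE and find_duplicate_keys are hypothetical.

```python
import re

# A "key:" at the start of a line, after optional spaces (block style only).
KEY_LINE = re.compile(r"^( *)([^\s#:\-][^:]*):(?:\s|$)")


def find_duplicate_keys(text):
    """Return (line, key) for keys repeated inside the same block mapping."""
    duplicates = []
    stack = []  # one (indent, seen_keys) entry per currently open mapping
    for number, line in enumerate(text.splitlines(), start=1):
        match = KEY_LINE.match(line)
        if not match:
            continue
        indent, key = len(match.group(1)), match.group(2).strip()
        # Leaving deeper mappings closes them.
        while stack and stack[-1][0] > indent:
            stack.pop()
        # A deeper indent than the current mapping opens a new one.
        if not stack or stack[-1][0] < indent:
            stack.append((indent, set()))
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((number, key))
        seen.add(key)
    return duplicates


doc = (
    "database:\n"
    "  host: prod-db.example.com\n"
    "  port: 5432\n"
    "  host: staging-db.example.com\n"
)
print(find_duplicate_keys(doc))  # [(4, 'host')]
```

The same key under two different parents is not reported, because each nested mapping gets its own set of seen keys.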
Anchors and aliases are powerful and dangerous
defaults: &defaults
  timeout: 30
  retries: 3
production:
  <<: *defaults
  timeout: 60
staging:
  <<: *defaults
Anchors (&) and aliases (*) let you reuse blocks of YAML. The merge key (<<) combines them. This is convenient for reducing repetition, but it creates hidden coupling. If someone edits the defaults block, both production and staging change. In a large file, it can be hard to trace where values actually come from.
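One way to picture the merge key is as dictionary unpacking. The sketch below is an analogy in plain Python rather than the output of any YAML library, and the resolve function is hypothetical: keys written in the block itself win over keys pulled in from the defaults.

```python
def resolve(defaults):
    """Rebuild the two environments the way the merge key would."""
    return {
        "production": {**defaults, "timeout": 60},  # local key overrides
        "staging": {**defaults},                    # everything inherited
    }


print(resolve({"timeout": 30, "retries": 3}))
# {'production': {'timeout': 60, 'retries': 3}, 'staging': {'timeout': 30, 'retries': 3}}

# Editing one value in the defaults changes both environments at once.
print(resolve({"timeout": 30, "retries": 5}))
# {'production': {'timeout': 60, 'retries': 5}, 'staging': {'timeout': 30, 'retries': 5}}
```

The second call is the hidden coupling in action: nobody touched production or staging, yet both changed.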
More importantly, this feature has been exploited in security attacks. YAML bombs use nested anchors to create exponential expansion:
a: &a ["lol","lol","lol"]
b: &b [*a,*a,*a]
c: &c [*b,*b,*b]
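PyYAML's safe_load does not stop this by itself, because aliases are ordinary YAML. What it blocks is the other well-known attack: tags that instantiate arbitrary Python objects. So always use safe_load instead of load in PyYAML, cap the size of any document you accept, and treat untrusted YAML input with the same caution as untrusted code.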
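The growth is easy to underestimate, so here is a sketch that models the three-level example above with plain Python lists. No YAML library is involved; build_nested and count_leaves are hypothetical helpers, and list repetition stands in for aliases because every element points at the same object.

```python
def build_nested(fan_out, levels):
    """Build nested lists where each level repeats the previous one."""
    node = ["lol"] * fan_out
    for _ in range(levels - 1):
        # Every element is the same object, much like a YAML alias.
        node = [node] * fan_out
    return node


def count_leaves(node):
    """Count the strings reached when the structure is fully expanded."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child in node)


c = build_nested(3, 3)
print(count_leaves(c))  # 27
# Nine levels with a fan-out of nine, computed rather than built:
print(9 ** 9)  # 387420489
```

Three short lines already expand to 27 strings, and each extra level multiplies that by the fan-out. Anything that walks or serializes the full structure pays for every one of them.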
Common mistakes that validators catch
A good YAML validator catches more than syntax errors. Here's what to look for:
- Mixed indentation -- spaces and tabs in the same file, or inconsistent indentation widths (two spaces in one block, four in another).
- Implicit type coercion -- values like on, off, yes, no being treated as booleans when you intended strings.
- Trailing whitespace -- spaces after a colon-space separator or at the end of a line that can change how values are parsed.
- Missing quotes on special characters -- values containing colons, brackets, or curly braces that need quoting to avoid being parsed as YAML structures.
- Invalid escape sequences -- using backslash escapes in single-quoted strings (single quotes in YAML don't process escape sequences, only double quotes do).
Validating YAML from the command line
Python can do a quick syntax check once PyYAML is installed (pip install pyyaml; it is not part of the standard library):
python3 -c "import yaml; yaml.safe_load(open('config.yml'))"
For something more thorough, yamllint is the standard linter:
pip install yamllint
yamllint config.yml
yamllint checks not just syntax but style: line length, indentation consistency, trailing spaces, and truthy values. You can configure it with a .yamllint file in your project root.
For quick validation without installing anything on your machine, I built a YAML validator at zovo.one/free-tools/yaml-validator that catches these issues and highlights the exact line where the parse fails.
YAML's flexibility is both its strength and its biggest footgun. The format is human-readable, which is genuinely valuable. But "human-readable" and "human-writable" are different things, and YAML's implicit typing and whitespace sensitivity mean that the file you think you wrote and the data the parser actually reads can be different. Validate early. Validate often. Your 2 AM self will thank you.
I'm Michael Lip. I build free developer tools at zovo.one. 350+ tools, all private, all free.