The model gave me perfectly valid YAML. The pipeline failed. I asked the model to fix it. It gave me more perfectly valid YAML. The pipeline failed again. After the fourth iteration I just opened the GitLab docs, found the issue in two minutes, and fixed it myself.
This is one of the most common frustrations I have seen with LLMs in DevOps work. The .gitlab-ci.yml file I am working with has 242 commits. 73 of them contain the word "fix". The theme across most of them: the YAML is valid. GitLab disagrees.
## GitLab pipelines are not just YAML
GitLab CI/CD pipelines are defined in a .gitlab-ci.yml file, and yes, the format is YAML. But GitLab has its own specific implementation on top of that, with its own keywords, its own scoping rules, and its own runtime semantics. Generic YAML parsers will happily accept a file that GitLab's pipeline linter will reject. And sometimes it does not reject it at all. It just runs differently than you expected.
That gap between "valid YAML" and "valid GitLab pipeline" is where the problems live.
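A minimal illustration of that gap (a hypothetical job, with the typo left in deliberately):

```yaml
# Parses cleanly as YAML: a mapping containing a nested list.
# Any generic YAML library accepts this file without complaint.
build:
  stage: build
  scriptt:   # typo: GitLab's linter rejects the unknown keyword, YAML itself does not care
    - make
```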
## Three ways this plays out in practice

### The `changes:` anchor problem
The pipeline had repeated file path lists in every rule block. The natural LLM suggestion: extract them into YAML anchors and reference them. It produced something that looked completely reasonable:
```yaml
.frontend-changes: &frontend-changes
  changes:
    - "apps/frontend/**/*"
    - "packages/**/*"
    - ".gitlab-ci.yml"

.mr-frontend:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      <<: *frontend-changes
```
Valid YAML. Not valid GitLab CI in practice. YAML anchors are resolved before GitLab ever processes the config. GitLab only sees the expanded result. The problem is that merge keys (<<:) do not behave predictably inside nested rule structures. After the merge is applied, the resulting shape may not match what GitLab expects for a rules:changes block, so it either silently falls back to "always run" or evaluates incorrectly depending on context.
The anchors exist in the actual file. The <<: syntax is there. It just does not do what it looks like it does. The comment sitting in the production file:
```yaml
# Sadly gitlab changes do not support with anchors or references to make these DRY
```
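What the production file does instead is simply repeat the path list wherever it is needed, which is why that comment exists. A sketch of the non-DRY form, reusing the paths from the anchor example above:

```yaml
# The paths are written out inline in every rule that needs them.
# Repetitive, but the shape GitLab evaluates is exactly the shape you see.
.mr-frontend:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - "apps/frontend/**/*"
        - "packages/**/*"
        - ".gitlab-ci.yml"
```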
### The `extends:` + silent replacement problem
The next suggestion: use extends: to compose rule templates.
```yaml
lint-all:
  extends: [.mr-frontend, .mr-backend]
```
The model expected this to combine both templates' rules, giving you OR logic: the job runs if frontend files changed or if backend files changed. That is not what happens. extends: does not merge arrays — it replaces them, with the last template winning. In the expanded configuration, lint-all ends up with only .mr-backend's rules. The frontend changes: patterns disappear without any warning.
The reason this went undetected: both templates shared most of the same changes: paths — package.json, packages/**/*, .gitlab-ci.yml, and others. The only difference was the last entry: apps/frontend/**/* versus apps/backend/**/*. Most real commits touch shared files, so the job triggered anyway. But a commit that only changes frontend code and nothing else would silently skip the lint job. That bug was live in the pipeline for months.
For explicit control over rule composition, GitLab's !reference tags let you manually assemble the rules: array from multiple sources. But even then, you only get OR logic: GitLab evaluates rules in order and the first matching rule decides the outcome. There is no way to compose AND conditions — run only when a specific branch condition AND specific file changes are both true — across reusable templates. Every combination has to be written out explicitly. The pipeline has seven of these rule templates that a model will always try to collapse into two or three. It cannot be done without changing the semantics.
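A sketch of that manual assembly with `!reference` tags (template names follow the examples above; the `script:` line is illustrative). This still composes with OR semantics:

```yaml
lint-all:
  rules:
    # GitLab resolves each !reference into that template's rules array and
    # flattens the result. Rules are evaluated in order; the first match
    # wins, so this is OR composition, not AND.
    - !reference [.mr-frontend, rules]
    - !reference [.mr-backend, rules]
  script:
    - npm run lint
```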
```yaml
# These either cant be DRY because the rules are OR and not AND
# (the - & anchors do not work really well)
```
### The 12-minute revert
One commit consolidated a parallel: matrix Docker build into a single sequential job. The reasoning, left in a comment, was exactly the kind of thing a model writes:
```yaml
# Single job: on main we run stage then latest (same Docker layer cache,
# second build is fast). Separate jobs would duplicate work; one job with
# sequential builds reuses cache.
```
Logically correct. The Docker layer cache argument is real. The revert came 12 minutes later with no commit message. The problem was not correctness. parallel: matrix gives you separate job entries in the pipeline UI, separate log streams, the ability to retry one variant independently, and separate pass/fail status per build type. Collapsing into one job trades all of that for a real but secondary cache win.
The model optimized for build efficiency, but the system required failure isolation and debuggability. It did not know how you use the GitLab pipeline UI when something breaks at 2am.
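For context, a hedged sketch of the matrix form the revert restored (the job name, variable, and build commands are illustrative, not taken from the actual pipeline):

```yaml
docker-build:
  stage: build
  parallel:
    matrix:
      # Each matrix entry becomes its own job in the pipeline UI, with
      # its own log stream, its own pass/fail status, and its own retry
      # button -- the operational properties the revert was protecting.
      - IMAGE_TAG: [stage, latest]
  script:
    - docker build -t "app:$IMAGE_TAG" .
    - docker push "app:$IMAGE_TAG"
```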
## The shape of the problem
Every one of these failures produces valid YAML. CI Lint may pass too. It can simulate pipeline creation for the default branch, but it cannot replicate the full runtime context — which branch triggered the pipeline, which files changed in the merge request, which variables are set. The subtle rule evaluation issues only surface when the pipeline actually runs.
| What YAML says | What GitLab does |
|---|---|
| `<<: *anchor` inside `rules:` | Structurally valid, semantically inconsistent |
| `extends: [A, B]` with `rules:` | Last template silently replaces the first; overlapping patterns hide the bug |
| `parallel: matrix` removed | Valid, but changes operational behavior, not just output |
The LLM generates syntactically correct config. What it cannot do is predict how that config will behave in a system where the outcome depends on evaluation order, repo state, and pipeline context that are not visible in the file itself. You can give it better documentation and it will still get this wrong, because the knowledge it is missing only appears when the pipeline actually runs.
## How to recognize you are in a loop
After a few rounds of this you start to recognize the pattern:

- The error is not actually changing between iterations.
- The model is adding complexity rather than addressing the root cause.
- You are spending more time writing context than it would take to look up the answer yourself.
Two or three iterations without meaningful progress is the signal. The right move is to stop, go to the GitLab documentation directly, and use GitLab's built-in CI Lint tool to validate the syntax. Find the actual constraint, fix it yourself, and re-engage the model for the work around it. The model is still useful for the majority of pipeline work. It is just not useful for the parts that require knowing GitLab's specific implementation.
## The takeaway
Your tooling-specific experience is not optional when things go wrong. LLMs are useful for writing pipeline structure, generating job definitions, and handling the repetitive parts. But when something breaks in a GitLab-specific way, the fastest path forward is usually you, not the model.
And when the pipeline gives you the same error for the fourth time, open the docs.
If you are curious about the flip side of this, there is an article on two Terraform workflows where LLMs genuinely help: importing existing infrastructure and scaffolding modules from your own documentation. That one is a much more satisfying story.
A note on timing: most of the work described here was done close to a year ago, without MCP or similar tool integrations that give models direct access to documentation and live context. Models may handle some of these cases better today. The underlying gap between "valid YAML" and "valid GitLab pipeline" is still real, but your mileage may vary.