Benoit COUETIL 💫 for Zenika

🦊 GitLab CI Jobs Attributes Sorter: A Python Script for Consistent YAML Files

Maintaining consistent GitLab CI YAML files across a project is challenging when multiple developers contribute. A standardized attribute order within jobs improves readability, simplifies code reviews, and reduces merge conflicts. This article presents a Python script that automatically sorts job attributes according to a configurable order, treating YAML files as text to preserve formatting choices.

Initial thoughts

Anyone who has worked on a GitLab CI pipeline with multiple contributors knows the pain: each developer has their own style for ordering job attributes. One puts script first, another starts with extends, and a third prefers image at the top. The result? Inconsistent files, noisy diffs, and unnecessary merge conflicts.

We searched extensively for a tool to sort job attributes in GitLab YAML files but could not find any that met our requirements. The challenge? Most YAML tools rebuild the entire syntax tree when processing files, wiping out comments, blank lines, array formatting styles, and other developer choices in the process. Standard tools like yq have additional limitations: they handle emojis poorly and strip blank lines.

The solution: we built a custom script that processes YAML files line by line, respecting the original structure while enforcing a consistent attribute order within jobs.

Why attribute order matters

A well-defined attribute order provides several benefits:

  1. Predictability: developers know exactly where to find each attribute
  2. Easier code reviews: reviewers can quickly scan jobs when structure is consistent
  3. Reduced merge conflicts: when everyone follows the same order, concurrent edits are less likely to conflict
  4. Logical flow: ordering attributes by their function creates a natural reading experience

Consider this job before sorting:

deploy-production:
  when: manual
  script:
    - deploy.sh
  extends: .base-deploy
  environment: production
  stage: deploy
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

And after sorting:

deploy-production:
  extends: .base-deploy
  stage: deploy
  script:
    - deploy.sh
  environment: production
  when: manual
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

The sorted version follows a logical progression: inheritance first, then positioning, then execution, then conditions.

Script overview and usage

The gitlab-yaml-sort.py script provides a simple interface for sorting GitLab CI files:

# Sort .gitlab-ci.yml in current directory
python gitlab-yaml-sort.py

# Sort a specific file
python gitlab-yaml-sort.py path/to/.gitlab-ci.yml

# Sort all GitLab CI files in a folder recursively
python gitlab-yaml-sort.py ./ci-configs/

Key features:

  • In-place editing with automatic backup (.bak file created, removed on success)
  • Recursive processing for folders, matching files like *.gitlab-ci*.yml or *.gitlab-ci*.yaml
  • Smart directory filtering: ignores .git, node_modules, tmp, and other common exclusions
  • Unknown attributes preserved: attributes not in the defined order are pushed to the end
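
The backup behavior in the first bullet can be illustrated with a minimal sketch. The helper name edit_with_backup and its transform parameter are hypothetical and exist only for this example. Unlike the real script, the sketch computes the new content before reopening the file for writing.

```python
import os
import shutil
import tempfile


def edit_with_backup(path, transform):
    """Rewrite a file in place, keeping a .bak copy until the rewrite succeeds."""
    backup = path + ".bak"
    shutil.copyfile(path, backup)
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    new_lines = transform(lines)  # if this raises, the backup stays on disk
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(new_lines)
    os.remove(backup)  # success: the backup is no longer needed


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "demo.yml")
    with open(target, "w", encoding="utf-8") as f:
        f.write("b: 2\na: 1\n")

    # Successful run: content is rewritten and the backup disappears
    edit_with_backup(target, sorted)
    with open(target, encoding="utf-8") as f:
        content_after_success = f.read()
    backup_removed = not os.path.exists(target + ".bak")

    # Failing run: the exception propagates and the backup is kept
    def failing(lines):
        raise ValueError("simulated failure")

    try:
        edit_with_backup(target, failing)
    except ValueError:
        pass
    backup_kept = os.path.exists(target + ".bak")
```

The point of the pattern is that a crash never leaves you without a copy of the original file.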

The recommended attribute order

The script uses an external configuration file (gitlab-yaml-sort.yml) that defines the attribute order with explanations:

job:
  extends: placed first since it is overridden by everything below
  stage: defines relationships between jobs, stable
  resource_group: related to other jobs
  dependencies: related to other jobs
  needs: related to other jobs, must include dependencies
  image: defined before scripts, set once and used throughout
  tags: specified by image type or environment level
  parallel: for a given tag
  services: started before scripts
  cache: tunes scripts before and after them, closer is better
  variables: can be extensive, placed just before scripts
  before_script: precedes script, merges into it
  script: core functionality
  after_script: follows script
  coverage: single line, occurs at the end based on scripts outputs
  publish: single line, specifies artifact location
  artifacts: executes after scripts, medium-sized, related to script output
  environment: medium-sized configuration at the end, independent of script content
  interruptible: environment configuration determines interruptible status
  timeout: occurs at the end
  when: single line, works in conjunction with rules
  allow_failure: works together with when and rules
  retry: not triggered when allow_failure:true
  rules: can be extensive in monorepos, positioned last to avoid visual interference

This order follows a logical grouping:

  1. Inheritance and relationships (extends, stage, resource_group, dependencies, needs)
  2. Execution environment (image, tags, parallel, services)
  3. Pre-execution setup (cache, variables, before_script)
  4. Core execution (script, after_script)
  5. Post-execution (coverage, publish, artifacts, environment)
  6. Flow control (interruptible, timeout, when, allow_failure, retry, rules)

How the script works

Standard YAML parsers normalize the output, removing comments, blank lines, and formatting choices. Our script treats the file as text, preserving:

  • Empty lines between sections
  • Comments
  • Multiline string formatting
  • Quote styles

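The tradeoff is that the script relies on indentation patterns rather than semantic understanding. For example, a comment indented at the attribute level (2 spaces) is handled like an unknown attribute and moves to the end of its job. In practice, this works reliably for well-formed GitLab CI files.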

The script identifies job attributes by their indentation level:

  • Lines with no indentation: top-level keys (job names, stages, variables, etc.)
  • Lines with 2-space indentation: job attributes to be sorted
  • Lines with 4+ space indentation: attribute content, kept with their parent
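
As a rough illustration of these three levels, here is a small sketch. The function classify_line is a hypothetical name, not part of the script, and it reports blank lines as a separate category for clarity.

```python
def classify_line(line):
    """Return a coarse category for one line of a GitLab CI file, using indentation only."""
    if line.strip() == "":
        return "blank"
    if line.startswith(" " * 4):
        return "content"      # belongs to the attribute above it
    if line.startswith(" " * 2):
        return "attribute"    # a job attribute, candidate for sorting
    return "top-level"        # job name or root keyword


sample = [
    "build:\n",
    "  script:\n",
    "    - make\n",
    "\n",
    "  stage: build\n",
]
categories = [classify_line(line) for line in sample]
print(categories)
# ['top-level', 'attribute', 'content', 'blank', 'attribute']
```

The 4-space test has to come before the 2-space test, because a line that starts with four spaces also starts with two.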

For each job, the script extracts the attribute name (text before the first colon), finds its position in the configured order, and uses that as the sort key. Unknown attributes are pushed to the end, where they keep their original relative order because Python's sort is stable.
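
Here is a minimal sketch of that sort key, assuming a shortened order list and a made-up custom_key attribute for the example:

```python
# Shortened, illustrative order; the real one comes from gitlab-yaml-sort.yml
ORDER = ["extends", "stage", "script", "environment", "when", "rules"]


def sort_key(attribute_line):
    # The attribute name is the text before the first colon
    name = attribute_line.lstrip().split(":")[0]
    # Unknown names all get the same, last rank
    return ORDER.index(name) if name in ORDER else len(ORDER)


attributes = [
    "  when: manual",
    "  custom_key: 1",      # hypothetical attribute, not in ORDER
    "  script:",
    "  extends: .base-deploy",
    "  stage: deploy",
]
ordered = sorted(attributes, key=sort_key)
for line in ordered:
    print(line)
```

The known attributes come out in the configured order, and custom_key lands last.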

The script also handles several edge cases:

  • Ignored top-level keywords: sections like stages, include, variables, and workflow at the root level are written as-is without sorting their contents
  • Ignored directories: recursive processing skips .git, node_modules, tmp, and other common non-source directories
  • File pattern matching: only processes files matching the *.gitlab-ci*.yml or *.gitlab-ci*.yaml naming convention
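
The directory and filename filtering can be tried in isolation. The sketch below reuses the regular expression from the script, with a shortened ignore list and a throwaway directory tree invented for the demonstration. The helper name find_ci_files is hypothetical.

```python
import os
import re
import tempfile

PATTERN = re.compile(r".*\.gitlab-ci.*\.ya?ml$")  # same pattern as the script
IGNORED = {".git", "node_modules", "tmp"}         # shortened list for the example


def find_ci_files(root):
    """Return matching file paths relative to root, skipping ignored directories."""
    found = []
    for current, dirs, files in os.walk(root):
        # Pruning dirs in place stops os.walk from descending into them
        dirs[:] = [d for d in dirs if d not in IGNORED]
        for name in files:
            if PATTERN.match(name):
                found.append(os.path.relpath(os.path.join(current, name), root))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    for rel in [".gitlab-ci.yml", "ci/build.gitlab-ci.yaml",
                "node_modules/pkg/.gitlab-ci.yml", "docker-compose.yml"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    result = find_ci_files(root)

print(result)
```

Only the root file and the one under ci/ are reported. The copy under node_modules is never visited, and docker-compose.yml does not match the pattern.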

Customizing the sort order

To modify the attribute order, simply edit the gitlab-yaml-sort.yml file. The script reads this configuration at runtime, extracting attribute names from the job: section.

This approach offers several advantages:

  • Self-documenting: the YAML file explains why each attribute is positioned where it is
  • Easy to customize: no code changes required to adjust the order
  • Team-friendly: the configuration file can be versioned and discussed in code reviews
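
To make the runtime extraction concrete, here is a small sketch that reads the order from a string rather than from a file. The function name read_order and the three-line configuration are assumptions made for the example.

```python
CONFIG_TEXT = """\
job:
  extends: placed first since it is overridden by everything below
  stage: defines relationships between jobs, stable
  # comments are skipped
  script: core functionality
"""


def read_order(text):
    """Collect attribute names (text before the first colon) found after the 'job:' line."""
    order = []
    in_job_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("job:"):
            in_job_section = True
        elif in_job_section and line and not line.startswith("#") and ":" in line:
            order.append(line.split(":")[0].strip())
    return order


order = read_order(CONFIG_TEXT)
print(order)  # ['extends', 'stage', 'script']
```

Only the key names matter to the sorter. The explanations after each colon are there for human readers.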

The complete script

#!/usr/bin/env python3

# GitLab Yaml sorter

# Script for sorting GitLab CI job attributes by a predefined order.

# Usage:
# python gitlab-yaml-sort.py [file|folder]

# Parameters:
# file : the file to be processed in-place
# folder : a folder where .gitlab-ci.yml (and similarly named files) will be processed recursively
# no parameter : .gitlab-ci.yml in current folder will be processed

# The script edits a GitLab CI YAML file in place and sorts the job attributes by a predefined order.
# Unknown attributes are pushed to the end.
# If any error occurs while processing, the original file is kept as a .bak file.

import sys
import os
import re
import shutil


def get_attributes_order():
    sort_file_path = os.path.join(os.path.dirname(__file__), 'gitlab-yaml-sort.yml')
    attributes = []

    with open(sort_file_path, 'r') as f:
        lines = f.readlines()

    # Find the job section and extract attribute names
    in_job_section = False
    for line in lines:
        line = line.strip()
        if line.startswith('job:'):
            in_job_section = True
            continue
        elif in_job_section and line and not line.startswith('#'):
            # Extract attribute name (everything before the colon)
            if ':' in line:
                attr_name = line.split(':')[0].strip()
                if attr_name:
                    attributes.append(attr_name)

    return attributes

# Define the order in which job_attributes should be sorted
ATTRIBUTES_ORDER = get_attributes_order()

# Top-level keywords whose blocks are written as-is, without sorting. 'default' is a special keyword but should be sorted
IGNORED_TOP_LEVEL_KEYWORDS = ["stages", "include", "variables", "workflow"]

# List of directories to ignore
IGNORED_DIRECTORIES = [".git", ".history", "node_modules", "tmp", ".gitlab-ci-local", "build-docs"]

# Define regex for matching filenames
GITLAB_FILENAME_REGEX = re.compile(r'.*\.gitlab-ci.*\.ya?ml$')


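def attribute_sort_key(block):
    # The attribute name is the text before the first colon on the block's first line
    name = block[0].lstrip().split(':')[0]
    # Unknown attributes share the last rank; sorted() is stable, so they keep their relative order
    return ATTRIBUTES_ORDER.index(name) if name in ATTRIBUTES_ORDER else len(ATTRIBUTES_ORDER)


def sort_job_attributes(job_attributes):
    sorted_job_attributes = sorted(job_attributes, key=attribute_sort_key)
    return [line for block in sorted_job_attributes for line in block]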


def process_file(filename):

    # make a backup of the original file in case of error processing it
    shutil.copyfile(filename, filename + ".bak")

    # Initialize variables for tracking job_attributes and lines
    job_attributes = []
    current_attribute = []
    is_current_block_sortable = True
    last_line_was_empty = False

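    # Read and write as UTF-8 explicitly so emojis survive on platforms with another default encoding
    with open(filename, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    with open(filename, 'w', encoding='utf-8') as f: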
        for line in lines:
            # Check if the line starts with special keywords
            if any(line.startswith(keyword) for keyword in IGNORED_TOP_LEVEL_KEYWORDS):
                # flush current attribute content
                f.write(''.join(sort_job_attributes(job_attributes)))
                job_attributes = []
                current_attribute = []
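                if last_line_was_empty:
                    f.write('\n')
                    # the blank line has been emitted; don't emit it again before the next top-level key
                    last_line_was_empty = False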
                # don't reorganize and write
                f.write(line)
                is_current_block_sortable = False
            # Check if the line is indented in a non-sortable block
            elif line.startswith(' ' * 2) and not is_current_block_sortable:
                # Just write
                f.write(line)
            # Check if the line is indented to sub-sublevel
            elif line.startswith(' ' * 4):
                # Add the line to the current block
                current_attribute.append(line)
            # Check if the line is indented to sublevel : this is the beginning of a new block to be sorted
            elif line.startswith(' ' * 2):
                # Start a new block and register it; deeper-indented lines that follow are appended to it
                current_attribute = [line]
                job_attributes.append(current_attribute)
                is_current_block_sortable = True
            # Remember that a blank line was seen, so one can be re-emitted before the next top-level key
            elif line.strip() == '':
                last_line_was_empty = True
            # Otherwise, the line is not indented and should be written
            else:
                f.write(''.join(sort_job_attributes(job_attributes)))
                if last_line_was_empty:
                    f.write('\n')
                f.write(line)
                # Reset variables and continue to the next line
                job_attributes = []
                current_attribute = []
                is_current_block_sortable = True
                last_line_was_empty = False

        if current_attribute:
            f.write(''.join(sort_job_attributes(job_attributes)))

    print("successfully sorted job attributes in " + filename)
    os.remove(filename + ".bak")


def process_files_recursively(directory):
    for root, dirs, files in os.walk(directory):
        # Remove ignored directories from the list
        dirs[:] = [d for d in dirs if d not in IGNORED_DIRECTORIES]
        for file in files:
            if GITLAB_FILENAME_REGEX.match(file):
                file_path = os.path.join(root, file)
                process_file(file_path)


if __name__ == '__main__':
    if len(sys.argv) == 1:
        process_file(".gitlab-ci.yml")
    elif len(sys.argv) == 2:
        path = sys.argv[1]
        if os.path.isfile(path):
            process_file(path)
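        elif os.path.isdir(path):
            process_files_recursively(path)
        else:
            # Fail loudly instead of exiting silently when the path does not exist
            sys.exit("error: {} is neither a file nor a folder".format(path))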
    else:
        print("usage: python {} [file|folder]".format(sys.argv[0]))

Wrapping up

Consistent attribute ordering in GitLab CI files is a small investment with significant returns: cleaner diffs, fewer merge conflicts, and improved readability. The gitlab-yaml-sort.py script automates this process while respecting the developer's formatting choices.

The key design decisions that make this tool effective:

  • Text-based processing preserves comments, blank lines, and formatting
  • External configuration allows easy customization without code changes
  • Logical attribute grouping creates a natural reading flow from inheritance to execution to conditions

Consider integrating this script into your pre-commit hooks or CI pipeline to enforce consistent styling across your team.
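
For a hook or CI job, a read-only check can be more convenient than an in-place rewrite. The script above does not offer such a mode, so the following is only a sketch of what one could look like. The function unsorted_jobs and the shortened order list are assumptions, and the sketch ignores the special top-level keywords the real script skips.

```python
# Shortened, illustrative order; a real check would load gitlab-yaml-sort.yml
ORDER = ["extends", "stage", "script", "when", "rules"]


def rank(name):
    return ORDER.index(name) if name in ORDER else len(ORDER)


def unsorted_jobs(text):
    """Return the names of top-level blocks whose 2-space attributes are out of order."""
    jobs = {}       # block name -> attribute ranks, in file order
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith(" "):
            current = line.split(":")[0]
            jobs[current] = []
        elif current is not None and line.startswith("  ") and not line.startswith("   "):
            jobs[current].append(rank(line.strip().split(":")[0]))
    return [name for name, ranks in jobs.items() if ranks != sorted(ranks)]


SAMPLE = """\
deploy:
  when: manual
  script:
    - deploy.sh
  stage: deploy

build:
  stage: build
  script:
    - make
"""

offenders = unsorted_jobs(SAMPLE)
print(offenders)  # ['deploy']
```

A hook could exit with a non-zero status whenever the returned list is not empty, which blocks the commit until the file has been sorted.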

Illustrations generated locally by Draw Things using Flux.1 [Schnell] model

This article was enhanced with the assistance of an AI language model to ensure clarity and accuracy in the content, as English is not my native language.
