- Initial thoughts
- Why attribute order matters
- Script overview and usage
- The recommended attribute order
- How the script works
- Customizing the sort order
- The complete script
- Wrapping up
- Further reading
Maintaining consistent GitLab CI YAML files across a project is challenging when multiple developers contribute. A standardized attribute order within jobs improves readability, simplifies code reviews, and reduces merge conflicts. This article presents a Python script that automatically sorts job attributes according to a configurable order, treating YAML files as text to preserve formatting choices.
Initial thoughts
Anyone who has worked on a GitLab CI pipeline with multiple contributors knows the pain: each developer has their own style for ordering job attributes. One puts script first, another starts with extends, and a third prefers image at the top. The result? Inconsistent files, noisy diffs, and unnecessary merge conflicts.
We searched extensively for a tool to sort job attributes in GitLab YAML files but could not find any that met our requirements. The challenge? Most YAML tools rebuild the entire syntax tree when processing files, wiping out comments, blank lines, array formatting styles, and other developer choices in the process. Standard tools like yq have additional limitations: they handle emojis poorly and strip blank lines.
The solution: we built a custom script that processes YAML files line by line, respecting the original structure while enforcing a consistent attribute order within jobs.
Why attribute order matters
A well-defined attribute order provides several benefits:
- Predictability: developers know exactly where to find each attribute
- Easier code reviews: reviewers can quickly scan jobs when structure is consistent
- Reduced merge conflicts: when everyone follows the same order, concurrent edits are less likely to conflict
- Logical flow: ordering attributes by their function creates a natural reading experience
Consider this job before sorting:
```yaml
deploy-production:
  when: manual
  script:
    - deploy.sh
  extends: .base-deploy
  environment: production
  stage: deploy
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```
And after sorting:
```yaml
deploy-production:
  extends: .base-deploy
  stage: deploy
  script:
    - deploy.sh
  environment: production
  when: manual
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```
The sorted version follows a logical progression: inheritance first, then positioning, then execution, then conditions.
Script overview and usage
The gitlab-yaml-sort.py script provides a simple interface for sorting GitLab CI files:
```bash
# Sort .gitlab-ci.yml in current directory
python gitlab-yaml-sort.py
# Sort a specific file
python gitlab-yaml-sort.py path/to/.gitlab-ci.yml
# Sort all GitLab CI files in a folder recursively
python gitlab-yaml-sort.py ./ci-configs/
```
Key features:
- In-place editing with automatic backup (a `.bak` file is created, then removed on success)
- Recursive processing for folders, matching files like `*.gitlab-ci*.yml` or `*.gitlab-ci*.yaml`
- Smart directory filtering: ignores `.git`, `node_modules`, `tmp`, and other common exclusions
- Unknown attributes preserved: attributes not in the defined order are pushed to the end
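The script copies the file to `.bak` before writing and deletes the copy only after a successful run. If processing fails, the backup is simply left behind for manual recovery. Below is a minimal sketch of that backup pattern using a hypothetical `edit_with_backup` helper. As an assumption that goes beyond what the script does, it also restores the original automatically on failure. It computes the new content before opening the file for writing, so a failing transformation never truncates the file.

```python
import os
import shutil
from typing import Callable


def edit_with_backup(path: str, transform: Callable[[str], str]) -> None:
    """Rewrite `path` in place with `transform`, keeping a .bak copy until success.

    Hypothetical helper: unlike gitlab-yaml-sort.py, it restores the original
    file automatically when the transformation or the write fails.
    """
    backup = path + ".bak"
    shutil.copyfile(path, backup)
    try:
        with open(path, "r", encoding="utf-8") as f:
            new_content = transform(f.read())  # may raise before anything is written
        with open(path, "w", encoding="utf-8") as f:
            f.write(new_content)
    except Exception:
        # Put the untouched original back, drop the now redundant copy, re-raise
        shutil.copyfile(backup, path)
        os.remove(backup)
        raise
    # Success: the backup is no longer needed
    os.remove(backup)
```

With `transform` set to the sorting logic, this would behave like the script's in-place editing, plus an automatic rollback.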
The recommended attribute order
The script uses an external configuration file (gitlab-yaml-sort.yml) that defines the attribute order with explanations:
```yaml
job:
  extends: placed first since it is overridden by everything below
  stage: defines relationships between jobs, stable
  resource_group: related to other jobs
  dependencies: related to other jobs
  needs: related to other jobs, must include dependencies
  image: defined before scripts, set once and used throughout
  tags: specified by image type or environment level
  parallel: for a given tag
  services: started before scripts
  cache: tunes scripts before and after them, closer is better
  variables: can be extensive, placed just before scripts
  before_script: precedes script, merges into it
  script: core functionality
  after_script: follows script
  coverage: single line, occurs at the end based on script output
  publish: single line, specifies artifact location
  artifacts: executes after scripts, medium-sized, related to script output
  environment: medium-sized configuration at the end, independent of script content
  interruptible: environment configuration determines interruptible status
  timeout: occurs at the end
  when: single line, works in conjunction with rules
  allow_failure: works together with when and rules
  retry: not triggered when allow_failure:true
  rules: can be extensive in monorepos, positioned last to avoid visual interference
```
This order follows a logical grouping:
- Inheritance and relationships (`extends`, `stage`, `resource_group`, `dependencies`, `needs`)
- Execution environment (`image`, `tags`, `parallel`, `services`)
- Pre-execution setup (`cache`, `variables`, `before_script`)
- Core execution (`script`, `after_script`)
- Post-execution (`coverage`, `publish`, `artifacts`, `environment`)
- Flow control (`interruptible`, `timeout`, `when`, `allow_failure`, `retry`, `rules`)
How the script works
Standard YAML parsers normalize the output, removing comments, blank lines, and formatting choices. Our script treats the file as text, preserving:
- Empty lines between sections
- Comments
- Multiline string formatting
- Quote styles
The tradeoff is that the script relies on indentation patterns rather than semantic understanding. This works reliably for well-formed GitLab CI files that use 2-space indentation for job attributes and 4 or more spaces for their content.
The script identifies job attributes by their indentation level:
- Lines with no indentation: top-level keys (job names, `stages`, `variables`, etc.)
- Lines with 2-space indentation: job attributes to be sorted
- Lines with 4+ space indentation: attribute content, kept with their parent
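These rules boil down to a small classification function. The sketch below assumes 2-space indentation and uses a hypothetical `classify_line` name. The script itself inlines these checks in `process_file`. The deeper level has to be tested first, because every line indented by 4 spaces also starts with 2 spaces.

```python
def classify_line(line: str) -> str:
    """Classify a GitLab CI line by its indentation (assumes 2-space indentation)."""
    if line.strip() == "":
        return "blank"
    # Check the deeper level first: a 4-space line also starts with 2 spaces
    if line.startswith(" " * 4):
        return "content"      # kept together with its parent attribute
    if line.startswith(" " * 2):
        return "attribute"    # a job attribute, the unit that gets sorted
    return "top-level"        # job name or root keyword such as stages


for sample in ["build:\n", "  script:\n", "    - make\n", "\n"]:
    print(repr(sample), "->", classify_line(sample))
```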
For each job, the script extracts the attribute name (text before the colon), finds its position in the configured order, and uses that as the sort key. Unknown attributes are pushed to the end.
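The sort step can be sketched on the `deploy-production` job from earlier, with one extra attribute. `ORDER` is a hand-picked subset of the configured order, and `sort_job` is a hypothetical helper that handles a single job. `inherit` (a real GitLab keyword) is absent from `ORDER`, which shows how an unknown attribute is pushed to the end. Because Python's `sorted` is stable, unknown attributes also keep their relative order.

```python
# Hypothetical subset of the order defined in gitlab-yaml-sort.yml
ORDER = ["extends", "stage", "script", "environment", "when", "rules"]

JOB = """\
deploy-production:
  when: manual
  inherit:
    default: false
  script:
    - deploy.sh
  extends: .base-deploy
  environment: production
  stage: deploy
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
"""


def attribute_key(block: list[str]) -> int:
    """Position of the block's attribute in ORDER; unknown attributes sort last."""
    name = block[0].strip().split(":")[0]
    return ORDER.index(name) if name in ORDER else len(ORDER)


def sort_job(text: str) -> str:
    """Sort the attributes of a single job whose first line is the job name."""
    header, *body = text.splitlines(keepends=True)
    blocks: list[list[str]] = []
    for line in body:
        if line.startswith(" " * 4) and blocks:
            blocks[-1].append(line)   # nested content stays with its parent
        else:
            blocks.append([line])     # a 2-space line starts a new attribute
    # sorted() is stable: attributes with equal keys keep their original order
    ordered = sorted(blocks, key=attribute_key)
    return header + "".join(line for block in ordered for line in block)


print(sort_job(JOB), end="")
```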
The script also handles several edge cases:
- Ignored top-level keywords: sections like `stages`, `include`, `variables`, and `workflow` at the root level are written as-is without sorting their contents
- Ignored directories: recursive processing skips `.git`, `node_modules`, `tmp`, and other common non-source directories
- File pattern matching: only files matching the `*.gitlab-ci*.yml` (or `.yaml`) naming convention are processed
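Directory filtering and file matching can be sketched together. This sketch reuses the regex and ignore list from the script, and `find_gitlab_ci_files` is a hypothetical helper. Reassigning `dirs[:]` inside `os.walk` prunes the traversal, so ignored folders are never entered. Note that the pattern requires a dot before `gitlab-ci`, so a file named `gitlab-ci.yml` without the leading dot is not picked up.

```python
import os
import re

# Same pattern and exclusions as gitlab-yaml-sort.py
GITLAB_FILENAME_REGEX = re.compile(r'.*\.gitlab-ci.*\.ya?ml$')
IGNORED_DIRECTORIES = [".git", ".history", "node_modules", "tmp", ".gitlab-ci-local", "build-docs"]


def find_gitlab_ci_files(directory: str) -> list[str]:
    """Return matching CI files under `directory`, skipping ignored folders."""
    found = []
    for root, dirs, files in os.walk(directory):
        # Editing `dirs` in place stops os.walk from descending into removed folders
        dirs[:] = [d for d in dirs if d not in IGNORED_DIRECTORIES]
        found.extend(os.path.join(root, name) for name in files if GITLAB_FILENAME_REGEX.match(name))
    return sorted(found)
```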
Customizing the sort order
To modify the attribute order, simply edit the gitlab-yaml-sort.yml file, which must sit in the same directory as the script. The script reads this configuration at runtime, extracting attribute names from the job: section.
This approach offers several advantages:
- Self-documenting: the YAML file explains why each attribute is positioned where it is
- Easy to customize: no code changes required to adjust the order
- Team-friendly: the configuration file can be versioned and discussed in code reviews
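Below is a minimal sketch of extracting the order from the configuration text. `parse_attribute_order` is a hypothetical name. Unlike `get_attributes_order` in the script, it works on a string and stops collecting at the next unindented key, which would allow other sections to live in the same file. That behavior is an assumption, not something the script does. Only the text before the first colon matters, so explanations such as `allow_failure:true` do not affect the result.

```python
def parse_attribute_order(config_text: str, section: str = "job") -> list[str]:
    """Return the attribute names listed under `section:`, in file order."""
    names = []
    in_section = False
    for raw in config_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not raw.startswith((" ", "\t")):
            # An unindented line opens a section; only `section:` is collected
            in_section = stripped.split(":")[0] == section
            continue
        if in_section and ":" in stripped:
            names.append(stripped.split(":")[0].strip())
    return names


CONFIG = """\
job:
  # inheritance comes first
  extends: placed first since it is overridden by everything below
  stage: defines relationships between jobs, stable
  script: core functionality
  rules: positioned last to avoid visual interference
"""

print(parse_attribute_order(CONFIG))
```

Reordering lines in the file is enough for the new order to apply on the next run.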
The complete script
```python
#!/usr/bin/env python3
# GitLab YAML sorter
# Script for sorting GitLab CI job attributes by a predefined order.
# Usage:
#   python gitlab-yaml-sort.py [file|folder]
# Parameters:
#   file         : the file to be processed in-place
#   folder       : a folder where .gitlab-ci.yml (and similarly named files) will be processed recursively
#   no parameter : .gitlab-ci.yml in the current folder will be processed
# The script edits a GitLab CI YAML file in-place and sorts the job attributes by a predefined order,
# read from gitlab-yaml-sort.yml located next to this script.
# Unknown attributes are pushed to the end.
# If any error occurs while processing, the original file is kept as a .bak file.

import os
import re
import shutil
import sys


def get_attributes_order():
    """Return the attribute names listed in the job: section of gitlab-yaml-sort.yml, in order."""
    sort_file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'gitlab-yaml-sort.yml')
    attributes = []
    with open(sort_file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # Find the job section and extract attribute names
    in_job_section = False
    for line in lines:
        line = line.strip()
        if line.startswith('job:'):
            in_job_section = True
            continue
        elif in_job_section and line and not line.startswith('#'):
            # Extract attribute name (everything before the colon)
            if ':' in line:
                attr_name = line.split(':')[0].strip()
                if attr_name:
                    attributes.append(attr_name)
    return attributes


# Define the order in which job attributes should be sorted
ATTRIBUTES_ORDER = get_attributes_order()

# Top-level keywords whose block is written as-is, without sorting.
# 'default' is also a special keyword, but its attributes should be sorted.
IGNORED_TOP_LEVEL_KEYWORDS = ["stages", "include", "variables", "workflow"]

# List of directories to ignore
IGNORED_DIRECTORIES = [".git", ".history", "node_modules", "tmp", ".gitlab-ci-local", "build-docs"]

# Define regex for matching filenames
GITLAB_FILENAME_REGEX = re.compile(r'.*\.gitlab-ci.*\.ya?ml$')


def attribute_sort_key(block):
    # The attribute name is the text before the colon on the block's first line
    name = block[0].lstrip().split(':')[0]
    # Unknown attributes get the highest index, so they are pushed to the end
    return ATTRIBUTES_ORDER.index(name) if name in ATTRIBUTES_ORDER else len(ATTRIBUTES_ORDER)


def sort_job_attributes(job_attributes):
    # sorted() is stable: attributes with the same key keep their original relative order
    sorted_job_attributes = sorted(job_attributes, key=attribute_sort_key)
    return [line for block in sorted_job_attributes for line in block]


def process_file(filename):
    # Make a backup of the original file in case of error while processing it
    shutil.copyfile(filename, filename + ".bak")

    # Initialize variables for tracking job attributes and lines
    job_attributes = []
    current_attribute = []
    is_current_block_sortable = True
    last_line_was_empty = False

    with open(filename, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    with open(filename, 'w', encoding='utf-8') as f:
        for line in lines:
            # Ensure every line ends with a newline, so a moved last line does not merge with another
            if not line.endswith('\n'):
                line += '\n'
            # Check if the line starts a special top-level block (e.g. "stages:")
            if any(line.startswith(keyword + ':') for keyword in IGNORED_TOP_LEVEL_KEYWORDS):
                # Flush the attributes of the previous job
                f.write(''.join(sort_job_attributes(job_attributes)))
                job_attributes = []
                current_attribute = []
                if last_line_was_empty:
                    f.write('\n')
                # Don't reorganize, just write
                f.write(line)
                is_current_block_sortable = False
                last_line_was_empty = False
            # Check if the line is indented in a non-sortable block
            elif line.startswith(' ' * 2) and not is_current_block_sortable:
                # Just write
                f.write(line)
            # Check if the line is indented to sub-sublevel: it belongs to the current attribute
            elif line.startswith(' ' * 4):
                if current_attribute:
                    current_attribute.append(line)
                else:
                    # No attribute to attach it to: write it as-is rather than dropping it
                    f.write(line)
            # Check if the line is indented to sublevel: this is the beginning of a new attribute to sort
            elif line.startswith(' ' * 2):
                current_attribute = [line]
                job_attributes.append(current_attribute)
                is_current_block_sortable = True
            # Empty lines are not written right away: a single one is re-emitted before the next
            # top-level key, so blank lines inside a job (e.g. in 'script') are dropped
            elif line.strip() == '':
                last_line_was_empty = True
            # Otherwise, the line is not indented (job name, top-level key, comment): flush and write it
            else:
                f.write(''.join(sort_job_attributes(job_attributes)))
                if last_line_was_empty:
                    f.write('\n')
                f.write(line)
                # Reset variables and continue to the next line
                job_attributes = []
                current_attribute = []
                is_current_block_sortable = True
                last_line_was_empty = False
        # Flush the attributes of the last job in the file
        if job_attributes:
            f.write(''.join(sort_job_attributes(job_attributes)))

    print("successfully sorted job attributes in " + filename)
    os.remove(filename + ".bak")


def process_files_recursively(directory):
    for root, dirs, files in os.walk(directory):
        # Prune ignored directories in place so that os.walk does not descend into them
        dirs[:] = [d for d in dirs if d not in IGNORED_DIRECTORIES]
        for file in files:
            if GITLAB_FILENAME_REGEX.match(file):
                process_file(os.path.join(root, file))


if __name__ == '__main__':
    if len(sys.argv) == 1:
        process_file(".gitlab-ci.yml")
    elif len(sys.argv) == 2 and os.path.isfile(sys.argv[1]):
        process_file(sys.argv[1])
    elif len(sys.argv) == 2 and os.path.isdir(sys.argv[1]):
        process_files_recursively(sys.argv[1])
    else:
        print("usage: python {} [file|folder]".format(sys.argv[0]))
        sys.exit(1)
```
Wrapping up
Consistent attribute ordering in GitLab CI files is a small investment with significant returns: cleaner diffs, fewer merge conflicts, and improved readability. The gitlab-yaml-sort.py script automates this process while respecting the developer's formatting choices.
The key design decisions that make this tool effective:
- Text-based processing preserves comments, blank lines, and formatting
- External configuration allows easy customization without code changes
- Logical attribute grouping creates a natural reading flow from inheritance to execution to conditions
Consider integrating this script into your pre-commit hooks or CI pipeline to enforce consistent styling across your team.
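For a pre-commit hook or pipeline job, one option is a read-only check that reports a job whose attributes are not in the configured order, without rewriting anything. Below is a minimal sketch using a hypothetical `first_out_of_order` helper. It takes a job as the list of its attribute names, and `ORDER` is a hand-picked subset of the configuration. Extracting the names from the file and exiting with a non-zero status when the helper returns a name are assumptions left to the surrounding hook.

```python
from typing import List, Optional

# Hypothetical subset of the order defined in gitlab-yaml-sort.yml
ORDER = ["extends", "stage", "script", "environment", "when", "rules"]


def first_out_of_order(names: List[str], order: List[str]) -> Optional[str]:
    """Return the first attribute that appears after one ranked later in `order`, or None."""

    def rank(name: str) -> int:
        # Same rule as the sorter: unknown attributes belong at the end
        return order.index(name) if name in order else len(order)

    highest = -1
    for name in names:
        if rank(name) < highest:
            return name
        highest = max(highest, rank(name))
    return None


# Attribute names of the unsorted deploy-production job from the beginning of the article
print(first_out_of_order(["when", "script", "extends", "environment", "stage", "rules"], ORDER))  # prints: script
```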
Illustrations generated locally by Draw Things using Flux.1 [Schnell] model
Further reading
Git / GitLab
- Efficient Git Workflow for Web Apps: Advancing Progressively from Scratch to Thriving
- Forget GitKraken, Here are the Only Git Commands You Need
- A Python Script Displaying Latest Pipelines in a Group's Projects
- A Python Script Calculating DORA Metrics
- Deploy a Majestic Single Server Runner on AWS
- The Majestic Single Server Runner
- YAML Modifications: Tackling the Feedback Loop Problem
- 15+ Tips for Faster Pipelines
- 10+ Best Practices to Avoid Widespread Anti-Patterns
- Pages per Branch: The No-Compromise Hack to Serve Preview Pages
- Jobs Attributes Sorter: A Python Script for Consistent YAML Files
- Runners Topologies: Pros and Cons
Kubernetes
- A Convenient Variable Substitution Mechanism for Kustomize
- Why Managed Kubernetes is a Viable Solution for Even Modest but Actively Developed Applications
- From Your Docker-Compose File to a Cluster with Kompose
- A Pragmatic Kubectl Aliases Collection
- Web Application on Kubernetes: A Tutorial to Observability with the Elastic Stack
- NGINX Ingress Controller: 10+ Complementary Configurations for Web Applications
- Awesome Maintained Links You Will Keep Using Next Year
- Managed Kubernetes: Our Dev is on AWS, Our Prod is on OVHCloud
- How to Deploy a Cost-Efficient AWS/EKS Cluster Using Terraform
- How to Deploy a Secured OVHCloud Managed Cluster Using Terraform
- FinOps EKS: 10 Tips to Reduce the Bill up to 90% on AWS Managed Clusters
Miscellaneous
- Every Developer Should Review Code - Not Just Seniors
- Future-Proof Tech Blogging: Understanding AI's Core Traits
This article was enhanced with the assistance of an AI language model to ensure clarity and accuracy in the content, as English is not my native language.

