- Initial thoughts
- Prerequisites: The right foundation
- Classic optimizations first
- Extreme optimizations: going beyond
- Breaking down the 3-second job
- Final timing breakdown
- Real project results
- Wrapping up
- Further reading
Initial thoughts
Is it really possible to run GitLab CI jobs in just 3 seconds on a codebase with several million lines of code? The answer is yes, and this article will show you how.
After comparing different runner topologies in GitLab Runners: Which Topology for Fastest Job Execution?, we found that Shell and Docker executors offer the best potential for fast job execution. Now it's time to push those runners to their absolute limits with extreme optimizations.
This isn't about theoretical performance: these are real, production results achieved on actual projects. The 3-second example is our lightest job (a JIRA/MR synchronization check), and it consistently runs in ~3-5 seconds on our 1,500,000+ line mono-repo, actively maintained by 15 developers. Our jobs cover a range of durations: the lightest at ~3-5s, some at a few seconds, resource-heavy builds at ~2min, and end-to-end tests at ~15min. The optimizations in this article apply to all of them.
Time to hunt down every millisecond of overhead, with no mercy. Because, as French president Macron said, "Pipeline sometimes is too slow".
Prerequisites: The right foundation
Before diving into extreme optimizations, we need the right infrastructure foundation. Based on our topology comparison, this means:
Infrastructure choice:
- ✅ Shell executor (fastest) or Docker executor (fast with isolation)
- ✅ Single well-provisioned server (not autoscaling)
- ✅ Local SSD storage (NVMe preferred)
- ✅ Sufficient CPU/RAM for concurrent jobs
- ✅ Fast network connection to GitLab instance
Why this matters: Every other topology adds fundamental latencies (VM provisioning, pod scheduling, remote cache, shared resources) that cannot be eliminated through configuration. Starting with the wrong topology means you've already lost.
Classic optimizations first
Before going extreme, apply standard GitLab CI optimizations from GitLab CI Optimization: 15+ Tips for Faster Pipelines.
But these optimizations alone won't get you to 3 seconds. They'll improve your average pipeline duration. To reach sub-10-second jobs, we need to dig deeper into the arcana of GitLab runners.
Extreme optimizations: going beyond
Now we enter extreme optimization territory. These techniques are specific to Shell/Docker runners and exploit their local filesystem advantages.
Here's what we'll optimize to near-zero overhead:
- Waiting time → Proper server sizing
- Server provisioning → Already exists (no VM/pod creation)
- OS/Container startup → Native shell (1s) or cached images
- Git operations → Shallow fetches, reused local clones
- Cache → Local filesystem, preserved directories
- Artifacts → None in our fastest jobs
- Script execution → Minimal in our fastest jobs
- Termination → No cache/artifact uploads in our fastest jobs
Let's break down each phase and see how to optimize it.
Breaking down the 3-second job
Job execution on a well-sized shell runner
First, let's visualize what a well-optimized shell runner job timeline looks like before extreme optimizations:
Total: already fast for the lightest scripts. Now let's optimize each phase to the extreme.
Phase 1: Waiting (~0s)
The problem: Jobs queue when runners are saturated. Your developers are staring at a spinner. Somewhere, a PM is asking "is the pipeline stuck again?".
The solution: proper server sizing
- Size the VM for peak concurrent load (not average)
- CPU: 16 cores for a 15-developer team on a mono-repo. Not the smallest, but cheap compared to salaries
- Disk I/O: NVMe SSD essential for concurrent git/cache operations
- Concurrency tuned (~16 concurrent jobs for 16 CPUs)
⏱️ Result: ~0s waiting (when properly sized). No more "grab a coffee" excuses: the job finishes before you stand up.
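As a sketch, this sizing maps to two settings in the runner's config.toml (the values below are illustrative, matching the 16-core example above):

```toml
# config.toml on the runner host (sketch; values are examples, not a prescription)
concurrent = 16          # global cap: roughly one job per CPU core

[[runners]]
  name = "shell-runner"  # hypothetical runner name
  executor = "shell"
  limit = 16             # per-runner cap, aligned with the global setting
```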
Phase 2: Server (~0s)
The server already exists. Shocking, right?
- Shell runner: no provisioning at all
- Docker runner: no VM creation, just container scheduling
- Resources immediately available
- No cloud API calls needed
This is a fundamental advantage of single-server topologies over autoscaling. While Kubernetes is busy scheduling pods, our job is already done.
Phase 3: OS / Shell (~1s)
Think of it this way: Shell executor is a barefoot sprinter, Docker is a sprinter with fancy shoes. Both are fast, but one has less to put on.
For Shell executor (fastest, ~1s):
- No Docker image needed
- No container startup
- Direct shell execution
- Native OS environment
For Docker executor (fast with isolation, ~1-4s):
- Pre-pull images to server
- Use lightweight base images (alpine)
- Layer caching on local Docker
- Keep containers warm when possible
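The pre-pull and layer-caching points above can be encouraged from the runner side. A minimal sketch of the Docker executor section in config.toml (image name is illustrative):

```toml
# Docker executor section of config.toml (sketch)
[[runners]]
  executor = "docker"
  [runners.docker]
    image = "alpine:3.19"              # lightweight default base image
    pull_policy = ["if-not-present"]   # reuse locally cached images instead of pulling every job
```

With `if-not-present`, an image already on the server's local Docker is never re-pulled, so container startup stays in the ~1-4s range.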
Phase 4: Git Clone/Fetch (~1s)
This is where the magic happens. Or rather, where the magic doesn't happen: the best git operation is the one you barely do.
Strategy and depth
Obviously, to achieve 3s on a large codebase, a recent version of the code must already be present locally. The default strategy is a fetch (which is perfect): the runner just reconstructs a few commits in a build directory that has already been used by a previous job, ideally with nearly the same code but different commits.
By default, 20 commits are reconstructed. Most of the time (unless the job does git-specific work), you only need one:
variables:
  GIT_DEPTH: 1  # default is 20
Fetch flags
The default fetch flags are --force --prune --tags. --force and --prune are very useful to handle git fetch problems and keep the local repo size reasonable. You can experiment with dropping them on short-lived runners, at your own risk. We tried it but had to step back for consistency, even on our daily runners.
But fetching tags is almost never a good idea, at least by default. Even though we heavily rely on tag pipelines, we still don't need to fetch any tags. The exception is one job that uses older tags to extract versions; for it, we fetch tags manually.
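For that rare tag-consuming job, a minimal sketch (the job name and commands are illustrative, not our exact job):

```yaml
extract-versions:  # hypothetical job name
  variables:
    GIT_FETCH_EXTRA_FLAGS: --force --prune --no-tags  # keep the default fast fetch
  script:
    - git fetch --tags --quiet origin   # fetch tags only in the one job that needs them
    - git describe --tags --abbrev=0    # e.g. read the most recent version tag
```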
Optionally, we can define the refmap to fetch, avoiding unnecessary fetching.
Here are our flags for merge requests:
variables:
  GIT_FETCH_EXTRA_FLAGS: >-
    --force --prune --no-tags
    --refmap "+refs/merge-requests/${CI_MERGE_REQUEST_IID}/head:refs/remotes/origin/merge-requests/${CI_MERGE_REQUEST_IID}/head"
Note: add --verbose to show what's happening, and make sure you know what takes time and what doesn't. We still keep this flag because only a few lines are added once the fetch is optimized, and it does not slow down the job.
Situational: tailored clone path
When long-lived branches diverge significantly (the longer the branch and/or the higher the developer count, the more differences), fetch takes time. On our projects, there are thousands of commits of difference between long-lived branches at the worst point of our cycle!
To keep a sub-second fetch time, we configure the clone path depending on the target branch. And yes, merge request pipelines (a somewhat hidden mode) are required for this.
So our clone path depends on the pipeline type:
# for MRs. Concurrent ID is after the branch name, to ease old branches cleaning
GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME/$CI_CONCURRENT_ID
# for long-lived branches (same path, different variable)
GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_COMMIT_REF_NAME/$CI_CONCURRENT_ID
# for tags (rare, no need for distinction)
GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/tags/$CI_CONCURRENT_ID
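One way to select among these three variants in a single .gitlab-ci.yml is workflow:rules:variables; the sketch below is an assumption about how to wire it up, not our exact configuration:

```yaml
# Sketch: pick the clone path per pipeline type with workflow:rules:variables
workflow:
  rules:
    - if: $CI_MERGE_REQUEST_IID
      variables:
        GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME/$CI_CONCURRENT_ID
    - if: $CI_COMMIT_TAG
      variables:
        GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/tags/$CI_CONCURRENT_ID
    - if: $CI_COMMIT_BRANCH
      variables:
        GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_COMMIT_REF_NAME/$CI_CONCURRENT_ID
```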
This can only be used when custom_build_dir is enabled in the runner's configuration.
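Enabling it is a one-line change in config.toml (the builds_dir path below is illustrative):

```toml
# config.toml: allow jobs to override their build directory (sketch)
[[runners]]
  builds_dir = "/builds"   # example base directory for GIT_CLONE_PATH
  [runners.custom_build_dir]
    enabled = true         # required for GIT_CLONE_PATH to take effect
```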
Phase 5: Cache (~0s)
Downloading and extracting remote cache takes time. Even with local GitLab cache, you still have to zip/unzip the folder elsewhere. That's like packing and unpacking your suitcase every time you go to the kitchen.
Shared package manager folders
Sharing package manager folders on the server reuses downloaded assets with no overhead. When the local cache is warm, there is no download/unzip and no zip/upload.
variables:
  NPM_CONFIG_CACHE: /cache/gitlab-runner/.npm
  NUGET_PACKAGES: $CI_BUILDS_DIR/NuGetPackages
Note: For safety, it is still better to have different folders for dev/staging/prod.
Optional: do not clean unpacked assets
Before executing the scripts, GitLab deletes files produced by previous jobs. We can save precious seconds, even minutes, by not deleting them and, most importantly, by reusing them. This is especially useful for node_modules:
variables:
  GIT_CLEAN_FLAGS: "-ffdx --exclude=**/node_modules/"
This works best with the custom GIT_CLONE_PATH discussed earlier. In theory, it could lead to strange behavior: we took the risk for feature branches and never ran into problems on our 15-developer mono-repo.
Note: For safety, it is still better to clean for staging/prod environment.
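For those environments, the exclusion can be dropped per job. A sketch (job name and commands are hypothetical):

```yaml
deploy-staging:   # hypothetical job name
  variables:
    GIT_CLEAN_FLAGS: -ffdx   # back to a full clean: no reused node_modules here
  script:
    - npm ci                 # illustrative: reinstall from scratch on a pristine checkout
    - npm run deploy:staging # illustrative deploy command
```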
Phase 6: Artifacts (none)
The fastest jobs in our pipelines handle no artifacts. None. Zero. Nada. The fastest artifact is the one that doesn't exist.
Phase 7: Script (~1s)
For the 3-second target, we're specifically measuring fast jobs like linting and formatting checks. Any standard job will naturally take longer. But hey, even our "slow" jobs appreciate having 2 seconds less overhead: that's 2 seconds of their life they'll never get back.
Phase 8: Termination (~0s)
At the end, the job handles artifacts and the cache. The fastest jobs produce no artifact and use local custom cache or none. The job exits so fast, it doesn't even say goodbye.
Final timing breakdown
Here's what the fully optimized timeline looks like, phase by phase:
- Waiting: ~0s
- Server: ~0s
- OS/Shell: ~1s
- Git fetch: ~1s
- Cache: ~0s
- Artifacts: none
- Script: ~1s
- Termination: ~0s
⚡ Total: ~3 seconds per job (with a sub-1s script)
The key insight: we've reduced overhead to ~2s, leaving all remaining time for actual work. Your CI is now faster than your npm start.
Real project results
Again, these aren't theoretical numbers: we experience this extreme speed on a daily basis.
Shell runner - 3 seconds
Docker runner - 13 seconds
Takeaway: If you need isolation, Docker is still very fast with these optimizations. But Shell executor is unbeatable for raw speed.
Example development pipeline - 1min45 for 20 jobs
Most jobs run in parallel on each stage. The pipeline spends minimal time on overhead and maximum time on actual work.
Wrapping up
Achieving 3-second jobs on a multi-million line codebase is possible with the right combination.
These techniques show that single-server Shell/Docker runners, when properly optimized, vastly outperform autoscaling solutions for typical development workflows. The local filesystem advantages are impossible to beat.
Not every job can be 3 secondsβbuilds and full test suites will always take longer. But for fast-feedback jobs, sub-10-second execution is absolutely achievable and dramatically improves developer experience. Your developers will wonder if the pipeline is broken... because it's too fast.
Illustrations generated locally by Draw Things using Flux.1 [Schnell] model
Further reading
This article was enhanced with the assistance of an AI language model to ensure clarity and accuracy in the content, as English is not my native language.
Top comments (2)
The rules-based change detection approach is exactly what made the biggest difference for us too β going from full pipeline runs on every push to targeted job execution cut our CI time by about 70%. The trickier part we've found is maintaining the rules correctly as the codebase evolves. One stale rule that misses a dependency and suddenly you're shipping broken builds with green CI. How are you handling rule validation as the monorepo grows?
Yes, this is tricky.
TL;DR: we use gitlab-ci-local --list-csv with different scenarios, in a job, as a rules auto-testing mechanism. Another article is coming on this subject in early May.