- Initial thoughts
- Prerequisites: The right foundation
- Classic optimizations first
- Extreme optimizations: going beyond
- Breaking down the 3-second job
- Final timing breakdown
- Real project results
- Wrapping up
- Further reading
Initial thoughts
Is it really possible to run GitLab CI jobs in just 3 seconds on a codebase with several million lines of code? The answer is yes, and this article will show you how.
After comparing different runner topologies in GitLab Runners: Which Topology for Fastest Job Execution?, we found that Shell and Docker executors offer the best potential for fast job execution. Now it's time to push those runners to their absolute limits with extreme optimizations.
This isn't about theoretical performance: these are real, production results achieved on actual projects. The 3-second example is our lightest job (a JIRA/MR synchronization check), and it consistently runs in ~3-5 seconds on our 1,500,000+ line mono-repo, actively maintained by 15 developers. Our jobs cover a range of durations: the lightest at ~3-5s, some at a few seconds, resource-heavy builds at ~2min, and end-to-end tests at ~15min. The optimizations in this article apply to all of them.
Time to hunt down every millisecond of overhead, with no mercy. Because, as French president Macron said, "Pipeline sometimes is too slow".
Prerequisites: The right foundation
Before diving into extreme optimizations, we need the right infrastructure foundation. Based on our topology comparison, this means:
Infrastructure choice:
- ✅ Shell executor (fastest) or Docker executor (fast with isolation)
- ✅ Single well-provisioned server (not autoscaling)
- ✅ Local SSD storage (NVMe preferred)
- ✅ Sufficient CPU/RAM for concurrent jobs
- ✅ Fast network connection to GitLab instance
Why this matters: Every other topology adds fundamental latencies (VM provisioning, pod scheduling, remote cache, shared resources) that cannot be eliminated through configuration. Starting with the wrong topology means you've already lost.
Classic optimizations first
Before going extreme, apply standard GitLab CI optimizations from GitLab CI Optimization: 15+ Tips for Faster Pipelines.
But these optimizations alone won't get you to 3 seconds. They'll improve your average pipeline duration. To reach sub-10-second jobs, we need to dig deeper into the arcana of GitLab runners.
Extreme optimizations: going beyond
Now we enter extreme optimization territory. These techniques are specific to Shell/Docker runners and exploit their local filesystem advantages.
Here's what we'll optimize to near-zero overhead:
- Waiting time → Proper server sizing
- Server provisioning → Already exists (no VM/pod creation)
- OS/Container startup → Native shell (1s) or cached images
- Git operations → Shallow fetches, reused local clones
- Cache → Local filesystem, preserved directories
- Artifacts → None in our fastest jobs
- Script execution → Minimal in our fastest jobs
- Termination → No cache/artifact uploads in our fastest jobs
Let's break down each phase and see how to optimize it.
Breaking down the 3-second job
Job execution on a well-sized shell runner
First, let's visualize what a well-optimized shell runner job timeline looks like before extreme optimizations:
Total: already fast for the lightest scripts. Now let's optimize each phase to the extreme.
Phase 1: Waiting (~0s)
The problem: Jobs queue when runners are saturated. Your developers are staring at a spinner. Somewhere, a PM is asking "is the pipeline stuck again?".
The solution: proper server sizing
- Size the VM for peak concurrent load (not average)
- CPU: 16 cores for a 15-developer team on a mono-repo. Not the smallest, but cheap compared to salaries
- Disk I/O: NVMe SSD essential for concurrent git/cache operations
- Concurrency tuned (~16 concurrent jobs for 16 CPUs)
⏱️ Result: ~0s waiting (when properly sized). No more "grab a coffee" excuses: the job finishes before you stand up.
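As a sketch, this sizing maps to two settings in the runner's config.toml (the values below are illustrative, matching the 16-core example above):

```toml
# config.toml on the runner host (sketch; values are examples, not a prescription)
concurrent = 16          # global cap: roughly one job per CPU core

[[runners]]
  name = "shell-runner"  # hypothetical runner name
  executor = "shell"
  limit = 16             # per-runner cap, aligned with the global setting
```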
Phase 2: Server (~0s)
The server already exists. Shocking, right?
- Shell runner: no provisioning at all
- Docker runner: no VM creation, just container scheduling
- Resources immediately available
- No cloud API calls needed
This is a fundamental advantage of single-server topologies over autoscaling. While Kubernetes is busy scheduling pods, our job is already done.
Phase 3: OS / Shell (~1s)
Think of it this way: Shell executor is a barefoot sprinter, Docker is a sprinter with fancy shoes. Both are fast, but one has less to put on.
For Shell executor (fastest, ~1s):
- No Docker image needed
- No container startup
- Direct shell execution
- Native OS environment
For Docker executor (fast with isolation, ~1-4s):
- Pre-pull images to server
- Use lightweight base images (alpine)
- Layer caching on local Docker
- Keep containers warm when possible
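The pre-pull and layer-caching points above can be encouraged from the runner side. A minimal sketch of the Docker executor section in config.toml (image name is illustrative):

```toml
# Docker executor section of config.toml (sketch)
[[runners]]
  executor = "docker"
  [runners.docker]
    image = "alpine:3.19"              # lightweight default base image
    pull_policy = ["if-not-present"]   # reuse locally cached images instead of pulling every job
```

With `if-not-present`, an image already on the server's local Docker is never re-pulled, so container startup stays in the ~1-4s range.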
Phase 4: Git Clone/Fetch (~1s)
This is where the magic happens. Or rather, where the magic doesn't happen: the best git operation is the one you barely do.
Strategy and depth
Obviously, to achieve 3s on a large codebase, a recent version of the code must already be present locally. The default strategy is a fetch (which is perfect): the runner just reconstructs a few commits in a build directory that has already been used by a previous job, ideally with nearly the same code but different commits.
By default, 20 commits are reconstructed. Most of the time (unless the job does git-specific work), you only need one:
variables:
  GIT_DEPTH: 1  # default is 20
Fetch flags
The default fetch flags are --force --prune --tags. --force and --prune are very useful to handle git fetch problems and keep the local repo size reasonable. You can experiment with dropping them on short-lived runners, at your own risk. We tried it but had to step back for consistency, even on our daily runners.
But fetching tags is almost never a good idea, at least by default. Even though we heavily rely on tag pipelines, we still don't need to fetch any tags. The exception is one job that uses older tags to extract versions; for it, we fetch tags manually.
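For that rare tag-consuming job, a minimal sketch (the job name and commands are illustrative, not our exact job):

```yaml
extract-versions:  # hypothetical job name
  variables:
    GIT_FETCH_EXTRA_FLAGS: --force --prune --no-tags  # keep the default fast fetch
  script:
    - git fetch --tags --quiet origin   # fetch tags only in the one job that needs them
    - git describe --tags --abbrev=0    # e.g. read the most recent version tag
```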
Optionally, we can define the refmap to fetch, avoiding unnecessary fetching.
Here are our flags for merge requests:
variables:
  GIT_FETCH_EXTRA_FLAGS: >-
    --force --prune --no-tags
    --refmap "+refs/merge-requests/${CI_MERGE_REQUEST_IID}/head:refs/remotes/origin/merge-requests/${CI_MERGE_REQUEST_IID}/head"
Note: add --verbose to show what's happening, and make sure you know what takes time and what doesn't. We still keep this flag because only a few lines are added once the fetch is optimized, and it does not slow down the job.
Situational: tailored clone path
When long-lived branches diverge significantly (the longer the branch and/or the higher the developer count, the more differences), fetch takes time. On our projects, there are thousands of commits of difference between long-lived branches at the worst point of our cycle!
To keep a sub-second fetch time, we configure the clone path depending on the target branch. And yes, merge request pipelines (a somewhat hidden mode) are required for this.
So our clone path depends on the pipeline type:
# for MRs. Concurrent ID is after the branch name, to ease old branches cleaning
GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME/$CI_CONCURRENT_ID
# for long-lived branches (same path, different variable)
GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_COMMIT_REF_NAME/$CI_CONCURRENT_ID
# for tags (rare, no need for distinction)
GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/tags/$CI_CONCURRENT_ID
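One way to select among these three variants in a single .gitlab-ci.yml is workflow:rules:variables; the sketch below is an assumption about how to wire it up, not our exact configuration:

```yaml
# Sketch: pick the clone path per pipeline type with workflow:rules:variables
workflow:
  rules:
    - if: $CI_MERGE_REQUEST_IID
      variables:
        GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME/$CI_CONCURRENT_ID
    - if: $CI_COMMIT_TAG
      variables:
        GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/tags/$CI_CONCURRENT_ID
    - if: $CI_COMMIT_BRANCH
      variables:
        GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_COMMIT_REF_NAME/$CI_CONCURRENT_ID
```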
This can only be used when custom_build_dir is enabled in the runner's configuration.
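Enabling it is a one-line change in config.toml (the builds_dir path below is illustrative):

```toml
# config.toml: allow jobs to override their build directory (sketch)
[[runners]]
  builds_dir = "/builds"   # example base directory for GIT_CLONE_PATH
  [runners.custom_build_dir]
    enabled = true         # required for GIT_CLONE_PATH to take effect
```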
Phase 5: Cache (~0s)
Downloading and extracting remote cache takes time. Even with local GitLab cache, you still have to zip/unzip the folder elsewhere. That's like packing and unpacking your suitcase every time you go to the kitchen.
Shared package manager folders
Sharing package manager folders on the server reuses downloaded assets with no overhead. When the local cache is warm, there is no download/unzip and no zip/upload.
variables:
  NPM_CONFIG_CACHE: /cache/gitlab-runner/.npm
  NUGET_PACKAGES: $CI_BUILDS_DIR/NuGetPackages
Note: For safety, it is still better to have different folders for dev/staging/prod.
Optional: do not clean unpacked assets
Before executing the scripts, GitLab deletes files produced by previous jobs. We can save precious seconds, even minutes, by not deleting them and, most importantly, by reusing them. This is especially useful for node_modules:
variables:
  GIT_CLEAN_FLAGS: "-ffdx --exclude=**/node_modules/"
This works best with the custom GIT_CLONE_PATH discussed earlier. In theory, it could lead to strange behavior: we took the risk for feature branches and never ran into problems on our 15-developer mono-repo.
Note: For safety, it is still better to clean for staging/prod environment.
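For those environments, the exclusion can be dropped per job. A sketch (job name and commands are hypothetical):

```yaml
deploy-staging:   # hypothetical job name
  variables:
    GIT_CLEAN_FLAGS: -ffdx   # back to a full clean: no reused node_modules here
  script:
    - npm ci                 # illustrative: reinstall from scratch on a pristine checkout
    - npm run deploy:staging # illustrative deploy command
```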
Phase 6: Artifacts (none)
The fastest jobs in our pipelines handle no artifacts. None. Zero. Nada. The fastest artifact is the one that doesn't exist.
Phase 7: Script (~1s)
For the 3-second target, we're specifically measuring fast jobs like linting and formatting checks. Any standard job will naturally take longer. But hey, even our "slow" jobs appreciate having 2 seconds less overhead: that's 2 seconds of their life they'll never get back.
Phase 8: Termination (~0s)
At the end, the job handles artifacts and the cache. The fastest jobs produce no artifact and use local custom cache or none. The job exits so fast, it doesn't even say goodbye.
Final timing breakdown
Here's what the fully optimized timeline looks like, phase by phase:
- Waiting: ~0s
- Server: ~0s
- OS/Shell: ~1s
- Git fetch: ~1s
- Cache: ~0s
- Artifacts: none
- Script: ~1s
- Termination: ~0s
⚡ Total: ~3 seconds per job (with a sub-1s script)
The key insight: we've reduced overhead to ~2s, leaving all remaining time for actual work. Your CI is now faster than your npm start.
Real project results
Again, these aren't theoretical numbers: we experience this extreme speed on a daily basis.
Shell runner - 3 seconds
Docker runner - 13 seconds
Takeaway: If you need isolation, Docker is still very fast with these optimizations. But Shell executor is unbeatable for raw speed.
Example development pipeline - 1min45 for 20 jobs
Most jobs run in parallel on each stage. The pipeline spends minimal time on overhead and maximum time on actual work.
Wrapping up
Achieving 3-second jobs on a multi-million line codebase is possible with the right combination.
These techniques show that single-server Shell/Docker runners, when properly optimized, vastly outperform autoscaling solutions for typical development workflows. The local filesystem advantages are impossible to beat.
Not every job can be 3 secondsβbuilds and full test suites will always take longer. But for fast-feedback jobs, sub-10-second execution is absolutely achievable and dramatically improves developer experience. Your developers will wonder if the pipeline is broken... because it's too fast.
Illustrations generated locally by Draw Things using Flux.1 [Schnell] model
Further reading
This article was enhanced with the assistance of an AI language model to ensure clarity and accuracy in the content, as English is not my native language.
Top comments (2)
The rules-based change detection approach is exactly what made the biggest difference for us too β going from full pipeline runs on every push to targeted job execution cut our CI time by about 70%. The trickier part we've found is maintaining the rules correctly as the codebase evolves. One stale rule that misses a dependency and suddenly you're shipping broken builds with green CI. How are you handling rule validation as the monorepo grows?
Yes, this is tricky.
TL;DR: we use gitlab-ci-local --list-csv with different scenarios, in a job, as a rules auto-testing mechanism. Another article is coming on this subject in early May.