Mustafa ERBAY

Posted on May 23 • Originally published at mustafaerbay.com.tr

Self-Hosted Runner vs SaaS: Which is More Cost-Effective?

#devops #cicd #infrastructure #selfhosted

When setting up CI/CD pipelines, almost every team faces a dilemma: Should we leave the infrastructure to the ready-made machines of SaaS providers, or should we set up a self-hosted runner system that runs on our own servers? While the thought of "I'll set it up on my own VPS, use unlimited minutes, and save on the bill" seems very attractive at first glance, the reality isn't always that rosy. In this guide, I'll examine the Self-Hosted Runner vs SaaS comparison not just based on server bills, but under the headings of hidden network costs, maintenance effort, disk management, and security, with my 20 years of field experience.

While working on a production ERP and deploying the backend services of my own side projects, I've used, broken, and optimized both methods countless times. Let's analyze which choice protects your wallet and your time in which scenario with concrete metrics.

1. Introduction to the Cost Equation: Hidden Bills and Hardware Amortization

Most developers and system administrators base their cost calculations on the free limits of SaaS platforms or their standard per-minute rates. For example, on a popular SaaS CI/CD platform, a standard 2 vCPU, 8 GB RAM machine costs around $0.008 per minute on average. A medium-sized team with 100 builds a day, each taking an average of 10 minutes, would have a monthly consumption of approximately 30,000 minutes. This directly translates to a $240 bill.

On the other hand, when you run your own runner on a VPS that costs $40 per month with 8 vCPUs and 16 GB of RAM, you might think you're saving $200 on paper. However, amortization, disk wear, idle CPU times, and most importantly, the human resource cost, which are not included in this calculation, completely change the equation. When you set up your own server, every second that machine is not running at 100% capacity means you're essentially wasting money.

SaaS Monthly Cost Formula:
Cost = Total Build Time (minutes) x Per-Minute Rate + Additional Storage/Network

Self-Hosted Monthly Cost Formula:
Cost = Server Rent + Network Egress (Data Out) + Disk Wear/Backup + Maintenance Effort (Engineer Hours x Hourly Rate)
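
The two formulas above can be sketched as a small script. This is only an illustration: the function names are mine, and the $5 backup cost, the 4 maintenance hours, and the $50 hourly rate are assumed figures, not measurements.

```shell
#!/bin/bash
# Sketch of the two monthly cost formulas. All sample figures are assumptions.

# SaaS: total build minutes x per-minute rate + additional storage/network.
saas_monthly_cost() {
  local builds_per_day=$1 minutes_per_build=$2 rate_per_minute=$3 extras=${4:-0}
  awk -v b="$builds_per_day" -v m="$minutes_per_build" -v r="$rate_per_minute" -v e="$extras" \
    'BEGIN { printf "%.2f\n", b * m * 30 * r + e }'
}

# Self-hosted: server rent + egress + disk wear/backup + engineer hours x hourly rate.
self_hosted_monthly_cost() {
  local rent=$1 egress=$2 disk=$3 engineer_hours=$4 hourly_rate=$5
  awk -v s="$rent" -v g="$egress" -v d="$disk" -v h="$engineer_hours" -v r="$hourly_rate" \
    'BEGIN { printf "%.2f\n", s + g + d + h * r }'
}

# 100 builds/day x 10 minutes at $0.008 per minute (the example from this section)
saas_monthly_cost 100 10 0.008
# $40 VPS, no egress fee, assumed $5 backup, assumed 4 hours of maintenance at $50/hour
self_hosted_monthly_cost 40 0 5 4 50
```

With these inputs the script prints 240.00 and 245.00: a few hours of maintenance per month are enough to cancel out the $200 saving on paper.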

Let me give you an example from my real-life experience: In a client project, we quickly switched to a self-hosted runner architecture because build times were increasing. Our monthly SaaS bill of $300 dropped to a server cost of $45. However, at the end of the first month, the disk on the runner machine filled up to 100% due to accumulated Docker cache, and deployment processes completely stopped for 4 hours. The cost of that 4-hour downtime to the company was far greater than what we would have paid to the SaaS system in a year.

2. CPU and Memory Consumption Analysis: SaaS Limits vs. Your Own Server

SaaS providers' standard machines typically use general-purpose shared processors. This can lead to performance limitations, known as "CPU throttling," especially during intensive compilation or testing processes. On your own bare-metal server or dedicated VPS infrastructure, you have full control over the processor power.

As I've discussed before in the context of PostgreSQL performance optimization, disk and CPU speed are critical for database operations and for integration tests with heavy input/output (I/O). Disk I/O limits (IOPS) are generally quite low in SaaS environments. On your own server, you can use the full throughput of NVMe drives.

In the table below, I've compared the hardware and performance metrics of a standard SaaS runner and a $48/month self-hosted VPS runner:

Metric	Standard SaaS Runner	Self-Hosted VPS (Dedicated)
vCPU	2 Cores (Shared)	8 Cores (Dedicated)
RAM	7 GB - 8 GB	16 GB
Disk Type	Standard SSD (Low IOPS)	NVMe SSD (High IOPS)
Monthly Fixed Cost	$0 (Pay as you go)	$48.00 (Fixed)
Concurrent Builds	Limited (Depends on SaaS plan)	Unlimited (As hardware allows)

If you run many builds in parallel, the SaaS bill climbs quickly, because every concurrent job is billed per minute. On your own server, you can run multiple runners on the same hardware, isolated using Docker Compose or systemd units. Of course, setting resource limits correctly is essential here. For example, if you don't set cgroup limits for a runner running under systemd, excessive memory consumption by one build process could exhaust RAM, trigger the kernel's OOM (Out Of Memory) killer against unrelated processes, and leave the entire server unresponsive.

⚠️ cgroup Memory Limiting

When running self-hosted runners, to prevent a single build process from crashing the entire server, make sure to define MemoryHigh and MemoryMax limits in your systemd service file.
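
As a rough sketch of what such limits could look like, the function below prints a systemd drop-in file. The function name and the 6G/8G values are assumptions for illustration; pick limits that match your own hardware and the number of runners sharing it.

```shell
#!/bin/bash
# Prints a systemd drop-in that caps memory for a runner unit (cgroup v2).
# The limits passed in are illustrative, not recommendations.
render_memory_dropin() {
  local memory_high=$1 memory_max=$2
  printf '[Service]\n'
  printf '# Soft limit: above this the unit is throttled and its memory reclaimed\n'
  printf 'MemoryHigh=%s\n' "$memory_high"
  printf '# Hard limit: above this the OOM killer acts inside this unit only\n'
  printf 'MemoryMax=%s\n' "$memory_max"
}

render_memory_dropin 6G 8G
```

One way to apply it would be to save the output as a drop-in such as /etc/systemd/system/github-runner.service.d/memory.conf, then run systemctl daemon-reload and restart the runner service.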

3. Network Bandwidth (Egress Traffic) and Storage Costs

Here's the hidden monster that is most often overlooked and can lead to warnings from the accounting department at the end of the month: Network Egress (data out) costs. SaaS platforms charge significant fees for every gigabyte of data that leaves their cloud network. If your build steps involve downloading (pull) or uploading (push) large Docker images, data transfer costs can quickly exceed the main server cost.

For example, let's say you push a 1.5 GB Docker image to an external registry with every build. With 50 builds a day, this amounts to 75 GB per day, or approximately 2.25 TB of data transfer per month. With egress fees ranging from $0.08 to $0.12 per GB at many cloud providers, you'd face a monthly bill of roughly $180 to $270 just for data transfer.
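
The egress arithmetic can be reproduced with a few lines of shell. The function names are mine, and the script assumes a 30-day month and a flat per-GB rate with no free tier.

```shell
#!/bin/bash
# Monthly egress volume and cost for pushing one image per build.
# Assumes a 30-day month and a flat per-GB rate (no free tier).
egress_gb_per_month() {
  local image_gb=$1 builds_per_day=$2 days=${3:-30}
  awk -v i="$image_gb" -v b="$builds_per_day" -v d="$days" \
    'BEGIN { printf "%.0f\n", i * b * d }'
}

egress_cost() {
  local gb=$1 rate_per_gb=$2
  awk -v g="$gb" -v r="$rate_per_gb" 'BEGIN { printf "%.2f\n", g * r }'
}

gb=$(egress_gb_per_month 1.5 50)   # 1.5 GB x 50 builds x 30 days
echo "Monthly volume: ${gb} GB"
echo "At \$0.08/GB: \$$(egress_cost "$gb" 0.08)"
echo "At \$0.12/GB: \$$(egress_cost "$gb" 0.12)"
```

For the example figures this gives 2250 GB per month, which costs $180.00 at the low rate and $270.00 at the high rate.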

When you use a self-hosted runner, your server provider usually grants you a free traffic allowance of 10 TB to 20 TB per month. If you're running on your own local network (on-premise), this cost is almost zero.

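#!/bin/bash
# A cron script I use for daily disk cleanup and removing unused Docker images on a self-hosted runner machine.

echo "=== Disk Cleanup Started: $(date) ==="
df -h /

# Remove stopped containers, unused networks, build cache, and all unused
# images (not just dangling ones) that are older than 24 hours
docker system prune -a --force --filter "until=24h"

# The "until" filter cannot be combined with --volumes, so unused volumes
# are pruned in a separate step
docker volume prune --force

echo "=== Disk Status After Cleanup ==="
df -h /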

On a server where I didn't add the above script to run every night at 03:00 AM, the 120 GB NVMe disk filled up in just 3 weeks, and the entire deployment pipeline locked up. You don't have these disk cleanup worries with SaaS systems; each virtual machine starts from a clean image and is destroyed when its job is done.
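
A nightly schedule alone does not help if the disk fills up between two runs, so a threshold check can complement it. The sketch below is an assumption on my part rather than the script I ran in production; the 80% threshold and the function names are illustrative.

```shell
#!/bin/bash
# Prints the used percentage of a mount point as a bare integer.
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Succeeds (exit 0) when usage is at or above the given threshold.
disk_over_threshold() {
  local mount=$1 threshold=$2
  [ "$(disk_usage_pct "$mount")" -ge "$threshold" ]
}

if disk_over_threshold / 80; then
  echo "Disk usage on / is $(disk_usage_pct /)%: cleanup recommended"
else
  echo "Disk usage on / is $(disk_usage_pct /)%: OK"
fi
```

To run the cleanup script itself at 03:00 every night, a crontab entry along the lines of `0 3 * * * /usr/local/bin/runner-disk-cleanup.sh` would do, where the script path is hypothetical.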

4. Operations and Maintenance Effort (Ops Overhead): The Time Paradox

As system administrators or software architects, our most valuable asset is our time. Self-hosted systems that look "free" or "cheap" come with a significant operations overhead. With SaaS systems, keeping the infrastructure running, applying operating system updates and security patches, and scaling are entirely the responsibility of the service provider. On your own server, all this burden is on your shoulders.

Last year, while performing an Ubuntu LTS version upgrade (do-release-upgrade) on a self-hosted runner server we were using, the Docker daemon's network socket structure changed. All pipelines stopped working at 9:00 AM. It took me a full 4 hours to diagnose the issue, resolve the docker-iptables conflict, and get the system back up and running.

Analysis of Time Lost:
- Issue detection and log analysis (journalctl -u docker -n 100): 45 minutes
- Cleaning iptables rules and re-establishing network bridges: 90 minutes
- Configuring and testing runner services (systemd): 60 minutes
- Total Team Loss: 4 developers x 4 hours = 16 man-hours lost.

If your team doesn't have a dedicated DevOps engineer solely responsible for this infrastructure, managing self-hosted runners starts to steal time from developers' primary job of writing code. The cost of this stolen time is often higher than the few hundred dollars per month you'd pay to a SaaS platform.
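
To put a number on that stolen time, the sketch below computes how many maintenance hours per month the savings can absorb, and what an incident like the one above costs. The $240 and $40 bills come from the first section; the $50 hourly rate is an assumption, so substitute your own.

```shell
#!/bin/bash
# Engineer hours per month that the SaaS-vs-VPS saving can pay for.
break_even_hours() {
  local saas_bill=$1 self_hosted_bill=$2 hourly_rate=$3
  awk -v a="$saas_bill" -v b="$self_hosted_bill" -v r="$hourly_rate" \
    'BEGIN { printf "%.1f\n", (a - b) / r }'
}

# Cost of lost person-hours at a given hourly rate.
incident_cost() {
  local person_hours=$1 hourly_rate=$2
  awk -v h="$person_hours" -v r="$hourly_rate" 'BEGIN { printf "%.2f\n", h * r }'
}

# $240 SaaS bill vs $40 VPS, assumed $50/hour engineer rate
echo "Break-even maintenance budget: $(break_even_hours 240 40 50) hours/month"
# 4 developers x 4 hours blocked by the upgrade incident
echo "Cost of the 16 lost hours: \$$(incident_cost 16 50)"
```

Under these assumptions the saving covers 4.0 hours of maintenance per month, while the single upgrade incident costs $800.00, which is four months of savings.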

5. Security and Isolation: CVE Risks and Network Hardening Costs

Security is the soft underbelly of the self-hosted runner architecture. On SaaS platforms, each build step runs in a completely isolated, ephemeral virtual machine or a protected container. If malicious code infiltrates your build script (for example through a supply chain attack), it cannot persist on the system, because the machine is destroyed once the build is finished.

However, the situation is very different with a self-hosted runner. If you run the runner directly on the main machine (non-containerized), any script running during the build process can gain full access to the server's operating system. Even worse, different projects' build steps running consecutively on the same runner can access each other's environment variables, SSH keys, or API tokens.
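
One partial mitigation is to wipe the runner's working directory after every job so that leftover files such as .env files or SSH keys are not visible to the next build. The sketch below is a minimal illustration with a hypothetical function name; it reduces leakage through leftover files but does not replace ephemeral or containerized runners.

```shell
#!/bin/bash
# Deletes everything inside a runner work directory, keeping the directory itself.
cleanup_workspace() {
  local work_dir=$1
  # Safety guard: never operate on an empty path or the filesystem root
  case "$work_dir" in
    ""|"/") echo "refusing to clean '$work_dir'" >&2; return 1 ;;
  esac
  # Nothing to do if the directory does not exist
  [ -d "$work_dir" ] || return 0
  find "$work_dir" -mindepth 1 -delete
}

# Demo on a throwaway directory instead of a real runner workspace
demo_dir=$(mktemp -d)
echo "SECRET_TOKEN=example" > "$demo_dir/.env"
mkdir -p "$demo_dir/repo/.ssh"
cleanup_workspace "$demo_dir"
echo "Entries left after cleanup: $(ls -A "$demo_dir" | wc -l)"
rmdir "$demo_dir"
```

On GitHub Actions runners, a script like this could be wired in through the runner's job-completed hook (the ACTIONS_RUNNER_HOOK_JOB_COMPLETED environment variable); check the runner documentation for your version before relying on it.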

# /etc/systemd/system/github-runner.service
# Example of running the runner with restricted privileges for security

[Unit]
Description=GitHub Actions Runner
After=network.target

[Service]
ExecStart=/home/runner/run.sh
User=runner
Group=runner
WorkingDirectory=/home/runner
KillMode=process
Restart=always
RestartSec=5

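# Basic security hardening
ProtectSystem=full
# ProtectHome=true would hide /home entirely and keep run.sh from starting,
# so keep /home read-only and allow writes only to the runner's own directory
ProtectHome=read-only
ReadWritePaths=/home/runner
NoNewPrivileges=true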

[Install]
WantedBy=multi-user.target

As seen in the systemd configuration above, you should never run the runner with root privileges, and you should make operating system directories such as /usr, /boot, and /etc read-only using parameters like ProtectSystem=full. Furthermore, it's essential to isolate these machines from the company's main network segment.

If you don't implement these hardening measures, an exploit for a known CVE (such as a kernel-level privilege escalation) that gets executed during a build or test run could expose your entire internal network segment (VLAN) to attackers. The security engineering hours you spend to mitigate these risks are also directly added to your cost column.

6. Decision Matrix: When Should We Choose Which?

You can refer to the simple decision matrix I've prepared to determine which method is more economical for you. This matrix accounts for not only monetary cost but also operational sustainability.

When to Prefer SaaS?

- If your team doesn't have a full-time system administrator or DevOps specialist.
- If your monthly build time is under 10,000 minutes.
- If your projects use standard libraries and don't require special hardware (GPU, high RAM, etc.).
- If your security and isolation requirements are very strict, and you don't want to deal with infrastructure security.

When to Prefer Self-Hosted?

- If you compile very large and monolithic projects (e.g., C++ compilations or massive Java packages) and build times take hours on SaaS.
- If your build processes need to securely connect to databases or custom ERP services within your local network (on-premise) via VPN.
- If your monthly build time exceeds 50,000 minutes and SaaS bills run into thousands of dollars.
- If you use Docker registries or local package repositories (Nexus, Artifactory) within your own internal network (LAN) and require high bandwidth.
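
The two numeric criteria from the lists above can be condensed into a tiny helper. This is a deliberately simplified sketch: it only looks at monthly build minutes and whether someone owns the infrastructure, and the "evaluate" result for the range between 10,000 and 50,000 minutes is my own placeholder, since that range needs a case-by-case look.

```shell
#!/bin/bash
# Rough recommendation from monthly build minutes and ops ownership.
# has_ops_engineer must be "yes" or "no".
recommend_runner() {
  local monthly_minutes=$1 has_ops_engineer=$2
  if [ "$has_ops_engineer" != "yes" ] || [ "$monthly_minutes" -lt 10000 ]; then
    echo "saas"
  elif [ "$monthly_minutes" -gt 50000 ]; then
    echo "self-hosted"
  else
    # Between the two thresholds: weigh hardware, network, and security needs
    echo "evaluate"
  fi
}

recommend_runner 5000 yes
recommend_runner 60000 yes
recommend_runner 60000 no
recommend_runner 30000 yes
```

The four calls print saas, self-hosted, saas, and evaluate in that order. Hardware, on-premise connectivity, and isolation requirements still have to be weighed by hand.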

As I've seen before in enterprise software architecture designs, hybrid models are also very popular. Using SaaS runners for non-critical, fast-running tests, while triggering limited-privilege self-hosted runners inside your own secure network for heavy integration tests and deployment steps, offers the most sensible trade-off.

Conclusion

In conclusion, while using self-hosted runners might seem like a "free" or "very cheap" solution at first glance, hidden costs (disk fill-ups, network transfer fees, security vulnerabilities, and engineering hours spent on maintenance) often make it more expensive than SaaS solutions. When making infrastructure decisions, you need to account for the cost of your own time and your team's focus, not just the server rent.

In the next post, I'll explain how I reduced Docker image sizes and consequently network egress costs by 70%, with concrete Dockerfile examples.
