DEV Community

Siddhant Khare


Deep dive: optimizing self-hosted GitHub Actions Runners on AWS and GCP for cost efficiency

Running self-hosted GitHub Actions runners in the cloud gives you a lot of control, but costs can spiral if you don't optimize them. In this post, I’ll share how we cut our AWS runner costs by 30% and how you can apply similar strategies on GCP, with a touch of technical fun and actionable advice. Let's dive in!


Why Self-Hosted Runners?

GitHub Actions' default hosted runners are convenient but can be expensive for large workloads, especially for compute-intensive tasks like integration tests or builds. Self-hosted runners, deployed on AWS or GCP, offer:

  • Cost Control: Pay only for what you use.
  • Custom Environments: Tailored to specific workflows.
  • Scalability: Dynamically scale based on workload.

However, running self-hosted runners at scale comes with its own challenges: idle resources, inefficient configurations, and escalating network costs.


Architecture Overview

Here’s a high-level architecture for both AWS and GCP self-hosted runners:

[Figure: high-level architecture of the self-hosted runner setup on AWS and GCP]


Challenges in Cost Management

  1. Idle Resources: Pre-provisioned runners waiting for jobs lead to unnecessary costs.
  2. Networking Overheads: High outbound traffic, especially for Docker pulls.
  3. Instance Type Selection: Choosing cost-effective and performant instance types.
  4. Preemption Risks: The provider can reclaim Spot instances (AWS) or preemptible VMs (GCP) mid-job, which kills the running job.

Optimization Strategies

1. Dynamic Scaling

Both AWS and GCP allow scaling instances based on demand.

AWS

  • Use Auto Scaling Groups (ASGs) with Lambda functions triggered by workflow_job webhooks.
  • Leverage tools like philips-labs/terraform-aws-github-runner to simplify management.
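Here is a minimal sketch of the webhook side. It assumes a Lambda function behind an HTTP endpoint (API Gateway HTTP API or a Lambda function URL). The handler verifies GitHub's `X-Hub-Signature-256` header, then reads the `workflow_job` action and labels. The environment variable name, the `self-hosted` label filter, and the returned decision strings are illustrative assumptions. The philips-labs module ships its own, more complete webhook Lambda.

```python
import base64
import hashlib
import hmac
import json
import os


def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header: 'sha256=' + HMAC-SHA256(secret, raw body)."""
    if not secret or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison so the check doesn't leak timing information.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def handler(event, context):
    """Lambda entry point; assumes the API Gateway HTTP API / function URL event shape."""
    # Hypothetical environment variable holding the webhook secret.
    secret = os.environ.get("GITHUB_WEBHOOK_SECRET", "")
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    raw = event.get("body") or ""
    body = base64.b64decode(raw) if event.get("isBase64Encoded") else raw.encode()

    if not verify_signature(secret, body, headers.get("x-hub-signature-256", "")):
        return {"statusCode": 401, "body": "invalid signature"}
    if headers.get("x-github-event") != "workflow_job":
        return {"statusCode": 202, "body": "ignored event"}

    payload = json.loads(body)
    action = payload.get("action")  # "queued", "in_progress", "completed", ...
    labels = payload.get("workflow_job", {}).get("labels", [])
    if "self-hosted" not in labels:
        return {"statusCode": 202, "body": "not a self-hosted job"}

    # Illustrative mapping only; a real scaler would call EC2 / Auto Scaling at this point.
    decision = {"queued": "scale_up", "completed": "scale_down"}.get(action, "noop")
    return {"statusCode": 200, "body": json.dumps({"decision": decision, "labels": labels})}
```

Because the handler verifies the signature first, unauthenticated requests never reach the scaling logic.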

GCP

  • Use Managed Instance Groups (MIGs) with custom autoscaler policies based on job queue size or CPU load.
  • Cloud Functions or Cloud Run can handle scaling triggers.

[Figure: scaling decision flow]
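The scaling decision itself can be sketched as a small pure function. This sketch assumes a hypothetical policy: keep a small warm pool of idle runners, add one runner per queued job, and clamp the result to configured bounds. On AWS the result would feed an ASG's desired capacity, and on GCP a MIG's target size. All names and defaults are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    queued_jobs: int   # jobs waiting for a runner with matching labels
    busy_runners: int  # runners currently executing a job
    idle_runners: int  # registered runners with no job


@dataclass
class ScalingPolicy:
    min_runners: int = 0
    max_runners: int = 50
    warm_pool: int = 1  # idle runners kept ready for bursts (they cost money while idle)


def desired_capacity(state: PoolState, policy: ScalingPolicy) -> int:
    """Target runner count for an ASG (AWS) or MIG (GCP)."""
    # Busy runners stay, each queued job needs a runner, plus a small warm pool on top.
    target = state.busy_runners + state.queued_jobs + policy.warm_pool
    return max(policy.min_runners, min(policy.max_runners, target))


def scaling_action(state: PoolState, policy: ScalingPolicy) -> tuple[str, int]:
    """Compare the target with the current fleet and return (action, count)."""
    current = state.busy_runners + state.idle_runners
    target = desired_capacity(state, policy)
    if target > current:
        return ("scale_up", target - current)
    if target < current:
        return ("scale_down", current - target)
    return ("hold", 0)
```

Scale-down should only ever remove idle runners, because terminating a busy instance kills its job.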


2. Spot Instances (AWS) / Preemptible VMs (GCP)

These offer significant cost savings but require careful handling of preemptions.

AWS Spot Instances

  • Mix instance types in Spot Pools for better availability:
    • m5, m6i, m7i (Intel)
    • m5a, m6a (AMD)
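In an ASG-based setup, you can express the instance mix as a `MixedInstancesPolicy`. The sketch below builds the structure that the EC2 Auto Scaling `CreateAutoScalingGroup` API accepts. The launch template name, the `.large` sizes, and the on-demand base capacity are assumptions for illustration. Setting `OnDemandBaseCapacity` is one way to keep an on-demand floor for critical workloads.

```python
import json

# Instance families from the list above; ".large" (2 vCPU, 8 GiB) is an assumed size.
SPOT_TYPES = ["m5.large", "m6i.large", "m7i.large", "m5a.large", "m6a.large"]


def mixed_instances_policy(launch_template: str, on_demand_base: int = 0) -> dict:
    """Build a MixedInstancesPolicy for an EC2 Auto Scaling group."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template,
                "Version": "$Latest",
            },
            # Each instance type is a separate Spot pool; more pools mean fewer capacity gaps.
            "Overrides": [{"InstanceType": t} for t in SPOT_TYPES],
        },
        "InstancesDistribution": {
            # Always keep this many on-demand instances as a fallback floor.
            "OnDemandBaseCapacity": on_demand_base,
            # Everything above the base capacity is Spot.
            "OnDemandPercentageAboveBaseCapacity": 0,
            # Balances price against the likelihood of interruption.
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


if __name__ == "__main__":
    # "gh-runner-template" is a hypothetical launch template name.
    print(json.dumps(mixed_instances_policy("gh-runner-template", on_demand_base=1), indent=2))
```

With boto3, you would pass this dict as `MixedInstancesPolicy` to `create_auto_scaling_group`, alongside `AutoScalingGroupName`, `MinSize`, `MaxSize`, and `VPCZoneIdentifier`.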

GCP Preemptible VMs

  • Use diverse instance types:
    • e2-standard, n2-highmem, t2d-standard (AMD)
  • Jobs must checkpoint regularly to handle interruptions gracefully.
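The post does not describe its checkpointing mechanism, so the following is a minimal sketch of one way to do it. It assumes a long-running job split into named steps, and records completed steps in a checkpoint file. Before each step, the job asks the GCE metadata server whether the VM is being preempted: `instance/preempted` returns `TRUE` or `FALSE` when queried with the `Metadata-Flavor: Google` header. The checkpoint path and step names are hypothetical. On a real runner, the checkpoint would need to live somewhere that outlives the VM, such as a Cloud Storage bucket.

```python
import json
import os
import urllib.request
from typing import Callable, Iterable

METADATA_URL = "http://metadata.google.internal/computeMetadata/v1/instance/preempted"


def gce_preempted(timeout: float = 1.0) -> bool:
    """Ask the GCE metadata server whether this VM is being preempted."""
    req = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode().strip() == "TRUE"
    except OSError:
        # Not on GCE, or metadata unreachable: assume no preemption.
        return False


def run_with_checkpoints(
    steps: Iterable[tuple[str, Callable[[], None]]],
    checkpoint_path: str,
    preempted: Callable[[], bool] = gce_preempted,
) -> list[str]:
    """Run steps in order, skipping any already recorded in the checkpoint file."""
    done: list[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for name, fn in steps:
        if name in done:
            continue  # finished before a previous interruption
        if preempted():
            break  # stop cleanly; a retry on a new VM resumes from the checkpoint
        fn()
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)  # persist progress after every step
    return done
```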

Pro Tip: Always have fallback capacity with on-demand instances or higher-priority pools for critical workloads.


3. Caching and Artifact Management

Networking Optimization

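  • AWS: Back build caches with S3. The stock actions/cache action stores caches in GitHub's own cache service, so an S3 backend needs an S3-compatible cache action or custom scripts.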
  • GCP: Use Cloud Storage or Artifact Registry for similar functionality.
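Whatever the storage backend, cache hit rates depend on the key. A common scheme is a platform/tool prefix plus a hash of the lockfiles, so the key changes only when dependencies change. `actions/cache` users typically build this with `hashFiles`. The sketch below applies the same idea to a custom S3 or Cloud Storage cache. The key layout is an assumption, not a format that either service requires.

```python
import hashlib
from pathlib import Path


def cache_key(prefix: str, lockfiles: list[str], root: str = ".") -> str:
    """Build a cache key of the form '<prefix>-<sha256 over the lockfiles>' (hypothetical layout)."""
    digest = hashlib.sha256()
    for rel in sorted(lockfiles):  # sort so the key doesn't depend on argument order
        digest.update(rel.encode())
        digest.update(b"\0")  # separator between file name and contents
        digest.update((Path(root) / rel).read_bytes())
    return f"{prefix}-{digest.hexdigest()}"
```

A job would look up an object named after this key (for example `cache/<key>.tar.zst` in the bucket) and upload a fresh archive only on a miss.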

Docker Pulls

  • Reduce Docker pull costs by:
    • Setting up a pull-through cache in GCP Artifact Registry or AWS ECR.
    • Using VPC endpoints (AWS) or Private Google Access (GCP) to keep registry traffic off NAT gateways and the public internet.
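Once a pull-through cache exists, jobs need to pull from it instead of from Docker Hub. The helper below rewrites Docker Hub references accordingly. It assumes one of two setups: an ECR pull-through cache rule with the hypothetical prefix `docker-hub` (images at `<account>.dkr.ecr.<region>.amazonaws.com/docker-hub/<repo>`), or an Artifact Registry remote repository with the hypothetical name `docker-hub-remote` (images at `<location>-docker.pkg.dev/<project>/<repo>/<image>`). Official images are normalized to their full `library/` path, which ECR's Docker Hub pull-through cache expects. Account, region, and project values are placeholders.

```python
DOCKER_HUB_PREFIXES = ("docker.io/", "index.docker.io/", "registry-1.docker.io/")


def normalize_docker_hub(image: str) -> str:
    """Return the full Docker Hub path, e.g. 'nginx:1.27' -> 'library/nginx:1.27'."""
    for prefix in DOCKER_HUB_PREFIXES:
        if image.startswith(prefix):
            image = image[len(prefix):]
            break
    else:
        # Docker treats a first component containing '.' or ':' (or 'localhost') as a registry host.
        first, sep, _ = image.partition("/")
        if sep and ("." in first or ":" in first or first == "localhost"):
            raise ValueError(f"not a Docker Hub image: {image}")
    # Official images ('nginx', 'nginx:1.27') live under the 'library/' namespace.
    return image if "/" in image else f"library/{image}"


def ecr_cache_ref(image: str, account: str, region: str, prefix: str = "docker-hub") -> str:
    """Image reference served through an ECR pull-through cache rule."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{prefix}/{normalize_docker_hub(image)}"


def artifact_registry_ref(image: str, location: str, project: str, repo: str = "docker-hub-remote") -> str:
    """Image reference served through an Artifact Registry remote repository."""
    return f"{location}-docker.pkg.dev/{project}/{repo}/{normalize_docker_hub(image)}"
```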

4. Cost Monitoring and Analysis

Both cloud providers offer tools to analyze costs:

  • AWS: Cost Explorer + CloudWatch for EC2 usage.
  • GCP: Billing Reports + Cloud Monitoring (formerly Stackdriver).

Key Metrics to Watch:

  1. Idle instance time
  2. Spot/preemptible interruption rates
  3. Network egress traffic
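You can derive these metrics from per-instance records exported from either provider, for example instance lifecycle events joined with job timestamps. The record shape and field names below are hypothetical. The arithmetic is what matters:

- idle share = idle hours / total instance hours
- interruption rate = interrupted Spot/preemptible instances / all Spot/preemptible instances
- egress per job = total egress / jobs completed

```python
from dataclasses import dataclass


@dataclass
class InstanceRecord:
    uptime_hours: float  # time from boot to termination
    busy_hours: float    # time spent running jobs
    spot: bool           # Spot (AWS) or preemptible (GCP) instance
    interrupted: bool    # reclaimed by the provider
    egress_gb: float     # network egress attributed to this instance
    jobs: int            # jobs completed on this instance


def runner_metrics(records: list[InstanceRecord]) -> dict[str, float]:
    """Compute idle share, interruption rate, and egress per job."""
    total_hours = sum(r.uptime_hours for r in records)
    idle_hours = sum(r.uptime_hours - r.busy_hours for r in records)
    spot = [r for r in records if r.spot]
    jobs = sum(r.jobs for r in records)
    return {
        "idle_share": idle_hours / total_hours if total_hours else 0.0,
        "interruption_rate": sum(r.interrupted for r in spot) / len(spot) if spot else 0.0,
        "egress_gb_per_job": sum(r.egress_gb for r in records) / jobs if jobs else 0.0,
    }
```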

[Figure: cost breakdown]


Case Study: AWS Optimization Outcomes

  • Idle Runners Reduced: Adjusted runner pools based on org activity.
  • Spot Pools Optimized: Added AMD-based m6a instances, reducing costs by 30%.
  • Networking Costs: Introduced Docker pull-through caching with S3.

Case Study: GCP Adaptation

  • Dynamic Scaling: Managed Instance Groups with preemptible VMs.
  • Networking: Enabled Private Google Access so runners reach Google APIs, including Artifact Registry, without external IPs or Cloud NAT.
  • Preemptible Instances: n2-highmem provided a balance of cost and performance.

Results

Cost reduction metrics

Cost reduction metrics

| Cloud Provider | Baseline Cost | Optimized Cost | Savings (%) |
|----------------|---------------|----------------|-------------|
| AWS            | $10,000       | $7,000         | 30.0        |
| GCP            | $9,500        | $6,500         | 31.6        |
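The Savings column is (baseline - optimized) / baseline × 100. Here is a quick check of both rows, rounded to one decimal place:

```python
def savings_pct(baseline: float, optimized: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return (baseline - optimized) / baseline * 100


for provider, baseline, optimized in [("AWS", 10_000, 7_000), ("GCP", 9_500, 6_500)]:
    print(f"{provider}: {savings_pct(baseline, optimized):.1f}%")
# AWS: 30.0%
# GCP: 31.6%
```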

User Experience Improvements

  • Reduced job interruptions.
  • Faster job execution due to optimized runner configurations.

Future Opportunities

  1. IPv6 and NAT Gateway Optimization:
    • Both AWS and GCP support IPv6, which can reduce NAT gateway costs for outbound traffic that can travel over IPv6.
  2. Machine Learning for Scaling Decisions:
    • Use historical data to predict demand spikes.
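Before reaching for a full ML model, a simple statistical baseline may already capture the weekly rhythm of CI traffic. The sketch below uses a swapped-in technique, not something the post describes. It averages historical queued-job counts per hour of the week and uses that average to size the warm pool ahead of time. The input format and the headroom factor are assumptions.

```python
import math
from collections import defaultdict
from datetime import datetime


def hour_of_week(ts: datetime) -> int:
    """0..167, where Monday 00:00 is 0."""
    return ts.weekday() * 24 + ts.hour


def build_profile(history: list[tuple[datetime, int]]) -> dict[int, float]:
    """Average queued-job count for each hour of the week seen in the history."""
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for ts, queued in history:
        h = hour_of_week(ts)
        totals[h] += queued
        counts[h] += 1
    return {h: totals[h] / counts[h] for h in totals}


def prewarm_runners(profile: dict[int, float], at: datetime, headroom: float = 1.2) -> int:
    """Runners to have ready at `at`: expected demand times headroom, rounded up."""
    expected = profile.get(hour_of_week(at), 0.0)
    return math.ceil(expected * headroom)
```

The output could feed the `warm_pool` size of whatever scaler you run, so capacity is already up before the Monday-morning spike rather than reacting to it.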

Conclusion

Optimizing self-hosted GitHub Actions runners on AWS and GCP can save significant costs while improving performance. By dynamically scaling resources, leveraging spot/preemptible instances, and optimizing network usage, you can achieve a highly efficient setup tailored to your workloads.

Feel free to experiment with these strategies and share your results. Happy optimizing! 🚀
