DEV Community: h6o

When You Use a PGA Subnet on Cloud Run, Traffic to Google APIs Is Treated as Internal

h6o — Wed, 10 Jun 2026 00:00:21 +0000

At work, I tried to put a source-IP restriction on a certain Google API key and ran into a phenomenon where the setting just wouldn't take effect no matter what I did. Tracing the cause led me to Cloud Run's network path — specifically, the behavior of Private Google Access (PGA).

Since other people are likely to hit the same thing, I'm writing this down as a memo for myself. I hope it helps anyone in a similar situation.

What happened

What I wanted to do was put a source-IP restriction on a Google API key, to limit the damage in case of a leak. I figured "I'll just allow-list Cloud Run's egress IP," but the setting wouldn't block anything. Cloud NAT's static IP was on the allow list, and yet for some reason it didn't take effect.

Cutting to the conclusion: on a subnet with PGA enabled, traffic destined for Google APIs was not going through Cloud NAT — that was the cause.

What is Private Google Access (PGA)?

PGA (Private Google Access) is a mechanism that lets VMs and services that don't have an external IP reach Google APIs like *.googleapis.com directly through an internal path within the VPC. The benefit is that you can reach Google services without going out to the internet, and it's enabled per subnet.

In Terraform it's enabled with a single line on the subnet.

resource "google_compute_subnetwork" "egress" {
  name                     = "sb-egress-example"
  private_ip_google_access = true   # ← this is PGA
  # ...
}

And Cloud Run attaches to this subnet via Direct VPC Egress.

metadata:
  annotations:
    run.googleapis.com/network-interfaces: '[{"network":"...","subnetwork":".../sb-egress-example"}]'
    run.googleapis.com/vpc-access-egress: all-traffic

vpc-access-egress: all-traffic is the setting that routes all egress through the VPC. So far this looks like a very common configuration.

Traffic to Google APIs does not go through NAT

The important point is the path that Google API-bound traffic takes on a PGA-enabled subnet. As a diagram, it branches like this.

[When PGA is enabled]

Cloud Run ──┬─ *.googleapis.com bound ──▶ goes directly to Google via the internal network (PGA)
            │                              * does not go through Cloud NAT
            │                              * source is an internal IP
            │
            └─ Other internet-bound traffic ──▶ Cloud NAT ──▶ NAT's static external IP

In other words, this is what was happening:

General internet-bound traffic goes through Cloud NAT, so the source becomes NAT's static external IP
However, *.googleapis.com-bound traffic takes the PGA internal path, so it does not go through NAT
As a result, the source as seen from the Google API side is not the NAT external IP but an internal IP

API-key IP restrictions only accept public IP addresses. Requests that arrive over PGA with an internal source address therefore can never match the allow list. That is what "can't pin it to an external IP" meant.

This behavior is also explicitly stated in the official Cloud NAT documentation.

Note: Traffic sent to Google APIs and services are routed through Private Google Access even if the VM instance initiating the connections uses Public NAT. For more information, see Private Google Access interaction.

― Cloud NAT overview

So it's clearly written as the specification that "even if you're using Public NAT, traffic to Google APIs alone flows via PGA."

For reference, the Cloud NAT side is configured to allocate a static external IP.

resource "google_compute_router_nat" "nat" {
  name                               = "nat-example"
  nat_ip_allocate_option             = "MANUAL_ONLY"   # manually allocate a static IP
  nat_ips                            = [google_compute_address.nat.self_link]
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
  # router, region, ...
}

If traffic egresses via NAT, the source becomes this static IP — but the pitfall is that as long as PGA is enabled, Google API-bound traffic alone does not go through this NAT.

Countermeasure: disable PGA if you want IP restrictions to take effect

If you want to pin the source IP toward Google APIs so that API-key IP restrictions take effect, disable PGA on that subnet.

resource "google_compute_subnetwork" "egress" {
  private_ip_google_access = false   # disable PGA
}

With PGA disabled, traffic to Google APIs also goes through Cloud NAT and out to Google from the internet side. The source becomes NAT's static external IP, so finally — in principle — the API-key IP restriction starts working.

You can confirm that the path has switched to NAT by checking the Cloud NAT logs for entries bound for *.googleapis.com (destinations in Google's ASN 15169). When PGA is enabled, Google-bound traffic does not appear there, because it doesn't go through NAT.

Which one to choose

Summarized, the differences are as follows.

Aspect	PGA enabled	PGA disabled (via NAT)
Path to Google APIs	Directly via the internal network	Via the internet (NAT)
Source IP	Internal IP (not static)	NAT's static external IP
API-key IP restriction	Does not work	Works
Exposure to the internet	Low	Egresses via NAT

Looking purely from a security angle, PGA — which "doesn't go to the internet" — looks preferable. But if you want to apply a separate guard like API-key IP restriction, you may have to deliberately choose to go via NAT. That was the lesson this time.

Wrapping up

When you can't pin the source IP for Google API-bound traffic on Cloud Run, a good first thing to suspect is whether PGA is enabled or disabled. I hope this helps anyone stuck in the same place.

Securely Exposing a Stateful MCP Server on Cloud Run (n8n Playwright MCP Example)

h6o — Tue, 09 Jun 2026 23:52:58 +0000

TL;DR

I wanted to operate pages that require Google login from n8n via Playwright MCP
The sidecar approach is easy, but has gaps from the perspectives of authentication and team isolation
I built defense-in-depth with ingress: internal + IAM (roles/run.invoker) + service-to-service auth via ID tokens + a Go auth-proxy + Secret Manager
For stateful MCP, set maxScale=1 to stop scale-out and prevent sessions from jumping to another instance

Intended Audience

People who want to run an MCP server on Cloud Run
People who want to automate operations on pages that require Google login using Playwright MCP
People who share n8n across multiple teams and want to handle pages requiring per-team Google logins via Playwright MCP
People who want to set up Cloud Run service-to-service authentication (ID tokens + IAM) in a practical way

Background

The starting point was: I wanted to operate and capture pages that require Google login, like Looker Studio, from n8n workflows. Playwright MCP looked like it could make this work, so I tried it. But once I tried to put it into operation, I ran into the following challenges.

Since n8n is shared across multiple teams, I want to switch login states per team account
I don’t want a Playwright MCP endpoint that "anyone can hit" in the first place

The Problem: Gaps in the Sidecar Setup

The first thing that comes to mind is running playwright-mcp as a sidecar in the same Cloud Run instance as n8n. It's easy, but it has gaps with respect to the challenges above.

[Before: dangerous setup]
┌─ Cloud Run instance ─────────────────────────┐
│  n8n (port 5678)                             │
│        │ localhost:3000 (no auth required)   │
│        ▼                                     │
│  playwright-mcp (port 3000)                  │
│        ※ holds Google-logged-in session      │
└──────────────────────────────────────────────┘

Since it's a sidecar, playwright-mcp isn't visible from outside. However, n8n inside the same instance can hit localhost:3000 without any authentication.

Because playwright-mcp holds the Google-logged-in state (Storage State), this causes two problems:

Anyone who can build n8n workflows can use the shared login credentials
It can't accommodate use cases where each team needs a different account

Removing the sidecar and splitting playwright-mcp into a separate Cloud Run service decouples the two, but splitting alone leaves open questions: should it be exposed to the internet or only inside the VPC, and how do we authenticate?

Solution Architecture

In the end, I adopted the following setup.

[After: defense-in-depth setup]

n8n (Cloud Run, ingress=internal)
 │  Mcp-Auth-Key: <per-team API key>
 │  Path: /playwright-mcp-team-a/...
 │
 │  ※ n8n has vpc-access-egress=all-traffic, so traffic
 │    is routed to internal Cloud Run via the VPC
 ▼
auth-proxy (Cloud Run, ingress=internal)
 │  - Verifies Mcp-Auth-Key
 │  - Picks the backend by the first URL segment
 │  - Attaches its own service account's ID token and forwards
 ▼
playwright-mcp-team-a (Cloud Run, ingress=internal, maxScale=1)
   - IAM: roles/run.invoker is granted only to the auth-proxy SA
   - On receipt, Cloud Run verifies the ID token
   - Storage State is mounted via Secret Manager

There are four defensive layers. They are gates stacked in series, so if any one of them is breached, the remaining layers offer noticeably less protection.

Layer	Role
Network	Block direct access from outside with `ingress: internal`
IAM	Grant `roles/run.invoker` only to the auth-proxy SA, excluding other principals
Service-to-service auth	The auth-proxy presents a Google-signed ID token in the Authorization header
Application	The auth-proxy verifies the `Mcp-Auth-Key` and routes to the backend for each team

I'll cover how the ID token and IAM mesh together in detail in the service-to-service authentication section below.

Persisting Google Login State: storage-state

Playwright MCP has a --storage-state option that lets you save and reuse logged-in session info (cookies and localStorage) as a file. Storing this file in Secret Manager and mounting it as a Cloud Run volume lets you keep the login state even after a cold start.

volumes:
  - name: playwright-storage-state
    secret:
      secretName: PLAYWRIGHT_STORAGE_STATE
      items:
        - key: "1"
          path: storage-state.json
containers:
  - name: playwright-mcp
    args:
      - "--storage-state=/etc/playwright/storage-state.json"
    volumeMounts:
      - name: playwright-storage-state
        mountPath: /etc/playwright
        readOnly: true

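When you want to update the login state, log in again on another machine and register the new storage-state.json as a new version in Secret Manager. Because the volume above pins version "1" (key: "1"), also point key at the new version (or at latest) and deploy a new revision so that new instances mount it.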

Implementing mcp-auth-proxy (Go)

I implemented a lightweight service handling authentication and reverse proxying in Go. The reverse-proxy foundation is just the standard library's net/http/httputil.ReverseProxy, and I only add google.golang.org/api/idtoken for getting ID tokens.

Backend List

This is the only place you touch when adding a new team.

var routeSpecs = []struct {
    PathID        string
    APIKeyEnv     string
    BackendURLEnv string
}{
    {
        PathID:        "playwright-mcp-team-a",
        APIKeyEnv:     "TEAM_A_PLAYWRIGHT_MCP_KEY",
        BackendURLEnv: "TEAM_A_PLAYWRIGHT_MCP_URL",
    },
    // To add team B, add one element here
}
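The article doesn't show how routeSpecs turns into the routes map that authenticate uses. One possible sketch, assuming a route struct that only holds the resolved key and backend URL (both the struct shape and loadRoutes are hypothetical): resolve every entry from the environment at startup and refuse to start if any value is missing.

```go
package main

import (
	"fmt"
	"os"
)

// routeSpec mirrors the fields of the routeSpecs entries above.
type routeSpec struct {
	PathID        string
	APIKeyEnv     string
	BackendURLEnv string
}

// route holds the resolved values for one team (hypothetical shape).
type route struct {
	apiKey     string
	backendURL string
}

// loadRoutes resolves every spec via getenv and fails fast if a value is missing,
// so a misconfigured team never ends up with an empty key or backend.
func loadRoutes(specs []routeSpec, getenv func(string) string) (map[string]*route, error) {
	routes := make(map[string]*route, len(specs))
	for _, s := range specs {
		key, backend := getenv(s.APIKeyEnv), getenv(s.BackendURLEnv)
		if key == "" || backend == "" {
			return nil, fmt.Errorf("route %q: %s and %s must both be set", s.PathID, s.APIKeyEnv, s.BackendURLEnv)
		}
		routes[s.PathID] = &route{apiKey: key, backendURL: backend}
	}
	return routes, nil
}

func main() {
	specs := []routeSpec{{"playwright-mcp-team-a", "TEAM_A_PLAYWRIGHT_MCP_KEY", "TEAM_A_PLAYWRIGHT_MCP_URL"}}
	routes, err := loadRoutes(specs, os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("loaded routes:", len(routes))
}
```

Passing getenv as a parameter instead of calling os.Getenv directly keeps the function easy to test; in main it is simply os.Getenv.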

Authentication

It's a simple mechanism: compare the value of the Mcp-Auth-Key header with the per-team key loaded from an environment variable. subtle.ConstantTimeCompare is used to avoid timing attacks.

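import (
    "crypto/subtle"
    "net/http"
    "strings"
)

const authHeader = "Mcp-Auth-Key"

// route holds one team's settings. Only apiKey is used here; the
// backend URL and reverse proxy are omitted from this excerpt.
type route struct {
    apiKey string
}

// authenticate returns the route ID taken from the path, the matched route,
// and whether the request carried the correct key for that route.
func authenticate(r *http.Request, routes map[string]*route) (string, *route, bool) {
    providedKey := r.Header.Get(authHeader)
    if providedKey == "" {
        return "", nil, false
    }

    // Pick the backend by the first segment of the URL
    routeID := strings.TrimPrefix(r.URL.Path, "/")
    if i := strings.IndexByte(routeID, '/'); i > 0 {
        routeID = routeID[:i]
    }

    matched, found := routes[routeID]
    if !found {
        return routeID, nil, false
    }

    // Constant-time comparison, so response timing doesn't reveal how much of the key matched
    if subtle.ConstantTimeCompare([]byte(matched.apiKey), []byte(providedKey)) != 1 {
        return routeID, nil, false
    }

    return routeID, matched, true
}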

How Cloud Run Service-to-Service Authentication Works

When the auth-proxy calls a backend Cloud Run service, it uses Cloud Run service-to-service authentication. This is a two-stage mechanism: "the caller proves who they are with a Google-signed ID token, and the receiving Cloud Run service compares it against the IAM policy to decide whether to admit it."

[auth-proxy SA] ── Authorization: Bearer <ID token (aud=backend URL)> ──▶ [Cloud Run frontend]
                                                                            ① Verify ID token
                                                                            ② Check via IAM whether
                                                                               the issuing principal
                                                                               has roles/run.invoker
                                                                            ③ OK → route to container
                                                                               denied → return 403

A common misunderstanding is that "as long as you send an ID token, you can call it." That is not true: the actual decision lives on the IAM side. The ID token only proves who is calling; whether that identity is let in depends on who has been granted roles/run.invoker.

Grantee of `roles/run.invoker`	Behavior
`allUsers`	Anyone can call it without an ID token (open to the internet)
A specific service account	Only ID tokens issued by that SA can call it
Not granted	No one can call it

In this setup, roles/run.invoker is granted only to the auth-proxy's service account. ingress: internal already blocks direct access from outside, and IAM additionally blocks direct calls from other services inside the VPC.

What Happens If You Set `allUsers`

A common antipattern is "it doesn't work, so I'll just grant invoker to allUsers." If you do that:

Even with ingress: internal left in place, any resource inside the VPC can hit it without an ID token
If you have ingress: all, anyone on the internet can hit it without an ID token

In other words, playwright-mcp effectively becomes an open API that anyone with network reach can call. Since the Storage State carries Google-logged-in credentials, the damage isn't limited to data leakage: it can extend to every resource that account is allowed to operate. Keep checking who holds roles/run.invoker throughout implementation.

Where ID Tokens Come From and How They're Refreshed

The official documentation lists several ways to obtain an ID token, but when calling Cloud Run from a service that itself runs on Cloud Run, in practice they all come down to one source: the metadata server.

Querying the metadata server (http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=...) with the header Metadata-Flavor: Google returns an ID token for the service account bound to the instance
The token's lifetime is about 1 hour, and you need to fetch a new one before it expires
Google's official auth libraries (in Go, google.golang.org/api/idtoken) hit the same metadata server internally and handle retrieval, caching, and refresh for you

The other options listed in the official docs (Workload Identity Federation and downloaded service account keys) are mechanisms for calling Cloud Run "from outside Google Cloud." In our case, where we run on Cloud Run, the metadata server is directly usable, so there's no reason to adopt them. Distributing SA keys as files in particular brings in the separate operational headache of key storage and rotation, which is even more reason to avoid it.

In implementation terms, the choice boils down to "hit the metadata server yourself" or "delegate to the auth library," and there's little reason to choose the former. Once you factor in token expiration and cache consistency under concurrent requests, leaning on the library leads to fewer mistakes.

Caller Code

On the calling side, create a client by passing the audience to idtoken.NewClient. The audience is the receiving service's origin URL, https://<service>.run.app. It is placed in the ID token's aud claim, which the receiving Cloud Run service uses to decide "is this token addressed to me?"

client, err := idtoken.NewClient(ctx, audience) // audience = "https://<backend>.run.app"
if err != nil {
    log.Fatalf("creating ID token client: %v", err) // fail fast at startup
}
prefix := "/" + spec.PathID // e.g. "/playwright-mcp-team-a"

// backendURL is the backend's parsed *url.URL (e.g. from the spec.BackendURLEnv variable)
proxy := &httputil.ReverseProxy{
    Director: func(r *http.Request) {
        r.URL.Scheme = backendURL.Scheme
        r.URL.Host = backendURL.Host
        r.Host = backendURL.Host
        r.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
        r.Header.Del(authHeader) // don't forward the API key to the backend
    },
    Transport: client.Transport, // automatically attaches and refreshes ID tokens
}

The key is passing client.Transport to ReverseProxy.Transport. With just this, every request the auth-proxy relays automatically gets an ID token (fetched from the metadata server) attached and refreshed. ReverseProxy can also pass through long-lived streaming responses like SSE as-is, so it pairs well with Streamable HTTP MCP.

A Stateful Caveat: `maxScale=1`

The authentication gaps are now closed, but Playwright MCP also has an operational constraint: it fundamentally can't scale out.

Why Streamable HTTP Is Stateful

The transport currently recommended for MCP is Streamable HTTP. To make sense of why this is stateful, you need to grasp two things: "the difference from a regular POST" and "what MCP is actually exchanging."

The Difference Between a Regular POST and SSE

Roughly speaking:

A regular HTTP POST is "exchanging letters." The client sends one letter, the server writes one reply, and that's it.
SSE (Server-Sent Events) is "a phone call." Once connected, the server can speak as many times as it wants, whenever it wants. The line stays open.

For example, consider asking Playwright MCP to "take a screenshot of this page." The internal processing is "navigate to page → wait for load → scroll → capture → encode," which takes a fair amount of time.

With a regular POST, what the client sees is something like:

client ──"take a shot"──▶ server
(10 seconds pass; nothing happens)
client ◀──"here's your shot (image data)"── server

Until the entire body is complete, nothing reaches the client. Meanwhile, you can't even tell whether it's "dead or working," so it's not suited for long-running jobs.

With SSE, the same processing looks like:

client ──"take a shot"──▶ server
client ◀──"navigated to page"── server    (connection still open)
client ◀──"waiting for load"── server
client ◀──"scrolled"── server
client ◀──"here's your shot (image data)"── server
(server closes here)

The actual response body is Content-Type: text/event-stream, with text appended bit by bit, like this:

data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progressToken":1,"progress":30}}

data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progressToken":1,"progress":70}}

data: {"jsonrpc":"2.0","id":1,"result":{"image":"..."}}

Each message is a data: line followed by a blank line that marks its end. The client can process each message as soon as it arrives, while the response body is still being read.

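MCP's spec lets the server answer a request either with a single application/json response or with a text/event-stream (SSE) stream, and clients must support both. The server picks one depending on the situation, for example plain JSON for a short reply and SSE when it wants to send multiple messages.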

What Sessions Are For

That covers "one request's worth," but there's another concept one level above between MCP clients and servers: the session. The reason is that MCP itself is a stateful protocol. Specifically:

When a connection is opened, the client first sends initialize, and both sides negotiate their capabilities. This is where each side learns what the other supports (for example, whether the server offers tools and which notifications it sends), and both rely on that result for the rest of the session.
The subscribe state of resources (like "notify me when this file changes") is also remembered by the server

The ID that links these states to "which client they belong to" is the Mcp-Session-Id header. The server issues it in the initialize response, and the client includes the same value in every subsequent request. It’s easier to picture as a cookie translated into an HTTP header.

What Playwright MCP Carries

A Playwright MCP session, on top of the MCP protocol state above, is tied to a live Chromium process + open pages + cookies + ongoing operations. These are live process state stuck to a particular Cloud Run instance's memory and OS resources, so transferring them to another instance isn't realistic.

The key point is that you can't just save the session ID somewhere and have another instance pick up where you left off.

Compatibility With Cloud Run Scaling

Cloud Run adds instances as request load grows. If a Streamable HTTP client's second or later request lands on a different instance, no session exists there, and the request fails with Session not found.

The most reliable countermeasure is not to grow the instances.

metadata:
  annotations:
    autoscaling.knative.dev/maxScale: "1"
    autoscaling.knative.dev/minScale: "0" # collapse to zero when not in use

For internal batch use cases or interactive use by a small number of people, one instance is usually enough. Leaving minScale=0 means you pay only while requests are being served, at the cost of a cold start after idle periods.

Note: the auth-proxy itself is stateless and could scale out, but in this case the caller is limited to a single n8n service and traffic is light, so I match it with maxScale=1.

"Why Not Session Affinity?"

You might think, "Rather than fixing the instance, can't we just stick the same client to the same instance?" Cloud Run does have session affinity, and it looks like it could work. But it doesn't help Playwright MCP. There are two reasons.

Affinity is best-effort, and doesn't keep instances alive. The official docs explicitly say "do not use it to store server-side session data that needs to persist across requests and cannot easily be reconstructed." Affinity breaks at any of: scale-in, max concurrency, or CPU limits, and at that moment you lose the live Chromium process along with it. A session that holds "state that can't be reconstructed"—exactly our case—is the very use case the official docs name and recommend against.
The identification paths don't line up. Cloud Run affinity identifies clients via a proprietary cookie issued by the GCLB, but MCP's session identifier is the Mcp-Session-Id header. The two are unrelated, and there's no guarantee an MCP client retains and sends back that cookie.

The conventional approach to surviving scale-out is "offload state to an external store like Redis, and keep instances themselves stateless." MCP's protocol state (capabilities and subscribe state) can be externalized this way, but a live Chromium process + open pages + ongoing operations isn't the kind of thing you can serialize and offload. Affinity is an optimization for apps that can "rebuild state when broken," and for Playwright MCP—where the live process itself is the session—it doesn't act as a fix, only as an optimization.

In the end, since you can't externalize the state, the only sure move is not to grow instances. When you need to scale, expand by "adding services per team" rather than "adding more replicas of one service."

Summary

Issue	Solution
Sidecar is hit straight through from n8n	Switch to going via auth-proxy and authenticate with `Mcp-Auth-Key`
Direct reach from the internet	Eliminate external entry points with `ingress: internal`
Direct hits from other services in the VPC	Grant `roles/run.invoker` only to the auth-proxy SA
Identity proof for service-to-service traffic	Auto-attach and auto-refresh ID tokens with `idtoken.NewClient`
Session isolation between teams	Route by the first URL segment and per-team API keys
Maintaining Google login state	Mount Storage State via Secret Manager
Stateful and unscalable	Pin instances with `maxScale=1` (scale per team)

By closing gaps at all four layers (network, IAM, service-to-service auth, and application), you eliminate attack surfaces that no single layer could cover on its own. This setup should work as a general-purpose pattern for safely exposing stateful MCP servers on Cloud Run, not just Playwright MCP. I hope it's useful for teams that want to expose an MCP server internally but are unsure how to wire up authentication.

Automatically Merge Dependabot Patch Updates with GitHub Actions

h6o — Wed, 03 Dec 2025 00:25:28 +0000

Introduction

Dependabot automatically detects dependency updates and creates pull requests, but manually merging each one can be tedious.

Patch updates (security fixes and bug fixes) typically have limited impact, making them safe candidates for automatic merging.

This article explains how to implement a GitHub Actions workflow that automatically merges Dependabot patch updates.

Workflow Overview

The following workflow automatically merges only patch updates (version-update:semver-patch) from Dependabot pull requests:

name: Dependabot auto-merge

on:
  pull_request_target:
    types:
      - opened
      - synchronize
      - reopened
      - ready_for_review

permissions: {}

defaults:
  run:
    shell: bash

jobs:
  dependabot:
    runs-on: ubuntu-24.04
    if: github.event.pull_request.user.login == 'dependabot[bot]'
    permissions:
      contents: write
      pull-requests: write
    steps:
      - name: Fetch Dependabot metadata
        id: metadata
        uses: dependabot/fetch-metadata@v2
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}

      - name: Auto-merge Dependabot patch updates
        if: steps.metadata.outputs.update-type == 'version-update:semver-patch'
        run: gh pr merge --merge --auto "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Detailed Explanation of Each Step

Trigger Configuration

on:
  pull_request_target:
    types:
      - opened
      - synchronize
      - reopened
      - ready_for_review

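pull_request_target: Runs in the context of the pull request's base branch, not its head branch. For Dependabot pull requests this matters: under the plain pull_request event GITHUB_TOKEN is read-only, whereas under pull_request_target it gets the write permissions granted below. Because of that elevated access, this workflow never checks out or runs the pull request's code.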
opened: When a pull request is created
synchronize: When new commits are pushed to the pull request
reopened: When a closed pull request is reopened
ready_for_review: When a draft pull request becomes ready for review

Job Condition

if: github.event.pull_request.user.login == 'dependabot[bot]'

This condition ensures the job only runs for pull requests created by Dependabot. It prevents accidental automatic merging of pull requests created by other users.

Permission Settings

permissions:
  contents: write
  pull-requests: write

contents: write: Write access to the repository (required for merging)
pull-requests: write: Pull request operation permissions (required for merging)

Step 1: Fetch Dependabot Metadata

- name: Fetch Dependabot metadata
  id: metadata
  uses: dependabot/fetch-metadata@v2
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}

The dependabot/fetch-metadata@v2 action retrieves metadata about Dependabot's pull request. This action outputs information such as:

update-type: Type of update (version-update:semver-patch, version-update:semver-minor, version-update:semver-major, etc.)
dependency-names: Names of the dependencies being updated
directory: Directory where the update occurred

Step 2: Auto-merge Patch Updates

- name: Auto-merge Dependabot patch updates
  if: steps.metadata.outputs.update-type == 'version-update:semver-patch'
  run: gh pr merge --merge --auto "$PR_URL"
  env:
    PR_URL: ${{ github.event.pull_request.html_url }}
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

if condition: Only executes when the update type is a patch update (version-update:semver-patch)
gh pr merge --merge --auto: Uses GitHub CLI to merge the pull request
- --merge: Creates a merge commit to merge
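- --auto: Enables auto-merge, so the pull request is merged once all required checks and reviews are satisfied ("Allow auto-merge" must be turned on in the repository settings)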

Setup Instructions

1. Create the Workflow File

Save the workflow above in .github/workflows/dependabot-auto-merge.yml.

2. Verify Dependabot Configuration

Ensure Dependabot is configured, either via .github/dependabot.yml or in your GitHub repository settings.

Notes and Best Practices

Why Only Auto-merge Patch Updates?

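Patch updates (1.0.0 → 1.0.1): Bug fixes and security patches. Under semantic versioning they shouldn't contain breaking changes, which makes them the lowest-risk candidates for auto-merge (not every package follows semver strictly)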
Minor updates (1.0.0 → 1.1.0): New features added. May have broader impact, so review is recommended
Major updates (1.0.0 → 2.0.0): Likely to contain breaking changes. Manual review is essential

Conclusion

By implementing this workflow, you can automatically merge Dependabot patch updates and quickly apply security patches and bug fixes. Because patch updates typically don't contain breaking changes, they are good candidates for automatic merging.

However, we recommend adjusting the auto-merge conditions based on your project's characteristics and team policies. Consider customizing the workflow for critical dependencies by requiring manual reviews or adding additional checks.

3 ways to speed up CI [GitHub Actions] that you can do immediately!

h6o — Thu, 26 Dec 2024 23:02:54 +0000

For those who are frustrated by slow CI execution.

Here are three ways to speed up CI execution with GitHub Actions.

Three ways to speed up CI [GitHub Actions]

The following three methods are introduced in this article.

Split the Job
Add package caching
Split tests and run them in parallel

Split a Job

Jobs in a workflow run in parallel by default, so splitting work into separate jobs lets independent tasks run at the same time.

For example, unit tests and a linter can usually run independently of each other.

Defining them as separate jobs is more efficient than running them in series within a single job.

# No "needs:" between the jobs, so test and lint run in parallel
jobs:
  test:
    runs-on: ubuntu-22.04
    steps:
      ...

  lint:
    runs-on: ubuntu-22.04
    steps:
      ...

Add package caching

Caching packages is recommended, since it lets you skip the time-consuming package installation step.

Use the official actions/cache to implement the cache process.

In the following example, npm ci runs only when the OS, the Node version, or the file that records package information (package-lock.json) has changed; otherwise the cached node_modules is restored.

- name: cache and restore packages
  id: cache-npm
  uses: actions/cache@v4.0.2
  with:
    path: node_modules
    # steps.tool_versions is an earlier step (not shown) that outputs the Node.js version
    key: ${{ runner.os }}-${{ steps.tool_versions.outputs.nodejs }}-${{ hashFiles('**/package-lock.json') }}

- name: install npm packages
  if: steps.cache-npm.outputs.cache-hit != 'true'
  run: npm ci
  shell: bash
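The cache logic can be modeled in a few lines. In this Go sketch, cacheKey combines the same three inputs as the key above, using SHA-256 of the lockfile as a stand-in for hashFiles (which computes its digest its own way), and a map plays the role of the cache service. cacheKey and the Node version string are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey mirrors the shape of the key above: OS, Node version, lockfile digest.
func cacheKey(osName, nodeVersion string, lockfile []byte) string {
	sum := sha256.Sum256(lockfile)
	return fmt.Sprintf("%s-%s-%s", osName, nodeVersion, hex.EncodeToString(sum[:]))
}

func main() {
	store := map[string]bool{} // stands in for the Actions cache service
	run := func(lock string) {
		key := cacheKey("Linux", "20.11.0", []byte(lock))
		if store[key] {
			fmt.Println("cache hit: skip npm ci")
			return
		}
		fmt.Println("cache miss: run npm ci, then save node_modules")
		store[key] = true
	}
	run(`{"lockfileVersion":3}`)                 // first run: miss
	run(`{"lockfileVersion":3}`)                 // unchanged lockfile: hit
	run(`{"lockfileVersion":3,"packages":{}}`)   // lockfile changed: miss
}
```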

Split and run tests in parallel

If your tests take a long time to run, you can speed up the process by dividing the tests and running each one in parallel.

For example, with Jest you can combine a matrix strategy with the --shard command-line option, which is a simple way to split tests and run them in parallel.

A matrix strategy runs one copy of a job for each value defined in the matrix, and --shard tells Jest to run only one slice of the test suite.

Using these, you can define a workflow like the following.

jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
    steps:
      - name: checkout
        uses: actions/checkout@v3

      - name: setup environment
        uses: ./.github/actions/setup

      - name: run test
        run: npx jest --ci --shard=${{ matrix.shard }}

This will run 4 Jobs in parallel, each running a quarter of the tests.

I don't know if there are other options like --shard besides Jest, but the idea itself can be applied to any language.

There are other ways

This article introduced three easy ways to improve CI speed:

Split the Job
Add package caching
Split tests and run them in parallel

Beyond these, there are many other ways to improve speed, such as using larger runners or running tests only for the areas that changed.

I recommend improving speed step by step, within the time and budget you have available.