DEV Community

Pavel Gajvoronski

Originally published at tracehawk.dev

Your Python SDK silently routes through macOS proxy

I spent two hours debugging a 503 error in our OTLP ingest endpoint. The server logs showed no incoming request. The SDK reported a connection refused. The endpoint was definitely running on localhost:3001. The bug wasn't in my code at all.

The problem

We're building TraceHawk — an observability platform for AI agents. Our Python SDK sends OpenTelemetry spans to a local ingest endpoint during development. The setup is straightforward: traceloop-sdk initializes an OTLPSpanExporter pointing at http://localhost:3001/api/otel/v1/traces.

It worked fine on day one. Stopped working on day two. No code changed.

urllib.error.URLError: <urlopen error [Errno 111] Connection refused>

Except the server wasn't refusing connections. curl localhost:3001/api/health returned {"status":"ok"} immediately.

What we tried first

We assumed the exporter URL was wrong. We tried 127.0.0.1 instead of localhost. Same error. We checked that the Next.js dev server was actually running on 3001. It was. We restarted everything. No change.

Then we looked at the actual network request. Instead of going to localhost:3001, it was hitting 127.0.0.1:10809 — and getting a 503 from something called ClashX.

The cause

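Python's urllib and requests honor proxy settings by default. They read HTTP_PROXY and HTTPS_PROXY from the environment, and on macOS, when no proxy variables are set, they fall back to the system proxy configured under System Settings → Network → Proxies. If you're running any tool that sets a system proxy (Proxyman, Charles, ClashX, Little Snitch proxy rules, corporate VPNs), Python routes its HTTP traffic through that proxy.

Including traffic to localhost, unless the system proxy's bypass list excludes it.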

This is by design. Python trusts the system proxy config. The proxy tool intercepts localhost:3001, can't forward it anywhere meaningful, and returns a 503.

The kicker: your teammates will hit this too. Anyone on your team with a VPN client or proxy debug tool will see the same symptom. The error message (Connection refused or 503) looks like a server problem, not a proxy problem.
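To see which route Python itself would pick for a given URL, the standard library exposes the lookups urllib uses internally. The following is a minimal diagnostic sketch: `describe_proxy_route` is a hypothetical helper name, not part of any SDK, and it reports urllib's view only.

```python
import urllib.request
from urllib.parse import urlsplit


def describe_proxy_route(url: str) -> dict:
    """Report which proxy, if any, urllib would use for this URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Environment variables win; macOS and Windows fall back to OS settings.
    proxies = urllib.request.getproxies()
    # True when NO_PROXY (or an OS-level exception) says "go direct".
    bypassed = bool(urllib.request.proxy_bypass(host))
    proxy = None if bypassed else proxies.get(parts.scheme)
    return {"host": host, "proxy": proxy, "bypassed": bypassed}


if __name__ == "__main__":
    print(describe_proxy_route("http://localhost:3001/api/otel/v1/traces"))
```

On an affected machine, the `proxy` field would be expected to show the proxy tool's address rather than `None`.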

The fix

Two changes, both needed:

1. Set NO_PROXY before SDK initialization:

import os
os.environ.setdefault("NO_PROXY", "localhost,127.0.0.1")
os.environ.setdefault("no_proxy", "localhost,127.0.0.1")  # lowercase too — some libs check this

from tracehawk import init
init(api_key="...", endpoint="http://localhost:3001/api/otel/v1/traces")

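The setdefault pattern leaves any NO_PROXY the user has already set untouched. Note that it does not extend an existing value: if NO_PROXY is already set to something that omits localhost, these two lines change nothing and the proxy will still intercept local traffic.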
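If an existing NO_PROXY needs to be kept and extended, the entries have to be merged by hand. A minimal sketch of one way to do that follows; `ensure_no_proxy` is a hypothetical helper, and leaving a wildcard `*` value alone is an assumption about what the user would want.

```python
import os
from typing import Iterable, MutableMapping


def ensure_no_proxy(
    hosts: Iterable[str],
    env: MutableMapping[str, str] = os.environ,
) -> str:
    """Add hosts to NO_PROXY/no_proxy without dropping entries already there."""
    entries = []
    for key in ("NO_PROXY", "no_proxy"):
        for item in env.get(key, "").split(","):
            item = item.strip()
            if item and item not in entries:
                entries.append(item)

    # "*" already means "bypass the proxy for everything"; appending hosts
    # would turn it into an ordinary list entry, so leave it untouched.
    if "*" in entries:
        return ",".join(entries)

    for host in hosts:
        if host not in entries:
            entries.append(host)

    merged = ",".join(entries)
    env["NO_PROXY"] = merged
    env["no_proxy"] = merged
    return merged
```

Calling `ensure_no_proxy(["localhost", "127.0.0.1"])` before SDK initialization would take the place of the two setdefault lines.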

2. Disable proxy trust on the requests Session inside your exporter:

import requests

class AgentObserveExporter:
    def __init__(self, endpoint: str, api_key: str):
        self.endpoint = endpoint
        self.session = requests.Session()
        self.session.trust_env = False  # do NOT read system proxy
        self.session.headers.update({
            "Content-Type": "application/json",
            "x-api-key": api_key,
        })

    def export(self, spans):
        # _serialize (not shown) converts spans into a JSON-serializable payload
        payload = self._serialize(spans)
        resp = self.session.post(self.endpoint, json=payload, timeout=5)
        return resp.status_code == 200

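trust_env = False tells requests to ignore HTTP_PROXY, HTTPS_PROXY, and the macOS system proxy entirely. It also stops requests from reading .netrc credentials and the REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE variables, so check that your exporter doesn't rely on those. For an SDK exporter this is the right default: you're shipping to a known endpoint, not making arbitrary HTTP requests.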

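Both fixes are needed because different parts of the Python HTTP stack check different things. NO_PROXY covers every code path that still consults the environment: the standard library's urllib (which raised the URLError above) and any requests session left at its defaults, such as the one inside the default OTLP exporter. trust_env = False covers the requests.Session you construct yourself.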

What we learned

  1. Python's proxy behavior is correct, not a bug. It's doing exactly what it should — honoring system configuration. The problem is that SDK authors rarely think about developer machines with proxy tools running.

  2. NO_PROXY needs both cases. Some Python HTTP libraries check NO_PROXY (uppercase), others check no_proxy (lowercase). Set both with setdefault to be safe.

  3. The error message is actively misleading. Connection refused looks like the server isn't running. A 503 looks like the server is broken. Neither points toward "proxy interception". Add a note to your SDK docs and README — it will save your users hours.

  4. trust_env = False is the right default for SDK exporters. An SDK sending telemetry to a fixed endpoint has no business routing through the user's system proxy. Make proxy use opt-in, not opt-out.

  5. This affects protobuf exporters too. The default OTLPSpanExporter from opentelemetry-exporter-otlp-proto-http uses requests internally. Same fix applies.

What's next

The right long-term fix is to check at SDK init time whether the target endpoint is local and warn if the system proxy would intercept it. Something like:

import os

def _check_proxy_intercepts(endpoint: str) -> bool:
    from urllib.request import getproxies
    proxies = getproxies()
    no_proxy = os.environ.get("NO_PROXY", os.environ.get("no_proxy", ""))
    # check if endpoint hostname is in no_proxy list
    ...

We haven't built this yet. It's a quality-of-life improvement that would make the error message actually useful instead of baffling.
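For reference, here is one way such a check could look using only the standard library. This is a sketch under assumptions, not the TraceHawk implementation: the helper names and the warning text are hypothetical, "local" is taken to mean localhost or a loopback IP literal, and the check reflects urllib's proxy lookup.

```python
import ipaddress
import urllib.request
import warnings
from urllib.parse import urlsplit


def is_local_host(host: str) -> bool:
    """True for 'localhost' and loopback IP literals such as 127.0.0.1 or ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal
        return False


def check_proxy_intercepts(endpoint: str) -> bool:
    """True if `endpoint` is local and urllib would still route it via a proxy."""
    parts = urlsplit(endpoint)
    host = parts.hostname or ""
    if not is_local_host(host):
        return False
    # NO_PROXY or an OS-level exception already sends this host direct.
    if urllib.request.proxy_bypass(host):
        return False
    return parts.scheme in urllib.request.getproxies()


def warn_if_proxied(endpoint: str) -> None:
    """Emit a warning at init time instead of a baffling 503 later."""
    if check_proxy_intercepts(endpoint):
        warnings.warn(
            f"{endpoint} is local but a proxy is configured for it; "
            "set NO_PROXY=localhost,127.0.0.1 or disable proxy trust.",
            stacklevel=2,
        )
```

An SDK could call `warn_if_proxied(endpoint)` once during initialization, before the first export is attempted.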

Over to you

  • How do you handle proxy-aware HTTP clients in your SDKs — do you always disable proxy trust for telemetry/internal traffic?
  • Has anyone built a "dev environment sanity checker" that catches things like proxy interception, port conflicts, and stale DNS before devs waste time on them?
  • What's the weirdest "the bug is in my dev environment, not my code" moment you've had?
