DEV Community

alakkadshaw

NAT Traversal: How It Works

NAT traversal is the set of techniques that lets devices behind NATs connect to each other: discovering public addresses, punching holes through NATs, and relaying traffic when all else fails.

This guide covers NAT traversal from first principles through production implementation. You'll learn how NATs break peer-to-peer connections, why STUN/TURN/ICE work together, why CGNAT is making the problem worse, and how to troubleshoot connection failures in production.

Whether you're debugging ICE candidates at 11 PM or architecting a new real-time communication product, this is the reference you'll want bookmarked.

What is NAT and why does it break peer-to-peer connections?

Network Address Translation (NAT) was designed to solve a practical problem: IPv4 only provides about 4.3 billion addresses, and the internet ran out of new allocations years ago. NAT lets multiple devices on a private network share a single public IP address.

Your laptop, phone, and smart speaker all get private addresses (like 192.168.1.x), and your router translates those to its single public IP when packets leave for the internet.

Here's how it works. When your device at 192.168.1.50:12345 sends a packet to an external server at 203.0.113.1:443, the NAT router rewrites the source address to its own public IP and assigns a new source port -- say 198.51.100.1:54321. It stores this mapping in a translation table.

When the server responds to 198.51.100.1:54321, the NAT looks up the mapping and forwards the packet back to 192.168.1.50:12345.

From the server's perspective, it's talking to the router. From your device's perspective, NAT is invisible.

This works well for client-server communication. The problem starts when two devices behind separate NATs try to talk directly to each other -- the exact scenario WebRTC needs for peer-to-peer calls.

Neither device knows the other's private address. Even if they did, private addresses aren't routable on the public internet.

And even if Device A somehow learns Device B's public address and port, the NAT in front of Device B will drop the incoming packet because no prior outbound packet created a mapping for that connection. The NAT has no translation table entry, so the packet is silently discarded.
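To make the translation table concrete, here is a minimal sketch of the behavior described above. The `SimpleNat` class and its starting port are hypothetical, and the model deliberately ignores filtering by remote address and mapping timeouts.

```javascript
// Minimal model of a NAT translation table (illustrative only).
class SimpleNat {
  constructor(publicIp, firstPort = 54321) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
    this.byInternal = new Map(); // "ip:port" -> external port
    this.byExternal = new Map(); // external port -> "ip:port"
  }

  // Outbound packet: rewrite the source and remember the mapping.
  outbound(internalAddr) {
    if (!this.byInternal.has(internalAddr)) {
      const port = this.nextPort++;
      this.byInternal.set(internalAddr, port);
      this.byExternal.set(port, internalAddr);
    }
    return `${this.publicIp}:${this.byInternal.get(internalAddr)}`;
  }

  // Inbound packet: forward only if a mapping exists, otherwise drop (null).
  inbound(externalPort) {
    return this.byExternal.get(externalPort) ?? null;
  }
}

const nat = new SimpleNat("198.51.100.1");
const unsolicited = nat.inbound(54321);               // null: no mapping yet
const rewritten = nat.outbound("192.168.1.50:12345"); // "198.51.100.1:54321"
const reply = nat.inbound(54321);                     // "192.168.1.50:12345"
console.log({ unsolicited, rewritten, reply });
```

The first `inbound` call is the unsolicited packet from the paragraph above: with no table entry, there is nowhere to forward it.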

This is the core NAT traversal problem: both sides need to send packets to create NAT mappings, but neither side can receive packets until a mapping exists.

Understanding NAT types and their impact on connectivity

Not all NATs behave the same way. The type of NAT a device sits behind determines whether direct peer-to-peer connections are possible.

Understanding these differences is critical for predicting connection success rates in your WebRTC application.

The classic classification (and why it's incomplete)

The original NAT classification from RFC 3489 (2003) defines four types:

  • Full Cone NAT: Once a mapping is created (internal IP:port to external IP:port), any external host can send packets to that external address. The most permissive type.
  • Address-Restricted Cone NAT: Only external hosts that the internal device has previously sent a packet to (by IP) can send packets back through the mapping.
  • Port-Restricted Cone NAT: Same as address-restricted, but also restricted by port. The external host must match both the IP and port the internal device previously contacted.
  • Symmetric NAT: A different external port mapping is created for each unique destination. A packet sent to Server A gets external port 54321, while a packet to Server B gets external port 54322. This is the most restrictive type and the hardest to traverse.

You'll still see this classification everywhere. It's useful for building intuition, but it has a significant limitation: it conflates two independent behaviors.

The modern classification (RFC 4787)

RFC 4787 introduced a more precise framework by separating NAT behavior into two independent dimensions:

Mapping behavior -- how the NAT assigns external ports:

  • Endpoint-Independent Mapping (EIM): The same external port is used regardless of where packets are sent. If 192.168.1.50:12345 maps to 198.51.100.1:54321 for one destination, it maps to the same external port for every destination. This is "easy NAT."
  • Endpoint-Dependent Mapping (EDM): A different external port is assigned per destination. This is "hard NAT" -- what the classic taxonomy calls symmetric NAT.

Filtering behavior -- which incoming packets the NAT accepts:

  • Endpoint-Independent Filtering: Accepts packets from any external source once a mapping exists.
  • Address-Dependent Filtering: Only accepts packets from IPs the internal device has sent to.
  • Address and Port-Dependent Filtering: Only accepts packets matching both the IP and port previously contacted.

Here's why this matters for NAT traversal: a NAT with endpoint-independent mapping but address-dependent filtering (common in consumer routers) will allow UDP hole punching to work even though it's not "full cone."

The classic taxonomy would call this "restricted cone" and leave you guessing about traversal difficulty. The modern taxonomy tells you directly: EIM means hole punching can work; EDM usually means you need a relay.
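The three filtering behaviors can be sketched as a single predicate. This is an illustrative model rather than router code: the function name, the string labels, and the example addresses are all assumptions made for the example.

```javascript
// Decide whether a NAT forwards an inbound packet arriving on an existing mapping.
// `contacted` lists the remote endpoints the internal host has already sent to.
function acceptsInbound(filtering, contacted, source) {
  switch (filtering) {
    case "endpoint-independent":
      return true; // any source may use the mapping
    case "address-dependent":
      return contacted.some((c) => c.ip === source.ip);
    case "address-and-port-dependent":
      return contacted.some((c) => c.ip === source.ip && c.port === source.port);
    default:
      throw new Error(`unknown filtering behavior: ${filtering}`);
  }
}

const contacted = [{ ip: "203.0.113.1", port: 443 }];
const sameHostOtherPort = { ip: "203.0.113.1", port: 8443 };
const stranger = { ip: "203.0.113.99", port: 443 };

const results = {
  eifStranger: acceptsInbound("endpoint-independent", contacted, stranger),
  adfOtherPort: acceptsInbound("address-dependent", contacted, sameHostOtherPort),
  adfStranger: acceptsInbound("address-dependent", contacted, stranger),
  apdfOtherPort: acceptsInbound("address-and-port-dependent", contacted, sameHostOtherPort),
};
console.log(results);
```

Only the first two checks pass: endpoint-independent filtering admits anyone, and address-dependent filtering admits a known IP on a new port.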

Why symmetric NAT (EDM) is the enemy of peer-to-peer

With endpoint-independent mapping, STUN can discover your public IP:port, and that same IP:port will work for communicating with any peer. You tell your peer "send packets here," and they arrive.

With endpoint-dependent mapping, the port STUN discovers is only valid for talking to the STUN server. When you send to your peer, the NAT assigns a different external port for that new destination. The peer's packets, aimed at the STUN-discovered port, are dropped because that mapping belongs to the STUN server's flow, not the peer's.

The address STUN gave you is useless for peer-to-peer communication.

This is why symmetric NATs are the primary reason WebRTC connections fail. And symmetric NAT behavior is common in corporate networks, mobile carriers using CGNAT, and some consumer routers.
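Here's a small sketch of the difference in port allocation. The `MappingNat` class, the sequential port numbers, and the peer address are hypothetical; real NATs pick ports in implementation-specific ways.

```javascript
// Toy port allocator contrasting the two mapping behaviors.
class MappingNat {
  constructor(mapping, firstPort = 54321) {
    this.mapping = mapping; // "EIM" or "EDM"
    this.nextPort = firstPort;
    this.ports = new Map();
  }

  // Returns the external port used when `internal` sends to `destination`.
  externalPort(internal, destination) {
    // EIM keys the mapping on the internal endpoint only;
    // EDM keys it on the internal endpoint plus the destination.
    const key = this.mapping === "EIM" ? internal : `${internal}->${destination}`;
    if (!this.ports.has(key)) this.ports.set(key, this.nextPort++);
    return this.ports.get(key);
  }
}

const internal = "192.168.1.50:12345";
const stunServer = "203.0.113.1:3478";
const peer = "203.0.113.77:40000";

const easy = new MappingNat("EIM");
const easyStun = easy.externalPort(internal, stunServer); // 54321
const easyPeer = easy.externalPort(internal, peer);       // 54321 again

const hard = new MappingNat("EDM");
const hardStun = hard.externalPort(internal, stunServer); // 54321
const hardPeer = hard.externalPort(internal, peer);       // 54322: STUN's answer is stale
console.log({ easyStun, easyPeer, hardStun, hardPeer });
```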

| NAT Type (RFC 4787) | Mapping | Filtering | Hole Punch? | Prevalence |
|---|---|---|---|---|
| EIM + Endpoint-Independent Filtering | Endpoint-Independent | Endpoint-Independent | Yes (easy) | Rare in practice |
| EIM + Address-Dependent Filtering | Endpoint-Independent | Address-Dependent | Yes | Common (consumer routers) |
| EIM + Address+Port-Dependent Filtering | Endpoint-Independent | Address+Port-Dependent | Yes | Common (consumer routers) |

How NAT traversal works: the core techniques

NAT traversal is not a single protocol. It's a collection of techniques, each solving a different piece of the puzzle.

Here's how they work from simplest to most complex.

UDP hole punching

UDP hole punching is the most common NAT traversal technique for direct connections. It exploits a simple fact: most NATs create a mapping when an outbound packet is sent, and that mapping permits inbound packets from the destination.

The process works like this:

  1. Both peers (A and B) discover their public addresses (via STUN or other discovery) and send their local and public address information to a signaling server.
  2. The signaling server tells A about B's public address, and B about A's public address.
  3. Both peers simultaneously send UDP packets to each other's public addresses.
  4. When A's packet arrives at B's NAT, B's NAT may initially drop it (no mapping exists yet). But B is also sending a packet to A, which creates an outbound mapping on B's NAT.
  5. When A's next packet arrives, B's NAT now has a mapping that permits it. The "hole" has been punched.

This works reliably when both NATs use endpoint-independent mapping (EIM). Research suggests UDP hole punching succeeds 82-95% of the time across general internet traffic.

But when either NAT uses endpoint-dependent mapping (symmetric NAT), hole punching fails because the port the peer sends to isn't the port the NAT actually assigned for that destination.
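The packet timeline in steps 3-5 can be simulated in a few lines. This sketch assumes both NATs use endpoint-independent mapping (one fixed public endpoint each) with address-and-port-dependent filtering; the class, function, and addresses are hypothetical.

```javascript
// Each NAT has one fixed public endpoint (EIM) and only admits packets
// from remote endpoints its inside host has already sent to.
class PunchableNat {
  constructor(publicEndpoint) {
    this.publicEndpoint = publicEndpoint;
    this.contacted = new Set(); // remote endpoints the inside host has sent to
  }

  sendTo(remoteEndpoint) {
    this.contacted.add(remoteEndpoint); // the outbound packet opens the "hole"
  }

  admits(remoteEndpoint) {
    return this.contacted.has(remoteEndpoint);
  }
}

// Deliver one packet from the host behind `from` to the host behind `to`.
// Returns true if the receiving NAT lets it through.
function deliver(from, to) {
  from.sendTo(to.publicEndpoint);
  return to.admits(from.publicEndpoint);
}

const natA = new PunchableNat("198.51.100.1:54321");
const natB = new PunchableNat("203.0.113.7:40000");

const timeline = [
  deliver(natA, natB), // A -> B: dropped, B has not sent to A yet
  deliver(natB, natA), // B -> A: admitted, A already sent to B
  deliver(natA, natB), // A -> B: admitted, B's outbound packet opened the hole
];
console.log(timeline); // timeline is [false, true, true]
```

The first packet is lost, which is why real implementations keep retransmitting until both sides have sent at least once.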

TCP hole punching

TCP hole punching follows the same principle but is significantly harder.

TCP's three-way handshake (SYN, SYN-ACK, ACK) means both sides need to send SYN packets simultaneously. If one SYN arrives before the other side has sent its own, the receiving NAT drops it as unsolicited.

The timing window is tight. In practice, TCP hole punching succeeds roughly 64% of the time -- substantially less reliable than UDP. This is one reason WebRTC defaults to UDP for media transport.

Port mapping protocols (UPnP IGD, NAT-PMP, PCP)

A more direct approach: ask the NAT to create a mapping explicitly. Three protocols exist for this:

  • UPnP IGD (Universal Plug and Play Internet Gateway Device): The oldest. Widely supported but has significant security concerns -- it allows any application on the network to open ports.
  • NAT-PMP (NAT Port Mapping Protocol): Apple's alternative, used in AirPort routers. Simpler and slightly more secure than UPnP.
  • PCP (Port Control Protocol, RFC 6887): The modern successor to NAT-PMP. Designed to work with both IPv4 NAT and IPv6 firewalls.

These protocols can create explicit port mappings, but they have a critical limitation: they only work on the first NAT hop.

If a user is behind CGNAT (carrier-grade NAT), UPnP/NAT-PMP/PCP can open a port on the home router, but the carrier's NAT sitting upstream is unaffected. The user is still unreachable.

Relay-based traversal

When direct connections fail -- both sides behind symmetric NATs, restrictive firewalls, or deep packet inspection -- the only option is routing traffic through an intermediary relay server.

Both peers connect outbound to the relay, and the relay forwards packets between them.

This is what TURN servers do. It adds latency (traffic takes an extra hop through the relay) and costs bandwidth (the relay provider pays for every byte), but it keeps users connected when no direct path exists.

For production WebRTC applications, TURN is the difference between "works for 80% of users" and "works for everyone."

STUN: discovering your public address

STUN (Session Traversal Utilities for NAT, RFC 8489) is a lightweight protocol that lets a client discover its public-facing IP address and port as seen by the outside world. Think of it as asking a friend on the public internet: "What address do you see my packets coming from?"

The flow is straightforward:

  1. Your WebRTC client sends a STUN Binding Request to a STUN server on the public internet.
  2. The request passes through your NAT, which assigns a public IP:port mapping.
  3. The STUN server reads the source IP:port from the received packet and echoes it back in a Binding Response.
  4. Your client now knows its public address -- the server-reflexive candidate in ICE terminology.
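On the wire, the exchange is small. The sketch below builds a bare Binding Request header and decodes an IPv4 XOR-MAPPED-ADDRESS value using the message layout from RFC 8489. The function names are made up for this example, the transaction ID is a fixed placeholder (a real client would use random bytes), and attribute parsing, IPv6, and the network send are left out.

```javascript
const MAGIC_COOKIE = 0x2112a442;

// Build a 20-byte STUN Binding Request header with no attributes.
function buildBindingRequest(transactionId) {
  if (transactionId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, 0x0001);       // message type: Binding Request
  view.setUint16(2, 0);            // message length: no attributes
  view.setUint32(4, MAGIC_COOKIE); // fixed magic cookie
  message.set(transactionId, 8);   // 96-bit transaction ID
  return message;
}

// Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute (type 0x0020).
function decodeXorMappedAddress(value) {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) throw new Error("only IPv4 is handled in this sketch");
  // The port is XORed with the top 16 bits of the cookie, the address with all 32.
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [addr >>> 24, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff].join(".");
  return { ip, port };
}

const request = buildBindingRequest(new Uint8Array(12).fill(7)); // placeholder ID
const mapped = decodeXorMappedAddress(
  new Uint8Array([0x00, 0x01, 0xf5, 0x23, 0xe7, 0x21, 0xc0, 0x43])
);
console.log(request.length, mapped); // mapped is 198.51.100.1 port 54321
```

In a browser you never write this yourself; the ICE agent inside RTCPeerConnection does it for every STUN server in the configuration.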

STUN is fast (a single UDP round-trip), lightweight (minimal bandwidth), and free to operate at scale. Metered includes free STUN servers on all plans.

But STUN has a hard limitation: it cannot help when the NAT uses endpoint-dependent mapping (symmetric NAT). The public address STUN discovers is only valid for communicating with the STUN server itself.

A different destination gets a different port assignment, and the STUN-discovered address becomes useless for peer-to-peer.

That's where TURN takes over.

TURN: the relay fallback that ensures 100% connectivity

STUN tells you your public address. But when that address is useless -- symmetric NATs, restrictive firewalls, CGNAT -- you need a different approach entirely.

TURN (Traversal Using Relays around NAT, RFC 8656) is the NAT traversal protocol of last resort -- and the most important protocol for production WebRTC. For a deeper look at what TURN does, see what is a TURN server.

When STUN-based hole punching fails, TURN provides a relay path. The client connects outbound to a TURN server, allocates a relay address on that server, and the TURN server forwards packets between the two peers.

Here's how it works:

  1. The client sends an Allocate Request to the TURN server, authenticated with credentials.
  2. The TURN server allocates a relay transport address (a public IP:port on the server itself).
  3. The client tells its peer (via signaling) to send packets to the relay address.
  4. Both peers send traffic to the TURN server, which forwards packets between them.

TURN uses a strict permission model to prevent abuse as an open relay. The client must explicitly authorize which peers can send traffic through its allocation.
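As a rough model of that permission check: RFC 8656 keys permissions on the peer's IP address (not its port) and expires them after five minutes unless refreshed. The `TurnAllocation` class below is a hypothetical simplification, not how any particular TURN server is implemented.

```javascript
const PERMISSION_LIFETIME_MS = 5 * 60 * 1000; // permissions last 5 minutes

class TurnAllocation {
  constructor(relayAddress) {
    this.relayAddress = relayAddress;
    this.permissions = new Map(); // peer IP -> expiry timestamp (ms)
  }

  // Models a CreatePermission request; permissions are keyed by IP only.
  createPermission(peerIp, now) {
    this.permissions.set(peerIp, now + PERMISSION_LIFETIME_MS);
  }

  // Would the server relay a packet from this peer IP to the client?
  relaysFrom(peerIp, now) {
    const expiry = this.permissions.get(peerIp);
    return expiry !== undefined && now < expiry;
  }
}

const allocation = new TurnAllocation("192.0.2.10:49152");
allocation.createPermission("203.0.113.7", 0);

const permitted = allocation.relaysFrom("203.0.113.7", 1000);        // true
const stranger = allocation.relaysFrom("203.0.113.99", 1000);        // false
const expired = allocation.relaysFrom("203.0.113.7", 6 * 60 * 1000); // false
console.log({ permitted, stranger, expired });
```

The browser's ICE agent installs and refreshes these permissions for you as it learns remote candidates.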

The numbers: how often is TURN needed?

Across general WebRTC traffic, 15-30% of connections require TURN relay. Chrome's internal usage metrics (UMA data) show approximately 20-25% of sessions using relay candidates.

The percentage varies significantly by deployment:

  • Consumer applications (users on home Wi-Fi): ~15-20% require TURN
  • Mobile-heavy applications (users on carrier networks with CGNAT): ~25-35%
  • Enterprise/corporate networks (restrictive firewalls, proxy servers): ~30-50%

For a telehealth platform with patients connecting from hospitals, corporate offices, and mobile networks, the TURN requirement can hit 40% or higher.

Without TURN, those users simply cannot connect. Your platform looks broken, and the patient reschedules their appointment.

This is why TURN is not optional for production WebRTC. The question isn't whether you need TURN. It's whether you run it yourself or use a managed service.

The cost of relay

TURN adds latency because traffic takes an extra network hop through the relay server. It also costs bandwidth -- the relay operator pays for every byte forwarded.

This is why TURN is used only as a fallback, not as the default path. The ICE framework (covered next) ensures TURN is only selected when direct connections have genuinely failed.

ICE: the framework that ties it all together

So far we've covered individual NAT traversal techniques. ICE is what brings them together into a single, automated process.

Interactive Connectivity Establishment (RFC 8445) is the framework that orchestrates NAT traversal in WebRTC. ICE doesn't replace STUN or TURN -- it uses both, along with direct connectivity checks, to find the best available path between two peers.

Candidate gathering

When a WebRTC RTCPeerConnection starts, ICE gathers candidates -- potential network paths the connection could use:

  • Host candidates: The device's local IP addresses and ports. These work when both peers are on the same network.
  • Server-reflexive candidates (srflx): Public IP:port discovered via STUN. These work when NATs use endpoint-independent mapping.
  • Relay candidates: Addresses allocated on a TURN server. These always work, at the cost of extra latency and bandwidth.
  • Peer-reflexive candidates (prflx): Discovered during connectivity checks when a packet arrives from an unexpected address. These represent paths that weren't predicted during gathering.

Candidate exchange via signaling

Once candidates are gathered, they're encoded in SDP (Session Description Protocol) and exchanged between peers through your application's signaling channel -- WebSocket, HTTP, or any other mechanism.

ICE doesn't define signaling; your application provides it.

Each candidate includes the transport address, protocol, priority, and component ID. The remote peer receives these candidates and adds them to its checklist.
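For logging or debugging, those fields can be pulled out of the candidate string. A minimal sketch, assuming the standard `candidate:` attribute layout; the function name and the sample line are made up, and extension attributes such as raddr/rport are ignored.

```javascript
// Parse the fixed leading fields of an ICE candidate attribute.
function parseCandidate(line) {
  const fields = line.replace(/^a=/, "").replace(/^candidate:/, "").trim().split(/\s+/);
  const [foundation, component, protocol, priority, address, port, typ, type] = fields;
  if (typ !== "typ") throw new Error("not a candidate line");
  return {
    foundation,
    componentId: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx", "prflx", or "relay"
  };
}

const parsed = parseCandidate(
  "candidate:842163049 1 udp 1677729535 198.51.100.1 54321 typ srflx raddr 192.168.1.50 rport 12345"
);
console.log(parsed);
```

In the browser, `RTCIceCandidate` already exposes most of these as properties, so a parser like this is mainly useful when reading raw SDP from logs.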

Connectivity checks and prioritization

ICE pairs each local candidate with each remote candidate and runs connectivity checks -- essentially STUN Binding Requests sent directly between the peers. This verifies that packets can actually traverse the network path.

Candidate pairs are prioritized. ICE prefers:

  1. Host candidates (direct local connection, lowest latency)
  2. Server-reflexive candidates (NAT-traversed direct connection)
  3. Relay candidates (TURN, highest latency but guaranteed connectivity)

The first candidate pair that succeeds becomes the nominated pair, and media flows through it. If a higher-priority pair succeeds later, ICE can switch.
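The ordering comes from two formulas in RFC 8445. The sketch below uses the RFC's recommended type preferences; real browsers choose their own local preference values, so treat the inputs here as illustrative.

```javascript
// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type, localPreference, componentId) {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (G > D ? 1 : 0)
// G is the controlling agent's candidate priority, D the controlled agent's.
// BigInt avoids losing precision above 2^53.
function pairPriority(controlling, controlled) {
  const g = BigInt(controlling);
  const d = BigInt(controlled);
  const min = g < d ? g : d;
  const max = g > d ? g : d;
  return (1n << 32n) * min + 2n * max + (g > d ? 1n : 0n);
}

const host = candidatePriority("host", 65535, 1);   // 2130706431
const srflx = candidatePriority("srflx", 65535, 1); // 1694498815
const relay = candidatePriority("relay", 65535, 1); // 16777215

console.log(pairPriority(host, host) > pairPriority(srflx, srflx));  // true
console.log(pairPriority(srflx, srflx) > pairPriority(host, relay)); // true
```

Because the pair formula is dominated by the lower of the two priorities, a pair with a relay candidate on either side sinks toward the bottom of the checklist.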

ICE connection states to monitor

In your WebRTC application, the RTCPeerConnection exposes ICE connection state through the iceConnectionState property:

  • new -- ICE agent created, no checks started
  • checking -- At least one candidate pair is being tested
  • connected -- A working pair is found, but checks continue for better options
  • completed -- ICE has finished all checks and selected the best pair
  • failed -- All candidate pairs have failed. No connectivity possible with current candidates.
  • disconnected -- Connectivity was lost (network change, NAT timeout). May recover.
  • closed -- ICE agent is shut down

Monitoring these states is the first line of defense for diagnosing NAT traversal problems. A connection that gets stuck in checking or lands on failed is almost always a NAT/firewall issue.

WebRTC code example: configuring ICE with STUN and TURN

Here's a practical example showing how to configure RTCPeerConnection with both STUN and TURN servers:

// ICE server configuration with STUN and TURN
const iceConfig = {
  iceServers: [
    {
      urls: "stun:stun.metered.ca:80"
    },
    {
      urls: [
        "turn:global.relay.metered.ca:80",
        "turn:global.relay.metered.ca:80?transport=tcp",
        "turn:global.relay.metered.ca:443",
        "turns:global.relay.metered.ca:443?transport=tcp"
      ],
      username: "your-credential-username",
      credential: "your-credential-password"
    }
  ],
  iceCandidatePoolSize: 2
};

const peerConnection = new RTCPeerConnection(iceConfig);

// Monitor ICE connection state changes
peerConnection.oniceconnectionstatechange = () => {
  console.log("ICE state:", peerConnection.iceConnectionState);

  switch (peerConnection.iceConnectionState) {
    case "connected":
      console.log("Peer connected -- media flowing");
      break;
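    case "failed":
      console.warn("ICE failed -- attempting ICE restart");
      // restartIce() only flags the restart; it fires negotiationneeded, and
      // your onnegotiationneeded handler must create and send the new offer
      peerConnection.restartIce();
      break;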
    case "disconnected":
      console.warn("Connection interrupted -- monitoring for recovery");
      break;
  }
};

// Monitor ICE candidate gathering
peerConnection.onicecandidate = (event) => {
  if (event.candidate) {
    // Send candidate to remote peer via signaling channel
    console.log("New ICE candidate:", event.candidate.type);
    // event.candidate.type will be "host", "srflx", "relay", or "prflx"
    signalingChannel.send({
      type: "ice-candidate",
      candidate: event.candidate
    });
  } else {
    console.log("ICE candidate gathering complete");
  }
};

Note the TURN configuration includes multiple transport options: UDP on port 80, TCP on port 80, UDP on port 443, and TLS on port 443 (turns:).

This layered approach maximizes connectivity. UDP is fastest, but some networks block non-standard UDP traffic. TCP on port 80 works through most firewalls.

TLS on port 443 (turns:) can traverse many deep packet inspection (DPI) firewalls that block non-HTTPS traffic, because the TURN traffic looks like regular HTTPS.
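If you want to log which transport each URL implies, a small helper can spell it out. This sketch assumes the default ports and transports from RFC 7064 and RFC 7065; the function name is hypothetical and IPv6 literals are not handled.

```javascript
// Defaults from RFC 7064 (stun/stuns) and RFC 7065 (turn/turns).
const DEFAULTS = {
  stun: { port: 3478, transport: "udp", tls: false },
  stuns: { port: 5349, transport: "tcp", tls: true },
  turn: { port: 3478, transport: "udp", tls: false },
  turns: { port: 5349, transport: "tcp", tls: true },
};

// Break an ICE server URL into scheme, host, port, transport, and TLS flag.
function describeIceUrl(url) {
  const match = /^(stuns?|turns?):([^:?]+)(?::(\d+))?(?:\?transport=(udp|tcp))?$/.exec(url);
  if (!match) throw new Error(`unsupported ICE server URL: ${url}`);
  const [, scheme, host, port, transport] = match;
  const defaults = DEFAULTS[scheme];
  return {
    scheme,
    host,
    port: port ? Number(port) : defaults.port,
    transport: transport ?? defaults.transport,
    tls: defaults.tls,
  };
}

const udp80 = describeIceUrl("turn:global.relay.metered.ca:80");
const tcp80 = describeIceUrl("turn:global.relay.metered.ca:80?transport=tcp");
const tls443 = describeIceUrl("turns:global.relay.metered.ca:443?transport=tcp");
console.log(udp80, tcp80, tls443);
```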

The CGNAT problem: why NAT traversal is getting harder

If standard NAT wasn't challenging enough, carrier-grade NAT (CGNAT) adds another layer. And it's becoming more prevalent, not less.

What CGNAT is

CGNAT (also called Large Scale NAT or LSN) is a second layer of NAT deployed by internet service providers at the network level.

Your home router performs one level of NAT (private IP to router's public IP), and then the ISP's CGNAT gateway performs a second level (router's "public" IP to the ISP's actual public IP). Your device is now behind two NATs.
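A common heuristic for spotting this is to compare the address on your router's WAN interface with the public address a STUN server reports. The sketch below assumes you can read the WAN address yourself (for example from the router's admin page) and uses the 100.64.0.0/10 shared address space that RFC 6598 reserves for carrier-grade NAT; the function names are hypothetical.

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

// True when `ip` falls inside the block `base`/`prefixLength`.
function inCidr(ip, base, prefixLength) {
  const mask = prefixLength === 0 ? 0 : (~0 << (32 - prefixLength)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

const isSharedAddressSpace = (ip) => inCidr(ip, "100.64.0.0", 10); // RFC 6598
const isPrivate = (ip) =>
  inCidr(ip, "10.0.0.0", 8) || inCidr(ip, "172.16.0.0", 12) || inCidr(ip, "192.168.0.0", 16);

// wanIp: address on the router's WAN interface; publicIp: address reported by STUN.
function looksLikeCgnat(wanIp, publicIp) {
  return isSharedAddressSpace(wanIp) || isPrivate(wanIp) || wanIp !== publicIp;
}

const behindCgnat = looksLikeCgnat("100.72.3.4", "198.51.100.1");  // true
const directPublic = looksLikeCgnat("198.51.100.1", "198.51.100.1"); // false
console.log({ behindCgnat, directPublic });
```

This is only a hint: a mismatch can also come from a VPN or another upstream router you control.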

ISPs deploy CGNAT because they've run out of IPv4 addresses to assign to customers. Instead of giving each household a unique public IP, the ISP shares one public IP across dozens or hundreds of subscribers.

How CGNAT affects WebRTC

CGNAT creates several problems for NAT traversal:

Double NAT breaks port mapping protocols. UPnP, NAT-PMP, and PCP only work on the first NAT hop -- your home router. The ISP's CGNAT upstream is unaffected.

You can open a port on your home router all day, and the ISP's NAT will still block inbound traffic.

CGNAT often behaves as symmetric NAT. Many ISP NAT gateways use endpoint-dependent mapping to maximize IP sharing efficiency. When they do, STUN-based hole punching fails.

In those cases, direct peer-to-peer connections are impossible without a relay.

Shared IP addresses cause collateral damage. Cloudflare's 2024-2025 research on CGNAT detection revealed that shared IP addresses lead to "CGNAT bias" -- rate limiting and blocking that disproportionately impacts users behind shared IPs.

When one subscriber behind the CGNAT triggers a rate limit, every subscriber sharing that IP is affected.

CGNAT growth trends

CGNAT deployment is increasing, driven by continued IPv4 exhaustion:

  • Mobile networks: The majority of mobile carriers worldwide use CGNAT. If your users connect from phones on cellular data, they're almost certainly behind CGNAT.
  • Emerging markets: ISPs in regions where IPv4 addresses were always scarce (South Asia, Africa, Latin America) rely heavily on CGNAT.
  • Wireline ISPs: Even fixed-line providers are deploying CGNAT as IPv4 pools shrink.

Academic research tracked CGNAT deployments growing from approximately 1,200 in 2014 to 3,400 in 2016, with mobile operators accounting for 28.85% of deployments. Growth has only continued since.

In practice, this means the percentage of WebRTC connections requiring TURN relay is trending upward, not downward. For applications with significant mobile or international user bases, a reliable TURN server isn't a nice-to-have -- it's a requirement.

NAT traversal beyond WebRTC

While this guide focuses on WebRTC, NAT traversal is a challenge across multiple domains. The fundamental problem -- establishing bidirectional communication through NATs -- is universal.

VPN and IPsec (NAT-T)

IPsec VPN tunnels use ESP (Encapsulating Security Payload) packets, which NAT devices cannot translate because ESP doesn't use port numbers.

NAT-T (NAT Traversal, RFC 3948) solves this by encapsulating ESP inside UDP on port 4500.

IKEv2 detects NAT presence during the initial handshake using NAT_DETECTION_SOURCE_IP and NAT_DETECTION_DESTINATION_IP payloads. If NAT is detected, both sides switch to UDP encapsulation automatically. Keep-alive packets (typically every 20 seconds) maintain the NAT mapping.

VoIP and SIP

SIP (Session Initiation Protocol) embeds IP addresses in signaling headers and SDP bodies -- both the contact address and the media ports.

When SIP traverses a NAT, the internal addresses in the SIP headers don't match the external addresses on the packets. The result: the callee's phone rings, but audio flows nowhere because the media path uses the wrong addresses.
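The mismatch is easy to see if you compare what the SDP advertises with where the packet actually came from. A minimal sketch with a made-up SDP body and hypothetical helper names; real SDP can carry per-media c= lines and IPv6, which this ignores.

```javascript
// Extract the media address and audio port advertised in an SDP body.
function advertisedMedia(sdp) {
  const lines = sdp.split(/\r?\n/);
  const connection = lines.find((l) => l.startsWith("c=IN IP4 "));
  const audio = lines.find((l) => l.startsWith("m=audio "));
  if (!connection || !audio) throw new Error("SDP lacks c= or m=audio line");
  return {
    ip: connection.split(" ")[2],
    port: Number(audio.split(" ")[1]),
  };
}

// True when the SDP advertises an address other than the one packets arrive from.
function hasNatMismatch(sdp, observedSourceIp) {
  return advertisedMedia(sdp).ip !== observedSourceIp;
}

const sdp = [
  "v=0",
  "o=alice 2890844526 2890844526 IN IP4 192.168.1.50",
  "s=-",
  "c=IN IP4 192.168.1.50",
  "t=0 0",
  "m=audio 49170 RTP/AVP 0",
].join("\r\n");

const media = advertisedMedia(sdp);                    // 192.168.1.50 port 49170
const mismatch = hasNatMismatch(sdp, "198.51.100.1");  // true
console.log(media, mismatch);
```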

Solutions include SIP Outbound with STUN keep-alives (RFC 5626), SIP ALGs (Application Layer Gateways -- often more harmful than helpful), and ICE for SIP (RFC 5768).

Gaming

Multiplayer games face the same NAT traversal challenge. Console platforms like Xbox and PlayStation use "NAT type" classifications (Open, Moderate, Strict) that roughly correspond to the classic cone/symmetric taxonomy.

"Strict NAT" players can only connect to "Open NAT" hosts. Games typically use relay servers (conceptually similar to TURN) as fallback, though many use proprietary relay protocols rather than standard TURN.

IoT

IoT devices behind home routers need to communicate with cloud services and sometimes directly with each other.

Most IoT platforms solve this with persistent outbound connections to cloud brokers (MQTT, CoAP), avoiding the NAT traversal problem entirely.

But peer-to-peer IoT scenarios -- direct camera-to-phone streaming, device-to-device mesh networks -- face the same NAT challenges as WebRTC and use similar techniques (STUN/TURN/ICE).

Will IPv6 eliminate the need for NAT traversal?

This is one of the most common questions in the NAT traversal space. The short answer: not anytime soon, and not entirely even then.

IPv6 eliminates NAT, but not firewalls

IPv6 provides approximately 3.4 x 10^38 addresses -- enough for every device to have a globally unique, publicly routable address. In theory, this eliminates the need for NAT entirely. No NAT means no NAT traversal problem.

But firewalls still exist.

Even on pure IPv6 networks, stateful firewalls block unsolicited inbound connections by default. A stateful firewall tracking connections on the full 5-tuple (source IP, source port, destination IP, destination port, protocol) is functionally equivalent to a port-restricted cone NAT from a traversal perspective.

You still need hole punching or relay to establish peer-to-peer connections through firewalls.

Current IPv6 adoption

According to Google's IPv6 statistics, approximately 45-49% of Google traffic was IPv6 as of late 2025. The United States surpassed 50% in early 2025. France, Germany, and India lead with majority IPv6 traffic.

But adoption is uneven:

  • Corporate/enterprise networks: Many still run IPv4-only. Enterprises are notoriously slow to migrate.
  • China: Less than 5% of Google traffic from China uses IPv6 (though government reports claim 865 million active IPv6 users).
  • Weekday vs. weekend: IPv6 usage spikes on weekends (residential/mobile) and drops on weekdays (corporate), confirming that enterprise adoption lags behind.

NAT64 introduces its own overhead

For networks transitioning to IPv6-only, NAT64 translates between IPv6 and IPv4. This is itself a form of NAT, and it introduces performance penalties.

Research from Cornell University found that NAT64 paths are on average 23.13% longer with 17.47% higher round-trip times compared to native paths.

The realistic timeline

IPv6 has been in deployment since the 1990s. Thirty years later, it still hasn't reached universal adoption.

Corporate networks, IoT devices running legacy stacks, and the massive installed base of IPv4-only equipment all ensure that NAT traversal will remain a necessary capability for years to come.

The pragmatic engineering approach: build for a world where NAT exists, and treat IPv6-only networks as a welcome simplification when you encounter them -- not as an excuse to skip NAT traversal.

The future of NAT traversal: QUIC, WebTransport, and beyond

The transport layer is evolving, and new protocols are changing how NAT traversal works -- though not eliminating the need for it.

QUIC

QUIC (RFC 9000) runs over UDP, which is inherently more NAT-friendly than TCP.

QUIC's connection ID mechanism means that connections can survive NAT rebinding events (where the NAT assigns a new external port) without interruption. For WebRTC, this is significant: a user switching from Wi-Fi to cellular mid-call would historically break the TCP-based signaling connection and potentially disrupt media. With signaling carried over QUIC, that connection can continue on the new path instead of being re-established.
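The idea can be illustrated with two lookup tables. This is a toy model, not QUIC itself: it skips path validation and encryption, and the packet fields and session name are invented for the example.

```javascript
// Two ways a server might look up session state for an incoming packet.
const byFourTuple = new Map();    // TCP-style: keyed by source address and port
const byConnectionId = new Map(); // QUIC-style: keyed by connection ID

function openSession(packet) {
  byFourTuple.set(`${packet.srcIp}:${packet.srcPort}`, "session-1");
  byConnectionId.set(packet.connectionId, "session-1");
}

function lookup(packet) {
  return {
    fourTuple: byFourTuple.get(`${packet.srcIp}:${packet.srcPort}`) ?? null,
    connectionId: byConnectionId.get(packet.connectionId) ?? null,
  };
}

openSession({ srcIp: "198.51.100.1", srcPort: 54321, connectionId: "c0ffee" });

// The NAT rebinds: same client, same connection ID, new external port.
const afterRebind = lookup({ srcIp: "198.51.100.1", srcPort: 60000, connectionId: "c0ffee" });
console.log(afterRebind); // fourTuple is null, connectionId is "session-1"
```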

WebTransport

WebTransport is a new web API providing bidirectional, multiplexed transport using HTTP/3 (and therefore QUIC).

The IETF WebTransport specification (draft-ietf-webtrans-http3) enables client-server communication with lower latency than WebSocket.

More relevant to NAT traversal: the W3C is developing a P2P WebTransport specification that combines ICE-based NAT traversal with QUIC transport. This would bring QUIC's benefits (connection migration, multiplexing, reduced head-of-line blocking) to peer-to-peer communication -- while still using ICE, STUN, and TURN for connectivity establishment.

Media over QUIC (MoQ)

Media over QUIC is an emerging IETF protocol for live media delivery.

While MoQ is primarily designed for server-based relay architectures (not peer-to-peer), it represents the broader industry trend toward QUIC-based real-time communication.

The key takeaway

Every emerging real-time protocol still needs ICE/STUN/TURN for peer-to-peer NAT traversal.

QUIC improves the transport layer, WebTransport modernizes the API surface, and MoQ rethinks media delivery -- but none of them solve the fundamental problem of discovering addresses and punching through NATs.

STUN and TURN infrastructure remains essential.

Troubleshooting NAT traversal issues in WebRTC

When WebRTC connections fail, NAT traversal is the most common culprit. Here's a systematic approach to diagnosing and fixing these issues.

Common symptoms

  • "Works on my machine but not in production": Connection succeeds on your office network (permissive NAT) but fails for users on corporate or mobile networks (restrictive NAT/CGNAT).
  • Consistent ~20-30% failure rate: A significant minority of users can't connect. This is the classic "no TURN server" or "TURN misconfigured" signature.
  • Connection hangs in checking state: ICE is attempting connectivity checks but no candidate pair succeeds.
  • Connection reaches failed: All candidate pairs exhausted. No path works.
  • Audio/video works initially then drops: NAT mapping timeout. The NAT discarded the mapping because keep-alive packets weren't sent frequently enough.

Step-by-step diagnostic process

1. Check ICE candidate gathering

Open chrome://webrtc-internals in Chrome (or the equivalent in your browser). Look at the ICE candidates gathered by each peer. You should see:

  • Host candidates -- If these are missing, the WebRTC API isn't accessing local addresses (rare).
  • Server-reflexive (srflx) candidates -- If missing, your STUN server is unreachable or the NAT is blocking STUN traffic.
  • Relay candidates -- If missing, your TURN server is unreachable, credentials are invalid, or TURN traffic is being blocked.

If you only see host candidates, your STUN/TURN servers are not configured correctly or are unreachable from the user's network. Verify your configuration using a TURN server testing tool.
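The same checklist can run inside your app. Here's a sketch that turns the list of gathered candidate types into findings; the function name and messages are hypothetical, and in a real page you would collect the types from `event.candidate.type` in your `onicecandidate` handler.

```javascript
// Report which candidate types are missing once gathering has finished.
function diagnoseGathering(candidateTypes) {
  const seen = new Set(candidateTypes);
  const findings = [];
  if (!seen.has("host")) {
    findings.push("No host candidates: local addresses are not being exposed.");
  }
  if (!seen.has("srflx")) {
    findings.push("No srflx candidates: STUN server unreachable or STUN traffic blocked.");
  }
  if (!seen.has("relay")) {
    findings.push("No relay candidates: TURN server unreachable, credentials invalid, or TURN traffic blocked.");
  }
  return findings;
}

const hostOnly = diagnoseGathering(["host", "host"]);
const healthy = diagnoseGathering(["host", "srflx", "relay"]);
console.log(hostOnly); // two findings: srflx and relay are missing
console.log(healthy);  // no findings
```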

2. Analyze the selected candidate pair

In chrome://webrtc-internals, find the active candidate pair. Check:

  • Candidate types: If the winning pair uses relay candidates, the connection went through TURN. This works but adds latency.
  • Local and remote candidates: The candidate types tell you which NAT traversal technique succeeded.
  • Round-trip time: High RTT on relay candidates may indicate the TURN server is geographically distant from one or both peers.
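You can read the same information programmatically from `getStats()`. The sketch below takes the Map-like `RTCStatsReport` and summarizes the selected pair. Stats availability varies by browser, so the fallback to a nominated, succeeded pair is an assumption; the function name and the fake report used to exercise it are made up.

```javascript
// `report` is the Map-like RTCStatsReport returned by pc.getStats().
function selectedPairSummary(report) {
  const stats = [...report.values()];
  const transport = stats.find((s) => s.type === "transport" && s.selectedCandidatePairId);
  // Fall back to a nominated, succeeded pair if no transport stat points at one.
  const pair = transport
    ? report.get(transport.selectedCandidatePairId)
    : stats.find((s) => s.type === "candidate-pair" && s.nominated && s.state === "succeeded");
  if (!pair) return null;
  const local = report.get(pair.localCandidateId);
  const remote = report.get(pair.remoteCandidateId);
  return {
    localType: local?.candidateType ?? null,
    remoteType: remote?.candidateType ?? null,
    usesRelay: local?.candidateType === "relay" || remote?.candidateType === "relay",
    // currentRoundTripTime is reported in seconds
    roundTripTimeMs:
      pair.currentRoundTripTime !== undefined ? Math.round(pair.currentRoundTripTime * 1000) : null,
  };
}

// Fake report for illustration; in a page, use: const report = await pc.getStats();
const fakeReport = new Map([
  ["T1", { type: "transport", selectedCandidatePairId: "CP1" }],
  ["CP1", { type: "candidate-pair", localCandidateId: "L1", remoteCandidateId: "R1",
            nominated: true, state: "succeeded", currentRoundTripTime: 0.042 }],
  ["L1", { type: "local-candidate", candidateType: "relay" }],
  ["R1", { type: "remote-candidate", candidateType: "srflx" }],
]);

const summary = selectedPairSummary(fakeReport);
console.log(summary);
```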

3. Check TURN server connectivity

If relay candidates aren't being gathered, test TURN server connectivity:

// Quick TURN connectivity test
const testConfig = {
  iceServers: [{
    urls: "turn:global.relay.metered.ca:443?transport=tcp",
    username: "test-username",
    credential: "test-credential"
  }]
};

const pc = new RTCPeerConnection(testConfig);
pc.createDataChannel("test");

pc.onicecandidate = (event) => {
  if (event.candidate && event.candidate.type === "relay") {
    console.log("TURN relay candidate gathered -- TURN server is reachable");
    pc.close();
  }
};

pc.createOffer()
  .then(offer => pc.setLocalDescription(offer))
  .catch(err => console.error("Failed to start ICE gathering:", err));

// If no relay candidate appears within 10 seconds, TURN is unreachable
setTimeout(() => {
  if (pc.signalingState !== "closed") {
    console.error("No relay candidate -- TURN server unreachable or credentials invalid");
    pc.close();
  }
}, 10000);

4. Implement ICE restart for recovery

When a connection drops (NAT mapping timeout, network change), ICE restart can re-establish connectivity without creating a new peer connection:

peerConnection.oniceconnectionstatechange = () => {
  if (peerConnection.iceConnectionState === "failed") {
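    // Create a new offer with the ICE restart flag. (Alternatively, call
    // peerConnection.restartIce() and let your negotiationneeded handler
    // send the offer -- don't do both, or you may send duplicate offers.)
    peerConnection.createOffer({ iceRestart: true })
      .then(offer => peerConnection.setLocalDescription(offer))
      .then(() => {
        // Send the new offer via your application's signaling channel
        signalingChannel.send({
          type: "offer",
          sdp: peerConnection.localDescription
        });
      })
      .catch(err => console.error("ICE restart failed:", err));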
  }
};

5. Test from multiple network environments

NAT traversal issues are network-dependent. Test from:

  • Home Wi-Fi (consumer NAT -- usually permissive)
  • Mobile cellular data (likely CGNAT -- restrictive)
  • Corporate office network (firewall, potentially proxy-based)
  • VPN connections (adds another NAT layer)
  • Hotel/airport Wi-Fi (often highly restrictive)

If connections succeed from home but fail from corporate or mobile networks, your TURN configuration is the likely issue.

Choosing a TURN server for reliable NAT traversal

NAT traversal theory is well-understood. The engineering challenge is operating reliable TURN infrastructure at scale.

For production WebRTC applications, here's what matters.

Self-hosted vs. managed

You can deploy coturn (the open-source TURN server) on your own infrastructure. It works.

But it comes with an operational burden: deploying across multiple regions for low latency, managing TLS certificates, handling auto-scaling for traffic spikes, rotating credentials, monitoring uptime, and patching security vulnerabilities.

Teams running coturn in production report spending 15-20 hours per month per engineer on TURN operations -- time that isn't going into building your actual product.

A managed TURN service eliminates that burden. You provision credentials with an API call, and someone else operates the global infrastructure.

What to look for in a managed TURN service

  • Global coverage: Your TURN server should be close to your users. A TURN server in US-East doesn't help a user in Singapore -- it adds 250ms+ of latency to every packet.
  • Multiple transport protocols: UDP, TCP, TLS, and DTLS. Different networks block different protocols. You need all four.
  • Firewall-friendly ports: Port 80 and 443. Many corporate firewalls block non-standard ports.
  • High availability: If your TURN server goes down, every relayed connection drops. 99.9% uptime means about 8.8 hours of downtime per year. 99.999% means about 5.3 minutes.
  • Low latency: Every millisecond of TURN relay latency is added to the media path and degrades call quality. Sub-30ms from anywhere in the world is the benchmark.
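The availability figures in the list above follow from simple arithmetic. As a quick check, assuming a 365-day year:

```javascript
// Downtime allowed per year for a given uptime percentage (365-day year).
function downtimeHoursPerYear(uptimePercent) {
  const hoursPerYear = 365 * 24; // 8760
  return hoursPerYear * (1 - uptimePercent / 100);
}

console.log(downtimeHoursPerYear(99.9).toFixed(2), "hours");
console.log((downtimeHoursPerYear(99.999) * 60).toFixed(2), "minutes");
```

This works out to roughly 8.76 hours per year at 99.9% and roughly 5.26 minutes per year at 99.999%.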

Metered TURN Server provides 31+ regions, 100+ PoPs, 99.999% uptime, sub-30ms latency, and support for UDP, TCP, TLS, and DTLS on ports 80 and 443. You can get started with a free trial -- 500 MB of TURN usage, no credit card required. For a hands-on walkthrough, see the setup guide.

If you want to experiment with TURN without signing up for anything, the Open Relay Project provides a free community TURN server with 20 GB per month.

Conclusion

NAT traversal is the invisible infrastructure challenge behind every WebRTC application. NATs break peer-to-peer connectivity by design, and the techniques to work around them -- STUN for address discovery, UDP hole punching for direct connections, and TURN for relay fallback -- are what make real-time communication actually work across the messy reality of the internet.

The landscape is getting harder, not easier. CGNAT deployments are growing as IPv4 exhaustion continues. Corporate firewalls remain restrictive.

IPv6 adoption, while progressing (45-49% of Google traffic), is decades away from universal and doesn't eliminate firewall traversal anyway. Emerging protocols like QUIC and WebTransport improve the transport layer but still rely on ICE/STUN/TURN for peer-to-peer connectivity establishment.

For production WebRTC, reliable TURN infrastructure is not optional. The 15-30% of connections that require relay aren't edge cases you can ignore -- they're real users on real networks who deserve to connect.

The engineering question is whether you want to operate that infrastructure yourself or let someone else handle it. If you'd rather spend your engineering hours on your actual product, Metered's managed TURN service handles the relay infrastructure so you don't have to.

Start free -- 500 MB, no credit card.

Frequently asked questions

What is NAT traversal?

NAT traversal is a set of techniques for establishing direct network connections between devices that are behind Network Address Translators (NATs). Because NATs hide devices behind shared public IP addresses, devices can't receive unsolicited inbound traffic.

NAT traversal solves this using address discovery (STUN), hole punching (coordinated simultaneous outbound packets), and relay servers (TURN) when direct connections fail.

What is the difference between STUN and TURN?

STUN discovers your public-facing IP address and port by asking a server on the public internet. It's lightweight, fast, and free to operate.

TURN relays all traffic through an intermediary server when direct connections are impossible (symmetric NATs, restrictive firewalls, CGNAT). TURN guarantees connectivity but adds latency and costs bandwidth.

In WebRTC, both are used together via the ICE framework -- STUN for direct connections when possible, TURN as fallback.
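In practice, "used together" means listing both kinds of server in the `iceServers` array and letting ICE pick the best working path. The sketch below is illustrative only: `buildRtcConfig` is a hypothetical helper, and the hostnames, ports, and credentials are placeholders, so substitute whatever your TURN provider documents.

```javascript
// Hypothetical builder for an RTCPeerConnection configuration that lists
// a STUN server plus TURN over UDP, TCP, and TLS. Hostnames, ports, and
// credentials are placeholders, not real endpoints.
function buildRtcConfig({ stunHost, turnHost, username, credential }) {
  return {
    iceServers: [
      // STUN: address discovery for direct connections.
      { urls: `stun:${stunHost}:3478` },
      // TURN: relay fallback over several transports and firewall-friendly ports.
      {
        urls: [
          `turn:${turnHost}:3478?transport=udp`,
          `turn:${turnHost}:80?transport=tcp`,
          `turns:${turnHost}:443?transport=tcp`,
        ],
        username,
        credential,
      },
    ],
  };
}

const config = buildRtcConfig({
  stunHost: "stun.example.com",
  turnHost: "turn.example.com",
  username: "placeholder-user",
  credential: "placeholder-pass",
});
// In a browser: const pc = new RTCPeerConnection(config);
```

Listing several TURN transports gives ICE more candidates to try on networks that block UDP or non-standard ports.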

Why do 15-30% of WebRTC connections fail without TURN?

About 15-30% of internet users sit behind symmetric NATs, CGNAT, or restrictive firewalls that prevent direct peer-to-peer connections.

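STUN-based hole punching relies on the NAT reusing the same public port for every destination (endpoint-independent mapping), so the address a peer learns from STUN is the one the other peer can actually reach. When the NAT assigns a different port per destination (endpoint-dependent mapping, or "symmetric NAT"), the STUN-discovered port is wrong for the peer, hole punching fails, and TURN relay is the only path to connectivity.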
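A rough way to spot endpoint-dependent mapping from the client is to configure two or more STUN servers and compare the server-reflexive candidates that come back. The sketch below is a heuristic under stated assumptions, not a definitive NAT classifier: `looksEndpointDependent` is a hypothetical name, the input objects mirror the `address`, `port`, and `relatedPort` fields of `RTCIceCandidate`, and hosts with several network interfaces or mixed IPv4/IPv6 can produce false positives.

```javascript
// Hypothetical heuristic: guess whether the local NAT uses endpoint-dependent
// mapping. Input is a list of server-reflexive candidates gathered with at
// least two STUN servers configured, each shaped like
// { address, port, relatedPort }.
function looksEndpointDependent(srflxCandidates) {
  // Group public mappings by the local port they were sent from.
  const mappingsByLocalPort = new Map();
  for (const c of srflxCandidates) {
    const key = c.relatedPort;
    if (!mappingsByLocalPort.has(key)) mappingsByLocalPort.set(key, new Set());
    mappingsByLocalPort.get(key).add(`${c.address}:${c.port}`);
  }
  // One local port mapped to several public endpoints suggests the NAT
  // allocates a new mapping per destination.
  for (const mappings of mappingsByLocalPort.values()) {
    if (mappings.size > 1) return true;
  }
  return false;
}
```

If this returns `true`, direct connections are unlikely to succeed from that network and the session will probably need a relay candidate.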

Does IPv6 eliminate the need for NAT traversal?

IPv6 eliminates NAT but not firewalls. Stateful firewalls on IPv6 networks still block unsolicited inbound connections, which means hole punching and relay techniques remain necessary for peer-to-peer communication.

Additionally, IPv6 adoption is at roughly 45-49% of Google traffic (late 2025) and is unevenly distributed -- corporate networks significantly lag behind. NAT traversal will remain necessary for years.

How do I troubleshoot WebRTC connection failures caused by NAT?

Start with chrome://webrtc-internals to inspect ICE candidate gathering and connection state.

Check whether server-reflexive (STUN) and relay (TURN) candidates are being gathered. If relay candidates are missing, verify TURN server reachability and credentials using a TURN server testing tool.

Test from multiple network environments (home Wi-Fi, cellular data, corporate network) to identify which NAT types are causing failures. Implement ICE restart for recovery from transient failures.
