<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sankalp Gilda</title>
    <description>The latest articles on DEV Community by Sankalp Gilda (@sankalp_gilda_92ba4374021).</description>
    <link>https://dev.to/sankalp_gilda_92ba4374021</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840278%2F5ae4724c-86f0-48e8-89c7-0a30761d49dc.jpg</url>
      <title>DEV Community: Sankalp Gilda</title>
      <link>https://dev.to/sankalp_gilda_92ba4374021</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sankalp_gilda_92ba4374021"/>
    <language>en</language>
    <item>
      <title>A superscript-1 walks past every Go SSRF guard</title>
      <dc:creator>Sankalp Gilda</dc:creator>
      <pubDate>Mon, 04 May 2026 07:13:22 +0000</pubDate>
      <link>https://dev.to/sankalp_gilda_92ba4374021/a-superscript-1-walks-past-every-go-ssrf-guard-3gk3</link>
      <guid>https://dev.to/sankalp_gilda_92ba4374021/a-superscript-1-walks-past-every-go-ssrf-guard-3gk3</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; &lt;code&gt;golang.org/x/net/idna.Lookup.ToASCII&lt;/code&gt; runs UTS-46 NFKC mapping&lt;br&gt;
on hostnames, which folds 100 non-ASCII Unicode digit codepoints (math&lt;br&gt;
superscripts, circled digits, fullwidth digits, math-styled digits, and&lt;br&gt;
others) to ASCII &lt;code&gt;0-9&lt;/code&gt;. A pre-IDNA &lt;code&gt;net.ParseIP&lt;/code&gt; check rejects the&lt;br&gt;
non-ASCII input as not-an-IP, hands it to the library, and gets back a&lt;br&gt;
real IPv4 literal. That literal then walks past SSRF allowlists,&lt;br&gt;
&lt;code&gt;NO_PROXY&lt;/code&gt; lists, TLS-SNI routers, and cookie-domain validators that&lt;br&gt;
only checked the pre-IDNA value. The fix is a post-IDNA &lt;code&gt;TrimRight + ParseAddr&lt;/code&gt;&lt;br&gt;
recheck. This post has the bug, a runnable proof of concept against&lt;br&gt;
&lt;code&gt;golang.org/x/net/http/httpproxy&lt;/code&gt;, the canonical safe pattern, and two&lt;br&gt;
just-shipped registry rules (CodeQL + Semgrep) that catch it in CI.&lt;/p&gt;

&lt;p&gt;I ran into this one while writing a Go HTTP client for a private project. I&lt;br&gt;
had a host allowlist, I had &lt;code&gt;idna.Lookup.ToASCII&lt;/code&gt; canonicalising the host&lt;br&gt;
before dial, and I still could not convince myself the allowlist held. It&lt;br&gt;
did not. A single superscript "1" (&lt;code&gt;U+00B9&lt;/code&gt;) in the host walked straight&lt;br&gt;
through.&lt;/p&gt;

&lt;p&gt;The shape is general. Any Go program that calls &lt;code&gt;golang.org/x/net/idna.Lookup.ToASCII&lt;/code&gt;&lt;br&gt;
(or the &lt;code&gt;MapForLookup&lt;/code&gt; profile, or any custom profile built on&lt;br&gt;
&lt;code&gt;idna.New(idna.MapForLookup(), ...)&lt;/code&gt;) on attacker-controlled hostnames is a&lt;br&gt;
candidate. The library does what its specification says it does. The caller&lt;br&gt;
does what its tutorial says it does. Between them, a smuggled IPv4 literal&lt;br&gt;
slips past every SSRF allowlist, every &lt;code&gt;NoProxy&lt;/code&gt; rule, and every TLS-SNI&lt;br&gt;
router, and reaches a network sink as if it were a regular hostname.&lt;/p&gt;

&lt;p&gt;I reported it privately. The Go security team declined to treat it as a&lt;br&gt;
library bug, on the grounds that the post-IDNA IP-literal check is the&lt;br&gt;
caller's responsibility. The bug class is real either way, and "technically&lt;br&gt;
the caller's fault" is cold comfort for a working engineer staring at an&lt;br&gt;
SSRF in production. So I went looking for what caller-side tooling actually&lt;br&gt;
helps. This post is the writeup: the mechanism, a concrete proof of concept,&lt;br&gt;
the defensive pattern, and the detection rules I shipped at v0.1.1 along&lt;br&gt;
with the empirical work that drove the design.&lt;/p&gt;
&lt;h2&gt;
  
  
  The mechanism
&lt;/h2&gt;

&lt;p&gt;IDNA stands for Internationalized Domain Names in Applications, and the&lt;br&gt;
"UTS-46" profile of IDNA defines a normalization step that maps Unicode&lt;br&gt;
hostnames to ASCII before they go on the wire. The mapping is built on NFKC&lt;br&gt;
compatibility normalization. If you have not stared at the Unicode tables&lt;br&gt;
recently, NFKC has a property that matters here: it folds compatibility&lt;br&gt;
digit codepoints to their ASCII counterparts. A circled &lt;code&gt;①&lt;/code&gt; becomes &lt;code&gt;1&lt;/code&gt;.&lt;br&gt;
A fullwidth &lt;code&gt;０&lt;/code&gt; becomes &lt;code&gt;0&lt;/code&gt;. A superscript &lt;code&gt;¹&lt;/code&gt; becomes &lt;code&gt;1&lt;/code&gt;.&lt;br&gt;
A double-struck &lt;code&gt;𝟙&lt;/code&gt; becomes &lt;code&gt;1&lt;/code&gt;. And so on, across a hundred different&lt;br&gt;
codepoints in eight families: Latin-1&lt;br&gt;
superscripts, mathematical superscripts, mathematical subscripts, circled&lt;br&gt;
digits, fullwidth digits, mathematical bold and sans-serif and double-struck&lt;br&gt;
and monospace digits, and the segmented digits in the Symbols for Legacy&lt;br&gt;
Computing block.&lt;/p&gt;

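&lt;p&gt;Now consider a host string like &lt;code&gt;0.¹.0.0&lt;/code&gt;. The &lt;code&gt;0&lt;/code&gt; and &lt;code&gt;.&lt;/code&gt; are ASCII; the&lt;br&gt;
&lt;code&gt;¹&lt;/code&gt; is &lt;code&gt;U+00B9&lt;/code&gt;, SUPERSCRIPT ONE from the Latin-1 Supplement block.&lt;br&gt;
&lt;code&gt;net.ParseIP&lt;/code&gt; returns &lt;code&gt;nil&lt;/code&gt; for that string, but &lt;code&gt;idna.Lookup.ToASCII&lt;/code&gt; maps it to&lt;br&gt;
&lt;code&gt;0.1.0.0&lt;/code&gt;, a valid IPv4 literal. The bug class is the&lt;br&gt;
absence of a post-IDNA IP-literal recheck. The shape that matters is not&lt;br&gt;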
"caller did a pre-IDNA &lt;code&gt;net.ParseIP&lt;/code&gt; then trusted the post-IDNA result";&lt;br&gt;
the shape that matters is "the post-IDNA value flowed to a network sink&lt;br&gt;
without any IP-literal check at the post-mapping point." A pre-IDNA check,&lt;br&gt;
when present, makes the bug worse by giving reviewers a false sense of&lt;br&gt;
input validation, but its absence is not what creates the smuggle. The&lt;br&gt;
absence of a post-IDNA check is.&lt;/p&gt;

&lt;p&gt;The most common real-world version of this shape lives in&lt;br&gt;
&lt;code&gt;golang.org/x/net/http/httpproxy.canonicalAddr&lt;/code&gt;. There is no pre-IDNA&lt;br&gt;
guard at all. The function takes a &lt;code&gt;*url.URL&lt;/code&gt;, calls a small wrapper that&lt;br&gt;
runs &lt;code&gt;idna.Lookup.ToASCII&lt;/code&gt; on the host, and feeds the result straight to&lt;br&gt;
&lt;code&gt;net.JoinHostPort&lt;/code&gt;. The post-IDNA value is the host that decides whether&lt;br&gt;
the request goes to the configured proxy or to the origin directly.&lt;br&gt;
Anything that NFKC-folds to a numeric literal walks past &lt;code&gt;NoProxy&lt;/code&gt; and&lt;br&gt;
out to wherever the smuggled literal points.&lt;/p&gt;

&lt;p&gt;The Go module documentation does not mention this. The &lt;code&gt;(*Profile).ToASCII&lt;/code&gt;&lt;br&gt;
godoc says it converts the input to ASCII according to the rules of UTS-46.&lt;br&gt;
It does. There is no IP-literal detection in the &lt;code&gt;idna&lt;/code&gt; package because&lt;br&gt;
detecting IP literals is not what UTS-46 specifies. Any caller that needs&lt;br&gt;
to reject post-mapping IP literals has to do that work themselves.&lt;/p&gt;
&lt;h2&gt;
  
  
  A two-line proof of concept
&lt;/h2&gt;

&lt;p&gt;Here is a self-contained Go program that exhibits the bug against&lt;br&gt;
&lt;code&gt;golang.org/x/net/http/httpproxy&lt;/code&gt;. The &lt;code&gt;httpproxy&lt;/code&gt; package canonicalises the&lt;br&gt;
request URL host before consulting the operator's &lt;code&gt;NO_PROXY&lt;/code&gt; list. It does&lt;br&gt;
that canonicalisation through &lt;code&gt;idna.Lookup.ToASCII&lt;/code&gt;, and it does no&lt;br&gt;
post-mapping IP-literal recheck.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"net/url"&lt;/span&gt;

    &lt;span class="s"&gt;"golang.org/x/net/http/httpproxy"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;httpproxy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;HTTPSProxy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"https://corporate-mitm-proxy.internal:8080"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;HTTPProxy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;"http://corporate-mitm-proxy.internal:8080"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;NoProxy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"0.1.0.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;proxyFunc&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ProxyFunc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;cases&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"https://example.com/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"https://0.1.0.0/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"https://0.¹.0.0/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"https://１９２．１６８．１．１/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;cases&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;proxyFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[%s] proxy=%v&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run that against &lt;code&gt;golang.org/x/net@v0.53.0&lt;/code&gt;. A printed &lt;code&gt;proxy=&amp;lt;nil&amp;gt;&lt;/code&gt;&lt;br&gt;
means the request skips the proxy and goes direct. The first URL routes through&lt;br&gt;
the configured corporate proxy. The second URL matches &lt;code&gt;NO_PROXY=0.1.0.0&lt;/code&gt;&lt;br&gt;
and goes directly to the literal &lt;code&gt;0.1.0.0&lt;/code&gt;, bypassing the proxy. The third URL,&lt;br&gt;
the smuggled one, also bypasses the proxy: &lt;code&gt;0.¹.0.0&lt;/code&gt; canonicalises to&lt;br&gt;
&lt;code&gt;0.1.0.0&lt;/code&gt;, which matches the &lt;code&gt;NO_PROXY&lt;/code&gt; entry. The fourth URL, fullwidth&lt;br&gt;
&lt;code&gt;192.168.1.1&lt;/code&gt;, behaves the same way against any &lt;code&gt;NO_PROXY&lt;/code&gt; list or allowlist that names&lt;br&gt;
&lt;code&gt;192.168.1.1&lt;/code&gt; or any RFC 1918 range.&lt;/p&gt;

&lt;p&gt;The exploit shape varies by caller. For &lt;code&gt;httpproxy&lt;/code&gt;, it is an&lt;br&gt;
egress-monitoring or DLP bypass: the attacker takes the unmonitored path.&lt;br&gt;
For an HTTP client that uses &lt;code&gt;idna.Lookup.ToASCII&lt;/code&gt; then dials, it is a classic&lt;br&gt;
SSRF: the attacker reaches loopback, RFC 1918, link-local, or cloud&lt;br&gt;
metadata endpoints by smuggling those literals through a guard that only&lt;br&gt;
checks ASCII IPs. The classic worked example is the AWS IMDS endpoint at&lt;br&gt;
&lt;code&gt;169.254.169.254&lt;/code&gt;: an attacker who can route a fullwidth &lt;code&gt;１６９.２５４.１６９.２５４&lt;/code&gt;&lt;br&gt;
or a math-styled &lt;code&gt;𝟣𝟨𝟫.𝟤𝟧𝟦.𝟣𝟨𝟫.𝟤𝟧𝟦&lt;/code&gt; past your guards reaches the&lt;br&gt;
instance-credential surface from your own service. For a TLS-SNI-based&lt;br&gt;
router, it is a routing-table bypass.&lt;br&gt;
For a cookie-domain validator, it is a cookie-scope confusion that hands&lt;br&gt;
the attacker cookies issued for a numeric host.&lt;/p&gt;
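
&lt;p&gt;As a rough illustration of which ranges are at stake, the sketch below classifies a host string that has already been through IDNA mapping. The &lt;code&gt;isInternalLiteral&lt;/code&gt; helper is a hypothetical name, and the choice of ranges is an assumption about what a typical SSRF guard refuses. It is narrower than the guard in "The fix", which rejects every post-mapping IP literal.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// isInternalLiteral reports whether postIDNAHost (the value AFTER IDNA
// mapping) is an IP literal in a range an SSRF guard would usually refuse.
// Hypothetical helper for illustration; the range list is an assumption.
func isInternalLiteral(postIDNAHost string) bool {
	// Trailing dots are trimmed first so "192.168.1.1." is still recognised.
	addr, err := netip.ParseAddr(strings.TrimRight(postIDNAHost, "."))
	if err != nil {
		return false // not an IP literal; treat it as an ordinary hostname
	}
	return addr.IsLoopback() ||
		addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() ||
		addr.IsUnspecified()
}

func main() {
	hosts := []string{"169.254.169.254", "192.168.1.1.", "127.0.0.1", "example.com"}
	for _, host := range hosts {
		fmt.Printf("%-18s internal=%v\n", host, isInternalLiteral(host))
	}
}
```

&lt;p&gt;With this sketch the first three hosts report &lt;code&gt;internal=true&lt;/code&gt; and &lt;code&gt;example.com&lt;/code&gt; reports &lt;code&gt;internal=false&lt;/code&gt;. The same call on the pre-mapping string would report &lt;code&gt;false&lt;/code&gt; for every smuggled form, which is the whole problem.&lt;/p&gt;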
&lt;h2&gt;
  
  
  Where the spec ends and the caller begins
&lt;/h2&gt;

&lt;p&gt;UTS-46 is a Unicode Technical Standard governing hostname canonicalisation. It is&lt;br&gt;
not an SSRF defence library. The &lt;code&gt;golang.org/x/net/idna&lt;/code&gt; package&lt;br&gt;
implements that spec, and the spec does not mandate IP-literal rejection.&lt;br&gt;
Read on those terms, the post-IDNA recheck belongs to the caller. That&lt;br&gt;
is the position the &lt;code&gt;golang.org/x/net/idna&lt;/code&gt; maintainer took in response&lt;br&gt;
to the private report, and it is internally consistent: a library that&lt;br&gt;
quietly added an IP-literal rejection step would be deviating from the&lt;br&gt;
specification it claims to implement, and would surprise other&lt;br&gt;
specification-conformant callers downstream.&lt;/p&gt;

&lt;p&gt;So the question is not whether the spec is correct. It is what a&lt;br&gt;
caller-side guardrail looks like when the spec leaves the recheck to the&lt;br&gt;
caller. The anti-pattern is widespread for the same reason most security&lt;br&gt;
bugs are widespread: the shape "canonicalise, then use" reads as&lt;br&gt;
obviously correct to anyone who has not already been bitten by it.&lt;br&gt;
A reviewer scanning a hundred lines of HTTP plumbing has no reason to&lt;br&gt;
flag an &lt;code&gt;idna.Lookup.ToASCII&lt;/code&gt; call followed by a &lt;code&gt;JoinHostPort&lt;/code&gt;. The fix&lt;br&gt;
has to live somewhere the reviewer's eyes do not have to be: in static&lt;br&gt;
analysis, in CI gates, in codemods that rewrite the call site so the&lt;br&gt;
guard is impossible to forget.&lt;/p&gt;
&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;Trim trailing dots and re-check with &lt;code&gt;net.ParseIP&lt;/code&gt; or&lt;br&gt;
&lt;code&gt;netip.ParseAddr&lt;/code&gt; after the IDNA call. Reject if the result parses as an&lt;br&gt;
IP literal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;errIDNAIPLiteralSmuggle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"idna: post-mapping IP literal smuggle"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;canonicaliseHost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;idna&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lookup&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToASCII&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ipErr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;netip&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseAddr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TrimRight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"."&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="n"&gt;ipErr&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errIDNAIPLiteralSmuggle&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two properties of that guard are load-bearing.&lt;/p&gt;

&lt;p&gt;First, &lt;code&gt;strings.TrimRight(ace, ".")&lt;/code&gt; and not &lt;code&gt;strings.TrimSuffix(ace, ".")&lt;/code&gt;.&lt;br&gt;
UTS-46 maps fullwidth dot &lt;code&gt;U+FF0E&lt;/code&gt; and ideographic dot &lt;code&gt;U+3002&lt;/code&gt; to ASCII&lt;br&gt;
dot. An input like &lt;code&gt;0.¹.0.0．．&lt;/code&gt; (two trailing fullwidth dots after the&lt;br&gt;
last numeric label) maps to &lt;code&gt;0.1.0.0..&lt;/code&gt; post-IDNA. &lt;code&gt;TrimSuffix(_, ".")&lt;/code&gt;&lt;br&gt;
strips one dot and leaves &lt;code&gt;0.1.0.0.&lt;/code&gt;, which &lt;code&gt;netip.ParseAddr&lt;/code&gt; rejects as&lt;br&gt;
non-IP, silently passing the smuggle through. &lt;code&gt;TrimRight&lt;/code&gt; removes any number&lt;br&gt;
of trailing dots and closes the variant.&lt;/p&gt;

&lt;p&gt;Second, the recheck has to happen after the IDNA call. A pre-IDNA&lt;br&gt;
&lt;code&gt;net.ParseIP&lt;/code&gt; is worse than no guard at all: it gives the reviewer the&lt;br&gt;
false impression that the input shape has been validated, which is exactly&lt;br&gt;
why the pre-IDNA-only check is the most common form of the anti-pattern in&lt;br&gt;
the wild. The smuggled literal is, by construction, not an IP before&lt;br&gt;
mapping. The check has to be post-mapping, or it does not catch the bug.&lt;/p&gt;

&lt;p&gt;Here is a thing worth pausing on. I went looking for production callers&lt;br&gt;
already doing this canonical guard. Across 19 OSS Go repositories and 31&lt;br&gt;
production callsites of &lt;code&gt;idna.*ToASCII&lt;/code&gt; (a sweep I will get to in a&lt;br&gt;
moment), zero used the &lt;code&gt;TrimRight + netip.ParseAddr&lt;/code&gt; shape. One came&lt;br&gt;
close: &lt;code&gt;google/safebrowsing&lt;/code&gt; does a both-sided &lt;code&gt;strings.Trim&lt;/code&gt; plus an&lt;br&gt;
in-house &lt;code&gt;parseIPAddress&lt;/code&gt;. Everyone else just returned the post-IDNA&lt;br&gt;
string and trusted downstream.&lt;/p&gt;

&lt;p&gt;That is the install base. That is what detection has to handle.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I shipped, and why this shape
&lt;/h2&gt;

&lt;p&gt;The repo is at&lt;br&gt;
&lt;code&gt;https://github.com/astrogilda/idna-ip-literal-smuggle-rules&lt;/code&gt;. The latest&lt;br&gt;
verified release is &lt;code&gt;v0.1.1&lt;/code&gt;. The earlier &lt;code&gt;v0.1.0&lt;/code&gt; mark predates the CodeQL&lt;br&gt;
DB-backed verification I will describe below; treat &lt;code&gt;v0.1.1&lt;/code&gt; as the first&lt;br&gt;
release where I am confident the CodeQL query actually fires on the&lt;br&gt;
canonical wrapper shape in a real DB extraction, not just on synthetic&lt;br&gt;
fixtures.&lt;/p&gt;

&lt;p&gt;The full strategy synthesis lives in the repo at&lt;br&gt;
&lt;code&gt;https://github.com/astrogilda/idna-ip-literal-smuggle-rules/blob/main/docs/research/v0.1-detection-strategy.md&lt;/code&gt;.&lt;br&gt;
The short version is two layers of detection plus one deliberate omission.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: CodeQL is the primary recall vehicle
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;IdnaIpLiteralSmuggle.ql&lt;/code&gt; uses &lt;code&gt;TaintTracking::GlobalWithState&lt;/code&gt; with two&lt;br&gt;
flow states (&lt;code&gt;TPreIdna&lt;/code&gt;, &lt;code&gt;TPostIdna&lt;/code&gt;). A single barrier predicate,&lt;br&gt;
&lt;code&gt;safePostIdnaRecheck(postIdnaSource, node)&lt;/code&gt;, ties the trim source to the&lt;br&gt;
post-IDNA tainted predecessor, so a recheck on an unrelated value cannot&lt;br&gt;
silence the alert. The state-transition step covers&lt;br&gt;
&lt;code&gt;(*idna.Profile).ToASCII&lt;/code&gt; and &lt;code&gt;(*idna.Profile).ToUnicode&lt;/code&gt;; the&lt;br&gt;
package-level &lt;code&gt;idna.ToASCII&lt;/code&gt; is excluded because it is a plain Punycode wrapper: no NFKC&lt;br&gt;
mapping, no smuggle surface. Sinks span 11 families, including &lt;code&gt;JoinHostPort&lt;/code&gt;, the&lt;br&gt;
&lt;code&gt;Dial&lt;/code&gt; family, &lt;code&gt;(*url.URL).Host&lt;/code&gt;, &lt;code&gt;(*tls.Config).ServerName&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;(*http.Cookie).Domain&lt;/code&gt;, HTTP client request URLs, and the package-level&lt;br&gt;
and &lt;code&gt;(*net.Resolver)&lt;/code&gt; DNS primitives.&lt;/p&gt;

&lt;p&gt;CodeQL's inter-procedural taint walks through a one-deep wrapper like&lt;br&gt;
&lt;code&gt;idnaASCII&lt;/code&gt; for free, no &lt;code&gt;isAdditionalFlowStep&lt;/code&gt; modelling required. The&lt;br&gt;
URL.Hostname taint enters the wrapper, propagates through its body, exits&lt;br&gt;
as the return value, and flows into the caller's &lt;code&gt;net.JoinHostPort&lt;/code&gt; sink.&lt;br&gt;
This is how the bug actually appears in production code, so this is what&lt;br&gt;
detection has to model.&lt;/p&gt;

&lt;p&gt;I extracted a CodeQL DB for &lt;code&gt;golang.org/x/net/http/httpproxy&lt;/code&gt; and ran the&lt;br&gt;
query. At v0.1.1 it fires twice on the canonical &lt;code&gt;canonicalAddr&lt;/code&gt; shape&lt;br&gt;
reproducer, registers 23 unique sink alerts on the positive-fixture suite,&lt;br&gt;
and emits zero genuine false positives on the negative fixtures. That is&lt;br&gt;
the first version where I trust the rule.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Semgrep OSS as the direct-call precision sweep
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;idna-ip-literal-smuggle.yaml&lt;/code&gt; is an intra-procedural &lt;code&gt;mode: taint&lt;/code&gt; rule that&lt;br&gt;
runs against community Semgrep with no Pro features required. There is a&lt;br&gt;
sibling &lt;code&gt;-pro.yaml&lt;/code&gt; that adds &lt;code&gt;interfile: true&lt;/code&gt; for operators with the&lt;br&gt;
Pro Engine, and an opt-in &lt;code&gt;-experimental.yaml&lt;/code&gt; that widens the source set&lt;br&gt;
to hostname-typed field reads. Default OSS first; Pro and experimental&lt;br&gt;
are opt-in.&lt;/p&gt;

&lt;p&gt;Now the part that surprised me. I ran the OSS, Pro, and experimental&lt;br&gt;
yamls against three corpora: &lt;code&gt;golang/go&lt;/code&gt;, &lt;code&gt;kubernetes/kubernetes&lt;/code&gt;, and&lt;br&gt;
&lt;code&gt;prometheus/prometheus&lt;/code&gt;, totalling 660 MB of Go. The result table is&lt;br&gt;
short:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;golang/go&lt;/th&gt;
&lt;th&gt;kubernetes&lt;/th&gt;
&lt;th&gt;prometheus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;idna-ip-literal-smuggle&lt;/code&gt; (OSS)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idna-ip-literal-smuggle-pro&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idna-ip-literal-smuggle-experimental&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Zero. Across all three rules and all three corpora.&lt;/p&gt;

&lt;p&gt;My first instinct was the same as yours probably is now: the rule&lt;br&gt;
under-fires. But there are exactly two production callsites of any&lt;br&gt;
UTS-46-mapping &lt;code&gt;idna.*ToASCII&lt;/code&gt; profile in those 660 MB, and both are&lt;br&gt;
wrapped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;golang/go: src/net/http/request.go:799&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;golang/go: src/vendor/golang.org/x/net/http/httpproxy/proxy.go:312&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other &lt;code&gt;idna&lt;/code&gt; usages in the corpus call package-level &lt;code&gt;idna.ToASCII&lt;/code&gt;,&lt;br&gt;
which dispatches to the Punycode profile, which is correctly out of&lt;br&gt;
scope. The OSS rule cannot step through &lt;code&gt;idnaASCII&lt;/code&gt; because OSS Semgrep&lt;br&gt;
is intra-procedural. The Pro yaml's &lt;code&gt;interfile: true&lt;/code&gt; is a no-op without&lt;br&gt;
the Pro Engine binary, so against community Semgrep it behaves the same&lt;br&gt;
as OSS. The experimental yaml's relaxed source set still requires the&lt;br&gt;
matching field on a struct passed directly to &lt;code&gt;idna.*ToASCII&lt;/code&gt; in the&lt;br&gt;
same function, which does not occur in these codebases.&lt;/p&gt;

&lt;p&gt;Zero is the honest answer at the OSS tier for a corpus where the only&lt;br&gt;
in-scope callsites are wrapped. CodeQL with &lt;code&gt;TaintTracking::GlobalWithState&lt;/code&gt;&lt;br&gt;
catches them, no Pro licence required. Semgrep OSS catches the direct-call&lt;br&gt;
shape, and the direct-call shape is what fits in an intra-procedural&lt;br&gt;
analyzer's mouth. The right move is not to make Semgrep OSS fire louder&lt;br&gt;
by relaxing the source set; the right move is to lead the registry&lt;br&gt;
submission with CodeQL and frame Semgrep OSS as the precision sweep, not&lt;br&gt;
the recall workhorse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: I deliberately did not ship a blanket structural rule
&lt;/h3&gt;

&lt;p&gt;This is the call I want to defend explicitly because the obvious thing&lt;br&gt;
to ship is exactly the wrong thing.&lt;/p&gt;

&lt;p&gt;I catalogued 31 production callsites of &lt;code&gt;idna.*ToASCII&lt;/code&gt; across 19&lt;br&gt;
distinct repos: caddy, vault, certmagic, ooni/probe-cli, smallstep,&lt;br&gt;
sing-box, mattermost, hostmatcher, lorawan-stack, tlsproxy, datadog-agent,&lt;br&gt;
cloudflared, mosn, safebrowsing, sniproxy, q, whatwg-url, plus the Go&lt;br&gt;
stdlib and the x/net mirrors. The classification:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;(a) direct call&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;idna.Lookup.ToASCII(input)&lt;/code&gt; directly in a function whose source is identifiable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(b) one-deep wrapper&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;small helper named &lt;code&gt;idnaASCII&lt;/code&gt; / &lt;code&gt;toASCII&lt;/code&gt; returning the call result raw&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(c) multi-deep / conditional wrapper&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;helper that branches on &lt;code&gt;isASCII&lt;/code&gt; / &lt;code&gt;len&lt;/code&gt; and only calls ToASCII on a sub-path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(d) post-call IP recheck present&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google/safebrowsing/urls.go:260&lt;/code&gt;, both-sided &lt;code&gt;strings.Trim&lt;/code&gt; plus in-house parseIPAddress&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A blanket structural rule along the lines of "any &lt;code&gt;ToASCII&lt;/code&gt; without&lt;br&gt;
&lt;code&gt;TrimRight + netip.ParseAddr&lt;/code&gt; in the same block" would fire on 30 of 31&lt;br&gt;
callsites. The 30 are not all bugs. Many are PSL walkers, registrar&lt;br&gt;
pipelines, and TLS-cert-manager code where the &lt;code&gt;ToASCII&lt;/code&gt; result never&lt;br&gt;
network-routes on attacker input, so the missing recheck is not a smuggle&lt;br&gt;
vector. weppos/publicsuffix-go, x/net/publicsuffix, cloudflare-go issue&lt;br&gt;
#688, and every PSL-driven cookiejar codepath are documented&lt;br&gt;
non-network-routing IDNA users. A rule that flags them as smuggle bugs&lt;br&gt;
pathologises the documented library contract. It would be roughly 95%&lt;br&gt;
false-positive in the strict "vulnerable to UTS-46 smuggling" sense, and&lt;br&gt;
operators would lose trust within one CI cycle.&lt;/p&gt;

&lt;p&gt;This is also, I suspect, why no prior IDNA-class CVE (CVE-2021-29923,&lt;br&gt;
CVE-2024-12224, CVE-2024-3651, CVE-2026-39821) shipped with a Semgrep or&lt;br&gt;
CodeQL rule attached. The detection space is too noisy without&lt;br&gt;
inter-procedural taint scoping. Taint scoping is the work, not an&lt;br&gt;
implementation detail of it.&lt;/p&gt;

&lt;p&gt;The two-tool stratification (CodeQL inter-procedural recall plus Semgrep&lt;br&gt;
OSS direct-call precision) is the same shape &lt;code&gt;BadRedirectCheck.ql&lt;/code&gt; and&lt;br&gt;
&lt;code&gt;IncompleteUrlSchemeCheck.ql&lt;/code&gt; use in the CodeQL community pack today. It&lt;br&gt;
is a defensible registry-submission narrative because it is a documented&lt;br&gt;
prior pattern, not an invention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pick one, or pick two
&lt;/h2&gt;

&lt;p&gt;If you have CodeQL in CI already, run the query in nightly MRVA sweeps.&lt;br&gt;
It will catch wrapped callsites in addition to direct ones. If you have&lt;br&gt;
Semgrep in CI already, run the OSS rule on every PR. It is fast, it has&lt;br&gt;
no measured FPs on the three corpora I tested, and it will catch the&lt;br&gt;
direct-call shapes that survive into your code. If you have both, run&lt;br&gt;
both. They answer slightly different questions and the failure modes do&lt;br&gt;
not overlap.&lt;/p&gt;

&lt;p&gt;If you maintain Go code that calls &lt;code&gt;idna.Lookup.ToASCII&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;idna.Display.ToASCII&lt;/code&gt;, or any custom profile constructed via&lt;br&gt;
&lt;code&gt;idna.New(idna.MapForLookup(), ...)&lt;/code&gt;, add the post-IDNA IP-literal&lt;br&gt;
recheck. If you have downstream allowlists, SSRF guards, &lt;code&gt;NO_PROXY&lt;/code&gt;&lt;br&gt;
lists, TLS-SNI routing, or cookie-domain validation that depends on&lt;br&gt;
hostname canonicalisation, you have the bug class somewhere in your&lt;br&gt;
dependency graph. The rules at&lt;br&gt;
&lt;code&gt;https://github.com/astrogilda/idna-ip-literal-smuggle-rules&lt;/code&gt; will tell&lt;br&gt;
you where.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;IPv4 only. IPv6 colons are rejected by IDNA rune-validation before&lt;br&gt;
NFKC runs, so there is no IPv6 path through this mechanism (the&lt;br&gt;
IPv4-mapped-IPv6 macro-encoding class is a separate bug, separate&lt;br&gt;
sanitizer, separate post). Go-specific tooling; the same anti-pattern&lt;br&gt;
exists in Python's &lt;code&gt;kjd/idna&lt;/code&gt;, Node's &lt;code&gt;url.domainToASCII&lt;/code&gt;, and ICU's&lt;br&gt;
&lt;code&gt;uidna_*&lt;/code&gt;, but each ecosystem needs its own rule. WHATWG-integrated&lt;br&gt;
URL parsers (callers that use &lt;code&gt;url.Parse&lt;/code&gt; and never touch&lt;br&gt;
&lt;code&gt;idna.*.ToASCII&lt;/code&gt; directly) are out of scope: the parser already runs&lt;br&gt;
the &lt;code&gt;ends_in_a_number&lt;/code&gt; host-shape check post-decode.&lt;/p&gt;

&lt;h2&gt;
  
  
  In flight: registry submissions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CodeQL community-pack PR: &lt;a href="https://github.com/github/codeql/pull/21784" rel="noopener noreferrer"&gt;https://github.com/github/codeql/pull/21784&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Semgrep registry PR: &lt;a href="https://github.com/semgrep/semgrep-rules/pull/3841" rel="noopener noreferrer"&gt;https://github.com/semgrep/semgrep-rules/pull/3841&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both PRs reference the upstream strategy doc so reviewers see the&lt;br&gt;
design rationale before asking. If you want to follow along, those&lt;br&gt;
are the threads to watch.&lt;/p&gt;

&lt;p&gt;Corrections and additional fold-class fixtures are welcome on the&lt;br&gt;
repository.&lt;/p&gt;

</description>
      <category>security</category>
      <category>go</category>
      <category>codeql</category>
      <category>semgrep</category>
    </item>
  </channel>
</rss>
