Go: Identifiers vs. Unicode

#go #unicode

A recent Reddit post about Unicode characters in Go identifiers prompted me to read the Go spec and look things up properly.

According to the spec, the syntax for valid identifiers is

identifier = letter { letter | unicode_digit } .

with

letter = unicode_letter | "_" .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .

"Letter" here covers the Unicode general categories Lu (uppercase letters), Ll (lowercase letters), Lt (titlecase letters), Lm (modifier letters), and Lo (other letters), while "Number, decimal digit" refers to the Unicode general category Nd.
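To see which of these categories a given character falls into, a small lookup with the standard `unicode` package can help. This is only a sketch: `generalCategory` is a helper name I made up, and it checks just the categories mentioned in this post (plus So, where emoji end up) rather than all of Unicode.

```go
package main

import (
	"fmt"
	"unicode"
)

// generalCategory returns the Unicode general category of r, limited to
// the categories discussed in this post. Anything else yields "other".
// (Hypothetical helper, not part of the standard library.)
func generalCategory(r rune) string {
	// A fixed, ordered list keeps the result deterministic.
	names := []string{"Lu", "Ll", "Lt", "Lm", "Lo", "Nd", "So"}
	for _, name := range names {
		if unicode.Is(unicode.Categories[name], r) {
			return name
		}
	}
	return "other"
}

func main() {
	// Print code point, character, and category for a few samples.
	for _, r := range "AaΣ㭪3😀_" {
		fmt.Printf("%U %q %s\n", r, r, generalCategory(r))
	}
}
```

The underscore lands in "other" (it is connector punctuation, not a letter), which is why the grammar has to list it separately.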

So an identifier has to start with either a "letter" or an underscore ("_"), and may otherwise contain only "letters", "decimal digits", and underscores - according to what Unicode defines as letters and digits.
The set of letters is not only the usual A-Z, a-z, but also includes letters from other scripts, like Greek letters (e.g. Σ) or CJK characters (e.g. 㭪). The same holds for digits - not only 0-9, but also digits from other scripts are allowed, e.g. ୩ or ٣.
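The rule can be sketched as a small checker built on `unicode.IsLetter` (categories Lu, Ll, Lt, Lm, Lo) and `unicode.IsDigit` (category Nd). The names `isLetter` and `isIdentifierSyntax` are my own, and the function only mirrors the grammar above; it does not reject keywords such as `func`.

```go
package main

import (
	"fmt"
	"unicode"
)

// isLetter mirrors the spec's "letter" production:
// a Unicode letter or the underscore.
func isLetter(r rune) bool {
	return r == '_' || unicode.IsLetter(r)
}

// isIdentifierSyntax reports whether s matches
//
//	identifier = letter { letter | unicode_digit } .
//
// It checks syntax only, so keywords are accepted as well.
func isIdentifierSyntax(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if isLetter(r) {
			continue
		}
		// Digits are allowed anywhere except in the first position.
		if i > 0 && unicode.IsDigit(r) {
			continue
		}
		return false
	}
	return true
}

func main() {
	for _, s := range []string{"abc_123", "Σ", "㭪", "x३", "42", "x🌞"} {
		fmt.Printf("%q: %v\n", s, isIdentifierSyntax(s))
	}
}
```

For real code, `token.IsIdentifier` from the standard `go/token` package does a similar check and additionally rejects keywords.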

Valid identifiers:

abc_123
_myidentifier
Σ (U+03A3 GREEK CAPITAL LETTER SIGMA)
㭪 (some CJK character from the Lo category)
x٣३߃૩୩3 (x followed by the decimal digit 3 from various scripts)

Invalid identifiers:

42 (does not start with a letter)
😀 (not a letter, but So / Symbol, other)
⽔ (U+2F54 KANGXI RADICAL WATER; looks like a CJK letter, but is So / Symbol, other)
x🌞 (starts with a letter, but contains a character that is neither a letter nor a digit)

Although Go accepts identifiers containing characters other than A-Z, a-z, 0-9, and _, using them is generally not advisable: they hurt readability and accessibility, and they can cause rendering issues.