Ashwin Gopalsamy

Posted on • Originally published at ashwingopalsamy.substack.com

Graphemes in Go

I once had to untangle bytes, runes and graphemes while handling Tamil names and emoji in a Go web app: a string that looked short wasn’t, and reversing it produced gibberish. The culprit wasn’t a flaw in Go; it was my own assumptions about what “a character” means.

Let’s map the territory precisely:

1. Bytes. The raw material Go calls a string

Go represents strings as immutable UTF-8 byte sequences.

What we see isn’t what Go handles under the hood.

s := "வணக்கம்"
fmt.Println(len(s)) // 21

The length is 21 bytes, not the number of visible symbols. Each Tamil code point takes 3 bytes in UTF-8, and even simple-looking emoji span multiple bytes.
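To see where the 21 comes from, here is a minimal sketch that uses only the standard library's unicode/utf8 package to report how many bytes each code point occupies. The helper name byteWidths is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteWidths is a hypothetical helper for this illustration: it returns
// the number of UTF-8 bytes used by each code point in s, in order.
func byteWidths(s string) []int {
	var widths []int
	for _, r := range s { // range over a string decodes one code point at a time
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	s := "வணக்கம்"
	fmt.Println(len(s), byteWidths(s)) // 21 [3 3 3 3 3 3 3]

	// ASCII letter, accented Latin letter, emoji: 1, 2 and 4 bytes.
	fmt.Println(len("a"), len("\u00e9"), len("\U0001F600")) // 1 2 4
}
```

Seven code points at three bytes each account for the 21 that len reports.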

2. Runes. Unicode code points

Converting a string with []rune(s) gives you code points, but that is still not what a human perceives.

rs := []rune(s)
fmt.Println(len(rs)) // 7

Here it’s 7 runes, but some Tamil graphemes (like “க்”) combine two runes: க (U+0B95) + ் (U+0BCD).
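One way to check which code points hide inside a visible character is to print each rune with the %U verb. This is a small sketch; codePoints is a hypothetical helper name, not something from the standard library.

```go
package main

import "fmt"

// codePoints is a hypothetical helper: it formats every rune in s
// as a "U+XXXX" code point label.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// One visible character, two code points: the letter and the pulli (virama).
	fmt.Println(codePoints("க்")) // [U+0B95 U+0BCD]

	// The full greeting is 7 code points.
	fmt.Println(len([]rune("வணக்கம்"))) // 7
}
```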

3. Grapheme clusters. The units users actually see

Go’s standard library stops at runes. To work with visible characters, you need a grapheme-aware library, like github.com/rivo/uniseg.

for gr := uniseg.NewGraphemes(s); gr.Next(); {
    fmt.Printf("%q\n", gr.Str())
}

That prints what a human reads: “வ”, “ண”, “க்”, “க”, “ம்”. An emoji such as “❤️” in the input would likewise come out as a single unit.


Why this matters

If your app deals with names, chats, or any multilingual text, indexing by bytes will break things. Counting runes helps, but it can still split what you intend as one unit. Grapheme-aware operations align with what users actually expect.

Real bugs I’ve seen: Tamil names chopped mid-character, and emoji reactions breaking because only one code point was taken.
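Both kinds of bug are easy to reproduce with the standard library alone. The sketch below is illustrative only: truncateBytes and firstRune are hypothetical helpers written to be deliberately naive, not code from the app described above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateBytes is a hypothetical, deliberately naive helper: it cuts s
// after n bytes without checking code point or grapheme boundaries.
func truncateBytes(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

// firstRune is a hypothetical helper that keeps only the first code point of s.
func firstRune(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if size == 0 {
		return ""
	}
	return string(r)
}

func main() {
	name := "வணக்கம்"
	cut := truncateBytes(name, 4)
	// The cut landed inside the second code point, so the result is not valid UTF-8.
	fmt.Println(utf8.ValidString(cut)) // false

	// ❤️ is U+2764 followed by variation selector U+FE0F.
	heart := "\u2764\ufe0f"
	// Taking one code point silently drops the selector.
	fmt.Println(len([]rune(heart)), firstRune(heart) == heart) // 2 false
}
```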


To put it simply

Task                | Approach
Count code points   | utf8.RuneCountInString(s)
Count visible units | Grapheme iteration (uniseg)
Reverse text        | Parse into graphemes, reverse slice, join
Slice safely        | Only use s[i:j] on grapheme boundaries
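The "parse, reverse, join" row can be sketched as follows. To keep the example runnable with only the standard library, it swaps in a rough approximation of grapheme segmentation: roughClusters (a hypothetical helper) simply attaches combining marks to the preceding rune. That is enough for the Tamil greeting used here, but it is an assumption-laden stand-in, not the full Unicode segmentation rules; it does not handle zero-width-joiner emoji sequences or flags, for example. For real text, a grapheme-aware library such as uniseg remains the right tool.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// roughClusters is a simplified stand-in for real grapheme segmentation.
// Assumption: attaching combining marks (Unicode category M) to the
// preceding rune is good enough for this demo. It is NOT full UAX #29.
func roughClusters(s string) []string {
	var clusters []string
	for _, r := range s {
		if unicode.IsMark(r) && len(clusters) > 0 {
			clusters[len(clusters)-1] += string(r)
			continue
		}
		clusters = append(clusters, string(r))
	}
	return clusters
}

// reverseClusters splits s into rough clusters, reverses the slice, and joins it.
func reverseClusters(s string) string {
	c := roughClusters(s)
	for i, j := 0, len(c)-1; i < j; i, j = i+1, j-1 {
		c[i], c[j] = c[j], c[i]
	}
	return strings.Join(c, "")
}

func main() {
	s := "வணக்கம்"
	fmt.Println(len(roughClusters(s))) // 5
	fmt.Println(reverseClusters(s))    // each pulli stays attached to its letter
}
```

Reversing rune by rune instead would move each pulli onto the wrong letter, which is the gibberish described at the start.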

Think about what you intend to manipulate (the raw bytes, the code points, or what a user actually reads on screen), then choose the right level.
