DEV Community

kingyou

Character Type Conversion & Encoding in Go (Golang): Practical Guide

Go (Golang) has solid built-in support for Unicode and UTF-8. This guide gives an overview of strings, bytes, and runes, with practical notes on converting between them and between different encodings.


1. Character Types in Go

  • byte: An alias for uint8. It holds raw binary data; each byte is a number in the range 0–255.
  • rune: An alias for int32. It holds a single Unicode code point, so it can represent any Unicode character.

Example:

str := "Hello! 世界"
bytes := []byte(str) // byte slice (raw data)
runes := []rune(str) // rune slice (Unicode code points)

fmt.Println(len(str))   // 13: number of bytes, not characters
fmt.Println(len(runes)) // 9: number of Unicode code points
fmt.Println(bytes)      // Show underlying UTF-8 bytes
fmt.Println(runes)      // Show the code point value of each character

2. Encoding: UTF-8 and Unicode

  • All Go source code files use UTF-8 encoding.
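  • Strings in Go are immutable sequences of bytes. They usually hold valid UTF-8, but the language does not require it.
  • UTF-8 is a variable-width encoding: ASCII characters such as English letters take 1 byte, most Chinese characters take 3 bytes, and some characters (such as many emoji) take 4 bytes.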
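
The byte widths above can be checked with the standard unicode/utf8 package. The following is a minimal sketch; the helper name describe is made up for this illustration and is not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports the byte length,
// the rune count, and the UTF-8 validity of s.
func describe(s string) (byteLen, runeCount int, valid bool) {
	return len(s), utf8.RuneCountInString(s), utf8.ValidString(s)
}

func main() {
	// Width in bytes of individual code points when encoded as UTF-8.
	for _, r := range []rune{'A', 'é', '世', '😀'} {
		fmt.Printf("%c U+%04X -> %d byte(s)\n", r, r, utf8.RuneLen(r))
	}

	b, n, ok := describe("Hello! 世界")
	fmt.Println(b, n, ok) // 13 9 true

	// A string may hold arbitrary bytes; this one is not valid UTF-8.
	_, _, ok = describe("\xff\xfe")
	fmt.Println(ok) // false
}
```

When input comes from an untrusted source, calling utf8.ValidString before further processing is one way to catch bad data early.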

3. String Traversal

  • Iterating with for i := 0; i < len(str); i++ walks byte by byte, so str[i] can land in the middle of a multi-byte character (e.g., Chinese).
  • Use for _, ch := range str to iterate rune by rune. The index that range yields is the byte offset at which each rune starts, not its character position.

for _, ch := range str {
    fmt.Printf("%c ", ch)  // each ch is a full Unicode character (rune)
}
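
To make the difference between the two loops concrete, here is a small sketch that prints both traversals for the same string. The helper runeOffsets is a hypothetical name used only for this example.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte offset
// at which each rune of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	str := "Go 世界"

	// Byte-wise: str[i] is a single byte, so each Chinese character
	// shows up as three separate values.
	for i := 0; i < len(str); i++ {
		fmt.Printf("%d:%02x ", i, str[i])
	}
	fmt.Println()

	// Rune-wise: i jumps ahead by the width of each rune.
	for i, ch := range str {
		fmt.Printf("%d:%c ", i, ch)
	}
	fmt.Println()

	fmt.Println(runeOffsets(str)) // [0 1 2 3 6]
}
```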

4. Conversions Between string, []byte, []rune

  • Convert string to bytes:
  b := []byte(str)
  • Convert string to runes (Unicode code points):
  r := []rune(str)
  • Convert byte slice or rune slice back to string:
  s1 := string(b)
  s2 := string(r)
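
Because strings are immutable, these conversions produce copies, and editing a character means converting, changing the slice, and converting back. The sketch below assumes a hypothetical helper named replaceRune to show the pattern.

```go
package main

import (
	"fmt"
	"strconv"
)

// replaceRune is a hypothetical helper: it returns s with the rune at
// position pos (counted in runes, not bytes) replaced by r.
// Out-of-range positions leave s unchanged.
func replaceRune(s string, pos int, r rune) string {
	runes := []rune(s)
	if pos < 0 || pos >= len(runes) {
		return s
	}
	runes[pos] = r
	return string(runes)
}

func main() {
	s := "Hello 世界"

	// []byte(s) is a copy, so changing it does not affect s.
	b := []byte(s)
	b[0] = 'J'
	fmt.Println(s, string(b)) // Hello 世界 Jello 世界

	fmt.Println(replaceRune(s, 6, '地')) // Hello 地界

	// Converting a rune to a string yields that character, not its number.
	fmt.Println(string(rune(65))) // A
	fmt.Println(strconv.Itoa(65)) // 65
}
```

Working on []rune keeps positions aligned with characters, at the cost of an extra allocation per conversion.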

5. Encoding Conversion (e.g., GBK <-> UTF-8)

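The standard library does not ship legacy encodings such as GBK. To convert between them and UTF-8 (for example, legacy Chinese GBK to UTF-8), use the golang.org/x/text/encoding packages, which live in a separate module (go get golang.org/x/text):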

import (
    "bytes"
    "fmt"
    "io"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

gbkData := []byte{ /* your GBK bytes */ }
// Wrap the GBK bytes in a reader that decodes to UTF-8 as it is read.
reader := transform.NewReader(bytes.NewReader(gbkData), simplifiedchinese.GBK.NewDecoder())
// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
utf8Data, err := io.ReadAll(reader)
if err != nil {
    // handle error
}
fmt.Println(string(utf8Data))
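
Not every conversion needs an external module. As a comparison, UTF-16 input can be decoded with the standard library alone. The sketch below assumes little-endian data without a byte order mark; decodeUTF16LE is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE is a hypothetical helper: it converts little-endian
// UTF-16 bytes (assumed to carry no BOM) into a Go string.
// A trailing odd byte is ignored.
func decodeUTF16LE(data []byte) string {
	units := make([]uint16, 0, len(data)/2)
	for i := 0; i+1 < len(data); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(data[i:]))
	}
	// utf16.Decode turns the code units into runes, combining surrogate pairs.
	return string(utf16.Decode(units))
}

func main() {
	data := []byte{0x16, 0x4e, 0x4c, 0x75} // "世界" encoded as UTF-16LE
	fmt.Println(decodeUTF16LE(data))       // 世界
}
```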

6. Extra: Base64 & JSON Encoding

The standard library also includes encoders for formats commonly used to transmit and store data, such as encoding/base64 and encoding/json.

Base64 Encoding

Base64 encoding is commonly used to encode binary data into ASCII text format, making it safe for transmission over text-based protocols.

import (
    "encoding/base64"
    "fmt"
)

// Encoding to Base64
data := []byte("Hello, World!")
encoded := base64.StdEncoding.EncodeToString(data)
fmt.Println(encoded) // SGVsbG8sIFdvcmxkIQ==

// Decoding from Base64
decoded, err := base64.StdEncoding.DecodeString(encoded)
if err != nil {
    // handle error
}
fmt.Println(string(decoded)) // Hello, World!

Go provides two main Base64 encodings:

  • StdEncoding: Standard Base64 encoding (uses + and /)
  • URLEncoding: URL-safe Base64 encoding (uses - and _ instead)
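
The two alphabets only differ for inputs that produce the characters + or /. The sketch below picks two bytes that do, and also shows RawURLEncoding, the standard library's unpadded URL-safe variant. The helper encodeAll is a hypothetical name for this example.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeAll is a hypothetical helper: it encodes data with the standard,
// URL-safe, and unpadded URL-safe Base64 encodings.
func encodeAll(data []byte) (std, urlSafe, rawURL string) {
	return base64.StdEncoding.EncodeToString(data),
		base64.URLEncoding.EncodeToString(data),
		base64.RawURLEncoding.EncodeToString(data)
}

func main() {
	data := []byte{0xfb, 0xff}
	std, urlSafe, rawURL := encodeAll(data)
	fmt.Println(std)     // +/8=
	fmt.Println(urlSafe) // -_8=
	fmt.Println(rawURL)  // -_8

	// Decoding has to use the same encoding that produced the text.
	if _, err := base64.StdEncoding.DecodeString(urlSafe); err != nil {
		fmt.Println("StdEncoding rejects URL-safe input:", err)
	}
}
```

The URL-safe forms are a reasonable choice for values placed in URLs or file names, where + and / would otherwise need escaping.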

JSON Encoding with Unicode

Go's encoding/json package reads and writes UTF-8. Non-ASCII characters such as Chinese text are written as-is rather than as \uXXXX escapes, so no extra conversion step is needed for international text.

import (
    "encoding/json"
    "fmt"
)

type Person struct {
    Name    string `json:"name"`
    Message string `json:"message"`
}

// Encoding JSON with Unicode
person := Person{
    Name:    "张三",
    Message: "Hello 世界",
}

jsonData, err := json.Marshal(person)
if err != nil {
    // handle error
}
fmt.Println(string(jsonData))
// Output: {"name":"张三","message":"Hello 世界"}

// Pretty print JSON
prettyJSON, _ := json.MarshalIndent(person, "", "  ")
fmt.Println(string(prettyJSON))

// Decoding JSON
var decoded Person
err = json.Unmarshal(jsonData, &decoded)
if err != nil {
    // handle error
}
fmt.Printf("%+v\n", decoded)

Key Points:

  • encoding/json reads and writes UTF-8, so non-ASCII text needs no manual conversion
  • Use struct tags to control JSON field names
  • Use MarshalIndent for human-readable JSON output
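
One detail worth knowing alongside these points: json.Marshal leaves Chinese characters untouched but escapes <, >, and & by default so the output is safe to embed in HTML. The sketch below shows that behavior and one way to turn it off with json.Encoder; marshalNoHTMLEscape is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// marshalNoHTMLEscape is a hypothetical helper: it encodes v as JSON
// without escaping <, >, and &.
func marshalNoHTMLEscape(v any) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	// Encode appends a newline after the value; trim it.
	return strings.TrimSuffix(sb.String(), "\n"), nil
}

func main() {
	msg := "<世界> & Go"

	escaped, err := json.Marshal(msg)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(escaped)) // "\u003c世界\u003e \u0026 Go"

	plain, err := marshalNoHTMLEscape(msg)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(plain) // "<世界> & Go"

	// \uXXXX escapes in the input decode to the same string as literal UTF-8.
	var s string
	if err := json.Unmarshal([]byte(`"\u4e16\u754c"`), &s); err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(s == "世界") // true
}
```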
