Go (Golang) has strong built-in support for characters and encodings, especially Unicode and UTF-8. This overview collects basic practice notes on working with strings, bytes, and runes, and on converting between encodings in Go.
1. Character Types in Go
- byte: Alias for uint8. It represents raw binary data; each byte is a number in the range 0–255.
- rune: Alias for int32. It represents a Unicode code point, so it can hold any Unicode character.
Example:
str := "Hello! 世界"
bytes := []byte(str) // byte slice (raw data)
runes := []rune(str) // rune slice (Unicode code points)
fmt.Println(len(str)) // Number of bytes (not characters)
fmt.Println(bytes) // Show underlying bytes
fmt.Println(runes) // Show Unicode values for each character
2. Encoding: UTF-8 and Unicode
- All Go source code files use UTF-8 encoding.
- Strings in Go are read-only sequences of bytes. They usually hold valid UTF-8, but the language does not enforce this.
- ASCII characters such as English letters take 1 byte; other characters take 2 to 4 bytes, and most Chinese characters take 3.
3. String Traversal
- Iterating with for i := 0; i < len(str); i++ walks byte by byte, which splits multi-byte characters (e.g., Chinese).
- Use for _, ch := range str to iterate rune by rune, so each multi-byte character is decoded as a whole.
for _, ch := range str {
fmt.Printf("%c ", ch) // each ch is a full Unicode character (rune)
}
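To make the difference between the two loops concrete, the sketch below prints both traversals of the same string. The index yielded by range is a byte offset, not a character count. runeOffsets is a hypothetical helper added for illustration.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper (not from the original notes).
// It returns the byte offset at which each character of s starts,
// as reported by a range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "Go世界"

	// Byte-wise: s[i] is a single byte, so each Chinese character
	// shows up as three separate values.
	for i := 0; i < len(s); i++ {
		fmt.Printf("%d:%#x ", i, s[i])
	}
	fmt.Println()

	// Rune-wise: the index jumps by the encoded width of each character.
	for i, ch := range s {
		fmt.Printf("%d:%c ", i, ch)
	}
	fmt.Println()

	fmt.Println(runeOffsets(s)) // [0 1 2 5]
}
```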
4. Conversions Between string, []byte, []rune
- Convert string to bytes:
b := []byte(str)
- Convert string to runes (Unicode code points):
r := []rune(str)
- Convert byte slice or rune slice back to string:
s1 := string(b)
s2 := string(r)
5. Encoding Conversion (e.g., GBK <-> UTF-8)
If you need to convert between encodings (such as legacy Chinese GBK to UTF-8), use extra libraries like golang.org/x/text/encoding:
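import (
"bytes"
"fmt"
"io"
"golang.org/x/text/encoding/simplifiedchinese"
"golang.org/x/text/transform"
)
gbkData := []byte{ /* your GBK bytes */ }
// Wrap the GBK bytes in a reader that decodes them to UTF-8 on the fly.
reader := transform.NewReader(bytes.NewReader(gbkData), simplifiedchinese.GBK.NewDecoder())
// io.ReadAll replaces ioutil.ReadAll, which is deprecated since Go 1.16.
utf8Data, err := io.ReadAll(reader)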
if err != nil {
// handle error
}
fmt.Println(string(utf8Data))
6. Extra: Base64 & JSON Encoding
Go has built-in support for various encoding schemes that are essential for data transmission and storage.
Base64 Encoding
Base64 encoding is commonly used to encode binary data into ASCII text format, making it safe for transmission over text-based protocols.
import (
"encoding/base64"
"fmt"
)
// Encoding to Base64
data := []byte("Hello, World!")
encoded := base64.StdEncoding.EncodeToString(data)
fmt.Println(encoded) // SGVsbG8sIFdvcmxkIQ==
// Decoding from Base64
decoded, err := base64.StdEncoding.DecodeString(encoded)
if err != nil {
// handle error
}
fmt.Println(string(decoded)) // Hello, World!
Go provides two main Base64 encodings:
- StdEncoding: Standard Base64 encoding (uses + and /)
- URLEncoding: URL-safe Base64 encoding (uses - and _ instead)
JSON Encoding with Unicode
Go's encoding/json package reads and writes UTF-8, so non-ASCII text such as Chinese round-trips through Marshal and Unmarshal unchanged.
import (
"encoding/json"
"fmt"
)
type Person struct {
Name string `json:"name"`
Message string `json:"message"`
}
// Encoding JSON with Unicode
person := Person{
Name: "张三",
Message: "Hello 世界",
}
jsonData, err := json.Marshal(person)
if err != nil {
// handle error
}
fmt.Println(string(jsonData))
// Output: {"name":"张三","message":"Hello 世界"}
// Pretty print JSON
prettyJSON, _ := json.MarshalIndent(person, "", " ")
fmt.Println(string(prettyJSON))
// Decoding JSON
var decoded Person
err = json.Unmarshal(jsonData, &decoded)
if err != nil {
// handle error
}
fmt.Printf("%+v\n", decoded)
Key Points:
- JSON encoding/decoding automatically handles UTF-8 encoding
- Use struct tags to control JSON field names
- Use MarshalIndent for human-readable JSON output
- JSON strings are always UTF-8 encoded