8.4.0. Chapter Overview
Chapter 8 is mainly about common collections in Rust. Rust provides many collection-like data structures, and these collections can hold many values. However, the collections covered in Chapter 8 are different from arrays and tuples.
The collections in Chapter 8 are stored on the heap rather than on the stack. That also means their size does not need to be known at compile time; at runtime, they can grow or shrink dynamically.
This chapter focuses on three collections: Vector, String (this article), and HashMap.
8.4.1. You Cannot Use Indexing to Access String
Unlike strings in many other languages, a Rust String cannot be accessed with an integer index. Example:
fn main() {
    let s = String::from("6657 up up");
    let a = s[0]; // does not compile: String cannot be indexed by an integer
}
Compiler output:
error[E0277]: the type `str` cannot be indexed by `{integer}`
--> src/main.rs:3:15
|
3 | let a = s[0];
| ^ string indices are ranges of `usize`
|
= help: the trait `SliceIndex<str>` is not implemented for `{integer}`, which is required by `String: Index<_>`
= note: you can use `.chars().nth()` or `.bytes().nth()`
for more information, see chapter 8 in The Book: <https://doc.rust-lang.org/book/ch08-02-strings.html#indexing-into-strings>
= help: the trait `SliceIndex<[_]>` is implemented for `usize`
= help: for that trait implementation, expected `[_]`, found `str`
= note: required for `String` to implement `Index<{integer}>`
The error says that a String cannot be indexed with an integer. The first = help line explains why: the trait SliceIndex<str> is not implemented for integers, so String does not implement Index<{integer}>. The = note line suggests .chars().nth() or .bytes().nth() as alternatives.
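The two alternatives named in the compiler's note can be sketched as follows. This is a minimal illustration, not the only way to do it; the helper names first_char and first_byte are hypothetical and do not come from the standard library.

```rust
/// Returns the first Unicode scalar value of `s`, if any.
/// (`first_char` is a hypothetical helper name used for illustration.)
fn first_char(s: &str) -> Option<char> {
    s.chars().nth(0)
}

/// Returns the first byte of `s`, if any.
/// (`first_byte` is likewise a hypothetical helper.)
fn first_byte(s: &str) -> Option<u8> {
    s.bytes().nth(0)
}

fn main() {
    let s = String::from("6657 up up");
    // Both calls return an Option because the string may be empty.
    println!("{:?}", first_char(&s)); // Some('6')
    println!("{:?}", first_byte(&s)); // Some(54), the ASCII code of '6'
}
```

Both methods make the caller choose explicitly between scalar values and bytes, which is the choice a bare s[0] would hide.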
8.4.2. Internal Representation of String
String is a wrapper around Vec<u8>, where u8 is a single byte. The len() method on String returns the string's length in bytes. Example:
fn main() {
    let len = String::from("Niko").len();
    println!("{}", len);
}
Output:
4
This string uses UTF-8 encoding, and len is 4, which means the string occupies 4 bytes. So in this example, each letter takes up one byte.
But that is not always the case. For example, if we change the string to another language (here, Russian written in Cyrillic):
fn main() {
    let hello = String::from("Здравствуйте");
    println!("{}", hello.len());
}
If you count the letters in this string, there are 12, but the output is:
24
That means each letter in this string takes up two bytes (most common Chinese characters take three bytes each). The precise term for a “letter” here is a Unicode scalar value, and each Cyrillic scalar value in this string is encoded as two bytes.
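The gap between byte length and the number of scalar values can be checked directly. The sketch below assumes a hypothetical helper, byte_and_char_counts, that is not part of the original examples; it relies only on str::len, chars().count(), and char::len_utf8 from the standard library.

```rust
/// Returns (byte length, number of Unicode scalar values) for `s`.
/// (`byte_and_char_counts` is a hypothetical helper for illustration.)
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    for s in ["Niko", "Здравствуйте"] {
        let (bytes, chars) = byte_and_char_counts(s);
        println!("{}: {} bytes, {} scalar values", s, bytes, chars);
    }
    // char::len_utf8 reports how many bytes one scalar value needs in UTF-8.
    println!("{}", 'N'.len_utf8()); // 1
    println!("{}", 'З'.len_utf8()); // 2
}
```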
From this example, you can see that a numeric index into a String would not always correspond to a complete Unicode scalar value: some scalar values occupy more than one byte, while a numeric index would address only a single byte.
Another example: the Cyrillic letter З corresponds to two bytes, whose values are 208 and 151. If numeric indexing were allowed, then taking index 0 of Здравствуйте would give you 208, which by itself is meaningless because it is missing the second byte needed to form a Unicode scalar value. So to avoid this kind of bug that would be hard to notice immediately, Rust bans numeric indexing on String, preventing misunderstandings early in development.
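To see those two bytes, and why the first one is meaningless on its own, here is a small sketch. The helper first_char_bytes is a hypothetical name introduced for this illustration; std::str::from_utf8 is the standard-library function that validates a byte slice as UTF-8.

```rust
/// Returns the UTF-8 bytes of the first scalar value in `s`
/// (an empty vector if `s` is empty).
/// (`first_char_bytes` is a hypothetical helper for illustration.)
fn first_char_bytes(s: &str) -> Vec<u8> {
    s.chars()
        .next()
        .map(|c| c.to_string().into_bytes())
        .unwrap_or_default()
}

fn main() {
    let bytes = first_char_bytes("Здравствуйте");
    println!("{:?}", bytes); // [208, 151]

    // The first byte alone is not valid UTF-8 ...
    println!("{}", std::str::from_utf8(&bytes[..1]).is_err()); // true
    // ... but both bytes together decode back to the letter.
    println!("{:?}", std::str::from_utf8(&bytes)); // Ok("З")
}
```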
8.4.3. Bytes, Scalar Values, and Grapheme Clusters
There are three ways to view strings in Rust: bytes, scalar values, and grapheme clusters. Among them, grapheme clusters are the closest to what we usually call “letters.”
1. Bytes
Example:
fn main() {
    let s = String::from("नमस्ते"); // Hindi written in Devanagari script
    for b in s.bytes() {
        print!("{} ", b);
    }
}
This Devanagari string may look like it contains four letters. We use the .bytes() method to get the bytes it corresponds to. The output is:
224 164 168 224 164 174 224 164 184 224 165 141 224 164 164 224 165 135
These 18 bytes show how the computer stores the string.
2. Scalar Values
Now let’s view it as Unicode scalar values:
fn main() {
    let s = String::from("नमस्ते");
    for c in s.chars() {
        print!("{} ", c);
    }
}
Using the .chars() method gives the scalar values corresponding to this string. The output is:
न म स ् त े
It has 6 scalar values, and two of them (the fourth and the sixth) are combining marks rather than standalone letters. They only make sense when combined with the preceding character.
This also explains why this Devanagari string takes 18 bytes: each of the 6 scalar values takes 3 bytes, and 6 × 3 gives 18 bytes.
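The 6 × 3 arithmetic can be verified with char::len_utf8. This is a minimal sketch; utf8_widths is a hypothetical helper name, not something from the original article or the standard library.

```rust
/// Pairs each scalar value in `s` with its UTF-8 width in bytes.
/// (`utf8_widths` is a hypothetical helper for illustration.)
fn utf8_widths(s: &str) -> Vec<(char, usize)> {
    s.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    let widths = utf8_widths("नमस्ते");
    for (c, n) in &widths {
        // {:?} prints each scalar value quoted, one per line.
        println!("{:?} -> {} bytes", c, n);
    }
    // Adding up the per-value widths gives the same number as len().
    let total: usize = widths.iter().map(|(_, n)| n).sum();
    println!("total: {} bytes", total); // 18
}
```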
3. Grapheme Clusters
Because obtaining grapheme clusters from a String is complicated, the Rust standard library does not provide this functionality. We will not demonstrate it here, but you can use a third-party crate from crates.io to implement it.
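In short, if this string were split into grapheme clusters, it would yield the four units a person would call letters (this is the split given in The Rust Programming Language):
न म स् ते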
8.4.4. Why String Cannot Be Indexed
- Numeric indexing may return an incomplete value that cannot form a full Unicode scalar value, leading to bugs that are not immediately visible.
- Indexing is expected to take constant time, O(1), but String cannot guarantee that: to find the n-th character, Rust would have to walk the contents from the beginning up to that index to count how many valid characters precede it.
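The second point can be illustrated with char_indices(), which yields each scalar value together with its byte offset. In this sketch, nth_char is a hypothetical helper; the lookup it performs has to decode the string from the start, so its cost grows with n.

```rust
/// Finds the byte offset and value of the `n`-th scalar value (0-based).
/// char_indices() decodes the string from the beginning, so this is a
/// linear walk, not a constant-time lookup.
/// (`nth_char` is a hypothetical helper for illustration.)
fn nth_char(s: &str, n: usize) -> Option<(usize, char)> {
    s.char_indices().nth(n)
}

fn main() {
    let hello = "Здравствуйте";
    println!("{:?}", nth_char(hello, 0)); // Some((0, 'З'))
    println!("{:?}", nth_char(hello, 2)); // Some((4, 'р'))
    println!("{:?}", nth_char(hello, 12)); // None: there are only 12 scalar values
}
```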
8.4.5. Slicing String
You can use [] with a range inside it to create a string slice. For detailed coverage of string slices, see Chapter 4.5, Slices. Example:
fn main() {
    let hello = String::from("Здравствуйте");
    let s = &hello[0..4];
    println!("{}", s);
}
As mentioned earlier, one Cyrillic letter takes two bytes. This string slice takes the first 4 bytes of the string, which means the first two letters. The output is:
Зд
What if the string slice takes the first three bytes instead? That would mean the slice contains the first letter plus half of the second letter. What happens in that case? Look at the following example:
fn main() {
    let hello = String::from("Здравствуйте");
    let s = &hello[0..3]; // panics at runtime: byte 3 is inside 'д'
    println!("{}", s);
}
Output:
byte index 3 is not a char boundary; it is inside 'д' (bytes 2..4) of `Здравствуйте`
The program panics, and the message says that byte index 3 is not a char boundary. In other words, a slice must start and end on char boundaries. For this all-Cyrillic string, that means the range endpoints must be multiples of two bytes.
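When the byte range is not known to be valid in advance, one option is to check it instead of risking a panic. The sketch below uses str::get, which returns None for an invalid range, and str::is_char_boundary; the helper name prefix_bytes is hypothetical.

```rust
/// Returns the first `n` bytes of `s` as a slice, or None when `n` is out
/// of range or falls inside a multi-byte scalar value.
/// (`prefix_bytes` is a hypothetical helper for illustration.)
fn prefix_bytes(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    let hello = String::from("Здравствуйте");
    println!("{:?}", prefix_bytes(&hello, 4)); // Some("Зд")
    println!("{:?}", prefix_bytes(&hello, 3)); // None instead of a panic
    println!("{}", hello.is_char_boundary(3)); // false
    println!("{}", hello.is_char_boundary(4)); // true
}
```

The [] form is still fine when the boundaries are known to be valid; get() is simply the variant that reports a bad range as a value rather than a panic.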
8.4.6. Iterating Over String
- For scalar values, use the .chars() method. Example:
fn main() {
    let s = String::from("नमस्ते");
    for c in s.chars() {
        print!("{} ", c);
    }
}
- For bytes, use the .bytes() method. Example:
fn main() {
    let s = String::from("नमस्ते");
    for b in s.bytes() {
        print!("{} ", b);
    }
}
- For grapheme clusters, the standard library does not provide a method, but you can use a third-party crate.
