Oskar Cieslik

Posted on Oct 17

Iterators - A Deep Dive into Lazy, Composable Processing

#rust #programming

In this post from Rust Bits series, we'll be doing a deep dive into Iterators, trying to showcase their lazy & composable nature of processing data.

Iterators are one of Rust's most powerful abstractions, providing a zero-cost way to process sequences of data.

They combine functional programming elegance with systems programming efficiency, and understanding them is essential for writing idiomatic Rust code.

The Iterator Trait

At the heart of Rust's iterator system lies the Iterator trait, which defines how to produce a sequence of values.

The trait is surprisingly simple, requiring only one method to be implemented:

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

The Item associated type specifies what kind of values the iterator produces.

This is a form of Generic Associated Type (GAT), allowing each iterator implementation to define its own item type without the consumer needing to specify it explicitly.

The next() method is where the magic happens—it returns Some(value) for each element in the sequence, and None when the iterator is exhausted.

Example of a simple custom iterator could be a Counter from 0 to max.

Let's create the struct and implement the Iterator trait for it.

struct Counter {
    current: u32,
    max: u32,
}

impl Counter {
    fn new(max: u32) -> Self {
        Counter { current: 0, max }
    }
}

impl Iterator for Counter {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current < self.max {
            let result = self.current;
            self.current += 1;
            Some(result)
        } else {
            None
        }
    }
}

Great, now let's use it:

let counter = Counter::new(5);

println!("-- Iterating over counter --");
for num in counter {
    println!("Number: {}");
}

println!("Checking `counter.next()` after: {:?}", counter.next());

// Prints:
// -- Iterating over counter --
// Number: 0
// Number: 1
// Number: 2
// Number: 3
// Number: 4
// Checking `counter.next()` after: None

As you can see, the Iterator trait is a powerful abstraction that allows us to work with sequences of data in a flexible and efficient manner.

Now that we understand the basics of the Iterator trait, let's dive deeper into the Rust iterator system.

What makes Iterators special

At their core, iterators in Rust are lazy—they do nothing until consumed.

This laziness allows the compiler to optimize iterator chains into tight, efficient loops that rival hand-written procedural code.

When chaining operations like map(), filter(), or fold(), Rust performs only a single traversal of the data, ensuring efficient processing.

Iterators in Rust are defined by the Iterator trait, which requires implementing just one method: next(). This method specifies how to advance the iterator and return the next item.

This simplicity lets any type become iterable easily. It also gives us the flexibility to specify the iteration's internal logic as we like, decoupling it from the user-facing API.

Three ways to create an iterator

Rust provides three primary methods for creating iterators from collections, each offering different ownership semantics.

These methods differ in how they handle borrowing and ownership, allowing to choose the appropriate one based on usage.

iter() method allows iterating through the collection, while borrowing each element as an immutable reference.

This approach ensures read-only access, preventing accidental modifications during traversal.

let numbers = vec![1, 2, 3, 4, 5];

for num in numbers.iter() {
    println!("Number: {}", num);
}

// Note:
// numbers still available afterwards

some_fn(&numbers);

into_iter() method allows iterating through the collection while transferring ownership of each element.

This approach is efficient for consuming the collection without unnecessary cloning, as it moves the elements directly into the iterator.

let numbers = vec![1, 2, 3, 4, 5];

for num in numbers.into_iter() {
    println!("Owned number: {}", num);
}

// Note:
// numbers no longer available afterwards

iter_mut() method allows you to iterate through a collection while providing a mutable reference to each element.

This enables direct modification of the elements during iteration, which is useful for in-place updates without creating a new collection.

let mut numbers = vec![1, 2, 3, 4, 5];

for num in numbers.iter_mut() {
    *num *= 2;
}

// Note:
// numbers is now [2, 4, 6, 8, 10]
// numbers still available afterwards

Transforming Iterators

Iterators in Rust provide multiple methods that transform items, enabling powerful data processing pipelines.

For example, methods like map(fn) and filter(fn) allow developers to apply functions to each element or selectively process data, streamlining complex operations without manual loops.

map(fn) produces an iterator that applies passed function to each element:

let squares = (0..=10).map(|x| x * x).collect();

// Note:
// squares == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

filter(fn) produces an iterator that ensures only elements satisfying a predicate are being iterated:

let even_numbers = (0..=10).filter(|x| x % 2 == 0).collect();

// Note:
// even_numbers == [0, 2, 4, 6, 8, 10]

Of course adapters are easily chain-able, allowing for more complex pipelines:

let even_squares = (0..=10).filter(|x| x % 2 == 0).map(|x| x * x).collect();

// Note:
// even_squares == [0, 4, 16, 36, 64, 100]

Consuming Iterators

Due to their lazy nature, iterators in Rust require consumption into a specific type to yield a result, typically achieved using built-in methods like collect().

This approach ensures efficient memory usage by deferring computation until needed.

collect() accumulates iterator's items into either specified or implicitly determined type

let numbers: Vec<i32> = vec![1, 2, 3];
let doubled: Vec<i32> = numbers.iter().map(|x| x * 2).collect();

// Note:
// doubled == [2, 4, 6]

fold() transforms iterator's items into accumulated value using specified initial value and accumulation function

let sum_of_numbers = (0..=5).fold(0, |acc, x| acc + x);

// Note:
// sum_of_squares == 15 (0 + 1 + 2 + 3 + 4 + 5)

Iterator trait includes additional built-in methods which allow us to avoid re-implementing basic functionality like sum or product of numbers (similar to our implementation using fold(initial, fn) above)

let sum_of_numbers: i32 = (0..=10).sum();

// Note:
// sum_of_squares == 15 (0 + 1 + 2 + 3 + 4 + 5)

let product_of_numbers: i32 = (1..=10).product();

// Note:
// product_of_numbers == 120 (1 * 2 * 3 * 4 * 5)

let max_of_numbers: i32 = (0..=100).max();

// Note:
// max_of_numbers == 100

Reshaping Iterators

Iterators additionally provide powerful adaptors to modify the very structure and order of the data stream itself.

These adaptors enable efficient transformations without consuming the original iterator prematurely.

step_by() modifies the traversal of an iterator by skipping items based on a specified stride.

This method allows you to process every nth element efficiently.

let even_numbers: Vec<i32> = (0..=10).step_by(2).collect();

// Note:
// even_numbers == [0, 2, 4, 6, 8, 10]

rev() reverses the direction of the traversal over an iterator.

This method is particularly useful as it does not modify the original collection.

let numbers = vec![0, 1, 2, 3, 4];
let numbers_reversed: Vec<i32> = numbers.iter().rev().collect();

// Note:
// numbers_reversed == [4, 3, 2, 1, 0]

enumerate() method modifies the traversal by wrapping each item in a tuple, with the item's index in the first position.

This allows you to access both the index and the value during iteration, making it useful for tasks like numbering elements in a collection.

let names = vec!["Jane", "Dave"];

for (index, name) in names.iter().enumerate() {
    println!("Name '{name}' appeared at index {index}");
}

// Prints:
// Name 'Jane' appeared at index 0
// Name 'Dave' appeared at index 1

zip(iterator) combines two iterators into an iterator over tuples, pairing items from each.

This function stops when the shorter iterator is exhausted, ensuring no panics from mismatched lengths.

let fruits = vec!["apple", "pear", "peach"];
let colors = vec!["green", "yellow"];

for (fruit, color) in fruits.iter().zip(colors.iter()) {
    println!("Here's {fruit} in {color} color.");
}

// Prints:
// Here's apple in green color.
// Here's pear in yellow color.
//
// Note:
// "peach" is not printed as iteration stopped as it exhausted 'colors' iterator.

Follow-Along example:

As a closing note, let's go throught little more complicated example of a Deltas iterator, which yields the distance from each element to its last occurrence.

Knowing the requirement we can create a test that validates the functionality:

#[test]
fn deltas() {
    let arr = vec![1, 1, 2, 2, 3, 3, 2, 3, 4];
    assert_eq!(Deltas::new(arr.iter()).collect::<Vec<usize>>(), vec![0, 0, 2, 0, 4, 0, 2, 1, 8]);
}

After that, we can move on to the implementation of the Deltas struct.

The struct wraps an iterator passed to it and modifies it by calling .enumerate() on it.

pub struct Deltas<I: Iterator> {
    items: Vec<(I::Item, usize)>,
    enumerate_iter: std::iter::Enumerate<I>,
}

impl<I: Iterator> Deltas<I> {
    pub(crate) fn new(iter: I) -> Self {
        Deltas {
            items: Vec::new(),
            enumerate_iter: iter.enumerate(),
        }
    }
}

Now that we have the Deltas struct, the only thing left is to implement the Iterator trait for it.

The next() method is where the magic happens. This is where we modify the iterator.

impl<I: Iterator> Iterator for Deltas<I>
where
    I::Item: std::cmp::PartialEq,
{
    type Item = usize; // The type of the items yielded by the iterator

    fn next(&mut self) -> Option<Self::Item> {
        // Get the next item from the underlying iterator
        let (next_index, next_item) = self.enumerate_iter.next()?;

        // Find the index of the last occurrence of the current item (not efficient, but keeps the example simple)
        let last_index = (self.items.iter().rev()).find_map(|(item, index)| (item == &next_item).then_some(*index));

        // Push the current item and its index to the items vector
        self.items.push((next_item, next_index));

        // Return the distance from the current item to its last occurrence as the next item
        Some(last_index.map_or(next_index, |last_idx| next_index - last_idx - 1))
    }
}

And that's it! We've implemented a lazy iterator that yields the distance from each element to its last occurrence.

As we know, the implementation of the next() method is hidden from the user. This allows us to modify the underlying iterator without breaking the contract.

Let's utilize that and improve our iterator's efficiency by getting rid of the Vec and using a HashMap instead.

pub struct Deltas<I: Iterator> {
    item_last_index_map: std::collections::HashMap<I::Item, usize>,
    enumerate_iter: std::iter::Enumerate<I>,
}

impl<I: Iterator> Deltas<I> {
    pub(crate) fn new(iter: I) -> Self {
        Deltas {
            item_last_index_map: std::collections::HashMap::new(),
            enumerate_iter: iter.enumerate(),
        }
    }
}

impl<I: Iterator, K, F> Iterator for DeltasByKey<I, F, K>
where
    F: FnMut(&I::Item) -> K,
    K: std::hash::Hash + Eq,
{
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        // Get the next item from the underlying iterator
        let (next_index, next_item) = self.enumerate_iter.next()?;

        // Find the index of the last occurrence of the current item:
        // 1. If it's already in the map, use the distance from current index to last one
        // 2. If it's not in the map, use the current index as the distance
        let delta = (self.item_last_index_map.get(&next_item))
            .map(|&last_idx| next_index - last_idx - 1)
            .unwrap_or(next_index);

        // Update the item_last_index_map with the current item and its index
        self.item_last_index_map.insert(next_item, next_index);

        // Return the distance from the current item to its last occurrence as the next item
        Some(delta)
    }
}

Key Takeaways

Rust iterators represent a fundamental shift in how we process sequences, combining functional programming elegance with systems programming performance.

They're lazy by default, enabling efficient chaining of operations that compile down to optimal machine code.

Understanding the difference between iter(), into_iter(), and iter_mut() is crucial for managing ownership correctly.

With iterator adapters like map(), filter(), and fold(), complex data transformations become concise and expressive.

Most importantly, iterators aren't just syntactic sugar — they're a zero-cost abstraction that encourages writing safer, more maintainable code without sacrificing performance.