Oleg Goncharov

Posted on Nov 6

STL Algorithms in C++: From Beginner to Expert

#cpp #stl #algorithms

A comprehensive guide to effective use of standard algorithms with analysis of implementations, pitfalls, and best practices

Introduction

STL algorithms are not just a convenient set of functions. They are the foundation of modern C++, enabling you to write expressive, safe, and performant code. Behind the apparent simplicity lies a powerful infrastructure—deep understanding of which separates professionals from novices.

In this article, we'll journey from basic usage to expert level: we'll examine internal implementations, reveal pitfalls, and study best practices and modern idioms.

Part 1: Architecture and Fundamental Principles

What are STL Algorithms Under the Hood?

STL algorithms are built on five pillars:

Iterators — abstraction over pointers
Concepts — type requirements (formalized in C++20)
Function objects — predicates and operations
Ranges — abstraction over iterator pairs (C++20)
Complexity theory — performance guarantees

Fundamental Principle: Separation of Algorithms and Data Structures

// One algorithm works with any container
std::vector<int> vec = {3, 1, 4, 1, 5};
std::list<int> lst = {3, 1, 4, 1, 5};
std::deque<int> deq = {3, 1, 4, 1, 5};

// Same algorithm!
std::sort(vec.begin(), vec.end());    // O(n log n)
std::sort(lst.begin(), lst.end());    // ❌ Doesn't compile! list doesn't support random access
std::sort(deq.begin(), deq.end());    // O(n log n)

Why doesn't std::sort work with std::list?

The sort algorithm requires RandomAccessIterator, while list only provides BidirectionalIterator.

Iterator Categories and Their Impact on Performance

Category	Supported Operations	Example Containers	Algorithms
InputIterator	`++it`, `*it` (read)	`istream_iterator`	`find`, `copy`
OutputIterator	`++it`, `*it` (write)	`ostream_iterator`	`copy`, `fill`
ForwardIterator	Input + Output + multiple passes	`forward_list`	`replace`, `rotate`
BidirectionalIterator	Forward + `--it`	`list`, `set`, `map`	`reverse`, `partition`
RandomAccessIterator	Bidirectional + `it[n]`, `it+n`	`vector`, `deque`, `array`	`sort`, `nth_element`

Critically important: Choosing the wrong iterator category can degrade performance from O(n) to O(n²).

Part 2: Complexity Analysis and Algorithm Performance

STL Algorithm Complexity Guarantees

The C++ standard strictly regulates algorithm complexity. These are not recommendations—they are mandatory requirements for implementations.

Search Algorithms

Algorithm	Requirements	Complexity	When to Use
`find`	InputIterator	O(n)	Unordered data
`binary_search`	RandomAccessIterator + sorted	O(log n)	Sorted data
`lower_bound`	RandomAccessIterator + sorted	O(log n)	Find insertion position
`equal_range`	RandomAccessIterator + sorted	O(log n)	Range of equal elements

Sorting Algorithms

Algorithm	Average Complexity	Worst-case Complexity	Stable	Extra Memory
`sort`	O(n log n)	O(n log n)*	❌ No	O(log n)
`stable_sort`	O(n log n)	O(n log²n)	✅ Yes	O(n)
`partial_sort`	O(n log k)	O(n log k)	❌ No	O(1)
`nth_element`	O(n)	O(n)	❌ No	O(1)

* C++11 requires worst-case O(n log n) for std::sort

Pitfall #1: Performance Degradation of `std::sort`

// ❌ Bad: worst-case O(n²) in old implementations
std::vector<int> v = {1, 2, 3, 4, 5}; // already sorted
std::sort(v.begin(), v.end());

// ✅ Modern implementations use introsort:
// Quicksort + switch to heapsort on deep recursion

Introsort (introspective sort) is a hybrid algorithm:

Starts with quicksort
Switches to heapsort if recursion depth exceeds 2×log₂(n)
Uses insertion sort for small subarrays (< 16 elements)

Real Benchmarks and Unexpected Results

Let's consider a real benchmark of vector element summation:

std::vector<int> v(10'000'000);
std::iota(v.begin(), v.end(), 1); // fill with 1, 2, 3, ...

// Method 1: std::accumulate
auto sum1 = std::accumulate(v.begin(), v.end(), 0LL);

// Method 2: manual loop with unrolling
long long sum2 = 0;
for (size_t i = 0; i < v.size(); i += 4) {
    sum2 += v[i] + v[i+1] + v[i+2] + v[i+3];
}

// Method 3: range-based for
long long sum3 = 0;
for (auto x : v) sum3 += x;

Benchmark results (GCC -O3):

std::accumulate: ~8-9 ms
Manual unrolled loop: ~4-5 ms
Range-based for: ~8-9 ms

Conclusion: STL algorithms are optimized for the general case, but specialized solutions can be faster.

Part 3: Pitfalls and Typical Mistakes

Pitfall #2: Iterator Invalidation

std::vector<int> v = {1, 2, 3, 4, 5};
auto it = v.begin();

// ❌ DANGEROUS: iterator may be invalidated
std::remove_if(v.begin(), v.end(), [](int x){ return x % 2 == 0; });
std::cout << *it; // Undefined behavior!

// ✅ Correct: update iterator
it = std::remove_if(v.begin(), v.end(), [](int x){ return x % 2 == 0; });
v.erase(it, v.end());

Pitfall #3: Incorrect Use of erase-remove

std::vector<int> v = {1, 2, 3, 2, 4, 2, 5};

// ❌ WRONG: elements remain in the vector!
std::remove(v.begin(), v.end(), 2);
// v = {1, 3, 4, 5, ?, ?, ?} — garbage at the end!

// ✅ CORRECT: erase-remove idiom
v.erase(std::remove(v.begin(), v.end(), 2), v.end());
// v = {1, 3, 4, 5}

Why this happens? std::remove doesn't delete elements—it moves non-removed elements to the beginning and returns an iterator to the new "logical end".

Pitfall #4: Performance of `std::find_first_of`

std::string text = "Hello, World!";
std::string chars = "aeiou";

// ❌ Slow: O(n×m) — double loop
auto pos = std::find_first_of(text.begin(), text.end(), 
                             chars.begin(), chars.end());

// ✅ Fast: use specialized function
auto pos2 = text.find_first_of(chars); // optimized implementation

Explanation: std::find_first_of is implemented as a double loop, giving O(n×m) complexity. For strings, it's better to use specialized methods.

Pitfall #5: Wrong Choice of Sort Algorithm

struct Person {
    std::string name;
    int age;

    bool operator<(const Person& other) const {
        return age < other.age; // sort by age
    }
};

std::vector<Person> people = {
    {"Alice", 25}, {"Bob", 30}, {"Charlie", 25}
};

// ❌ sort doesn't guarantee stability!
std::sort(people.begin(), people.end());
// Alice and Charlie may swap positions!

// ✅ For stable sorting
std::stable_sort(people.begin(), people.end());
// Order of elements with equal keys is preserved

Pitfall #6: Function Object Issues

// ❌ Inefficient: std::function adds overhead
std::function<bool(int)> predicate = [](int x) { return x > 5; };
auto count = std::count_if(v.begin(), v.end(), predicate);

// ✅ Efficient: direct lambda usage
auto count2 = std::count_if(v.begin(), v.end(), [](int x) { return x > 5; });

// ✅ Even better: compiler can vectorize for simple lambdas

Reason: std::function uses type erasure, which adds function call overhead.

Part 4: Modern Idioms and Best Practices

Idiom #1: Transform-if Pattern

Standard doesn't provide "transform_if", but we can implement it elegantly:

// Task: transform even numbers to their squares
std::vector<int> v = {1, 2, 3, 4, 5, 6};
std::vector<int> result;

// ❌ Naive approach: two passes
std::vector<int> temp;
std::copy_if(v.begin(), v.end(), std::back_inserter(temp), 
             [](int x){ return x % 2 == 0; });
std::transform(temp.begin(), temp.end(), std::back_inserter(result),
               [](int x){ return x * x; });

// ✅ Elegant solution: one pass
std::for_each(v.begin(), v.end(), [&result](int x) {
    if (x % 2 == 0) {
        result.push_back(x * x);
    }
});

// ✅ C++20: ranges make it even simpler
auto transformed = v | std::views::filter([](int x){ return x % 2 == 0; })
                     | std::views::transform([](int x){ return x * x; });

Idiom #2: Accumulate with Custom Operation

// Find product of all elements
auto product = std::accumulate(v.begin(), v.end(), 1, std::multiplies<int>());

// String concatenation (but be careful with performance!)
std::vector<std::string> words = {"Hello", " ", "World", "!"};
auto sentence = std::accumulate(words.begin(), words.end(), std::string{});

// ✅ Better for strings: use reserve
auto sentence2 = std::accumulate(words.begin(), words.end(), std::string{},
    [](std::string acc, const std::string& word) {
        return acc + word;
    });

Idiom #3: Conditional Operations with Algorithms

// Apply operation only to some elements
std::vector<int> v = {1, 2, 3, 4, 5};

// Increment even numbers by 10
std::for_each(v.begin(), v.end(), [](int& x) {
    if (x % 2 == 0) x += 10;
});

// Alternative via transform with condition
std::transform(v.begin(), v.end(), v.begin(), [](int x) {
    return x % 2 == 0 ? x + 10 : x;
});

Idiom #4: Safe Element Access with Algorithms

// ✅ Safe search with check
auto it = std::find(v.begin(), v.end(), target);
if (it != v.end()) {
    // element found
    *it = new_value;
}

// ✅ Search with predicate
auto it2 = std::find_if(v.begin(), v.end(), [](int x){ return x > 10; });

// ✅ Safe search (C++20)
if (auto it = std::ranges::find(v, target); it != v.end()) {
    // use structured binding for readability
}

Part 5: Expert Techniques and Optimizations

Technique #1: Custom Allocators for Performance

// Example: pool allocator for frequent allocations
template<typename T>
class PoolAllocator {
    // ... pool allocator implementation
};

std::vector<int, PoolAllocator<int>> v;
// Algorithms work without changes!
std::sort(v.begin(), v.end());

Technique #2: Optimizing Predicates

// ❌ Inefficient predicate
auto count = std::count_if(strings.begin(), strings.end(), 
    [](const std::string& s) {
        return std::regex_match(s, std::regex("\\d+")); // compiled every time!
    });

// ✅ Optimized predicate
static const std::regex digit_regex("\\d+");
auto count2 = std::count_if(strings.begin(), strings.end(),
    [](const std::string& s) {
        return std::regex_match(s, digit_regex); // regex compiled once
    });

Technique #3: Parallel Algorithms (C++17)

#include <execution>

// Parallel sorting
std::sort(std::execution::par, v.begin(), v.end());

// Parallel search
auto it = std::find(std::execution::par, v.begin(), v.end(), target);

// Parallel transformation
std::transform(std::execution::par, v.begin(), v.end(), result.begin(),
    [](int x) { return expensive_computation(x); });

When to use parallel algorithms:

✅ Large data volumes (> 10,000 elements)
✅ CPU-intensive operations
✅ Multi-core system
❌ I/O-bound operations
❌ Operations with side effects

Technique #4: Custom Comparators and Optimization

struct Person {
    std::string name;
    int age;
    double salary;
};

// ❌ Inefficient: many string comparisons
auto bad_comp = [](const Person& a, const Person& b) {
    if (a.name != b.name) return a.name < b.name;
    if (a.age != b.age) return a.age < b.age;
    return a.salary < b.salary;
};

// ✅ Efficient: compare numbers first
auto good_comp = [](const Person& a, const Person& b) {
    if (a.age != b.age) return a.age < b.age;
    if (a.salary != b.salary) return a.salary < b.salary;
    return a.name < b.name; // strings last
};

std::sort(people.begin(), people.end(), good_comp);

Part 6: Ranges and the Future of STL (C++20+)

The Ranges Revolution

#include <ranges>

std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

// ✅ C++20: readable pipeline
auto result = v 
    | std::views::filter([](int x) { return x % 2 == 0; })  // even
    | std::views::transform([](int x) { return x * x; })    // square
    | std::views::take(3);                                  // first 3

// Lazy evaluation! Computations happen only on iteration
std::vector<int> output;
std::ranges::copy(result, std::back_inserter(output));
// output = {4, 16, 36}

Advantages of Ranges

Composability: easy to combine operations
Lazy evaluation: computations on demand
Type safety: better type checking
Readability: code reads like English text

Migration from Classic Algorithms

// Old style
std::sort(v.begin(), v.end());
auto it = std::find(v.begin(), v.end(), target);

// New style (C++20)
std::ranges::sort(v);
auto it = std::ranges::find(v, target);

Part 7: Real Examples and Use Cases

Use Case #1: Processing Log Files

// Task: find top 10 IP addresses by request count
struct LogEntry {
    std::string ip;
    std::string timestamp;
    std::string request;
};

std::vector<LogEntry> logs = read_log_file("access.log");

// Count requests by IP
std::map<std::string, int> ip_counts;
std::for_each(logs.begin(), logs.end(), [&ip_counts](const LogEntry& entry) {
    ip_counts[entry.ip]++;
});

// Convert to vector of pairs for sorting
std::vector<std::pair<std::string, int>> ip_vector(ip_counts.begin(), ip_counts.end());

// Sort by count (descending)
std::partial_sort(ip_vector.begin(), ip_vector.begin() + 10, ip_vector.end(),
    [](const auto& a, const auto& b) {
        return a.second > b.second; // descending count
    });

// Top 10 ready in first 10 elements

Why partial_sort? We only need top 10, so full sorting is unnecessary. partial_sort has O(n log k) complexity where k=10.

Use Case #2: Processing Sensor Data

// Task: smooth sensor data with sliding average
std::vector<double> sensor_data = read_sensor_data();
std::vector<double> smoothed;
const size_t window_size = 5;

// Sliding average using algorithms
for (size_t i = window_size; i <= sensor_data.size(); ++i) {
    auto begin = sensor_data.begin() + i - window_size;
    auto end = sensor_data.begin() + i;

    double average = std::accumulate(begin, end, 0.0) / window_size;
    smoothed.push_back(average);
}

// C++20 version with ranges
auto windowed = sensor_data 
    | std::views::slide(window_size)
    | std::views::transform([window_size](auto window) {
        return std::accumulate(window.begin(), window.end(), 0.0) / window_size;
    });

Use Case #3: Data Validation and Cleanup

struct User {
    std::string email;
    int age;
    std::string country;
};

std::vector<User> users = load_users();

// Multi-stage data cleanup
// 1. Remove users with invalid email
users.erase(
    std::remove_if(users.begin(), users.end(), [](const User& u) {
        return !is_valid_email(u.email);
    }),
    users.end()
);

// 2. Remove users with invalid age
users.erase(
    std::remove_if(users.begin(), users.end(), [](const User& u) {
        return u.age < 0 || u.age > 150;
    }),
    users.end()
);

// 3. Normalize countries to uppercase
std::for_each(users.begin(), users.end(), [](User& u) {
    std::transform(u.country.begin(), u.country.end(), u.country.begin(), ::toupper);
});

// 4. Sort by email for fast lookup
std::sort(users.begin(), users.end(), [](const User& a, const User& b) {
    return a.email < b.email;
});

Part 8: Performance Measurement and Profiling

Correct Time Measurement

#include <chrono>

template<typename Func>
auto measure_time(Func&& func) {
    auto start = std::chrono::high_resolution_clock::now();
    func();
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start);
}

// Usage
std::vector<int> v(1'000'000);
std::iota(v.begin(), v.end(), 1);

auto time1 = measure_time([&v]() {
    std::sort(v.begin(), v.end());
});

auto time2 = measure_time([&v]() {
    std::stable_sort(v.begin(), v.end());
});

std::cout << "sort: " << time1.count() << " μs\n";
std::cout << "stable_sort: " << time2.count() << " μs\n";

Microbenchmarks: What You Need to Know

// ❌ Incorrect benchmark
for (int i = 0; i < 1000; ++i) {
    std::vector<int> v(1000);
    auto start = std::chrono::high_resolution_clock::now();
    std::sort(v.begin(), v.end()); // sorting zeros!
    auto end = std::chrono::high_resolution_clock::now();
    // ... time measurement
}

// ✅ Correct benchmark
for (int i = 0; i < 1000; ++i) {
    std::vector<int> v(1000);
    std::iota(v.begin(), v.end(), 1);
    std::shuffle(v.begin(), v.end(), std::mt19937{42}); // random data

    auto start = std::chrono::high_resolution_clock::now();
    std::sort(v.begin(), v.end());
    auto end = std::chrono::high_resolution_clock::now();

    // Use result so compiler doesn't optimize away
    benchmark::DoNotOptimize(v.data());
}

Part 9: When NOT to Use STL Algorithms

Case #1: Trivial Operations on Small Data

std::array<int, 4> small_array = {1, 2, 3, 4};

// ❌ Overkill for 4 elements
auto sum = std::accumulate(small_array.begin(), small_array.end(), 0);

// ✅ Simpler and faster
int sum2 = small_array[0] + small_array[1] + small_array[2] + small_array[3];

Case #2: Specific Logic with Early Exits

// Task: find first pair of adjacent elements with sum > 100
std::vector<int> v = {...};

// ❌ Inefficient through STL
auto it = std::adjacent_find(v.begin(), v.end(), [](int a, int b) {
    return a + b > 100;
});

// ✅ Manual loop may be clearer and faster
for (auto it = v.begin(); it != v.end() - 1; ++it) {
    if (*it + *(it + 1) > 100) {
        // found, can exit immediately
        break;
    }
}

Case #3: SIMD Optimizations

// For performance-critical operations
// may require manual SIMD optimization
void vectorized_add(const float* a, const float* b, float* result, size_t count) {
    // AVX/SSE instructions for parallel addition
    // Can be orders of magnitude faster than std::transform
}

Part 10: Best Practices and Recommendations

Practice #1: Prefer Algorithms Over Manual Loops

// ❌ Manual search
bool found = false;
for (auto it = v.begin(); it != v.end(); ++it) {
    if (*it == target) {
        found = true;
        break;
    }
}

// ✅ Use algorithm
bool found2 = std::find(v.begin(), v.end(), target) != v.end();

// ✅ Even better (C++20)
bool found3 = std::ranges::contains(v, target);

Practice #2: Choose the Right Algorithm

// For searching in sorted data
std::vector<int> sorted_v = {1, 3, 5, 7, 9, 11};

// ❌ Slow: O(n)
auto it1 = std::find(sorted_v.begin(), sorted_v.end(), 7);

// ✅ Fast: O(log n)
bool found = std::binary_search(sorted_v.begin(), sorted_v.end(), 7);
auto it2 = std::lower_bound(sorted_v.begin(), sorted_v.end(), 7);

Practice #3: Be Careful with Function Objects

// ❌ Avoid std::function for hot paths
std::function<bool(int)> pred = [](int x) { return x > 0; };

// ✅ Use direct lambdas
auto count = std::count_if(v.begin(), v.end(), [](int x) { return x > 0; });

Practice #4: Use Const-Correctness

const std::vector<int> cv = {1, 2, 3, 4, 5};

// ✅ Correct: const-algorithms for const-data
auto count = std::count(cv.cbegin(), cv.cend(), 3);
auto it = std::find_if(cv.cbegin(), cv.cend(), [](int x) { return x > 3; });

Practice #5: Handle Edge Cases

std::vector<int> v; // may be empty!

// ❌ Dangerous
auto max_val = *std::max_element(v.begin(), v.end()); // UB if empty!

// ✅ Safe
if (!v.empty()) {
    auto max_val = *std::max_element(v.begin(), v.end());
}

// ✅ Even better
auto it = std::max_element(v.begin(), v.end());
if (it != v.end()) {
    auto max_val = *it;
}

Conclusion and Development Roadmap

Key Takeaways

STL algorithms are not just convenience—they're the foundation of modern C++
Understanding complexity is critical — wrong choice can kill performance
Pitfalls are real — iterator invalidation, erase-remove, predicate selection
Modern techniques — ranges, parallel algorithms, custom allocators
Measure and profile — assumptions are often wrong

Evolution of STL: What's Next?

C++23 and beyond:

More ranges views and algorithms
Better constexpr support
Optimized containers (flat_map, flat_set)
Improved parallel algorithms

Final Advice

"STL algorithms are the language of professional C++. Mastery of them separates professionals from novices. But remember: the tool must fit the task. Sometimes a simple loop beats a complex algorithm."

Practice, measure, and let your code be expressive and efficient!

Useful Resources:

cppreference.com — comprehensive reference
Algorithm library — complete algorithm list
C++ Core Guidelines — modern recommendations
Sean Parent's talks — deep talks on algorithms