C++ Standard Container Inheritance

#algorithms #computerscience #performance #cpp

Standard container inheritance is, in general, considered a design flaw and should be avoided, as is private inheritance. In this article, I’ll discuss what I consider a good usage of standard container inheritance and private inheritance using the k‑smallest problem.

K‑Smallest Problem

Suppose you want to keep the k‑smallest values from a stream. As you might know, one efficient way to achieve that is to use a priority_queue, with a time complexity of O(n*log k), where n is the number of values and k is the number of desired values. The following code generates 10 numbers, from 1 to 10, and outputs the 3 smallest numbers:

#include <iostream>
#include <queue>
#include <ranges>
#include <vector>

int main() {
  constexpr size_t k = 3;
  constexpr size_t n = 10;
  auto stream = std::views::iota(1u) | std::views::take(n);

  std::priority_queue<unsigned int> max_heap;
  for (auto value : stream) {
    max_heap.push(value);
    if (max_heap.size() > k) {
      max_heap.pop();
    }
  }

  std::vector<unsigned int> smallest_elements;
  smallest_elements.reserve(k);
  while (!max_heap.empty()) {
    smallest_elements.push_back(max_heap.top());
    max_heap.pop();
  }

  std::println(std::cout, "Top {} smallest numbers from the stream: {}",
    k, smallest_elements);
}

Top 3 smallest numbers from the stream: [3, 2, 1]

As you might have seen, converting a priority_queue to a vector is not trivial, since there is no way to iterate over its elements. Elements must be removed one by one, taking O(k*log k) time.

K‑Smallest Problem Extended

Let's extend this problem to something more complex, where we need the k‑smallest values sent to the system whenever a new value is received. For this example, let’s move the implementation into a class with two methods, push and to_vector, and make it generic by allowing the class to handle both the k‑smallest and the k‑largest problems. In the new main, for every value received, the value is pushed into our container, then converted to a vector, and the resulting vector is sent to the system.

template<typename T, typename Compare>
class k_elements_container {
public:
  using value_type = T;
  using value_compare = Compare;

  k_elements_container(size_t k) : k_(k) {}

  void push(value_type&& value) {
    heap_.push(value);
    if (heap_.size() > k_)
      heap_.pop();
  }

  std::vector<T> to_vector() {
    std::vector<unsigned int> result;
    result.reserve(k_);
    while (!heap_.empty()) {
      result.push_back(heap_.top());
      heap_.pop();
    }

    heap_.push_range(result);
    return result;
  }

private:
  std::priority_queue<T, std::vector<T>, Compare> heap_;
  size_t k_;
};

// Aliases
template<typename T>
using k_smallest_elements_container = k_elements_container<T, std::less<T>>;
template<typename T>
using k_largest_elements_container = k_elements_container<T, std::greater<T>>;


void send(std::vector<unsigned int>&& /*k_smallest*/) {
  // Send to the system
}

int main() {
  constexpr size_t k = 3;
  constexpr size_t n = 10;
  auto stream = std::views::iota(1u) | std::views::take(n);
  k_smallest_elements_container<unsigned int> heap(k);
  for (auto value : stream) {
    heap.push(std::move(value));
    send(heap.to_vector());
  }
}

With this new problem, to_vector is just cumbersome; we empty the priority_queue, then push the value back in. Lucky for us, with C++23, it’s possible to push all the values at once, which has a time complexity of O(k) instead of O(k*log k) with single element push, but the complexity of to_vector is still O(k*log k) because we must pop all the values from the priority_queue. The global time complexity is now O(n*k log k).

Inheriting from priority_queue

The priority_queue class takes a container type as a template argument, which defaults to vector. Wouldn’t it be nice to have access to it and iterate on it directly, taking O(k) time?

Accessing the underlying container can be done by inheriting from priority_queue, which allows access to two protected members:

Member objects

Member name	Description
Container C	the underlying container (protected member object)
Compare Comp	the comparison function object (protected member object)

Here’s the new code:

template<typename T, typename Compare>
class k_elements_container :
  public std::priority_queue<T, std::vector<T>, Compare> {
  using underlying_type = std::priority_queue<T, std::vector<T>, Compare>;
public:
  using value_type = typename underlying_type::value_type;
  using value_compare = typename underlying_type::value_compare;

  k_elements_container(size_t k) : k_(k) {
    this->c.reserve(k_+1);
  }

  void push(value_type&& value) {
    std::priority_queue<T, std::vector<T>, Compare>::push(std::move(value));
    if (this->size() > k_)
      this->pop();
  }

  std::vector<T> to_vector() {
    return this->c;
  }

private:
  size_t k_;
};

As previously mentioned, this implementation provides better runtime performance, with a global time complexity of O(n*k). As you might have noticed, this implementation also allows calling reserve on the underlying vector in the constructor, which disables reallocation since the maximum size is known at the time of creation.

Since this is a public inheritance, it's possible to call all priority_queue’s members, which is not desired. There is also a greater problem with this public inheritance; k_elements_container can be held by a priority_queue pointer, which will result in incorrect behavior for the push method, and the destruction will result in undefined behavior since push and the destructor are not declared virtual in priority_queue. For example, let’s see what happens with push:

k_smallest_elements_container<unsigned int> heap(3);
heap.push(1);
heap.push(2);
heap.push(3);
auto fun = [](std::priority_queue<unsigned int>& std_heap) {std_heap.push(4);};
fun(heap);
print_top_k(k, heap.to_vector());

Top 3 smallest numbers from the stream: [4, 3, 2, 1]

As you can see, the heap size constraint is no longer respected, and the resulting values are incorrect.

One way to solve this problem, and probably the best way, is to use private inheritance. This will improve the class usability while allowing direct access to the underlying container of priority_queue:

template<typename T, typename Compare>
class k_elements_container :
  private std::priority_queue<T, std::vector<T>, Compare> {
  // ...
};

With this small change, the previous code no longer compiles, as intended.

As you saw previously, inheriting from priority_queue also gives us access to the comparison operator. This allows for another performance improvement, although smaller. The current implementation always pushes the new value into the heap, even when it’s the largest. In this case, the heap is rebalanced twice: push, then pop, even though the value could have been discarded. For small k and large n, since we are using a uniform distribution, this will happen most of the time. By comparing the value to the top of the heap, it’s possible to discard the value preemptively and avoid paying the cost of pushing and popping the same value, which both have a time complexity of O(log k).

template<typename T, typename Compare>
class k_elements_container :
  private std::priority_queue<T, std::vector<T>, Compare> {
  using underlying_type = std::priority_queue<T, std::vector<T>, Compare>;
public:
  using value_type = typename underlying_type::value_type;
  using value_compare = typename underlying_type::value_compare;

  k_elements_container(size_t k) {
    this->c.reserve(k);
  }

  void push(value_type&& value) {
    if (this->size() < this->c.capacity()) {
      underlying_type::push(std::move(value));
    }
    else if (this->comp(value, this->top())) {
      this->pop();
      underlying_type::push(std::move(value));
    }
  }

  std::vector<T> to_vector() {
    return this->c;
  }
};

An attentive eye might have noticed that the underlying vector is now reserved with k rather than k+1. This smaller size is probably irrelevant, but it allows us to get rid of the member variable k_ since k_ is now always equal to the vector capacity.

Avoiding inheritance

Another option, which avoids inheritance, is to perform all operations using the heap operations, but to do so, you need to be aware of the priority_queue implementation.

template<typename T, typename Compare>
class k_elements_container {
public:
  using value_type = typename T;
  using value_compare = typename Compare;

  k_elements_container(size_t k) {
    c.reserve(k);
  }

  void push(value_type&& value) {
    if (c.size() < c.capacity()) {
      c.push_back(std::move(value));
      std::ranges::push_heap(c, comp);
    }
    else if (comp(value, c.front())) {
      std::ranges::pop_heap(c, comp);
      c.back() = std::move(value);
      std::ranges::push_heap(c, comp);
    }
  }

  std::vector<T> to_vector() {
    return this->c;
  }

private:
  std::vector<T> c;
  Compare comp;
};

As you can observe, you need to know that whenever you push a value, you need to push it at the end of the vector and then move this new node upward in the tree using push_heap and similarly for pop, where top of the heap is moved at the back of the vector.

Benchmark

The benchmark uses the Google Benchmark framework and was run on an i7-9850H @ 2.60GHz. The benchmark uses k=100 and n=1’000’000, with a uniform distribution over [0, 100’000’000].

template<typename Container>
void execute(benchmark::State& state) {
  for (auto _ : state) {
    // Initialization
    state.PauseTiming();
    constexpr size_t k = 100;
    constexpr size_t n = 1'000'000;

    Container heap(k);
    std::uniform_int_distribution dist(0u, static_cast<unsigned int>(n*100));
    auto const stream = create_stream(n, std::move(dist));
    state.ResumeTiming();

    // Benchmark execution
    for (auto value : stream) {
      heap.push(std::move(value));
      benchmark::DoNotOptimize(heap.to_vector());
    }
  }
}

constexpr size_t iterations = 100;
#define CUSTOM_BENCHMARK(Container) \
  BENCHMARK_TEMPLATE(execute, Container) \
    ->Unit(benchmark::kMillisecond) \
    ->Iterations(iterations); \

int main(int argc, char** argv) {
  using no_inheritance_impl = no_inheritance::k_smallest_elements_container<unsigned int>;
  using inheritance_impl = inheritance::k_smallest_elements_container<unsigned int>;
  using optimized_impl = optimized::k_smallest_elements_container<unsigned int>;
  using heap_operations_impl = heap_operations::k_smallest_elements_container<unsigned int>;

  CUSTOM_BENCHMARK(no_inheritance_impl);
  CUSTOM_BENCHMARK(inheritance_impl);
  CUSTOM_BENCHMARK(optimized_impl);
  CUSTOM_BENCHMARK(heap_operations_impl);

  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
  benchmark::Shutdown();
}

Run on (12 X 2592 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
-----------------------------------------------------------------------------
Benchmark                                         Time         CPU Iterations
-----------------------------------------------------------------------------
execute<no_inheritance_impl>/iterations:100    1499 ms     1493 ms        100
execute<inheritance_impl>/iterations:100       70.3 ms     69.7 ms        100
execute<optimized_impl>/iterations:100         60.4 ms     60.8 ms        100
execute<heap_operations_impl>/iterations:100   59.1 ms     59.2 ms        100

The results show that the implementation without inheritance is inefficient, the optimized version is about 20% more efficient than the non-optimized version, and the heap operations version is equivalent to the optimized version.

Conclusion

The goal of this article was to demonstrate a real-world use case for standard container inheritance and private inheritance. The k‑smallest problem was used to demonstrate how private inheritance from priority_queue can be used to efficiently extract the heap's contents, with a time complexity of O(k) rather than O(k*log k). The resulting implementation from private inheritance is also simpler, less error-prone, and easier to understand.

This strategy can also be applied to stack and queue, where the underlying container is a protected member accessible only through inheritance.