Anna Voronina

Posted on Oct 29

What we didn't get in C++

#cpp #programming

What's missing from C++—and how do developers fill the gaps? The author explores ideas and examples that show how the language could go even further. Let's explore them in this article!

We published and translated this article with the copyright holder's permission. The author is Nikolai Shalakin.

Missing features

I have over ten years of professional experience in C++ development. I entered the profession in 2013, when the C++ standardization committee began releasing updated language standards every three years. C++11 had already been released, introducing many exciting new features that significantly revamped the language. However, not everyone had the luxury of using all these new features in their code, so they had to stick with boring C++03, eyeing the new standard with envy.

At the same time, despite the variety of new features introduced to the language, I have noticed a recurring pattern from project to project: the use of "helper" files and "helper" containers that often implement the same things and fill in the gaps of STL. I'm not talking about highly specialized structures and algorithms—rather, about things one can't do without when developing software products in C++. I see how companies working on different projects come up with the same custom solutions because they're natural and in demand. But there's no supply, at least in STL.

In this article, I've gathered some of the most striking examples of what I've seen and used in development work. However, as I was collecting all the features missing from the C++ framework, I discovered that some of them were already covered by the new language standards, either completely or partially. So, this article is more of a reflection on and a critique of what was missing from the language for a long time but eventually made its way in. It also discusses what is still missing from the standard. The content doesn't make any grand claims; it simply offers a chance to chat about everyday C++.

DISCLAIMER: I may use the terms C++, STL, language, and language standard interchangeably (and may already have done so), as this is not important in the context of this article, which covers "all of the above."

What was missing for a long time

std::string::starts_with, std::string::ends_with

This is the phantom pain of every other C++ developer. We had been waiting so long for these things, and then they took forever to arrive. Give me a thumbs-up if you've seen something similar in the code base of a project you work on:

inline bool starts_with(const std::string &s1, const std::string &s2)
{
  return s2.size() <= s1.size() && s1.compare(0, s2.size(), s2) == 0;
}

These methods were introduced only in C++20, which is still not available to everyone. The lucky ones can finally check a string's prefix, though. And its suffix too:

std::string s("c++20");

bool res1 = s.starts_with("c++"); // true
bool res2 = s.starts_with("c#");  // false
bool res3 = s.ends_with("20");    // true
bool res4 = s.ends_with("27");    // false
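For code bases that are stuck on C++17, the helper shown earlier extends naturally to the suffix case. This is only a minimal sketch: the names startsWith and endsWith are made up for illustration, and std::string_view is used so that both std::string and string literals are accepted without copying.

```cpp
#include <string_view>

// Hypothetical pre-C++20 helpers; the names are illustrative, not from any library.
inline bool startsWith(std::string_view s, std::string_view prefix)
{
  // Compare the first prefix.size() characters of s with prefix.
  return prefix.size() <= s.size() &&
         s.compare(0, prefix.size(), prefix) == 0;
}

inline bool endsWith(std::string_view s, std::string_view suffix)
{
  // Compare the last suffix.size() characters of s with suffix.
  return suffix.size() <= s.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

With these definitions, an empty prefix or suffix matches any string, which is also how the C++20 member functions behave.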

std::optional

"This class has been around for a long time. Old man, go take your pills," you might say. And you would be somewhat right because std::optional has been with us since C++17, and everyone has grown fond of it. However, this is more of a personal issue for me. In my early years of work, I was involved in a project that strictly required adherence to the C++03 standard, and I had to use a custom optional created by my colleague.

Reading the code that implements this custom optional was an exciting process for me. I was still a junior developer at the time, and it really made an impression on me. Yes, it was quite simple and straightforward, but it brought me as much excitement as reading STL source code.

I'm glad that now I can boldly and without hesitation write something like this in almost any project:

std::optional<Result> getResult();

const auto res = getResult();
if (res) {
  std::cout << *res << std::endl;
} else {
  std::cout << "No result!" << std::endl;
}
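The snippet above leaves Result and getResult() undefined, so here is a small self-contained sketch of the same pattern. The findIndex function is a hypothetical example, not something from a real project; it returns std::nullopt when there is nothing to return.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical lookup: returns the position of needle, or std::nullopt if it is absent.
std::optional<std::size_t> findIndex(const std::vector<std::string>& items,
                                     const std::string& needle)
{
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (items[i] == needle) {
      return i; // implicitly wrapped into std::optional
    }
  }
  return std::nullopt;
}
```

The caller can test the result with if (res), or use res.value_or(fallback) when a default value is acceptable.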

std::expected

If you're familiar with Rust, you know that the Option<T> type has a close companion: Result<T, E>. They're closely related, and each has a bunch of methods that convert one into the other.

While Option<T> is obvious (it's an analog of optional<T> in C++), Result<T, E> requires some explanation. It's similar to optional<T>, but if there's no result, it holds an error of the E type instead. So, a value of the Result<T, E> type can be in one of two states:

The Ok state, when it stores a valid value of the T type.
The Err state, when it stores an error of the E type.

We can always ask an object which of the two states it's in and try to get a valid value from it, or we can ask what error it has.

Such a class may seem strange to a C++ developer, but it's quite important in Rust since the language doesn't have exceptions and handles errors only by returning error codes. In 99% of cases, this is accomplished by returning the result as the Result<T, E> object.

On the other hand, during my time working with C++, I've been involved only in projects where exceptions were banned for one reason or another. In this context, C++ becomes similar to Rust in terms of how it handles errors in a program.

This is why, once I saw Result<T, E> in Rust, I couldn't unsee it. I envied Rust for having it while C++ didn't. So yes, I wrote an analog of Result<T, E> for C++. The class had the questionable name, Maybe<T, E>, which could mislead Haskell programmers (in Haskell, Maybe is an analog of optional).
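To give an idea of what such a hand-written analog can look like, here is a minimal C++17 sketch built on std::variant. It is not the Maybe<T, E> class mentioned above: the Result name, the makeOk/makeError factories, and the safeDivide example are all assumptions made for illustration. The accessors return pointers so that nothing throws, which suits projects where exceptions are banned.

```cpp
#include <string>
#include <utility>
#include <variant>

// Illustrative Result-like type; index 0 holds the value, index 1 holds the error.
template <typename T, typename E>
class Result
{
public:
  static Result makeOk(T value)
  {
    return Result(Storage(std::in_place_index<0>, std::move(value)));
  }

  static Result makeError(E err)
  {
    return Result(Storage(std::in_place_index<1>, std::move(err)));
  }

  bool isOk() const { return storage.index() == 0; }

  // Each accessor returns nullptr when the object is in the other state.
  const T* value() const { return std::get_if<0>(&storage); }
  const E* error() const { return std::get_if<1>(&storage); }

private:
  using Storage = std::variant<T, E>;
  explicit Result(Storage s) : storage(std::move(s)) {}
  Storage storage;
};

// Hypothetical usage: integer division that reports failure instead of crashing.
Result<int, std::string> safeDivide(int a, int b)
{
  if (b == 0) {
    return Result<int, std::string>::makeError("division by zero");
  }
  return Result<int, std::string>::makeOk(a / b);
}
```

Using in_place_index rather than in_place_type keeps the sketch working even when T and E happen to be the same type.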

Then, just recently, I discovered that the C++ standardization committee approved the std::expected<T, E> class for C++23. MSVC even implemented it in VS 2022 17.3, where it's available when the /std:c++latest compiler option is enabled. Even the name turned out to be fitting: I think it's much better than Result or Maybe.

Now, let's take a look at how this class works using code that parses a human-readable chess address into coordinates that can be easily used by a chess engine. For example, a3 should become the coordinates [2; 0]:

struct ChessPosition
{
  int row; // stored as [0; 7], represents [1; 8]
  int col; // stored as [0; 7], represents [a; h]
};

enum class ParseError
{
  InvalidAddressLength,
  InvalidRow,
  InvalidColumn
};

auto parseChessPosition(std::string_view address) -> 
                    std::expected<ChessPosition, ParseError>
{
  if (address.size() != 2) {
    return std::unexpected(ParseError::InvalidAddressLength);
  }

  int col = address[0] - 'a';
  int row = address[1] - '1';

  if (col < 0 || col > 7) {
    return std::unexpected(ParseError::InvalidColumn);
  }

  if (row < 0 || row > 7) {
    return std::unexpected(ParseError::InvalidRow);
  }

  return ChessPosition{ row, col };
}

...

auto res1 = parseChessPosition("e2");  // [1; 4]
auto res2 = parseChessPosition("e4");  // [3; 4]
auto res3 = parseChessPosition("g9");  // InvalidRow
auto res4 = parseChessPosition("x3");  // InvalidColumn
auto res5 = parseChessPosition("e25"); // InvalidAddressLength

std::bit_cast

I've occasionally tripped over this. I don't know why, but every once in a while, I need to do strange things, like obtaining the bit representation of a floating-point number. Of course, back in my junior days, I wasn't afraid of UB and just used whatever worked, at least there and then. So, these are the unsafe ways we have of reinterpreting the bits of one type as another.

First, there's reinterpret_cast, of course. It's so easy and tempting to write code like this:

uint32_t i = *reinterpret_cast<uint32_t*>(&f);

without worrying about anything. This is UB, though.

Let's go back to our roots with the C-style cast. It's the same as reinterpret_cast, only easier to write:

uint32_t i = *(uint32_t*)&f;

After all, if the Quake III developers used it, then why can't we? But... that's UB.

The trick with union:

union {
  float f;
  uint32_t i;
} value32;

Declaring the union is not UB in itself. The problem is that in C++, reading a union member other than the one most recently written is UB, and that is exactly what the trick relies on.

However, I've seen all of these approaches with different twists:

An attempt to determine the sign of a float number by reading its most significant bit.
Converting a pointer to a number and back—hello, embedded. I've seen an unusual case where an address was converted into an ID.
Mathematical deviations with the exponent or mantissa of float.

"Who would need a mantissa?" you may ask. Here's my answer: my old GitHub project, where I created a simple IEEE 754 converter for fun. You can play around with the bit representation of 32-bit floating-point numbers. I made it a while ago for educational purposes. I also wanted to recreate the standard Windows 7 calculator design to see how it would turn out.

All in all, some people need this bit-level weirdness here and there.

The question is, how can we do it safely? When I turned to Stack Overflow for answers, I received a clear but harsh one: "Use memcpy." I also took a small snippet from there to make using memcpy more convenient:

template <class OUT, class IN>
inline OUT bit_cast(IN const& in)
{
  static_assert(sizeof(OUT) == sizeof(IN), 
                "source and dest must be same size");
  static_assert(std::is_trivially_copyable<OUT>::value,
                "destination type must be trivially copyable.");
  static_assert(std::is_trivially_copyable<IN>::value,
                "source type must be trivially copyable");

  OUT out;
  memcpy(&out, &in, sizeof(out));
  return out;
}
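As a usage sketch for the memcpy approach, this is roughly how the sign, exponent, and mantissa of a float can be read without UB. The floatBits and decompose names and the FloatParts struct are hypothetical, and the code assumes that float is a 32-bit IEEE 754 type, which is common but not guaranteed by the standard.

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation of a float into a 32-bit integer.
// Assumption: float is a 32-bit IEEE 754 single-precision type.
std::uint32_t floatBits(float f)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "float must be 32 bits wide");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

struct FloatParts
{
  bool negative;          // most significant bit
  std::uint32_t exponent; // 8 bits, still biased by 127
  std::uint32_t mantissa; // lower 23 bits, without the implicit leading 1
};

FloatParts decompose(float f)
{
  const std::uint32_t bits = floatBits(f);
  return { (bits >> 31) != 0, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For example, -1.5f decomposes into a set sign bit, a biased exponent of 127, and a mantissa of 0x400000.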

C++20 introduced std::bit_cast, which performs the same task as the memcpy helper and is also constexpr. This was possible thanks to compiler support that the standard now requires.

Now, we can experience this beauty and ensure that it's not only beautiful but also correct in terms of language specifications:

#include <bit>
#include <cstdint>

float q_rsqrt(float number)
{
  // std::bit_cast requires both types to have the same size,
  // and long is 64 bits wide on many platforms, so use a 32-bit type
  std::int32_t i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y = number;
  i = std::bit_cast<std::int32_t>(y);  // evil floating point bit level hacking
  i = 0x5f3759df - (i >> 1);           // what the fuck?
  y = std::bit_cast<float>(i);
  y = y * (threehalfs - (x2 * y * y));    // 1st iteration
  //y = y * (threehalfs - (x2 * y * y));  // 2nd iteration, this can be removed

  return y;
}

No thanks needed, id Software.

What is missing and may never come to be

Floating-point arithmetic

As we all know, you can't simply compare two floating-point numbers to see if they're equal. Even though they seem perfectly equal to you, 1.0 and 0.999999999 are not equal to each other. No standard methods exist in the language to adequately resolve this issue—one must manually compare the absolute difference between the numbers and epsilon.

Another useful feature is the option to round a number to a certain number of decimal places. We have floor, ceil, and round at our disposal, but none of them are what we need; they all round to the nearest integer. So, we need to go to Stack Overflow to find some ready-made solutions.

As a result, the code base ends up full of helpers like these:

template<class T>
bool almostEqual(T x, T y)
{
  return std::abs(x - y) < std::numeric_limits<T>::epsilon();
}

template<class T>
bool nearToZero(T x)
{
  return std::abs(x) < std::numeric_limits<T>::epsilon();
}

template<class T>
T roundTo(T x, uint8_t digitsAfterPoint)
{
  // keep the scale factor in T: a uint32_t would overflow past nine digits
  const T delim = std::pow(T(10), digitsAfterPoint);
  return std::round(x * delim) / delim;
}

What else can I say? It's not really a big deal, but it's sad.
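One caveat about the helpers above: std::numeric_limits<T>::epsilon() is the gap between 1.0 and the next representable value, so using it as a fixed threshold only behaves well for numbers close to 1. A common alternative is to scale the tolerance by the magnitude of the operands. The sketch below is one possible way to do that; the function name and the default tolerances are assumptions that a real project would tune for its own data.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Relative comparison with a small absolute floor for values near zero.
// The default tolerances are illustrative, not universally appropriate.
template <class T>
bool almostEqualRelative(T x, T y,
                         T relTol = 4 * std::numeric_limits<T>::epsilon(),
                         T absTol = std::numeric_limits<T>::min())
{
  const T diff = std::abs(x - y);
  if (diff <= absTol) {
    return true; // both values are practically the same tiny number
  }
  // Scale the allowed difference by the larger magnitude.
  return diff <= relTol * std::max(std::abs(x), std::abs(y));
}
```

With this version, 0.1 + 0.2 compares equal to 0.3, and two adjacent doubles around 1e10 compare equal as well, even though they differ by far more than the machine epsilon.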

EnumArray

Let's imagine we have the following enumeration:

enum class Unit
{
  Grams,
  Meters,
  Liters,
  Items
};

It's quite common to need a dictionary with an enum key to store configuration or information about each element of the enumeration. This situation is common in my work. The first straightforward solution can be easily implemented using common STL tools:

std::unordered_map<Unit, const char*> unitNames {
  { Unit::Grams, "g" },
  { Unit::Meters, "m" },
  { Unit::Liters, "l" },
  { Unit::Items, "pcs" },
};

This is what we can notice in this piece of code:

std::unordered_map isn't the most generic container. It's also not the best in terms of memory performance.
Such configuration dictionaries can appear very often in a project. Most of them are small because an enumeration usually has just a few elements, rarely more than a few dozen. Using a hash table with std::unordered_map or a tree with std::map seems excessive.
An enumeration is essentially a number. It's tempting to think of it as a numerical index.

The latter may quickly lead us to the idea of creating a container that appears to be a dictionary in terms of its interface but is actually based on std::array under the hood. The indexes of an array are the elements of the enumeration, and the array data are the map values.

All we need to do is figure out how to tell the array what its length should be. In other words, how to count the number of elements in the enumeration. The simplest old-fashioned way is to add the Count service element to the end of the enum. Let's focus on this method. It isn't particularly exotic; I often see it in code bases, so it's fine to use:

enum class Unit
{
  Grams,
  Meters,
  Liters,
  Items,

  Count
};

Further implementation of the proxy container is quite simple:

template<typename Enum, typename T>
class EnumArray
{
public:
  EnumArray(std::initializer_list<std::pair<Enum, T>> values);

  T& operator[](Enum key);
  const T& operator[](Enum key) const;

private:
  static constexpr size_t N = std::to_underlying(Enum::Count);
  std::array<T, N> data;
};

We need the constructor with std::initializer_list so that we can build our configuration the same way we built std::unordered_map back in the day:

EnumArray<Unit, const char*> unitNames {
  { Unit::Grams, "g" },
  { Unit::Meters, "m" },
  { Unit::Liters, "l" },
  { Unit::Items, "pcs" },
};
std::cout << unitNames[Unit::Items] << std::endl; // outputs "pcs"

Beautiful!
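The class declaration above leaves the member functions undefined, so here is one way they could be filled in. Treat it as a sketch under assumptions rather than the exact container from my projects: it targets C++17 (hence static_cast instead of std::to_underlying), takes the initializer list by value, and does no range checking on the key.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <utility>

enum class Unit
{
  Grams,
  Meters,
  Liters,
  Items,

  Count
};

template <typename Enum, typename T>
class EnumArray
{
public:
  EnumArray(std::initializer_list<std::pair<Enum, T>> values)
  {
    // Keys that are not listed keep their value-initialized T.
    for (const auto& [key, value] : values) {
      data[index(key)] = value;
    }
  }

  // No bounds check: passing Enum::Count or an out-of-range value is UB.
  T& operator[](Enum key) { return data[index(key)]; }
  const T& operator[](Enum key) const { return data[index(key)]; }

  static constexpr std::size_t size() { return N; }

private:
  static constexpr std::size_t index(Enum key)
  {
    return static_cast<std::size_t>(key);
  }

  static constexpr std::size_t N = static_cast<std::size_t>(Enum::Count);
  std::array<T, N> data{};
};
```

Because data is value-initialized, an EnumArray<Unit, const char*> that is only partially filled holds nullptr for the missing keys.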

This is what this beauty is about:

We leverage all the benefits of both std::array and std::unordered_map. The convenience of the dictionary interface plus the efficiency and simplicity (in a good way) of the array under the hood.
Unlike std::unordered_map and std::map, it's cache-friendly because the data is stored sequentially in memory.
The array size is known at compile time, and if we refine the container, almost all of its methods can be easily made constexpr.

Here are the limitations of this approach:

The mandatory Count in the enumeration.
The enumeration can't have custom values. It must keep the default numbering that starts from zero, so an enumeration like this won't work:

enum class Type
{
  A = 4,
  B = 12,
  C = 518,
  D
};

The memory is allocated in the array for all elements of the enumeration. If we don't fill the EnumArray with all values, the rest will contain default-constructed objects.
By the way, this is another restriction: the T type must be default-constructible.

I usually don't mind such restrictions, so I use this container without issue.

Early return

Let's look at a typical function with some precondition checks:

std::string applySpell(Spell* spell)
{
  if (!spell)
  {
    return "No spell";
  }

  if (!spell->isValid())
  {
    return "Invalid spell";
  }

  if (this->isImmuneToSpell(spell))
  {
    return "Immune to spell";
  }

  if (this->appliedSpells.contains(spell))
  {
    return "Spell already applied";
  }

  appliedSpells.append(spell);
  applyEffects(spell->getEffects());
  return "Spell applied";
}

It's nothing special, right? The sad three lines at the bottom are where the method actually works. The rest is checking whether it can perform the task. That's a bit annoying. This is especially true if you're a fan of the Allman style and each of your curly brackets knows how to set personal boundaries.

It'd be great to have a more streamlined approach that doesn't rely on boilerplate. C++ has assert, for example, which is similar to what we're doing here: it checks a certain condition and, if necessary, takes action under the hood. However, it's easier for assert—it doesn't need to return anything. Still, we could create something similar:

#define early_return(cond, ret)      \
  do {                             \
    if (static_cast<bool>(cond)) \
    {                            \
      return ret;              \
    }                            \
  } while (0)

#define early_return_void(cond)      \
  do {                             \
    if (static_cast<bool>(cond)) \
    {                            \
      return;                  \
    }                            \
  } while (0)

FFFUUU, macros! Bjarne Stroustrup dislikes macros. If he sends me a private message asking for an apology, I'll understand and apologize. I don't like C++ macros either.

But yes, the code contains macros, even two of them. In fact, we can reduce them to one if we use a variadic macro:

#define early_return(cond, ...)      \
  do {                             \
    if (static_cast<bool>(cond)) \
    {                            \
      return __VA_ARGS__;      \
    }                            \
  } while (0)

There's only one macro left, but it's still a macro. No, a miracle is unlikely to happen—it can't be converted into a non-macro. As soon as we drag it into a function, we lose the ability to influence the control flow of our current function. It's sad, but true. Check out how we can rewrite our example, though:

std::string applySpell(Spell* spell)
{
  early_return(!spell, "No spell");
  early_return(!spell->isValid(), "Invalid spell");
  early_return(this->isImmuneToSpell(spell), "Immune to spell");
  early_return(this->appliedSpells.contains(spell), "Spell already applied");

  appliedSpells.append(spell);
  applyEffects(spell->getEffects());
  return "Spell applied";
}

This also works if the function returns void:

void applySpell(Spell* spell)
{
  early_return(!spell);
  early_return(!spell->isValid());
  early_return(this->isImmuneToSpell(spell));
  early_return(this->appliedSpells.contains(spell));

  appliedSpells.append(spell);
  applyEffects(spell->getEffects());
}

We shortened it, and I think it's better that way. If the standard supported this feature, it could be a full-fledged language construct rather than a macro. Although, just for fun, I'll note that assert in C++ is also a macro :)
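Since the spell example depends on classes that aren't shown, here is a self-contained sketch of the variadic macro in action. The classify and divmod functions are hypothetical. The second one illustrates a side effect of the variadic form: the returned expression may contain top-level commas, because __VA_ARGS__ glues the pieces back together. The condition itself still must not contain unparenthesized commas.

```cpp
#include <string>
#include <utility>

#define early_return(cond, ...)      \
  do {                               \
    if (static_cast<bool>(cond)) {   \
      return __VA_ARGS__;            \
    }                                \
  } while (0)

// Hypothetical example: every check takes one line.
std::string classify(int n)
{
  early_return(n < 0, "negative");
  early_return(n == 0, "zero");
  return "positive";
}

// Hypothetical example: the return expression contains commas.
std::pair<int, int> divmod(int a, int b)
{
  early_return(b == 0, std::pair<int, int>{0, 0});
  return {a / b, a % b};
}
```

One portability note: invoking the macro with only the condition, as in the void case, leaves the variadic part empty. The standard allows that only since C++20; earlier modes rely on a widely supported compiler extension and may produce a warning in pedantic builds.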

If you strictly adhere to the assert behavior and believe that conditions should work exactly like assert—that is, asserting the expected and triggering otherwise—then we can easily accommodate that. All we need to do is reverse the logic and rename the macro based on its new behavior:

#define ensure_or_return(cond, ...)   \
  do {                              \
    if (!static_cast<bool>(cond)) \
    {                             \
      return __VA_ARGS__;       \
    }                             \
  } while (0)

void applySpell(Spell* spell)
{
  ensure_or_return(spell);
  ensure_or_return(spell->isValid());
  ensure_or_return(!this->isImmuneToSpell(spell));
  ensure_or_return(!this->appliedSpells.contains(spell));

  appliedSpells.append(spell);
  applyEffects(spell->getEffects());
}

The name could be better, but you get the idea. I'd be happy to see any of these constructs in C++.

Unordered erase

I believe the most frequently used collection in C++ is vector. We know that a vector is great for everything except inserting or deleting items from random parts of the collection. This takes O(n) time, which is why I always feel a bit sad deleting something from the middle of a vector—since it has to shuffle about half of its contents just to shift slightly to the left.

There's an idiomatic trick that can turn O(n) into O(1), but it comes at the cost of losing the element order in the vector. So, if you're ready to pay the price, this simple trick is definitely the way to go:

std::vector<int> v {
  17, -2, 1084, 1, 17, 40, -11
};

// we delete 1 from the vector
std::swap(v[3], v.back()); 
v.pop_back();

// we get [17, -2, 1084, -11, 17, 40]

What did we do? First, we swapped the element marked for deletion with the last element of the vector, and then simply discarded the last element. Both operations are super cheap. It's simple and beautiful.

Why the vector interface doesn't have such a simple alternative to the usual erase method is unclear. In Rust, for example, it exists as Vec::swap_remove.

Well, we'll have to create our own helper function for the code base:

template<typename T>
void unorderedErase(std::vector<T>& v, std::size_t index)
{
  // precondition: index < v.size()
  std::swap(v[index], v.back());
  v.pop_back();
}
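A slightly more defensive variant is sketched below. It is just one possible refinement, with the hypothetical name unorderedEraseAt: it checks the index in debug builds and move-assigns the last element over the doomed one instead of swapping, since the removed value is thrown away anyway.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
void unorderedEraseAt(std::vector<T>& v, std::size_t index)
{
  assert(index < v.size() && "index out of range");

  // Skip the move when the doomed element is already the last one,
  // which also avoids a self-move-assignment.
  if (index + 1 != v.size()) {
    v[index] = std::move(v.back());
  }
  v.pop_back();
}
```

For trivially copyable types such as int, the difference from the swap version is negligible. It matters more for types that are expensive to move.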

Let's sum up

I had to rework and discard half of the article while still writing it, because the modern C++20 and C++23 standards covered many of the items on the wish list described in this complaint book. Even so, the list of things that language users want will never end: there are as many requests as there are people, and you can't fit them all into the standard library or the language itself.

I tried to mention only the points that I thought were the least subjective and the most worthy of inclusion in the language standard. In my work, at least, they are needed almost every day. You may well have a different opinion about my list, and I'd be happy to read in the comments about your pain points and the features you feel are missing. This would give me an idea of what users want for the future of C++.

Top comments (1)

Pierre Gradot • Nov 3

Can early_return() / ensure_or_return() macros be replaced with contract_assert() from C++26?