Prachi Jha

Posted on Jun 10

Why Copy Something That's About to Die?

#cpp #beginners #software #learning

I know the title could be a metaphor for a billion things in programming, so bear with me for a minute as I build this up to rvalue references in C++.

How it started

I was writing a recursive palindrome checker and hit an Out Of Memory error while submitting my solution.

Someone (GPT) suggested changing

bool check(string s, int left, int right)

to:

bool check(const string& s, int left, int right)

to avoid copying the string on every recursive call.
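For context, here is a minimal sketch of what such a checker might look like with the suggested signature. The body of check and the isPalindrome wrapper are assumptions for illustration; the original solution is not shown in this post.

```cpp
#include <cassert>
#include <string>

// Recursive palindrome check. `s` is taken by const reference, so every
// recursive call reuses the caller's string instead of copying it.
bool check(const std::string& s, int left, int right) {
    if (left >= right) return true;          // zero or one character left
    if (s[left] != s[right]) return false;   // the two ends differ
    return check(s, left + 1, right - 1);    // shrink the window
}

// Hypothetical convenience wrapper (not part of the original code).
bool isPalindrome(const std::string& s) {
    return check(s, 0, static_cast<int>(s.size()) - 1);
}

void palindromeDemo() {
    assert(isPalindrome("racecar"));
    assert(!isPalindrome("prachi"));
}
```

With the by-value signature, a string of length n would be copied roughly n/2 times on the way down the recursion; with the reference, it is never copied.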

Now, while I understood that string& s created a reference to the variable that holds my string, and const ensured that we didn't alter the value later, I didn't really understand how that worked as a function argument.

Why References?

Typically, defining:

void foo(string s){...}

and then calling:

foo(name);

creates a copy.

name --> "Prachi"
s --> "Prachi"

These are two separate strings. However, with:

void foo(string& s){...}

calling:

foo(name);

creates a reference to name. Here, s becomes another name for name.

References don't create a new object at all, and are thus able to avoid copying.
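One way to see the difference is to compare addresses. This is a small sketch; the function names sameObjectByValue and sameObjectByRef are made up for the illustration.

```cpp
#include <cassert>
#include <string>

// By value: `s` is a separate copy of the caller's string.
bool sameObjectByValue(std::string s, const std::string* original) {
    return &s == original;
}

// By reference: `s` is just another name for the caller's string.
bool sameObjectByRef(std::string& s, const std::string* original) {
    return &s == original;
}

void aliasDemo() {
    std::string name = "Prachi";
    assert(!sameObjectByValue(name, &name));  // the copy lives somewhere else
    assert(sameObjectByRef(name, &name));     // the reference is `name` itself
}
```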

This however brings us to the question, what would happen if I were to write:

foo("Prachi");

Here, "Prachi" is not a std::string object at all; it is a string literal (an array of const char). So what does s refer to now?

As expected,

void foo(string& s){...}

would not work with:

foo("Prachi");

However, quite unexpectedly, if foo is defined as:

void foo(const string& s){...}

foo("Prachi") would totally work: the compiler builds a temporary std::string from the literal and binds s to it.

And the only difference is that of a const.
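A small sketch of the two cases. Here foo is given a return value (the string's length) purely so there is something to check; that detail is an assumption, not part of the original foo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for foo: reads the string without modifying it.
std::size_t foo(const std::string& s) { return s.size(); }

// With a non-const parameter, the literal call would be rejected:
//   void bar(std::string& s);
//   bar("Prachi");   // error: a non-const reference cannot bind to a temporary

void literalDemo() {
    std::string name = "Prachi";
    assert(foo(name) == 6);      // s refers to `name` itself
    assert(foo("Prachi") == 6);  // s refers to a temporary std::string built from the literal
}
```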

Which brings us to the obvious question:

How? Doesn't the Temporary Get Destroyed?

Normally, temporary objects don't stick around for very long.

For example:

string("Prachi");

creates a temporary std::string object that is destroyed at the end of the statement.

This means that if C++ simply allowed references to bind to temporaries, we could end up with a reference pointing to an object that no longer exists.

To avoid that, C++ makes a special exception for const T&.

Consider:

const string& s = string("Prachi");

Without any special rules, the execution might look like this:

Create temporary string("Prachi")
↓
Bind s to it
↓
Destroy temporary
↓
Use s

which would leave s referring to an object that no longer exists.

Instead, C++ extends the lifetime of the temporary so that it survives for as long as the reference does.

Conceptually, the compiler behaves more like:

Create temporary string("Prachi")
↓
Bind s to it
↓
Keep temporary alive
↓
Use s safely
↓
Destroy temporary when s goes out of scope

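This rule is known as temporary lifetime extension. (In a call like foo("Prachi"), no extension is even needed: a temporary created for a function argument already lives until the end of the full call expression, so s stays valid for the whole call.)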
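The effect can be observed with a small helper type that counts how many instances are currently alive. Tracer is a hypothetical type written for this sketch.

```cpp
#include <cassert>

// Hypothetical helper: bumps a counter while an instance is alive.
struct Tracer {
    int& alive;
    explicit Tracer(int& counter) : alive(counter) { ++alive; }
    Tracer(const Tracer&) = delete;   // no copies, so the count stays honest
    ~Tracer() { --alive; }
};

void lifetimeDemo() {
    int alive = 0;
    {
        const Tracer& t = Tracer(alive);  // temporary bound to a const reference
        (void)t;
        assert(alive == 1);               // still alive on the next statement
    }                                     // t goes out of scope here
    assert(alive == 0);                   // and only now is the temporary destroyed

    static_cast<void>(Tracer(alive));     // no reference bound to it
    assert(alive == 0);                   // already gone by the next statement
}
```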

Which brings us to another question:
Why not just do the same for every reference instead of just const T&?

Why Such Bias Against Non-const?

Imagine C++ allowed:

void scream(string& s)
{
    s += "!!!";
}

and then:

scream(string("Prachi"));

by extending a temporary's lifetime.

What would happen?

A temporary gets created, say temp.

temp = "Prachi"

It gets bound to s.

string& s = temp;

Then the function runs, edits s and thus temp, so now temp becomes:

temp = "Prachi!!!"

Then the function exits. The temporary is destroyed.

The weird thing is that the modification was technically valid. The string was successfully modified. But nobody would ever see the result, because the object would disappear immediately afterward.

Instead if we did:

string name = "Prachi";
scream(name);

Now, name is a concrete object. And thus the modification would survive.

C++ could have extended temporary lifetimes for all references. Instead, it chose to reserve non-const references for "real" objects whose state can meaningfully be modified and observed later. A temporary object is, by definition, about to disappear, making modification largely pointless.

But what if we did want to do something with the temporary?

Let's consider another example:

string makeName()
{
    string s = "Prachi";
    return s;
}

Here, when s is returned,

string name = makeName();

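all of its characters are, at least conceptually, copied to a new string. (Before C++11, that is exactly what happened whenever the compiler could not optimize the copy away.)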

Why do that when s is about to get destroyed anyway?

Move Instead of Copy

In the above example, when s is returned, we have two options.

Option 1: Copy

When s is copied to the return value, the entire buffer is duplicated.

For a small string, that's not a big deal. But imagine a vector containing millions of elements. Copying becomes expensive very quickly.

And here's the thing:

Immediately after the return statement executes, s is destroyed.

We spent time and memory duplicating data from an object that was about to disappear anyway.

Option 2: Move

Instead of duplicating the contents, we transfer ownership of the buffer.

No characters are copied.

No new buffer is allocated.

The return value simply takes ownership of the memory that s was already using.

When s is eventually destroyed, it has nothing left to clean up.

This is the core idea behind move semantics.

Instead of asking:

How do we copy this object efficiently?

C++ asks:

Does this object still need its resources?

If the answer is no, we can simply transfer ownership instead of copying!
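A rough sketch of what transferring ownership looks like with a standard container. It uses std::move, which this post has not introduced: it is the standard library's way of saying "this object no longer needs its resources". The pointer comparison relies on how std::vector's move constructor behaves in the mainstream implementations.

```cpp
#include <cassert>
#include <utility>
#include <vector>

void moveDemo() {
    std::vector<int> big(1000000, 7);         // one million elements
    const int* before = big.data();           // where the elements live right now

    std::vector<int> taken = std::move(big);  // hand over the buffer, copy nothing

    assert(taken.data() == before);           // same memory, new owner
    assert(taken.size() == 1000000);
    assert(big.empty());                      // the source no longer owns any elements
}
```

A copy would have allocated a second buffer and duplicated all one million elements; the move only hands over a pointer and some bookkeeping.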

Thus for the first time, having temporaries becomes an optimization opportunity. But for this, we would have to correctly identify which are the temporaries that are about to get destroyed.

Basically, how do we know if an object is safe to steal resources from?

Enter Rvalue References

Before this point, I only knew about one kind of reference:

string& ref = name;

which can only bind to an lvalue, such as a named object.

string name = "Prachi";
string& ref = name;      // Valid

string& ref2 = string("Prachi"); // Error: a non-const reference cannot bind to a temporary

C++11 introduced a second kind of reference:

string&& ref = string("Prachi");

Unlike ordinary references, rvalue references are specifically allowed to bind to temporary objects: objects that are about to be destroyed and whose resources can potentially be reused.

Roughly speaking, an lvalue is an expression that names an object with an identity that can be referred to later, while an rvalue is a temporary (or an object explicitly marked as safe to move from).

An rvalue reference is simply a reference that is allowed to bind to these temporary objects.
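The binding rules so far can be written down as compile-time checks. This sketch uses std::is_constructible from type_traits, which asks whether a variable of the first type can be initialized from an expression of the second type; a plain std::string as the second argument stands for a temporary, and std::string& stands for a named object.

```cpp
#include <string>
#include <type_traits>

// string& cannot bind to a temporary...
static_assert(!std::is_constructible<std::string&, std::string>::value,
              "non-const lvalue reference rejects temporaries");
// ...but it can bind to a named object.
static_assert(std::is_constructible<std::string&, std::string&>::value,
              "non-const lvalue reference accepts lvalues");
// const string& accepts both.
static_assert(std::is_constructible<const std::string&, std::string>::value,
              "const lvalue reference accepts temporaries");
static_assert(std::is_constructible<const std::string&, std::string&>::value,
              "const lvalue reference accepts lvalues");
// string&& accepts temporaries but rejects named objects.
static_assert(std::is_constructible<std::string&&, std::string>::value,
              "rvalue reference accepts temporaries");
static_assert(!std::is_constructible<std::string&&, std::string&>::value,
              "rvalue reference rejects lvalues");
```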

Once the language syntax made this distinction possible, move semantics came into the picture.

Move semantics is the idea that rather than creating a brand-new copy of an object's resources, we "move" those resources from one object to another.

The moved-from object remains valid, but no longer owns the resource it previously held.
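To make the two options concrete, here is a hand-written owning type with both a copy constructor and a move constructor. Buffer is a hypothetical class for this sketch, not how std::string is actually implemented, and assignment is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new char[n]()) {}

    // Copy: allocate a new block and duplicate every byte.
    Buffer(const Buffer& other) : size_(other.size_), data_(new char[other.size_]) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move: take over the existing block and leave the source empty.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    Buffer& operator=(const Buffer&) = delete;  // omitted in this sketch

    ~Buffer() { delete[] data_; }  // deleting a null pointer does nothing

    std::size_t size() const { return size_; }
    const char* data() const { return data_; }

private:
    std::size_t size_;
    char* data_;
};

void bufferDemo() {
    Buffer a(1024);
    const char* original = a.data();

    Buffer b(a);                   // copy: b gets its own block
    assert(b.data() != original);
    assert(b.size() == 1024);

    Buffer c(std::move(a));        // move: c takes a's block
    assert(c.data() == original);
    assert(a.data() == nullptr);   // a is still valid, it just owns nothing
    assert(a.size() == 0);
}
```

The move constructor takes Buffer&&, a non-const reference, which is what lets it reset the source. That is also why the destructor of the moved-from object has nothing left to clean up.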

Why I Thought Rvalue References Were Redundant

When I first encountered rvalue references, I was thinking exclusively about function arguments.

From that perspective, they seemed almost pointless.

After all, I had already learned that:

const string& s

avoids copies.

It can bind to temporaries.

It extends the temporary's lifetime.

So what exactly was left for:

string&& s

to do?

For nearly an hour, I kept looking at rvalue references through that lens and couldn't understand why the feature existed at all.

The breakthrough came when I stopped thinking about function arguments and started thinking about ownership.

The goal was never to create "another kind of reference."

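The goal was to give the language a way to recognize objects that were about to disappear and make use of that fact. A const string& can look at a temporary but cannot take anything from it; a string&& is non-const, so whoever receives it is free to steal its buffer.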

Instead of treating temporary objects as a nuisance, C++ turns them into an optimization opportunity.

Once I realized that, the entire design suddenly clicked.

Rvalue references weren't solving a parameter-passing problem.

They were solving an ownership problem.

And move semantics was the elegant consequence of that solution.
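A short sketch of how the two kinds of reference work together in practice. Names is a hypothetical class; the counters exist only to show which overload the compiler picks.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class Names {
public:
    // Chosen for lvalues: the caller keeps its string, so we have to copy.
    void add(const std::string& s) {
        items_.push_back(s);
        ++copies_;
    }

    // Chosen for rvalues: the argument is about to disappear, so we can steal from it.
    void add(std::string&& s) {
        items_.push_back(std::move(s));
        ++moves_;
    }

    int copies() const { return copies_; }
    int moves() const { return moves_; }

private:
    std::vector<std::string> items_;
    int copies_ = 0;
    int moves_ = 0;
};

void overloadDemo() {
    Names names;
    std::string name = "Prachi";

    names.add(name);                    // named object: const& overload, copy
    names.add(std::string("Prachi"));   // temporary: && overload, move
    names.add(std::move(name));         // explicitly given up: && overload, move

    assert(names.copies() == 1);
    assert(names.moves() == 2);
}
```

The const string& overload alone would have handled all three calls, but it would have copied every time. The string&& overload is what lets the class notice that two of those arguments were safe to steal from.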

Interestingly, while reading further, I learned that one of the key contributors behind rvalue references and move semantics in C++11 was Howard Hinnant.

So if by some miracle this article ever reaches him:

Thank you.

What started as an Out Of Memory error in a recursive palindrome checker somehow turned into one of my favorite language-design rabbit holes.
