A Weird Way to Substring in C++


/*

author:  mike bell 
twitter: @therealdarkmage
date:    Fri Aug 9 5:12 PM
website: https://mikebell.xyz

I was playing around and discovered a weird way to perform a substring on a string in C++.
Index into the string with [] at the position to substring from, then take the address of that character with &.
Works with string literals and std::string objects alike! Cool/weird!
Compiles on current macOS g++ as of this writing.
*/
#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::endl;
int main() {
    string s = &"Hello, World"[7];
    string ss = &s[2];
    cout << s << endl; // prints "World"
    cout << ss << endl; // prints "rld"
    return 0;
}

The second use, ss, isn't particularly weird, actually. std::string is basically a wrapper around a c-string (char array).

A few things to consider:

  • C-strings are a linear data structure, stored in adjacent memory (as opposed to a list.)

  • C-strings have to end with the null terminator \0, which marks the end of the string. If a c-string lacked this, there would be no way to know when to stop; the actual length of a c-string is not stored anywhere. (A std::string does store its length, so size() is constant time, but its buffer is still null-terminated.)

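  • Applying the [] operator to a pointer is just pointer arithmetic plus a dereference: p[i] means *(p + i).

  • & before an expression returns the memory address of whatever that expression refers to. In &s[2] the subscript binds tighter than &, so the expression is &(s[2]): std::string's operator[] returns a reference to the third character, and & takes the address of that character inside the string's internal c-string.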

The "magic" is all happening on string ss = &s[2];...

  1. s[2] is evaluated first. It calls std::string's operator[], which returns a reference to the third character in the c-string inside of s.

  2. & takes the address of that character. The result is a char pointer into the string's buffer, the same as s.c_str() + 2.

  3. The c-string is read starting at that pointer, and goes until it encounters the \0.

  4. Said c-string is passed to the constructor for std::string, and is used to create ss.

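My only caution with this method is that it relies on the string storing its characters contiguously with a null terminator at the end. std::string has guaranteed that since C++11, but if you use another string class, it might not behave the same. Nothing checks the index either, so &s[2] on a string shorter than two characters is undefined behavior.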

The safer way to do the same thing is to save the pointer to the c-string inside of s via s.c_str(), and then work with that pointer directly. And even then, you're still not as safe as if you just used std::string's own member functions, because if you get your pointer arithmetic wrong, you're going to have memory errors.

 

A "fun" side effect of the C++ (and C) subscript operator being just pointer arithmetic is that

const char* array[] = { "Well", "This", "Is", "Strange" };

// value contains "Strange"
const char* value = (1 + 1) [array + 1];

is actually valid C++ 😆.

Because x [y] is the same as *(x + y), that means that

const char* x = (1 + 1) [array + 1];

is just the same as

const char* x = *(1 + 1 + array + 1);
// Or
const char* x = *(array + 3); // array [3]

It's only worth noting that this works where the subscript operator is the built-in one doing pointer arithmetic (C-style arrays and pointers) and not where [] is an overloaded operator, so you can't do

std::string v = "Hello World!";
char second_letter = 1 [v];

So I guess that means the original example of

string s = &"Hello, World"[7];
string ss = &s[2];

can be rewritten as

std::string s = & (2 * 3) ["Hello, World" + 1]; // *(6 + "Hello, World" + 1), i.e. &"Hello, World"[7]: "World"

if you were so inclined 😆.
Sometimes I worry about C++ - it doesn't exactly help itself at times... 😄

Obligatory godbolt for this:
godbolt.org/z/nHJOgs

 

Sometimes I worry about C++ - it doesn't exactly help itself at times... 😄

Yes, but at least we get to play with esoteric hackery without needing a different language.

...and sometimes, ever so rarely, if the stars are aligned, that hackery comes in handy.

 

I knew about the pointer arithmetic aspect from college, but I had no idea you could also do it with string constants! Thank you for the explanation!
