Nathan Fenner

Posted on Feb 25, 2017

A Slightly Better C++

C++ is a huge language, with a lot to love, and a lot (more) to dislike. The following are a few ramblings on things that could be done to take C++ and make (what I would consider) a slightly better language. The main goals can be summarized as follows:

- code should have identical, or mechanically translatable, semantics to the equivalent C++
- code should be very familiar to existing C/C++/ALGOL-family developers
- code should have similar performance to the equivalent C++
  - I'm willing to accept some performance decline if it simplifies semantics considerably (and there's an example of this below).
- code should be fully compatible with C++ (as library and as ABI). Ideally, the computer shouldn't be able to tell the difference between them.
- it should be possible to mechanically translate existing C++ (provided that it follows reasonable coding standards) to this new language.

It's also worth adding non-goals:

- become Rust and/or Haskell, or otherwise focus on safety extensively
  - small changes that make things incrementally better are nice, but if I want even a minuscule chance of improving real-world C++, it absolutely cannot require these kinds of complex static checks.

So with these goals set down, what's the language I propose? This whole article is a bit of a mess, but it includes many of the little ideas I have to make C++ better to use. Many of the things are based on what I've seen TAing EECS 281 at the University of Michigan and hacking on C++ on my own, but I hope that they resonate with you regardless of how you use C++.

First, a few syntax-level things. This isn't really core to most of the ideas here, but C++ desperately needs a makeover in a few key areas.

// C++
#include <vector>
#include <iostream>
std::vector<int> global_vector(6, -1);
int sum() {
    int sum = 0;
    for (int x : global_vector)
        sum += x;
    return sum;
}
void print_sum() {
    std::cout << "the sum of the global vector is " << sum() << std::endl;
}

// SBC++ (slightly better C++)
module main;
#include <vector>
#include <iostream>
var global_vector : std::vector<int> = std::vector<int>::make(6, -1);
func sum() -> int {
    var sum : int = 0;
    for var x : int in global_vector {
        sum += x;
    }
    return sum;
}
func print_sum() {
    std::cout << "the sum of the global vector is " << sum() << std::endl;
}

The most-prominent difference is perhaps the addition of the var keyword, and moving the types to the right-hand-side of a :. You might not care for this change, and that's fine, but I generally find that being able to pick out where variables are being defined is more helpful than the small amount of space saved by omitting a variable-declaring keyword.

(Also, we can get rid of auto now: just don't specify the type of the variable. Still not doing full type inference, though.)

The next obvious change is switching to using func instead of declaring functions with a preceding return type. My reasoning for this is pretty much the same as var: I'd rather be able to pick out the fact that a function is being defined, and its name, than its return type in the majority of cases.
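For comparison, standard C++ (since C++11) already accepts a trailing return type, which gets part of the way toward the func form. The sketch below is ordinary C++, not SBC++; the names sum_of and check_sum_of are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Standard C++11 trailing-return-type syntax: the function name comes first
// and the return type last, much like `func sum() -> int`.
auto sum_of(const std::vector<int>& values) -> int {
    int total = 0;
    for (int x : values) {
        total += x;
    }
    return total;
}

// Small self-check mirroring the global_vector example (six copies of -1).
void check_sum_of() {
    const std::vector<int> values(6, -1);
    assert(sum_of(values) == -6);
}
```

The auto placeholder is still required in front, so this is only a partial substitute for a dedicated keyword.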

There's also the module statement at the top of the file. Perhaps unsurprisingly, the includes aren't textual either. These changes are probably far less contentious than the previous ones (although the details of the implementation and behavior are certainly up for debate!).

The enhanced for loop needs a keyword (in, here) to replace the :, which is now used for type annotations.

Lastly, braces are mandatory on for-loops, while-loops, and if and else blocks (although else if is allowed without a brace in between). Since braces are mandatory, parentheses go away.

Changes to Assignment, Construction, and Destruction

C++'s model of constructing and destructing objects works well for giving fine-grained control over resources while being very efficient. However, there are a few (mis)features of the system that frequently get in the way, and, frankly, aren't necessary.

So, first, constructing objects. It's hard to get anywhere if we don't have anything to work with first.

Constructing an Object

Constructors are a little bit magical in C++. I'd like to tone that down in SBC++. The biggest difference is that in SBC++, constructors have names. Here's an example:

// SBC++
class Deck {
    var cards : std::vector<Card>;
public:
    new func empty() {
        cards = std::vector<Card>::empty();
    }
    new func full() {
        cards = std::vector<Card>::empty();
        for var suit in suits {
            for var rank in ranks {
                cards.push_back(Card::make(suit, rank));
            }
        }
        shuffle(mut cards);
    }
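    func draw() mut -> Card {
        // std::vector::pop_back returns nothing, so read the last card first
        var top : Card = cards.back();
        cards.pop_back();
        return top;
    }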
}

A new func is what C++ calls a constructor. In a new func, all member variables start uninitialized, and it's the job of the new func to initialize them. In SBC++, constructors aren't quite as special, so we don't need to use initializer lists at all.

The above code defines two new funcs for the Deck class. We have a way of constructing empty decks, as well as full ones.

Note that SBC++ doesn't have a concept of a "default" constructor. If a type wants to support the creation of default instances, then it can do so explicitly. In particular, SBC++ doesn't need default constructors in order to be able to put a bunch of items into a collection! In a similar way, we'll see how SBC++ doesn't need copy constructors as much as C++ does (although it can use them too if it wants).
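Today's C++ can approximate named constructors with the "named constructor" idiom: a private constructor plus static factory functions. The following is a minimal sketch under that assumption, not the SBC++ semantics themselves; Card is reduced to two ints, shuffling is left out, and size() and check_deck are hypothetical additions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

struct Card {
    int suit;
    int rank;
};

class Deck {
    std::vector<Card> cards_;

    // The only real constructor is private, so callers must go through
    // one of the named factory functions below.
    explicit Deck(std::vector<Card> cards) : cards_(std::move(cards)) {}

public:
    // Plays the role of `new func empty()`.
    static Deck empty() { return Deck(std::vector<Card>{}); }

    // Plays the role of `new func full()` (shuffling omitted for brevity).
    static Deck full() {
        std::vector<Card> cards;
        for (int suit = 0; suit < 4; ++suit) {
            for (int rank = 1; rank <= 13; ++rank) {
                cards.push_back(Card{suit, rank});
            }
        }
        return Deck(std::move(cards));
    }

    std::size_t size() const { return cards_.size(); }
};

// There is no default constructor: `Deck d;` does not compile.
static_assert(!std::is_default_constructible<Deck>::value,
              "Deck can only be built through a named factory");

void check_deck() {
    assert(Deck::empty().size() == 0);
    assert(Deck::full().size() == 52);
}
```

Unlike a new func, the factory still has to build the members through an initializer list, so this only mimics the call-site syntax.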

This might not sound so good to you! In particular, I am deliberately sacrificing the convenience of being able to create an empty collection etc. simply by declaring the appropriate value. In other words,

// C++
std::vector<int> empty_vector;

corresponds not to

// SBC++ (doesn't create empty vector)
var empty_vector : std::vector<int>;

but instead to

// SBC++
var empty_vector : std::vector<int> = std::vector<int>::empty();

What is lost in brevity is hopefully made up for in clarity. In particular, if you don't want to initialize it to anything, the compiler can much more easily tell you that.

Assignment

Assigning things in C++ is a sordid business. Conceptually, assignment can be divided into copy assignment and move assignment. Copying a value makes the target into a copy of the value at the right, and move assignment "moves" the value from one place to another.

SBC++ takes another approach. First, we have to restrict what values may assume about their own location in memory. In particular, if no explicit references to an object have been created externally, the compiler is free to move it around in memory arbitrarily (note: obviously some values have to be pinned, so specialized types may opt out of this behavior; but then they cannot be assigned at all). This means that objects can't hold pointers to themselves or their own fields, because those pointers would become invalid if the object were to move.

Here's how slightly-better C++ does assignment:

1. invoke the destructor on the LHS
2. memcpy the RHS to where the LHS is
3. mark the RHS as invalid (if it was a temporary, it can just be forgotten; if it's a variable, it needs a "drop flag" to indicate that it was moved from and invalid, unless it is "trivially copyable" because it owns no resources).

This simplifies the model considerably. We do lose a bit in the way of optimization: a C++ implementation could reuse buffers or other owned memory in the LHS for the assignment. But I suspect that most applications (except for the very highest-performance ones) want this behavior more often than not. If buffer reuse is desired, the same effect can still be achieved, just not through assignment.
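The three steps can be imitated in current C++ by writing a move assignment operator by hand. This is a rough sketch assuming a hypothetical owning type Buffer; copying the two fields stands in for the memcpy, and a null pointer stands in for the "invalid" marker on the RHS.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer, used only to illustrate the three steps.
class Buffer {
    int* data_ = nullptr;
    std::size_t size_ = 0;

public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;         // 1. destroy what the LHS owned
            data_ = other.data_;    // 2. take over the RHS representation as-is
            size_ = other.size_;
            other.data_ = nullptr;  // 3. mark the RHS as holding nothing
            other.size_ = 0;
        }
        return *this;
    }

    std::size_t size() const { return size_; }
    bool valid() const { return data_ != nullptr; }
};

void check_assignment() {
    Buffer lhs(3);
    Buffer rhs(5);
    lhs = std::move(rhs);
    assert(lhs.size() == 5);
    assert(!rhs.valid());
}
```

The difference in SBC++ would be that every type gets this behavior for free, instead of each class spelling it out.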

Destruction

Destruction is pretty much the same as in C++. The only difference is that each variable now needs a "drop flag" to indicate whether it should be dropped. Essentially, SBC++ knows the difference between a variable being uninitialized and actually having no value (in C++-land, being moved-out-of). In practice, most variables won't actually need drop flags since it will be clear (statically) when and if their destructors need to run.
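A library-level analogue of a drop flag already exists in C++17: std::optional stores a value next to a "has value" flag and runs the destructor only when the flag is set. The sketch below assumes C++17; Resource and count_destructions are hypothetical names used to make the destructor calls observable.

```cpp
#include <cassert>
#include <optional>

// Hypothetical resource type that counts how often its destructor runs.
struct Resource {
    static inline int destroyed = 0;
    ~Resource() { ++destroyed; }
};

// Returns how many destructors ran for a slot that was (or was not) initialized.
int count_destructions(bool initialize) {
    Resource::destroyed = 0;
    {
        std::optional<Resource> slot;  // flag starts cleared: nothing to drop
        if (initialize) {
            slot.emplace();            // flag set: the destructor must run later
        }
    }                                  // the destructor runs only if the flag is set
    return Resource::destroyed;
}

void check_drop_flag() {
    assert(count_destructions(false) == 0);
    assert(count_destructions(true) == 1);
}
```

Here the flag is always stored; the proposal above differs in that the compiler could omit it whenever initialization is statically known.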

Static Checks

There are a few static checks that are somewhat easier to add to SBC++ than to C++, and the compiler would be required to enforce them.

The first is the use of uninitialized values. Since SBC++ doesn't run "default constructors" when a variable is declared, variables not given values can't be used! In particular, if there is any code path to the current point of execution on which the variable has not been assigned a value, it cannot be used. Since C++ is all about zero-cost abstraction, we'd want to provide a way to assert to the compiler that the variable actually has been initialized (I'd vote for something like assert_initialized variable_name;), and in this case there's no performance lost.

References and Const

References are supposed to be safer pointers, but C++ references fall short in a few ways. The most obvious problem is that they must be bound at declaration. C++ needs this because otherwise they're not particularly safe: a reference that hasn't been pointed somewhere can't be used, but C++ doesn't have the kind of assignment check that would stop you from using it! Luckily, SBC++ does.

References in SBC++ are unfortunately not quite as convenient as they are in C++. Luckily, the ways in which they are awkward matter much less (as will be seen later).

Another way in which C++'s references are unfortunate is that they encourage non-constness. Constant by default tends to be better (because it is a safer default, while still not preventing any optimization that a good compiler should perform) so SBC++ adopts it for references (although not for variables).

// C++
int x = 4;
int& y = x;
y = 2;
std::cout << x << std::endl;

// SBC++
var y : &mut int; // can declare y first!

var x : int = 4;

y = &mut x; // need to use '&' to point y at x 
*y = 2; // need to dereference y explicitly

In other words, when used in this manner, references are just like pointers! In SBC++, the type of &expression is now &T instead of *T, but references are allowed to decay to pointers (and of course, references still can't be NULL).

Note that this means that there's no problem with pointing a reference somewhere new. Writing

y = &mut x;
y = &mut z;
y = &mut w;

would be perfectly fine, and repoints y to a different place each time. This means that we can assign to a struct or class with a reference field. (Yay!)
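The closest thing in current C++ is std::reference_wrapper, which can be rebound and keeps the enclosing struct assignable. A small sketch in standard C++; Cursor and check_rebinding are hypothetical names for the example.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// A struct holding a rebindable reference stays assignable.
struct Cursor {
    std::reference_wrapper<int> target;
};
static_assert(std::is_copy_assignable<Cursor>::value,
              "reference_wrapper members do not block assignment");

void check_rebinding() {
    int x = 4;
    int z = 10;

    std::reference_wrapper<int> y = x;  // y refers to x
    y.get() = 2;                        // writes through to x
    assert(x == 2);

    y = z;                              // rebinds y; neither x nor z changes
    y.get() = 11;                       // writes through to z
    assert(x == 2 && z == 11);

    Cursor a{x};
    Cursor b{z};
    a = b;                              // allowed: a.target now refers to z
    assert(&a.target.get() == &z);
}
```

A plain int& member would instead delete the struct's copy assignment operator, which is the inconvenience the SBC++ design removes.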

However, this is fairly inconvenient. So there are a few conveniences baked in. Accessing fields and methods is done with a dot instead of an arrow, and functions expecting a reference can be passed a value directly, as in the call to print_vec below:

// SBC++
func print_vec(list : &std::vector<int>) {
    for var x : int in list {
        std::cout << x << " ";
    }
}

func double_up(list : &mut std::vector<int>) {
    for var i in std::range(list.size()) {
        list.push_back(list[i]);
    }
}


var my_list : std::vector<int> = {1, 2, 3};
double_up(mut my_list);
print_vec(my_list); // 1 2 3 1 2 3

If a function or method takes a mutable reference as a parameter, the caller has to say mut or &mut before the argument to indicate this (but mut is preferred).
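Something similar can be approximated in current C++ with a wrapper type whose constructor is explicit, so that a plain argument is rejected. This is only a sketch of the idea: Mut, mut, and check_double_up are hypothetical helpers, not part of any standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical wrapper: a Mut<T> can only be created explicitly,
// so mutation is visible at the call site.
template <typename T>
class Mut {
    T* ptr_;

public:
    explicit Mut(T& ref) : ptr_(&ref) {}
    T& get() const { return *ptr_; }
};

// Helper so callers can write mut(x), echoing the SBC++ spelling.
template <typename T>
Mut<T> mut(T& ref) { return Mut<T>(ref); }

void double_up(Mut<std::vector<int>> list) {
    std::vector<int>& v = list.get();
    const std::size_t n = v.size();  // fix the count before appending
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(v[i]);
    }
}

// double_up(my_list) does not compile: there is no implicit conversion.
static_assert(!std::is_convertible<std::vector<int>&, Mut<std::vector<int>>>::value,
              "callers must write mut(...) explicitly");

void check_double_up() {
    std::vector<int> my_list{1, 2, 3};
    double_up(mut(my_list));
    assert(my_list.size() == 6);
    assert(my_list[3] == 1 && my_list[5] == 3);
}
```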

Constness

const fields get a makeover too! C++ makes it very annoying (or downright impossible) to use const fields effectively. If you want to make an object with const fields, it cannot go in a vector, at least not if you want to be able to push_back into it.

SBC++ simplifies things. When a field is declared as const, it's clear that what this really means is that it should not be changed with respect to the object. In other words, it means that a mutation of the object must not change it.

However, a wholesale replacement of an object is still perfectly fine even with a const field!

// SBC++

struct TwoOfTheSame {
    const x : int;
    const y : int;
    new func make(v : int) {
        x = v; // x is not const for the body of a new-func
        y = v;
    }
}

var p : TwoOfTheSame = TwoOfTheSame::make(7);

p = TwoOfTheSame::make(3); // perfectly valid

By allowing const to refer to object integrity, instead of absolute memory constness, it can be used more frequently, and thus more effectively.
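For contrast, here is how the same type behaves in current C++. const data members delete the implicit copy assignment operator, and the usual workaround is private fields with read-only accessors. ConstPair and the accessor names are assumptions made for this sketch.

```cpp
#include <cassert>
#include <type_traits>

// With const data members, C++ deletes the implicit copy assignment operator.
struct ConstPair {
    const int x;
    const int y;
};
static_assert(!std::is_copy_assignable<ConstPair>::value,
              "const members block whole-object assignment in C++");

// A common workaround: private fields plus read-only accessors.
class TwoOfTheSame {
    int x_;
    int y_;
    explicit TwoOfTheSame(int v) : x_(v), y_(v) {}

public:
    static TwoOfTheSame make(int v) { return TwoOfTheSame(v); }
    int x() const { return x_; }
    int y() const { return y_; }
};
static_assert(std::is_copy_assignable<TwoOfTheSame>::value,
              "wholesale replacement is still allowed");

void check_replacement() {
    TwoOfTheSame p = TwoOfTheSame::make(7);
    p = TwoOfTheSame::make(3);  // replaces the whole object
    assert(p.x() == 3 && p.y() == 3);
}
```

The workaround protects the fields from outside code only; member functions can still modify x_ and y_, which a const field in SBC++ would forbid.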

Safer Enums and Tagged Unions

Yes, this was bound to be coming for sure. Although I'd like to have generalized algebraic data types in whatever language I'm working with, it's a bit much to add them to C++ when trying to make a conservative improvement to the language.

enum class was a huge improvement over C's enum, and I'd like to make a similar extension to it. An enum union extends the C-style union with a tag field that allows its cases to be distinguished. In addition, it creates a destructor for the type that's based on the tag, so that values can still be cleaned up. These changes (along with some sort of type-directed switch to actually operate on them) satisfy what I'm looking for. The result could be something like this for an optional type:

template<T>
enum union Optional {
    None;
    Some(T);
}
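C++17's std::variant is a library version of the same idea: a union with a tag and a tag-driven destructor, with std::visit as the type-directed switch. A minimal sketch assuming C++17; None, Some, the Optional alias, and value_or are hypothetical names chosen to mirror the declaration above.

```cpp
#include <cassert>
#include <utility>
#include <variant>

// Hypothetical stand-ins for the None and Some(T) cases.
struct None {};
template <typename T>
struct Some {
    T value;
};

// The variant stores one of the cases together with a tag.
template <typename T>
using Optional = std::variant<None, Some<T>>;

// A type-directed "switch": std::visit dispatches on the stored tag.
template <typename T>
T value_or(const Optional<T>& opt, T fallback) {
    struct Visitor {
        T fallback;
        T operator()(const None&) const { return fallback; }
        T operator()(const Some<T>& some) const { return some.value; }
    };
    return std::visit(Visitor{std::move(fallback)}, opt);
}

void check_optional() {
    Optional<int> present = Some<int>{5};
    Optional<int> absent = None{};
    assert(value_or(present, 0) == 5);
    assert(value_or(absent, 0) == 0);
}
```

The visitor boilerplate is the part a built-in enum union with a proper switch would make unnecessary.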

Template Improvements

Somehow, the syntax has got to be improved a little. I'm not sure how. Instead, I'd like to talk about something completely different. (Here's where I propose generalized algebraic data types.)

There's a convenient concept coming (primarily) from functional languages known as phantom types. Essentially, a phantom type is a type parameter without a corresponding runtime component. In addition, I want to support polymorphic recursion but only on phantom types. This means that we can still compile them by templating, since the phantom type doesn't have to actually be instantiated!


struct Z{}
template<T> struct S{}

template<T, phantom Height>
enum union AVL {
     Empty : AVL<T, Z>;
     Balanced(T, *AVL<T, Height>, *AVL<T, Height>) : AVL<T, S<Height>>;
     LeftHeavy(T, *AVL<T, S<Height>>, *AVL<T, Height>) : AVL<T, S<S<Height>>>;
     RightHeavy(T, *AVL<T, Height>, *AVL<T, S<Height>>) : AVL<T, S<S<Height>>>;
}

This type describes a balanced AVL tree: it's impossible to construct an unbalanced instance of the tree, yet its runtime representation only pays a penalty of 1 byte per node to guarantee this! In particular, this is no more than the standard implementation of AVL trees pays anyway.
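Part of this can be approximated with ordinary C++ templates. The sketch below encodes only the Balanced case (both children of equal height) using partial specialization; Node, Depth, and height_of are hypothetical names. Unlike the proposal, C++ treats every height as a separate instantiation, so this does not give polymorphic recursion.

```cpp
#include <cassert>

// Type-level natural numbers, as in the SBC++ sketch.
struct Z {};
template <typename N> struct S {};

// Maps a type-level number back to an int at compile time.
template <typename N> struct Depth;
template <> struct Depth<Z> { static constexpr int value = 0; };
template <typename N> struct Depth<S<N>> {
    static constexpr int value = 1 + Depth<N>::value;
};

// Height appears in no field, so it costs nothing at run time.
template <typename T, typename Height> struct Node;

template <typename T>
struct Node<T, Z> {};  // the empty tree has height zero

template <typename T, typename H>
struct Node<T, S<H>> {
    T value;
    Node<T, H>* left;   // both children must have height H,
    Node<T, H>* right;  // so an unbalanced node does not type-check
};

// The height tag does not change the runtime layout.
static_assert(sizeof(Node<int, S<Z>>) == sizeof(Node<int, S<S<Z>>>),
              "height is tracked only in the type");

template <typename T, typename H>
constexpr int height_of(const Node<T, H>&) { return Depth<H>::value; }

void check_heights() {
    Node<int, Z> leaf;
    Node<int, S<Z>> one{1, &leaf, &leaf};
    Node<int, S<S<Z>>> two{2, &one, &one};
    // Node<int, S<S<Z>>> bad{2, &one, &leaf};  // would not compile
    assert(height_of(leaf) == 0);
    assert(height_of(one) == 1);
    assert(height_of(two) == 2);
}
```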

Conclusion

There's a lot I didn't cover here, and some of these details are pretty important. One thing that needs to be gone into in more detail is the semantics of moving something out of a reference (which shouldn't be allowed for reasons of correctness, but it's problematic if we can't do this at all). How inheritance deals with some of these things needs some thought too, especially w.r.t. base class constructors and virtual methods.

I'm publishing this mostly as a rambly note to myself, but I hope that it's at least interesting to whoever stumbles across it.

Top comments (2)

Bryan Baldwin • Mar 13 '17

You'd probably be better off binning the whole idea that C++ can be "fixed." Instead, realize the truth: almost nothing that C++ added to C was of any value. Using C++ becomes a lot easier and saner if you just refrain from using most of the OO features. Program like it's plain C.

On the other hand, JAI by Jonathan Blow is more like the improvement to C that we were waiting for and that C++ should have been.

nihar • Jun 23 '17

You can master C++ on your own.
Best C++ tutorials recommended by programming community: hackr.io/tutorials/learn-c-c-plus-...
You just need to have determination to do it.

Keep in mind the following things

It will be frustrating in the beginning, with all those compilation errors.

It will be confusing, especially the pointers part.

Give yourself time to understand the code you have written, from the first line to the last.

It will take some time to learn, but once you have, everything will be smooth after that.
