dayvonjersen

Posted on Aug 15, 2018 • Edited on Jul 29, 2019

C++ For Go Programmers: Part 2 - Slices

#c #cpp #go #badideas

DISCLAIMER: This series title is inspired by a terrific article on the Go Wiki but, unlike that article, none of the information presented in this series should in any way be mistaken as good advice. See Part 0 for an introduction.

Seriously, don't do what I'm about to demo in this article. If you're using C++, just use std::vector. If you're using C, check out stretchy buffers.
EDIT: Also I got slice semantics wrong. s1 := s0[:] in Go actually makes both refer to the same underlying array, instead of making a copy. And realloc() could screw things up too. Thanks to jstag on #go-nuts for pointing these mistakes out.

As a thought experiment, let's try to implement Go slices in C and C++. If you don't want the play-by-play, take a look at the finished product in C and C++.

Part of what makes Go slices so nice is the convenient syntax:

s0 := []int{1, 2, 3}       // literals
s1 := make([]byte, 0, 512) // or specify length and capacity
s2 := s0[1:]               // s2 == {2, 3}, s0 is still == {1, 2, 3}
s3 := s0[:] // create a copy of a slice
s3[2] = 5   // modify in-place
somethingThatImplementsioReader.Read(s1) // slices are reference types, s1 has data in it now (maybe)
s0 = append(s0, 4, 5, 6) // append implicitly resizes
s1 = append([]int{0}, append(s0[:1], s0[2:]...)...) // like I said, nice!

We'll have none of that happy nonsense. This is C.

NOTE: Go also has the very nice feature of being able to effortlessly convert its native utf8 string type to and from either []byte or []rune types. I have no idea how to accomplish this in C/C++.

Slices in C

A slice has a length, a capacity, and a pointer to an underlying array.

typedef struct _slice {
    int   len;
    int   cap;
    void* arr;
} slice;

The builtin functions len and cap let you access the length and capacity of a slice.

int len(slice* s) { return s->len; }
int cap(slice* s) { return s->cap; }

So far this is pretty simple and straight-forward.

But actually, we need to store the type information. Types are not first-class values in C, so we'll store the size of the type in bytes and hope that's good enough. We'll come back to this later.

typedef struct _slice {
    size_t type;
    int    len;
    int    cap;
    void*  arr;
} slice;

The builtin function make allocates a slice with a specified length and optionally a capacity.

We'll require both and additionally the size (in bytes) of the type of the elements of the array as the first argument.

slice* make(size_t type, int len, int cap) {
    slice* s = calloc(1, sizeof(slice));
    s->arr   = calloc(cap, type);
    s->type = type;
    s->len  = len;
    s->cap  = cap;
    return s;
}

Slicing a slice creates a new slice. We can't do bracket notation, however, so we'll make it a function.

// naming stuff is hard
slice* sslice(slice* s, int start, int end) {
    int len = end-start;
    slice* s2 = make(s->type, len, len);
    memcpy(s2->arr, s->arr + start * s->type, len * s->type);
    s2->len = len;
    return s2;
}

Here's where the fun begins with the pointer arithmetic. As you know, arr[i] is just shorthand for *(arr + i), but the compiler needs to know the type of arr in order for this to work. We're working with void* here so we have to multiply the offsets by the size of the type in bytes.

"The copy built-in function copies elements from a source slice into a destination slice. [...] The source and destination may overlap. Copy returns the number of elements copied, which will be the minimum of len(src) and len(dst)."

https://godoc.org/builtin#copy

int copy(slice* dst, slice* src) {
    if(dst->type != src->type)  return 0; // should generate an error somehow actually
    int len = dst->len < src->len ? dst->len : src->len;
    memcpy(dst->arr, src->arr, len * src->type);
    return len;
}

We have enough now to start using these slices, but append() is really what makes Go slices so valuable.

If we knew the type of the underlying array we could do this:

typedef struct _int_slice {
    int  len;
    int  cap;
    int* arr;
} int_slice;

// same len, cap, make, sslice, and copy... except you could could omit type and use sizeof(int)

int_slice* append(slice* s, int elem) {
    if(s->cap < s->len+1) {
        s->arr = realloc(s->arr, sizeof(int) * s->len+1);
        s->cap = s->len+1;
    }
    s->arr[s->len++] = elem;
    return s;
}

But since we want to have slices of any type, we have to use void*.

NOTE: As a consequence, we cannot pass literals or value types to our append(), we always have to pass a pointer type. Something like int a = 5; s = append(s, &a); worked in my little toy examples but it's not great.

Recall that in C a byte is an unsigned char, and we have the size in bytes of the type of each individual element stored, thus:

slice* append(slice* s, void* elem) {
    if(s->cap < s->len+1) {
        s->arr = realloc(s->arr, s->type * s->len+1);
        s->cap = s->len+1;
    }
    int offset = s->len * s->type;
    for(int i = 0; i < s->type; i++) {
        *((unsigned char*)s->arr + offset + i) = *((unsigned char*)elem + i);
    }
    s->len += 1;
    return s;
}

"Ah", I can almost hear you say now, "but the real append() takes variadic arguments!" I got you covered, family:

slice* append(int count, slice* s, ...) {
    va_list args;
    va_start(args, s);

    if(s->cap < s->len+count) {
        s->arr = realloc(s->arr, s->type * s->len+count);
        s->cap = s->len+count;
    }

    int offset = s->len * s->type;
    for(int j = 0; j < count; j++) {
        unsigned char* elem = va_arg(args, unsigned char*);
        for(int i = 0; i < s->type; i++) {
            *((unsigned char*)s->arr + offset + i + j*s->type) = elem[i];
        }
    }
    va_end(args);

    s->len += count;
    return s;
}

Unfortunately, we have to specify how many elements we're appending because <stdarg.h> provides no mechanism for counting the number of variadic args. This is entirely consistent with how C expects the programmer to constantly keep track of the length of arrays and will happily read off the end and Segfault on your face, leading back around to why you would want to use something like slices in the first place.

full source code: slices.c 👀

Don't forget to free() your slices and their underlying arrays when you're finished with them.

Slices in C++

So now we have Go slices in C, but C++ (and its creator) will scream at us if we try to compile this code. It's just a matter of casting the return value of *alloc and ignoring and/or turning off some compiler warnings, but let's try to do it the right way™ with classes and templates:

template<typename T>
class slice {
    int _len;
    int _cap;
    T*  _arr;
public:
    slice(int len) : slice(len,len) {}
    slice(int len, int cap) {
        _len = len;
        _cap = cap;
        _arr = (T*) calloc(cap, sizeof(T));
    }
    ~slice() {
        free(_arr);
    }

    int len() { return _len; }
    int cap() { return _cap; }

    T& operator[](int i) { return _arr[i]; }

    void append(T elem) {
        if(_cap < _len+1) {
            _arr = (T*) realloc(_arr, _len+1 * sizeof(T));
            _cap = _len+1;
        }
        _arr[_len++] = elem;
    }

    template<typename... Targs>
    void append(T elem, Targs... args) {
        append(elem);
        append(args...);
    }

    slice<T> sslice(int start, int end) {
        int len = end-start;
        slice<T> s2(0, len);
        for(int i = 0; i < len; i++) {
            s2.append(_arr[start + i]);
        }
        return s2;
    }
};

full source code: slices.cpp 👀

Templates let us have slices of any type without having to do any pointer arithmetic or obscene casting.

I think I'm starting to sound a little too much like Bjarne so I should skip the play-by-play and wrap this up quickly.

NOTE: I had to use stdlib *alloc and friends because new[] for an unbounded array class member led to some strange results...

Bonuses for using C++:

We get to have free() called in the destructor automatically
Bracket notation for accessing slice elements directly.
Constructor overloading lets us get the same arguments as Go's make() (can omit capacity)
C++11 introduced parameter pack for variadic arguments, very nicely explained here by this fine fellow. I'm not crazy about having to use recursion but realistically you'd never append enough elements in a single call to overflow the stack.

The End

In conclusion, don't do this. You get none of the convenient syntax of Go slices nor any of the flexibility of conversions between Go strings and []byte or []rune while having to manage your own memory. realloc()'ing all the time can't be good for performance and you'll leak memory if you forget to free() every slice, especially since slicing a slice creates a new slice every time. Again, if you're using C++ just use std::vector otherwise check out stretchy buffers. Both of those check for OOM conditions and weren't hacked together in 30 minutes to write a follow-up article for a series that isn't even really a series.

DEV Community

C++ For Go Programmers: Part 2 - Slices

Slices in C

Slices in C++

The End

Top comments (0)