DEV Community

Kayne Ruse

Posted on

Toy Literal Byte Alignment

Hi! Long time coder, first time poster.

I'm making a programming language called Toy, which is intended to let players easily mod a video game's logic. To this end, each value within Toy, be it a number, a string, a variable name, etc., is stored in a structure called "Toy_Literal".

Let's see if you can spot the problem here:

typedef struct Toy_Literal {
    Toy_LiteralType type;
    union {
        bool boolean;
        int integer;
        float number;
        Toy_RefString* stringPtr;

        Toy_LiteralArray* array;
        Toy_LiteralDictionary* dictionary;

        struct {
            void* bytecode;
            Toy_NativeFn native;
            Toy_HookFn hook;
            void* scope;
            int length;
        } function;

        struct {
            Toy_RefString* ptr;
            int hash;
        } identifier;

        struct {
            Toy_LiteralType typeOf;
            bool constant;
            struct Toy_Literal* subtypes;
            int capacity;
            int count;
        } type;

        struct {
            void* ptr;
            int tag;
        } opaque;
    } as;
} Toy_Literal;

This is slightly adjusted from the actual definition of the literal structure I used for the longest time. As the language progressed, I added new members whenever a new feature needed them.

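The big problem is that this struct is 48 bytes in size, with a lot of wasted space. A union is as large as its largest member, so every literal, even a plain integer, pays for the five-field function struct plus the alignment padding around it. It's kind of obvious if you know what to look for, but it was a real lightbulb moment when I realized I could shrink this by 50%, speeding up the copious copying of literals throughout my language's internals: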

typedef struct Toy_Literal {
    union {
        bool boolean; //1
        int integer; //4
        float number;//4

        struct {
            Toy_RefString* ptr; //8
            //string hash?
        } string; //8

        struct Toy_LiteralArray* array; //8
        struct Toy_LiteralDictionary* dictionary; //8

        struct {
            union {
                void* bytecode;  //8
                Toy_NativeFn native; //8
                Toy_HookFn hook; //8
            } inner;  //8
            struct Toy_Scope* scope; //8
        } function; //16

        struct { //for variable names
            Toy_RefString* ptr;  //8
            int hash; //4
        } identifier; //16

        struct {
            struct Toy_Literal* subtypes; //8
            Toy_LiteralType typeOf;  //4
            unsigned char capacity; //1
            unsigned char count; //1
            bool constant; //1
        } type; //16

        struct {
            void* ptr; //8
            int tag; //4
        } opaque; //16
    } as; //16

    Toy_LiteralType type; //4
    int bytecodeLength; //4 - shenanigans
} Toy_Literal;

By reordering the members of each struct/union from largest to smallest (and moving type below the union), I was able to pack the whole literal into just 24 bytes: a 16-byte union followed by two 4-byte ints, with no padding left over at the top level.

I should also note a quirk of the function type: since a function can only be one kind (a Toy function represented by bytecode, a native C function, or a "hook" function used by libraries), I put all three pointers in a union.

Also, the out-of-place member bytecodeLength holds the number of bytes in a Toy function's bytecode. It's the only "wasteful" member, since only bytecode functions use it.

This sounds like it would have taken a lot of work to thread through my language's internals, but it was surprisingly easy, since I had made a habit of only interacting with literals through a big set of macros:

#define TOY_IS_NULL(value)    ((value).type == TOY_LITERAL_NULL)
#define TOY_IS_BOOLEAN(value) ((value).type == TOY_LITERAL_BOOLEAN)
#define TOY_IS_INTEGER(value) ((value).type == TOY_LITERAL_INTEGER)
//etc.

So I only had to rework the macros and add a single new macro for accessing the bytecode function length.

When I'm not tinkering with languages, I'm usually a game developer. Come follow me on Twitter! Or hire me; either is good.

Top comments (1)

Paul J. Lucas

Why have function and type embedded inside Toy_Literal? If you have two Toy_Literal objects that refer to the same function or type, then all that information is duplicated. Why not define functions and types elsewhere (a distinct set for each) with all their relevant information then have Toy_Literal contain only pointers to those?