Paul J. Lucas

Posted on Feb 20

Why C Requires the “struct” Keyword for Structures

Introduction

Regardless of whether you’ve been programming in C for many years or have only recently started, you might wonder why structure declarations require that the struct keyword be included. For example:

struct point {
  int x, y;
};

point p1;         // Error: "struct" required.
struct point p2;  // OK.

Obviously, struct is needed to declare a structure, but that’s not what this article is about. This article is about why, once declared, struct is still needed to use the previously declared structure type. If you omit struct, the compiler still knows what you mean, but it pedantically requires that you include the struct keyword anyway.

If you’ve programmed in other languages, this requirement seems even more curious because no other language that allows you to declare “structure” (Go), “record” (Pascal), or “class” (C++, Java, Python, and many others) types requires repeating a keyword when you use the type. In other languages, such a type becomes a “first class citizen” that can be used just the same as a built-in type like int.

Indeed, when Stroustrup designed C++, he made structure (and class) names usable as-is without needing to be prefixed by a keyword. He gave his rationale in The Annotated C++ Reference Manual (§3.1c, p.26), an excerpt of which is:

Requiring that such prefixing always appear would compromise the effort to make use of user defined types, such as complex, as similar as possible to use of built-in types, such as int.

Formally, structure, union, and enumeration names are in a separate “tags” namespace. (I’ll get to unions and enumerations later, but let’s stick to structures for now.) This means you can have both a structure name and some other identifier name (variable or function) with the same name. For example, POSIX defines stat as both a structure and a function:

int stat( char const *path, struct stat *buf );
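To make the separate tags namespace concrete, here is a minimal sketch in which a structure tag and a function share one name, much as stat does. The names point and sum_of_point are made up for illustration and aren't from any library:

```c
#include <assert.h>

struct point {                  // "point" lives in the tags namespace
  int x, y;
};

// "point" again, but this time as an ordinary identifier (a function).
// The two don't clash because they're in different namespaces.
struct point point( int x, int y ) {
  struct point p = { x, y };    // "struct" says: I mean the tag
  return p;
}

int sum_of_point( void ) {
  // Tag on the left, function call on the right.
  struct point p = point( 3, 4 );
  assert( p.x == 3 && p.y == 4 );
  return p.x + p.y;
}
```

Inside the body of the function, a bare point refers to the function while struct point refers to the structure, which is exactly the disambiguation the keyword provides.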

One not-very-helpful reason the compiler requires struct is that that’s how Ritchie designed C. But then the question becomes, “Why did Ritchie design C that way?”

So as not to keep you in suspense, as far as I’ve been able to tell, nobody definitively knows the answer. Sadly, we lost Ritchie in 2011, so nobody can ask him. However, some people have offered plausible explanations for why struct is the way it is.

Origin Story

Where did the struct keyword come from? It’s most likely from Algol 68, which has both STRUCT and UNION as keywords.

Note that the Algol family of languages uses stropping for keywords, so bold, or some other mechanism, is required to represent keywords.

But even in Algol 68, there’s no tags namespace. That idea seems unique to C.

C Archeology

If you do some digging, you might come across Ritchie’s C Reference Manual (CRM), published in 1974, four years before The C Programming Language (TCPL1) was first published in 1978. In CRM §8.2 Type specifiers, it says:

type-specifier:
        int
        char
        float
        double
        struct { type-decl-list }
        struct identifier { type-decl-list }
        struct identifier

Two things to note:

If you scan the entire CRM, nowhere is typedef mentioned. That’s because early versions of C didn’t have typedef.
This means that every type starts with a keyword, which would make writing any compiler simpler.

The lack of typedef can be independently verified by scanning the source code for the C compiler, specifically its keyword table, that’s part of Sixth Edition Unix (Unix V6) released in 1975 — no typedef.

Implications

As for making it simpler to write a compiler, consider the following:

A * B;          // What is this?

Without typedef, A and B must be variables, so the above is an expression for A times B. However, once you add typedef, A could be a type:

typedef int A;  // "A" is a synonym for "int"

If it is a type, then the above declares B to be a pointer to type A. Hence, the meaning of the statement is context-sensitive: it depends on how A was declared earlier. This definitely complicates both the parser and the lexer, often requiring a lexer hack.

Hence, it’s likely that Ritchie required struct to make writing the compiler simpler. He may have also believed it made code easier to read for humans. (Generally, programming languages that are easier to parse are also easier for humans to understand.)
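As a rough illustration of that context sensitivity, the sketch below puts the same three tokens, A * B, into two functions where they mean different things. The function names are hypothetical, and a compiler may warn that the bare multiplication has no effect:

```c
#include <assert.h>

int as_expression( void ) {
  int A = 6, B = 7;
  A * B;                // A and B are variables: multiply, discard result
  int product = A * B;
  assert( product == 42 );
  return product;
}

int as_declaration( void ) {
  typedef int A;        // in this scope, "A" now names a type
  int target = 42;
  A * B;                // same tokens: declares B as "pointer to A"
  B = &target;
  return *B;
}
```

A parser can't tell the two statements apart by their tokens alone. It has to know whether A currently names a type, which is the information a lexer hack feeds back from the symbol table.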

`typedef`

By the time TCPL1 was published in 1978, typedef had been added to C. This too can be independently verified by scanning the source code for the C compiler, specifically its keyword table, that’s part of Seventh Edition Unix (Unix V7) released in 1979: typedef is there.

typedef is quite handy, both for hiding implementation details (especially for types that vary across platforms) and for simplifying the complicated declarations that C is infamous for. Consider:

void (*signal(int sig, void (*func)(int)))(int);

which is the declaration of the signal library function (specified by both ISO C and POSIX). According to cdecl (line breaks added for readability):

cdecl> explain void (*signal(int sig, void (*func)(int)))(int)
declare signal as function
        ( sig as int,
          func as pointer to function (int) returning void )
    returning pointer to function (int) returning void

That can become much simpler by using typedef:

typedef void (*sig_t)(int);
sig_t signal( int sig, sig_t func );

Hence, Ritchie added typedef despite its making the compiler harder to write.
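For a sense of how such a typedef reads in use, here is a small sketch that installs a handler and raises a signal. The names handler_fn, on_sigint, and demo_signal are invented for this example. A name other than sig_t is used because some platforms already declare sig_t in <signal.h>:

```c
#include <assert.h>
#include <signal.h>

// Hypothetical typedef name: "pointer to function (int) returning void".
typedef void (*handler_fn)( int );

static volatile sig_atomic_t got_signal = 0;

// Handler: just record which signal arrived.
static void on_sigint( int sig ) {
  got_signal = sig;
}

// Installs the handler, raises SIGINT, restores the previous handler,
// and returns the signal number that the handler recorded.
int demo_signal( void ) {
  handler_fn previous = signal( SIGINT, on_sigint );
  assert( previous != SIG_ERR );
  raise( SIGINT );
  signal( SIGINT, previous );
  return got_signal;
}
```

Without the typedef, the variable previous would have to be declared as void (*previous)(int), which is the kind of declaration that sends people to cdecl.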

Once typedef was added, Ritchie could have retroactively altered the way struct is handled by doing away with the tags namespace and making structure types first class citizens. But by this time, there was already a lot of C code out there and such a change would have broken many programs. For better or worse, the way struct worked was metaphorically carved into stone by now.

If it’s any consolation, typedef can be used to make structure types into first class citizens:

typedef struct point point;
point p3;         // No "struct" needed now.

That is, typedef makes point in the ordinary (non-tag) namespace an alias for point in the tags namespace. The fact that they have the same name is fine since they’re in different namespaces. Personally, I do this for my own C code. However, other people have different views on when typedef should be used.
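The tag and the typedef can also be declared together in one statement. The following sketch does that and then spells the type both ways to show that the two spellings name the same type. The functions manhattan_distance and demo_distance are made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>

// Declares the tag "point" and the typedef name "point" at once.
typedef struct point {
  int x, y;
} point;

// One parameter uses the tag, the other the typedef: same type.
int manhattan_distance( struct point const *a, point const *b ) {
  return abs( a->x - b->x ) + abs( a->y - b->y );
}

int demo_distance( void ) {
  point origin = { 0, 0 };          // typedef spelling
  struct point corner = { 3, 4 };   // tag spelling still works
  assert( sizeof origin == sizeof corner );
  return manhattan_distance( &origin, &corner );
}
```

Because both spellings stay valid, adding the typedef to existing code doesn't break anything that already says struct point.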

Unions & Enumerations

If you look back at the type-specifier from CRM, you might also notice that neither union nor enum is there. Unions became part of C by the time of Unix V7 and TCPL1, but enumerations aren’t in TCPL1 and weren’t standardized until C89.

Regardless, both union and enum work similarly to struct in that their names are in the same tags namespace. Ritchie very likely did this for consistency.
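Here is a brief sketch of what sharing one tags namespace means in practice. All the names below (color, number, demo_tags) are hypothetical:

```c
#include <assert.h>

enum color { RED, GREEN, BLUE };    // "color" is now a tag (an enum)

union number {                      // "number" is a tag too (a union)
  int   i;
  float f;
};

// union color { int i; };          // Error if uncommented: the tag
                                    // "color" is already taken by the enum.

int color = GREEN;                  // OK: an ordinary identifier may
                                    // reuse a tag's name.

int demo_tags( void ) {
  enum color c = BLUE;              // the keyword selects the tag
  union number n;
  n.i = (int)c + color;             // BLUE (2) + GREEN (1)
  assert( n.i == 3 );
  return n.i;
}
```

So a given tag name can belong to only one of a struct, union, or enum in a scope, yet it never collides with a variable or function of the same name.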

Conclusion

So the most likely answer as to why C requires struct is that it originally made all types start with a keyword, which in turn made writing the early compiler simpler.
