KiranPoudel98 for Truemark Technology

Posted on • Originally published at thedevpost.com

13 Most Asked Questions About C Programming

#c

C is a general-purpose programming language developed by Dennis Ritchie at Bell Labs. It is used mainly in operating systems and advanced scientific systems. Although C is one of the oldest programming languages still in use, it remains one of the most popular. So, today we will be checking out the 13 most asked questions about C programming.

1. Do we cast the result of malloc?

Answer:

No, you don’t cast the result, since:

  • It is unnecessary, as void * is automatically and safely converted to any other object pointer type in this case.
  • It adds clutter to the code; casts are not very easy to read (especially if the pointer type is long).
  • It makes you repeat yourself, which is generally bad.
  • It can hide an error if you forgot to include <stdlib.h>. This can cause crashes (or, worse, not cause a crash until way later in some totally different part of the code). Consider what happens if pointers and integers are differently sized; then you’re hiding a warning by casting and might lose bits of your returned address.

Note: As of C99, implicit function declarations are gone from C, so this last point is no longer relevant: there’s no automatic assumption that undeclared functions return int.

There are simply no benefits to the cast, but a bunch of potential risks, and including it suggests that you don’t know about the risks. Furthermore, your code shouldn’t needlessly repeat the type information (int), since the repetition can introduce errors. It’s better to dereference the pointer that stores the return value, to “lock” the two together:

int *sieve = malloc(length * sizeof *sieve);

This also moves the length to the front for increased visibility, and drops the redundant parentheses with sizeof; they are only needed when the argument is a type name. Many people seem to not know (or ignore) this, which makes their code more verbose. Remember: sizeof is not a function. While moving length to the front may increase visibility in some rare cases, in the general case it is better to write the expression as:

int *sieve = malloc(sizeof *sieve * length);

Since keeping the sizeof first, in this case, ensures multiplication is done with at least size_t math.

Compare: malloc(sizeof *sieve * length * width) vs. malloc(length * width * sizeof *sieve), the second may overflow the length * width when width and length are smaller types than size_t.

Alternative Answer:

In C, you don’t need to cast the return value of malloc. The pointer to void returned by malloc is automagically converted to the correct type. However, if you want your code to compile with a C++ compiler, a cast is needed. A preferred alternative among the community is to use the following:

int *sieve = malloc(sizeof *sieve * length);

which additionally frees you from having to worry about changing the right-hand side of the expression if ever you change the type of sieve. Casts are bad, especially pointer casts.
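
As a rough sketch of the idiom discussed above, the helper below allocates an array of int without a cast, keeps sizeof first, and checks for overflow and for allocation failure. The name alloc_sieve and its signature are hypothetical, chosen only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `length` ints, or return NULL on failure. */
int *alloc_sieve(size_t length)
{
    int *sieve;

    /* Reject requests whose byte count would not fit in a size_t. */
    if (length > SIZE_MAX / sizeof *sieve)
        return NULL;

    /* No cast; sizeof comes first and is tied to the pointer, not to a type name. */
    sieve = malloc(sizeof *sieve * length);
    return sieve; /* may still be NULL if the allocation failed */
}
```

If the type of sieve ever changes, only its declaration needs editing; the malloc expression follows along automatically.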

2. What does the ??!??! operator do in C?

Answer:

??! is a trigraph that translates to |, so ??!??! is the logical OR operator ||. The statement in question therefore says:

!ErrorHasOccured() || HandleError();

which, due to short circuiting, is equivalent to:

if (ErrorHasOccured())
    HandleError();

See Guru of the Week (it deals with C++ but is relevant here). As for the possible origin of trigraphs, it’s more likely due to EBCDIC being difficult (again); a discussion on the IBM developerWorks board seems to support that theory. From ISO/IEC 9899:1999 §5.2.1.1, footnote 12 (h/t @Random832):

The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.
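
To make the short-circuit equivalence concrete, here is a minimal sketch written with the ordinary || spelling, since GCC ignores trigraphs in its default GNU mode unless -trigraphs or a strict -std option is given. The functions, the error_flag variable, and the handled counter are all hypothetical stand-ins.

```c
#include <stdbool.h>

static int handled = 0;          /* counts calls to handle_error() */
static bool error_flag = false;  /* stand-in for some real error state */

bool error_has_occurred(void) { return error_flag; }
bool handle_error(void) { ++handled; return true; }

/* Short-circuit form: handle_error() runs only when the left side is false. */
void check_with_or(void)
{
    (void)(!error_has_occurred() || handle_error());
}

/* Equivalent, more readable form. */
void check_with_if(void)
{
    if (error_has_occurred())
        handle_error();
}
```

Both functions call handle_error() exactly when error_has_occurred() returns true; the if form is the one most readers will expect.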

3. What is “:-!!” in C code?

Answer:

This is, in effect, a way to check whether the expression e evaluates to 0, and if not, to fail the build. The macro (BUILD_BUG_ON_ZERO) is somewhat misnamed; it should be something more like BUILD_BUG_OR_ZERO, rather than ...ON_ZERO. (There have been occasional discussions about whether this is a confusing name.) You should read the expression like this:

(sizeof(struct { int: -!!(e); }))

  • (e): Compute expression e.
  • !!(e): Logically negate twice: 0 if e == 0; otherwise 1.
  • -!!(e): Numerically negate the result of the previous step: 0 if it was 0; otherwise -1.
  • struct{int: -!!(0);} --> struct{int: 0;}: If it was zero, then we declare a struct with an anonymous integer bitfield that has width zero. Everything is fine and we proceed as normal.
  • struct{int: -!!(1);} --> struct{int: -1;}: On the other hand, if it isn’t zero, then the width will be -1. Declaring any bitfield with negative width is a compilation error.

So we’ll either wind up with a bitfield that has width 0 in a struct, which is fine, or a bitfield with negative width, which is a compilation error. Then we take sizeof that struct, so we get a size_t whose value is zero in the case where e is zero. You don’t want to detect problems in your kernel at runtime that could have been caught earlier; it’s a critical piece of the operating system. To whatever extent problems can be detected at compile time, so much the better.

Alternative Answer:

The : is a bitfield. As for !!, that is logical double negation and so returns 0 for false or 1 for true. And the - is a minus sign, i.e. arithmetic negation. It’s all just a trick to get the compiler to barf on invalid inputs. Consider BUILD_BUG_ON_ZERO. When -!!(e) evaluates to a negative value, that produces a compile error. Otherwise -!!(e) evaluates to 0, and a 0 width bitfield has size of 0. And hence the macro evaluates to a size_t with value 0. The name is weak in my view because the build, in fact, fails when the input is not zero. BUILD_BUG_ON_NULL is very similar, but yields a pointer rather than a size_t.
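
The following sketch shows the trick in use. It assumes GCC or a compatible compiler, because a struct whose only member is an unnamed zero-width bitfield relies on a GNU extension; the enum constant names are made up for the example. C11's _Static_assert is shown as a standard alternative for checks that do not need to sit inside an expression.

```c
/* Sketch of the macro discussed above; assumes a GNU-compatible compiler. */
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int : -!!(e); }))

/* Usable inside expressions: contributes 0 when the condition is false. */
enum { CHECKED_FOUR = 4 + BUILD_BUG_ON_ZERO(sizeof(int) > 64) };

/* Uncommenting the next line should stop the build (negative width):
   enum { BROKEN = BUILD_BUG_ON_ZERO(1) }; */

/* C11 offers a standard, declaration-level alternative. */
_Static_assert(sizeof(char) == 1, "char must be one byte");
```

The macro form is still handy where a declaration is not allowed, for example inside an initializer.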

4. With arrays, why is it the case that a[5] == 5[a]?

Answer:

The C standard defines the [] operator as follows: a[b] is identical to *(a + b). Therefore a[5] will evaluate to:

*(a + 5)

and 5[a] will evaluate to:

*(5 + a)

In an expression, a is converted to a pointer to the first element of the array. a[5] is the value that’s 5 elements further from a, which is *(a + 5); 5[a] is *(5 + a). From elementary school math we know those are equal, because addition is commutative.

Alternative Answer:

Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), and addition is commutative.
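
A small sketch can confirm that both spellings name the same element. The function name same_element is hypothetical, and the caller is assumed to pass a valid index.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when both spellings name the same element.
   Assumes i is a valid index into the array that a points into. */
bool same_element(const int *a, size_t i)
{
    return &a[i] == &i[a]     /* same address ... */
        && a[i] == *(a + i);  /* ... and a[i] is just *(a + i) */
}
```

The i[a] form is legal but should stay a curiosity; it does nothing for readability.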

5. How do function pointers in C work?

Answer:

Let’s start with a basic function which we will be pointing to:

int addInt(int n, int m) {
    return n+m;
}

First thing, let’s define a pointer to a function which receives 2 ints and returns an int:

int (*functionPtr)(int,int);

Now we can safely point to our function:

functionPtr = &addInt;

Now that we have a pointer to the function, let’s use it:

int sum = (*functionPtr)(2, 3); // sum == 5

Passing the pointer to another function is basically the same:

int add2to3(int (*functionPtr)(int, int)) {
    return (*functionPtr)(2, 3);
}

We can use function pointers in return values as well (try to keep up, it gets messy):

// this is a function called functionFactory which receives parameter n
// and returns a pointer to another function which receives two ints
// and it returns another int
int (*functionFactory(int n))(int, int) {
    printf("Got parameter %d", n);
    int (*functionPtr)(int,int) = &addInt;
    return functionPtr;
}

But it’s much nicer to use a typedef:

typedef int (*myFuncDef)(int, int);
// note that the typedef name is indeed myFuncDef

myFuncDef functionFactory(int n) {
    printf("Got parameter %d", n);
    myFuncDef functionPtr = &addInt;
    return functionPtr;
}
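
A typedef also makes it easy to build tables of function pointers. The sketch below is one possible use, a small dispatch table; all names in it (binary_op, op_code, apply_op and so on) are hypothetical. Note that a function name converts to a pointer on its own, so the & is optional.

```c
typedef int (*binary_op)(int, int);

static int add_int(int n, int m) { return n + m; }
static int mul_int(int n, int m) { return n * m; }

/* A table of function pointers, indexed by an operation code. */
enum op_code { OP_ADD, OP_MUL, OP_COUNT };
static const binary_op ops[OP_COUNT] = { add_int, mul_int };

/* Look up the operation and call it; returns 0 for an unknown code. */
int apply_op(enum op_code code, int n, int m)
{
    if ((unsigned)code >= OP_COUNT)
        return 0;
    return ops[code](n, m); /* same as (*ops[code])(n, m) */
}
```

Adding an operation then means adding one function and one table entry, with no change to the callers.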

Alternative Answer:

Function pointers in C can be used to perform object-oriented programming in C. For example, the following lines are written in C:

String s1 = newString();
s1->set(s1, "hello");

Yes, the -> and the lack of a new operator is a dead give away, but it sure seems to imply that we’re setting the text of some String class to be "hello". By using function pointers, it is possible to emulate methods in C. How is this accomplished? The String class is actually a struct with a bunch of function pointers which act as a way to simulate methods. The following is a partial declaration of the String class:

typedef struct String_Struct* String;

struct String_Struct
{
    char* (*get)(const void* self);
    void (*set)(const void* self, char* value);
    int (*length)(const void* self);
};

char* getString(const void* self);
void setString(const void* self, char* value);
int lengthString(const void* self);

String newString();

As can be seen, the methods of the String class are actually function pointers to the declared function. In preparing the instance of the String, the newString function is called in order to set up the function pointers to their respective functions:

String newString()
{
    String self = malloc(sizeof *self);

    self->get = &getString;
    self->set = &setString;
    self->length = &lengthString;

    self->set(self, "");

    return self;
}

For example, the getString function that is called by invoking the get method is defined as the following:

char* getString(const void* self_obj)
{
    return ((String)self_obj)->internal->value;
}

One thing to notice is that C has no concept of an object instance whose methods are actually part of the object, so a “self object” must be passed in on each invocation. (The internal member is just a hidden struct that was omitted from the code listing earlier; it is a way of performing information hiding, but that is not relevant to function pointers.) So, rather than being able to write s1->set("hello");, one must pass in the object to act on: s1->set(s1, "hello");.

With that minor explanation of having to pass in a reference to yourself out of the way, we’ll move on to the next part, which is inheritance in C. Let’s say we want to make a subclass of String, say an ImmutableString. To make the string immutable, the set method will not be accessible, while access to get and length is maintained, and the “constructor” is forced to accept a char*:

typedef struct ImmutableString_Struct* ImmutableString;

struct ImmutableString_Struct
{
    String base;

    char* (*get)(const void* self);
    int (*length)(const void* self);
};

ImmutableString newImmutableString(const char* value);

Basically, for all subclasses, the available methods are once again function pointers. This time, the declaration for the set method is not present; therefore, it cannot be called on an ImmutableString. As for the implementation of the ImmutableString, the only relevant code is the “constructor” function, newImmutableString:

ImmutableString newImmutableString(const char* value)
{
    ImmutableString self = malloc(sizeof *self);

    self->base = newString();

    self->get = self->base->get;
    self->length = self->base->length;

    self->base->set(self->base, (char*)value);

    return self;
}

In instantiating the ImmutableString, the function pointers to the get and length methods actually refer to the String.get and String.length method, by going through the base variable which is an internally stored String object. The use of a function pointer can achieve the inheritance of a method from a superclass. We can further continue to polymorphism in C. If for example we wanted to change the behavior of the length method to return 0 all the time in the ImmutableString class for some reason, all that would have to be done is to:

  • Add a function that is going to serve as the overriding length method.
  • Go to the “constructor” and set the function pointer to the overriding length method.

Adding an overriding length method in ImmutableString may be performed by adding a lengthOverrideMethod:

int lengthOverrideMethod(const void* self)
{
    return 0;
}

Then, the function pointer for the length method in the constructor is hooked up to the lengthOverrideMethod:

ImmutableString newImmutableString(const char* value)
{
    ImmutableString self = malloc(sizeof *self);

    self->base = newString();

    self->get = self->base->get;
    self->length = &lengthOverrideMethod;

    self->base->set(self->base, (char*)value);

    return self;
}

Now, rather than behaving identically to the length method of the String class, the length method of ImmutableString refers to the behavior defined in the lengthOverrideMethod function.
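
Because the String listings above omit the hidden internal struct, they do not compile on their own. The sketch below shows the same pattern (a struct carrying a function pointer, a “constructor”, and an overriding “subclass constructor”) in a smaller, self-contained form. The Counter type and every name in it are hypothetical and are not part of the original String example.

```c
#include <stdlib.h>

typedef struct Counter Counter;

struct Counter {
    int value;
    void (*increment)(Counter *self);   /* "method" slot */
};

static void increment_by_one(Counter *self) { self->value += 1; }
static void increment_by_two(Counter *self) { self->value += 2; }

/* "Constructor": returns NULL if the allocation fails. */
Counter *new_counter(void)
{
    Counter *self = malloc(sizeof *self);
    if (self != NULL) {
        self->value = 0;
        self->increment = increment_by_one;
    }
    return self;
}

/* "Subclass constructor": reuse the base setup, then override one slot. */
Counter *new_double_counter(void)
{
    Counter *self = new_counter();
    if (self != NULL)
        self->increment = increment_by_two;
    return self;
}
```

A caller writes c->increment(c) for either kind of counter and gets the behavior installed by whichever constructor created it; the object is released with free(c).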

6. What does “static” mean in C?

Answer:

  • A static variable inside a function keeps its value between invocations.
  • A static global variable or a function is “seen” only in the file it’s declared in.

The first point is the more foreign topic if you’re a newbie, so here’s an example:

#include <stdio.h>

void foo()
{
    int a = 10;
    static int sa = 10;

    a += 5;
    sa += 5;

    printf("a = %d, sa = %d\n", a, sa);
}


int main()
{
    int i;

    for (i = 0; i < 10; ++i)
        foo();
}

This prints:

a = 15, sa = 15
a = 15, sa = 20
a = 15, sa = 25
a = 15, sa = 30
a = 15, sa = 35
a = 15, sa = 40
a = 15, sa = 45
a = 15, sa = 50
a = 15, sa = 55
a = 15, sa = 60

This is useful for cases where a function needs to keep some state between invocations, and you don’t want to use global variables. Beware, however, this feature should be used very sparingly – it makes your code not thread-safe and harder to understand.

The second point, file-scope static, is used widely as an “access control” feature. If you have a .c file implementing some functionality, it usually exposes only a few “public” functions to users. The rest of its functions should be made static, so that the user won’t be able to access them. This is encapsulation, a good practice.

Quoting Wikipedia:

In the C programming language, static is used with global variables and functions to set their scope to the containing file. In local variables, static is used to store the variable in the statically allocated memory instead of the automatically allocated memory. While the language does not dictate the implementation of either type of memory, statically allocated memory is typically reserved in data segment of the program at compile time, while the automatically allocated memory is normally implemented as a transient call stack.

In C++, however, static is also used to define class attributes (shared between all objects of the same class) and methods. In C there are no classes, so this feature is irrelevant.
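
Both meanings of static can be seen side by side in a short sketch. The names last_id, bump, next_id, and times_called are hypothetical; imagine the code living in a single .c file that exposes only the two non-static functions.

```c
/* File scope: visible only inside this translation unit. */
static int last_id = 0;

/* Also file-local: other .c files cannot call this helper by name. */
static int bump(int *counter) { return ++*counter; }

/* "Public" function: returns 1, 2, 3, ... on successive calls. */
int next_id(void)
{
    return bump(&last_id);
}

/* Function-scope static: the count survives between calls. */
int times_called(void)
{
    static int calls = 0;
    return ++calls;
}
```

Neither counter is protected against concurrent access, which is the thread-safety caveat mentioned above.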

7. How to determine the size of an array in C?

Answer:

Executive summary:

int a[17];
size_t n = sizeof(a)/sizeof(a[0]);

Full answer:

To determine the size of your array in bytes, you can use the sizeof operator:

int a[17];
size_t n = sizeof(a);

To determine the number of elements in the array, we can divide the total size of the array by the size of the array element. You could do this with the type, like this:

int a[17];
size_t n = sizeof(a) / sizeof(int);

and get the proper answer (68 / 4 = 17, assuming a 4-byte int), but if the type of a changed you would have a nasty bug if you forgot to change the sizeof(int) as well. So the preferred divisor is sizeof(a[0]) or the equivalent sizeof(*a), the size of the first element of the array.

int a[17];
size_t n = sizeof(a) / sizeof(a[0]);

Another advantage is that you can now easily parameterize the array name in a macro and get:

#define NELEMS(x)  (sizeof(x) / sizeof((x)[0]))

int a[17];
size_t n = NELEMS(a);

Alternative Answer:

The sizeof way is the right way if you are dealing with arrays not received as parameters. An array sent as a parameter to a function is treated as a pointer, so sizeof will return the pointer’s size, instead of the array’s. Thus, inside functions this method does not work. Instead, always pass an additional parameter size_t size indicating the number of elements in the array. Test:

#include <stdio.h>
#include <stdlib.h>

void printSizeOf(int intArray[]);
void printLength(int intArray[]);

int main(int argc, char* argv[])
{
    int array[] = { 0, 1, 2, 3, 4, 5, 6 };

    printf("sizeof of array: %d\n", (int) sizeof(array));
    printSizeOf(array);

    printf("Length of array: %d\n", (int)( sizeof(array) / sizeof(array[0]) ));
    printLength(array);
}

void printSizeOf(int intArray[])
{
    printf("sizeof of parameter: %d\n", (int) sizeof(intArray));
}

void printLength(int intArray[])
{
    printf("Length of parameter: %d\n", (int)( sizeof(intArray) / sizeof(intArray[0]) ));
}

Output (in a 64-bit Linux OS):

sizeof of array: 28
sizeof of parameter: 8
Length of array: 7
Length of parameter: 2

Output (in a 32-bit Windows OS):

sizeof of array: 28
sizeof of parameter: 4
Length of array: 7
Length of parameter: 1
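
The advice to pass the element count explicitly could look like the sketch below. The function sum_ints is a hypothetical example; NELEMS is the macro shown earlier, applied by the caller where the array is still a real array.

```c
#include <stddef.h>

#define NELEMS(x) (sizeof(x) / sizeof((x)[0]))

/* The array parameter is really a pointer, so the caller supplies the count. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}
```

A call such as sum_ints(array, NELEMS(array)) works only in the scope where array is declared as an array; inside sum_ints itself, NELEMS(values) would silently divide the pointer size by the element size.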

8. Why does the C preprocessor interpret the word “linux” as the constant “1”?

Answer:

In the Old Days (pre-ANSI), predefining symbols such as unix and vax was a way to allow code to detect at compile time what system it was being compiled for. There was no official language standard back then (beyond the reference material at the back of the first edition of K&R), and C code of any complexity was typically a complex maze of #ifdefs to allow for differences between systems. These macro definitions were generally set by the compiler itself, not defined in a library header file. Since there were no real rules about which identifiers could be used by the implementation and which were reserved for programmers, compiler writers felt free to use simple names like unix and assumed that programmers would simply avoid using those names for their own purposes.

The 1989 ANSI C standard introduced rules restricting what symbols an implementation could legally predefine. A macro predefined by the compiler could only have a name starting with two underscores, or with an underscore followed by an uppercase letter, leaving programmers free to use identifiers not matching that pattern and not used in the standard library.

As a result, any compiler that predefines unix or linux is non-conforming, since it will fail to compile perfectly legal code that uses something like int linux = 5;.

As it happens, gcc is non-conforming by default — but it can be made to conform (reasonably well) with the right command-line options:

gcc -std=c90 -pedantic ... # or -std=c89 or -ansi
gcc -std=c99 -pedantic
gcc -std=c11 -pedantic

See the GCC manual for more details.

GCC will be phasing out these definitions in future releases, so you shouldn’t write code that depends on them. If your program needs to know whether it’s being compiled for a Linux target or not it can check whether __linux__ is defined (assuming you’re using GCC or a compiler that’s compatible with it). See the GNU C preprocessor manual for more information.

A largely irrelevant aside: the “Best One Liner” winner of the 1987 International Obfuscated C Code Contest, by David Korn (yes, the author of the Korn Shell) took advantage of the predefined unix macro:

main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}

It prints "unix", but for reasons that have absolutely nothing to do with the spelling of the macro name.
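
For the recommended check, a minimal sketch might look like this. The function name target_name and the labels it returns are hypothetical; only the __linux__ macro comes from the discussion above, and it is assumed to be provided by GCC or a compatible compiler.

```c
/* Returns a short label for the compilation target.
   __linux__ lives in the implementation's reserved namespace, so it cannot
   clash with a user identifier such as `int linux = 5;`. */
const char *target_name(void)
{
#if defined(__linux__)
    return "linux";
#else
    return "other";
#endif
}
```

Because the test happens in the preprocessor, only one of the two return statements is ever compiled.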

9. How to initialize all members of an array to the same value?

Answer:

Unless that value is 0 (in which case you can omit some part of the initializer and the corresponding elements will be initialized to 0), there’s no easy way. Don’t overlook the obvious solution, though:

int myArray[10] = { 5, 5, 5, 5, 5, 5, 5, 5, 5, 5 };

Elements with missing values will be initialized to 0:

int myArray[10] = { 1, 2 }; // initialize to 1,2,0,0,0...

So this will initialize all elements to 0:

int myArray[10] = { 0 }; // all elements 0

In C++, an empty initialization list will also initialize every element to 0. This is not allowed in C before C23:

int myArray[10] = {}; // all elements 0 in C++

Remember that objects with static storage duration will initialize to 0 if no initializer is specified:

static int myArray[10]; // all elements 0

Note that “0” doesn’t necessarily mean “all-bits-zero”, so using the above is better and more portable than memset(). (Floating point values will be initialized to +0, pointers to the null value, etc.)

Alternative Answer:

If your compiler is GCC, you can use the following syntax:

int array[1024] = {[0 ... 1023] = 5};

Check out detailed description: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html
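
When the array is too long to spell out and the GCC range extension is not available, a plain loop is a portable fallback. Note that this assigns the values at run time rather than initializing the array, and that the helper name fill_ints is hypothetical.

```c
#include <stddef.h>

/* Set every element of values[0..count-1] to `value`. */
void fill_ints(int *values, size_t count, int value)
{
    for (size_t i = 0; i < count; ++i)
        values[i] = value;
}
```

Unlike memset(), which writes the same byte everywhere, this works for any int value, not just 0 and -1.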

10. What is the difference between ++i and i++?

Answer:

  • ++i will increment the value of i, and then return the incremented value.

i = 1;
j = ++i;
// now i is 2 and j is 2

  • i++ will increment the value of i, but return the original value that i held before being incremented.

i = 1;
j = i++;
// now i is 2 and j is 1

For a for loop, either works. ++i seems more common, perhaps because that is what is used in K&R.

In any case, follow the guideline “prefer ++i over i++” and you won’t go wrong.

In any non-student-project compiler, there will be no performance difference. You can verify this by looking at the generated code, which will be identical.

It’s different for a C++ object, since operator++() is a function and the compiler can’t know to optimize away the creation of a temporary object to hold the intermediate value.

Alternative Answer:

i++ is known as Post Increment whereas ++i is called Pre Increment.

i++

i++ is post-increment because it increments i’s value by 1 after the value has been used in the expression.

Let’s see the following example:

int i = 1, j;
j = i++;

Here j = 1 but i = 2: the value of i is assigned to j first, and then i is incremented.

++i

++i is pre-increment because it increments i’s value by 1 before the value is used. It means the assignment to j happens after i has been incremented.

Let’s see the following example:

int i = 1, j;
j = ++i;

Here j = 2 and i = 2: i is incremented first, and the incremented value is then assigned to j.

As for which one should be used in the increment clause of a for loop: you can use either one, it doesn’t matter. The for loop will execute the same number of times.

for(i=0; i<5; i++)
   printf("%d ",i);

And

for(i=0; i<5; ++i)
   printf("%d ",i);

Both the loops will produce same output i.e. 0 1 2 3 4. It only matters where you are using it.

for(i = 0; i<5;)
    printf("%d ",++i);

In this case output will be 1 2 3 4 5.
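
The difference in the returned value can be isolated in two tiny functions. Their names are hypothetical; each one applies a single operator to the int that the pointer refers to.

```c
/* Returns the old value of *i, then leaves *i incremented. */
int post_increment(int *i) { return (*i)++; }

/* Increments *i first and returns the new value. */
int pre_increment(int *i) { return ++(*i); }
```

Starting from i = 1, post_increment(&i) returns 1 and leaves i at 2; a following pre_increment(&i) then returns 3.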

11. What is the difference between a definition and a declaration?

Answer:

A declaration introduces an identifier and describes its type, be it a type, object, or function. A declaration is what the compiler needs to accept references to that identifier. These are declarations:

extern int bar;
extern int g(int, int);
double f(int, double); // extern can be omitted for function declarations
class foo; // no extern allowed for type declarations

A definition actually instantiates/implements this identifier. It’s what the linker needs in order to link references to those entities. These are definitions corresponding to the above declarations:

int bar;
int g(int lhs, int rhs) {return lhs*rhs;}
double f(int i, double d) {return i+d;}
class foo {};

A definition can be used in the place of a declaration.

An identifier can be declared as often as you want. Thus, the following is legal in C and C++:

double f(int, double);
double f(int, double);
extern double f(int, double); // the same as the two above
extern double f(int, double);

However, it must be defined exactly once. If you forget to define something that’s been declared and referenced somewhere, then the linker doesn’t know what to link references to and complains about missing symbols. If you define something more than once, then the linker doesn’t know which of the definitions to link references to and complains about duplicated symbols.

Since the debate about what is a class declaration vs. a class definition in C++ keeps coming up, here is a quote from the C++ standard. At 3.1/2, C++03 says:

A declaration is a definition unless it […] is a class name declaration […].

3.1/3 then gives a few examples. Amongst them:

[Example: [...]
struct S { int a; int b; }; // defines S, S::a, and S::b [...]
struct S; // declares S
—end example

To sum it up: The C++ standard considers struct x; to be a declaration and struct x {}; a definition. (In other words, “forward declaration” is a misnomer, since there are no other forms of class declarations in C++.)
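
Since the examples above lean on C++ (class foo), here is a sketch of the same distinction in plain C. The identifiers call_count, scale, and point are hypothetical; in a real project the declarations would typically sit in a header and the definitions in exactly one .c file.

```c
/* Declarations: they name the entities and give their types. */
extern int call_count;
int scale(int value, int factor);
struct point;                      /* incomplete type: size not yet known */

/* Definitions: storage is reserved and the function body is supplied. */
int call_count = 0;
struct point { int x; int y; };    /* completes the type */

int scale(int value, int factor)
{
    ++call_count;
    return value * factor;
}
```

Repeating the declarations is harmless, but a second definition of call_count or scale elsewhere in the program would produce the duplicate-symbol error described above.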

12. What is the strict aliasing rule?

Answer:

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if you want to send a message to something you would have to have two incompatible pointers pointing to the same chunk of memory. You might then naively code something like this (on a system with sizeof(int) == 2):

#include <stdint.h>
#include <stdlib.h>

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
    // Get a 32-bit buffer from the system
    uint32_t* buff = malloc(sizeof(Msg));

    // Alias that buffer through message
    Msg* msg = (Msg*)(buff);

    // Send a bunch of messages    
    for (int i =0; i < 10; ++i)
    {
        msg->a = i;
        msg->b = i+1;
        SendWord(buff[0]);
        SendWord(buff[1]);   
    }
}

The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 7 is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn’t have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change at any time from anywhere by anybody. So to get an extra performance edge, and assuming most people don’t type-pun pointers, the strict aliasing rule was introduced.

Keep in mind, if you think the example is contrived, that this might even happen if you’re passing a buffer to another function that does the sending for you. Suppose instead you have:

void SendMessage(uint32_t* buff, size_t size32)
{
    for (size_t i = 0; i < size32; ++i)
    {
        SendWord(buff[i]);
    }
}

and rewrote the earlier loop to take advantage of this convenient function:

for (int i = 0; i < 10; ++i)
{
    msg->a = i;
    msg->b = i+1;
    SendMessage(buff, 2);
}

The compiler may or may not be able, or smart enough, to inline SendMessage, and it may or may not decide to load buff again. If SendMessage is part of another API that’s compiled separately, it probably has instructions to load buff’s contents. Then again, maybe you’re in C++ and this is some templated header-only implementation that the compiler thinks it can inline. Or maybe it’s just something you wrote in your .c file for your own convenience. Either way, undefined behavior might still ensue. Even when we know some of what’s happening under the hood, it’s still a violation of the rule, so no well-defined behavior is guaranteed. Simply wrapping the access in a function that takes our word-delimited buffer doesn’t necessarily help.

So how to get around this?

  • Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.
    union {
        Msg msg;
        unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
    };
  • You can disable strict aliasing in your compiler (-fno-strict-aliasing in GCC).
  • You can use char* for aliasing instead of your system’s word. The rules allow an exception for char* (including signed char and unsigned char). It’s always assumed that char* aliases other types. However this won’t work the other way: there’s no assumption that your struct aliases a buffer of chars.
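
A further option, not in the list above but built on the same character-type exception, is to copy the bytes with memcpy instead of aliasing them. The sketch below is an illustration under stated assumptions: Msg uses fixed-width uint16_t members (rather than the unsigned int of the earlier example) and is assumed to have no padding, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct Msg {
    uint16_t a;
    uint16_t b;
} Msg;

/* Copy the bytes of a Msg into a 32-bit word without aliasing it.
   Assumes Msg has no padding, so it is exactly four bytes. */
uint32_t msg_to_word(const Msg *msg)
{
    uint32_t word;
    _Static_assert(sizeof(Msg) == sizeof word, "Msg must be 4 bytes");
    memcpy(&word, msg, sizeof word);
    return word;
}

/* The reverse copy; the bit layout of the word depends on endianness. */
Msg word_to_msg(uint32_t word)
{
    Msg msg;
    memcpy(&msg, &word, sizeof msg);
    return msg;
}
```

Optimizing compilers commonly turn such a fixed-size memcpy into a plain register move, so this approach usually costs nothing at run time.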

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.

Footnote

The types that C 2011 6.5 paragraph 7 allows an lvalue to access are:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

13. Difference between malloc and calloc?

Answer:

calloc() gives you a zero-initialized buffer, while malloc() leaves the memory uninitialized.

For large allocations, most calloc implementations under mainstream OSes will get known-zeroed pages from the OS (e.g. via POSIX mmap(MAP_ANONYMOUS) or Windows VirtualAlloc) so it doesn’t need to write them in user-space. This is how normal malloc gets more pages from the OS as well; calloc just takes advantage of the OS’s guarantee.

This means calloc memory can still be “clean” and lazily-allocated, and copy-on-write mapped to a system-wide shared physical page of zeros. (Assuming a system with virtual memory.)

Some compilers can even optimize malloc + memset(0) into calloc for you, but you should use calloc explicitly if you want the memory to read as 0.

If you aren’t going to ever read memory before writing it, use malloc so it can (potentially) give you dirty memory from its internal free list instead of getting new pages from the OS. (Or instead of zeroing a block of memory on the free list for a small allocation).

Embedded implementations of calloc may leave it up to calloc itself to zero memory if there’s no OS, or it’s not a fancy multi-user OS that zeros pages to stop information leaks between processes.

On embedded Linux, malloc could mmap(MAP_UNINITIALIZED|MAP_ANONYMOUS), which is only enabled for some embedded kernels because it’s insecure on a multi-user system.
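
At the source level the two approaches compare as in the sketch below. Both helper names are hypothetical; the second function spells out by hand what calloc does in a single call.

```c
#include <stdlib.h>
#include <string.h>

/* Zero-initialized array of `count` ints via calloc; NULL on failure. */
int *zeroed_ints(size_t count)
{
    return calloc(count, sizeof(int));
}

/* The same result built from malloc + memset, for comparison. */
int *zeroed_ints_manual(size_t count)
{
    int *values;

    if (count > (size_t)-1 / sizeof *values)
        return NULL;               /* count * size would overflow */
    values = malloc(count * sizeof *values);
    if (values != NULL)
        memset(values, 0, count * sizeof *values);
    return values;
}
```

Besides being shorter, the calloc version leaves the implementation free to hand back pages the OS has already zeroed, as described above.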

In Conclusion

These are the 13 most commonly asked questions about C programming. If you have any suggestions or any confusion, please comment below. If you need any help, we will be glad to help you.

We, at Truemark, provide services like web and mobile app development, digital marketing, and website development. So, if you need any help and want to work with us, please feel free to contact us.

Hope this article helped you.

This post was first published on DevPost by Truemark.
