Taqmuraz
C and C++ don't give you actual control on memory

It's common to think that Java, C# and Python sit at a high level of abstraction over the hardware, while C and C++ are low-level, close to the hardware, giving you full control over that exact hardware: CPU and memory.
And it's true that you can't allocate memory manually in C#, Java or Python, just as you don't have complete control over what the machine does. These languages keep enough distance from the hardware that you, the programmer, may forget about memory management and platform-dependent code. Right. But what about C and C++?

C standard and Undefined Behavior

It's also commonly known that C and C++ have such a thing as undefined behavior (UB). Usually people use "UB" for something that causes weird bugs or changes program logic when switching platforms. In reality, UB means that when your code's behavior is not defined by the standard, or is explicitly stated to be undefined, the compiler may generate any code it likes, and your program may do whatever the compiler developers designed it (or made it by mistake) to do. The compiler assumes your code never triggers UB, so when it does, it won't warn you.
Undefined behavior is not unique to the C/C++ standards. The Common Lisp standard, for example, also uses the term "undefined" to mark where compiler developers have freedom in designing and optimizing a Common Lisp implementation. Though I am not aware of a Common Lisp implementation that actually leaves such places undefined: in some cases they extend the standard, filling the "undefined" gaps; in other cases they signal an error when an "undefined" scenario happens.
What is really unique about C/C++ is entering undefined behavior silently, considering it normal, and moving all responsibility for UB from compiler developers to compiler users. That could be kind of fair, if the programs you write did what you mean them to do. But with C/C++ programs, that is not the case.

Code you write is not the code you run

C/C++ compilers aggressively apply a bunch of optimizations, many of which are surprising, unnecessary, and dangerous for most programmers. Such optimizations may be enabled with the -O, -O1, -O2, -O3 flags, and some of them may afterwards be disabled with their own dedicated flags.
Here is an example. Everyone who programs in C or C++ knows that if two pointers have the same binary value, they are equal. Now it's time for a surprise, because the standard says:
C standard 6.5.10.7

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Nothing is said about pointers being just integer byte addresses. Instead, the standard treats pointers as abstract entities at a level above raw memory. Why doesn't it say that pointers are equal if their integer addresses are equal? Simple: because the C standard thinks they aren't. Don't believe me? Check this:

#include <stdio.h>

int main(int argc, char** argv) {
  int x;
  int y;
  int* a = &x + 1;
  int* b = &y;
  printf("%p, %p, %s\n", (void*)a, (void*)b, a == b ? "true" : "false");
  return 0;
}

A simple program. Its output seems clear to every C programmer: two identical pointers, then true. Compiling, running, and... yes.

g++ pointers_eq.c -o peq; ./peq
0x7ffc4bd50194, 0x7ffc4bd50194, true

So the C standard wins? Not yet. We must enable optimizations so the miracles start happening.

g++ -O pointers_eq.c -o peq; ./peq
0x7ffffb069694, 0x7ffffb069694, false

Well, we see with our own eyes that the addresses of a and b are equal, so how are the pointers not? Because the compiler sees that a and b have different origins, and assumes those origins are different enough to replace a == b with false.
You may object that nobody is going to write such a program, exploiting the stack layout. Then I have another example for you.

#include <stdio.h>

void fun (int* a, float* b) {
  *a = 0;
  *b = 3.33;
  *a = *a + 10;
}

typedef union {
  int a;
  float b;
} ab;

int main(int argc, char** argv) {
  ab x;
  fun(&x.a, &x.b);
  printf("%d\n%f\n", x.a, x.b);
}

Two pointers, a and b, share the same memory, but point at different types. Naive logic says that after calling fun, x.a will hold some garbage value and x.b will hold a float value close to 3.33.

g++ two_pointers.c -o tp; ./tp
1079320258
3.330002

Now, using the -O3 flag:

g++ -O3 two_pointers.c -o tp; ./tp
10
0.000000

Let me explain what happened. The C standard has a rule named strict aliasing. It regulates how a program may access values in memory. Behavior is undefined when a program reads or writes the same memory through pointers of incompatible types. While optimizing the function fun with -O3, the compiler assumed that a and b point to different memory addresses, because otherwise the program would violate the strict aliasing rule, causing UB. Then it transformed the code of fun into the following:

void fun (int* a, float* b) {
  *a = 0;
  *b = 3.33;
  *a = 10;
}

Because *a is known to be 0 at that point, the compiler removes the addition and assigns 10 directly.

No real control

There are a lot more such cases; the most confusing one for me is that signed integer overflow causes UB.
Writing a program that causes UB is so easy that most existing C programs have it somewhere.
You don't have real control over the hardware, because as soon as you start doing things you couldn't do in Java, C# or Python, you step outside the C standard and cause UB. The C standard does not provide a sandbox where you can do whatever you want with nobody to blame but yourself. Instead, the compiler aggressively transforms your code, assuming you never step outside the standard's bounds. You may argue that you can simply avoid compiler optimizations. Yes, you can, but C and C++ are shamefully slow without them.

Control is a burden

A language that gave you complete control over memory (let's call it the naive C) would make programs very difficult to optimize. Usually programs don't use all of a language's means, which allows language developers to optimize the common cases (like caching short string values in processor registers). But the naive C would allow the programmer to modify any memory at any time, and any function call could cause side effects, changing any byte of memory it is told to change.
The naive C couldn't rely on aggressive optimizations, and without them programs won't show performance miracles. Writing programs that are optimized from the start would then require deeper knowledge of the target hardware and above-average intelligence (which is rare). And even if you could do that, it would require rewriting all the C/C++ standard header files, because their runtime speed depends a lot on aggressive optimizations.

C is an abstract, high level language

Reading the C standard, I can't shake the impression of being distanced from hardware and reality. The standard tries to stand close to the hardware, but avoids specifying the cases of exact, close interaction with it. Forget that pointers are integer addresses, forget that integer overflow is a thing, forget that memory is just bytes and bits with no types. Types are supposed to be just hints on how to use memory, but in fact they are much more than that.
I don't mention C++ here, because it's even more distant from the hardware than C. Most C++ programmers avoid raw pointers and C-like structs, preferring complex object-oriented models with overloaded operators/constructors/destructors. Even though C++ "extends" C, they are very different in real-world code.

Final

C/C++ don't let you actually control memory; instead they let you play with memory. They give you a feeling of control, of making an impact on the computer's state, and nothing more than that.
Thinking that there are operations you can do in C/C++ and can't do in other (better) languages is a widespread fallacy, based on a lack of experience with other, "foreign" languages and concepts.
I develop graphics, games and compilers using zero lines of C/C++ code. It is possible, and it is a pleasure.
