Discussion on: Thank you to byte-sized integers

Something to be aware of, at least with C/C++: using smaller variable types might not actually save any RAM at all.

If you do a basic loop like:

for (unsigned int i=0; i<10; i++) {}

With optimizations on, the variable "i" will usually never be in RAM to begin with; it lives entirely in a CPU register. By forcing a smaller data size, you could inadvertently be de-optimizing the code, because each iteration may need an extra instruction to truncate the value in the register to match the given data type (the compiler should be aware of this, though, and will usually optimize it away).
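A rough sketch of the two loop styles, assuming a typical optimizing compiler. The function names are made up for illustration, and whether the 8-bit version really costs an extra instruction depends on the compiler and target:

```cpp
#include <cstdint>

// Hypothetical helpers: both sum 0..n-1 and differ only in the counter's type.
constexpr unsigned sum_with_int_counter(unsigned n) {
    unsigned total = 0;
    for (unsigned int i = 0; i < n; i++) {  // counter matches the natural register width
        total += i;
    }
    return total;
}

constexpr unsigned sum_with_byte_counter(std::uint8_t n) {
    unsigned total = 0;
    // An 8-bit counter still occupies a full register. The compiler may need to
    // keep it wrapped to 8 bits, which can cost an extra instruction per pass.
    for (std::uint8_t i = 0; i < n; i++) {
        total += i;
    }
    return total;
}
```

Both give the same answer; the only thing the narrower counter can change is the generated machine code, which is easy to compare on a compiler explorer.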

In your other example:

const unsigned short int num = 1;

This is another case of the same thing. The compiler sees it as an unchangeable variable and can therefore replace it entirely with an inline literal when generating the assembly code, making the data size moot as well (but once again possibly needing extra instructions to truncate results to the forced data size).
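A small sketch of that, using the declaration from the post. The helper add_one is hypothetical, only there to show where a narrowing step comes in:

```cpp
#include <type_traits>

const unsigned short int num = 1;  // the declaration from the post

// A const integer with a constant initializer is usable at compile time,
// so the compiler is free to fold it into expressions as a literal.
static_assert(num + 1 == 2, "num is known at compile time");

// Arithmetic on types narrower than int is promoted first, so the result is
// int (or unsigned int on some platforms), never unsigned short.
static_assert(!std::is_same<decltype(num + num), unsigned short>::value,
              "unsigned short operands are promoted before arithmetic");

// Hypothetical helper: storing the promoted result back into an unsigned short
// requires narrowing it again, which is the "truncate" step mentioned above.
constexpr unsigned short add_one(unsigned short x) {
    return static_cast<unsigned short>(x + num);
}
```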

This is honestly why I love just using "auto" now. It deduces the type from the initializer (a plain integer literal gives you an "int", which is normally a size the CPU is comfortable with), so I'm not trying to hand-optimize little details.
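For reference, a quick sketch of what "auto" ends up as for a few initializers. The variable names are hypothetical; the point is that the type comes from the initializer itself:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical variables, just to show what "auto" deduces.
auto counter = 0;                 // int: plain integer literals are int
auto mask    = 0u;                // unsigned int, from the "u" suffix
auto big     = 0LL;               // long long, from the "LL" suffix
auto tiny    = std::uint8_t{1};   // std::uint8_t, only because we asked for it

static_assert(std::is_same<decltype(counter), int>::value, "literal 0 is an int");
static_assert(std::is_same<decltype(mask), unsigned int>::value, "0u is unsigned int");
static_assert(std::is_same<decltype(big), long long>::value, "0LL is long long");
static_assert(std::is_same<decltype(tiny), std::uint8_t>::value, "explicit type is kept");
```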

The main time to worry about specific data sizes, I think, is when dealing with repeated data, such as arrays and structs. This is especially true with hardware drivers, where a data structure may need to match the hardware's layout bit for bit.
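A minimal sketch of why arrays and structs are where the size matters. The pixel structs and the count are made up, and the totals assume a mainstream compiler that adds no padding between same-sized members:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical RGBA pixel, stored two ways.
struct PixelWide   { std::uint32_t r, g, b, a; };  // 4 bytes per channel
struct PixelPacked { std::uint8_t  r, g, b, a; };  // 1 byte per channel

constexpr std::size_t kPixelCount = 1000;

// The per-element difference gets multiplied by the element count.
constexpr std::size_t kWideBytes   = sizeof(PixelWide[kPixelCount]);
constexpr std::size_t kPackedBytes = sizeof(PixelPacked[kPixelCount]);

static_assert(kWideBytes == 16000, "1000 pixels at 16 bytes each");
static_assert(kPackedBytes == 4000, "1000 pixels at 4 bytes each");
```

One loop counter being a byte saves nothing, but a thousand pixels being a quarter of the size is real memory.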

Speaking of which, in C/C++, the smallest variable size is actually a single bit (a bit-field inside a struct), not a byte! But be aware of the CPU and memory controller's minimum "word" size. Allocating a single bit will use up to a "word" of data. On i386 and AMD64, we can read and write a single byte. Other systems have a minimum of 2-byte or 4-byte words. On these systems, if you want to write a single byte, the compiler behind the scenes actually generates code to pull the word into a register, replace the single byte, then write the register back as a full word. Very slow for single-byte writes, but exceptionally fast when writing bulk data sequentially!
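A small sketch of single-bit variables using bit-fields. The struct and helper names are hypothetical, and the exact struct size is up to the compiler:

```cpp
// Hypothetical status flags packed into bit-fields.
struct StatusFlags {
    unsigned ready : 1;  // a single bit: holds 0 or 1
    unsigned error : 1;  // another single bit
    unsigned mode  : 6;  // six bits: holds 0..63
};

// The fields add up to 8 bits, but the struct is padded out to its storage
// unit, so sizeof(StatusFlags) is implementation-defined (commonly 4 here).
static_assert(sizeof(StatusFlags) >= 1, "bit-fields still occupy whole bytes");

// Hypothetical helper: writes one field and reads it back. Under the hood this
// is a read-modify-write of the containing storage unit, not a 6-bit store.
constexpr unsigned store_mode(unsigned value) {
    StatusFlags flags{};   // all fields start at zero
    flags.mode = value;    // caller keeps value within 0..63
    return flags.mode;
}
```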

Also, despite being called a "char", it is really just a one-byte integer, effectively an i8 or u8 (whether plain "char" is signed depends on the compiler; "unsigned char" is the u8). The terms were made a very long time ago! Regardless of the terms, it is really all about how the underlying CPU handles things! These languages are simply pulling in raw CPU features and abstracting them away using simpler concepts :)
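A quick sketch of how "char" relates to the u8-style types. The constant name is made up; everything else is standard headers:

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

static_assert(sizeof(char) == 1, "char is the unit that sizeof counts in");
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");
static_assert(sizeof(std::uint8_t) == sizeof(char), "uint8_t is one char wide");

// char, signed char and unsigned char are three distinct types,
// even though plain char behaves like one of the other two.
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");
static_assert(!std::is_same<char, signed char>::value, "distinct types");

// Whether plain char is signed is implementation-defined (it is signed on
// x86 with GCC, Clang and MSVC, but unsigned by default on ARM Linux).
constexpr bool kPlainCharIsSigned = std::is_signed<char>::value;
```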

Vincent Milum Jr • May 2 '19

Also for reference, the u8, i8, u16, i16, etc. names actually came from a handful of C/C++ libraries, especially in the video game programming world. Rust merely adopted what these libraries were already doing and made it the standard, since it is easier to read/write ;)

And in your picture above, "long" and "long int" are the same type, and it is traditionally 32-bit, not 64-bit (it still is on Windows, although 64-bit Linux and macOS make it 64-bit). "int" is a variable number of bits depending on the architecture, and "long long" is the one guaranteed to be at least 64-bit! This is especially helpful to note when dealing with micro-controllers that may be 8-bit or 16-bit CPUs, or moving up to 64-bit CPUs.
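A minimal sketch of what the language actually promises about these sizes. The bits helper is hypothetical; the limits being checked are the minimums from the standard:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: storage size of a type in bits.
template <typename T>
constexpr std::size_t bits() {
    return sizeof(T) * CHAR_BIT;
}

// Only minimum widths are guaranteed for the classic types.
static_assert(bits<int>() >= 16, "int may be as small as 16 bits");
static_assert(bits<long>() >= 32, "long may be as small as 32 bits");
static_assert(bits<long long>() >= 64, "long long is at least 64 bits");

// The <cstdint> types are exact, which is the C++ spelling of i32 / i64.
static_assert(bits<std::int32_t>() == 32, "exactly 32 bits");
static_assert(bits<std::int64_t>() == 64, "exactly 64 bits");
```

If the size matters (file formats, hardware registers, network packets), the fixed-width names are the safer choice across 8-bit to 64-bit targets.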

Basti Ortiz • May 3 '19

Oh, boy. That's a lot of low-level stuff I've never even considered. And to think, C++ used to be classified as a "high-level language". Indeed, I have much to learn. Thanks for taking the time to write all of this down for me. It means a lot.