NOTE
The exact implementation of a data type depends on the system, its configuration, the programming language, and probably a lot more. This post targets data types in general, not a specific platform implementation. If you find it useful, or if you disagree, feel free to leave a comment.
Background
I’m new to software development, but I started out in a low-level world, with the traditional language C. As a beginner learning C, data types are important because every variable must be declared with one, and even more so when dabbling in the world of embedded software. That’s why I felt quite frustrated when moving on to higher-level languages like Python and JavaScript, where it is sufficient to just write `var` or nothing at all. The interpreter infers the data type, which does sound a bit sketchy to me.
Reading this blog post, I felt inspired to share some of my knowledge. The goal of this blog post is to go on a little journey through the basic data types, and find out how they differ, and why they are kind of all the same.
Data
No matter what type it is, the underlying format is always zeros and ones. This is crucial for my own understanding: modern computers process digital information as bits. Bits are the components of binary numbers, which in more rigorous terms is a base two system.
- Base two: [0-1] in each position
- Base ten: [0-9] in each position
For some reason we are familiar with base ten (perhaps because of our ten fingers), but for a computer base two is practical, as the voltage only has to be read as HIGH or LOW.
The number of bits determines how many different combinations the number can represent:
- 1bit: 2 different options
- 2bit: 4 different options
- 3bit: 8 different options
- 4bit: 16 different options
- 5bit: 32 different options
- 6bit: 64 different options
- 7bit: 128 different options
- 8bit: 256 different options (aka. one byte)
- 16bit: 2^16 different options (aka. `short`)
- 32bit: 2^32 different options (aka. `long`)
- 64bit: 2^64 different options (aka. `long long`)
Notice the pattern; it often comes in handy when deciding which data type to choose.
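If you want to see what these widths look like on a concrete machine, here is a minimal C sketch that prints the size and range of a few common types. The exact numbers depend on the platform, which is also why the fixed-width types from `<stdint.h>` exist:

```c
/* A minimal sketch: print the size and range of some common integer
 * types on the machine this runs on. Exact sizes are platform-dependent. */
#include <stdio.h>
#include <limits.h>
#include <stdint.h>

int main(void)
{
    printf("char:      %zu byte(s), %d..%d\n", sizeof(char), CHAR_MIN, CHAR_MAX);
    printf("short:     %zu byte(s), %d..%d\n", sizeof(short), SHRT_MIN, SHRT_MAX);
    printf("long:      %zu byte(s), %ld..%ld\n", sizeof(long), LONG_MIN, LONG_MAX);
    printf("long long: %zu byte(s), %lld..%lld\n", sizeof(long long), LLONG_MIN, LLONG_MAX);

    /* Fixed-width types remove the guesswork: uint8_t is always 8 bits. */
    printf("uint8_t:   %zu byte(s), 0..%d\n", sizeof(uint8_t), UINT8_MAX);
    return 0;
}
```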
Type
The type basically comes down to two terms: `signed` and `unsigned`. When unsigned, only zero and positive numbers can be represented; for 8 bits of data this means 0 to 255. The same 8 bits of data can instead be signed, in which case negative numbers are also represented, -128 to 127.
There are different ways of doing this, but a common way is two's complement. The most significant bit acts as the sign: 1 means negative, 0 means positive (or zero). Say we take 65 and want its negative counterpart:
- `0b 0100 0001` = 65
- Flip all the bits => `0b 1011 1110`
- Add one => `0b 1011 1111` = -65
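Here is a small C sketch of the same flip-and-add-one trick, assuming two's complement representation (which the fixed-width `intN_t` types guarantee):

```c
/* A small sketch of the flip-and-add-one trick above, using an 8-bit value. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t bits = 0x41;              /* 0b 0100 0001 = 65 */
    uint8_t flipped = ~bits;          /* 0b 1011 1110 */
    uint8_t negative = flipped + 1;   /* 0b 1011 1111 */

    /* Reinterpret the same 8 bits as a signed number. */
    printf("%d\n", (int8_t)negative); /* prints -65 */
    return 0;
}
```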
Lastly, two more types must be identified: `integer` and `decimal`. Integers are pretty straightforward: only "whole" numbers can be represented, no fractions.
1 OR 2 apples, NOT 1.5 apples
The decimal fraction is represented by the `decimal` group, often referred to as `float` or `double` depending on size. It is a bit more complex (great resource => Wikipage), but it is still just a signed binary number; the difference is in the interpretation. A `decimal` is split into two bit lengths: the significand part represents the value, and the exponent part represents where the decimal point should be placed. Everything else works the same way.
1 OR 2 apples, OR 1.5 apples
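To make the significand/exponent split concrete, here is a rough C sketch that pulls a float apart bit by bit. It assumes a 32-bit IEEE 754 float, which is what practically every modern platform uses:

```c
/* A rough sketch of splitting a float into sign, exponent and significand.
 * Assumes a 32-bit IEEE 754 float. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 1.5f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);             /* reinterpret the same 32 bits */

    unsigned sign        = bits >> 31;          /* 1 bit  */
    unsigned exponent    = (bits >> 23) & 0xFF; /* 8 bits */
    unsigned significand = bits & 0x7FFFFF;     /* 23 bits */

    printf("sign=%u exponent=%u significand=0x%06X\n",
           sign, exponent, significand);
    /* 1.5f => sign=0 exponent=127 significand=0x400000 */
    return 0;
}
```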
Data and type
All data types that I'm aware of are derived from their bit size, sign convention and the interpretation of their value. But what about `string`? That is just another way of interpreting a binary value; 65, for example, is the upper case letter A. Check out the ASCII standard to see all commonly used characters. These characters are just 8-bit integers, often called `char`.
A `string` is nothing more than a container holding a given number of `char`s, the same way an array can contain a given number of any data type. Another case is the `bool`, which is `false` when zero, else it is `true`.
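A small C sketch of the same idea: the bits stay put, only the interpretation changes:

```c
/* The bits don't change, only the interpretation does. */
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    char c = 65;
    printf("%d as a char is '%c'\n", c, c);   /* 65 as a char is 'A' */

    /* A string is just a sequence of chars (terminated by a zero byte in C). */
    char word[] = {72, 105, 0};
    printf("%s\n", word);                     /* prints "Hi" */

    /* A bool is false when zero, true otherwise. */
    bool flag = 65;
    printf("%d\n", flag);                     /* prints 1 (true) */
    return 0;
}
```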
Analyze a data type like this:
"sign convention" "number of bits" "interpretation of bits"
Example => `unsigned long int`
Why data types matter
When building software, I consider size important for memory usage, and subsequently performance. Of course the machines of our modern world are incredibly powerful, and how the hardware and operating system are configured matters a lot for how different bit sizes perform. But I don't see the point of using data types that are larger than what I need; there is no reason to waste potential clock cycles storing the value one using 64 bits.
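A minimal sketch of the memory side of that argument: the same million counters take eight times as much space stored as 64-bit values as they do as 8-bit values:

```c
/* A minimal sketch: the memory cost of oversized types. */
#include <stdio.h>
#include <stdint.h>

#define COUNT 1000000

int main(void)
{
    static uint8_t  small[COUNT];
    static uint64_t large[COUNT];

    printf("uint8_t  array: %zu bytes\n", sizeof small); /* ~1 MB */
    printf("uint64_t array: %zu bytes\n", sizeof large); /* ~8 MB */
    return 0;
}
```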
Control is probably even more important. Knowing what data types are going in and out of functions, and assigning them accordingly, massively improves readability, understanding and debugging. The last one because a lot of bugs won't even appear; the first two because you get to see what data you're working with, and how you can manipulate it to your advantage.
Even though data types essentially are the same, just a special configuration of eight or more bits, they all have strengths and weaknesses, and quite different use cases. Knowing data types means knowing what tool to use for the job, and that is actually very useful. And personally I find it liberating to know that the system isn't actually that complex, just a series of HIGHs and LOWs.