DEV Community

Chris White


What Does The Best Programming Language Really Mean: C Language

In the last installment of this series we talked about basic hardware fundamentals and assembly language. Now it's time to look at one of the most influential programming languages driving modern-day programming: C.

The Origin

C is certainly not the first high-level language that came after assembly. In fact there are around 20 languages that predate C, including COBOL, FORTRAN, and BASIC. Dennis Ritchie first developed the language in 1972, working closely with Ken Thompson. Both of them were on the same team developing UNIX at the time. The name C came from the language it succeeded: the B language.

Standardization

Brian Kernighan then suggested that Dennis write a book, and in 1978 the two published "The C Programming Language" (popularly known as the K&R book). It served as the first C specification, and the version of C it describes is sometimes referred to as "K&R C". In 1985 the American National Standards Institute (ANSI) released a first draft of the C specification. After much work it was ratified in 1989, becoming the first ANSI C standard, often referred to as C89. At the time of writing, C17 is the current standard while C2x is in progress (it's believed that it will be completed in 2023, becoming C23).

The actual C language doesn't do much on its own. To provide the functionality you would expect, the standard C library is heavily utilized. Most Linux distributions use glibc, which is the GNU implementation of the C library. Another standard C library is musl, which is more lightweight and used by Alpine Linux (popular as a base for container images). These libraries also implement some level of the POSIX specification on top of the standard C library. It's important to note that some C libraries also have extensions which provide functionality specific to that library, so not 100% of code is portable across systems.

Code Layout

C programs are primarily composed of two kinds of files: header files and source code files. Header files declare functions, structures, etc. so that other files know they exist. Source code files give the declared functions an actual code implementation. Here's a very simple example (note this excludes a few best practices for simplification purposes):

my_code.h

int my_code(int x);

my_code.c

int my_code(int x)
{
        return x + x;
}

main.c

#include <stdio.h>
#include "my_code.h"

int main(void)
{
        int y = my_code(6);
        printf("%d\n", y);
        return 0;
}

If I build this using gcc, assuming the files are in the same directory:

$ gcc -o my_binary main.c my_code.c
$ ./my_binary
12

The one gcc call looks simple but there's actually quite a lot going on, something that will be covered in the next installment as it will take a decent chunk of time to describe. One particularly interesting item to note is the presence of {} to mark a code block and ; to terminate a statement, both of which you'll also find in languages such as JavaScript.

C Features

So looking at the above I'll dissect it to mention some of the C features.

Preprocessor Directives

#include <stdio.h>
#include "my_code.h"

#include is what's known as a preprocessor directive. #include is similar to importing module code, though the technical implementation is vastly different: the preprocessor effectively pastes the contents of the header into the file. As for <> versus double quotes, <> references system-level header files and "" is meant to reference header files internal to the project. The include directive also works with paths, so these:

#include <netinet/tcp.h>
#include "../my_code/my_code.h"
#include "/some/path/here/header.h"

are all possible ways includes can be written (though absolute paths are not very maintainable and not recommended). It's also possible to do weird things like this:

#include <stdio.h>
#include "stdio.h"

Though you should never do that. <stdio.h> in particular is one of the standard C library headers and deals with standard buffered input and output. It's what declares printf, short for "print formatted". There are also a few other preprocessor directives which can:

  • Define values
  • Give a specific type a label
  • Make the existence of code conditional

This makes them extremely powerful for putting together code in an organized fashion.

Typing

In some languages you may see code similar to the following:

my_variable = "Hello World"

You don't need to tell the underlying language that this is a string. In contrast, C is a statically typed language, meaning you have to actually say what type a variable is when you declare it:

int my_variable = 5;

Integer types can also be unsigned, which gives up negative values in exchange for roughly doubling the maximum possible value:

unsigned int my_variable = 5;

Specific types also have specific byte sizes. As an example:

int my_number = 10;
printf("%zu\n", sizeof(my_number));

This will print 4 on most current platforms, where the int type is 4 bytes (the C standard only guarantees a minimum range, so the exact size depends on the compiler and target). The main advantage here is that data types are defined at a very low level and fit to an exact byte size. This makes the types very lightweight compared to some modern languages, which can have various attributes attached to even primitive data types. Though this comes at the cost of occasionally having to go to great lengths to achieve operations which, in those languages, are backed by the additional attributes attached to said types.

Characters are also technically implemented as values which can be mapped to integers. Operating systems have a concept of character encoding which maps binary values to their respective characters. One of the more commonly used is ASCII encoding, which goes from 0 to 127 (or 0 to 255 in the extended versions). That means both of these variables hold a lowercase a:

char x = 'a';
char y = 97;

Important to note here is that "" is for strings and '' is for characters.

Arrays

Arrays are a way to contain a number of values in a single item. Some programming languages refer to similar functionality as lists. Arrays work on the backend by allocating sequential blocks of memory sized for the data type being held. For example:

int my_ints[4] = { 1, 7, 9, 10 };  // explicit size
int my_ints[] = { 1, 7, 9, 10 };   // size inferred from the values (use one form or the other)

The first form allocates a specific number of elements. If no size is given in the [], it will automatically be set to the number of elements listed. Now while you can declare arrays without giving them values, the result is not what you'd expect:

#include <stdio.h>

int main(void)
{
        int x[4];
        printf("%d\n", x[1]);
        return 0;
}

The output on my machine:

$ ./my_binary
32766

The value is simply whatever happened to be in that memory beforehand; the array was never initialized, so its contents are indeterminate and can differ between runs, compilers, and machines. It's not very useful, and this type of behavior should not be relied on. There is a way to take a fixed-size array and assign all the values a predetermined default:

int x[4] = {0};

This will give a result that's easier to reason about and make debugging less of an issue. Another interesting concept with arrays is that their first element starts at 0 and the last element is size of array - 1:

#include <stdio.h>

int main(void)
{
        int my_ints[4] = { 1, 7, 9, 10 };
        printf("%d\n%d\n%d\n%d\n", my_ints[0], my_ints[1], my_ints[2], my_ints[3]);
        // deliberately out of bounds: valid indexes are 0 to 3
        printf("%d\n", my_ints[4]);
        return 0;
}

The first printf will print all the array values and the second printf has undefined behavior, since it tries to access memory past the bounds of the array. Iteration is another interesting concept. In C this occurs via a for loop most of the time, which iterates over array indexes. There are two basic ways to set this up:

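        // static declaration (the size must be a constant expression;
        // a plain int variable would make this a variable-length array,
        // which cannot take an initializer)
        enum { ARRAY_SIZE = 4 };
        int my_ints[ARRAY_SIZE] = { 1, 7, 9, 10 };
        int array_size = ARRAY_SIZE;

        // dynamic declaration (use this instead of the static one above)
        int my_ints[] = { 1, 7, 9, 10 };
        int array_size = sizeof(my_ints) / sizeof(my_ints[0]);

        // loop
        for (int i = 0; i < array_size; i++) {
                printf("%d\n", my_ints[i]);
        }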

The dynamic declaration works because an array is a contiguous segment of memory holding a number of elements of one specific type. It's important to note that sizeof works in bytes, so the number of elements is the total size of the array in memory divided by the size of one of the elements it holds. Because the last element's index is array size - 1, the loop condition uses < array_size. Using <= instead would cause what's commonly known as an "off by one" bug.

Strings

There technically isn't a string type per se. Instead a string is an array of characters terminated by a null character, commonly written as \0. An example of strings in C:

#include <stdio.h>

int main(void)
{
        char hello_world1[] = "Hello World";
        char hello_world2[12] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
        char hello_world3[12] = { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 0 };
        printf("%s\n%s\n%s\n", hello_world1, hello_world2, hello_world3);
        return 0;
}

The output gives:

$ ./my_binary
Hello World
Hello World
Hello World

The "" form here, as mentioned, is for strings specifically and lets you omit the \0 null character, which the compiler adds for you. Interestingly enough, while string is not a specific data type in C, the C library does contain string.h with some helpful functions for working with strings.

Structures

Structures (sometimes referred to as "structs") are a data type in C used to encapsulate various data types for structured access. It's the closest the language has to the concept of objects, albeit much more primitive. An example declaration and instantiation:

#include <stdio.h>

typedef struct {
        int x;
        int y;
        int z;
} Numbers;

int main(void)
{
        Numbers my_numbers = { 1, 2, 3 };
        Numbers my_numbers2 = { .x = 1, .z = 3, .y = 2 };
        printf("%d\n%d\n", my_numbers.z, my_numbers2.z);
        return 0;
}

First off typedef is an interesting part of the language which lets you essentially alias a specific type with a name. Technically you could do something like struct Numbers but you would have to refer to it as being of type struct Numbers all over the place which is a bit too verbose. typedef allows you to alias this instead so you only have to type Numbers for the variable type making it easier to work with if you're used to the way modern languages handle it. The format:

Numbers my_numbers2 = { .x = 1, .z = 3, .y = 2 };

is actually an initialization method (designated initializers) introduced in a later C standard (C99). It allows for assignment similar to how some languages handle keyword arguments. Member access on a basic level uses the . notation you might see in a number of other languages for things like object attribute access.

Pointers

Pass by value and pass by reference are concepts you may come across in programming languages. Pointers are essentially how C does pass by reference. Instead of passing in an entire value you simply hold a variable that points to where the value you want lives in memory. This is primarily something that came out of the need to squeeze out performance in the early days of computing when memory, CPU, network throughput and disk space were not in the abundance they are today. Here is an example of a pointer declaration and instantiation:

#include <stdio.h>

int main(void)
{
        int x = 3;
        int * y = &x;

        printf("%d\n", *y);
        return 0;
}

This will output the value which is pointed to by y, which is the value of x, or 3. & is used to get the address of a value, which is what a pointer needs to point to. * is then used to dereference the pointer and obtain the value at the memory address it's pointing to. Note that since pointers need to point to something, they can only be given the address of something that actually exists in memory, so this won't work:

int *y = &3;

It fails to compile since 3 is just a literal number and doesn't have an address assigned to it. Another interesting tie-in with pointers is the ability to traverse arrays using pointer math. As an example:

#include <stdio.h>

int main(void)
{
        int my_numbers[] = { 1, 2, 3, 4 };

        printf("%d\n%d\n", *my_numbers, *(my_numbers + 1));
        return 0;
}

This works because, in most expressions, an array name is treated as a pointer to the first item in a sequence of memory blocks. So adding 1 to that pointer moves on to the next sequential memory block, or the next element in terms of an array. Unfortunately this creates an interesting predicament security-wise:

int my_numbers[] = { 1, 2, 3, 4 };
printf("%d\n%d\n", *my_numbers, *(my_numbers + 7));

Despite the fact that the array doesn't have 8 elements, my compiler is fine with letting this pass and the OS is fine running it, even though reading past the end of the array is undefined behavior. The danger here is that pointers give access to memory. A malicious actor could use pointer bugs in code to potentially read memory they shouldn't or put things into memory to be executed as malicious code.

Dynamic Memory

So what if you don't know the size of an array until runtime? The C library has a header called stdlib.h which declares functions to dynamically allocate such memory. It also lets you resize that memory. As an example:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        int original_size = 4;
        int * my_numbers = (int*) calloc(original_size, sizeof(int));
        my_numbers[0] = 10;
        printf("%d\n", my_numbers[0]);

        my_numbers = realloc(my_numbers, sizeof(int) * (original_size + 4));
        my_numbers[7] = 23;
        printf("%d\n", my_numbers[7]);

        free(my_numbers);
        return 0;
}

So first off is a call to calloc. This is different from malloc, which allocates a memory block but doesn't set its contents to anything; calloc sets every byte to zero. The first argument is how many items, and the second argument is the size of each item in bytes. The (int*) is called a cast, which is essentially telling the compiler to treat the returned memory as a pointer to ints (in C the cast is optional here, since the void * that calloc returns converts automatically). Next is realloc:

my_numbers = realloc(my_numbers, sizeof(int) * (original_size + 4));
my_numbers[7] = 23;
printf("%d\n", my_numbers[7]);

The first argument is the memory reserved by calloc and the next is the new total size in bytes (not the amount to grow by). Unlike calloc you have to do the size multiplication yourself, and the added elements are not zeroed. This gives us a resized array with 8 elements, allowing my_numbers[7] to work. Now for the most important part:

free(my_numbers);

This frees the memory. For a small program like this, skipping it technically won't harm you as the OS will reclaim the memory allocated to the program when it exits. The issue is with long running programs, like a web server for example. As long as memory from malloc or calloc is not freed, it's unavailable for anything else. If the allocation is part of some continual loop and never freed, it will start to eat up system memory until none is left. This is what is known as a memory leak. That's why it's a best practice to always call free() on dynamic memory once you're done with it.

Conclusion

So this is a somewhat general view of C. The explanation on pointers in particular was scaled down as that alone could fill up an entire article. So some things to take back from C (certainly not an all inclusive list):

  • Types being bound to byte sizes + having signed/unsigned allows for interesting performance optimizations
  • C alone does not have a lot of practical usage without C standard library to support it
  • Pointers and memory allocation, while powerful, can have disastrous consequences if not used properly
  • Variable initialization can get rather weird at times if working with pointers or dynamic memory
  • There's not a wide array of data types in C, so they would have to be hand crafted if needed (such as something like a hash table/map/dictionary)
  • C is very much based on restricted resources which doesn't map well to modern hardware (with the exception of embedded systems)
  • Due to how primitive C is, it can take a lot more time to do tasks that are simple in modern languages, such as communicating with JSON REST APIs

Knowing how C works, however, can help in understanding how easy it is to work with some more modern languages and potentially how they might have implemented things. In the next installment (which may take about a week or so to write up) I'll be looking at how C code gets compiled, to help in understanding the path from code to a running process on your machine.
