DEV Community

Is "C Programming language" Still Worth Learning in 2021?

Arpit on July 28, 2020

C has been an evergreen language and played a prominent role for most of the system developments that took place in the last few decades. C program...


Josef Richter • Jul 28 '20

It's worth learning because it teaches you the low-level concepts. Even if you never write any production code in C, it will make you a better developer in any other language.

Vlastimil Pospichal • Jul 28 '20

The same goes for Lisp or Forth.

Michel Renaud • Jul 28 '20

Argh... I still remember trying to balance parentheses in university. That was a VERY long time ago!

Forth is one of those languages that, among others, I've been wanting to take a look at for many years. I remember reading in a computer magazine in the 80s about a home computer that had Forth as its built-in language instead of BASIC like all the others: en.wikipedia.org/wiki/Jupiter_Ace

Vlastimil Pospichal • Jul 29 '20

You need to define short functions and macros. The parentheses are then much more readable.

Benjamin Trent • Jul 28 '20

LISP! ❤️

Tony Smith • Jul 29 '20

“Let’s Insert Some Parentheses”!

Vlastimil Pospichal • Jul 29 '20

Compare the number of parentheses in a Lisp application and the same application in Java. It's almost the same.

hidden_dude • Jul 29 '20

Lisp is great, but I didn't learn any memory management concepts from it. Nor is it typically used for hardware-level tasks.

Come on, let's not get carried away here.

Vlastimil Pospichal • Jul 29 '20

Lisp uses a garbage collector. It is also used at the hardware level instead of the operating system.

hidden_dude • Jul 29 '20

Yes, I know that, but when I want to learn about manual memory management, low-level driver development and things of that nature, Lisp is of little use.

Lisp is for learning about higher-level constructs.

Emilie Ma • Jul 28 '20

100% this - been going through C for CS50x and it's definitely helped me appreciate some of the lower-level ideas.

Swastik Baranwal • Jul 28 '20

Of course C is worth learning, and it always will be. It teaches you almost every topic that other languages never will.

pentacular • Jul 29 '20

For example? :)

I'm not aware of anything unique to C, and I know the language very well.

pentacular • Jul 28 '20

It's worth learning in order to fully appreciate the wonders of undefined behavior.

Consider the following C program -- what does it do?

#include <stdio.h>

int main() {
  int i = 1;
  printf("%d, %d", i++, i);
}
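If the intent was "print the old value, then the new one", a minimal well-defined sketch gives the increment its own statement, so a sequence point separates the write from the later read. `format_pair` is a hypothetical helper, and it formats into a buffer with `snprintf` instead of calling `printf` only so the result is easy to check:

```c
#include <stdio.h>

/* Hypothetical helper: a well-defined version of the printf example.
   The read of the old value and the increment are separate full
   expressions, so nothing is modified and read without a sequence point. */
int format_pair(char *buf, size_t len)
{
    int i = 1;
    int old = i;   /* read the original value */
    i++;           /* modify i in its own statement */
    return snprintf(buf, len, "%d, %d", old, i);
}
```

Called with a 16-byte buffer, it leaves `1, 2` in the buffer on every conforming implementation.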

Andrew Harpin • Jul 28 '20

This is the same in any language, there are many ways to do something, not all of them are advised.

Understand the language, learn the best practices and fundamentally write decent code.

I realise this is easier said than done, but it is the golden principle we should be adhering to in our products.

Sergiy Yevtushenko • Jul 29 '20 • Edited

learn the best practices

But keep in mind that they are not a dogma and should be broken if there is a significant reason for that.

Andrew Harpin • Jul 29 '20

Agreed, if you can justify the need to do something with a particular unconventional approach, then go for it.

BUT it must be well documented, with the emphasis on how it works and that changes must be carefully considered.

Sergiy Yevtushenko • Jul 29 '20

Again, it depends. For example, I'm currently working on a personal project (it's not C but Java). Among the goals of this project is the search for a new style of writing code. I often rewrite code several times in order to make it easier to read and/or more reliable. When I get code that looks satisfactory, I often discover that it violates one or more Sonar rules (i.e. "best practices"). In the vast majority of cases the considerations behind those rules are no longer valid because the whole approach is different. What I'm trying to say is that "best practices" are a set of compatible rules/guides/considerations, and there might be more than one such set.

#benaryorg • Jul 28 '20

This isn't quite "undefined behaviour", just weird syntax and one of those moments when you ought to know operator precedence and evaluation order, which is pretty much the same in every language (in some languages with dialects or multiple compilers it may just be more apparent).
Undefined behaviour would be something along the lines of:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int main(void)
{
    const size_t size = 1024*1024;

    char *data = malloc(size);
    if(!data) { err(1,"malloc"); } // replace with assert if you don't have err.h from libbsd

    memset(data,0,size); // write zeroes
    free(data);
    memset(data,0xff,size); // write ones

    return 0;
}
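For contrast, here is a sketch of the same buffer lifecycle without the use-after-free: every write happens before `free`, and the owning pointer is cleared afterwards. The `struct buffer`, `buffer_init` and `buffer_release` names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner type: keeps the pointer and its size together. */
struct buffer {
    unsigned char *data;
    size_t size;
};

/* Allocates and zero-fills size bytes. Returns 0 on success, -1 on failure. */
int buffer_init(struct buffer *b, size_t size)
{
    b->data = malloc(size);
    b->size = b->data ? size : 0;
    if (b->data == NULL)
        return -1;
    memset(b->data, 0, size);   /* write zeroes while the memory is still ours */
    return 0;
}

/* Frees the memory and clears the pointer, so a second release is a
   harmless free(NULL) instead of a double free. */
void buffer_release(struct buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

Writing `0xff` bytes after `buffer_release` would now mean passing `NULL` to `memset`. That is still undefined behaviour, but on typical platforms it faults at the point of the mistake instead of silently scribbling over freed memory.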

pentacular • Jul 29 '20 • Edited

It's undefined behavior of the case that "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored."

This happens because there is no sequence point between the i++ and the i.

Precedence doesn't come into this.

Here's a more interesting variation on your example.

Can you spot the undefined behavior here? :)

#include <stdlib.h>

int main() {
  char *data = malloc(1);
  if (data) {
    free(data++);
    data++;
  }
}

#benaryorg • Jul 29 '20

Ah, I see.
So C literally doesn't define any order on those instructions and it's up to the compiler?
Wouldn't have expected that, though I've seen the example a few times.
Excuse my hasty assumption then please.

First off, I'd really appreciate it if you specified the syntax in the code blocks so syntax highlighting kicks in ;-)
Something along the lines of that (without the backslashes, markdown dialect doesn't allow nested fences):

\`\`\`c
int main(void)
{
    return 0;
}
\`\`\`

Actually, no, I can't see the undefined behaviour in that example.
If I see correctly, in all cases you're only manipulating the pointer itself. Since free takes the pointer by value and not by reference, the call gets a copy of data from before the increment, and the pointer itself is advanced twice; either way, the pointer is invalid.
What am I missing?

pentacular • Jul 29 '20

Pointers are only well defined for null pointer values or when pointing into or one past the end of an allocated array.

The first increment satisfies this, since it happens before the free occurs.

After the free, the pointer value is indeterminate, and so the second increment has undefined behavior.
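A sketch of the rule stated above, using a hypothetical helper `fill_sequence`: pointer arithmetic stays within the array or one past its end, the pointer handed to `free` is exactly the one `malloc` returned, and nothing touches the pointers afterwards:

```c
#include <stdlib.h>

/* Fills n ints with 0, 1, 2, ... and returns their sum, or -1 on failure. */
long fill_sequence(size_t n)
{
    int *base = malloc(n * sizeof *base);
    if (base == NULL)
        return -1;
    int *end = base + n;   /* one past the end: valid to form and compare, not to dereference */
    long sum = 0;
    int next = 0;
    for (int *p = base; p != end; p++) {   /* p only ranges over [base, end] */
        *p = next++;
        sum += *p;
    }
    free(base);            /* free the original pointer, not an advanced copy */
    /* From here on base and end are indeterminate: no reads, no ++, no comparisons. */
    return sum;
}
```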

#benaryorg • Aug 7 '20

But you're not actually using that pointer in the code, so I fail to see how that's undefined behaviour.
An invalid pointer which isn't used still doesn't cause any runtime issues, or is there something about that too in the standards?

pentacular • Aug 7 '20

The last increment is applied to the pointer while its value is indeterminate, producing undefined behavior.

For example it might behave like a trap representation.

Regardless, the program cannot be reasoned about after this point. :)

Matthew Stokes • Jul 28 '20

Interesting. I didn't think it was undefined either, but from what I could read up on: in most cases the compiler will handle it as you expect, but according to the spec it doesn't have to, which is why it is undefined?

There is no guarantee in the C specification that the increment of i will be done by the time i is used as the third argument to printf(). So you could reasonably get 1, 1?

I may well have misunderstood though!

pentacular • Jul 29 '20

I think you're imagining that the operations occur in an unspecified order, as would be the case for

foo(a(), b());

There is a sequence point when a call is executed, so a() and b() occur in some distinct, if unspecified, order.

The program will not have undefined behavior, but may have unspecified behavior (if it depends on the order of those calls), but we can continue to reason about the C Abstract Machine for both cases.
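When the order of `a()` and `b()` does matter, one way to pin it down is to call them in separate statements, since each full expression ends in a sequence point. The bodies of `a`, `b` and `foo`, and the `trace` buffer, are hypothetical stand-ins for the names in the snippet above:

```c
/* Hypothetical stand-ins that record the order in which they run. */
static char trace[2];
static int trace_len = 0;

static int a(void) { trace[trace_len++] = 'a'; return 1; }
static int b(void) { trace[trace_len++] = 'b'; return 2; }
static int foo(int x, int y) { return 10 * x + y; }

/* foo(a(), b()) may run a and b in either order (unspecified, not undefined).
   Splitting the calls into separate statements fixes the order. */
int call_in_order(void)
{
    trace_len = 0;
    int x = a();   /* sequence point at the end of this full expression */
    int y = b();
    return foo(x, y);
}
```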

foo(i, i++);

There is no sequence point between i and i++, so the read of i and the modification of i are unsequenced (they may happen in either order, or even overlap), leading to a violation of the C Abstract Machine and producing undefined behavior.

We cannot reason about the program from this point onward.

Matthew Stokes • Jul 28 '20 • Edited

Print 1 and then 2? Genuinely curious where is the undefined behaviour? :)

Matthew Stokes • Jul 28 '20

Ah, I see it now. There is no guarantee the increment will happen before the print. Only before the next sequence point!

pentacular • Jul 29 '20

The increment must happen before the print, as there is a sequence point between the evaluation of the arguments and the call.

But there are no sequence points between the evaluations of the arguments.

Leading to undefined behavior of the case that "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored."

Matthew Stokes • Jul 29 '20

Thanks for clarifying! That makes more sense.

Vlastimil Pospichal • Jul 28 '20

This is one of the reasons I write prototypes and tests. I'll try it.

Comment deleted

pentacular • Sep 10 '20

C programs are understood in terms of the CAM (C Abstract Machine).

The compiler's job is to build a program that produces the same output as the CAM would for a given program.

The CAM says that a variable can only be read, or read-to-modify, once between two sequence points.

There are no sequence points between the i++ and i+1, so this produces a read/write conflict, which means that the program has undefined behavior in the CAM, and so the compiler can do whatever it wants.

It could crash, or print out 23, 37 or -9, 12, and these would all be equally correct behaviors.

pentacular • Jul 28 '20

In that case, I think you missed the point -- but I look forward to explaining why your results are wrong. :)

Vlastimil Pospichal • Jul 28 '20 • Edited

It's funny. It first uses i, then increments i, then uses i as the second parameter.

The result is the same:

#include <stdio.h>

int main() {
  int i = 1;
  printf("%d, %d", i++, i+1);
}

pentacular • Jul 29 '20 • Edited

Your results are wrong. :)

They're wrong because they show how your implementation decided to handle this undefined behavior, this time, and don't reflect how C works.

pypdeveloper • Jul 28 '20

I feel that learning C teaches you the real basics of programming.

John Colagioia (he/him) • Jul 28 '20

It's like anything in programming, or engineering, or probably literally any field, to me: You don't need to know C, just like you don't need to know how your car's engine works or the entire bus and train schedule. But knowing it will make a lot of things clearer and will eventually be useful, when you inevitably need to read some old piece of code you rely on to figure out why it's going wrong.

But by the same token, don't imagine that C is the "bottom of the stack." If you're looking at C to believe that you're now the stereotypical gray-bearded "real programmer," keep in mind that there's assembly language, microcode, computer architecture, fabrication, and some physics underneath that you need to explain some fringe aspects. And if all you need to do is make the logo a little bit greener, this may not be the time to dig deeper...

#benaryorg • Jul 28 '20

You don't need to know C, just like you don't need to know how your car's engine works […] will eventually be useful, when you inevitably need to read some old piece of code […]

This is an analogy I hear rather often, but I think it does miss a vital point.
It doesn't just help you when something does inevitably break.
The analogy I'd much prefer to hear from people is something along the lines of "when you know how your car's engine works, you might have a better understanding of the general handling of the car, for example if you knew what the oil in the motor was for exactly, would you care more or less about it?".
I haven't quite found the perfect words for such an analogy though.
What I mean, though, is that a basic understanding often prevents issues before they arise, and can sometimes save a hell of a lot of work even when no incident as such is involved at all.

Just a small complement to what you wrote; I agree with what you said either way.

John Colagioia (he/him) • Jul 29 '20

In my defense, you'll notice that I never referred to things breaking. You hit the nail on the head as to why: It's just not that simple.

#benaryorg • Jul 29 '20

After rereading the comment: yes, correct. However, you're implying that some change is made or some code is read, which in my experience doesn't even need to be the case.
Often it's just useful to know how memory works (and why adding a character in the middle of a string is so terribly complex, or terribly expensive if you're lazy), even if it's not part of any C code at all.
The mindset of string manipulation in C is inherent to every language I know, but C is the only language that makes it painfully apparent why you can't just move half the string one byte further in memory to insert another character.
C's pointers also make it super easy to understand rope libraries, because it all boils down to "don't move, just store three pointers in a list¹ instead".
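A sketch of both approaches from the comment above, with hypothetical names throughout: `insert_char` shifts the tail of the string with `memmove` (cost proportional to the string length on every insert), while `split_insert` describes the edited text as three (pointer, length) pieces and copies nothing until `join_pieces` flattens it. This is a simplified piece list, not a full rope; real rope libraries keep such pieces in a balanced tree.

```c
#include <stddef.h>
#include <string.h>

/* The "move everything" way: shift the tail one byte right to make room.
   Assumes s is NUL-terminated inside a buffer of cap bytes. */
int insert_char(char *s, size_t cap, size_t pos, char c)
{
    size_t len = strlen(s);
    if (pos > len || len + 2 > cap)
        return -1;   /* out of range, or no room for one more char + NUL */
    memmove(s + pos + 1, s + pos, len - pos + 1);   /* overlapping copy, NUL included */
    s[pos] = c;
    return 0;
}

/* The "don't move" way: a piece is a pointer into existing storage plus a length. */
struct piece {
    const char *ptr;
    size_t len;
};

/* Describes "s with ins inserted at pos" without copying any characters.
   Assumes pos <= strlen(s). */
void split_insert(struct piece out[3], const char *s, size_t pos, const char *ins)
{
    out[0] = (struct piece){ s, pos };
    out[1] = (struct piece){ ins, strlen(ins) };
    out[2] = (struct piece){ s + pos, strlen(s + pos) };
}

/* Copies the pieces into buf only when a flat string is actually needed.
   Assumes cap > 0; stops early rather than overflow. Returns the length written. */
size_t join_pieces(char *buf, size_t cap, const struct piece *pcs, size_t n)
{
    size_t used = 0;
    for (size_t k = 0; k < n; k++) {
        if (used + pcs[k].len + 1 > cap)
            break;
        memcpy(buf + used, pcs[k].ptr, pcs[k].len);
        used += pcs[k].len;
    }
    buf[used] = '\0';
    return used;
}
```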

Basically knowing C isn't a requirement for any other programming language, but it gives you a natural understanding of some very hard problems, and how to solve them, no matter what language or runtime.

¹: One of the reasons C is so terribly annoying in everyday use, and the reason I don't use it every day, is the absence of built-in, well-supported collections. There is some library/header file that implements collections (maps, sets, lists, double-ended queues, etc.) with a few macros, but I can never recall which library it was ;-;

Sergiy Yevtushenko • Jul 28 '20

C will help you understand a lot of low-level concepts, like the details of memory layout or how parameters are passed to functions. But I'd suggest learning modern C++. It will give you the same level of knowledge of low-level details, but also give you an extremely powerful and versatile tool to abstract away complexity when and how you like. Another interesting alternative might be the D language.

Michel Renaud • Jul 28 '20

I agree about modern C++ giving you more, though if it's just to toy around with the low level concepts rather than write something "big" (that's relative), the barrier of entry for C is lower. Well, for me anyway. ;)

Sergiy Yevtushenko • Jul 28 '20

Yes, C definitely has lower entry barrier.

Jesse Phillips • Jul 28 '20

And there are a number of articles written about D.

#dlang

Benjamin Trent • Jul 28 '20

I think it is worth learning for pure pedagogy. The majority of developers will probably never use it directly. But, knowing system level programming (especially with this sharp a tool), will up your overall engineering game.

#benaryorg • Jul 28 '20

Working in a PHP-ish environment (I'm a sysadmin and I talk to a lot of programmers) I can attest that knowledge of how the underlying operating system works is very sparse with PHP devs, even though PHP is very close to C (up to the point of having very neat FFI, bindings for most libc functions, etc., as well as being written in C and originally having been a templating engine for the C language).

This causes some nerve-wracking tickets when you have to explain to a customer that no, we're not capable of limiting the RAM usage of your cronjob (a PHP script) and running it for an extended time instead, because that is not how memory works.
The background was that the program used some not-quite-perfect algorithm for something or other, causing exponential memory usage, which in turn was simply more than was available, OOM-ing the job every single time.
Yes, you can of course make that job run better, but unless you're planning on adding a terabyte of swap (in which case exponential memory usage still means that it might break eventually), you'll have to remove some references to objects so they can be garbage collected (e.g. if your script collects data into one huge PHP array but never removes elements even when they are no longer needed).

This of course is common knowledge for anyone having ever written software in C.
Furthermore, people who know about malloc() tend to also know about the caveats of its return value: sometimes it returns NULL, telling you there is no more memory, and sometimes it returns a valid-looking pointer and the process later crashes on access due to CoW, lazy allocation and virtual memory (just throwing out buzzwords here), which on Linux is tied to the sysctl vm.overcommit_memory.
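A small Linux-oriented sketch of both caveats, with hypothetical helper names: checking `malloc` for `NULL` is the failure C lets you see, while the `vm.overcommit_memory` setting can be read from `/proc/sys/vm/overcommit_memory` (0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting). Where that file does not exist, the sketch simply reports -1:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the current vm.overcommit_memory mode (0, 1 or 2), or -1 if it
   cannot be read (for example on a non-Linux system). */
int overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Allocates and touches every byte. A NULL result is the visible failure;
   with overcommit enabled, failure may instead show up later, when memset
   first touches the pages, as the process being OOM-killed. */
unsigned char *alloc_and_touch(size_t size)
{
    unsigned char *p = malloc(size);
    if (p == NULL)
        return NULL;
    memset(p, 0, size);
    return p;
}
```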

The reason people should have a look at C is not that C itself is omnipresent, but that the concepts of computing are so hard-wired into the C programming language's style, principles, pitfalls and whatnot that knowing C also makes your code better in other languages; for example, knowing C, you might think of using sendfile in Go, which lets the kernel efficiently send a file from your filesystem directly to your TCP connection.
Similar things apply to file access; people often say that on Linux everything is a file, which isn't really accurate (it'd be more accurate on Plan 9, but still). On Linux everything is an int, because every resource you work with tends to be an integer.
Your shared memory allocation is represented by an int identifier, your TCP connection is a bidirectional socket represented as an int, your file is an int, your cwd is an int, and so on.
No wonder it's hard for people who tend to work with other, more abstract languages to comprehend that you can actually delete an open file on Unix-like systems (mostly talking about Linux here): it is removed from the directory on your filesystem, but your fd is still open and usable, the file is still writable, and so on (Firefox uses that because it's the closest you can get to an otherwise inaccessible, unleakable (automatically deleted upon close()), disk-backed storage).
The same goes for your shell sitting in a directory that is then moved from another shell: all your calls suddenly fail very oddly, and you have no clue what the hell is wrong, but as soon as you cd "$PWD" everything is back to normal, because your shell's cwd is still just an int and not a string.
The many issues with mutable strings, the reasons for rope libraries and their issues, and so on are all present in other languages, yet you only ever grasp the underlying problem if you know about memory management.
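A POSIX sketch of the "everything is an int" point, assuming a Linux-like system with a writable `/tmp`: `fstat` tells you what kind of object sits behind an int descriptor, and a file that has been unlinked while open stays fully usable through its descriptor. The helpers `fd_kind` and `roundtrip_unlinked` and the `/tmp` path template are assumptions for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>   /* socketpair(), for the usage example */
#include <sys/stat.h>
#include <unistd.h>

/* Describes what kind of kernel object an int file descriptor refers to. */
const char *fd_kind(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0) return "invalid";
    if (S_ISREG(st.st_mode))  return "file";
    if (S_ISFIFO(st.st_mode)) return "pipe";
    if (S_ISSOCK(st.st_mode)) return "socket";
    if (S_ISDIR(st.st_mode))  return "directory";
    return "other";
}

/* Creates a temp file, removes its name, then writes msg and reads it back
   through the still-open fd into out. Returns bytes read, or -1. */
ssize_t roundtrip_unlinked(const char *msg, char *out, size_t cap)
{
    char path[] = "/tmp/unlinked-demo-XXXXXX";   /* hypothetical location */
    if (cap == 0)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the name is gone from the directory; the fd keeps the inode alive */

    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(fd, msg, len) == (ssize_t)len && lseek(fd, 0, SEEK_SET) == 0) {
        n = read(fd, out, cap - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    close(fd);      /* last reference dropped: the kernel releases the storage */
    return n;
}
```

For example, `pipe(fds)` and `socketpair(AF_UNIX, SOCK_STREAM, 0, fds)` both just fill an array of two ints, which `fd_kind` reports as `"pipe"` and `"socket"` respectively.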

All of these are examples of why I think every sysadmin and at least one developer per team (I don't know how your team is structured; what I mean is at least one of the people involved in every pull/merge request) should know the basics of the C programming language, at least to the point of memory management and file descriptors.
This is just a personal notion of course, and if you disagree, then that's fine.
I'm just saying that I'm tired of explaining to people that on Linux, fork()'s CoW mechanism is actually a really neat way of ensuring you can write your current process state consistently to disk, like Redis does, as long as you know what you're doing (I remember another PHP dev complaining to me about how dumb Redis is to fork() for that, because of the RAM requirements).
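A sketch of the snapshot idea, assuming a POSIX system: after `fork()` the child sees memory exactly as it was at the moment of the fork, while the parent keeps changing its own copy. On Linux the pages are shared copy-on-write until one side writes, which is what keeps this cheap. The helper `snapshot_demo` and its counter are hypothetical, and a real snapshotter such as Redis would write its data to disk in the child rather than to a pipe:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that reports the value of counter as of fork(), while the
   parent changes its own copy. Returns 0 on success, -1 on failure. */
int snapshot_demo(int *parent_now, int *child_saw)
{
    int counter = 1;
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: its memory is a snapshot taken at fork(). */
        close(fds[0]);
        ssize_t w = write(fds[1], &counter, sizeof counter);
        _exit(w == (ssize_t)sizeof counter ? 0 : 1);   /* _exit: skip stdio flushing */
    }

    /* Parent: changes only its own copy of counter; the child's snapshot is unaffected. */
    close(fds[1]);
    counter = 2;
    *parent_now = counter;

    ssize_t r = read(fds[0], child_saw, sizeof *child_saw);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return r == (ssize_t)sizeof *child_saw ? 0 : -1;
}
```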

Sorry, this turned out to be half rant, half dumping some common pitfalls, but I hope you (the reader of this comment) now have a vague idea how many issues there are which are incomprehensible without some basic understanding of low-level concepts which the C programming language does incorporate in its very essence.