DEV Community

Idan Arye

Don't initialize your variables

Developers coming from C know that variables should always be initialized. Not initializing your variables means they contain junk, and this can result in undefined behavior. For example:

#include<stdio.h>

int main(void) {
    char buffer[256];
    char answer;
    char* name;

    printf("Do you want to enter a name? [yn] ");
    answer = getchar();

    while (getchar() != '\n') { } // discard the rest of the line: getchar() took one character and left the newline unread

    if (answer == 'y') {
        printf("Please enter name: ");
        name = fgets(buffer, 256, stdin);
        if (name == 0) {
            name = "<no input>"; // fgets returns NULL on end-of-file or a read error, not on long input
        }
    } else if (answer == 'n') {
        name = "<user refused to enter name>";
    }

    printf("The name is %s\n", name);
    return 0;
}

If the user enters a character that is neither y nor n, none of the name = ...; statements will be executed, and name will still hold whatever value it had when main started. What is that value? In a release build, it is whatever data happened to be in the piece of memory that name was assigned. And then we take that arbitrary value and pass it to printf, where it gets treated as if it were a string pointer!

If we are lucky, we'll hit an illegal memory address and the OS will stop us. If we aren't, it will go to some random place in memory and start printing whatever it encounters: passwords, credentials, application tokens...

And of course - this will not be reproducible, because each run of the program may find a different value at that place in memory and produce different results.

To avoid these problems, C developers have conditioned themselves to always initialize their variables. If you don't have something meaningful to put at the point of declaration - just put 0:

#include<stdio.h>

int main(void) {
    char buffer[256] = {0}; // empty braces are not valid C before C23
    char answer = '\0';
    char* name = 0;

    printf("Do you want to enter a name? [yn] ");
    answer = getchar();

    while (getchar() != '\n') { } // discard the rest of the line: getchar() took one character and left the newline unread

    if (answer == 'y') {
        printf("Please enter name: ");
        name = fgets(buffer, 256, stdin);
        if (name == 0) {
            name = "<no input>"; // fgets returns NULL on end-of-file or a read error, not on long input
        }
    } else if (answer == 'n') {
        name = "<user refused to enter name>";
    }

    printf("The name is %s\n", name);
    return 0;
}

While a null pointer dereference is still formally undefined behavior, it is much better than a random pointer dereference, because your operating system will probably turn it into a SEGFAULT - which is better than a security leak.

OK, but that's C. What about more modern languages?

There are two main reasons this was so needed in C:

  1. Uninitialized variables having junk data.
  2. Inability to declare variables in the middle of a block (a restriction that C99 lifted).

More modern languages allow declaring variables in the middle of a block, so it is usually preferable to only declare the variable at the point where you have something meaningful to put in it.

This greatly reduces the cases where you have to initialize something with a default value - but does not prevent all of them. In our case, for example, name gets its value inside the if branches - if we declared it there, we wouldn't be able to use it after the if. Some languages (mostly the functional ones) have an easy syntactic solution for this, but in most mainstream languages you'd have to either extract it to a function or declare the variable outside the block.
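As a rough sketch of the first option, here is what extracting the decision into a method could look like in Java. The class name ExtractedName, the method readName and the "<invalid answer>" placeholder are made up for this illustration, and the Scanner is fed canned input so the example runs without a terminal:

```java
import java.util.Scanner;

class ExtractedName {
    // Hypothetical helper: every branch returns its own value,
    // so no variable ever needs a placeholder default.
    static String readName(Scanner scanner) {
        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            return scanner.nextLine();
        } else if ("n".equals(answer)) {
            return "<user refused to enter name>";
        } else {
            // Placeholder chosen for this sketch only.
            return "<invalid answer>";
        }
    }

    public static void main(String[] args) {
        // Canned input so the sketch runs without a terminal;
        // pass new Scanner(System.in) for interactive use.
        String name = readName(new Scanner("y\nAlice\n"));
        System.out.printf("The name is %s%n", name);
    }
}
```

Every branch returns its own value, so there is no variable left waiting for one. The rest of this post is about the other option, where the variable is declared outside the block.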

When going with the latter solution, because C is such a common background, many developers will initialize the value. So if we convert our code to Java:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        String name = null;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        }

        System.out.printf("The name is %s\n", name);
    }
}

Sure, this is Java, a language with managed memory that will never allow undefined behavior from uninitialized variables, so we don't really need to initialize name to null, but better safe than sorry, right?

WRONG!

Java analyses code paths to make sure no variable can be used without being initialized first. So if we remove the initialization:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        String name;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        }

        System.out.printf("The name is %s\n", name);
    }
}

We'll get a compilation error:

$ javac Main.java 
Main.java:18: error: variable name might not have been initialized
        System.out.printf("The name is %s\n", name);
                                              ^
1 error

I just broke the compilation, but this is a good thing - the compiler found a bug! It is the same bug we had in the C version: what if the user enters something that isn't y or n? The Java compiler sees that there are three possible code paths that reach the last line, but name is only initialized on two of them.

To be able to compile again, we must tell Java what to do in case the user gave an invalid answer. Failure is also an option - as long as we do it intentionally:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        String name;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        } else {
            System.err.printf("Illegal answer \"%s\". The only legal answers are \"y\" and \"n\".", answer);
            return;
        }

        System.out.printf("The name is %s\n", name);
    }
}

Now there are still three code paths, but in the third we return from the function early, before printing name. The Java compiler can determine that there are no code paths where name is used without being assigned a value first - and thus the compilation succeeds.
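Returning early is not the only way to close off the third path. As a minimal sketch (the class ThrowOnInvalid and the method describe are hypothetical, and the entered name is passed in as a parameter to keep the example short), throwing an exception satisfies the compiler in the same way:

```java
class ThrowOnInvalid {
    // Hypothetical helper: builds the output line from an answer and an
    // already-entered name (the name is ignored unless the answer is "y").
    static String describe(String answer, String enteredName) {
        String name; // deliberately no default value
        if ("y".equals(answer)) {
            name = enteredName;
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        } else {
            // Throwing ends this path, so the return below can never be
            // reached with name unassigned.
            throw new IllegalArgumentException("Illegal answer \"" + answer + "\"");
        }
        return "The name is " + name;
    }

    public static void main(String[] args) {
        System.out.println(describe("y", "Alice"));
        System.out.println(describe("n", null));
        try {
            describe("maybe", null);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Deleting the else branch brings back the "might not have been initialized" error, because the third path would reach the return statement again.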

This is still initialization

Despite the clickbaity title, we do actually initialize name. We don't do it on declaration, but we are initializing it nevertheless. This compiles:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        final String name;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        } else {
            System.err.printf("Illegal answer \"%s\". The only legal answers are \"y\" and \"n\".", answer);
            return;
        }

        System.out.printf("The name is %s\n", name);
    }
}

Wait - how? Didn't they teach us that you can't change the value of a final variable?

Well, yes, but we are not changing the value of any final variable here - we are just initializing it. Since name has not been assigned before on any of the paths that assign to it, these assignments are actually initializations - which are perfectly fine for final variables. It wouldn't have worked with final String name = null, but without the initialization on declaration it's fine. Even without the final keyword, name would be effectively final and could be used in lambdas (provided they appear after the first assignment).
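To illustrate the lambda remark, here is a small sketch with no final keyword at all. The class EffectivelyFinalName and the method greeter are invented for the example, and it assumes Java 8 or later for lambdas:

```java
import java.util.function.Supplier;

class EffectivelyFinalName {
    // Hypothetical helper: returns a lambda that captures name.
    static Supplier<String> greeter(String answer, String enteredName) {
        String name; // no final keyword, no default value
        if ("y".equals(answer)) {
            name = enteredName;
        } else {
            name = "<user refused to enter name>";
        }
        // name is assigned exactly once on every path, so the compiler
        // treats it as effectively final and allows the capture.
        return () -> "The name is " + name;
    }

    public static void main(String[] args) {
        System.out.println(greeter("y", "Alice").get());
        System.out.println(greeter("n", null).get());
    }
}
```

Adding a default such as String name = null; at the declaration would turn the later assignments into reassignments, and the lambda would no longer compile.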

Conclusion

Do initialize your variables - but don't always force a default value when you can't initialize them with a proper one. Know how your language behaves with uninitialized variables and pick the best strategy for uncovering bugs.

Top comments (5)

Alexey Voinov

I'd say that when you think you need some default value to initialise your variable, what you most probably need is a new method that returns a properly constructed value. Most of the time it results in slightly more readable code. The same is valid for C. :)

Idan Arye

I wouldn't get too dogmatic about it though. Extracting stuff to functions doesn't always improve readability. As a rule of thumb, I find that it only increases readability if you can find a good name for that function. If the name of the function is not simpler to understand than its body, you will actually reduce readability, because now readers will have to derail their train of thought and look up the meaning of that function.

Also, it's not always possible. In C, for example, there is that rule of only being able to declare variables at the top of the block.

Consider, for example, this function:

#include <stdio.h>
#include <stdlib.h>

int readOrDefault(char* filename, int default_) {
    FILE* file = fopen(filename, "r");
    int result = 0;
    if (NULL == file) {
        return default_;
    }
    if (1 != fscanf(file, "%d", &result)) {
        abort();
    }
    fclose(file);
    return result;
}

Even if we ignore the fact that we need to pass a pointer to fscanf instead of getting the result from it, we have the problem of the early return in case we couldn't open the file. We can only read the variable into result after that first if, but we can only declare result before that if. Extracting the initialization of result will not work here.

Alexey Voinov

Yeah, you don't have to declare variables at the top of the block even in C, as was mentioned in another comment. And C99 is almost 20 years old now. :) And no, I never claimed that it should always result in better code. That's why I always use a lot of 'maybe' or 'most probably'. It always leaves room to back out. :)

But that piece of code is a good challenge. I've just realized that I never actually thought about what clean code should look like in C. I like it. I think I'll take a few days to think it over and then show my version of it (or admit that it was impossible to write) :)

Txai

Only a minor observation: since C99 you can declare variables at any point in the function, not only at the top. You can even declare the variable inside the for clause.

Idan Arye

True, but old habits die hard.