DEV Community

Martin Licht

Escape sequences in C

It is easy to write a string that contains only alphanumeric characters. Some other characters, such as quotation marks, the backslash, and control characters like the newline, must be encoded with an escape sequence. Let us review the details of escape sequences in the C standard.

Escape sequences are special character combinations within your C source code that appear within a string or when specifying a char value. For example:

char* str = "Hello World!\n";
char c = '\n';

Escape sequences begin with a backslash (\) followed by one or more characters that specify the escape sequence. An escape sequence consists of more than one character in your source code, but the compiler converts each escape sequence into a single character (assuming characters are encoded in a single byte). If the escape sequence is not recognized, the compiler has to issue a diagnostic. In practice, many compilers report a warning here rather than a hard error.
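
To see the single-character conversion at work, here is a minimal sketch. The helper names escaped_length and escaped_size are made up for this illustration and are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the characters of a literal with one escape.
   In the source, "a\nb" is four characters between the quotes, but the
   compiled string holds only 'a', newline, 'b'. */
size_t escaped_length(void) {
    return strlen("a\nb");
}

/* Hypothetical helper: storage size of the same literal, which also
   includes the terminating null character. */
size_t escaped_size(void) {
    return sizeof "a\nb";
}
```

Here escaped_length() returns 3 and escaped_size() returns 4, so the two source characters \n really occupy a single position in the string.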

Let us review the escape sequences, categorized into a few groups.

Special Characters

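The characters ', ", ?, and \ have the corresponding escape sequences \', \", \?, and \\. The backslash must always be escaped. The single quotation mark must be escaped inside a char constant and the double quotation mark inside a string; elsewhere the escape is optional. The escape \? historically keeps question marks from being read as part of a trigraph, so a plain ? is usually fine.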

Escape Sequence Meaning
\\ backslash
\? question mark
\' single quotation mark
\" double quotation mark

Evidently, we need an escape sequence for the backslash character since the backslash is already part of any escape sequence.

Example:

#include <stdio.h>
int main(){
    printf("Hello\? Is \"\\\\\\\" a string of three backslashes in \'C\'\?");
    return 0;
}
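
As a side note, a quotation mark only needs its escape where it would otherwise end the literal. The following is a small sketch of that observation; the helper name optional_escapes_match is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the escaped and the unescaped
   spellings denote the same characters. */
int optional_escapes_match(void) {
    return '"' == '\"'                /* " needs no escape in a char constant */
        && strcmp("'", "\'") == 0     /* ' needs no escape in a string */
        && strcmp("?", "\?") == 0;    /* a single ? is fine unescaped */
}
```

The function returns 1, which suggests that the escaped forms are alternative spellings rather than different characters.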

A Note on Line Continuation

The backslash character is also used to continue a source line, including in the middle of a string. Specifically, any backslash that is immediately followed by a newline character in the source code is deleted together with that newline. Effectively, this connects two source lines into one logical line. This happens before escape sequences are processed by the compiler.

char* str = "This string ends with \\
\"; // ends with a backslash
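
A simpler case may help to see the splicing on its own. This is only a sketch, and the helper name splice_matches is invented for the illustration.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the spliced literal equals the
   single-line spelling. The continuation line starts in column 0,
   because any indentation would become part of the string. */
int splice_matches(void) {
    const char *spliced = "one logical \
line";
    return strcmp(spliced, "one logical line") == 0;
}
```

The function returns 1: the backslash and the line break leave no trace in the resulting string.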

Non-printable Characters

Many of these character sequences come from a time when terminals had limited display capabilities:

Escape Sequence Meaning
\a alert
\b backspace
\f form feed
\n newline
\r carriage return
\t horizontal tab
\v vertical tab

Let us summarize them with simple examples:

The escape sequence '\a' is supposed to produce an alert beep. This feature is less common on modern machines.

printf("This may trigger an alert beep!\a\n");

Example using '\t' (horizontal tab):

printf("Column1\tColumn2\tColumn3\n");

Example using '\b' (backspace):

printf("12345\b\b67\n"); // Typical terminal output: 12367

Example using '\r' (carriage return):

printf("Hello, world!\rBye!\n"); // Typical terminal output: Bye!o, world!

Example using '\v' (vertical tab):

printf("Line1\vLine2\n");

Example using '\f' (form feed, rarely used today):

printf("First page\fSecond page\n");
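
Each of these escape sequences is just a name for a small character code. The sketch below prints those codes, assuming an ASCII-compatible character set; the function name print_control_codes is hypothetical.

```c
#include <stdio.h>

/* Assumption: an ASCII-compatible execution character set. */

/* Hypothetical helper: prints each control escape with its numeric code. */
void print_control_codes(void) {
    const char *names[] = { "\\a", "\\b", "\\t", "\\n", "\\v", "\\f", "\\r" };
    const char codes[]  = { '\a',  '\b',  '\t',  '\n',  '\v',  '\f',  '\r' };
    for (size_t i = 0; i < sizeof codes; i++) {
        printf("%s = %d\n", names[i], codes[i]);
    }
}
```

Under the ASCII assumption, the codes run from 7 (\a) to 13 (\r) in the order shown. How a terminal reacts to each code is up to the terminal.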

Numerical Escape Sequences

Escape sequences in string and character literals allow us to define characters directly using their numerical code value. This is done either in octal or hexadecimal form.

An escape sequence in octal form consists of \ followed by one, two, or three octal digits (0-7). The integer value determines the character that is put into that position. Notice that at most three octal digits are read: a fourth digit, or any character that is not an octal digit, is not part of the escape sequence.

#include <stdio.h>
int main(){
    printf("\101\102\103\n"); // prints ABC
    printf("\17011\17111\17211\n"); // prints x11y11z11
    printf("\7789"); // prints ?89 (\77 is '?', and 8 is not an octal digit)
    return 0;
}
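
The three-digit limit can also be checked without looking at printed output. This is a minimal sketch that assumes an ASCII-compatible character set; both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: an ASCII-compatible execution character set. */

/* Hypothetical helper: "\17011" is the escape \170 ('x') plus the digits "11". */
size_t octal_then_digits_length(void) {
    return strlen("\17011");   /* three characters: x, 1, 1 */
}

/* Hypothetical helper: "\7789" starts with \77 because 8 is not octal. */
char first_of_7789(void) {
    return "\7789"[0];         /* octal 77 = 63, which is '?' in ASCII */
}
```

Here octal_then_digits_length() returns 3 and first_of_7789() returns '?'.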

Most importantly, the octal escape sequence \0 is the standard way to insert the null character into a string manually.

printf("Hallo\0Welt!\n"); // prints Hallo
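
The characters after the null character are not lost; they are only invisible to functions that stop at the first null character. A small sketch, with made-up names text, visible_length, stored_size, and after_null:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example data: an embedded null character after "Hello". */
static const char text[] = "Hello\0World!";

/* Length as seen by string functions, which stop at the first null. */
size_t visible_length(void) {
    return strlen(text);
}

/* Size of the whole array, including both null characters. */
size_t stored_size(void) {
    return sizeof text;
}

/* The character right after the embedded null is still stored. */
char after_null(void) {
    return text[6];
}
```

Here visible_length() returns 5, stored_size() returns 13, and after_null() returns 'W'. One pitfall follows from the three-digit rule: if octal digits come right after \0, they are read as part of the escape, so "\012" is a single character (a newline in ASCII) rather than a null character followed by "12".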

Similarly, hexadecimal escape sequences begin with \x followed by one or more hexadecimal digits (0-9, a-f, A-F). Unlike octal escape sequences, hexadecimal escape sequences can have an arbitrary number of digits: the sequence only ends at the first character that is not a hexadecimal digit, such as whitespace or a letter beyond f.

printf("\x41\x42\x43\n\x41G\n"); // prints: ABC followed by AG
printf("\xAa\n"); // Mixed uppercase/lowercase is possible
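
Because a hexadecimal escape keeps consuming hexadecimal digits, a literal such as "\x41BC" is read as the single escape \x41BC and not as A followed by BC. One common workaround is to end the string after the escape and let the compiler join adjacent string literals. The sketch below assumes an ASCII-compatible character set, and the helper name split_hex_matches is hypothetical.

```c
#include <string.h>

/* Assumption: an ASCII-compatible execution character set. */

/* Hypothetical helper: returns 1 if the split literal spells "ABC". */
int split_hex_matches(void) {
    /* In "\x41" "BC" the escape ends at the closing quote. The compiler
       then concatenates the two adjacent literals into one string. */
    return strcmp("\x41" "BC", "ABC") == 0;
}
```

The function returns 1, so the split spelling yields the three characters A, B, and C.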

In practice, two hexadecimal digits cover one 8-bit character, which is sufficient for any English text. More hexadecimal digits are meaningful if the character type of the string uses more than one byte.
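
As a brief illustration of a wider character type, the sketch below uses a wide string literal. It assumes that wchar_t has at least 16 bits, which holds on common desktop platforms; the helper names are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Assumption: wchar_t has at least 16 bits, so 0x4142 fits. */

/* Hypothetical helper: first element of a wide string that was written
   with a four-digit hexadecimal escape. */
wchar_t wide_escape_value(void) {
    return L"\x4142"[0];
}

/* Hypothetical helper: the four digits form one wide character. */
size_t wide_escape_length(void) {
    return wcslen(L"\x4142");
}
```

Under that assumption, wide_escape_value() returns 0x4142 and wide_escape_length() returns 1.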

A technical detail is that the value of a numerical escape sequence in a char constant must fit into an unsigned char. If the numerical value does not fit, the compiler has to diagnose it. Typically the compiler emits a warning and keeps only the low-order bits of the value, and that truncated result determines the character of the escape sequence.

char c = '\777'; // out of range: octal 777 = 511; a typical compiler warns and keeps 511 mod 256 = 255
printf("%d\n", (unsigned char)c); // prints: 255
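
Even an in-range value such as \xFF deserves some care, because plain char may be signed. The following sketch assumes 8-bit bytes and the usual two's complement representation; all helper names are hypothetical.

```c
#include <limits.h>

/* Assumption: CHAR_BIT == 8 and two's complement integers. */

/* Hypothetical helper: 1 if plain char is a signed type on this platform. */
int char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Hypothetical helper: the escape read back through unsigned char. */
int high_escape_as_unsigned(void) {
    char c = '\xFF';
    return (unsigned char)c;
}

/* Hypothetical helper: the same escape read back as plain char. */
int high_escape_as_plain(void) {
    char c = '\xFF';
    return c;
}
```

Under these assumptions, high_escape_as_unsigned() returns 255 everywhere, while high_escape_as_plain() returns -1 where char is signed and 255 where it is unsigned. This is why the example above casts to unsigned char before printing.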

Within a string, the digits of a numerical escape sequence are likewise read as one number, and that number is then converted to the character type of the string. The translation of escape sequences and the handling of values that are out of range depend on the implementation and the type of characters in the string.

printf("\x4142\n"); // 0x4142 = 16706. The result depends on the implementation

Different string types and glyph encodings shall be a topic for another blog post.
