I recently had to write a piece of C code that takes some input from stdin, ignores the newline, discards whatever exceeds the buffer, and does this repeatedly in a loop.
Knowing something about scanf syntax (man) I came up with this:
#include <stdio.h>

void take_input(void)
{
    char buf[20];

    printf("> ");
    scanf("%19[^\n]", buf);
    printf("input=`%s`\n", buf);
}

int main(void)
{
    for (int i = 0; i < 5; i++) {
        take_input();
    }
    return 0;
}
(Note: The original code used an infinite loop, but a simple for is enough to demonstrate the behavior.)
When I ran it, the result was surprising:
$ gcc -o main main.c
$ ./main
> hello world↵
input=`hello world`
> input=`hello world`
> input=`hello world`
> input=`hello world`
> input=`hello world`
$ █
It consumed the string once and printed the same value 5 times. Why?
1. The Stack (Ghost Data)
Stack behavior explains the first confusing part. Although char buf[20] is a locally scoped variable, in this particular case, each invocation of take_input() ends up using the exact same stack memory address.
Printing the address makes this undeniable:
/* ... */
printf("address=%p, input=`%s`\n", (void*)buf, buf);
/* ... */
$ ./main
> hello world
address=0x7ffcb96c49d0, input=`hello world`
> address=0x7ffcb96c49d0, input=`hello world`
> address=0x7ffcb96c49d0, input=`hello world`
<...>
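Because the buffer wasn't initialized, and because the subsequent scanf calls failed and never wrote to it, the old content (the "ghost data") was still there. The contents of an uninitialized buffer are indeterminate, so the repeated output is an accident of the stack layout rather than something C guarantees. Initializing the buffer with zeroes fixes this specific issue: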
/* ... */
char buf[20] = {0};
/* ... */
Now the buffer resets each time, but scanf still behaves oddly:
$ ./main
> hello world
address=0x7ffea5c2d840, input=`hello world`
> address=0x7ffea5c2d840, input=``
> address=0x7ffea5c2d840, input=``
<...>
2. Scanf, Scansets and Stdin
The real enemy here is the scanset %19[^\n]. It reads up to 19 characters (sizeof(buf) - 1, leaving room for the terminating null byte) that are not a newline, but it does not consume the newline itself.
Let's open gdb and inspect stdin to see the wreckage:
$ gcc -g3 -O0 main.c -o main
$ gdb main
(gdb) break 8
Breakpoint 1 at 0x11d3: file main.c, line 8.
(gdb) run
<...>
> hello world↵
<...>
(gdb) p *stdin
$1 = {
_flags = -72539512,
_IO_read_ptr = 0x55555555972b "\n",
_IO_read_end = 0x55555555972c "",
_IO_read_base = 0x555555559720 "hello world\n",
<...>
}
stdin->_IO_read_ptr, the current position of input, still points to the newline. It was not consumed by scanf; it is still sitting in stdin, waiting to be digested.
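When the loop runs again, scanf("%19[^\n]", buf) sees the newline immediately, matches zero characters, and returns 0 without writing to buf. The buffer keeps whatever it held before (zeroes, in the initialized version), nothing is consumed from stdin, and every remaining iteration fails the same way.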
Unlike Python's expression input()[:19], which consumes the newline and truncates the string cleanly, C's scanf cannot express that behavior using a single scanset. I found three workable solutions.
Solution A: The "Double Tap" (Extra scanf)
We can use extra scanf calls to force the consumption of the remainder of the line and the newline:
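void take_input(void)
{
    /* zero-initialize: on an empty line the first scanf matches
       nothing and leaves buf untouched */
    char buf[20] = {0};

    printf("> ");
    scanf("%19[^\n]", buf);
    /* discard the rest of the line, if it's >19 chars */
    scanf("%*[^\n]");
    /* discard the newline */
    scanf("%*c");
    printf("input=`%s`\n", buf);
}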
It looks messy, but it works reliably for whitespace, short inputs, and long truncated inputs:
$ ./main
> hey
input=`hey`
> hello world
input=`hello world`
> a b c d e f g h i j k l m o p q r s t u v w x y z
input=`a b c d e f g h i j`
> ␣␣␣␣␣ /* 5 spaces */
input=`     `
> abcdefghijklmnpoqrstuvwxyz
input=`abcdefghijklmnpoqrs`
Solution B: fgets (The Standard Way)
A more predictable and explicit approach is fgets:
#include <stdio.h>
#include <string.h>

void take_input_2(void)
{
    /* zero-initialize so buf is a valid empty string
       even if fgets fails (EOF or read error) */
    char buf[20] = {0};

    printf("> ");
    if (fgets(buf, sizeof(buf), stdin)) {
        char *newline_ptr = strchr(buf, '\n');
        if (newline_ptr) {
            /* replace \n with \0 to trim it */
            *newline_ptr = '\0';
        } else {
            /* no newline found, meaning the input
               was truncated; we must consume
               the rest of the line from stdin */
            int c;
            while ((c = getchar()) != '\n' && c != EOF) {
                /* discard */
            }
        }
    }
    printf("input=`%s`\n", buf);
}
Solution C: getline (The Heap Way)
There is a handy getline function (man), which is not in ISO C but exists in POSIX. The difference is that getline scans the whole line and allocates the buffer on the heap. You don't have to care about buffer boundaries anymore, but you do need to manually free the memory:
/* getline is POSIX, not ISO C; this macro exposes it
   when compiling with a strict -std= mode */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void take_input_3(void)
{
    /* initialize the pointer with NULL so getline allocates it */
    char *line = NULL;
    /* getline stores the allocated capacity of `line` here
       and grows the buffer as needed */
    size_t cap = 0;
    ssize_t n;

    printf("> ");
    /* n contains the actual length of the input line
       (including the newline), or -1 on failure or EOF */
    n = getline(&line, &cap, stdin);
    if (n > 0) {
        /* remove newline */
        if (line[n-1] == '\n') {
            line[n-1] = '\0';
        }
        printf("input=`%s`\n", line);
    }
    free(line); /* NB: always free the heap memory, even on failure */
}
Outcomes
In the end, the exercise became a small reminder of how surprisingly tricky input handling in C can be. scanf looks convenient until you hit edge cases around whitespace and newlines. At that point, "convenience" becomes a liability.
The takeaway is simple: know your tools. If you need predictable, portable, line-oriented input with truncation handling, fgets is almost always the better choice. If you have the luxury of POSIX and heap allocation, getline offers freedom. But scanf? Save that for when you know exactly what the input looks like.
Top comments (2)
The problem with getline is that malicious input could send a very long line and crash your program due to running out of memory.
The only safe way is to use fgets and check the last character: if it's \n, the line fit; else it didn't and you can either (a) complain and error-out or (b) read the input, character by character, until you find a newline to flush the line since the rest of the input is still on the input stream. But this assumes malicious input even has a newline, so you might not crash, but could be looping forever. You could also check the number of characters read and finally exit in defeat if you've exceeded some arbitrarily large number.
There's no perfect solution when you can't trust your input (which you never should).
Solid point regarding the OOM/DoS risk, I appreciate the rigorous audit. I'll update the article to reflect this trade-off.