The cost of sync_with_stdio

#cpp #performance

In C++ there are many ways to do a thing, and reading a file's contents into a std::string is no exception.
Let's see how the different solutions compare to each other.

You can find the code in my GitHub repo.

All implementations read a 98M file filled with zeros. All solutions are compiled as C++ with GCC on a Linux box, in two different modes: synced and unsynced. The only difference between the two modes is a single line:

std::ios_base::sync_with_stdio(false);

This disables synchronisation between iostreams and C IO functions. It obviously has no effect on the solutions that use C IO functions. The synced versions don't make this call, so all their iostream calls stay synchronised.

Reading from file means that the program gets the file path as its argument, opens the file, reads its contents and closes it.
Reading from stdin means that the program reads the contents from its standard input, which is redirected from the file at the shell level.
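As an illustration of the "reading from file" mode, the sketch below opens a path, sizes the string with seekg/tellg, reads the contents and lets the stream close on scope exit. The function name `read_file` is hypothetical, and this is only one possible shape for such a solution, not necessarily the code that was measured.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the repo): open `path`, read the whole
// file into a std::string, and close it when `in` goes out of scope.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    // Find the file size by seeking to the end and asking for the position.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // Allocate once, then read the whole file in a single call.
    std::string contents(static_cast<std::size_t>(size), '\0');
    in.read(&contents[0], static_cast<std::streamsize>(size));
    if (!in) throw std::runtime_error("short read on " + path);
    return contents;
}
```

The "reading from stdin" mode would run the same kind of logic against std::cin, with the shell supplying the file via `< input`.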

Notice how the synced C++ versions are slower when reading the file through shell redirection than when reading it directly from a file.

Note that in all cases the underlying read is performed in chunks, as the strace output shows, so this is not the source of the slowdown.

$ strace -e read ./istreambuf_iterator_sync input
[...]
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8191) = 8191
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8191) = 8191
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8191) = 8191
[...]

$ strace -e read ./istreambuf_iterator_sync < input
[...]
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[...]

The buffer size used for the reads does differ (8191 bytes versus 4096 bytes), but that wouldn't cause such massive time differences.

Now let's see if we get similar results when compiling with clang and its libc++ standard library.

Here the difference between synced and unsynced is not there. Reading from stdin using iostreams is always slower than reading from a file.

Note that ignore-seekg and seekg-tellg failed to execute when compiled with clang/libc++. It seems that seeking stdin is not valid for some reason.

I think it should work correctly, because when doing redirection, stdin points directly to a file, just as if it had been opened directly in the application. I would expect it to fail when trying to seek on a stdin that is bound to a pipe, but not in this case.
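The expectation above can be checked at the OS level, independently of any C++ standard library. The sketch below uses the POSIX lseek call; the helper name `is_seekable` is hypothetical, and the check says nothing about why libc++ rejects the seek, only about whether the descriptor itself supports it.

```cpp
#include <sys/types.h>  // off_t
#include <unistd.h>     // lseek, SEEK_CUR, STDIN_FILENO (POSIX)

// Hypothetical helper: report whether the kernel allows seeking on `fd`.
// Asking for the current offset does not move the file position.
// On a pipe, lseek fails (errno is set to ESPIPE) and returns -1.
bool is_seekable(int fd) {
    return lseek(fd, 0, SEEK_CUR) != static_cast<off_t>(-1);
}
```

Calling `is_seekable(STDIN_FILENO)` should return true when the program is started as `./prog < input` and false when started as `cat input | ./prog`.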

Let's compare the call hierarchy of one solution using iostreams for these two cases:

Reading directly from file:

Reading from stdin which is redirected from a file:

For the second case, there are 205 million calls to the underflow function, which in turn makes 205 million calls to ungetc and getc. These functions are meant for accessing single characters. This means that for each and every character in the input there's a function call to getc and ungetc. This doesn't happen in the first case.

This basically means that when synchronisation with C IO is enabled, iostreams reading from stdin always fetch a single character at a time through the C IO functions. This is what allows these two different APIs to access the same underlying file descriptor without stepping on each other.

And this is the source of the time difference between the synced and unsynced versions when reading the file from stdin.