How Unix commands work

If you have used these commands for years and ever wondered how they work, this post is a high-level exploration by someone who had the same thought a few days ago. This is the result of my exploration. I hope you learn something from it and get more interested in OS development in general.

First of all, a little bit of history. All these programs are part of the GNU operating system, the first attempt to create a free OS. Work on GNU started in the mid-eighties, so the project is older than Linux itself. The programs were developed for Unix systems, later ported to Linux, and are POSIX compliant. ls in particular is part of coreutils, a separate package of the GNU OS. You can see coreutils source code here. The GNU project was started by Richard Stallman, the original neckbeard.

GNU coreutils is used by most Linux distributions, and you can even use it on macOS and Windows. On macOS you can even replace the BSD-derived utilities that ship with the system. As you might know, Linux is not an OS. Linux is just a kernel, and you need higher-level programs to use your system. That's why people sometimes refer to Linux as GNU/Linux: it's the kernel plus the other utilities you need to realistically use the system in your day-to-day activities. Otherwise you would have to implement everything yourself. I'm gonna focus on how ls is implemented on top of the kernel. If you wanna know what happens right after you type ls in your terminal, you should read this amazing explanation.

How ls works

You can see the ls source code here. This blog post focuses on explaining ls, but all the other core utilities work in a similar fashion. Let's get to it.

The core functionality of ls is built on two lower-level functions, opendir and readdir. opendir opens a directory, which is really just a file in Linux like everything else, and readdir reads this file entry by entry. Each entry, as you might have guessed, is one item of the directory's contents, plus . and .., which the system uses to reference the directory itself and its parent. If you search for these functions in src/ls.c you would find this

...
DIR *dirp;
struct dirent *next;
dirp = opendir (name); // line 2918
...

for opendir

...
while (1)
    {

      next = readdir (dirp); // line 2988
      if (next)
...

for readdir

As you might have guessed, this while loop is ls's core functionality. It's the part that reads the entries in the directory and prints them out, or does whatever it needs to do depending on the options that you pass to ls. All the other stuff in the source code is error handling plus parsing and applying options. ls has a ton of options that accumulated over the years. If we remove all these options we can implement ls with just a few lines of code, so let's do that

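#include <stdio.h>
#include <dirent.h>

int main(void)
{
    DIR *folder;
    struct dirent *entry;

    /* open the current directory */
    folder = opendir(".");
    if (folder == NULL)
    {
        perror("opendir");
        return 1;
    }
    /* readdir returns one entry per call and NULL when there are no more */
    while ((entry = readdir(folder)) != NULL)
    {
        printf("%s ", entry->d_name);
    }
    printf("\n");
    closedir(folder);
    return 0;
}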

If we compile and run our ls version, the output is pretty similar to GNU ls already. The only differences in this case are that ls color-codes different file types on my system and that ls by default doesn't display . and ..

OK, now we have a pretty good understanding of how ls is implemented and the main headers that it uses ("stdio.h" and "dirent.h"), but where do these come from? We need to dig deeper.

glibc and system calls

stdio.h and dirent.h, and pretty much any other C header installed on your system, live in /usr/include. stdio.h and dirent.h in particular are installed along with glibc, yet another GNU project. printf is declared in stdio.h; opendir and readdir are declared in dirent.h. Just like ls, these functions are abstractions that make it easier to do the things that you want to do on your system. Actually, everything is an abstraction all the way down

ls is an application that uses glibc, and glibc itself makes system calls to the kernel. So we don't necessarily need to use glibc to implement ls, we could make the system calls ourselves. Let's do that!

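#include <linux/fs.h>

...
 /* Kernel-space fragment: it runs inside the kernel, not in a normal program.
    get_fs()/set_fs() only exist on older kernels. */
 u8 Buff[50];
 int fd;
 memset(Buff, 0x00, sizeof(Buff));
 mm_segment_t old_fs = get_fs();
 set_fs(KERNEL_DS);                      /* let the calls accept kernel pointers */
 fd = sys_open("/etc/Info", O_RDONLY, 0);
 if (fd >= 0)
 {
  sys_read(fd, Buff, sizeof(Buff) - 1);  /* keep the last byte as '\0' */
  printk("string: %s\n", Buff);
  sys_close(fd);
 }
 set_fs(old_fs);                         /* restore the previous address limit */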

Kernel level and User level

The ls clone that we wrote runs at user level. All the functions that we used, opendir, readdir and printf, are user-level functions. A lot of things are restricted at user level, so functions at user level ask the kernel to do the work through what we call a "system call". The function opendir, for example, will eventually call sys_open, a system call that itself calls do_sys_open, which is a kernel-level function. The kernel is what actually communicates with devices and the CPU

The kernel is a program that is always in memory and facilitates the communication between the hardware and applications. One of the facilities it provides is the set of functions that we just used. We could go deeper and explore how do_sys_open is implemented at kernel level, but that's probably another post

As you can see, the sys_call version is a lot harder to understand, and you do very little with a lot of code. As you get closer to the kernel you have to deal with the internal calls of the system. That's the reason why the kernel exists: it's an abstraction on top of the machine so you don't have to deal with all this low-level stuff. You can see in the code that we have to make a lot of different calls just to read a file. If you wanna go deeper and learn even more about how Linux works, I recommend this book. I'm reading it and it's pretty good
