Yegane Golipour

My First Shell Project in C: The Messy Truth About What Worked and What Failed

Table Of Contents

  1. Introduction
  2. Project Overview
  3. Working Features
  4. What Went Wrong
  5. Still Not Working
  6. What I’d Like to Add Next
  7. What I Learned
  8. Closing
  9. References

Introduction

This shell is actually my first project after learning C programming. I’ve always been obsessed with how system calls work. I wanted to dive deep into kernel development, but I realized it would be a real pain if I didn’t understand system calls and signals first. So my next option was to build a mini-shell—and let me tell you, implementing some parts was much harder than I thought (╥‸╥). In the end, though, I had a great time making this messy shell (˶˃ ᵕ ˂˶).

This post covers the features I implemented, the failures I encountered, and how the architecture of the shell works. It also explains some parts that were really hard for me to grasp at first. Overall, this is what I’ve learned so far. (You can check out my GitHub to see the full code.)


Project Overview

  • What it does now:

    • Runs external commands like ls, echo, grep, …
    • Runs built-in commands (which don’t require fork), such as cd, pwd, …
    • Supports job control (so you can use fg, bg, jobs)
    • Handles pipelines (e.g., ls | grep foo)
  • Technologies used:

    • Linux system calls: fork, exec, pipe, signal
    • Debugging: gdb, valgrind
    • Build automation: Makefile

Below are a few GIFs/screenshots illustrating key features:

(GIF: the jobs command)

(GIF: external and built-in commands with background jobs)

(GIF: a pipeline)

Flow of my shell:

(Diagram: the phases of my shell)


Working Features

These features work as intended:

  • Tokenizer

    • Handles both single quotes (') and double quotes (")
    • Double quotes honor escape characters (e.g., \", \')
  • Expander

    • Expands $VARIABLE, $?, and $$ in both external and built-in commands (a rough sketch of this logic follows the list below)
  • External commands

    • Executes commands like ls, echo, grep, …
  • Built-in commands

    • cd
    • help
    • exit
    • export
    • unset
    • pwd
    • bg
    • fg
    • jobs
  • Error handling for all phases

  • Simple Redirections with >, <, >>

  • Job control

    • Supports background and foreground jobs
  • Pipelines
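
As a rough illustration of the Expander bullet above, here is a simplified sketch of how those special cases can be handled. This is not my exact implementation, and last_exit_status is a made-up global standing in for wherever the shell keeps the last command's status:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical global: exit status of the last foreground command. */
static int last_exit_status = 0;

/* Expand a single "$..." token into a freshly allocated string. */
char *expand_token(const char *token) {
    char buf[64];

    if (strcmp(token, "$?") == 0) {              /* last exit status */
        snprintf(buf, sizeof(buf), "%d", last_exit_status);
        return strdup(buf);
    }
    if (strcmp(token, "$$") == 0) {              /* PID of the shell itself */
        snprintf(buf, sizeof(buf), "%d", (int)getpid());
        return strdup(buf);
    }
    if (token[0] == '$' && token[1] != '\0') {   /* $VARIABLE: environment lookup */
        const char *value = getenv(token + 1);
        return strdup(value ? value : "");       /* unset variables expand to "" */
    }
    return strdup(token);                        /* not an expansion: copy as-is */
}

So "$?" becomes the last command's numeric exit status, "$$" becomes the shell's own PID, and something like "$HOME" falls back to an empty string when the variable is unset.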


What Went Wrong

To be honest? So many things went wrong. For example, I spent two days on pipes just to learn that you need to close pipe ends in the parent as well as in the children. Below are the concepts that were hardest for me, and where I wasted the most time.

Race Condition in setpgid

I had the idea (from blog posts and books) that the first child in a pipeline should become the process-group leader, and that its PID would be the PGID for the rest of the pipeline. The problem was that the other children don't automatically know the leader's PID after forking. One dumb idea I had was to use pipes to synchronize the parent and children, but that extra complexity wasn't necessary.

In fact, you can just call setpgid in the parent immediately after forking each child. (Honestly, it took me a while to figure that out.) Because the parent updates its pgid variable before forking each subsequent child, every child's copy of that variable already holds the leader's PID, so both the parent's and the child's setpgid calls place it in the correct group. I eventually ended up with code like this:

// pgid starts at 0 here, so the first child becomes the group leader
for (proc = job->first_process, proc_num = 0; proc; proc = proc->next, proc_num++) {
    cmd = proc->cmd;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork failed");
        // handle error…
        return -1;
    }

    if (pid == 0) {
        // In child: join the job's group (pgid == 0 makes the first child the leader)
        if (setpgid(0, pgid) < 0) {
            perror("child: setpgid failed");
            exit(EXIT_FAILURE);
        }
        exec_command(cmd);
        perror("execve failed");
        exit(EXIT_FAILURE);
    }

    // In parent: record the leader's PID, then set the group again to close the race
    if (proc_num == 0) {
        pgid = pid;
        job->pgid = pgid;
    }
    proc->pid = pid;
    if (setpgid(pid, pgid) < 0 && errno != EACCES && errno != EINVAL) {
        perror("parent: setpgid failed");
    }
    // ...
}

The takeaway: call setpgid in the parent immediately after each fork (for the first child and every subsequent one), and also in each child, so every process ends up in the correct group without any extra synchronization.


Using tcsetpgrp to Pass Control of the Terminal

I was confused because the GNU documentation seemed to suggest calling tcsetpgrp in the child too (maybe I misunderstood it?). After digging around (and asking ChatGPT), I settled on calling tcsetpgrp only in the parent to hand the terminal to the child's process group; calling it in every child as well seemed unnecessary for my shell and just one more place for things to race.
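
Here is a minimal sketch of what I mean, assuming the shell ignores SIGTTOU (as interactive shells normally do); give_terminal_to_job, job_pgid, and shell_pgid are illustrative names, not my exact code:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Only the parent (the shell) touches the terminal.
   shell_pgid is the shell's own process group, saved at startup with getpgrp(). */
void give_terminal_to_job(pid_t job_pgid, pid_t shell_pgid)
{
    /* Hand the terminal to the job's process group so it receives
       Ctrl+C / Ctrl+Z instead of the shell. */
    if (tcsetpgrp(STDIN_FILENO, job_pgid) < 0)
        perror("tcsetpgrp (give terminal to job)");

    /* ... wait for the foreground job to exit or stop (waitpid with WUNTRACED) ... */

    /* Take the terminal back so the shell can read the next prompt. */
    if (tcsetpgrp(STDIN_FILENO, shell_pgid) < 0)
        perror("tcsetpgrp (reclaim terminal)");
}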


Deadlock in Pipes

It took me way too long to realize that closing unused pipe ends is not optional. If a process doesn't close the write end when it's done writing, the reading side never sees EOF and hangs forever. Likewise, if the writer keeps its own copy of the read end open, it never gets SIGPIPE/EPIPE when the real reader goes away; it keeps writing until the pipe buffer fills up and then blocks indefinitely.

I was also surprised that children inherit open file descriptors from the parent. That means you have to close unused pipe ends both in the parent and in each child.

These are the steps I took to implement the pipeline feature (based on the references in the References section):

  1. First, figure out how many pipes you need: for N processes, you need N - 1 pipes to connect them.
  2. All of the processes can't write to and read from one shared pipe; with several readers and writers on the same pipe, they race and there is no way to control which process gets which data.
  3. The pipes have to be created in the parent process before forking. A pipe created inside one child after fork() isn't visible to its siblings; pipe file descriptors are only shared through inheritance, so the communicating processes need a common ancestor that created the pipe, which in our case is the parent.
  4. Every child inherits the parent's open file descriptors, so once the pipes exist, the parent and the children MUST close the pipe ends they don't use.
  5. In each child, dup2 the appropriate pipe ends onto the STDIN and STDOUT file descriptors.
  6. Then the child closes the original pipe file descriptors it no longer needs.

Example outline in C:

/* IN EACH CHILD (proc_num = this process's position in the pipeline) */

// FOR STDIN: every process except the first reads from the previous pipe
if (proc_num > 0)
    dup2(pipes[proc_num - 1][0], STDIN_FILENO);

// FOR STDOUT: every process except the last writes into the next pipe
if (proc_num < num_procs - 1)
    dup2(pipes[proc_num][1], STDOUT_FILENO);

// CLOSE ALL PIPE ENDS: the dup2'd copies on STDIN/STDOUT stay open
for (int i = 0; i < num_procs - 1; i++) {
    close(pipes[i][0]);
    close(pipes[i][1]);
}

/* IN THE PARENT, AFTER ALL CHILDREN HAVE BEEN FORKED */

// The parent uses none of the pipe ends, so it closes every one of them;
// otherwise the readers never see EOF
for (int j = 0; j < num_procs - 1; j++) {
    close(pipes[j][0]);
    close(pipes[j][1]);
}


(Diagram: how pipes work.)

This is from The Linux Programming Interface:

(Figure from The Linux Programming Interface showing how pipes work.)



Why Concurrency Is Key in Pipes

At first, I thought: “Why can’t the parent just wait() for each child in sequence?” I considered two scenarios:

A. When the data is smaller than the pipe's buffer capacity

  • If you're lucky, the pipe never fills, so the child writing to the pipe finishes quickly, then the next child reads, and so on. It "works," but it's slow: the processes aren't running concurrently, so performance sucks.

B. When the data is at least as large as the pipe's buffer capacity

  • The writer fills the pipe buffer. Since no reader is consuming yet (the parent is still blocked in wait() on an earlier child), the writer blocks forever. Deadlock. (And if every read end were closed instead, the writer would be killed by SIGPIPE.)

So the whole point of pipes is that data flows between processes concurrently: as soon as data is available, the other process should be reading it, and the write end should never stay blocked for long.
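
To make that concrete, here is a simplified sketch of the structure (num_procs and the per-child setup are assumed from the pipeline outline above): fork every stage first, and only start waiting once they all exist.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Launch all pipeline stages before waiting on any of them, so readers and
   writers run concurrently and a full pipe buffer never stalls the whole job. */
void run_pipeline(int num_procs)
{
    pid_t pids[num_procs];

    for (int i = 0; i < num_procs; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pids[i] == 0) {
            /* child: dup2 the right pipe ends, close the rest, then exec
               the command (see the outline above); exec never returns */
            _exit(EXIT_FAILURE);
        }
    }

    /* Only after every stage exists does the parent start waiting. */
    for (int i = 0; i < num_procs; i++) {
        int status;
        waitpid(pids[i], &status, 0);
    }
}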


Signal Handling

In my opinion, signal handling is the hardest part of building a shell—especially if you’re new to how they work.

  • Blocking signals in the parent while setting up the process group: When you fork children, you don’t want terminal‐generated signals (like SIGINT from Ctrl+C) to hit the shell (parent) before it gives control of the terminal to the child’s process group. So you must block signals until each child’s PGID is set and terminal control is passed.

  • SIGCHLD: You need to block SIGCHLD in the parent until you've finished setting each child's process group. Imagine a child execs and exits so fast that the parent hasn't yet called setpgid(child_pid, child_pid). If a SIGCHLD handler reaps children, that child gets reaped first, and when the shell then calls setpgid(pid, pid) the call fails with ESRCH because no such child exists anymore. So block SIGCHLD until after setpgid (a sketch of this follows below).
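
Here is roughly what that blocking looks like (spawn_in_group is a made-up helper, and error handling is trimmed):

#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Block SIGCHLD around fork()/setpgid() so a fast-exiting child can't be
   reaped by the SIGCHLD handler before the parent finishes job setup. */
pid_t spawn_in_group(pid_t pgid /* 0 = the child becomes the group leader */)
{
    sigset_t block_mask, prev_mask;
    sigemptyset(&block_mask);
    sigaddset(&block_mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block_mask, &prev_mask);

    pid_t pid = fork();
    if (pid == 0) {
        /* child: restore the original mask before exec, then join the group */
        sigprocmask(SIG_SETMASK, &prev_mask, NULL);
        setpgid(0, pgid);
        /* ... exec the command here ... */
        _exit(127);
    }

    if (pid > 0)
        setpgid(pid, pgid ? pgid : pid);  /* safe: the SIGCHLD handler can't run yet */

    sigprocmask(SIG_SETMASK, &prev_mask, NULL);  /* now reaping may resume */
    return pid;
}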


Still Not Working

  • Background job notifications sometimes fail. Occasionally, background jobs finish but never produce a notification. The shell only updates job status when a prompt is printed or when you run jobs, yet sometimes a finished job still never shows up in jobs. I haven't tracked down the exact cause. (Any tips here are greatly appreciated.)

What I’d Like to Add Next

These are features I plan to implement:

  1. Command history (so users can press ↑/↓ to cycle through past commands)
  2. Tab completion
  3. Heredoc support (e.g., cat <<EOF … EOF)

I’m also working on unit and integration tests. Additionally, I’d like to research optimized data structures for job control instead of my current linked lists + hash tables.


What I Learned

Building a mini-shell feels like a “basic” exercise, but for me it was a real challenge. Especially since I was new to system calls and still learning C (unfortunately, I’m still nowhere near an expert (─ ‿ ─) ).

  • Tokenization is much harder than I thought. There are so many edge cases: quoted strings, escaped characters, variable expansion. Halfway through, you realize there are dozens of cases you didn't consider initially.
  • Signals are monsters. Getting signal blocking/unblocking, SIGCHLD, and SIGINT/SIGTSTP behavior right is a nightmare to debug without experience.
  • Debugging in C without GDB and Valgrind is a total pain. Whenever I tried to "printf" my way through pointers and memory errors, I wasted hours.
  • I didn’t know built-in commands shouldn’t create job entries. I only discovered that while implementing jobs—and yes, I used ChatGPT to figure it out. (I know I shouldn’t rely on AI, but I was completely stuck.)

Closing

If you’ve written a shell or worked with system-level C, I’d really appreciate your feedback on where my understanding is flawed or where I could’ve implemented things differently. Feel free to suggest resources or projects I should tackle next because I’m still learning ( ◡̀_◡́)ᕤ, and every bit of advice helps.
(I update this post when I implement something new.)


References

I used a lot of resources—blogs, Reddit posts, Stack Overflow threads (some links I lost track of), but these were my main references:

  • The Linux Programming Interface by Michael Kerrisk
  • Advanced Programming in the UNIX Environment (2nd ed.) by W. Richard Stevens & Stephen A. Rago
  • Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau & Andrea C. Arpaci-Dusseau
  • The GNU C Library Reference Manual

Some Reddit/Stack Overflow threads I saved:
