
Nicola Apicella

Linux terminals, tty, pty and shell

This is the first of two articles about Linux terminals. By the end of the two articles, we should be able to:

  • describe the main components in the terminal subsystem
  • know the difference between TTY, PTY and Shell
  • answer what happens when we press a key in a Terminal (like Xterm, etc.)
  • build a simple remote terminal application using golang

Or at least I hope so :)

What's a terminal?

Generally speaking a terminal is a relatively dumb electromechanical device with an input interface (like a keyboard) and an output interface (like a display or sheet of paper).

It is connected to another device (like a computer) via two logical channels, and all it does is:

  • send the keystrokes down the first line
  • read from the second line and print what arrives on a sheet of paper

The commercial name for this type of device is teletypewriter, Teletype or TTY (remember this word as it will come up again later). These machines provided a user interface to early mainframe computers and minicomputers, sending typed data to the computer and printing the response.

Teletypewriter
By ArnoldReinhold - Own work, CC BY-SA 3.0, Link

To understand how a modern Terminal works we need to dwell just a bit on how teletypes used to work.
Each machine was connected to the computer via two channels: one to send the typed characters to the computer and one to receive the output from the computer.
On the computer side, the serial cable carrying these channels was plugged into a Universal Asynchronous Receiver and Transmitter (UART).

The computer has a UART driver to read from the hardware device.
The sequence of characters is passed to the TTY driver, which applies the line discipline. The line discipline is in charge of converting special characters (like end of line and backspace) and echoing what has been received back to the teletype, so that the user can see what has been typed (line disciplines will be discussed in the next post of the series).
It is also responsible for buffering the characters.
When enter is pressed, the buffered data is passed to the foreground process of the session associated with the TTY. As a user, you can execute several processes in parallel, but only interact with one at a time, letting the others work (or wait) in the background.
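To make the buffering and echoing more concrete, here is a toy model of what the line discipline does with each incoming character. It is only a sketch: the type and method names (lineDiscipline, input) are made up for illustration, it handles just Enter and Backspace, and the real kernel code does a lot more.

```go
package main

import (
	"fmt"
	"strings"
)

// lineDiscipline is a toy, hypothetical model of the buffering and echoing
// described above. It is not how the kernel is implemented.
type lineDiscipline struct {
	buf  []byte          // characters typed since the last Enter
	echo strings.Builder // everything sent back to the terminal
}

// input handles one byte coming from the terminal. It returns a complete
// line (and true) only when Enter is pressed.
func (ld *lineDiscipline) input(c byte) (string, bool) {
	switch c {
	case '\r', '\n': // Enter: hand the buffered line to the foreground process
		ld.echo.WriteString("\r\n")
		line := string(ld.buf) + "\n"
		ld.buf = ld.buf[:0]
		return line, true
	case 0x7f, '\b': // Backspace: drop the last buffered character
		if len(ld.buf) > 0 {
			ld.buf = ld.buf[:len(ld.buf)-1]
			ld.echo.WriteString("\b \b") // step back, overwrite, step back
		}
		return "", false
	default: // any other character is buffered and echoed
		ld.buf = append(ld.buf, c)
		ld.echo.WriteByte(c)
		return "", false
	}
}

func main() {
	ld := &lineDiscipline{}
	// The user types "lz", presses Backspace, types "s -la" and presses Enter.
	for _, c := range []byte("lz\x7fs -la\r") {
		if line, ok := ld.input(c); ok {
			fmt.Printf("foreground process reads: %q\n", line)
		}
	}
	fmt.Printf("terminal receives: %q\n", ld.echo.String())
}
```

Note how the process never sees the typo: the correction happens entirely inside the buffer before Enter is pressed.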

The whole stack as defined above is called a TTY device.
The foreground process is a computer program called Shell.

Gotcha: The words terminal and TTY device are basically interchangeable, as they mean the same thing.

What's a shell?

Shells are user space applications that use the kernel API in just the same way as other application programs do. A shell manages the user–system interaction by prompting users for input, interpreting their input, and then handling the output from the underlying operating system (much like a read–eval–print loop, REPL).
For example, if the input is 'cat file | grep hello', bash will interpret that and figure out that it needs to run the program cat, passing 'file' as a parameter, and pipe its output to grep.
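As a rough illustration of the "interpret" step, the sketch below splits a command line into pipeline stages. The function name parsePipeline is hypothetical, and a real shell also handles quoting, variables, redirections and much more; this only shows the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePipeline splits a command line on '|' and then on whitespace.
// Each stage is a program name followed by its arguments.
// Simplification: no quoting, escaping or redirection is handled.
func parsePipeline(line string) [][]string {
	var stages [][]string
	for _, part := range strings.Split(line, "|") {
		if args := strings.Fields(part); len(args) > 0 {
			stages = append(stages, args)
		}
	}
	return stages
}

func main() {
	for i, stage := range parsePipeline("cat file | grep hello") {
		fmt.Printf("stage %d: program=%q args=%q\n", i, stage[0], stage[1:])
	}
}
```

Once the stages are known, the shell starts one process per stage and connects the standard output of each to the standard input of the next.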

It also controls program execution (a feature called job control): it sets programs to run in the foreground (fg) or in the background (bg), and the foreground program is the one that gets interrupted when you press CTRL + C or suspended when you press CTRL + Z.

Shells can also run in non-interactive mode, via a script which contains a sequence of commands.

Bash, Zsh, Fish and sh are all different flavors of shells.

What's a terminal emulator?

Let's move to more recent times. Computers started becoming smaller and smaller, with everything packed in one single box.
For the first time the terminal was not a physical device connected via UART to the computer. The terminal became a computer program in the kernel which would send characters directly to the TTY driver, read from it and print to the screen.

That is, the kernel program would emulate the physical terminal device, thus the name terminal emulator.
Note that, although emulated, they were and are still called Teletypes.

Don't get fooled by the word emulator. A terminal emulator is as dumb as the physical terminals used to be: it listens for events coming from the keyboard and sends them down to the driver. The difference is that there is no physical device or cable connected to the TTY driver.

How do I see a terminal emulated TTY?

If you run a Linux OS on your machine, press Ctrl+Alt+F1. You'll get a TTY emulated by the kernel! You can get other TTYs by pressing Ctrl+Alt with the function keys from F2 to F6. By pressing Ctrl+Alt+F7 you'll get back to the GUI (X session); the exact function key for the GUI can differ between distributions.

Let's recap the main concepts so far:

  • Terminal and TTY can be used interchangeably
  • A Teletype (TTY) is a physical electromechanical device originally designed for telegraphy, then adapted to send input to and get output from mainframes
  • A Teletype can be emulated by a computer program running as a module in the kernel

What's a pseudo terminal (PTY)?

It's a Teletype emulated by a computer program running in user land.
Compare that with a TTY: the difference is where the program runs; it's not a kernel program but one that runs in user land.

I won't (and probably couldn't) give a complete description of kernel vs user land; I am just going to say that code which runs in the kernel has access to a privileged mode. That allows the kernel to access the hardware of the machine.
Programs in user land instead interact only with the kernel, not directly with the hardware.
If something goes wrong with a kernel module, the whole system might be compromised and, in the worst case, only a reboot will bring it back to normal, whereas if something goes wrong with a program in user land, only that program is impacted.
This is definitely a good argument for moving terminal emulation into user land. It is also easier for developers to build a terminal emulator there.

I guess the main reason why PTY exists is to facilitate moving the terminal emulation into user land, while still keeping the TTY subsystem (session management and line discipline) intact.

How does it work? (High level)

A terminal emulator (or any other program) can ask the kernel for a pair of character device files (called PTY master and PTY slave).
On the master side you have the terminal emulator, while on the slave side you have a shell.
Between master and slave sits the TTY driver (line discipline, session management, etc.), which copies data between the PTY master and the PTY slave.
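For the curious, this is roughly what asking the kernel for a pair looks like on Linux. Treat it as a sketch under assumptions: it is Linux only, it uses the raw ioctl calls from Go's syscall package instead of a dedicated library, and the helper names (ioctlPtr, openPTY) are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// ioctlPtr issues an ioctl whose argument is a pointer (Linux only).
func ioctlPtr(fd, request uintptr, arg unsafe.Pointer) error {
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, request, uintptr(arg)); errno != 0 {
		return errno
	}
	return nil
}

// openPTY asks the kernel for a new PTY master and opens the matching slave.
func openPTY() (master, slave *os.File, err error) {
	// Opening /dev/ptmx creates a new master/slave pair.
	master, err = os.OpenFile("/dev/ptmx", os.O_RDWR, 0)
	if err != nil {
		return nil, nil, err
	}
	var unlock int32 // 0 means "unlock the slave side"
	if err = ioctlPtr(master.Fd(), syscall.TIOCSPTLCK, unsafe.Pointer(&unlock)); err != nil {
		master.Close()
		return nil, nil, err
	}
	var n uint32 // index of the slave under /dev/pts
	if err = ioctlPtr(master.Fd(), syscall.TIOCGPTN, unsafe.Pointer(&n)); err != nil {
		master.Close()
		return nil, nil, err
	}
	slave, err = os.OpenFile(fmt.Sprintf("/dev/pts/%d", n), os.O_RDWR|syscall.O_NOCTTY, 0)
	if err != nil {
		master.Close()
		return nil, nil, err
	}
	return master, slave, nil
}

func main() {
	master, slave, err := openPTY()
	if err != nil {
		fmt.Println("could not open a PTY:", err)
		return
	}
	defer master.Close()
	defer slave.Close()
	fmt.Println("PTY slave:", slave.Name())

	// Play the terminal emulator: "type" a line on the master side.
	master.Write([]byte("hi\n"))

	buf := make([]byte, 64)
	n, _ := slave.Read(buf) // what a shell would read from its standard input
	fmt.Printf("read from slave:  %q\n", buf[:n])
	n, _ = master.Read(buf) // what the TTY driver sent back towards the terminal
	fmt.Printf("read from master: %q\n", buf[:n])
}
```

The program plays both roles at once: it writes to the master as a terminal emulator would and reads from the slave as a shell would. Whatever comes back on the master side is the echo produced by the TTY driver.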

Let's see what happens when...

you type something in a terminal emulator running in user land, like XTerm or any other application you use to get a terminal.

Usually we say we open 'the terminal' or we open 'a bash', but what actually happens is:

  • a GUI application which emulates the terminal starts (like the Terminal or XTerm UI application).
  • it draws the UI on the screen and requests a pty from the OS.
  • it launches bash as a subprocess.
  • the standard input, output and error of bash are set to be the pty slave.
  • XTerm listens for keyboard events and sends the characters to the pty master.
  • the line discipline gets the characters and buffers them. It copies them to the slave only when you press enter. It also writes its input back to the master (echoing). Remember that the terminal is dumb: it will only show stuff on the screen if it comes from the pty master. Thus, the line discipline echoes the characters back so that the terminal can draw them on the screen, allowing you to see what you just typed.
  • when you press enter, the TTY driver (it's 'just' a kernel module) takes care of copying the buffered data to the pty slave.
  • bash (which was waiting for input on standard input) finally reads the characters (for example 'ls -la'). Again, remember that bash's standard input is set to be the PTY slave.
  • at this point bash interprets the characters and figures out that it needs to run 'ls'.
  • it forks a new process and runs 'ls' in it. The forked process has the same stdin, stdout and stderr used by bash, which is the PTY slave.
  • ls runs and prints to standard output (once again, this is the pty slave).
  • the tty driver copies the characters to the master (on the way back the line discipline does no buffering or echoing; at most it applies output processing, such as turning a newline into carriage return plus newline).
  • XTerm reads the bytes from the pty master in a loop and redraws the UI.

I think we made it! That's roughly what happens when we run a command in a terminal emulator. The drawing should help consolidate the workflow:

PTY Master and Slave

As an experiment, run the 'ps' command in a new instance of a terminal.
If you have not started any other process yet, you will see that the only two programs associated with the terminal are 'ps' and 'bash'.
'ps' is the program we just started and 'bash' was started by the terminal. 'pts/0', which you see in the results, is the PTY slave we talked about.

> ps
  PID TTY          TIME CMD
26113 pts/0    00:00:00 ps
30985 pts/0    00:00:00 bash

Let's see what happens when...

you start a program that reads from the standard input from a terminal emulator. The program can be as simple as reading from standard input and writing it to the standard output, like the following.

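package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    // Standard input is the PTY slave when run from a terminal emulator.
    reader := bufio.NewReader(os.Stdin)
    fmt.Print("Enter text: ")
    // ReadString returns once a full line, ending in '\n', is available.
    text, err := reader.ReadString('\n')
    if err != nil {
        fmt.Fprintln(os.Stderr, "read error:", err)
        os.Exit(1)
    }
    fmt.Print(text) // text already ends with '\n'
}

> go build -o simple-program
> ./simple-program
Enter text: 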

Try to write something while the program is waiting for input, but instead of pressing enter press 'CTRL + W'.
You will see that the word you wrote gets deleted. Why?
'Ctrl + W' is the key combination assigned to a line discipline rule called werase. Look back at the drawing. The characters typed get sent to the PTY master and from there to the TTY driver, which implements the line discipline. When the line discipline reads 'CTRL + W', it removes the last word from its internal buffer and sends to the PTY master the instructions to move the cursor back N positions (where N is the length of the word) and to delete the characters from the display along the way (we shall see what these instructions look like in a moment).

Try the same experiment, but this time instead of typing characters, press the arrow keys.
What are those weird characters ^[[A^[[B^[[C^[[D?
As we said, the terminal simply sends keystrokes to the master. But what about keys that do not have a character representation, like our arrows?
In that case the terminal encodes them using multiple characters; for example the up arrow is typically encoded as ^[[A, where ^[ is how the escape character (ESC) gets displayed and the whole string is called an escape sequence.
Thus, what happens when you press an arrow key is exactly what happens when you press any other key; it's just that what gets echoed back looks weird because of how the key was encoded when it was sent to the PTY master.

I hear you saying... but wait, when I press the arrow keys in the terminal with no program running I get the bash history!
This is because the ^[[A ends up in bash, which interprets it as a request for the previous entry in the command history. Bash prints to standard out the code to clear the current line and then prints the history line. We do not see the encoded characters because, while it waits for a command, bash asks the TTY driver to deliver every key right away and not to echo it, so bash itself decides what gets drawn (a detail the simplified walkthrough above glosses over).

The takeaway here is that the line discipline does not handle keys like Up, Left, Down, Right.
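Since the line discipline passes these bytes through untouched, it is up to the program reading them to make sense of them. The sketch below shows how a program might recognise the arrow keys, assuming the common encoding used by xterm-like terminals (ESC, then '[', then a letter). The names arrowKeys and decodeKeys are made up for this example.

```go
package main

import "fmt"

// arrowKeys maps the three-byte sequences commonly sent by xterm-like
// terminals to key names. Other terminals or modes may use other encodings.
var arrowKeys = map[string]string{
	"\x1b[A": "Up",
	"\x1b[B": "Down",
	"\x1b[C": "Right",
	"\x1b[D": "Left",
}

// decodeKeys walks the raw bytes read from standard input and replaces
// every known escape sequence with the name of the key.
func decodeKeys(input string) []string {
	var keys []string
	for i := 0; i < len(input); {
		if i+3 <= len(input) {
			if name, ok := arrowKeys[input[i:i+3]]; ok {
				keys = append(keys, name)
				i += 3
				continue
			}
		}
		keys = append(keys, input[i:i+1]) // an ordinary single-byte key
		i++
	}
	return keys
}

func main() {
	// "l", "s", Up arrow, Left arrow, as they would arrive on standard input.
	fmt.Printf("%q\n", decodeKeys("ls\x1b[A\x1b[D"))
}
```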

How can a program control the terminal?

A shell, the TTY driver and a program we write can instruct the terminal to do stuff for us: move the cursor back or one line down, print the next line read and clear the screen.
The way programs control the terminal is standardized by the ANSI escape codes. When the terminal reads one of them from the pty master, it performs the operation associated with the code.
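To give an idea of what these codes look like, here is a small sketch that clears the screen, prints a message and then replaces it. The constant and function names are made up for the example; the codes themselves are standard ANSI sequences, which terminal emulators generally understand.

```go
package main

import "fmt"

const (
	clearScreen = "\033[2J" // erase the whole display
	cursorHome  = "\033[H"  // move the cursor to row 1, column 1
	eraseLine   = "\033[K"  // erase from the cursor to the end of the line
)

// cursorBack returns the code that moves the cursor n columns to the left.
func cursorBack(n int) string {
	return fmt.Sprintf("\033[%dD", n)
}

func main() {
	fmt.Print(clearScreen + cursorHome)
	fmt.Print("downloading...")
	// Move back over the 14 characters just printed and wipe them.
	fmt.Print(cursorBack(14) + eraseLine)
	fmt.Println("done")
}
```

The program only writes bytes to standard output; it is the terminal emulator that interprets them and moves the cursor or clears the screen.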

Want to change the color of the text from your program?
Just print to standard out the ANSI escape code for coloring the text.
Standard out is the PTY slave, the TTY driver copies the characters to the PTY master, and the terminal gets the code and understands it needs to set the color used to print the text on the screen. Voilà!

This is a simple program which instructs the terminal emulator to print a line in red. As you'll see, it's as simple as sending to standard out the ANSI code for changing the color to bold red (\033[1;31m), the string we want to write and the ANSI code to reset the color (\033[0m).

package main

import "fmt"

const redColor = "\033[1;31m%s\033[0m"

func main() {
    fmt.Printf(redColor, "Error")
    fmt.Println("")
}

ANSI codes could be a whole different article by themselves, so I am going to stop here. Want to have some fun with ANSI codes? Take a look at this article.

Conclusions

I believe this to be a good point to stop.
We have talked about TTY and PTY and seen how they relate to a shell.
In the next article of the series we will take a closer look at the line discipline, talk about what happens when we use programs like vim and finally write a simple golang program to create our own remote terminal.

Top comments (25)

Donald Gillies • Edited

In the old days it was too expensive for every character to be read and interpreted by the underlying program (i.e. a shell) because the UNIX RAM was small (at most 64Kwords) and with 20 people typing 60wpm the system would need 100 program context switches + swaps from disk per second. In this situation 100% of cpu time would be spent handling keypresses - no time would be left over to run programs!

The line discipline was actually a programmable middleman kernel module that could buffer from all 20 ttys until each user had successfully finished a command, at which point the middleman passed the text on to the underlying program, i.e the shell or a line-editor like ed / xed. Only one line discipline needed to be in memory for all 20 users. If command entry time is 30 secs you would now have a program wakeup every 1.5 seconds - much better than 100 times per second!

The line discipline is sort of like emacs, which has a function table of size=127 (one for each key) and does a table lookup and invokes the associated function the most common being "buffer-me" but important ones being "newline" (which sends the buffered text to the shell), "erase", "word-erase", "line-erase" (used to be bound to "@" not ^U and would erase your line buffer, displaying @ CR LF and leaving the discarded text on the screen, one line above.) Only the truly important keypresses like , ^C ^Z ^S ^P ^O ^Z ^Y required interaction with the underlying program, i.e. a shell or editor or command pipeline.

You could put the terminal in "raw mode" which is also known as "no line discipline" and the function table would be filled with 127 copies of "send-char-to-program" function, immediately producing a task wakeup. That began happening for interactive games such as rogue and later, editors like vi / vim / emacs which were viable on faster CPUs (like the 3 mips PDP-11/70). Most people are too young to know that the vi / vim editor at first used line-discipline (=cooked, the opposite of raw) mode. When you inserted characters into the middle of a line (by typing e.g. "ixyzpdq") the screen would be messed up and xyzpdq would overwrite later characters and the screen wouldn't get fixed until you hit escape. This fixup policy made vi / vim efficient on slower machines. It runs in full raw mode today but originally it used the line discipline to avoid swaps and process wakeups.

Today computers are 3000x faster with 1,000,000x more memory and have only 1 user, so the feature is an unnecessary artifact of history.

Nicola Apicella

Hi Donald! Thank you for sharing this!

When I started reading about the line discipline I was hoping to find its history. I intuitively thought the reasons were hardware limitations but I did not find much on the topic. Thanks to you I now have a clear picture of the context and the constraints that led to the line discipline.

Since I wrote the article, I do not think I'll offend anyone by saying that your comment is (even?) more valuable than the article itself - I know it is for me. In the second part of the series I briefly describe the line discipline; do you think I could include your comment in the article? Of course, I'll give full credit to you.

Thank you :)

Donald Gillies

No worries Nicola you helped me immensely with your kubernetes internals article, thanks! I started with UNIX v7 in 1977 and worked with or befriended some of the original UNIX inventors and developers - that wealth can now be shared!

Nicola Apicella

Thanks!

赖信涛

Thanks Donald!

( I logged in my dev account that not in use for a long time just for click a "like" on this excellent comment.

ultraSsak

Hi there,
I have a question kinda related to this topic :)

How can I "mirror" everything that's happening on one terminal to another?

Backstory:
Sometime ago, when updating my rPI over ssh (no keyboard/mouse directly connected, only things connected are display over mHDMI, and power plug) I wanted to see on /dev/tty1 (the one that is displayed by physical monitor) whats happening, ie progress of update.
After some adjusting I managed to do this (Arch) like this:

pacman -Syyu | tee /dev/tty1

It worked, but...
How to do that from outside, as an owner of the system.

Fictional case:
I have found out that there is suspicious ssh connection made to my PC, happening on /dev/pts/1. I, as a root, want to hook into it, to see whats happening in it, without disturbing any communication in it.

I've tried this:
cat /dev/pts/1 > /dev/tty1, but that is not exactly working as I thought (characters are lost in pts1, and system lags hard)

Any idea? :)

Nicola Apicella

Hi,

I think you could use tmux. When you ssh to the machine, create a tmux session.
Then from the second PC, ssh to the machine and attach to the tmux session you have created.
Of course, this works if you ssh both times with the same user.

A more hackish way would be to redirect standard out and error of bash also to a file.
Then from the second terminal you could tail the file:

Terminal 1

> bash -i 2>&1 | tee -a out

Terminal 2

tail -f out

I am not sure how reliable it is, but it seems to work.

I would stick with tmux though :)

Your solution of redirecting the tty does not work because /dev/tty1 is a special file:

crw--w---- 1 root tty 4, 0 Apr 17 23:10 /dev/tty0

The c at the beginning means it is a character device. Although these files have the same primitives as regular files (open, read, write, etc.), they are not a representation of data on the disk - that is, they might be weird.

ultraSsak

Learn something new every day ;)
Thanks for your answer, but it's still not exactly what drills my mind.
This requires the user (first terminal) to do something special before hand, to allow root from second terminal to "spy" on it.
There has to be another way.

Nicola Apicella

Hi, sure, no problem.

Not sure what you mean by something special.
You can set up tmux to start automatically when ssh ing to the box, for example see this stack overflow answer: stackoverflow.com/a/40192494
The first user does not need to do anything :)

That's one way to tell ssh what to do when you log into the box. I believe you could also change the ssh agent config to achieve the same goal.

ultraSsak

Will look into it soon'ish,
Thanks! :)

Nicola Apicella

It also looks like you can snoop on the tty output by using eBPF, but it basically requires running the program in the kernel. Again, it's not as easy as using tmux XD

github.com/iovisor/bcc/blob/master...

Max Yudkin
  1. XTerm listens for keyboard events and sends the characters to the pty master
  2. The line discipline gets the character and buffers them... It also writes back its input to the master (echoing back).

Hi,

Please, help me to clarify sequence of events:

  1. xterm writes all keystrokes to pty master (fopen-ed /dev/ptmx)
  2. line discipline automatically (managed by tty driver) gets those characters
  3. buffers them and writes(echoes) back to pty master stream (fopen-ed in step 1)
  4. xterm reads from pty master (fopen-ed in step 1) and redraw UI

question 1: why does xterm need echo from line discipline if in step 1 it already writes to pty master?

question 2: what echo operation really does? If it simply writes to pty master, why no recursion occur (pty master <-> line discipline)?

Nicola Apicella

Back in the day, terminals were essentially just a peripheral. They were only able to send data to and receive data from the mainframe. Those terminals were not smart enough (not enough memory or compute) to be able to process the data typed by the user. The mainframe had to do everything for them (including echoing back characters).

A lot has changed since then. We all have plenty of computing power in our laptops and do not need to connect to mainframes anymore. What did not change is the architecture. When laptops started to become common, smart folks decided to reuse the same architecture by replicating in software (in the OS) what went on between terminals and mainframes.

Each machine is connected via two cables: one to send instructions to the computer and one to receive output from the computer.
These cables are connected to the computer through a serial cable plugged into a Universal Asynchronous Receiver and Transmitter (UART).

Zane

I'm also confused about both questions.

angelgruevski

"The std input, output and error of the bash will be set to be the pty slave."

What do you mean standard streams of bash will be set to be the pty slave? How are they set, is pty slave just another program and we use pipes to pass the data from pty slave to bash?

Nicola Apicella

Hi! I'll try to unpack your question:

  1. The pty slave is a device file
  2. Each program (of course that includes bash) is associated with at least 3 file descriptors (fd0 -> standard in, fd1 -> standard out, fd2 -> standard error). Normally fd0 is the file descriptor of the keyboard device file.

"The std input, output and error of bash will be set to be the pty slave" means:
fd0, fd1 and fd2 of bash all refer to the same open file, which is the pty slave device file.

Maximiliano • Edited

Hi Nicola,

First of all, let me thank you for your article, it was very useful and it guided me into other topics of study that clarified many doubts I had. Keep up the good teaching!

Secondly, I still have two doubts.

You explained that when Bash opens a process (command), that process also sets its files descriptors to the same PTY slave file as Bash, and from there the TTY driver reads the output and sends it back to the terminal through the PTY Master.

I have two questions about this mechanism:

If a command (process) initiated by Bash, has its output set to the same PTY Slave as Bash, wouldn't Bash read the command's output as an input?

My other doubt is about a command whose output is piped as the input of a second command?
I'm guessing that in such cases, the stdout of the first command is set to a different intermediate file, which would be read as the stdin of the second command, then the second command writes its output to the same PTY slave file used by Bash. Am I close?

Nicola Apicella

Hello Maximiliano, thanks!

Regarding the first question - no, the process's standard input, output and error will be connected to the PTY slave. This seems to be one of the trickiest things to get; my guess is that the concept of a kernel module (like the pty module) might be confusing. I was thinking of writing a follow-up article on that.

About the second question - I do not think Bash needs to create temp files. I have always assumed that Bash duplicates the file descriptor so that the output of one program can be passed as standard input to next one in the pipe. This seems to be confirmed by the Bash source code: github.com/bminor/bash/blob/master...

That being said, pipes are a different beast. I do have an article which touches on that, but I haven't dived deep into how they work behind the scenes.

David Wickes

Oh, this is brilliant! A great explanation - well done.

(very much looking forward to the next bit)

Nicola Apicella

I am glad you liked it.
Thank you David :)

Instinct

I was searching for something like this which could help differentiate the gui terminals and the pure cli terminals.

Dawid Burdun

Hi, Nicola
Amazing article! However I have one question regarding the "What's a shell' section:
You wrote:

It [shell] also controls programs execution (feature called job control): kills them (CTRL + C), suspends them (CTRL + Z) ...

However, in the 2nd part of the article where you're describing the "Line Discipline", you've said that when we type CTRL +C or CTRL +Z, the "Line Discipline" sees that and sends proper signal to the process.

So who is responsible for this job control? Is it a shell or line discipline?

Rizki

Great explanation! Thanks for sharing~

Nicola Apicella

Thanks!

Hiroto

Thanks for your great article!
I have a question about your article.
You said PTY is "Teletype emulated by a computer program running in the user land".
However, another article says "the pseudoterminal lives in the OS kernel.", so PTY doesn't seem to run in user land.
ishuah.com/2021/03/10/build-a-term...
Is PTY the terminal emulator which runs in user land? Or does PTY mean PTY master and slave, which are running in the OS kernel?
I am really confused about the difference between pty and tty....