DEV Community

Edmund Eryuba


Introduction to Linux for Data Engineers, Including Practical Use of Vi and Nano with Examples

Linux is an open-source, Unix-like operating system. Linus Torvalds created it in 1991.
Open-source means that the source code of the operating system is available to the public. This allows anyone to modify the original code, customize it, and distribute the new operating system to potential users.

Why should you learn about Linux?

In today's data center landscape, Linux and Microsoft Windows stand out as the primary contenders, with Linux having a major share.

Here are several compelling reasons to learn Linux:

  • Given the prevalence of Linux hosting, there is a high chance that your application will be hosted on Linux. So, learning Linux as a data engineer or developer becomes increasingly valuable.
  • With cloud computing becoming the norm, chances are high that your cloud instances will rely on Linux.
  • Linux serves as the foundation for many operating systems for the Internet of Things (IoT) and mobile applications.
  • Linux is built for automation, which is central to data engineering. Linux enables repeatability, fault tolerance and observability of the entire workflow.

What is a Linux Kernel?

The kernel is the central component of an operating system that manages the computer and its hardware operations. It handles memory operations and CPU time.

The kernel acts as a bridge between applications and the hardware-level data processing using inter-process communication and system calls.
The kernel loads into memory first when an operating system starts and remains there until the system shuts down. It is responsible for tasks like disk management, task management, and memory management.

What is a Linux distribution?

The Linux kernel is reused and configured differently across distributions. You can further combine different utilities and software to create a completely new operating system.

A Linux distribution or distro is a version of the Linux operating system that includes the Linux kernel, system utilities, and other software. Being open source, a Linux distribution is a collaborative effort involving multiple independent open-source development communities.

Today, there are thousands of Linux distributions to choose from, offering differing goals and criteria for selecting and supporting the software provided by their distribution.

Distributions vary from one to the other, but they generally have several common characteristics:

  • A distribution consists of a Linux kernel.
  • It supports user space programs.
  • A distribution may be small and single-purpose or include thousands of open-source programs.
  • Some means of installing and updating the distribution and its components should be provided.

Some popular Linux distributions are:

  1. Ubuntu: One of the most widely used and popular Linux distributions. It is user-friendly and recommended for beginners.
  2. Linux Mint: Based on Ubuntu, Linux Mint provides a user-friendly experience with a focus on multimedia support.
  3. Arch Linux: Popular among experienced users, Arch is a lightweight and flexible distribution aimed at users who prefer a DIY approach.
  4. Manjaro: Based on Arch Linux, Manjaro provides a user-friendly experience with pre-installed software and easy system management tools.
  5. Kali Linux: Kali Linux provides a comprehensive suite of security tools and is mostly focused on cybersecurity and hacking.

How to install and access Linux

There are several ways to access Linux, including from a Windows machine. This section explores them in detail.

Install Linux as the primary OS

Installing Linux as the primary OS is the most efficient way to use Linux, as you can use the full power of your machine.
We'll focus on installing Ubuntu, which is one of the most popular Linux distributions. Many other distributions exist for specific use cases, and you can explore them based on your preference.

  • Step 1 – Download the Ubuntu iso file. Make sure to select a stable release that is labelled "LTS". LTS stands for Long Term Support which means you can get free security and maintenance updates for a long time (usually 5 years).
  • Step 2 – Create a bootable pen drive: Several tools can write the ISO file to a pen drive and make it bootable.
  • Step 3 – Boot from the pen drive: Once your bootable pen drive is ready, insert it and boot from the pen drive. The key that opens the boot menu depends on your laptop, so search online for the boot menu key for your laptop model.
  • Step 4 – Follow the prompts: Once the boot process starts, select "Try or Install Ubuntu". The process will take some time. Once the GUI appears, select the language and keyboard layout and continue. Enter your name and login credentials. Remember the credentials, as you will need them to log in to your system and to run commands with administrative privileges. Wait for the installation to complete.
  • Step 5 – Restart: Click on restart now and remove the pen drive.
  • Step 6 – Login: Login with the credentials you entered earlier.

And there you go! Now you can install apps and customize your desktop.

Accessing the terminal

An important part is learning about the terminal, where you'll run all the commands and see the magic happen. You can search for the terminal by pressing the Super ("Windows") key and typing "terminal".
The shortcut for opening the terminal is ctrl + alt + t.

You can also open the terminal from inside a folder. Right click where you are and click on "Open in Terminal". This will open the terminal in the same path.

How to use Linux on a Windows machine

Sometimes you might need to run both Linux and Windows side by side. Luckily, there are some ways you can get the best of both worlds without getting different computers for each operating system.
This section explores a few ways to use Linux on a Windows machine.

Option 1: "Dual-boot" Linux + Windows

With dual boot, you can install Linux alongside Windows on your computer, allowing you to choose which operating system to use at startup.

This requires partitioning your hard drive and installing Linux on a separate partition. With this approach, you can only use one operating system at a time.

Option 2: Use Windows Subsystem for Linux (WSL)

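Windows Subsystem for Linux (WSL) lets you run Linux binary executables directly on Windows, without dual-booting or setting up a traditional virtual machine. WSL 1 does this through a compatibility layer, while WSL 2 runs a real Linux kernel inside a lightweight, managed virtual machine.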

Using WSL has some advantages. The setup for WSL is simple and not time-consuming. It is lightweight compared to virtual machines (VMs) where you have to allocate resources from the host machine. You don't need to install any ISO or virtual disc image for Linux machines which tend to be heavy files. You can use Windows and Linux side by side.

How to install WSL2

First, enable the Windows Subsystem for Linux option in settings.

  • Go to Start. Search for "Turn Windows features on or off."
  • Check the option "Windows Subsystem for Linux" if it isn't already.


  • Next, open Command Prompt as an administrator and run the command below:
wsl --install

Note: By default, Ubuntu will be installed.

  • Once installation is complete, restart your Windows machine.

Once installation of Ubuntu is complete, you'll be prompted to enter your username and password.
And, that's it! You are ready to use Ubuntu.
Launch Ubuntu by searching from the start menu.

Option 3: Use a Virtual Machine (VM)

A virtual machine (VM) is a software emulation of a physical computer system. It allows you to run multiple operating systems and applications on a single physical machine simultaneously.

You can use virtualization software such as Oracle VirtualBox or VMware to create a virtual machine running Linux within your Windows environment. This allows you to run Linux as a guest operating system alongside Windows.

VM software provides options to allocate and manage hardware resources for each VM, including CPU cores, memory, disk space, and network bandwidth. You can adjust these allocations based on the requirements of the guest operating systems and applications.

Option 4: Use a Browser-based Solution

Browser-based solutions are particularly useful for quick testing, learning, or accessing Linux environments from devices that don't have Linux installed.
You can either use online code editors or web-based terminals to access Linux. Note that you usually don't have full administration privileges in these cases.

Online code editors: They offer editors with built-in Linux terminals. While their primary purpose is coding, you can also utilize the Linux terminal to execute commands and perform tasks.

Replit is an example of an online code editor, where you can write your code and access the Linux shell at the same time.

Web-based Linux terminals: Online Linux terminals allow you to access a Linux command-line interface directly from your browser. These terminals provide a web-based interface to a Linux shell, enabling you to execute commands and work with Linux utilities.
One such example is JSLinux.

Option 5: Use a Cloud-based Solution

Instead of running Linux directly on your Windows machine, you can consider using cloud-based Linux environments or virtual private servers (VPS) to access and work with Linux remotely.

Services like Amazon EC2, Microsoft Azure, or DigitalOcean provide Linux instances that you can connect to from your Windows computer. Note that some of these services offer free tiers, but they are not usually free in the long run.

Introduction to Bash Shell and System Commands

The Linux command line is provided by a program called the shell. Over the years, many shell programs have been developed, each offering different features.

Different users can be configured to use different shells, but most users stick with the default. The default shell for many Linux distros is the GNU Bourne-Again Shell (bash). Bash is the successor to the Bourne shell (sh).

To find out your current shell, open your terminal and enter the following command:

echo $SHELL

Command breakdown:

  • The echo command is used to print on the terminal.
  • $SHELL is an environment variable that holds the path to your default (login) shell, which is usually also the shell you are currently running.

In my setup, the output is /bin/bash. This means that I am using the bash shell.
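Strictly speaking, $SHELL reports the login shell configured for your user, which can differ from the shell process you are typing into (for example, after starting another shell by hand). A minimal sketch for comparing the two, assuming a ps implementation that supports the -p and -o options (the variable name current_shell is only illustrative):

```shell
# Login shell configured for your user (what $SHELL reports)
echo "Login shell: $SHELL"

# Name of the shell process that is running right now.
# $$ expands to the process ID of the current shell.
current_shell=$(ps -p $$ -o comm=)
echo "Running shell: $current_shell"
```

If the two lines name different shells, the second one is the shell that is interpreting your commands.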


Bash is very powerful as it can simplify certain operations that are hard to accomplish efficiently with a GUI (or Graphical User Interface). Remember that most servers do not have a GUI, and it is best to learn to use the powers of a command line interface (CLI).

Terminal vs Shell

The terms terminal and shell are often used interchangeably, but they refer to different parts of the command-line interface.
The terminal is the interface you use to interact with the shell. The shell is the command interpreter that processes and executes your commands.

What is a prompt?

When a shell is used interactively, it displays a $ when it is waiting for a command from the user. This is called the shell prompt.

[username@host ~]$

If the shell is running as root, the prompt is changed to #.

[root@host ~]#
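The prompt text itself can be customized, so a more reliable way to tell whether you are root is to check the numeric user ID, which is always 0 for root. A small sketch (the variable name prompt_char is purely illustrative):

```shell
# id -u prints the numeric user ID of the current user; root is always 0
if [ "$(id -u)" -eq 0 ]; then
    prompt_char='#'
else
    prompt_char='$'
fi
echo "Expected prompt symbol: $prompt_char"
```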

Command Structure

A command is a program that performs a specific operation. Once you have access to the shell, you can enter any command after the $ sign and see the output on the terminal.
Generally, Linux commands follow this syntax:

command [options] [arguments]

Here is the breakdown of the above syntax:

  • command: This is the name of the command you want to execute. ls (list), cp (copy), and rm (remove) are common Linux commands.
  • [options]: Options, or flags, often preceded by a hyphen (-) or double hyphen (--), modify the behavior of the command. They can change how the command operates. For example, ls -a uses the -a option to display hidden files in the current directory.
  • [arguments]: Arguments are the inputs for the commands that require one. These could be filenames, user names, or other data that the command will act upon. For example, in the command cat access.log, cat is the command and access.log is the input. As a result, the cat command displays the contents of the access.log file.
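The three parts can be seen together in a short sketch. It works inside a throwaway directory created with mktemp, and the file names visible.txt and .hidden.txt are made up for the demonstration:

```shell
# Throwaway directory with one normal file and one hidden file
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden.txt"

# command + argument: list the directory given as the argument
plain=$(ls "$demo_dir")

# command + option + argument: -a also includes names starting with a dot
with_hidden=$(ls -a "$demo_dir")

echo "Without -a:"
echo "$plain"
echo "With -a:"
echo "$with_hidden"
```

The first listing shows only visible.txt, while the second also includes .hidden.txt along with the special entries . and .. for the directory itself and its parent.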

Options and arguments are not required for all commands. Some commands can be run without any options or arguments, while others might require one or both to function correctly. You can always refer to the command's manual to check the options and arguments it supports. You can view a command's manual using the man command.

You can access the manual page for ls with man ls.

man ls

Manual pages are a great and quick way to access the documentation. I highly recommend going through man pages for the commands that you use the most.

Managing Files From the Command line

The Linux File-system Hierarchy

All files in Linux are stored in a file-system. It follows an inverted-tree-like structure because the root is at the topmost part.

The / is the root directory and the starting point of the file system. The root directory contains all other directories and files on the system. The / character also serves as a directory separator between path names. For example, /home/alice forms a complete path.
You can learn more about the file system using the man hier command.


Navigating the Linux File-system

The absolute path is the full path from the root directory to the file or directory. It always starts with a /. For example, /home/john/documents.
The relative path, on the other hand, is the path from the current directory to the destination file or directory. It does not start with a /. For example, documents/work/project.
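To make the difference concrete, here is a small sketch that reaches the same directory both ways. The layout (documents/work/report.txt under a temporary directory) is hypothetical and exists only for this demonstration:

```shell
# Hypothetical layout created under a throwaway directory
base=$(mktemp -d)
mkdir -p "$base/documents/work"
touch "$base/documents/work/report.txt"
cd "$base"

# Absolute path: starts with / and resolves the same way from any directory
abs_listing=$(ls "$base/documents/work")

# Relative path: no leading /, resolved against the current directory ($base)
rel_listing=$(ls documents/work)

echo "$abs_listing"
echo "$rel_listing"
```

Both commands list the same file. The relative form only works while the current directory is $base, whereas the absolute form works from anywhere.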

Locating your current directory: You can locate your current directory in the Linux file system using the pwd command.


Changing directories: The command to change directories is cd and it stands for change directory. You can use the cd command to navigate to a different directory.

Some other commonly used cd shortcuts are:

Command       Description
cd ..         Go back one directory
cd ../..      Go back two directories
cd or cd ~    Go to the home directory
cd -          Go to the previous path
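The shortcuts are easiest to follow in sequence. The sketch below walks through them in a temporary directory tree; the directory names (projects/etl/logs) are invented for the example:

```shell
# Build a small, hypothetical directory tree to move around in
top=$(mktemp -d)
mkdir -p "$top/projects/etl/logs"

cd "$top/projects/etl/logs"
cd ..            # up one level: now in projects/etl
one_up=$(pwd)

cd ../..         # up two levels: now in the top directory
two_up=$(pwd)

cd -             # back to the previous directory (projects/etl); cd prints it
previous=$(pwd)

echo "$one_up"
echo "$two_up"
echo "$previous"

cd ~             # finish in the home directory
```

Running pwd after each step, as done here, is a simple way to confirm where you have landed.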

Managing Files and Directories

Creating new directories: You can create an empty directory using the mkdir command.

# creates an empty directory named "foo" in the current folder
mkdir foo

You can also create directories recursively using the -p option.
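As a short illustration of -p, the sketch below creates a nested path in one step. It runs inside a temporary directory, and the path data/raw/2024 is just an example name:

```shell
# Work in a throwaway directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"

# Without -p this would fail, because "data" and "data/raw" do not exist yet.
# With -p, mkdir creates every missing parent directory along the way.
mkdir -p data/raw/2024

# -p also makes mkdir succeed quietly if the directory already exists,
# which is convenient in scripts that may run more than once
mkdir -p data/raw/2024

ls -R data
```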

Creating new files: The touch command creates an empty file. You can use it like this:

# creates empty file "file.txt" in the current folder
touch file.txt

The file names can be chained together if you want to create multiple files in a single command.

# creates empty files "file1.txt", "file2.txt", and "file3.txt" in the current folder

touch file1.txt file2.txt file3.txt

Removing files and directories: You can use the rm command to remove files and, with the -r option, directories together with their contents. The rmdir command removes only empty directories.

Command            Description
rm file.txt        Removes the file file.txt
rm -r directory    Removes the directory named directory and its contents
rm -f file.txt     Removes the file file.txt without prompting for confirmation
rmdir directory    Removes the directory named directory, provided it is empty
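Because rm deletes permanently, it is worth practising in a sandbox first. The following sketch, with made-up file and directory names, shows how rmdir and rm -r differ on a non-empty directory:

```shell
# Everything happens inside a throwaway sandbox directory
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir empty_dir full_dir
touch notes.txt full_dir/a.txt

rm notes.txt        # remove a single file
rmdir empty_dir     # succeeds because the directory is empty

# rmdir refuses to remove a directory that still has files in it
rmdir full_dir 2>/dev/null || echo "rmdir refused: full_dir is not empty"

rm -r full_dir      # removes the directory and everything inside it

ls -A               # prints nothing: the sandbox is empty again
```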

Copying files using the cp command: To copy files in Linux, use the cp command.

  • Syntax to copy files: cp source_file destination_of_file
The command below copies a file named file1.txt into the existing directory /home/adam/logs.
cp file1.txt /home/adam/logs

The cp command also creates a copy of one file with the provided name.
This command copies a file named file1.txt to another file named file2.txt in the same folder.

cp file1.txt file2.txt
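Both forms of cp can be tried safely in a temporary directory. The sketch below also uses cp -r, an option not covered above, which copies a directory together with its contents; all file and directory names are illustrative:

```shell
# Sandbox with one file and one directory to copy into
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir logs
echo "hello" > file1.txt

cp file1.txt logs/          # copy into a directory, keeping the same name
cp file1.txt file2.txt      # copy to a new name in the same directory
cp -r logs logs_backup      # -r copies a directory and everything in it

ls -R
```

The original file1.txt is left untouched in every case, since cp creates copies rather than moving anything.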

Moving and renaming files and folders: The mv command is used to rename and move files and folders from one directory to the other.

  • Syntax to move files: mv source_file destination_directory
# Moves a file named file1.txt to a directory named backup

mv file1.txt backup/

To move a directory and its contents:

mv dir1/ backup/

Renaming files and folders in Linux is also done with the mv command.

Syntax to rename files: mv old_name new_name

#Renames a file from file1.txt to file2.txt

mv file1.txt file2.txt
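Moving and renaming can be combined in one short session. This sketch uses a temporary sandbox and the same illustrative names as the commands above (file1.txt, dir1, backup):

```shell
# Sandbox containing a file, a directory with content, and a target directory
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir backup dir1
touch file1.txt dir1/inner.txt

mv file1.txt backup/                     # move a file into backup/
mv dir1 backup/                          # move a directory and its contents
mv backup/file1.txt backup/file2.txt     # rename the file in place

ls -R backup
```

Unlike cp, mv leaves nothing behind at the old location: after these commands, file1.txt and dir1 no longer exist at the top of the sandbox.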

Locating Files and Folders: The find command lets you efficiently search for files, folders, and character and block devices.
Below is the basic syntax of the find command:

find /path/ -type f -name file-to-search

Where,

  • /path is the path where the file is expected to be found. This is the starting point for searching files. The path can also be / or ., which represent the root and current directory, respectively.
  • -type restricts the search to one type of file. It can be any of the below:
      • f: Regular file, such as a text file, an image, or a hidden file.
      • d: Directory. These are the folders under consideration.
      • l: Symbolic link. Symbolic links point to files and are similar to shortcuts.
      • c: Character device. Files that are used to access character devices are called character device files. Drivers communicate with character devices by sending and receiving single characters (bytes, octets). Examples include keyboards, sound cards, and the mouse.
      • b: Block device. Files that are used to access block devices are called block device files. Drivers communicate with block devices by sending and receiving entire blocks of data. Examples include USB drives and CD-ROMs.
  • -name is the name of the file that you want to search for. It can also be a quoted wildcard pattern.
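A few concrete searches may help here. The sketch below builds a small, made-up directory tree (an etl folder with a script and two log files) and runs find against it:

```shell
# Hypothetical tree to search through
root=$(mktemp -d)
mkdir -p "$root/etl/logs"
touch "$root/etl/job.py" "$root/etl/logs/run1.log" "$root/etl/logs/run2.log"

# Regular files named exactly job.py
find "$root" -type f -name job.py

# Quote wildcard patterns so the shell hands them to find unexpanded
find "$root" -type f -name '*.log'

# Directories only
find "$root" -type d -name logs
```

The quotes around '*.log' matter: without them, the shell may expand the pattern against the current directory before find ever sees it.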

Basic Commands for Viewing Files

Displaying file contents: The cat command in Linux is used to display the contents of a file.

Here is the basic syntax of the cat command:

cat [options] [file]

If you want to view the contents of a file named file.txt, you can use the following command:

cat file.txt

This will display all the contents of the file on the terminal at once.
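The name cat is short for "concatenate", and the command accepts more than one file. The sketch below shows plain output, numbered output with the -n option, and joining two files into a third using the shell's > redirection, which is not covered above. The CSV file names and contents are invented for the example:

```shell
# Two small sample files in a throwaway directory
dir=$(mktemp -d)
printf 'id,name\n1,alice\n' > "$dir/part1.csv"
printf '2,bob\n' > "$dir/part2.csv"

cat "$dir/part1.csv"       # print the file as-is
cat -n "$dir/part1.csv"    # -n prefixes every line with its line number

# Given several files, cat prints them one after another;
# > sends that combined output to a new file instead of the terminal
cat "$dir/part1.csv" "$dir/part2.csv" > "$dir/all.csv"
cat "$dir/all.csv"
```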

Viewing text files interactively using less and more

While cat displays the entire file at once, less and more allow you to view the contents of a file interactively. This is useful when you want to scroll through a large file or search for specific content.

The syntax of the less command is:

less [options] [file]

The more command is similar to less but has fewer features. It is used to display the contents of a file one screen at a time.
The syntax of the more command is:

more [options] [file]

The Essentials of Text Editing in Linux

Command-line text editing is one of the most crucial skills in Linux. In this section, you will learn how to use two popular text editors in Linux: Vim and Nano. Vim and Nano are safe choices for learning text editing, as they are present on most Linux distributions.

Mastering Vim: Introductory Guide to Vim

Introduction to Vim

Vim is a popular text editing tool for the command line. Vim comes with its advantages: it is powerful, customizable, and fast. Vim commonly ships in two variants: the full Vim (vim) and Vim tiny, a smaller build that lacks some of Vim's features. On some distributions, such as Ubuntu, the vi command launches Vim tiny.

Here are some reasons why you should consider learning Vim:

  • Most servers are accessed via a CLI, so in system administration, you don't necessarily have the luxury of a GUI. But Vim will always be there.
  • Vim uses a keyboard-centric approach, as it is designed to be used without a mouse, which can significantly speed up editing tasks once you have learned the keyboard shortcuts. This also makes it faster than GUI tools.
  • Vim is suitable for all – beginners and advanced users. Vim supports complex string searches, highlighting searches, and much more. Through plugins, Vim provides extended capabilities to developers and system admins that include code completion, syntax highlighting, file management, version control, and more.

The three Vim modes

You need to know Vim's operating modes and how to switch between them. Keystrokes behave differently in each mode. The three core modes are as follows:

  1. Command mode.
  2. Edit mode.
  3. Visual mode.

Command mode

When you start Vim, you land in command mode by default (Vim's own documentation calls it Normal mode). This mode allows you to access other modes.
To switch to another mode, you first need to be in command mode. Press Esc to return to it from any other mode.

Edit mode

This mode allows you to make changes to the file (Vim's own documentation calls it Insert mode). To enter edit mode, press i while in command mode.

Visual mode

This mode allows you to work on a single character, a block of text, or lines of text. Let's break it down into simple steps. Remember, use the below combinations when in command mode.

  • Shift + V → Line mode (select whole lines)
  • Ctrl + V → Block mode
  • v → Character mode

The visual mode comes in handy when you need to copy and paste or edit lines in bulk.

Extended command mode

The extended command mode allows you to perform advanced operations like searching, setting line numbers, and highlighting text. You enter it by typing : in command mode. The shortcut tables in the next section include extended commands, which start with a colon (:).

Shortcuts in Vim: Making Editing Faster

Note: All these shortcuts work in the command mode only.

Basic Navigation

Command   Explanation
h         Move left
j         Move down
k         Move up
l         Move right
0         Move to the beginning of the line
$         Move to the end of the line
gg        Move to the beginning of the file
G         Move to the end of the file
Ctrl+d    Move half-page down
Ctrl+u    Move half-page up

Editing

Command   Explanation
i         Enter insert mode before the cursor
I         Enter insert mode at the beginning of the line
a         Enter insert mode after the cursor
A         Enter insert mode at the end of the line
o         Open a new line below the current line and enter insert mode
O         Open a new line above the current line and enter insert mode
x         Delete the character under the cursor
dd        Delete the current line
yy        Yank (copy) the current line
p         Paste after the cursor (below the current line for whole-line yanks)
P         Paste before the cursor (above the current line for whole-line yanks)

Searching and Replacing

Command          Explanation
/pattern         Search forward for pattern and jump to its next occurrence
?pattern         Search backward for pattern and jump to its previous occurrence
n                Repeat the last search in the same direction
N                Repeat the last search in the opposite direction
:%s/old/new/g    Replace all occurrences of old with new in the file
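Vim's substitute command shares its s/old/new/g form with the sed stream editor, so the effect of the trailing g flag can be previewed outside Vim. The sketch below uses sed rather than Vim itself, purely as a non-interactive stand-in, on a made-up sample file:

```shell
# Sample text with two matches on the first line
sample=$(mktemp)
printf 'old value, old key\nnothing here\n' > "$sample"

# Without g, only the first match on each line is replaced
first_only=$(sed 's/old/new/' "$sample")
echo "$first_only"

# With g, every match on each line is replaced,
# which mirrors what :%s/old/new/g does across a file in Vim
result=$(sed 's/old/new/g' "$sample")
echo "$result"
```

In Vim, the % before s is what extends the substitution to every line of the file; sed processes every line by default.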

Exiting

Command      Explanation
:w           Save the file but don't exit
:q           Quit Vim (fails if there are unsaved changes)
:wq or :x    Save and quit
:q!          Quit without saving

Multiple Windows

Command                       Explanation
:split or :sp                 Split the window horizontally
:vsplit or :vsp               Split the window vertically
Ctrl+w followed by h/j/k/l    Navigate between split windows

Mastering Nano

Getting started with Nano: The user-friendly text editor

Nano is a user-friendly text editor that is easy to use and is perfect for beginners. It is pre-installed on most Linux distributions.
To create a new file using Nano, use the following command:

nano

To start editing an existing file with Nano, use the following command:

nano filename

List of key bindings in Nano

General

Command   Explanation
Ctrl+X    Exit Nano (prompting to save if changes are made)
Ctrl+O    Save the file
Ctrl+R    Read a file into the current file
Ctrl+G    Display the help text

Editing

Command   Explanation
Ctrl+K    Cut the current line and store it in the cutbuffer
Ctrl+U    Paste the contents of the cutbuffer into the current line
Alt+6     Copy the current line and store it in the cutbuffer
Ctrl+J    Justify the current paragraph

Navigation

Command                  Explanation
Ctrl+A                   Move to the beginning of the line
Ctrl+E                   Move to the end of the line
Ctrl+C                   Display the current line number and file information
Ctrl+_ (Ctrl+Shift+-)    Go to a specific line (and optionally, column) number
Ctrl+Y                   Scroll up one page
Ctrl+V                   Scroll down one page

Search and Replace

Command   Explanation
Ctrl+W    Search for a string (then Enter to search again)
Alt+W     Repeat the last search (jump to the next occurrence)
Ctrl+\    Search and replace

Miscellaneous

Command   Explanation
Ctrl+T    Invoke the spell checker, if available
Ctrl+D    Delete the character under the cursor (does not cut it)
Ctrl+L    Refresh (redraw) the current screen
Alt+U     Undo the last operation
Alt+E     Redo the last undone operation

Conclusion

This article introduced Linux from both a conceptual and practical perspective, covering its core components, common distributions, and different ways to access it. We explored essential command-line skills, including file system navigation, system commands, and text editing using Vim and Nano.

For data engineers, Linux is a critical platform because most data systems and cloud infrastructures run on it. Mastery of Linux enables efficient automation, system management, troubleshooting, and deployment of data pipelines. As a result, Linux is not just a supporting skill, but a foundational requirement for working effectively in modern data engineering environments.
