Linux is an open-source, Unix-like operating system. Its kernel was created by Linus Torvalds in 1991.
Open-source means that the source code of the operating system is available to the public. This allows anyone to modify the original code, customize it, and distribute the new operating system to potential users.
Why should you learn about Linux?
In today's data center landscape, Linux and Microsoft Windows stand out as the primary contenders, with Linux holding the larger share.
Here are several compelling reasons to learn Linux:
- Given the prevalence of Linux hosting, there is a high chance that your application will be hosted on Linux. So, learning Linux as a data engineer or developer becomes increasingly valuable.
- With cloud computing becoming the norm, chances are high that your cloud instances will rely on Linux.
- Linux serves as the foundation for many operating systems for the Internet of Things (IoT) and mobile applications.
- Linux is built for automation, which is central to data engineering. Its scriptable command line makes workflows repeatable, easier to make fault tolerant, and observable from end to end.
What is a Linux Kernel?
The kernel is the central component of an operating system that manages the computer and its hardware operations. It handles memory operations and CPU time.
The kernel acts as a bridge between applications and the hardware-level data processing using inter-process communication and system calls.
The kernel loads into memory first when an operating system starts and remains there until the system shuts down. It is responsible for tasks like disk management, task management, and memory management.
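A quick way to see the kernel at work is the uname command, which asks the running kernel to describe itself (through the uname system call). A minimal sketch; the exact strings differ from machine to machine:

```bash
# Kernel name: prints "Linux" on a Linux system
uname -s
# Release (version string) of the kernel that is currently running
uname -r
# Hardware architecture the kernel runs on, such as x86_64 or aarch64
uname -m
```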
What is a Linux distribution?
The Linux kernel is reused and configured differently across distributions. You can further combine different utilities and software to create a completely new operating system.
A Linux distribution or distro is a version of the Linux operating system that includes the Linux kernel, system utilities, and other software. Being open source, a Linux distribution is a collaborative effort involving multiple independent open-source development communities.
Today, there are hundreds of Linux distributions to choose from, each with its own goals and criteria for selecting and supporting the software it provides.
Distributions vary from one to the other, but they generally have several common characteristics:
- A distribution includes a Linux kernel.
- It supports user space programs.
- A distribution may be small and single-purpose or include thousands of open-source programs.
- It provides some means of installing and updating the distribution and its components.
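To check which distribution you are running, you can read /etc/os-release, a file that most current distributions (including Ubuntu) provide. This is a sketch assuming that file exists on your system; very minimal or older systems may not have it:

```bash
# Show the distribution's name and version from /etc/os-release
grep -E '^(NAME|VERSION)=' /etc/os-release
# The kernel version is reported separately, since the kernel is one component the distro ships
uname -r
```

On Ubuntu, the "means of installing and updating" from the list above is the apt package manager (for example, sudo apt update followed by sudo apt upgrade). Other distribution families use their own tools, such as dnf on Fedora or pacman on Arch Linux.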
Some popular Linux distributions are:
- Ubuntu: One of the most widely used and popular Linux distributions. It is user-friendly and recommended for beginners.
- Linux Mint: Based on Ubuntu, Linux Mint provides a user-friendly experience with a focus on multimedia support.
- Arch Linux: Popular among experienced users, Arch is a lightweight and flexible distribution aimed at users who prefer a DIY approach.
- Manjaro: Based on Arch Linux, Manjaro provides a user-friendly experience with pre-installed software and easy system management tools.
- Kali Linux: Kali Linux provides a comprehensive suite of security tools and is focused on cybersecurity, particularly penetration testing and security auditing.
How to install and access Linux
There are several ways to access Linux, including from a Windows machine. This section explores these methods in detail.
Install Linux as the primary OS
Installing Linux as the primary OS is the most efficient way to use Linux, as you can use the full power of your machine.
We'll focus on installing Ubuntu, which is one of the most popular Linux distributions. Linux has numerous other distributions suited to specific use cases, which you can explore based on your preferences.
- Step 1 – Download the Ubuntu ISO file. Make sure to select a stable release labelled "LTS". LTS stands for Long Term Support, which means you get free security and maintenance updates for a long time (usually 5 years).
- Step 2 – Create a bootable pen drive: Several tools can write the ISO file to a pen drive and make it bootable, such as Rufus (Windows) or balenaEtcher (Windows, macOS, and Linux).
- Step 3 – Boot from the pen drive: Once your bootable pen drive is ready, insert it and boot from it. The key that opens the boot menu depends on your laptop, so search online for the boot menu key of your laptop model.
- Step 4 – Follow the prompts: Once the boot process starts, select "Try or Install Ubuntu". The process will take some time. Once the GUI appears, select your language and keyboard layout and continue. Enter your name, username, and password. Remember these credentials, as you will need them to log in to your system and to run commands with administrative privileges. Wait for the installation to complete.
- Step 5 – Restart: Click on restart now and remove the pen drive.
- Step 6 – Login: Login with the credentials you entered earlier.
And there you go! Now you can install apps and customize your desktop.
Accessing the terminal
An important part of using Linux is the terminal, where you'll run all the commands and see the magic happen. You can find the terminal by pressing the Super ("Windows") key and typing "terminal".
The shortcut for opening the terminal is Ctrl + Alt + T.
You can also open the terminal from inside a folder. Right-click inside the folder and click "Open in Terminal". This opens the terminal at that folder's path.
How to use Linux on a Windows machine
Sometimes you might need to run both Linux and Windows side by side. Luckily, there are some ways you can get the best of both worlds without getting different computers for each operating system.
This section explores a few ways to use Linux on a Windows machine.
Option 1: "Dual-boot" Linux + Windows
With dual boot, you can install Linux alongside Windows on your computer, allowing you to choose which operating system to use at startup.
This requires partitioning your hard drive and installing Linux on a separate partition. With this approach, you can only use one operating system at a time.
Option 2: Use Windows Subsystem for Linux (WSL)
Windows Subsystem for Linux provides a compatibility layer that lets you run Linux binary executables natively on Windows.
Using WSL has several advantages. The setup is simple and quick. It is lightweight compared to virtual machines (VMs), where you have to allocate resources from the host machine. You don't need to download an ISO or manage a virtual disk image, both of which tend to be large files. And you can use Windows and Linux side by side.
How to install WSL2
First, enable the Windows Subsystem for Linux option in settings.
- Go to Start. Search for "Turn Windows features on or off."
- Check the option "Windows Subsystem for Linux" if it isn't already.
- Next, open Command Prompt and run the installation command:
- Open Command Prompt as an administrator.
- Run the command below:
wsl --install
Note: By default, Ubuntu will be installed.
- Once the installation is complete, restart your Windows machine.
Once Ubuntu finishes installing, you'll be prompted to create a username and password.
And that's it! You are ready to use Ubuntu.
You can launch Ubuntu at any time by searching for it in the Start menu.
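Once you are inside the Ubuntu terminal, you can check that the shell really runs under WSL. This sketch relies on an assumption: the kernels Microsoft ships for WSL include the word "microsoft" in /proc/version, so treat it as a heuristic rather than an official test:

```bash
# Run this inside the Ubuntu (WSL) terminal, not in Command Prompt.
# Heuristic (assumption): WSL's kernels mention "microsoft" in /proc/version;
# a custom kernel might not.
if grep -qi microsoft /proc/version 2>/dev/null; then
  echo "This shell is running under WSL"
else
  echo "This shell is not running under WSL"
fi
```

From Command Prompt on the Windows side, wsl --list --verbose (or wsl -l -v) lists the installed distributions and the WSL version each one uses.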
Option 3: Use a Virtual Machine (VM)
A virtual machine (VM) is a software emulation of a physical computer system. It allows you to run multiple operating systems and applications on a single physical machine simultaneously.
You can use virtualization software such as Oracle VirtualBox or VMware to create a virtual machine running Linux within your Windows environment. This allows you to run Linux as a guest operating system alongside Windows.
VM software provides options to allocate and manage hardware resources for each VM, including CPU cores, memory, disk space, and network bandwidth. You can adjust these allocations based on the requirements of the guest operating systems and applications.
Option 4: Use a Browser-based Solution
Browser-based solutions are particularly useful for quick testing, learning, or accessing Linux environments from devices that don't have Linux installed.
You can either use online code editors or web-based terminals to access Linux. Note that you usually don't have full administration privileges in these cases.
Online code editors: They offer editors with built-in Linux terminals. While their primary purpose is coding, you can also utilize the Linux terminal to execute commands and perform tasks.
Replit is an example of an online code editor, where you can write your code and access the Linux shell at the same time.
Web-based Linux terminals: Online Linux terminals allow you to access a Linux command-line interface directly from your browser. These terminals provide a web-based interface to a Linux shell, enabling you to execute commands and work with Linux utilities.
One such example is JSLinux.
Option 5: Use a Cloud-based Solution
Instead of running Linux directly on your Windows machine, you can consider using cloud-based Linux environments or virtual private servers (VPS) to access and work with Linux remotely.
Services like Amazon EC2, Microsoft Azure, or DigitalOcean provide Linux instances that you can connect to from your Windows computer. Note that some of these services offer free tiers, but they are not usually free in the long run.
Introduction to Bash Shell and System Commands
The Linux command line is provided by a program called the shell. Over the years, several shell programs have been developed, each with its own features.
Different users can be configured to use different shells, but most users stick with the default. The default shell on many Linux distros is the GNU Bourne-Again Shell (bash). Bash is the successor of the Bourne shell (sh).
To find out your current shell, open your terminal and enter the following command:
echo $SHELL
Command breakdown:
- The echo command prints text to the terminal.
- $SHELL is an environment variable that holds the path to your default (login) shell.
In my setup, the output is /bin/bash. This means that my default shell is bash.
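Strictly speaking, $SHELL reports the login shell recorded for your account, which is not always the shell you are typing into. A small sketch that prints both; the second line reads /proc and therefore only works on Linux:

```bash
# The login shell recorded for your account (may be unset in some containers)
echo "Login shell:   ${SHELL:-unset}"
# The program actually running this shell session ($$ is this shell's process ID)
echo "Current shell: $(readlink "/proc/$$/exe")"
```

If you start another shell from bash, $SHELL usually keeps pointing to your login shell, while the second line shows the shell that is actually running.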
Bash is very powerful as it can simplify certain operations that are hard to accomplish efficiently with a GUI (or Graphical User Interface). Remember that most servers do not have a GUI, and it is best to learn to use the powers of a command line interface (CLI).
Terminal vs Shell
The terms terminal and shell are often used interchangeably, but they refer to different parts of the command-line interface.
The terminal is the interface you use to interact with the shell. The shell is the command interpreter that processes and executes your commands.
What is a prompt?
When a shell is used interactively, it displays a $ when it is waiting for a command from the user. This is called the shell prompt.
[username@host ~]$
If the shell is running as root, the prompt is changed to #.
[root@host ~]#
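Whether you get the $ or the # prompt depends on your user ID, and root always has UID 0. A minimal sketch that checks this with the id command, assuming the distribution's default prompt settings (a customized PS1 can look different):

```bash
# id -u prints the numeric user ID; root is always UID 0
if [ "$(id -u)" -eq 0 ]; then
  echo "You are root: with default settings, the prompt ends in #"
else
  # id -un prints the user name instead of the number
  echo "You are $(id -un): with default settings, the prompt ends in \$"
fi
```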
Command Structure
A command is a program that performs a specific operation. Once you have access to the shell, you can enter any command after the $ sign and see the output on the terminal.
Generally, Linux commands follow this syntax:
command [options] [arguments]
Here is the breakdown of the above syntax:
- command: This is the name of the command you want to execute. ls (list), cp (copy), and rm (remove) are common Linux commands.
- [options]: Options, or flags, usually preceded by a hyphen (-) or double hyphen (--), modify the behavior of the command. They can change how the command operates. For example, ls -a uses the -a option to display hidden files in the current directory.
- [arguments]: Arguments are the inputs for commands that require them. These could be filenames, user names, or other data that the command will act upon. For example, in the command cat access.log, cat is the command and access.log is the argument. As a result, the cat command displays the contents of the access.log file.
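The three parts of the syntax can be seen by building up a single ls command. A sketch assuming GNU ls (the default on most Linux distributions), where single-letter options can be combined and many have a long -- form:

```bash
# command only: list the current directory
ls
# command + option: -a also lists hidden entries (names starting with .)
ls -a
# command + options + argument: long (-l) listing of /etc, hidden entries included
ls -l -a /etc
# Equivalent forms: combined short options, and the long form of -a (GNU ls)
ls -la /etc
ls -l --all /etc
```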
Options and arguments are not required for all commands. Some commands can be run without any options or arguments, while others might require one or both to function correctly. You can always refer to the command's manual to check the options and arguments it supports. You can view a command's manual using the man command.
You can access the manual page for ls with man ls.
Manual pages are a great and quick way to access the documentation. I highly recommend going through man pages for the commands that you use the most.
Managing Files From the Command line
The Linux File-system Hierarchy
All files in Linux are stored in a file-system. It follows an inverted-tree-like structure because the root is at the topmost part.
The / is the root directory and the starting point of the file system. The root directory contains all other directories and files on the system. The / character also serves as a directory separator between path names. For example, /home/alice forms a complete path.
You can learn more about the file system using the man hier command.
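You can see the top of the tree yourself by listing the root directory. The exact set of top-level directories varies a little between distributions, so this sketch only checks a few common ones:

```bash
# List everything directly under the root directory
ls /
# -d lists the directories themselves rather than their contents
ls -d /etc /usr /var
```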
Navigating the Linux File-system
The absolute path is the full path from the root directory to the file or directory. It always starts with a /. For example, /home/john/documents.
The relative path, on the other hand, is the path from the current directory to the destination file or directory. It does not start with a /. For example, documents/work/project.
Locating your current directory: You can locate your current directory in the Linux file system using the pwd command.
Changing directories: The command to change directories is cd and it stands for change directory. You can use the cd command to navigate to a different directory.
Some other commonly used cd shortcuts are:
| Command | Description |
|---|---|
| cd .. | Move up one directory (to the parent) |
| cd ../.. | Move up two directories |
| cd or cd ~ | Go to the home directory |
| cd - | Go to the previous directory |
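The sketch below practices absolute paths, relative paths, and the cd shortcuts in a scratch directory. /tmp/nav-demo is just an example location; any writable directory works:

```bash
# Create a practice tree (example path)
mkdir -p /tmp/nav-demo/documents/work/project

cd /tmp/nav-demo            # absolute path: starts with /
pwd                         # → /tmp/nav-demo
cd documents/work/project   # relative path: resolved from the current directory
pwd                         # → /tmp/nav-demo/documents/work/project
cd ../..                    # move up two directories
pwd                         # → /tmp/nav-demo/documents
cd -                        # return to the previous directory (cd - also prints it)
```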
Managing Files and Directories
Creating new directories: You can create an empty directory using the mkdir command.
# creates an empty directory named "foo" in the current folder
mkdir foo
You can also create directories recursively using the -p option.
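For example, -p builds a whole chain of nested directories in one step. The path projects/data/reports is only an illustration:

```bash
# -p creates every missing parent along the way and does not
# complain if the directory already exists
mkdir -p projects/data/reports
# Without -p, the same command fails when projects/ or projects/data/ is missing:
# mkdir projects/data/reports
# Show the resulting tree recursively
ls -R projects
```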
Creating new files: The touch command creates an empty file. You can use it like this:
# creates empty file "file.txt" in the current folder
touch file.txt
The file names can be chained together if you want to create multiple files in a single command.
# creates empty files "file1.txt", "file2.txt", and "file3.txt" in the current folder
touch file1.txt file2.txt file3.txt
Removing files and directories: The rm command removes files, and with the -r option it also removes directories together with their contents. The rmdir command removes only empty directories.
| Command | Description |
|---|---|
| rm file.txt | Removes the file file.txt |
| rm -r directory | Removes the directory directory and its contents |
| rm -f file.txt | Removes the file file.txt without prompting for confirmation |
| rmdir directory | Removes an empty directory |
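The difference between rmdir and rm -r is easiest to see on a directory that still contains a file. A short sketch using a throwaway directory named demo (an example name; make sure nothing important uses it):

```bash
# Create a scratch directory containing one empty file
mkdir -p demo
touch demo/notes.txt
# rmdir only removes empty directories, so this fails and prints an error
rmdir demo || echo "rmdir refused: demo is not empty"
# rm -r removes the directory and everything inside it; there is no undo
rm -r demo
```

Since deleted files do not go to a trash folder, adding -i (as in rm -ri demo) makes rm ask for confirmation before each removal.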
Copying files using the cp command: To copy files in Linux, use the cp command.
- Syntax to copy files:
cp source_file destination
This command copies a file named file1.txt into the directory /home/adam/logs:
cp file1.txt /home/adam/logs
The cp command can also create a copy of a file under a new name.
This command copies a file named file1.txt to another file named file2.txt in the same folder.
cp file1.txt file2.txt
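Plain cp works on files only; to copy a whole directory with its contents, add the -r (recursive) option. A sketch with example names:

```bash
# Set up an example directory with one file in it
mkdir -p reports
touch reports/q1.csv
# Without -r, cp skips directories and reports an error; -r copies the whole tree
cp -r reports reports_backup
# List the copy to confirm that q1.csv came along
ls reports_backup
```

If reports_backup already exists, cp -r places the copy inside it, as reports_backup/reports, rather than replacing it.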
Moving and renaming files and folders: The mv command is used to rename and move files and folders from one directory to the other.
- Syntax to move files:
mv source_file destination_directory
# Moves a file named file1.txt to a directory named backup
mv file1.txt backup/
To move a directory and its contents:
mv dir1/ backup/
Renaming files and folders in Linux is also done with the mv command.
Syntax to rename files: mv old_name new_name
# Renames a file from file1.txt to file2.txt
mv file1.txt file2.txt
Locating Files and Folders: The find command lets you efficiently search for files, folders, and character and block devices.
Below is the basic syntax of the find command:
find /path/ -type f -name file-to-search
Where:
- /path is the path where the file is expected to be found. This is the starting point of the search. The path can also be / or ., which represent the root and the current directory, respectively.
- -type specifies the type of file to match. It can be any of the following:
  - f – Regular file, such as text files, images, and hidden files.
  - d – Directory. These are the folders under consideration.
  - l – Symbolic link. Symbolic links point to other files and are similar to shortcuts.
  - c – Character device. Files used to access character devices are called character device files. Drivers communicate with character devices by sending and receiving single characters (bytes, octets). Examples include keyboards, sound cards, and the mouse.
  - b – Block device. Files used to access block devices are called block device files. Drivers communicate with block devices by sending and receiving entire blocks of data. Examples include USB drives and CD-ROM drives.
- -name is the name of the file you want to search for. It also accepts wildcard patterns such as "*.log".
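Here is the syntax applied to a small example tree. The names under find-demo are hypothetical, created only for this sketch; the pattern is quoted so that the shell passes *.log to find instead of expanding it itself:

```bash
# Build a small example tree to search in
mkdir -p find-demo/logs
touch find-demo/app.log find-demo/logs/error.log find-demo/readme.txt

# Regular files whose names end in .log, searched from find-demo downward
find find-demo -type f -name "*.log"

# Directories named "logs"
find find-demo -type d -name logs

# Block device files directly under /dev (output depends on your hardware)
find /dev -maxdepth 1 -type b
```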
Basic Commands for Viewing Files
Displaying file contents: The cat command in Linux is used to display the contents of a file.
Here is the basic syntax of the cat command:
cat [options] [file]
If you want to view the contents of a file named file.txt, you can use the following command:
cat file.txt
This will display all the contents of the file on the terminal at once.
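A quick way to try cat is on a small file you create yourself. This sketch writes three example lines to file.txt (overwriting any existing file of that name) and then prints it, first as-is and then with line numbers:

```bash
# Create a sample file with three lines
printf 'first line\nsecond line\nthird line\n' > file.txt
# Print the whole file at once
cat file.txt
# -n numbers every output line
cat -n file.txt
```

cat also accepts several files, as in cat file1.txt file2.txt, and prints them one after another; joining files like this is where the name (concatenate) comes from.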
Viewing text files interactively using less and more
While cat displays the entire file at once, less and more allow you to view the contents of a file interactively. This is useful when you want to scroll through a large file or search for specific content.
The syntax of the less command is:
less [options] [file]
The more command is similar to less but has fewer features. It is used to display the contents of a file one screen at a time.
The syntax of the more command is:
more [options] [file]
The Essentials of Text Editing in Linux
Text editing skills using the command line are one of the most crucial skills in Linux. In this section, you will learn how to use two popular text editors in Linux: Vim and Nano. Vim and nano are safe choices to learn text editing as they are present on most Linux distributions.
Mastering Vim: Introductory Guide to Vim
Introduction to Vim
Vim is a popular command-line text editor. It is powerful, customizable, and fast. Vim is an improved version of the older vi editor, and many distributions also ship a smaller build: on Ubuntu, for example, the vi command is often provided by vim-tiny, a smaller version of Vim that lacks some of its features.
Here are some reasons why you should consider learning Vim:
- Most servers are accessed via a CLI, so in system administration, you don't necessarily have the luxury of a GUI. But Vim will always be there.
- Vim uses a keyboard-centric approach: it is designed to be used without a mouse, which can significantly speed up editing once you have learned the keyboard shortcuts. For many tasks, this makes it faster than GUI tools.
- Vim is suitable for everyone, from beginners to advanced users. Vim supports complex string searches, search highlighting, and much more. Through plugins, Vim offers developers and system admins extended capabilities such as code completion, file management, version control integration, and more.
The three Vim modes
You need to know Vim's three operating modes and how to switch between them. Keystrokes behave differently in each mode. The three modes are as follows:
- Command mode.
- Edit mode.
- Visual mode.
Command Mode
When you start Vim, you land in command mode by default. This mode allows you to access the other modes.
To switch to another mode, you need to be in command mode first. Press Esc to return to command mode from any other mode.
Edit Mode
This mode allows you to make changes to the file. To enter edit mode, press i while in command mode. Press Esc to go back to command mode.
Visual mode
This mode allows you to select a single character, a block of text, or whole lines. Use the combinations below while in command mode:
- Shift + V → Line mode (select whole lines)
- Ctrl + V → Block mode
- v → Character mode
The visual mode comes in handy when you need to copy and paste or edit lines in bulk.
Extended command mode
The extended command mode, which you enter by typing : in command mode, allows you to perform advanced operations like searching, setting line numbers, and highlighting text. We'll cover extended mode in the next section.
Shortcuts in Vim: Making Editing Faster
Note: All these shortcuts work in the command mode only.
Basic Navigation
| Command | Explanation |
|---|---|
| h | Move left |
| j | Move down |
| k | Move up |
| l | Move right |
| 0 | Move to the beginning of the line |
| $ | Move to the end of the line |
| gg | Move to the beginning of the file |
| G | Move to the end of the file |
| Ctrl+d | Move half-page down |
| Ctrl+u | Move half-page up |
Editing
| Command | Explanation |
|---|---|
| i | Enter insert mode before the cursor |
| I | Enter insert mode at the beginning of the line |
| a | Enter insert mode after the cursor |
| A | Enter insert mode at the end of the line |
| o | Open a new line below the current line and enter insert mode |
| O | Open a new line above the current line and enter insert mode |
| x | Delete the character under the cursor |
| dd | Delete the current line |
| yy | Yank (copy) the current line |
| p | Paste below the cursor |
| P | Paste above the cursor |
Searching and Replacing
| Command | Explanation |
|---|---|
| / | Search forward for a pattern (jumps to its next occurrence) |
| ? | Search backward for a pattern (jumps to its previous occurrence) |
| n | Repeat the last search in the same direction |
| N | Repeat the last search in the opposite direction |
| :%s/old/new/g | Replace all occurrences of old with new in the file |
Exiting
| Command | Explanation |
|---|---|
| :w | Save the file but don't exit |
| :q | Quit Vim (fails if there are unsaved changes) |
| :wq or :x | Save and quit |
| :q! | Quit without saving |
Multiple Windows
| Command | Explanation |
|---|---|
| :split or :sp | Split the window horizontally |
| :vsplit or :vsp | Split the window vertically |
| Ctrl+w followed by h/j/k/l | Navigate between split windows |
Mastering Nano
Getting started with Nano: The user-friendly text editor
Nano is a user-friendly text editor that is easy to use and is perfect for beginners. It is pre-installed on most Linux distributions.
To create a new file using Nano, use the following command:
nano
To start editing an existing file with Nano, use the following command:
nano filename
List of key bindings in Nano
General
| Command | Explanation |
|---|---|
| Ctrl+X | Exit Nano (prompting to save if changes are made) |
| Ctrl+O | Save the file |
| Ctrl+R | Read a file into the current file |
| Ctrl+G | Display the help text |
Editing
| Command | Explanation |
|---|---|
| Ctrl+K | Cut the current line and store it in the cutbuffer |
| Ctrl+U | Paste the contents of the cutbuffer into the current line |
| Alt+6 | Copy the current line and store it in the cutbuffer |
| Ctrl+J | Justify the current paragraph |
Navigation
| Command | Explanation |
|---|---|
| Ctrl+A | Move to the beginning of the line |
| Ctrl+E | Move to the end of the line |
| Ctrl+C | Display the current line number and file information |
| Ctrl+_ (Ctrl+Shift+-) | Go to a specific line (and optionally, column) number |
| Ctrl+Y | Scroll up one page |
| Ctrl+V | Scroll down one page |
Search and Replace
| Command | Explanation |
|---|---|
| Ctrl+W | Search for a string (then Enter to search again) |
| Alt+W | Repeat the last search forward |
| Alt+Q | Repeat the last search backward (recent nano versions) |
| Ctrl+\ | Search and replace |
Miscellaneous
| Command | Explanation |
|---|---|
| Ctrl+T | Execute a command in newer nano versions (older versions: invoke the spell checker, if available) |
| Ctrl+D | Delete the character under the cursor (does not cut it) |
| Ctrl+L | Refresh (redraw) the current screen |
| Alt+U | Undo the last operation |
| Alt+E | Redo the last undone operation |
Conclusion
This article introduced Linux from both a conceptual and practical perspective, covering its core components, common distributions, and different ways to access it. We explored essential command-line skills, including file system navigation, system commands, and text editing using Vim and Nano.
For data engineers, Linux is a critical platform because most data systems and cloud infrastructures run on it. Mastery of Linux enables efficient automation, system management, troubleshooting, and deployment of data pipelines. As a result, Linux is not just a supporting skill, but a foundational requirement for working effectively in modern data engineering environments.




