1. What is Linux, and Why Data Engineers Use It
Linux is a widely used operating system for servers and the cloud. Most data platforms, such as Hadoop, Spark, Kafka, and Airflow, run on Linux, and so do most cloud virtual machines.
For data engineers, Linux is important because:
Most data systems run on Linux servers – If you deploy data pipelines, databases, or analytics platforms, you are almost always working on Linux.
It is efficient and stable – Linux handles large data processing jobs well and can run continuously without frequent restarts.
It gives you control – You can automate tasks, manage files, and inspect logs directly from the terminal.
Cloud platforms use Linux – AWS, Azure, and Google Cloud primarily use Linux-based virtual machines.
In simple terms: if you work with data at scale, Linux is the environment where that work lives.
2. The Linux Terminal (Command Line)
Linux is often used through the terminal. Instead of clicking buttons, you type commands. This may feel strange at first, but it is powerful and fast once you get used to it.
2.1. Basic Linux Commands Every Beginner Should Know
Below are some common commands data engineers use daily:
pwd - check current directory
ls - list files and folders
mkdir new_directory - create a new directory
cd new_directory - move into the directory
touch empty_file - create an empty file
cat empty_file - view the file
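The commands above can be chained into one short session. This is a minimal sketch, not a required workflow: the throwaway directory created with mktemp and the variable name workdir are assumptions added here so the example does not touch any existing files.

```shell
# Work inside a throwaway directory so nothing existing is touched
# (workdir is a name chosen for this example only)
workdir=$(mktemp -d)
cd "$workdir"

pwd                  # print the current directory
mkdir new_directory  # create a new directory
cd new_directory     # move into it
touch empty_file     # create an empty file
ls                   # list the contents: shows empty_file
cat empty_file       # prints nothing, because the file is empty
```

Running the commands one at a time, and checking pwd and ls after each step, is a good way to see what each one changes.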
3. Why Text Editors Matter in Data Engineering
As a data engineer, you constantly edit:
- Configuration files
- SQL scripts
- Python or Bash scripts
- Log files
On Linux, you often edit on the command line without a graphical editor.
The two most common terminal editors are:
Vi or Vim - Very powerful, with a steep learning curve
Nano - Simple and beginner-friendly
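Not every server has both editors installed, so it can help to check before you start. The loop below is a small sketch that uses the standard command -v lookup. The list of editor names is an assumption; adjust it to whatever you expect on your system.

```shell
# Report which terminal editors are available on this machine
for editor in nano vi vim; do
  if command -v "$editor" >/dev/null 2>&1; then
    echo "$editor is installed at $(command -v "$editor")"
  else
    echo "$editor is not installed"
  fi
done
```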
4. Using Nano (Best for Beginners)
4.1 Opening Nano
To create or open a file with Nano:
nano pipeline_notes.txt
You will see a simple editor with instructions at the bottom.
4.2 Editing a File in Nano
Inside Nano, type the following:
This file documents our data pipeline.
Source: CSV files
Destination: Data Warehouse
Nano works like a normal editor: just type.
4.3 Saving and Exiting Nano
Press Ctrl + O (the letter O, not zero) to save the file
Press Enter to confirm the filename
Press Ctrl + X to exit Nano.
This simplicity makes Nano a good choice for users who are new to Linux.
5. Using Vi (Very Common on Servers)
The image below shows common Vi commands for navigating and editing files:

Vi is available on almost every Linux system. It has different modes, which is what confuses most people.
5.1 Opening a File with Vi
vi pipeline_notes.txt
You start in Normal Mode, where you cannot type text yet.
5.2 Entering Insert Mode
To start typing:
- Press i (insert mode)
Now type:
Processed daily using a cron job
Owner: Data Engineering Team
5.3 Saving and Exiting Vi
Press Esc (return to normal mode)
Type :wq
Press Enter
Explanation: :w writes (saves) the file, and :q quits.
5.4 If You Make a Mistake
To exit without saving:
:q!
6. Viewing the Final File from the Terminal
After editing with Nano or Vi, you can confirm the contents:
cat pipeline_notes.txt
Output:
This file documents our data pipeline.
Source: CSV files
Destination: Data Warehouse
Processed daily using a cron job
Owner: Data Engineering Team
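Beyond cat, a couple of standard tools give a quick sanity check on a file. The sketch below recreates the same five lines with printf, as a non-interactive stand-in for the Nano and Vi edits, and then inspects them. The temporary file and the variable names notes, line_count, and owner_line are assumptions made for this example.

```shell
# Recreate the file non-interactively (a stand-in for the editor steps)
notes=$(mktemp)
printf '%s\n' \
  'This file documents our data pipeline.' \
  'Source: CSV files' \
  'Destination: Data Warehouse' \
  'Processed daily using a cron job' \
  'Owner: Data Engineering Team' > "$notes"

cat "$notes"                           # print the whole file
line_count=$(wc -l < "$notes")         # count the lines in the file
owner_line=$(grep '^Owner:' "$notes")  # find the line that names the owner

echo "lines: $((line_count))"
echo "$owner_line"
```

On your own file, replace "$notes" with pipeline_notes.txt.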
7. How This Connects to Real Data Engineering Work
In real projects, data engineers use Linux to:
- SSH into cloud servers
- Edit Airflow DAGs using Vi or Nano
- Check pipeline logs
- Automate jobs using shell scripts
- Manage data files and folders
For example:
ssh user@data-server
cd /opt/airflow/dags
vi daily_sales_pipeline.py
This is very common in production environments.
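Checking pipeline logs often comes down to a few lines of shell. The following is only an illustrative sketch: the log lines are made up for the example and written to a temporary file, and real Airflow logs use their own paths and formats. The variable names log_file and error_count are likewise hypothetical.

```shell
# Hypothetical log file with made-up sample lines; on a real server
# you would point log_file at an existing log instead
log_file=$(mktemp)
printf '%s\n' \
  '2024-01-01 02:00:01 INFO  daily_sales_pipeline started' \
  '2024-01-01 02:00:05 ERROR could not read source CSV' \
  '2024-01-01 02:00:06 INFO  retrying' \
  '2024-01-01 02:00:09 INFO  daily_sales_pipeline finished' > "$log_file"

# Count the lines that contain ERROR.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
error_count=$(grep -c 'ERROR' "$log_file" || true)

if [ "$error_count" -gt 0 ]; then
  echo "Found $error_count error line(s):"
  grep 'ERROR' "$log_file"
else
  echo "No errors found"
fi
```

Saved as a script and scheduled with cron, a check like this is one simple way to automate a routine task.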
8. Summary
- Linux is the default environment for data engineering work
- Knowing Linux commands helps you move faster and troubleshoot issues
- Nano is simple and ideal for beginners
- Vi is powerful and widely available on servers
- Text editing in the terminal is a core practical skill for data engineers
If you are new to Linux, start with Nano, learn the basics of Vi, and practice daily.