If you’re getting into data engineering, Linux is not optional. It’s a core skill.
Most data systems in the real world run on Linux, and knowing your way around the terminal makes your work faster, cleaner, and more powerful.
This article explains why Linux matters for data engineers, introduces essential Linux commands, and shows how to create and edit files using Vi and Nano, all in plain language.
Why Linux Is Important for Data Engineers
As a data engineer, you will work with:
- Data pipelines (ETL / ELT)
- Servers and cloud machines (AWS, GCP, Azure)
- Databases (Postgres, MySQL)
- Big data tools (Spark, Kafka, Airflow)
Almost all of these run on Linux servers.
Linux helps you:
- Work directly on production servers
- Automate tasks using scripts
- Debug issues quickly
- Handle large files efficiently
- Understand how data flows at the system level
If you can use Linux confidently, you immediately stand out as “production-ready”.
Understanding the Linux Terminal
The terminal is just a way to talk to your computer using commands instead of clicking buttons.
For example:
ls – shows which files are in the current folder
Essential Linux Commands for Data Engineers
pwd – Where am I?
pwd
Output:
/home/rose
This shows your current directory.
ls – List files
ls
Output:
data scripts README.md
Common options:
ls -l # detailed view
ls -a # include hidden files
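The easiest way to see what each option does is to try them in a scratch folder. This is a minimal sketch: the folder and file names (data, scripts, README.md, .env) are made up for illustration, and everything happens inside a temporary directory.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical contents: two folders, a README and a hidden file
mkdir data scripts
touch README.md .env

ls        # lists data, README.md and scripts; the hidden .env is not shown
ls -a     # also shows ., .. and .env
ls -l     # one line per entry: permissions, owner, size, date, name
ls -la    # options can be combined: detailed view including hidden files
```

Hidden files are simply files whose names start with a dot, which is why config files like .env or .bashrc only appear with -a.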
cd – Change directory (move into the folder you want)
cd dev
Go back:
cd ..
Go home:
cd ~
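Pairing cd with pwd is a simple way to confirm where each command leaves you. In this sketch a temporary directory stands in for your real project folder, and the dev folder is created just for the demo.

```shell
#!/bin/bash
# Temporary stand-in for a real project folder
base=$(mktemp -d)
mkdir -p "$base/dev"

cd "$base" || exit 1
pwd            # prints the temporary base directory

cd dev         # move into the dev folder
pwd            # the base directory followed by /dev

cd ..          # go back up one level
pwd            # the base directory again

cd ~           # jump to your home directory from anywhere
pwd            # your home directory
```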
mkdir – Create folders
mkdir dataengineering
This is very common when organizing ETL jobs.
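ETL projects usually need several folders at once. mkdir -p creates any missing parent folders and does not fail if a folder already exists, so it is safe to re-run. The stage names raw, staging and processed below are only an example layout, not a standard.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# -p creates missing parent directories along the way
mkdir -p dataengineering/etl/raw
mkdir -p dataengineering/etl/staging
mkdir -p dataengineering/etl/processed

# Bash brace expansion does the same in one line; since the folders
# already exist, -p makes this second run a harmless no-op
mkdir -p dataengineering/etl/{raw,staging,processed}

ls dataengineering/etl   # lists processed, raw and staging
```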
touch – Create files
touch extract_data.py
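Creates an empty file if it doesn’t exist yet (on an existing file it only updates the timestamp), which makes it handy for starting new scripts.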
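A quick check shows that touch creates an empty file and leaves existing content alone. This is a minimal sketch; the print("hello") line is placeholder content added only to demonstrate the second point.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch extract_data.py
ls -l extract_data.py     # the size column shows 0

# [ -s FILE ] is true only if the file exists and is non-empty
if [ -s extract_data.py ]; then
  echo "file has content"
else
  echo "file is empty"
fi

# Touching an existing file keeps its content; only the timestamp changes
echo 'print("hello")' > extract_data.py
touch extract_data.py
cat extract_data.py       # still prints: print("hello")
```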
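cat – View file content
cat README.md
cat prints the whole file at once. For long files such as logs, less lets you scroll through them page by page:
less README.md
Inside less, use:
q → quit
/error → search for “error” (press n to jump to the next match)
This is extremely useful for debugging pipelines.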
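Searching a log for “error” by hand works well in a pager. Inside scripts, or when you only want the matching lines, grep does the same search without any interaction. The log file name and its lines below are invented for the example.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical pipeline log, written here just for the demo
printf '%s\n' \
  "INFO extract started" \
  "ERROR connection refused" \
  "INFO retrying" > pipeline.log

cat pipeline.log                  # print the whole file

grep -n -i "error" pipeline.log   # prints: 2:ERROR connection refused

# Interactive alternative: open the pager already positioned at the first match
# less +/ERROR pipeline.log
```

-n adds line numbers so you can find the same spot later, and -i makes the match case-insensitive, which matters because log levels are often upper case.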
Editing Files with Nano
Nano is simple and safe for beginners.
Open a file with Nano
nano extract_data.py
Write:
print("Extracting data...")
Nano shortcuts:
CTRL + O → Save
Enter → Confirm
CTRL + X → Exit
Nano lists its shortcuts at the bottom of the screen (^ stands for CTRL).
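Nano needs someone at the keyboard. When the same file has to be created from a script (for example while setting up a server automatically), a shell heredoc writes the content directly. This is a different technique from Nano, shown here only as a non-interactive alternative.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Quoting 'EOF' stops the shell from expanding $variables inside the text
cat > extract_data.py <<'EOF'
print("Extracting data...")
EOF

cat extract_data.py   # prints: print("Extracting data...")
```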
Editing Files with Vi
Vi (or Vim) is available on almost every Linux server.
Open a file
vi transform.sql
Vi modes
Normal mode – navigation (the mode Vi opens in)
Insert mode – typing text
Command-line mode – saving and quitting (type : from Normal mode)
Start typing
Press:
i
Now type:
SELECT * FROM users;
Save and exit
Press:
ESC
Then type:
:wq
And press Enter.
Exit without saving
:q!
Practical Example: Creating a Data Script
mkdir etl
cd etl
touch extract.sh
nano extract.sh
Inside the file (the first line, called the shebang, tells Linux to run the script with Bash):
#!/bin/bash
echo "Starting data extraction..."
Make it executable:
chmod +x extract.sh
Run it:
./extract.sh
Output:
Starting data extraction...
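The whole example can also be run as a single script, which is a handy way to check your setup. This sketch writes extract.sh with a heredoc instead of Nano and adds set -euo pipefail, a common Bash safety setting that stops the script on the first failing command or unset variable; that line is an addition, not part of the original example.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir etl
cd etl || exit 1

# Write the script: shebang, safety settings, then the echo
cat > extract.sh <<'EOF'
#!/bin/bash
# Exit on errors, on unset variables, and on failures inside pipes
set -euo pipefail
echo "Starting data extraction..."
EOF

chmod +x extract.sh
./extract.sh    # prints: Starting data extraction...
```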
Permissions
Linux controls who can read, write, or execute files.
Check permissions:
ls -l
Example:
-rwxr-xr-- extract.sh
Meaning:
Owner can read/write/execute
Group can read/execute
Others can read
This matters a lot on shared servers.
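The same permissions can be set with chmod’s numeric (octal) form. Each digit is the sum of read = 4, write = 2 and execute = 1, for owner, group and others in that order, so rwx = 7, r-x = 5 and r-- = 4, which gives 754. A minimal sketch on a throwaway file:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch extract.sh
chmod 754 extract.sh        # owner rwx (7), group r-x (5), others r-- (4)

# The first 10 characters of ls -l are the file type plus permission bits
ls -l extract.sh | cut -c1-10   # prints: -rwxr-xr--

# The symbolic form sets exactly the same permissions
chmod u=rwx,g=rx,o=r extract.sh
```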
Where You’ll Use These Skills as a Data Engineer
- SSH into cloud servers
- Edit Airflow DAGs
- Inspect Spark logs
- Manage cron jobs
- Automate daily pipelines
- Debug production failures
Linux is the operating system of data infrastructure.