Introduction
For a student of Data Engineering, learning and understanding the fundamentals of Linux is a must. In fact, to learn and grow smoothly in the field of Data Engineering, you need to be comfortable working in Linux.
What is Linux
Linux is an open-source operating system widely used on servers, cloud platforms, and data systems to:
- Run applications and services
- Process and manage large amounts of data
- Host websites and backend systems
- Automate tasks and workflows (using scripts and schedulers)
- Support cloud infrastructure (virtual machines, containers)
- Ensure stability, security, and high performance for systems that must run 24/7
In short, Linux provides a secure and reliable environment to efficiently and continuously run applications, data pipelines, and cloud services.
Why Linux For Data Engineers
Most of a Data Engineer's (DE) core daily work is carried out on Linux, because most data systems run on it. This work includes:
1. Running Data Pipelines
ETL/ELT data pipelines usually run on Linux servers. They ingest data from APIs, process large files, transform data using Python or Spark, and load the results into data warehouses.
2. Automation and Scheduling
With Linux tools such as cron, you can schedule jobs. With bash scripts, you can automate tasks such as cleaning up logs weekly, archiving data periodically, and running scripts on a fixed schedule.
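Here is a minimal sketch of a weekly log-cleanup job that shows how cron and a bash script work together. It assumes the logs are files ending in .log inside one directory, and that anything older than 7 days can be deleted. The script name cleanup_logs.sh, the function cleanup_logs, and the /var/log/myapp path in the crontab line are hypothetical.

```bash
#!/usr/bin/env bash
# cleanup_logs.sh: a minimal sketch that deletes old *.log files.
# Usage: ./cleanup_logs.sh <log_dir> [days]
# The retention period (default 7 days) is an assumption; adjust as needed.

cleanup_logs() {
  local log_dir="$1"
  local days="${2:-7}"

  # -type f      : regular files only
  # -name '*.log': only files ending in .log
  # -mtime +N    : age in whole days (rounded down) is greater than N
  # -print       : show each file before it is removed
  find "$log_dir" -type f -name '*.log' -mtime +"$days" -print -delete
}

# Run only when arguments are given, e.g. when cron calls the script.
if [ "$#" -gt 0 ]; then
  cleanup_logs "$@"
fi

# Example crontab entry (add it with: crontab -e) that runs the cleanup
# every Sunday at 02:00. Script path and log directory are hypothetical:
#   0 2 * * 0 /home/user/cleanup_logs.sh /var/log/myapp 7
```

Before scheduling it, run the script by hand against a test directory. Because it prints each file it deletes, you can confirm it only touches what you expect.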
3. Handling Big Data
Handling large data requires frameworks that are built for, and most commonly deployed on, Linux. Examples include Hadoop for distributed storage and processing, Spark for fast processing of large data, Kafka for streaming data, and Airflow for workflow orchestration. Orchestration is the process of organizing, scheduling, and managing multiple tasks so that they run in the correct order, at the right time, and reliably from start to finish.
4. Working with Cloud Infrastructure
Major cloud platforms such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure offer Linux-based infrastructure, such as:
- Virtual Machines (VMs) – Ubuntu, Red Hat, Debian
- Containers & Orchestration – Docker, Kubernetes
- Big Data Services – Hadoop, Spark, Kafka clusters
- Databases – MySQL, PostgreSQL, MongoDB, Cassandra
- Data Warehouses – BigQuery engines, Redshift nodes (Linux-based)
5. File and Data Management
With Linux you can handle large files efficiently and perform tasks such as moving massive datasets, compressing files, searching logs, and streaming data. These tasks are done by running commands such as ls, cd, cp, mv, grep, tar, and gzip.
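The sketch below illustrates these file tasks in a few lines. It archives a folder with tar, compresses a log with gzip, and searches the compressed log with grep. Everything runs in a throwaway directory created by mktemp. The file names data, sales.csv, and app.log, and their contents, are made up for the example.

```bash
#!/usr/bin/env bash
# A minimal sketch of everyday file tasks: archive, compress, search.
# All file and folder names here are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)          # scratch directory so nothing else is touched
cd "$workdir"

mkdir -p data
printf 'id,amount\n1,100\n2,250\n' > data/sales.csv
printf 'INFO start\nERROR disk full\nINFO done\n' > app.log

# Bundle the folder into a single gzip-compressed archive
tar -czf data.tar.gz data

# List the archived paths without extracting anything
tar -tzf data.tar.gz

# Compress the log in place (app.log is replaced by app.log.gz)
gzip app.log

# Search the compressed log without writing a decompressed copy to disk
gzip -dc app.log.gz | grep ERROR   # prints: ERROR disk full
```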
Running a Linux Terminal and Its Commands
Running a Linux terminal means using text commands to control a Linux system, either locally or on a server.
1. On a Linux machine (Ubuntu, etc.)
Press Ctrl + Alt + T
Or search for “Terminal” in your applications
2. On Windows (Most common)
Option A: Windows Subsystem for Linux (WSL)
- Install WSL (for example, by running wsl --install in PowerShell as administrator)
- Open Ubuntu from the Start Menu
- This gives you a real Linux terminal inside Windows
Option B: Git Bash
- Install Git
- Open Git Bash
- Gives you Linux-like commands (not a full Linux environment, but useful)
3. On macOS
- Open Terminal (Spotlight → Terminal)
- macOS is Unix-based, so most basic commands work much as they do on Linux
4. On a Remote Server (Cloud/Linux server)
Use SSH:
ssh username@server_ip
This opens a Linux terminal on a remote machine.
Basic Linux Commands
Accessing a Server
To access a remote server, you need your username on the server, the server's IP address, and the password (or an SSH key) for that user
ssh username@server_ip
Update the package list and upgrade installed packages when necessary (Ubuntu/Debian)
sudo apt update && sudo apt upgrade
Check the version of Ubuntu running on your server
lsb_release -a
To check disk space usage and remaining storage on your VM (-h shows sizes in human-readable units such as G and M)
df -h
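Building on df, here is a minimal sketch of a disk-space check that could run on a schedule. It assumes you care about the root filesystem (/) and treat 80% usage as the warning threshold. Both values, and the function names disk_usage_percent and check_disk, are illustrative choices rather than a standard tool.

```bash
#!/usr/bin/env bash
# A minimal sketch: warn when a filesystem is fuller than a threshold.
# The mount point "/" and the 80% threshold are assumptions.

disk_usage_percent() {
  local mount_point="${1:-/}"
  # -P forces the POSIX output format (one line per filesystem), so the
  # "Capacity" value (e.g. 42%) is the 5th field of the 2nd line.
  df -P "$mount_point" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_disk() {
  local mount_point="${1:-/}"
  local threshold="${2:-80}"
  local used
  used=$(disk_usage_percent "$mount_point")
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $mount_point is ${used}% full"
  else
    echo "OK: $mount_point is ${used}% full"
  fi
}

check_disk / 80
```

The script uses df -P instead of df -h because -P keeps each filesystem on one line in a fixed column layout, which is safer to parse.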
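To list the files and folders in your current directory
ls
With the default color settings on Ubuntu (colors can vary with your terminal setup):
Red - Compressed (zipped) files
Blue - Folders
White - Regular files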
To list all files in the current directory, including hidden ones (names that start with a dot)
ls -a
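A hidden file makes the difference between ls and ls -a easy to see. This minimal sketch creates a throwaway directory (via mktemp) containing one normal file and one hidden file. The names notes.txt and .env are only examples.

```bash
#!/usr/bin/env bash
# A minimal sketch comparing `ls` and `ls -a` in a throwaway directory.

demo_dir=$(mktemp -d)
cd "$demo_dir"

touch notes.txt
touch .env            # names starting with a dot are hidden

echo "ls:"
ls                    # shows only notes.txt

echo "ls -a:"
ls -a                 # also shows ., .., and .env

echo "ls -la:"
ls -la                # long format: permissions, owner, size, date
```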
Print your current directory
pwd
To add another user to your server
sudo adduser 'username'
To switch from the superuser (root) to a regular user on the server, then move to that user's home directory
su 'username'
cd
Creating Directories and Files and navigating between them
mkdir - Create a directory
touch - Create an empty file
cd 'directory_name' - Open (move into) a directory
cd .. - Move one level up, to the parent directory
cd - On its own, return to your home directory
cp - Copy files (cp -r for folders)
mv - Move or rename files
rm - Delete a file
rm -r - Delete a folder and its contents
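The following minimal walkthrough puts the commands above together. It runs inside a throwaway directory created with mktemp, so nothing outside it is touched. The folder and file names (project, raw.csv, old) are hypothetical.

```bash
#!/usr/bin/env bash
# A minimal walkthrough of mkdir, touch, cd, cp, mv, and rm
# inside a throwaway directory. All names are hypothetical.
set -e

cd "$(mktemp -d)"
base=$(pwd)                      # remember where we started

mkdir project                    # create a directory
cd project                       # move into it
pwd                              # prints the full path, ending in /project
touch raw.csv                    # create an empty file
cp raw.csv backup.csv            # copy it
mv backup.csv archive.csv        # rename the copy
mkdir old
mv archive.csv old/              # move the copy into a subfolder
ls                               # shows old and raw.csv
rm raw.csv                       # delete a single file
cd ..                            # go up one level, back to $base
rm -r project/old                # delete a folder and everything in it
ls project                       # prints nothing: project is now empty
```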
Copying a file from the local machine to the server
scp 'file_name' username@server_ip:/path/on/the/server
Copying a file from the server to the local machine
scp username@remote_host:/remote/path/to/file /local/path/to/destination
Copying a folder from the local machine to the server
scp -r /local/path/to/folder ibrahim@157.245.209.236:/home/ibrahim/
scp -r MyMusicFolder ibrahim@157.245.209.236:/home/ibrahim/
Copying a folder from the server to the local machine
scp -r ibrahim@157.245.209.236:/home/ibrahim/MyMusicFolder /local/destination/
scp -r ibrahim@157.245.209.236:/home/ibrahim/MyMusicFolder ~/Downloads/
You can also rename the folder during transfer:
scp -r ibrahim@157.245.209.236:/home/ibrahim/MyMusicFolder ~/Downloads/NewFolderName
For large folders, consider adding -C to compress during transfer (faster for slow connections):
scp -r -C MyMusicFolder ibrahim@157.245.209.236:/home/ibrahim/
Downloading files from the internet to your server
wget 'link'
Writing a line to a file and reading it back (>> appends to the file and creates it if it doesn't exist; a single > overwrites the file)
echo 'The line you wish to write' >> file_name
cat 'file name' - Read a file
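The difference between >> and > matters when writing to files on a server, because > silently replaces existing content. This minimal sketch uses a hypothetical file, notes.txt, in a throwaway directory.

```bash
#!/usr/bin/env bash
# A minimal sketch of writing to and reading from a file.
# "notes.txt" is a hypothetical file name in a throwaway directory.

cd "$(mktemp -d)"

echo 'first line' > notes.txt     # >  creates the file or OVERWRITES it
echo 'second line' >> notes.txt   # >> APPENDS to the end of the file
cat notes.txt                     # prints both lines

echo 'fresh start' > notes.txt    # > again replaces everything
cat notes.txt                     # prints only: fresh start
```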
Editing Using Nano and Vim
Nano is a simple, beginner-friendly text editor that runs directly in the Linux terminal. It comes in handy for editing files, writing scripts, and viewing files on a server.
nano app.py - Open the file in the Nano interface
Ctrl + O - Save (press Enter to confirm the file name)
Ctrl + X - Exit
If the file doesn't exist, Nano creates the file.
Vim is a modal text editor used in the Linux terminal and is widely used in servers, cloud machines, and containers.
vim app.py - Open the file in the Vim interface
i --> Enter insert mode
Type your text
Esc --> Return to normal (command) mode
:w --> Save
:q --> Quit
:wq --> Save and quit
:q! --> Quit without saving