Shaban Ibrahim

LINUX FOR A ROOKIE DATA ENGINEERING STUDENT

Introduction

Learning and understanding the fundamentals of Linux is a must for any student of Data Engineering. Most of the tools and systems used in this field run on Linux, so being comfortable with it makes it much easier to learn and grow as a Data Engineer.

What is Linux
Linux is an open-source operating system widely used on servers, cloud platforms and data systems to:

  • Run applications and services
  • Process and manage large amounts of data
  • Host websites and backend systems
  • Automate tasks and workflows (using scripts and schedulers)
  • Support cloud infrastructure (virtual machines, containers)
  • Ensure stability, security, and high performance for systems that must run 24/7

In short, Linux provides a secure and reliable environment to efficiently and continuously run applications, data pipelines, and cloud services.

Why Linux For Data Engineers

Most of a Data Engineer's (DE's) core daily operations are carried out on Linux, because most data systems run on it.
These operations include:

1. Running Data Pipelines
Data pipelines such as ETL/ELT usually run on Linux servers. Typical steps include ingesting data from APIs, processing large files, transforming data using Python or Spark, and loading the results into data warehouses.
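As a rough illustration of these stages, the sketch below runs a tiny extract-transform-load flow using only standard shell tools. It uses awk in place of Python or Spark to stay self-contained, and the file names and sample rows are made up for the example.

```shell
# Work in a throwaway directory (hypothetical location for this demo).
WORK_DIR="$(mktemp -d)"

# Extract: a stand-in for data that would normally come from an API or file drop.
printf 'city,temp_c\nNairobi,24\nMombasa,30\n' > "$WORK_DIR/raw.csv"

# Transform: keep the header, then add a Fahrenheit column to every data row.
awk -F, 'NR==1 {print $0 ",temp_f"; next}
         {printf "%s,%s,%.1f\n", $1, $2, $2*9/5+32}' \
    "$WORK_DIR/raw.csv" > "$WORK_DIR/clean.csv"

# Load: in a real pipeline this file would be loaded into a warehouse;
# here it is simply printed.
cat "$WORK_DIR/clean.csv"
```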

2. Automation and Scheduling
With Linux tools such as cron you can schedule jobs, and with bash scripts you can automate tasks, e.g. cleaning logs weekly, archiving data periodically and running scripts on a set schedule.
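A weekly log cleanup could look roughly like the sketch below. The log directory is a temporary folder created for the demo and the script path in the crontab line is an assumption; on a real server you would point both at your own locations.

```shell
# LOG_DIR is a hypothetical log folder; a temp directory keeps the demo safe to run.
LOG_DIR="$(mktemp -d)"

# Create one old and one fresh log file to demonstrate the cleanup.
touch -d "10 days ago" "$LOG_DIR/old.log"
touch "$LOG_DIR/fresh.log"

# Delete .log files that were last modified more than 7 days ago.
find "$LOG_DIR" -name "*.log" -type f -mtime +7 -delete

# Only the fresh log should remain.
ls "$LOG_DIR"

# A crontab entry (added with `crontab -e`) that would run such a script
# every Sunday at 02:00. The script path is an assumption:
# 0 2 * * 0 /home/username/clean_logs.sh
```

The five crontab fields are minute, hour, day of month, month and day of week, followed by the command to run.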

3. Handling Big Data
To handle large volumes of data you need frameworks that are typically deployed on Linux, such as Hadoop for distributed storage and processing, Spark for fast processing of large data, Kafka for streaming data and Airflow for workflow orchestration. Orchestration is the process of organizing, scheduling and managing multiple tasks so that they run in the correct order, at the right time and reliably from start to finish.
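The idea of running tasks in the correct order can be sketched in plain shell. This is not how Airflow works internally; it only illustrates ordering and stopping on failure, and the three step functions are hypothetical placeholders.

```shell
# Each hypothetical step just records that it ran.
RUN_LOG="$(mktemp)"
extract()   { echo "extract"   >> "$RUN_LOG"; }
transform() { echo "transform" >> "$RUN_LOG"; }
load()      { echo "load"      >> "$RUN_LOG"; }

# && runs the next step only if the previous one succeeded,
# so a failed extract would stop the chain before transform and load.
extract && transform && load

# Show the order in which the steps ran.
cat "$RUN_LOG"
```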

4. Working with Cloud Infrastructure
The major cloud platforms, such as Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure, offer Linux-based infrastructure such as:

  • Virtual Machines (VMs) – Ubuntu, Red Hat, Debian
  • Containers & Orchestration – Docker, Kubernetes
  • Big Data Services – Hadoop, Spark, Kafka clusters
  • Databases – MySQL, PostgreSQL, MongoDB, Cassandra
  • Data Warehouses – BigQuery engines, Redshift nodes (Linux-based)

5. File and Data Management
With Linux you can handle large files efficiently and perform tasks such as moving massive datasets, compressing files, searching logs and streaming data. These tasks are done by executing commands such as ls, cd, cp, mv, grep, etc.
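A few of these tasks are sketched below on a small made-up log file. The file names and log lines are assumptions for the demo, and gzip is used for compression.

```shell
# Work in a throwaway directory so nothing important is touched.
WORK_DIR="$(mktemp -d)"
cd "$WORK_DIR"

# Create a small sample log file.
printf 'INFO start\nERROR disk full\nINFO done\nERROR timeout\n' > app.log

# Search the log: count the lines that contain ERROR.
grep -c "ERROR" app.log

# Copy the file, then rename the copy.
cp app.log backup.log
mv backup.log app_backup.log

# Compress the backup; gzip replaces it with app_backup.log.gz.
gzip app_backup.log

ls
```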

Running Linux Terminal and Its Commands

Running a Linux terminal means using text commands to control a Linux system, either locally or on a server.

1. On a Linux machine (Ubuntu, etc.)
Press Ctrl + Alt + T
Or search “Terminal” in applications

2. On Windows (Most common)
Option A: Windows Subsystem for Linux (WSL)

  • Install WSL
  • Open Ubuntu from the Start Menu
  • This gives you a real Linux terminal inside Windows

Option B: Git Bash

  • Install Git
  • Open Git Bash
  • Linux-like commands (not full Linux, but useful)

3. On macOS

  • Open Terminal (Spotlight → Terminal)
  • macOS is Unix-based, very similar to Linux

4. On a Remote Server (Cloud/Linux server)
Use SSH:

ssh username@server_ip

This opens a Linux terminal on a remote machine.

Basic Linux Commands

Accessing Server

To access a remote server, you will need the server username, the server IP address and the server password

ssh username@server_ip

Update and upgrade the server's packages when necessary

sudo apt update && sudo apt upgrade

Check the version of the Ubuntu server you are using

lsb_release -a
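On minimal installations lsb_release may not be installed. A possible fallback, sketched here under the assumption that the distribution ships the standard /etc/os-release file (Ubuntu does), is to read that file directly:

```shell
# /etc/os-release is a plain-text file of KEY=value pairs describing the distribution.
if [ -r /etc/os-release ]; then
    . /etc/os-release
    echo "Distribution: ${PRETTY_NAME:-unknown}"
fi

# Kernel name and release, available on any Unix-like system.
uname -sr
```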

To check the disk space usage and remaining storage on your VM

df -h
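df reports on whole filesystems. To see how much space one folder takes, du is the usual companion. A small sketch, using a temporary directory and a made-up 1 MiB file:

```shell
DEMO_DIR="$(mktemp -d)"

# Write a 1 MiB file of zeros so there is something to measure.
dd if=/dev/zero of="$DEMO_DIR/sample.bin" bs=1024 count=1024 2>/dev/null

du -sh "$DEMO_DIR"   # total size of this directory, human-readable
df -h "$DEMO_DIR"    # free space on the filesystem that holds it
```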

To list the files and folders in your current directory

ls

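With the default color settings on Ubuntu, the output of ls is color-coded:

  • Red - Compressed archives (e.g. .zip, .tar.gz)
  • Blue - Folders (directories)
  • White - Regular files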

To list all files in the current directory, including hidden ones (names that start with a dot)

ls -a
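The difference between the two listings can be seen in a quick sketch. The directory is temporary and the two file names are made up for the demo:

```shell
DEMO_DIR="$(mktemp -d)"
touch "$DEMO_DIR/data.csv" "$DEMO_DIR/.env"

ls "$DEMO_DIR"       # lists data.csv only
ls -a "$DEMO_DIR"    # also lists .env, plus the . and .. entries
ls -la "$DEMO_DIR"   # long format: permissions, owner, size, modification time
```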

Print your current directory

pwd

To add another user to your server

sudo adduser 'username'

Switching from the superuser 'root' to a regular user on the server and changing to that user's home directory

su 'username'
cd

Creating Directories and Files and navigating between them

mkdir 'directory_name' - Create a directory
touch 'file_name' - Create an empty file
cd 'directory_name' - Move into (open) a directory
cd .. - Move one level up from your current location
cd (with no argument) - Go back to your home directory
cp - Copy files
mv - Move/rename files
rm - Delete a file
rm -r - Delete a folder and its contents
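Put together, a short session using these commands might look like the sketch below. It runs inside a temporary directory, and all file and folder names are made up for the demo.

```shell
# Work in a throwaway directory so nothing important is touched.
DEMO_DIR="$(mktemp -d)"
cd "$DEMO_DIR"

mkdir project                 # create a directory
cd project                    # move into it
touch notes.txt               # create an empty file
cp notes.txt notes_copy.txt   # copy the file
mv notes_copy.txt draft.txt   # rename the copy
mkdir old
mv draft.txt old/             # move the file into a subfolder
cd ..                         # go back up one level
ls -R project                 # list everything under project

rm project/notes.txt          # delete a single file
rm -r project/old             # delete a folder and its contents
```

Note that rm deletes immediately and does not move anything to a trash folder, so double-check the path before running rm -r.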

Copying a file from the local machine to the server

scp 'file_name' user_name@ip:path_to_the_server_location_of_choice

Copying a file from the server to the local machine

scp username@remote_host:/remote/path/to/file /local/path/to/destination

Copying a folder from the local machine to the server

scp -r /local/path/to/folder ibrahim@157.245.209.236:/home/ibrahim/

scp -r MyMusicFolder ibrahim@157.245.209.236:/home/ibrahim/


Copying a folder from the server to the local machine

scp -r ibrahim@157.245.209.236:/home/ibrahim/MyMusicFolder /local/destination/

scp -r ibrahim@157.245.209.236:/home/ibrahim/MyMusicFolder ~/Downloads/

You can also rename the folder during transfer:

scp -r ibrahim@157.245.209.236:/home/ibrahim/MyMusicFolder ~/Downloads/NewFolderName

For large folders, consider adding -C to compress during transfer (faster for slow connections):

scp -r -C MyMusicFolder ibrahim@157.245.209.236:/home/ibrahim/

Copying files from the internet to your server

wget 'link'

Writing a line to a file on the server and reading it back

echo 'The line you wish to write' >> file_name - Append the line to the file (the file is created if it doesn't exist)

cat 'file_name' - Read a file
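The >> operator appends to a file, while a single > overwrites it. A short sketch of the difference, using a temporary file created for the demo:

```shell
NOTES_FILE="$(mktemp)"

echo 'first line' > "$NOTES_FILE"     # > replaces whatever the file contained
echo 'second line' >> "$NOTES_FILE"   # >> adds the line to the end of the file

cat "$NOTES_FILE"                     # print the whole file
wc -l < "$NOTES_FILE"                 # count the lines in the file
```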

Editing Using Nano and Vim

Nano is a simple, beginner-friendly text editor you use directly in the Linux terminal. It comes in handy when editing files, writing scripts and viewing changes to files on the servers.

nano app.py - Open app.py in Nano
Ctrl + O - Save (press Enter to confirm the file name)
Ctrl + X - Exit


If the file doesn't exist, Nano creates it when you save.

Vim is a modal text editor used in the Linux terminal and is widely used in servers, cloud machines, and containers.

vim app.py - Open app.py in Vim
i --> Insert mode
Type your text
Esc --> Back to normal (command) mode
:w --> Save
:q --> Quit
:wq --> save and quit
:q! --> quit without saving