Why Linux?
Shifting from Windows to Linux for the first time can be daunting. Instead of a graphical interface where you click on icons and folders, there's a black screen waiting for you to key in commands. So why is it essential in a data engineer's day-to-day work?
1. Servers run Linux
The large majority of public cloud workloads, and most of the servers powering data systems, run on Linux.
2. Data engineering tools
Hadoop, Spark, and Kafka were built for Unix-like systems. These core data engineering tools are designed and optimized to run on Linux, so development, testing, and production deployment naturally happen there.
3. Performance & Stability
Linux servers can run for very long periods without reboots, which is crucial for long-running data pipelines and streaming jobs.
Another reason is that Linux is free and open-source, which is critical for scalable, cost-effective data infrastructure. These are just some of the reasons why Linux is crucial for a data engineer. Let's look at some basic Linux commands and relate them to what we're used to doing on Windows.
Basic Linux Commands
To demonstrate this, we're going to connect to a remote server from Git Bash. We do this using SSH (Secure Shell), which lets us access the server's resources and run commands on it. The syntax to connect to the server is:
ssh user@ip
See below on connecting to a remote server provisioned on DigitalOcean:
After successfully connecting to the remote server, let's run some commands.
- whoami: Prints the current user
- df -h: Displays disk usage of all mount points
- pwd: Prints the current working directory
- ls: Lists all files and folders in your current directory.
NB: With the default colour settings on many systems, files are shown in white, folders in blue, and zipped files in orange or red. The exact colours depend on your terminal configuration.
- cd: Changes directory to the folder you specify
In the illustration above, we changed directory from /root to /root/eveningClass, a folder within /root. We confirmed the change by printing the current working directory (pwd) and then listing (ls) all files and folders within /root/eveningClass.
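The commands covered so far can be tried together. The sketch below is only an illustration: it assumes you'd rather not touch /root, so it creates a stand-in eveningClass folder under a temporary directory (the base_dir and current_dir variable names are made up for this example).

```shell
# Hypothetical stand-in for /root/eveningClass, created under a temporary directory
base_dir="$(mktemp -d)"
mkdir "$base_dir/eveningClass"

whoami                  # prints the current user
df -h                   # disk usage of all mount points, in human-readable units

cd "$base_dir"          # start in the parent directory
pwd                     # prints the parent directory's path
cd eveningClass         # move into the folder
current_dir="$(pwd)"    # capture the working directory after the change
echo "$current_dir"
ls                      # prints nothing yet, because the folder is empty
```

The output of whoami, df -h, and pwd will differ from machine to machine, so don't expect it to match the screenshots exactly.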
- cat: Allows a user to read / display the contents of a file
- sudo adduser username: Creates a new user account with the specified username
Above, we created a new user account, 'nganga'. We can verify that the user has been added by displaying (cat) the contents of the file /etc/passwd.
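On a server with many accounts, the full output of cat /etc/passwd can be long. As a rough sketch, we could filter it for a single account instead. The show_user helper below is a hypothetical function written for this example, not a standard command, and it only checks /etc/passwd itself.

```shell
# Print the /etc/passwd entry for one account, or a message if it is not listed
show_user() {
    if grep -q "^$1:" /etc/passwd; then
        grep "^$1:" /etc/passwd
    else
        echo "no such user: $1"
    fi
}

show_user root      # root is listed on virtually every Linux system
show_user nganga    # only prints an entry once 'sudo adduser nganga' has been run

# Each entry has seven colon-separated fields; keep just name, home directory, and shell
root_summary="$(grep "^root:" /etc/passwd | cut -d: -f1,6,7)"
echo "$root_summary"
```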
- mkdir: Creates a new directory / folder. You need to specify the name of the directory you want to create
- touch: Creates a new, empty file
- echo: Can be used to write text to a file
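Put together, mkdir, touch, and echo can be tried as in the sketch below. It assumes a temporary working directory (work_dir is a name chosen for this example) so nothing on the server gets overwritten. The > and >> operators are the shell's redirection operators, which is how echo ends up writing to a file.

```shell
# Work in a temporary directory so nothing on the server is overwritten
work_dir="$(mktemp -d)"
cd "$work_dir"

mkdir new_dir                                # create a directory
touch new_dir/NewFile.txt                    # create an empty file inside it
echo "first line" > new_dir/NewFile.txt      # > overwrites the file with this text
echo "second line" >> new_dir/NewFile.txt    # >> appends to the end of the file
cat new_dir/NewFile.txt                      # display the contents
```

Be careful with >, since it replaces whatever was already in the file without asking.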
- scp: Stands for Secure Copy. It allows us to copy files from the local machine to a remote server and vice versa. Let's begin with copying from the local machine:
1. scp from local machine to remote host
In the illustration above, we have a file, SecondFile.txt, on the local machine, and we've copied it to the remote host directory /root/eveningClass/new_dir, as seen below:
2. scp from remote host to local machine
We can copy a file from the remote host as shown below:
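Both directions follow the same pattern: scp, then the source, then the destination, with the remote side written as user@ip:path. The sketch below only builds and prints the two commands, since running them needs a reachable server. The user name, the address 203.0.113.10 (a documentation-only address), and the choice of NewFile.txt as the file to download are all assumptions made for illustration.

```shell
# Hypothetical values: replace with your own user, server address, and paths
remote="user@203.0.113.10"
remote_dir="/root/eveningClass/new_dir"

# Local machine -> remote host: scp <local file> <user>@<ip>:<remote directory>
upload_cmd="scp SecondFile.txt ${remote}:${remote_dir}/"

# Remote host -> local machine: scp <user>@<ip>:<remote file> <local directory>
download_cmd="scp ${remote}:${remote_dir}/NewFile.txt ."

# Print the commands rather than running them
echo "$upload_cmd"
echo "$download_cmd"
```

Both commands are run from the local machine (Git Bash in our case), not from inside the SSH session. The trailing dot in the download command means "the current local directory".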
- cp: Copies a file to a specified destination directory
The file NewFile.txt has been copied from /root/eveningClass/new_dir/ to /root/eveningClass/
- mv: Moves a file from one directory to another
In the above example, we've moved the file NewFile.txt from /root/eveningClass/ to /root/eveningClass/new_dir/
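The difference between cp and mv is easiest to see side by side. The sketch below assumes a temporary directory standing in for /root/eveningClass (class_dir is a name chosen for this example), and Renamed.txt is a hypothetical file name used to show that mv also renames files.

```shell
# Temporary stand-in for /root/eveningClass
class_dir="$(mktemp -d)"
mkdir "$class_dir/new_dir"
echo "sample text" > "$class_dir/new_dir/NewFile.txt"

# cp leaves the original in place and creates a second copy
cp "$class_dir/new_dir/NewFile.txt" "$class_dir/"
ls "$class_dir" "$class_dir/new_dir"

# mv removes the file from its source location
mv "$class_dir/NewFile.txt" "$class_dir/Renamed.txt"    # a new name in the same directory renames the file
mv "$class_dir/Renamed.txt" "$class_dir/new_dir/"       # a directory as the destination moves the file there
ls "$class_dir" "$class_dir/new_dir"
```

Note that mv overwrites a file of the same name at the destination without warning, which is why the copy is renamed before being moved back into new_dir.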
Creating and Editing Files with Nano and Vi
Vi and Nano are text editors used in Linux/Unix terminal environments. Nano is simple and intuitive (like Notepad), while Vi/Vim is more powerful but has a learning curve.
1. Nano Editor
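- Opening/Creating Files: run nano followed by the file name (nano filename). If the file does not exist yet, Nano creates it when you save.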
Type normally - just start typing
Move cursor - Arrow keys work as expected
Save file - Ctrl + O (Write Out), then press Enter
Exit - Ctrl + X
Search - Ctrl + W, type word, press Enter
2. Vi Editor
Vi has 3 main modes:
Normal mode (default) - for navigation and commands
Insert mode - for typing text
Visual mode - for selecting text
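- Opening/Creating Files: run vi followed by the file name (vi filename). If the file does not exist yet, Vi creates it when you save.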
Essential Vi Commands
From Normal mode to Insert mode:
i - insert before cursor
a - append after cursor
o - open new line below
I - insert at beginning of line
A - append at end of line
Saving and Quitting (Normal mode):
:w - save (write)
:q - quit
:wq or ZZ - save and quit
:q! - quit without saving
:w filename - save as new file
Navigation (Normal mode):
h - left
j - down
k - up
l - right
0 - beginning of line
$ - end of line
gg - top of file
G - bottom of file
:5 - go to line 5
Editing (Normal mode):
x - delete character
dd - delete line
yy - copy line
p - paste below
P - paste above
u - undo
Ctrl+r - redo
Conclusion
In this article, we have:
Explained why Linux is important for data engineers
Demonstrated basic Linux commands
Shown practical usage of Vi and Nano (e.g., creating and editing files)
























