When you get into data engineering as a beginner, you quickly realize how important it is to understand Linux. Most servers run on Linux, so you need to learn your way around the terminal, which makes your work cleaner and faster.
Importance of Linux to Data Engineers.
As a data engineer, you will work with:
Servers and cloud machines - AWS, GCP, Azure
Databases - Postgres, MySQL
Data pipelines - ETL / ELT tools
Big data tools - Spark, Kafka, Airflow
Almost all of these run on Linux servers.
Linux helps you:
Work directly on production servers
Automate tasks using scripts
Debug issues quickly
Handle large files efficiently
Understand how data flows at system level
Understanding commands in the Linux terminal.
The terminal is just a way to talk to your computer using commands instead of clicking buttons.
Networking and data transfer.
To log in to a cloud server
ssh 'user_name@server_address'
To download a file from the internet
wget 'file_url'
Navigation and file management.
List files
ls
Show the current directory
pwd
To create a folder use
mkdir 'folder_name'
Move between folders
cd 'folder_name'
To go back
cd ..
Go home:
cd ~
To create files
touch 'file_name'
To move a file without leaving a copy behind
mv 'source_file' 'destination_location'
To copy a file, leaving the original in place
cp 'source_file' 'destination_file'
Permanently remove a file
rm 'file_name'
Permanently remove a folder and everything inside it
rm -r 'folder_name'
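The navigation and file commands above can be tried together in one short session. This is only a sketch: the folder and file names (raw_data, archive, sales.csv) are made up for illustration, and everything happens inside a throwaway directory created with mktemp so that rm cannot touch real files.

```shell
# Work in a throwaway directory so nothing real is affected.
workdir="$(mktemp -d)"
cd "$workdir"

mkdir raw_data                          # create a folder
cd raw_data                             # move into it
pwd                                     # show the current directory
touch sales.csv                         # create an empty file
cp sales.csv sales_backup.csv           # copy: the original stays in place
cd ..                                   # go back up one level
mkdir archive
mv raw_data/sales_backup.csv archive/   # move: no copy is left behind
ls raw_data archive                     # list the contents of both folders
rm raw_data/sales.csv                   # permanently remove a file
rm -r raw_data                          # permanently remove a folder and its contents
```

Keep in mind that rm has no recycle bin, so practicing in a scratch directory like this is a safe habit.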
File viewing and editing.
Create and edit a file using the vi editor (nano can also be used)
vi 'file_name'
Viewing file content
cat 'file_name'
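Since vi and nano are interactive, a quick way to practice viewing files is to write a few lines from the command line and then print them with cat. The sketch below assumes a temporary file and some made-up CSV rows; printf and the > redirect are used here only as a non-interactive stand-in for typing in an editor.

```shell
# Create a temporary file to experiment with.
notes="$(mktemp)"

# Write three lines into it without opening an editor.
printf 'id,name\n1,Ada\n2,Linus\n' > "$notes"

# Print the whole file to the terminal.
cat "$notes"
```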
Permissions and user management.
File ownership is an important part of Unix that provides a secure way to store files. Every file has three sets of permissions:
Owner permissions − The owner's permissions determine what actions the owner of the file can perform on the file.
Group permissions − The group's permissions determine what actions a user, who is a member of the group that a file belongs to, can perform on the file.
Other permissions − The permissions for others indicate what action all other users can perform on the file.
To check permissions for files
ls -l
The basic building blocks of permissions are read (r), write (w) and execute (x).
Changing the owner of a file
chown 'user_name' 'file_name'
Changing the group of a file
chgrp 'group_name_or_group_ID' 'file_name'
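To see how the owner, group and other permissions show up in practice, the sketch below reads the first column of ls -l and splits it into its parts. The temporary file and the use of cut to slice the string are assumptions made for illustration. chgrp is run with your own primary group (looked up with id -gn) so it should work without sudo; changing the owner with chown normally needs administrator rights, so it is left as a comment.

```shell
# Create a temporary file to inspect.
report="$(mktemp)"

# Full listing: permissions, owner, group, size, date, name.
ls -l "$report"

# The first 10 characters are the file type plus three rwx triplets.
perms="$(ls -l "$report" | cut -c1-10)"
echo "type : $(echo "$perms" | cut -c1)"      # - is a regular file, d is a folder
echo "owner: $(echo "$perms" | cut -c2-4)"
echo "group: $(echo "$perms" | cut -c5-7)"
echo "other: $(echo "$perms" | cut -c8-10)"

# Set the file's group to your own primary group.
chgrp "$(id -gn)" "$report"

# Changing the owner usually requires sudo, for example:
# sudo chown 'user_name' "$report"
```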
At first it's confusing to try to grasp all of these at once, but eventually they sink in. Hang in there!