Linux stands out as a useful tool in data engineering because of its distinctive features: the command line interface (CLI), compatibility with most data tools, security, scalability, and cost-effectiveness as an open-source platform. These attributes make data engineering work easier for both individuals and organisations.
As a beginner, here are some of the things to look out for as you start your journey in data engineering.
Using the CLI
The command line interface (CLI) is a tool for interacting with programs by typing commands, which often gets things done faster than clicking through menus.
You type reserved words (commands) into a text-based interface, usually a terminal, and work mainly from the keyboard rather than with a mouse. Most beginners find this unconventional at first, but once you master the CLI you will find it one of the most convenient tools available, especially in data engineering.
The CLI can be used to:
- Manage files and folders, better known as directories
- Manage processes and running applications
- Configure and manage your network
- Check system information
- Process, compress and archive data
- Create scripts, and much more
Here is what you need to get started:
Navigating the CLI
While the CLI usually seems intimidating to a new user, familiarity and ease build up as you learn the right tips and tricks.
Get used to using the keyboard only.
To run a command, type the command and press Enter (the CLI will respond by running the command or showing an error).
To clear your screen, use the clear command or Ctrl + L.
To reuse a recent command, press the up arrow key.
To interrupt a process before it completes, press Ctrl + C.
To autocomplete a command or file name, press the Tab key.
To copy or paste text, use Ctrl + Shift + C or Ctrl + Shift + V (in most Linux terminal emulators).
To open a new tab, use Ctrl + Shift + T (in most Linux terminal emulators).
Basic commands you need to know for data engineering
Commands in file management
mkdir <directory name> - create a directory
pwd - print the working directory
ls - list directory contents
cd - change directory
rm - remove files (use rmdir to remove an empty directory, or rm -r to remove a directory and its contents)
cp - copy files or directories
scp - securely copy files to or from a remote server
mv - move or rename files
touch - create an empty file
cat - concatenate files and display their contents
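The file management commands above can be tried together in one short session. The sketch below is only an illustration: the directory and file names (demo_project, notes.txt, backup.txt, archive.txt) are hypothetical, and everything happens inside a temporary directory created with mktemp so that no real files are touched.

```shell
# Work inside a throwaway directory so nothing important is touched.
# All names below (demo_project, notes.txt, ...) are hypothetical examples.
workdir=$(mktemp -d)
cd "$workdir"

mkdir demo_project              # create a directory
cd demo_project                 # move into it
pwd                             # print the directory we are now in

touch notes.txt                 # create an empty file
echo "first line" > notes.txt   # write some text into it
cat notes.txt                   # display the contents: first line

cp notes.txt backup.txt         # copy the file
mv backup.txt archive.txt       # rename (move) the copy
ls                              # list the directory contents

rm archive.txt notes.txt        # remove the files
cd ..                           # go back up one level
rmdir demo_project              # remove the now-empty directory
```

Note that rmdir only succeeds because the directory was emptied first. Removing a directory that still has files in it would need rm -r, which deletes everything inside without asking, so use it with care.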
Part of file management is controlling which users are allowed to read or change your files.
Think of three people working on a document: Person A is allowed to view, edit and process the document, Person B can only do two of these activities, and Person C can only do one.
This protects the document from unintentional distortion or unwanted changes.
A good rule of thumb is the principle of least privilege, where a user is given only the level of access they need.
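On Linux, this scenario maps roughly onto the three permission classes of a file: the owner, the group and everyone else. The sketch below assumes, purely for illustration, that Person A is the owner, Person B is in the file's group, and Person C is any other user, and that "view, edit and process" correspond to read, write and execute. It uses chmod to set the permissions and ls -l to display them. The file name report.sh is hypothetical.

```shell
# Hypothetical file in a temporary directory, so the demo is self-contained.
workdir=$(mktemp -d)
report="$workdir/report.sh"
touch "$report"

# u = owner  (Person A): read, write and execute
# g = group  (Person B): read and write
# o = others (Person C): read only
chmod u=rwx,g=rw,o=r "$report"

# The first 10 characters of `ls -l` show the file type and permission bits.
perms=$(ls -l "$report" | cut -c1-10)
echo "$perms"    # -rwxrw-r--

# The same permissions in numeric (octal) form: 7 = rwx, 6 = rw-, 4 = r--
chmod 764 "$report"
ls -l "$report" | cut -c1-10
```

Following the principle of least privilege, you would start from the most restrictive setting that still lets each person do their job, and only add permissions when they are needed.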
File permissions
ls -l - view permissions
chmod - modify permissions for users
Commands for downloading files and working with remote servers
wget - download files from the web
curl - transfer data from or to a server
ssh - connect to a remote server securely
scp - securely copy files to a server
Let us look at an example of how to work with the terminal as a data engineer.
- Create a directory, navigate into it and create a new file, file.txt
- Add some text to the file using vi or nano (editors that ship with most Linux distributions)
- Check file permissions
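The three steps above could look roughly like this in the terminal. This is a sketch under a few assumptions: the directory name data_project is hypothetical, the work is done in a temporary location, and printf stands in for the interactive editor so the commands can run unattended. In a real session you would run nano file.txt or vi file.txt at that point instead.

```shell
# Hypothetical scratch location and directory name, for illustration only.
base=$(mktemp -d)
cd "$base"

# Step 1: create a directory, move into it and create a new file.
mkdir data_project
cd data_project
touch file.txt

# Step 2: add some text. Interactively you would use `nano file.txt`
# or `vi file.txt`; printf is used here so the sketch runs without a prompt.
printf 'hello from the terminal\n' >> file.txt
cat file.txt     # hello from the terminal

# Step 3: check the file permissions.
ls -l file.txt
```

The output of ls -l starts with a string such as -rw-r--r--, which lists the permissions for the owner, the group and other users, in that order. The exact value depends on your system's default settings.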
We can also download other tools using the CLI. An example is Docker, a containerization tool used by engineers to build, ship and run applications in lightweight packages called containers.
On a Linux-based terminal:
- Download the script
curl -fsSL https://get.docker.com -o get-docker.sh
- Run the script to install Docker
sudo sh get-docker.sh
- Check if Docker is successfully installed
docker --version (or docker version for extra details)
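If you want a script to confirm the installation rather than checking by eye, one possible approach is to test whether the docker command is available before calling it. The sketch below uses the shell's command -v for this. The variable name docker_status and the message text are illustrative choices, not part of Docker itself.

```shell
# `command -v` succeeds only if the named program can be found on the PATH.
if command -v docker >/dev/null 2>&1; then
    docker_status="installed"
    docker --version
else
    docker_status="missing"
    echo "docker was not found on the PATH; the installation may not have completed"
fi
```

A check like this is useful at the top of a longer script, where it is better to stop with a clear message than to fail halfway through because a tool is missing.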



