DEV Community

GeraldM

Introduction to Linux for Data Engineers, Including Practical Use of Vi and Nano with Examples

Why Is Linux Important to Data Engineers?

Linux is a key pillar of data engineering because it underpins nearly all modern data platforms, providing the stability, performance, and tooling needed to build and operate data pipelines. Most data engineering tools, including cloud services, databases, and streaming platforms such as Kafka, run natively on Linux. Its powerful command line, scripting capabilities, and process management make it possible to build, automate, monitor, and troubleshoot data workflows. As a data engineer, mastering Linux allows you to work closer to the systems that store and move data, leading to faster debugging, better performance, and more resilient data pipelines.

Intro to Linux

As a data engineer, you will interact with servers a lot. And where will these servers be located? In the cloud (a remote location).

Let’s start here: What is a server? We can describe it as a computer or computing device dedicated to providing a specific service. Most of the time these devices have high computing power.

What is the cloud in computing? The cloud is a collection of many servers that provide computing resources over the internet, while cloud services are the platforms that give access to these resources. The companies that offer them are known as cloud service providers (CSPs). Examples of major cloud service providers include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

For example, suppose you are working on a project and realize that your computer does not have enough storage space or computing power to complete it. The project will take three months. You could solve the problem by purchasing a new, more powerful computer, but would that be wise for a three-month project? No. Instead, you explain the problem to a friend, who tells you that he has a computer with the resources you need, although he is also using it. He proposes creating an account for you on that computer, which is located at his house and which you can access over the internet. Once your project is finished, he will delete the account. Accessing your friend's computer over the internet and using it to do your project is, in essence, cloud computing. The only difference is that cloud computing comes at a fee.

Basic Linux commands

What is a directory? A directory is the Linux equivalent of a folder on Windows.

The following are some basic Linux commands to get you started:

pwd – Shows current directory

Displays the full path of the directory you’re currently in.

pwd
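As a minimal sketch of how pwd is typically used, the snippet below prints the current directory and also stores it in a variable. The variable name current_dir is only an illustration, and the path you see depends on where your terminal session is.

```shell
# Print the absolute path of the directory the shell is currently in.
pwd

# Capture the same path in a variable (current_dir is a hypothetical name),
# which is handy inside scripts that need to remember where they started.
current_dir=$(pwd)
echo "Working in: $current_dir"
```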

mkdir – Create a directory

Creates a new folder.
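A short sketch of mkdir in action is shown below. It assumes a throwaway scratch directory created with mktemp -d so that nothing in your real folders is touched, and the nested path raw/2024 is a made-up example.

```shell
# Work inside a temporary scratch directory (an assumption for this sketch).
scratch=$(mktemp -d)
cd "$scratch"

# Create a single directory named DataEng.
mkdir DataEng

# -p creates any missing parent directories in one go and does not
# complain if part of the path (here DataEng) already exists.
mkdir -p DataEng/raw/2024

# Confirm what was created.
ls DataEng
```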

cd – Change directory

Move between directories.

cd DataEng: navigates into the DataEng directory we have created.

cd ..: moves up one level to the parent directory.
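The navigation steps above can be sketched as follows. The scratch directory is an assumption so the example can run anywhere, and cd - is included to show the difference between the parent directory and the previously visited directory.

```shell
# Set up a scratch directory containing DataEng (an assumption for this sketch).
scratch=$(mktemp -d)
cd "$scratch"
mkdir DataEng

cd DataEng   # move into DataEng
pwd

cd ..        # move up one level to the parent directory
pwd

cd -         # jump back to the directory we were in before (DataEng)
```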

ls – List files and directories

Shows files and folders in a directory.

ls -la: shows a detailed list that includes hidden files.
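Here is a small sketch comparing a few common ls variants. The file .hidden_config is a hypothetical name, used only to show how hidden files (names starting with a dot) behave.

```shell
# Scratch directory with one visible file, one hidden file, and one directory
# (all of this setup is an assumption for the sketch).
scratch=$(mktemp -d)
cd "$scratch"
mkdir DataEng
touch testfile.py .hidden_config

ls              # lists visible entries in the current directory
ls -a           # also lists hidden entries, including . and ..
ls -la          # long format: permissions, owner, size, modification time
ls -lh DataEng  # long format for a specific directory, human-readable sizes
```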

touch – Create an empty file
Creates a new empty file.
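A minimal sketch of touch, assuming a scratch directory so the example is safe to run:

```shell
# Scratch directory (an assumption for this sketch).
scratch=$(mktemp -d)
cd "$scratch"

# Create an empty file.
touch testfile.py

# The long listing shows the new file with a size of 0 bytes.
ls -l testfile.py

# Running touch on an existing file only updates its timestamps;
# the contents are left untouched.
touch testfile.py
```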

cp – Copy files or directories
Copies files or folders from one location to another.

E.g., here we copy our new empty file into the TestDirectory folder. With copying, the original file remains in place.
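The copy described above might look like the sketch below. The scratch directory and the name TestDirectoryBackup are assumptions added for illustration.

```shell
# Scratch directory with a file and a target folder (assumed setup).
scratch=$(mktemp -d)
cd "$scratch"
mkdir TestDirectory
touch testfile.py

# Copy the file into TestDirectory; the original stays where it is.
cp testfile.py TestDirectory/

# Copying a directory requires -r (recursive).
# TestDirectoryBackup is a hypothetical name.
cp -r TestDirectory TestDirectoryBackup

# List the current directory and TestDirectory to compare.
ls . TestDirectory
```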

mv – Move or rename files

Moves files or renames them.

Example: Here we have created a new file, testfile2.py, and used the mv command to move it to TestDirectory. Unlike cp, mv does not leave the file in its original directory.
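A sketch of both uses of mv, moving and renaming, is given below. The scratch directory and the new name testfile3.py are assumptions for illustration.

```shell
# Scratch directory with a file and a target folder (assumed setup).
scratch=$(mktemp -d)
cd "$scratch"
mkdir TestDirectory
touch testfile2.py

# Move the file into TestDirectory; it no longer exists in the current directory.
mv testfile2.py TestDirectory/

# Rename a file by "moving" it to a new name in the same directory.
# testfile3.py is a hypothetical name.
mv TestDirectory/testfile2.py TestDirectory/testfile3.py

ls . TestDirectory
```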

rm – Remove files or directories

Deletes files or folders.
rm testfile.py: deletes the file.

Example: Here we have created a directory and then deleted it using the rm -r command.

Note: Deleted files do not go to a recycle bin. They are removed immediately and cannot easily be recovered.
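Because deletion is permanent, it helps to practise in a throwaway location first. The sketch below assumes a scratch directory, and the names OldDirectory and EmptyDirectory are made up for the example.

```shell
# Scratch directory with things to delete (assumed setup).
scratch=$(mktemp -d)
cd "$scratch"
touch testfile.py
mkdir -p OldDirectory/nested
mkdir EmptyDirectory

# Delete a single file.
rm testfile.py

# rmdir removes a directory only if it is empty, which makes it a safer default.
rmdir EmptyDirectory

# -r deletes a directory together with everything inside it.
rm -r OldDirectory

# Tip: rm -i asks for confirmation before each deletion (not run here
# because it waits for keyboard input).
ls -a
```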

cat – View file contents

Displays the contents of a file.

Example: Here we are using cat to see the contents inside the file testfile.py
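As a rough sketch, the snippet below writes two lines into testfile.py and then displays them. The file contents are invented for the example, and head is shown as a companion command for when a file is too long to print in full.

```shell
# Scratch directory with a small file (the contents are an assumption).
scratch=$(mktemp -d)
cd "$scratch"
printf 'print("hello")\nprint("data")\n' > testfile.py

# Print the whole file to the terminal.
cat testfile.py

# -n prefixes each line with its line number.
cat -n testfile.py

# For large files, head shows only the first lines (here, just one).
head -n 1 testfile.py
```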

Creating a user on Linux
What is a user on Linux? A user on Linux is an account that represents a person, service, or process allowed to log in and interact with the system. Each user has a unique ID (UID), owns files and processes, and is granted specific permissions that control what they can access or modify on the system.

Command to create a user: sudo adduser 'username'

Example: Here we are creating a user named TestUser

To check that the user has been created, we use the command id 'username'.

To switch to the user you have created, use the command su 'username'.

When you try to perform privileged actions as the new user, you will get a notification that the user lacks the required permissions.

What is required now is to give the user the privileges to perform these actions by adding them to a user group that has those privileges. We do this with the command sudo usermod -aG 'groupname' 'username', where -a appends the user to the group given by -G without removing them from their existing groups.
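The sketch below inspects the current account with id, which needs no special privileges. The privileged steps are left as comments because they require sudo and change the system. The group name sudo is an assumption that holds on Debian and Ubuntu; other distributions commonly use a different group, such as wheel.

```shell
# Show the UID, primary group, and group memberships of the current user.
id

# Print only the numeric user ID and primary group ID.
id -u
id -g

# Privileged steps from this section, commented out on purpose.
# TestUser is the example name used above; the group "sudo" is an assumption.
# sudo adduser TestUser            # create the user
# id TestUser                      # confirm the user exists
# sudo usermod -aG sudo TestUser   # append the user to the sudo group
# su TestUser                      # switch to the new user
```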

Creating and Editing files on Linux

There are two commonly used text editors on Linux: nano and vim (an improved version of vi).

1. Nano
Going back to the file we created named testfile.py, let's edit it now.

nano 'filename': this command opens the file so that we can write or edit its contents.

We are taken to the editor interface, where we can write into the file or change its existing contents.

At the bottom of the editor, there is a guide to the available shortcuts:

Ctrl + O, then Enter: saves the file

Ctrl + U: pastes the most recently cut text (Ctrl + K cuts the current line)

Ctrl + W: searches for a word within the contents of the file

Ctrl + X: exits the editor

2. Vim
To create a file using vim, we use the command vim 'filename'.

We are then taken to an interface where we can edit the file.

To write into the file, we press i to enter insert mode, where we can write or edit the file's contents.

After writing into the file, we press the Esc key to exit insert mode.

To save the file, we type :w and press Enter.

To exit vim, we type :q and press Enter.
To save and exit vim at the same time, we combine the two and use :wq. To exit without saving changes, we use :q!.

We can then use the cat command to confirm that the contents have been successfully written and saved to the file we have created.

Conclusion

Having learned the commands above, you are now familiar with the core Linux skills essential for data engineering. You can confidently navigate the filesystem using the terminal, create and manage files and directories, and read from and write to files (tasks that are fundamental when working with data pipelines, configuration files, and scripts). These skills provide a strong foundation for operating data engineering tools and platforms that run primarily on Linux environments.
