Overview
Data Engineers (DEs) build and maintain systems that collect, store, and prepare data for data scientists and analysts to use. To do this, they rely on a range of techniques and tools, one of which is Linux.
Linux is an open-source, Unix-like operating system (OS). It is available in many distributions, such as Ubuntu, Fedora, and Debian, all of which provide a command-line interface. Linux is especially known for its versatility and security, which make it attractive to technical professionals such as Data Engineers.
DEs prefer Linux for the following reasons:
- Most servers and virtual machines (VMs) run Linux
- Easier automation and scripting
- High compatibility with containerization and orchestration tools
- Resource control
- Troubleshooting and monitoring
- High compatibility with big-data tooling
Basic Linux Commands for Data Engineers
To work comfortably in the Linux command-line interface, a DE needs to understand a handful of basic commands.
The most useful ones to start with are:
ls: To list files and directories.
pwd: To print the current working directory.
cd: To change to a different directory.
cp: To copy a file to a different location while keeping the original.
mv: To move a file to a different location or rename it; the file no longer exists under its original path.
rm: To delete a file.
ssh: To connect to a remote server.
tar, zip: To create archives and compress files.
chmod: To change the access permissions of files and directories.
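As a rough illustration, the commands above can be combined into a short session like the one below. This is only a sketch: the directory and file names (raw, archive, sales.csv) are made up for the example, and everything runs inside a temporary directory created with mktemp so no real files are touched. mkdir, which creates directories, is not in the list above but is needed to set up the example. ssh is left out because it needs a real remote server to connect to.

```shell
#!/bin/sh
# Work inside a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"
pwd                                    # print the current working directory

# Hypothetical directories and sample file, used only for this example.
mkdir raw archive
printf 'id,amount\n1,10\n' > raw/sales.csv

ls raw                                 # list the files in raw/
cp raw/sales.csv archive/sales_backup.csv   # copy; the original stays in raw/
mv raw/sales.csv raw/sales_2024.csv         # rename; the old name is gone
chmod 640 raw/sales_2024.csv           # owner: read/write, group: read, others: none

tar -czf archive.tar.gz archive        # pack archive/ into a gzip-compressed tarball
tar -tzf archive.tar.gz                # list what the tarball contains

rm archive/sales_backup.csv            # delete the backup copy
```

Note that the tarball is created before the backup is deleted, so the copy is still available inside archive.tar.gz even after rm runs.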
Practical Usage of Vi and Nano
Introduction
Vi and Nano are two of the most common text editors (tools used to create, view, and edit text in files) found on Linux.
DEs use text editors to write and edit code, edit configuration files, create scripts and automations, and write documentation.
Vi
Vi is a modal, keyboard-driven editor found on almost all Linux systems. Modal means that the editor operates in different modes, and the same key does different things depending on the current mode. Insert mode is used for typing text, while Normal mode is used for commands that navigate, delete, copy, and search. To change what your keystrokes do, you have to switch modes.
Because Vi is available on most servers, DEs often use it to edit files on remote machines over SSH.
To open a file in Vi, use the command:
$ vi <file_name>
Some key commands used in the Vi text editor:
:w - Write (save) the contents of the buffer to the file without exiting the editor.
:q - Quit Vi.
:wq - Save and quit.
:q! - Force quit without saving.
i - Enter insert mode to type text into the document.
Esc - Return to Normal mode from insert mode.
Nano
Nano, on the other hand, is modeless: typing and commands share a single mode. Ordinary keys insert text, while commands are issued through shortcuts that use the Ctrl and Alt keys. DEs tend to prefer Nano for small, quick edits.
To open a file in Nano, use the command:
$ nano <file_name>
Once inside the editor, the following shortcuts can be used to work with the file:
Ctrl+O - Save the file
Ctrl+X - Exit
Ctrl+K - Cut the current line
Ctrl+U - Paste
Ctrl+W - Search
Ctrl+\ - Replace
Ctrl+G - Help