Introduction
When most people hear the term Linux, the image that often comes to mind is a room full of tech-savvy geeks hunched over their keyboards, typing away in nerd-anese. But what if I told you that Linux is far more than a playground for programmers? In the vast data cosmos, Linux is largely the backbone that supports the data-driven decisions powering thousands of businesses today. Let's break this operating system down into bite-sized chunks for aspiring and beginning data engineers, shall we?
Why is Linux important for data engineers?
Linux is a workhorse for data engineers, offering excellent performance and flexibility. The trade-off is a steeper learning curve: much of the work happens on the command line, so you get hands-on with commands instead of relying on the icons and menus that make desktop operating systems more beginner-friendly. So what exactly are the perks of using Linux as a data engineer?
1. Performance: Data engineers need a system that can handle large volumes of data quickly, and Linux is well suited to that kind of workload.
2. Compatibility: Many data engineering tools and frameworks, such as Apache Hadoop, Spark, and Flink, run natively on Linux, making it hard to beat.
3. Scalability: Data keeps growing, so its environment must be flexible enough to adapt to increasing workloads.
4. Open source: Linux lets engineers use and customize the system to suit their needs.
5. Community support: When a technology has many users, you will find abundant learning materials, discussions, and help forums readily available.
Basic Linux commands form the foundation of a data engineer's career, and understanding them is essential for working with data systems. Some of these commands include:
- pwd - print working directory: shows the current working directory.
- ls - list: shows the files and directories in the current directory.
- cd - change directory: used to navigate between directories.
- mkdir - make directory: creates a new directory.
- rm - remove: deletes files; add the -r option to delete a directory and everything inside it.
- touch: creates an empty file with the given name.
- cat: displays the contents of a file.
The list above includes several commands that you will use frequently in your day-to-day operations. Proficiency builds over time, and consistent exposure to these systems will enhance your ability to work efficiently.
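To see how these commands fit together, here is a minimal practice session. It is only a sketch: the names practice_dir and notes.txt are made up for illustration, and mktemp, echo, and the > redirection are extras (not covered in the list) used so that the session runs in a throwaway directory and cat has something to display.

```shell
#!/bin/sh
# Walk through the basic commands inside a throwaway directory.
# practice_dir and notes.txt are illustrative names only.
set -e

workdir=$(mktemp -d)             # temporary sandbox so nothing real is touched
cd "$workdir"                    # cd: change into the sandbox
pwd                              # pwd: print the current working directory

mkdir practice_dir               # mkdir: create a new directory
cd practice_dir
touch notes.txt                  # touch: create an empty file
echo "hello, data" > notes.txt   # write one line so cat has something to show
ls                               # ls: list the directory contents
cat notes.txt                    # cat: display the file contents

rm notes.txt                     # rm: delete a file
cd ..                            # step back up to the sandbox
rm -r practice_dir               # rm -r: delete a directory and its contents
ls                               # the sandbox is empty again
```

Be careful with rm: there is no recycle bin on the command line, so practising in a scratch directory like this is a safe habit.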
Text Editors
Linux also offers a range of text editors, but in this article we shall briefly dive into two: nano and vi (whose improved version is vim).
Nano
This is a straightforward, user-friendly command-line text editor designed for ease of use. It displays its commands at the bottom of the screen, making it more accessible for beginners, and is commonly used for quick file edits.
The following are common use cases for the nano text editor:
Creating a file using nano:
To create a file using nano, type the command "nano" followed by the file name and its extension, e.g.:
nano testfile.txt
NB: .txt is the file extension; it depends on the type of file being created or opened.
NOTE: Most commands in nano are executed using key combinations, typically involving the Control (CTRL) or Alt (ALT) key, rather than single keys. Some of the key combinations include:
- CTRL + Y - to move up one page.
- CTRL + V - to move down one page.
- CTRL + O - to save a file, i.e. write out.
- CTRL + X - to exit nano.
- CTRL + R - to insert/read another file into the current one.
- CTRL + K - to cut the current line or the marked text.
- CTRL + U - to paste previously cut/copied text.
Vi
This is a powerful, modal text editor that comes pre-installed on virtually all Linux systems. It operates in different modes and is known for its efficiency once mastered. It uses keyboard commands rather than menus for all operations.
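Minimal installs and container images sometimes ship without one editor or the other, so a quick check before editing can save some confusion. The sketch below assumes a POSIX shell; have_editor is a hypothetical helper name, built on the standard command -v lookup.

```shell
#!/bin/sh
# have_editor is a hypothetical helper: it succeeds when the named
# program can be found on the PATH, and fails otherwise.
have_editor() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each editor discussed in this article.
for editor in nano vi vim; do
    if have_editor "$editor"; then
        echo "$editor: installed"
    else
        echo "$editor: not found"
    fi
done
```

What it prints depends entirely on the machine it runs on.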
Creating a file using vi:
The syntax for file creation carries over from nano, i.e.:
vi testfile.py
TO NOTE:
A) Vi is case-sensitive, so a lowercase letter and an uppercase letter may have different meanings within the editor.
B) To open an existing file in vi, the same command format is used. Make sure the file name exactly matches the one used at creation; otherwise, vi will start a new, empty file under the mistyped name. Additionally, file names should not contain spaces: an unquoted space makes the shell pass each word as a separate file name, so vi opens files you did not intend.
Some of the commands for vi include:
- i - Insert before cursor (enters insert mode so you can type in the file)
- I - Insert at beginning of line
- a - Append after cursor
- A - Append at end of line
- o - Open new line below
- O - Open new line above
- s - Substitute character (delete char and enter insert mode)
- S - Substitute entire line
- ESC - to return to command mode.
- :wq - to save the file and exit vi, returning to the shell. Type it in lowercase from command mode, then press Enter.
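Coming back to the note about spaces in file names: the shell splits a command line on spaces before vi ever sees it. The sketch below shows the effect without opening an editor. count_args is a hypothetical helper that prints how many arguments it received, which is how many file names vi would be handed.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of
# arguments it was called with.
count_args() {
    echo "$#"
}

count_args my notes.txt       # unquoted: 2 arguments ("my" and "notes.txt")
count_args "my notes.txt"     # quoted: 1 argument
count_args my\ notes.txt      # backslash-escaped space: 1 argument
```

So if a file name does contain a space, wrapping it in quotes (or escaping the space) keeps vi from treating it as two separate files.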
Through this article, we have explored why Linux is an indispensable tool for data engineers and how the basic commands form the foundation of professional data work. While the command line may seem daunting at first, I am reminded that every expert was once a beginner, and that the journey of a thousand miles begins with a single step! Until next time, keep your data clean and your terminal keen. Peace.
