Gathuru_M

Introduction to Linux for Data Engineers, Including Practical Use of Vi and Nano with Examples

Part 1: Why Every Data Engineer Needs to Make Friends with Linux

If you’re diving into data engineering, you’re probably already writing Python scripts or SQL queries. Then someone mentions "the server" or "production," and suddenly everyone is talking about Linux.

If you’re wondering why you can’t just keep using Windows or macOS for everything, you’re not alone. Let’s break down why Linux matters.

If you look at regular people browsing the web, roughly 70% of desktop users are on Windows. It’s comfortable, right? But in the "under-the-hood" world of servers that actually run the internet and process massive datasets, Linux dominates: it powers most web servers and cloud machines, and every system on the TOP500 list of supercomputers.

Knowing Linux isn't just a "nice to have"; it’s your secret weapon. If you can navigate a Linux terminal, you instantly become more hireable because you can actually manage the tools you build.

Here is why Linux is a big deal:

  1. It’s where your work actually lives: You might write code on your laptop, but your data pipelines (the stuff that moves and cleans data) will almost certainly live on a Linux server. If it’s in "Production," it’s most likely on Linux.

  2. Security: In data engineering, you’re handling sensitive information such as names, emails, and credit card numbers. Linux gives every file an owner, a group, and separate read, write, and execute permissions, so you can control exactly who is allowed to see or change each dataset.

  3. The Command Line makes you a pro: Typing commands might feel like you're in an old hacker movie, but it’s actually way faster than clicking through menus. Mastering the command line makes you faster, more confident, and—honestly—just a better engineer.

  4. It’s built for speed: Linux is "lightweight." A server install can run without a graphical desktop or background apps you don’t need, which leaves more CPU and memory for your data pipelines.
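
To make the permissions point a bit more concrete, here is a minimal sketch you can try once your terminal is set up. The file name customers.csv is made up for this example, and the commands run in a throwaway directory so nothing real is touched.

```shell
# Work in a throwaway directory; customers.csv is a made-up file name
cd "$(mktemp -d)"
printf 'name,email\n' > customers.csv

# Owner may read and write; group and everyone else get no access
chmod 600 customers.csv

# The first column of ls -l shows the permission bits
ls -l customers.csv

# Keep just the 10-character permission string for a quick check
perms=$(ls -l customers.csv | cut -c1-10)
echo "$perms"    # -rw-------
```

Reading the string left to right: the first character is the file type, then come three characters each for the owner, the group, and everyone else.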


Part 2: Getting Set Up (The WSL2 Guide)

On Linux or macOS: You’re already set! Just search for "Terminal" in your apps. (macOS isn’t Linux, but its terminal understands nearly all of the commands in this guide.)

On Windows: The best way to do this is through WSL2 (Windows Subsystem for Linux).

How to install WSL2:

  1. Enable the feature: Open the Windows search bar and type "Turn Windows features on or off". Find "Windows Subsystem for Linux" in the list, check the box, and click OK. (On recent builds of Windows 10 and 11, the install command in the next step turns this on for you, so you can skip ahead if you can’t find the option.)

  2. Install: Open PowerShell as Administrator and type:
     wsl --install

  3. Restart: You must restart your computer for the changes to take effect.

  4. Finalize: After rebooting, a terminal will open. Follow the prompts to create a username and password. (Note: The password won’t show any characters as you type. This is normal!)
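
Once the terminal is open, a quick sanity check can confirm where you have landed. This is only a sketch: the exact text printed depends on your machine, and the "microsoft" hint in the kernel release is what WSL2 typically shows rather than a guarantee.

```shell
# Kernel name: "Linux" on native Linux and inside WSL2 (macOS prints "Darwin")
kernel_name=$(uname -s)
echo "$kernel_name"

# Kernel release: inside WSL2 this typically contains the word "microsoft"
uname -r

# Distribution name, if the standard os-release file is present
if [ -r /etc/os-release ]; then
    ( . /etc/os-release && echo "${PRETTY_NAME:-unknown}" )
fi
```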


Part 3: The Essential Commands

We first need to know how to navigate this environment. Here are some basic commands you can try out.

  • pwd (Print Working Directory): Tells you exactly which folder you are currently in.
  • ls (List): Shows you what’s inside your current folder.
  • cd (Change Directory): Your "walking" command. Use cd folder_name to step into a folder and cd .. to go up one level.
  • mkdir folder_name (Make Directory): Creates a new folder.
  • touch filename: Creates a brand new, empty file.
  • rm filename (Remove): Deletes a file. Be careful: there is no "Undo."
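
Here is how the commands above fit together in one short session. It is a sketch you can paste in as-is: scratch_demo and draft.txt are made-up names, and everything happens inside a temporary directory so the rm cannot hurt anything you care about.

```shell
# Start in a throwaway directory (mktemp -d creates one and prints its path)
cd "$(mktemp -d)"

mkdir scratch_demo    # create a new folder
cd scratch_demo       # step into it
pwd                   # print the full path of the folder you are in
touch draft.txt       # create an empty file
ls                    # list the contents: draft.txt
rm draft.txt          # delete the file (remember: no undo)
ls                    # prints nothing, the folder is empty again
cd ..                 # go up one level
ls                    # scratch_demo
```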

Pro Productivity Tips

  • Tab Completion: Start typing a file or command name and hit Tab. The shell will finish it for you when there is only one match.
  • The Up Arrow: Hit the up arrow to see commands you typed previously.
  • clear: Wipes the screen clean.
  • Ctrl + C: The "Emergency Stop" button if a command is stuck.

Part 4: Real-World Practice

We’re going to create a workspace and download a real dataset (the famous Iris flower dataset).

# Create a folder and move into it
mkdir linux_practice
cd linux_practice

Create your own file

Before we download anything, let's create a small file to store settings or write small notes.

touch my_notes.txt

If you run ls, you’ll see your new empty file sitting there.

Download the data using 'wget'

wget https://raw.githubusercontent.com/dataprofessor/data/master/iris.csv

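If you run ls again, you’ll see the new iris.csv file added to the list. If wget isn’t installed (it is missing from a stock macOS, for example), curl -O followed by the same URL downloads the file too.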

Peek at the first 10 lines

head -n 10 iris.csv
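
head has a couple of close relatives that are handy for a first look at any file. The sketch below uses a tiny stand-in file, sample.csv, made up for this example so it runs without a network connection. Swap in iris.csv to try the same commands on the real dataset.

```shell
# Work in a throwaway directory with a made-up four-line CSV
cd "$(mktemp -d)"
printf 'id,species\n1,setosa\n2,versicolor\n3,virginica\n' > sample.csv

head -n 2 sample.csv    # first 2 lines: the header and the first row
tail -n 1 sample.csv    # last line: 3,virginica
wc -l < sample.csv      # number of lines, header included: 4

# Store the line count in a variable for later use
row_count=$(wc -l < sample.csv)
```

Keep in mind that a line count includes the header, so the number of data rows is one less.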


Part 5: Mastering the Editors (Nano vs. Vim)

On a remote server, you usually don’t have a graphical editor like VS Code at hand. You have to edit files inside the terminal.

1. Nano:

Nano is like a very basic version of Notepad that lives in your terminal.

How to use it:

  1. Type nano iris.csv.
  2. Editing: Use your arrow keys to move the cursor and just start typing.
  3. The Bottom Menu: See those ^ symbols? That means the Ctrl key.
  4. To Save: Press Ctrl + O (Write Out), then hit Enter.
  5. To Exit: Press Ctrl + X.

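2. Vim:

Vim is more powerful than Nano but takes some getting used to, because it works in modes: one for moving around and running commands, one for typing text. If the vim command isn’t installed, try vi, which uses the same keys for everything shown here.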

How to use it:

  1. Type vim iris.csv.
  2. Normal Mode: You start here. You cannot type text yet. This mode is for moving around and running commands.
  3. Insert Mode: To start typing, press the letter i. You’ll see -- INSERT -- at the bottom. Now you can edit the file.
  4. Saving your work: First, hit the Esc key to leave "Insert Mode" and go back to "Normal Mode." Then type :wq and hit Enter. (w means write/save, q means quit.)
  5. The Emergency Exit: If you messed up and want to leave without saving, hit Esc, then type :q! and hit Enter.
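
Both editors above were pointed straight at the dataset, so it can be worth keeping a copy before you start typing. The sketch below is one way to do that, not the only one. It uses a made-up stand-in file (sample.csv), and the "edit" is simulated with printf so the example runs without opening an editor.

```shell
# Work in a throwaway directory with a made-up two-line CSV
cd "$(mktemp -d)"
printf 'id,species\n1,setosa\n' > sample.csv

# Keep a backup before editing
cp sample.csv sample.csv.bak

# Stand-in for a manual edit made in nano or vim
printf '2,versicolor\n' >> sample.csv

# Show what changed; diff exits non-zero when the files differ,
# so "|| true" keeps a script from stopping here
diff sample.csv.bak sample.csv || true

# Not happy with the edit? Copy the backup over the file
cp sample.csv.bak sample.csv
cmp -s sample.csv.bak sample.csv && echo "restored"
```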

Conclusion

You’ve navigated a Linux terminal, pulled data, and edited files without leaving the command line. 👏

You can also use a "cheat sheet" to guide you for the first few weeks. Happy coding!
