DEV Community

Lawrence Murithi

Introduction to Linux for Data Engineers

Introduction

Linux is one of the most important tools for data engineers. Most data systems today run on Linux servers, including cloud platforms, databases, and big data tools like Hadoop and Spark. Understanding Linux basics is, therefore, a key skill for anyone starting a career in data engineering.

This article introduces Linux in a simple way. It explains why Linux is important for data engineers, shows basic Linux commands, and demonstrates how to create and edit files using Vi and Nano, which are common Linux text editors.

Why Linux for Data Engineers

Linux is important for data engineers for several reasons:

  • Most data pipelines run on Linux servers
  • Cloud platforms like AWS, Azure, and Google Cloud rely heavily on Linux for their virtual machines and services
  • Tools such as Hadoop, Spark, Airflow, and Kafka are typically deployed on Linux
  • Linux is stable, secure, and efficient for large data processing
  • Data engineers often work with log files, configuration files, and scripts written in Python, SQL, or Bash. Linux makes it easy to manage these files directly from the terminal.

Basic Linux Commands For Beginners

Linux commands are instructions typed in the terminal to tell the operating system what to do, such as creating files, moving between folders, or running programs. They allow users to interact directly with the system in a fast and efficient way. Linux commands help manage files, automate tasks and work effectively on servers, which is essential in data engineering and software development.
Below are some beginner Linux commands.

  • ssh root@IP - Connects to a remote server as the root user, where IP is the server's IP address
  • pwd - Shows the current directory
  • ls - Lists the files and folders in the current directory (ls -a also shows hidden files)
  • cd - Changes directory; cd .. moves up one level
  • mkdir - Creates a new directory
  • touch - Creates an empty file
  • cp - Copies files; cp -r copies a folder and its contents
  • mv - Moves or renames files and folders
  • rm - Deletes a file; rm -r deletes a folder and its contents
  • cat - Displays file content
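
The commands above can be combined into a short practice session. The sketch below is only an illustration: it works inside a temporary directory created with mktemp, and every file and folder name in it (demo_project, raw, events.log, and so on) is hypothetical. It also uses echo with the > redirection operator, which is not covered above, to put one line of text into a file.

```shell
# A throwaway directory keeps the practice run away from real files.
# Every name below (demo_project, raw, events.log, ...) is hypothetical.
workdir="$(mktemp -d)"
cd "$workdir"
pwd                                        # show the current directory

mkdir demo_project                         # create a new directory
cd demo_project                            # move into it
mkdir raw
touch raw/events.log                       # create an empty file
echo "pipeline started" > raw/events.log   # add one line of text
ls raw                                     # list the folder contents

cp raw/events.log raw/events_copy.log      # copy a file
cp -r raw archive                          # copy a whole folder
mv raw/events_copy.log raw/events_old.log  # rename the copy
cat raw/events.log                         # display file content

rm raw/events_old.log                      # delete a single file
rm -r archive                              # delete a folder and its contents
cd ..                                      # move back up one level
```

Note that rm and rm -r delete immediately, without a recycle bin, so it is worth checking the path with pwd and ls before running them on real data.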

Linux Vi and Nano Text Editors
Linux editors are programs used to create, open, and edit text files directly from the terminal. They are important because many configuration files, scripts, and logs in Linux are text-based. Data engineers and developers often use Linux editors when working on servers where graphical tools are not available.
Two common Linux text editors are Nano and Vi.
1. Nano Editor
Nano is a simple and beginner-friendly editor.
To open or create a file with Nano:

nano filename.txt

The command opens the Nano editor in the terminal. If filename.txt does not exist yet, Nano starts with an empty buffer and creates the file when you save.

Other nano commands

Command     What it does

Ctrl + O    Saves the file
Ctrl + X    Exits Nano
Ctrl + G    Shows help
Ctrl + W    Searches for text
Ctrl + K    Cuts (removes) a line
Ctrl + U    Pastes a cut line
Ctrl + A    Moves cursor to start of line
Ctrl + E    Moves cursor to end of line
Ctrl + C    Shows current line and column
Ctrl + _    Goes to a specific line number

2. Vi/Vim Editor
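Vi is a powerful editor that is widely used in professional environments.
Vi has 3 main modes:

  • Normal mode - navigation and commands
  • Insert mode - typing text
  • Visual mode - selecting text (a Vim feature)

Vi starts in Normal mode, and pressing Esc returns to Normal mode from the other modes.
To open a file in the Vi editor:
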
vi filename.txt

The command opens the file in the Vi editor, starting in Normal mode.

Other vi commands

Entering Insert Mode

i - insert before cursor
a - append after cursor
o - open new line below
I - insert at beginning of line
A - append at end of line

Saving and Quitting

:w - save (write)
:q - quit
:wq or ZZ - save and quit
:q! - quit without saving
:w filename - save as new file

Navigation Commands

h - Move left
l - Move right
j - Move down
k - Move up
gg - Go to start of file
G - Go to end of file
0 - Start of line
$ - End of line

Editing Commands

x - delete character
dd - delete line
yy - copy line
p - paste below
P - paste above
u - undo
Ctrl+r - redo
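
One way to practise these commands is to create a small file and then edit it by hand. The sketch below is only a suggestion: notes.txt and its contents are hypothetical, and the Vi session itself appears as comments because the editor is interactive and needs a terminal. Esc returns from Insert mode to Normal mode.

```shell
# Create a small practice file in a throwaway directory.
# notes.txt and its contents are hypothetical.
practice_dir="$(mktemp -d)"
printf 'first line\nsecond line\n' > "$practice_dir/notes.txt"
cat "$practice_dir/notes.txt"

# Then open it interactively (not run here, since Vi needs a terminal):
#   vi "$practice_dir/notes.txt"
# Inside Vi, one possible sequence of keystrokes:
#   dd          delete the first line
#   u           undo the deletion
#   G           jump to the last line
#   o           open a new line below and enter Insert mode
#   third line  type some text
#   Esc         return to Normal mode
#   :wq         save and quit
```

After saving, running cat on the file again should show the added line at the end.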

Conclusion
Linux is a core skill for data engineers because it is used in servers, cloud platforms, and data tools. Basic Linux commands help you move around the system and manage files, and editors such as Nano and Vi let you create and edit files directly on a server.
Learning Linux early makes it easier to work with data pipelines, scripts, and production systems.
