<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hosea Mutwiri</title>
    <description>The latest articles on DEV Community by Hosea Mutwiri (@hoseamutwiri).</description>
    <link>https://dev.to/hoseamutwiri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3819046%2Ffee39f3f-4524-4bec-8fb3-7e9dd0492bf7.png</url>
      <title>DEV Community: Hosea Mutwiri</title>
      <link>https://dev.to/hoseamutwiri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hoseamutwiri"/>
    <language>en</language>
    <item>
      <title>How Linux Is Used in Real-World Data Engineering</title>
      <dc:creator>Hosea Mutwiri</dc:creator>
      <pubDate>Sun, 29 Mar 2026 23:44:48 +0000</pubDate>
      <link>https://dev.to/hoseamutwiri/how-linux-is-used-in-real-world-data-engineering-2f3m</link>
      <guid>https://dev.to/hoseamutwiri/how-linux-is-used-in-real-world-data-engineering-2f3m</guid>
      <description>&lt;p&gt;If you’re just getting started in data engineering, you’ll hear one word again and again: &lt;strong&gt;Linux&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this beginner-friendly guide, you’ll learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What Linux is&lt;/li&gt;
&lt;li&gt;A quick history of Linux&lt;/li&gt;
&lt;li&gt;Why data engineers rely on it&lt;/li&gt;
&lt;li&gt;The Linux commands worth learning first&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;What is Linux?&lt;/h3&gt;

&lt;p&gt;Linux is an operating system, like Windows or macOS. An operating system manages your machine’s hardware (CPU, memory, storage, networking) and provides the foundation that applications run on. Without an operating system, your computer or server cannot do much at all.&lt;/p&gt;

&lt;p&gt;A complete Linux system typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bootloader - Loads the operating system when the machine starts&lt;/li&gt;
&lt;li&gt;Kernel - The core component that manages hardware and system resources&lt;/li&gt;
&lt;li&gt;Init system - Starts and supervises services&lt;/li&gt;
&lt;li&gt;Daemons - Background processes (for example, logging, scheduling, and networking)&lt;/li&gt;
&lt;li&gt;Graphical server - Powers a desktop interface&lt;/li&gt;
&lt;li&gt;Desktop environment and applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Linux is free and open source, which means anyone can view, modify, and share the code. It comes in many versions called distributions (or “distros”), each tuned for different needs. If you want to explore what’s out there, &lt;a href="https://distrowatch.com/" rel="noopener noreferrer"&gt;DistroWatch&lt;/a&gt; is a good place to browse.&lt;/p&gt;

&lt;h3&gt;Brief history of Linux&lt;/h3&gt;

&lt;p&gt;In 1991, Finnish student Linus Torvalds started building Linux as a personal project while studying at the University of Helsinki. At the time, he was frustrated by the limitations of MINIX, a small Unix-like teaching system, and wanted a free alternative.&lt;/p&gt;

&lt;p&gt;Torvalds released the first version of the Linux kernel that year. By then, Richard Stallman and the Free Software Foundation (FSF) had already spent years developing the GNU project, which provided many of the tools and utilities people use in a Unix-like system.&lt;/p&gt;

&lt;p&gt;Together, the Linux kernel and GNU tools formed the complete system many people refer to as GNU/Linux. What began as a hobby project grew into a global collaboration and now powers many of the world’s servers, cloud platforms, and embedded devices.&lt;/p&gt;

&lt;h3&gt;Why do data engineers use Linux?&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Data engineering&lt;/strong&gt; is about building and maintaining pipelines that move data from raw sources into clean, reliable datasets for analysts and data scientists. Most of that work happens on servers, and Linux dominates the server world.&lt;/p&gt;

&lt;p&gt;Here’s why it matters so much:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;
Data pipelines depend on scripting, scheduling, and file processing. With the command line and tools like &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;awk&lt;/code&gt;, &lt;code&gt;sed&lt;/code&gt;, and &lt;code&gt;find&lt;/code&gt;, plus job scheduling with &lt;strong&gt;&lt;code&gt;cron&lt;/code&gt;&lt;/strong&gt;, you can automate ETL runs, backups, and routine maintenance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stability and performance&lt;/strong&gt;
Linux systems are known for running reliably for long periods. That matters when pipelines need to work 24/7 and process large volumes of data without interruptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;
Linux uses a model of users, groups, and file permissions, so you can restrict who is allowed to read or change data, scripts, and credentials. Because the code is open source, vulnerabilities can be reviewed publicly and are often patched quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility and ecosystem&lt;/strong&gt;
Many core data tools are built for Linux or run best on it, including Apache Spark, Kafka, Airflow, Hadoop, Docker, Kubernetes, PostgreSQL, and most cloud services (AWS, GCP, Azure).&lt;/li&gt;
&lt;/ul&gt;
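
&lt;p&gt;To make the automation point concrete, here is a minimal sketch of a tiny filter-and-sum step using &lt;code&gt;grep&lt;/code&gt; and &lt;code&gt;awk&lt;/code&gt;. The file names, the sample rows, and the script path in the cron comment are all made up for illustration. A real pipeline would read from an actual source.&lt;/p&gt;

```shell
# Hypothetical sample data: a tiny orders file created only for this sketch.
workdir=$(mktemp -d)
printf 'order_id,country,amount\n1,KE,120\n2,UG,80\n3,KE,50\n' > "$workdir/orders.csv"

# grep: keep only the rows for one country (the header line does not contain ",KE,").
grep ',KE,' "$workdir/orders.csv" > "$workdir/orders_ke.csv"

# awk: split each line on commas and add up the third column.
total=$(awk -F',' '{ sum += $3 } END { print sum }' "$workdir/orders_ke.csv")
echo "KE total: $total"

# cron: an entry like this one (hypothetical script path) would run a job every day at 02:00.
# 0 2 * * * /home/user/etl/run_pipeline.sh
```

&lt;p&gt;With the sample rows above, the script prints &lt;code&gt;KE total: 170&lt;/code&gt;.&lt;/p&gt;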

&lt;h3&gt;Basic Linux commands every data engineer should learn first&lt;/h3&gt;

&lt;p&gt;Here are the most useful beginner commands, grouped by category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Directory navigation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt; &lt;span class="c"&gt;# Print the current working directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;path] &lt;span class="c"&gt;# Change directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; .. &lt;span class="c"&gt;# Go up one level&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~ &lt;span class="c"&gt;# Go to your home directory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;File and directory management&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="c"&gt;# List files in the current directory&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="c"&gt;# Detailed list (permissions, owner, size, date)&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; &lt;span class="c"&gt;# Show all files, including hidden ones&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-lh&lt;/span&gt; &lt;span class="c"&gt;# Human-readable file sizes&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;name] &lt;span class="c"&gt;# Create a new directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;parent/child] &lt;span class="c"&gt;# Create nested directories&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;destination] &lt;span class="c"&gt;# Copy files or directories (use -r for recursive)&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;destination] &lt;span class="c"&gt;# Move or rename files or directories&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Remove a file (use -r for directories)&lt;/span&gt;
&lt;span class="nb"&gt;rmdir&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;directory] &lt;span class="c"&gt;# Remove an empty directory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
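
&lt;p&gt;As a rough sketch of how these commands fit together, the example below builds a small project layout in a throwaway directory. The directory and file names are assumptions chosen for the example.&lt;/p&gt;

```shell
# Work inside a temporary directory so no real files are touched.
project=$(mktemp -d)
cd "$project"

mkdir -p data/raw data/processed                            # create nested directories in one step
echo "id,name" > data/raw/users.csv                         # hypothetical input file
cp data/raw/users.csv data/processed/users_copy.csv         # copy the file
mv data/processed/users_copy.csv data/processed/users.csv   # rename the copy
ls -lh data/processed                                       # inspect the result
rm data/raw/users.csv                                       # remove the original file
rmdir data/raw                                              # the directory is now empty, so rmdir succeeds
```

&lt;p&gt;Note that &lt;code&gt;rmdir&lt;/code&gt; only works here because the file was removed first.&lt;/p&gt;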



&lt;p&gt;&lt;strong&gt;Viewing and searching content&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Print the file contents (best for small files)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Print with line numbers&lt;/span&gt;
&lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Show the first 10 lines (handy for checking CSV headers)&lt;/span&gt;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Show the last 10 lines (use -f to follow logs in real time)&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"string"&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Search for a string in a file&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"string"&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Case-insensitive search&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"string"&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Show matching line numbers&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"string"&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Count matching lines&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
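
&lt;p&gt;Here is a small sketch of these commands applied to a log file. The log lines are invented for the example. In practice you would point the commands at a real pipeline log.&lt;/p&gt;

```shell
# Create a hypothetical five-line log file.
log=$(mktemp)
printf '%s\n' \
  '2026-03-29 INFO job started' \
  '2026-03-29 ERROR connection refused' \
  '2026-03-29 info retrying' \
  '2026-03-29 ERROR timeout' \
  '2026-03-29 INFO job finished' > "$log"

head -n 2 "$log"                      # first two lines
tail -n 1 "$log"                      # last line
grep -n "ERROR" "$log"                # matching lines with their line numbers

errors=$(grep -c "ERROR" "$log")      # count lines containing ERROR
infos=$(grep -ci "info" "$log")       # count INFO lines, ignoring case
echo "errors=$errors infos=$infos"
```

&lt;p&gt;For this sample log, the last line printed is &lt;code&gt;errors=2 infos=3&lt;/code&gt;. The lowercase &lt;code&gt;info&lt;/code&gt; line is only counted because of &lt;code&gt;-i&lt;/code&gt;.&lt;/p&gt;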



&lt;p&gt;&lt;strong&gt;Other useful commands&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"text"&lt;/span&gt; &lt;span class="c"&gt;# Print text (useful for debugging scripts)&lt;/span&gt;
clear &lt;span class="c"&gt;# Clear the terminal screen&lt;/span&gt;
find &lt;span class="o"&gt;[&lt;/span&gt;path] &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.csv"&lt;/span&gt; &lt;span class="c"&gt;# Find files by name&lt;/span&gt;
&lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;file] &lt;span class="c"&gt;# Count lines, words, and bytes (use -l for lines only)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
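
&lt;p&gt;A common beginner task is checking how many data files arrived and how many rows they hold. The sketch below combines &lt;code&gt;find&lt;/code&gt; and &lt;code&gt;wc&lt;/code&gt; for that. The folder layout and file contents are assumptions made for the example.&lt;/p&gt;

```shell
# Build a hypothetical landing folder with two CSV files and one text file.
base=$(mktemp -d)
mkdir -p "$base/2026/03"
printf 'id\n1\n2\n' > "$base/2026/03/a.csv"
printf 'id\n3\n' > "$base/2026/03/b.csv"
echo "notes" > "$base/readme.txt"

# find lists the CSV files; wc -l counts them (tr strips padding some systems add).
csv_count=$(find "$base" -name "*.csv" | wc -l | tr -d ' ')

# cat joins the contents of every CSV so wc -l reports one total line count.
line_count=$(find "$base" -name "*.csv" -exec cat {} + | wc -l | tr -d ' ')

echo "csv files: $csv_count, total lines: $line_count"
```

&lt;p&gt;With these sample files, the output is &lt;code&gt;csv files: 2, total lines: 5&lt;/code&gt;. The total includes the header line of each file.&lt;/p&gt;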



&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;As someone who has just started learning data engineering (currently in week 3), it’s already clear how central Linux is to real-world data work.&lt;br&gt;
This article was my attempt to explain the basics: what Linux is, its history, why it matters, and the essential commands every beginner should learn.&lt;br&gt;
I still have a long way to go, but practicing these commands has already boosted my confidence.&lt;br&gt;
Thanks for reading! I would like to hear your tips for beginners in the comments.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>linux</category>
      <category>data</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
