<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tony Kamande</title>
    <description>The latest articles on DEV Community by Tony Kamande (@tony-kamande).</description>
    <link>https://dev.to/tony-kamande</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3822906%2Fdcabc09c-9852-4ae3-9c4f-b32854d38b39.jpg</url>
      <title>DEV Community: Tony Kamande</title>
      <link>https://dev.to/tony-kamande</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tony-kamande"/>
    <language>en</language>
    <item>
      <title>Linux Fundamentals for Data Engineering</title>
      <dc:creator>Tony Kamande</dc:creator>
      <pubDate>Tue, 09 Jun 2026 13:48:45 +0000</pubDate>
      <link>https://dev.to/tony-kamande/linux-fundamentals-for-data-engineering-3jlj</link>
      <guid>https://dev.to/tony-kamande/linux-fundamentals-for-data-engineering-3jlj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data engineering is the backbone of modern data-driven organizations. Every day, businesses generate massive amounts of data that must be collected, stored, processed, and analyzed. Behind these processes are data engineers who build and maintain the systems that make data available for analytics and decision-making.&lt;/p&gt;

&lt;p&gt;One of the most important skills for a data engineer is proficiency in Linux. Most production servers, cloud environments, databases, and big data platforms run on Linux. Whether managing databases, deploying applications, automating workflows, or troubleshooting infrastructure, Linux knowledge is essential.&lt;/p&gt;

&lt;p&gt;This article explores the fundamental Linux concepts every data engineer should understand, supported by practical examples from a hands-on Linux and PostgreSQL administration project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Linux Matters in Data Engineering
&lt;/h2&gt;

&lt;p&gt;Linux is the preferred operating system for data engineering because it is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open source&lt;/li&gt;
&lt;li&gt;Stable and reliable&lt;/li&gt;
&lt;li&gt;Highly scalable&lt;/li&gt;
&lt;li&gt;Secure&lt;/li&gt;
&lt;li&gt;Efficient in resource utilization&lt;/li&gt;
&lt;li&gt;Widely supported across cloud platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many popular data engineering technologies run on Linux, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;MySQL&lt;/li&gt;
&lt;li&gt;Apache Airflow&lt;/li&gt;
&lt;li&gt;Apache Spark&lt;/li&gt;
&lt;li&gt;Hadoop&lt;/li&gt;
&lt;li&gt;Kafka&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Kubernetes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, understanding Linux fundamentals allows data engineers to work effectively across different environments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connecting to Remote Servers with SSH
&lt;/h2&gt;

&lt;p&gt;One of the first tasks data engineers perform is accessing remote servers.&lt;/p&gt;

&lt;p&gt;SSH (Secure Shell) provides a secure way to connect to remote Linux systems.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh root@159.65.222.96
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SSH provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Secure encrypted communication&lt;/li&gt;
&lt;li&gt;Remote administration capabilities&lt;/li&gt;
&lt;li&gt;Authentication mechanisms&lt;/li&gt;
&lt;li&gt;Secure file transfers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During my assignment, SSH was used to connect to a remote Linux server where PostgreSQL administration tasks were performed.&lt;/p&gt;
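
&lt;p&gt;Password logins work, but key-based authentication is the more common setup on production servers. The following is a minimal sketch, assuming the OpenSSH &lt;code&gt;ssh-keygen&lt;/code&gt; tool is installed. The temporary directory, the &lt;code&gt;demo-key&lt;/code&gt; comment, and the empty passphrase are illustrative choices for a demo, not the setup used in the assignment:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Demo only: generate an Ed25519 key pair in a throwaway directory.
# For a real key, write to ~/.ssh/id_ed25519 and set a passphrase.
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519"

if command -v ssh-keygen >/dev/null; then
    # -t key type, -f output file, -N passphrase (empty here), -C comment, -q quiet
    ssh-keygen -t ed25519 -f "$key_file" -N "" -C "demo-key" -q
    echo "Public key to install on the server:"
    cat "$key_file.pub"
else
    echo "ssh-keygen not found; install the OpenSSH client first"
fi
```

&lt;p&gt;The public key (the &lt;code&gt;.pub&lt;/code&gt; file) is what gets added to &lt;code&gt;~/.ssh/authorized_keys&lt;/code&gt; on the server, for instance with &lt;code&gt;ssh-copy-id&lt;/code&gt;. The private key stays on the local machine.&lt;/p&gt;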




&lt;h2&gt;
  
  
  Linux User Management
&lt;/h2&gt;

&lt;p&gt;User management is critical for maintaining security and controlling access to resources.&lt;/p&gt;

&lt;p&gt;Instead of allowing everyone to use the root account, Linux administrators create separate user accounts with specific permissions.&lt;/p&gt;

&lt;p&gt;Creating a user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adduser tonym
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verifying the user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;id &lt;/span&gt;tonym
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful user management commands include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;whoami
id
groups
&lt;/span&gt;passwd
useradd
adduser
usermod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These commands help administrators manage user identities and access rights.&lt;/p&gt;
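
&lt;p&gt;As a rough illustration of how the identity commands fit together, the sketch below prints who is running the script and warns when that account is root. The variable names are made up for this example:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Print the identity of the account running this script.
current_user="$(id -un)"     # login name, the same value whoami prints
primary_group="$(id -gn)"    # primary group name
user_id="$(id -u)"           # numeric UID; 0 means root

echo "user=$current_user uid=$user_id group=$primary_group"
echo "all groups: $(id -Gn)"

# Warn when running as root, since routine work should use a regular account.
if [ "$user_id" -eq 0 ]; then
    echo "warning: running as root"
fi
```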




&lt;h2&gt;
  
  
  Understanding the Linux File System
&lt;/h2&gt;

&lt;p&gt;Linux organizes files using a hierarchical directory structure.&lt;/p&gt;

&lt;p&gt;Some important directories include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Directory&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/home&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;User home directories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/etc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Configuration files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/var&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Logs and application data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/tmp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Temporary files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/usr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Installed programs, libraries, and shared data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/bin&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Essential system binaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/root&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Root user's home directory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Understanding the Linux file system helps data engineers locate configuration files, logs, datasets, and scripts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Essential Linux Navigation Commands
&lt;/h2&gt;

&lt;p&gt;Navigation is one of the first Linux skills every engineer learns.&lt;/p&gt;

&lt;p&gt;Display current directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;List files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Detailed listing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /home/tonym
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These commands allow users to move efficiently through the filesystem.&lt;/p&gt;




&lt;h2&gt;
  
  
  File and Directory Operations
&lt;/h2&gt;

&lt;p&gt;Data engineers frequently work with files and datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a File
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;touch &lt;/span&gt;dataset.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create a Directory
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Copy Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;source.csv backup.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Move Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mv &lt;/span&gt;old.csv archive.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Remove Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;rm &lt;/span&gt;unwanted.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These commands are useful when organizing scripts, logs, and datasets.&lt;/p&gt;
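
&lt;p&gt;The individual commands above can be combined into one short session. This is a sketch that works in a scratch directory created with &lt;code&gt;mktemp&lt;/code&gt;, so no real files are touched. The directory and file names are placeholders:&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -e  # stop at the first failing command

work_dir="$(mktemp -d)"          # scratch area so nothing real is touched
cd "$work_dir"

mkdir -p datasets archive        # -p: no error if the directory already exists
touch datasets/source.csv        # create an empty file
cp datasets/source.csv datasets/backup.csv    # copy keeps the original
mv datasets/source.csv archive/old.csv        # move (or rename) removes the original
rm datasets/backup.csv           # delete permanently; there is no recycle bin

ls -R "$work_dir"
```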




&lt;h2&gt;
  
  
  Viewing and Searching Files
&lt;/h2&gt;

&lt;p&gt;Inspecting files is a common task when troubleshooting data pipelines.&lt;/p&gt;

&lt;p&gt;Display file contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;View beginning of file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;head &lt;/span&gt;file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;View end of file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tail &lt;/span&gt;file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Search text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"ERROR"&lt;/span&gt; logfile.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;find &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.csv"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These tools make it easy to locate information within large systems.&lt;/p&gt;
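
&lt;p&gt;To see these tools working together, the sketch below builds a small sample log and inspects it. The log lines and the &lt;code&gt;pipeline.log&lt;/code&gt; file name are invented for the demo:&lt;/p&gt;

```shell
#!/usr/bin/env bash
log_dir="$(mktemp -d)"
log_file="$log_dir/pipeline.log"

# Build a small sample log (contents are made up for the demo).
printf '%s\n' \
    "INFO  load started" \
    "ERROR connection refused" \
    "INFO  retrying" \
    "ERROR timeout" \
    "INFO  load finished" > "$log_file"

head -n 2 "$log_file"                  # first two lines
tail -n 1 "$log_file"                  # last line
grep -n "ERROR" "$log_file"            # matching lines with their line numbers
error_count="$(grep -c "ERROR" "$log_file")"   # number of matching lines
echo "errors: $error_count"

# Find every .log file under the directory.
found="$(find "$log_dir" -type f -name "*.log")"
echo "$found"
```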




&lt;h2&gt;
  
  
  Linux Permissions and Security
&lt;/h2&gt;

&lt;p&gt;Linux uses a permission-based security model.&lt;/p&gt;

&lt;p&gt;View permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-rw-r--r-- 1 user user 1024 Jun  9 13:48 file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Permission management commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod
chown
chgrp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;755 script.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



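&lt;p&gt;In the example output, &lt;code&gt;-rw-r--r--&lt;/code&gt; means the owner can read and write the file, while the group and all other users can only read it. For data engineers, managing permissions is important for protecting datasets, scripts, and database resources.&lt;/p&gt;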
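
&lt;p&gt;The numeric form passed to &lt;code&gt;chmod&lt;/code&gt; can be read digit by digit: owner, group, then others, where each digit is the sum of read (4), write (2), and execute (1). So 755 is 4+2+1 for the owner and 4+1 for everyone else. A small sketch, assuming GNU &lt;code&gt;stat&lt;/code&gt; (its &lt;code&gt;-c&lt;/code&gt; flag is not available on every Unix), with a temporary file standing in for &lt;code&gt;script.sh&lt;/code&gt;:&lt;/p&gt;

```shell
#!/usr/bin/env bash
script="$(mktemp)"               # throwaway file standing in for script.sh

chmod 644 "$script"              # rw-r--r--: owner read/write, others read
before="$(stat -c '%a %A' "$script")"

chmod 755 "$script"              # rwxr-xr-x: owner all, others read/execute
after="$(stat -c '%a %A' "$script")"

echo "before: $before"
echo "after:  $after"
```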




&lt;h2&gt;
  
  
  Monitoring System Resources
&lt;/h2&gt;

&lt;p&gt;Data pipelines can consume significant system resources.&lt;/p&gt;

&lt;p&gt;Linux provides tools for monitoring system performance.&lt;/p&gt;

&lt;p&gt;Check disk usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check memory usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;free &lt;span class="nt"&gt;-m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;View running processes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ps aux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real-time monitoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;top
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitoring helps identify performance bottlenecks and resource constraints.&lt;/p&gt;
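
&lt;p&gt;Monitoring commands become more useful once their output feeds a check. The sketch below warns when the root filesystem passes a threshold. The 80 percent value is an arbitrary choice for this example, not a recommendation from the assignment:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Warn when the root filesystem is fuller than a chosen threshold.
threshold=80    # percent; an arbitrary value for this sketch

# df -P prints one POSIX-format line per filesystem; column 5 is the use percentage.
usage="$(df -P / | awk 'NR == 2 { gsub("%", "", $5); print $5 }')"

if [ "$usage" -ge "$threshold" ]; then
    echo "WARNING: / is ${usage}% full"
else
    echo "OK: / is ${usage}% full"
fi
```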




&lt;h2&gt;
  
  
  PostgreSQL Administration on Linux
&lt;/h2&gt;

&lt;p&gt;Databases are central to data engineering.&lt;/p&gt;

&lt;p&gt;As part of a practical assignment, PostgreSQL was configured and managed on a Linux server.&lt;/p&gt;

&lt;p&gt;Verify PostgreSQL installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check PostgreSQL service status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start PostgreSQL service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl start postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Starting the service brings the database server up so it can accept connections. Running &lt;code&gt;systemctl status postgresql&lt;/code&gt; again confirms that it is active.&lt;/p&gt;




&lt;h2&gt;
  
  
  Creating a Database
&lt;/h2&gt;

&lt;p&gt;A database named after the Linux username was created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;tonym&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connecting to the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="n"&gt;tonym&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



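&lt;p&gt;This follows a common PostgreSQL convention: when no database is specified, &lt;code&gt;psql&lt;/code&gt; tries to connect to a database with the same name as the current user. A database named after the Linux account therefore lets that user connect by running plain &lt;code&gt;psql&lt;/code&gt;, provided a matching database role exists.&lt;/p&gt;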




&lt;h2&gt;
  
  
  Using Schemas for Organization
&lt;/h2&gt;

&lt;p&gt;Schemas provide logical organization inside a database.&lt;/p&gt;

&lt;p&gt;A staging schema was created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In data engineering, staging schemas are commonly used to store raw or intermediate data before transformation.&lt;/p&gt;

&lt;p&gt;Benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better organization&lt;/li&gt;
&lt;li&gt;Easier maintenance&lt;/li&gt;
&lt;li&gt;Clear separation of data layers&lt;/li&gt;
&lt;li&gt;Improved governance&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Creating Tables and Loading Data
&lt;/h2&gt;

&lt;p&gt;A sample employee dataset was created inside the staging schema.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;employee_id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;full_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;hire_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Insert Sample Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hire_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'John Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Engineering'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;75000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-15'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Mary Wanjiku'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Finance'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;68000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2022-06-20'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Peter Mwangi'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'IT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;72000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2023-03-10'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verify Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This process mirrors real-world ETL workflows where data is first loaded into staging areas before transformation and analysis.&lt;/p&gt;
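
&lt;p&gt;In practice, staging tables are usually loaded from files rather than hand-written &lt;code&gt;INSERT&lt;/code&gt; statements. The sketch below writes the same three rows to a CSV file and builds a &lt;code&gt;psql&lt;/code&gt; &lt;code&gt;\copy&lt;/code&gt; command for them. It is a dry run by default; the &lt;code&gt;RUN_PSQL&lt;/code&gt; switch is a hypothetical variable introduced for this example, and actually running it assumes the &lt;code&gt;tonym&lt;/code&gt; database and &lt;code&gt;staging.employees&lt;/code&gt; table from above already exist:&lt;/p&gt;

```shell
#!/usr/bin/env bash
csv_file="$(mktemp)"

# Sample rows matching the staging.employees columns from the article.
printf '%s\n' \
    "full_name,department,salary,hire_date" \
    "John Doe,Engineering,75000,2023-01-15" \
    "Mary Wanjiku,Finance,68000,2022-06-20" \
    "Peter Mwangi,IT,72000,2023-03-10" > "$csv_file"

# \copy runs on the client side, so the file only needs to be readable by psql.
copy_sql="\\copy staging.employees (full_name, department, salary, hire_date) FROM '$csv_file' WITH (FORMAT csv, HEADER true)"

if [ "${RUN_PSQL:-0}" = "1" ]; then
    psql -d tonym -c "$copy_sql"    # needs a running server and the table above
else
    printf 'dry run, would execute: %s\n' "$copy_sql"
fi
```

&lt;p&gt;Listing the columns explicitly lets &lt;code&gt;employee_id&lt;/code&gt; keep filling itself from the &lt;code&gt;SERIAL&lt;/code&gt; sequence.&lt;/p&gt;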




&lt;h2&gt;
  
  
  Secure File Transfers with SCP
&lt;/h2&gt;

&lt;p&gt;Data engineers often move datasets and scripts between systems.&lt;/p&gt;

&lt;p&gt;SCP (Secure Copy Protocol) provides secure file transfers over SSH.&lt;/p&gt;

&lt;p&gt;Upload a file to a server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scp sample.csv tonym@159.65.222.96:/home/tonym/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Download a file from a server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scp tonym@159.65.222.96:/home/tonym/sample.csv &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SCP is widely used for moving backups, configuration files, and datasets securely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Managing Services with systemctl
&lt;/h2&gt;

&lt;p&gt;Most modern Linux distributions use systemd to manage services, and &lt;code&gt;systemctl&lt;/code&gt; is its command-line interface.&lt;/p&gt;

&lt;p&gt;Check service status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl start postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stop service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl stop postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl restart postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Service management is an essential skill for maintaining databases and other infrastructure components.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best Practices for Linux in Data Engineering
&lt;/h2&gt;

&lt;p&gt;To work effectively in Linux environments, data engineers should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Avoid using root unless necessary.&lt;/li&gt;
&lt;li&gt;Use SSH keys whenever possible.&lt;/li&gt;
&lt;li&gt;Regularly monitor system resources.&lt;/li&gt;
&lt;li&gt;Organize files and directories consistently.&lt;/li&gt;
&lt;li&gt;Automate repetitive tasks with scripts.&lt;/li&gt;
&lt;li&gt;Maintain proper permissions.&lt;/li&gt;
&lt;li&gt;Keep systems updated.&lt;/li&gt;
&lt;li&gt;Document processes thoroughly.&lt;/li&gt;
&lt;li&gt;Version-control important scripts.&lt;/li&gt;
&lt;li&gt;Back up critical data regularly.&lt;/li&gt;
&lt;/ol&gt;
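
&lt;p&gt;As one example of automating a repetitive task, the sketch below creates a timestamped backup archive with &lt;code&gt;tar&lt;/code&gt;. Both directories are temporary stand-ins and &lt;code&gt;sample.csv&lt;/code&gt; is a placeholder file; a real script would point at actual data and backup locations:&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -e  # stop at the first failing command

src_dir="$(mktemp -d)"       # stands in for a real data directory
backup_dir="$(mktemp -d)"    # stands in for a backup location
echo "id,value" > "$src_dir/sample.csv"

# Timestamped name so runs at different times do not overwrite each other.
stamp="$(date +%Y%m%d_%H%M%S)"
archive="$backup_dir/backup_$stamp.tar.gz"

# -c create, -z gzip, -f archive name, -C change into src_dir first
tar -czf "$archive" -C "$src_dir" .

echo "created $archive"
tar -tzf "$archive"          # list contents to confirm the backup is readable
```

&lt;p&gt;A script like this is typically scheduled with cron and kept under version control alongside other operational scripts.&lt;/p&gt;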

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Linux is one of the most important technologies in the data engineering ecosystem. From remote server management and user administration to database configuration and file transfers, Linux provides the tools necessary to build and maintain modern data platforms.&lt;/p&gt;

&lt;p&gt;Through hands-on work configuring PostgreSQL, creating databases and schemas, loading sample data, managing users, transferring files with SCP, and documenting the process on GitHub, I gained a deeper understanding of how Linux supports real-world data engineering workflows.&lt;/p&gt;

&lt;p&gt;For aspiring data engineers, investing time in learning Linux is one of the most valuable career decisions they can make.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>dataengineering</category>
      <category>linux</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
