<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Frederick M</title>
    <description>The latest articles on DEV Community by Frederick M (@fredrickm).</description>
    <link>https://dev.to/fredrickm</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2843989%2F59e219b3-6c05-4286-afde-b5afef20e3f6.jpg</url>
      <title>DEV Community: Frederick M</title>
      <link>https://dev.to/fredrickm</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fredrickm"/>
    <language>en</language>
    <item>
      <title>Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained.</title>
      <dc:creator>Frederick M</dc:creator>
      <pubDate>Wed, 01 Apr 2026 13:20:00 +0000</pubDate>
      <link>https://dev.to/fredrickm/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-112p</link>
      <guid>https://dev.to/fredrickm/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-112p</guid>
      <description>&lt;p&gt;If you’ve ever felt confused about &lt;strong&gt;joins vs relationships&lt;/strong&gt;, or why your Power BI report is giving incorrect totals, this is where data modeling comes in.&lt;/p&gt;

&lt;p&gt;This guide breaks it down simply, with real-world examples and practical steps inside Power BI.&lt;/p&gt;




&lt;h1&gt;
  
  
  1. What is Data Modeling?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Data modeling&lt;/strong&gt; is how you structure your data so Power BI can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand relationships between tables&lt;/li&gt;
&lt;li&gt;Aggregate data correctly&lt;/li&gt;
&lt;li&gt;Perform fast and accurate calculations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Data modeling = organizing your data into a clean, logical system before analysis.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  2. Joins in Power BI (Power Query)
&lt;/h1&gt;

&lt;p&gt;Joins happen &lt;strong&gt;before data is loaded&lt;/strong&gt;, inside &lt;strong&gt;Power Query&lt;/strong&gt;. They physically combine tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to create joins:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Home → Transform Data&lt;/strong&gt; to open the Power Query Editor&lt;/li&gt;
&lt;li&gt;Select a table&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Merge Queries&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Choose another table and matching column(s)&lt;/li&gt;
&lt;li&gt;Select join type&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Types of Joins (with real-life examples)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. INNER JOIN
&lt;/h3&gt;

&lt;p&gt;Returns only matching records.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customers table&lt;/li&gt;
&lt;li&gt;Orders table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only customers who placed orders appear.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Customers      Orders
A              A
B              B
C              -

Result:
A, B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  2. LEFT JOIN (Most Common)
&lt;/h3&gt;

&lt;p&gt;Returns all records from the left table, plus matching records from the right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
All customers, even those without orders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Result:
A (order)
B (order)
C (null)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  3. RIGHT JOIN
&lt;/h3&gt;

&lt;p&gt;Opposite of LEFT JOIN.&lt;/p&gt;

&lt;p&gt;Returns all records from the right table, plus matching records from the left.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. FULL OUTER JOIN
&lt;/h3&gt;

&lt;p&gt;Returns everything from both tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Result:
A (match)
B (match)
C (left only)
D (right only)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  5. LEFT ANTI JOIN
&lt;/h3&gt;

&lt;p&gt;Returns rows in left table with &lt;strong&gt;no match&lt;/strong&gt; in right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Customers who NEVER ordered.&lt;/p&gt;


&lt;h3&gt;
  
  
  6. RIGHT ANTI JOIN
&lt;/h3&gt;

&lt;p&gt;Returns rows in right table with no match in left.&lt;/p&gt;
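&lt;p&gt;To make the six join kinds concrete outside Power BI, here is a minimal shell sketch that uses the coreutils &lt;code&gt;join&lt;/code&gt; command on two sorted key lists. The file names and keys are hypothetical: the left list plays the role of Customers and the right list plays Orders. It is an analogy for the row-matching logic only, not a description of how Power Query runs a merge.&lt;/p&gt;

```shell
# Hypothetical keys: customers A, B, C on the left; orders for A, B, D on the right.
workdir=$(mktemp -d)
printf 'A\nB\nC\n' > "$workdir/customers.txt"
printf 'A\nB\nD\n' > "$workdir/orders.txt"
left="$workdir/customers.txt"
right="$workdir/orders.txt"

# join expects both inputs sorted on the key; paste folds each result onto one line.
inner=$(join "$left" "$right" | paste -sd ' ' -)                 # keys present in both
left_outer=$(join -a 1 "$left" "$right" | paste -sd ' ' -)       # every left key
right_outer=$(join -a 2 "$left" "$right" | paste -sd ' ' -)      # every right key
full_outer=$(join -a 1 -a 2 "$left" "$right" | paste -sd ' ' -)  # every key from either side
left_anti=$(join -v 1 "$left" "$right" | paste -sd ' ' -)        # left keys with no match
right_anti=$(join -v 2 "$left" "$right" | paste -sd ' ' -)       # right keys with no match

echo "inner:       $inner"        # A B
echo "left outer:  $left_outer"   # A B C
echo "right outer: $right_outer"  # A B D
echo "full outer:  $full_outer"   # A B C D
echo "left anti:   $left_anti"    # C
echo "right anti:  $right_anti"   # D
```

&lt;p&gt;Customer C has no order and order D has no customer, so those two keys are exactly what separates the outer and anti joins from the inner join.&lt;/p&gt;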


&lt;h2&gt;
  
  
  When to use joins
&lt;/h2&gt;

&lt;p&gt;Use joins when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to &lt;strong&gt;combine data permanently&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You're shaping raw data before modeling&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  3. Relationships in Power BI
&lt;/h1&gt;

&lt;p&gt;Relationships are created &lt;strong&gt;after loading data&lt;/strong&gt;. They do NOT merge tables, just connect them.&lt;/p&gt;
&lt;h3&gt;
  
  
  Where to create relationships:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;strong&gt;Model View&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Drag a column from one table onto the matching column in the other, or&lt;/li&gt;
&lt;li&gt;Go to &lt;strong&gt;Manage Relationships → New&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Types of Relationships
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. One-to-Many (1:M), Most Common
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;One side = unique values (Dimension)&lt;/li&gt;
&lt;li&gt;Many side = repeated values (Fact)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customers (unique IDs)&lt;/li&gt;
&lt;li&gt;Orders (many orders per customer)&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  2. Many-to-Many (M:M)
&lt;/h3&gt;

&lt;p&gt;Both sides contain duplicates.&lt;/p&gt;

&lt;p&gt;Use with care, because it can produce ambiguous results.&lt;/p&gt;


&lt;h3&gt;
  
  
  3. One-to-One (1:1)
&lt;/h3&gt;

&lt;p&gt;Rare. Both tables have unique keys.&lt;/p&gt;


&lt;h2&gt;
  
  
  Cardinality (IMPORTANT)
&lt;/h2&gt;

&lt;p&gt;Defines how tables relate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 → Many&lt;/li&gt;
&lt;li&gt;Many → Many&lt;/li&gt;
&lt;li&gt;1 → 1&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Cross Filter Direction
&lt;/h2&gt;

&lt;p&gt;Controls how filters flow:&lt;/p&gt;
&lt;h3&gt;
  
  
  Single Direction
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Filters flow one way (recommended)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Both Directions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Filters flow both ways (can cause confusion if misused)&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Active vs Inactive Relationships
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Active&lt;/strong&gt;: Used by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inactive&lt;/strong&gt;: Requires DAX (&lt;code&gt;USERELATIONSHIP&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Order Date (active)&lt;/li&gt;
&lt;li&gt;Ship Date (inactive)&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  4. Joins vs Relationships (Critical Difference)
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Joins&lt;/th&gt;
&lt;th&gt;Relationships&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Done in Power Query&lt;/td&gt;
&lt;td&gt;Done in Model View&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Combines tables&lt;/td&gt;
&lt;td&gt;Keeps tables separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Static&lt;/td&gt;
&lt;td&gt;Dynamic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Increases table size&lt;/td&gt;
&lt;td&gt;More efficient&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use relationships whenever possible. Avoid unnecessary joins.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h1&gt;
  
  
  5. Fact vs Dimension Tables
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Fact Table
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Contains measurable data&lt;/li&gt;
&lt;li&gt;Large&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sales Amount&lt;/li&gt;
&lt;li&gt;Quantity&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Dimension Table
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Contains descriptive data&lt;/li&gt;
&lt;li&gt;Smaller&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer Name&lt;/li&gt;
&lt;li&gt;Product Category&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  6. Data Modeling Schemas
&lt;/h1&gt;


&lt;h2&gt;
  
  
  1. Star Schema (BEST PRACTICE ⭐)
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        Customers
            |
Products — Sales — Dates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Fact table in center&lt;/li&gt;
&lt;li&gt;Dimensions around it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast&lt;/li&gt;
&lt;li&gt;Simple&lt;/li&gt;
&lt;li&gt;Scalable&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  2. Snowflake Schema
&lt;/h2&gt;

&lt;p&gt;Dimensions are normalized into multiple tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Product → Category → Department
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less redundancy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More complex&lt;/li&gt;
&lt;li&gt;Slower&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Flat Table (Denormalized)
&lt;/h2&gt;

&lt;p&gt;Everything in one table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple to start&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Poor performance&lt;/li&gt;
&lt;li&gt;Hard to maintain&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  7. Role-Playing Dimensions
&lt;/h1&gt;

&lt;p&gt;A single dimension used multiple times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Date table used as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Order Date&lt;/li&gt;
&lt;li&gt;Ship Date&lt;/li&gt;
&lt;li&gt;Delivery Date&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate the Date table, one copy per role&lt;/li&gt;
&lt;li&gt;Create a separate relationship from each copy to its own date column in the fact table&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  8. Common Data Modeling Issues
&lt;/h1&gt;
&lt;h3&gt;
  
  
  1. Many-to-Many confusion
&lt;/h3&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduce a bridge table&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  2. Circular relationships
&lt;/h3&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove unnecessary relationships&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  3. Incorrect totals
&lt;/h3&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check relationship direction &amp;amp; cardinality&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  4. Duplicate keys in dimension table
&lt;/h3&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove duplicate keys in Power Query so that each key appears exactly once in the dimension table&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  9. Step-by-Step Workflow (Practical)
&lt;/h1&gt;
&lt;h3&gt;
  
  
  Step 1: Load Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Get Data → Import tables&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Step 2: Clean Data (Power Query)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Remove duplicates&lt;/li&gt;
&lt;li&gt;Fix data types&lt;/li&gt;
&lt;li&gt;Create joins only if necessary&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Step 3: Build Relationships
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to Model View&lt;/li&gt;
&lt;li&gt;Connect tables using keys&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Step 4: Validate Model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cardinality&lt;/li&gt;
&lt;li&gt;Filter direction&lt;/li&gt;
&lt;li&gt;Active relationships&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Step 5: Create Measures (DAX)
&lt;/h3&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Total&lt;/span&gt; &lt;span class="n"&gt;Sales&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sales&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Amount&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;If you remember nothing else, remember this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Use Star Schema&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefer relationships over joins&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep dimension tables clean and unique&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avoid many-to-many unless necessary&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once your model is clean, everything else (DAX, visuals, performance) becomes easier.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>data</category>
      <category>powerplatform</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>How Linux is Used in Real-World Data Engineering</title>
      <dc:creator>Frederick M</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:47:32 +0000</pubDate>
      <link>https://dev.to/fredrickm/how-linux-is-used-in-real-world-data-engineering-47nh</link>
      <guid>https://dev.to/fredrickm/how-linux-is-used-in-real-world-data-engineering-47nh</guid>
      <description>&lt;p&gt;Linux is the backbone of modern data engineering. From running ETL pipelines on cloud servers to managing distributed systems like Hadoop and Spark, proficiency with the Linux command line is non‑negotiable. In this guide, we’ll walk through a realistic data‑engineering workflow on an Ubuntu server – the kind of tasks you’ll perform daily when managing data pipelines, securing sensitive files, and organising project assets.&lt;/p&gt;

&lt;p&gt;We’ll cover:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Secure login to a remote server&lt;/li&gt;
&lt;li&gt;Structuring a data project with version‑aware directories&lt;/li&gt;
&lt;li&gt;Creating and manipulating data files (CSV, logs, scripts)&lt;/li&gt;
&lt;li&gt;Copying, moving, renaming, and cleaning up files&lt;/li&gt;
&lt;li&gt;Setting correct permissions to protect sensitive data&lt;/li&gt;
&lt;li&gt;Navigating the file system and re‑using command history&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. Logging into a Linux Server
&lt;/h2&gt;

&lt;p&gt;In the real world, data engineers rarely work on their local laptop. Most tasks happen on remote servers (on‑premises or in the cloud). The first step is to connect securely to the server over SSH and enter the password when prompted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh root@143.110.224.135
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After logging in, it should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce0pvzthlico9ieyyylb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce0pvzthlico9ieyyylb.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once logged in, it’s good practice to confirm you are using the correct account. Data pipelines often run under dedicated service accounts, so knowing your user context matters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;whoami&lt;/span&gt;      &lt;span class="c"&gt;# displays the current username&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, run &lt;code&gt;pwd&lt;/code&gt; to verify your current working directory, which is where you will start creating folders and files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;          &lt;span class="c"&gt;# prints the current working directory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Folder and File Creation
&lt;/h2&gt;

&lt;p&gt;A well‑organised directory structure is vital for any data project. Let’s create a main folder named after ourselves and, inside it, subfolders for raw data, processed data, logs, and scripts.&lt;br&gt;
We will then confirm that the folders were created using &lt;code&gt;ls&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; ~/fredrickMDataEngineering   &lt;span class="c"&gt;# make a new folder&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/fredrickMDataEngineering  &lt;span class="c"&gt;# enter the folder&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;raw_data processed_data logs scripts &lt;span class="c"&gt;# make multiple folders&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt;   &lt;span class="c"&gt;# check current working directory content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
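&lt;p&gt;The same layout can be sketched as a step that is safe to re-run. This is a minimal example, assuming a throwaway directory from &lt;code&gt;mktemp -d&lt;/code&gt; stands in for the project folder (the &lt;code&gt;project&lt;/code&gt; variable and the &lt;code&gt;demo_project&lt;/code&gt; name are hypothetical). The &lt;code&gt;-p&lt;/code&gt; flag makes &lt;code&gt;mkdir&lt;/code&gt; create missing parent folders and stay quiet when a folder already exists.&lt;/p&gt;

```shell
# Hypothetical scratch location so the sketch does not touch your home directory.
project=$(mktemp -d)/demo_project

# -p creates missing parents and does not fail if a folder already exists.
mkdir -p "$project/raw_data" "$project/processed_data" "$project/logs" "$project/scripts"

# Running the same command a second time is harmless, so the step can be repeated.
mkdir -p "$project/raw_data" "$project/processed_data" "$project/logs" "$project/scripts"

# List the result to confirm the four subfolders exist.
ls "$project"
```

&lt;p&gt;Without &lt;code&gt;-p&lt;/code&gt;, the second &lt;code&gt;mkdir&lt;/code&gt; call would stop with a "File exists" error, which is a common reason setup scripts fail on their second run.&lt;/p&gt;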



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy0ucjfd0y27q1zwwt6p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy0ucjfd0y27q1zwwt6p.png" alt=" " width="714" height="151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we need files that simulate real data engineering assets. From the project folder, use &lt;code&gt;touch&lt;/code&gt; to create an empty placeholder file inside each subfolder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Raw data&lt;/span&gt;
&lt;span class="nb"&gt;touch &lt;/span&gt;raw_data/raw_data.csv

&lt;span class="c"&gt;# Processed data&lt;/span&gt;
&lt;span class="nb"&gt;touch &lt;/span&gt;processed_data/cleaned_data.csv

&lt;span class="c"&gt;# Logs&lt;/span&gt;
&lt;span class="nb"&gt;touch &lt;/span&gt;logs/logs.csv

&lt;span class="c"&gt;# Scripts&lt;/span&gt;
&lt;span class="nb"&gt;touch &lt;/span&gt;scripts/scripts.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. File Operations: Copy, Move, Rename, Delete
&lt;/h2&gt;

&lt;p&gt;Data engineering involves frequent file manipulation: backing up raw data, moving files between stages, versioning assets, and cleaning up obsolete files.&lt;/p&gt;

&lt;p&gt;Let’s start by backing up the raw CSV file with &lt;code&gt;cp&lt;/code&gt; as a precaution before processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;raw_data/raw_data.csv raw_data/raw_data_backup.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let’s move a file from one folder to another to simulate data flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mv &lt;/span&gt;processed_data/cleaned_data.csv logs/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rename the processed file to indicate a version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mv &lt;/span&gt;logs/cleaned_data.csv logs/cleaned_data_v1.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deleting unnecessary files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;rm &lt;/span&gt;raw_data/raw_data_backup.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
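&lt;p&gt;The four operations can be chained into one small run with a verification step. This is a sketch under stated assumptions: it works in a scratch directory from &lt;code&gt;mktemp -d&lt;/code&gt;, and the file name &lt;code&gt;sales.csv&lt;/code&gt; and its contents are hypothetical.&lt;/p&gt;

```shell
# Hypothetical scratch project; every file name here is a placeholder.
project=$(mktemp -d)
mkdir -p "$project/raw_data" "$project/processed_data"
printf 'id,amount\n1,100\n2,250\n' > "$project/raw_data/sales.csv"

# Copy: keep an untouched backup (-p preserves mode and timestamps).
cp -p "$project/raw_data/sales.csv" "$project/raw_data/sales_backup.csv"

# cmp -s exits with status 0 only when the two files are byte-for-byte identical.
if cmp -s "$project/raw_data/sales.csv" "$project/raw_data/sales_backup.csv"; then
  echo "backup verified"
fi

# Move: hand the file to the next stage (-n refuses to overwrite an existing file).
mv -n "$project/raw_data/sales.csv" "$project/processed_data/"

# Rename: mv inside the same folder only changes the name.
mv -n "$project/processed_data/sales.csv" "$project/processed_data/sales_v1.csv"

# Delete: remove the backup once it is no longer needed.
rm "$project/raw_data/sales_backup.csv"

# Show the final layout.
ls -R "$project"
```

&lt;p&gt;Checking the backup with &lt;code&gt;cmp&lt;/code&gt; before moving or deleting anything is a cheap habit, because &lt;code&gt;mv&lt;/code&gt; and &lt;code&gt;rm&lt;/code&gt; have no undo.&lt;/p&gt;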



&lt;h2&gt;
  
  
  4. Managing Permissions for Security
&lt;/h2&gt;

&lt;p&gt;In production, sensitive files (e.g., credentials, raw PII data) must have strict permissions. Here’s how we secure our project.&lt;/p&gt;

&lt;p&gt;Let’s make the main directory accessible only by its owner (no one else can read, write, or execute). The &lt;code&gt;-R&lt;/code&gt; flag applies the change recursively to every file and folder inside the target directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; 700 ~/fredrickMDataEngineering
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
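&lt;p&gt;One side effect is worth knowing: a recursive numeric mode such as 700 also marks every regular file as executable. A hedged alternative is the symbolic mode &lt;code&gt;u=rwX,go=&lt;/code&gt;, where the capital &lt;code&gt;X&lt;/code&gt; grants execute permission only to directories and to files that are already executable. The sketch below assumes GNU coreutils on Linux (&lt;code&gt;stat -c&lt;/code&gt; is GNU-specific) and uses a hypothetical scratch directory.&lt;/p&gt;

```shell
# Hypothetical scratch project with one folder and one data file.
project=$(mktemp -d)/secure_project
mkdir -p "$project/raw_data"
touch "$project/raw_data/raw_data.csv"

# Start from a known, permissive state so the effect is visible.
chmod 755 "$project" "$project/raw_data"
chmod 644 "$project/raw_data/raw_data.csv"

# Owner gets read/write, plus execute only where it makes sense; group and others get nothing.
chmod -R u=rwX,go= "$project"

# %a prints the octal mode (GNU stat).
dir_mode=$(stat -c '%a' "$project/raw_data")
file_mode=$(stat -c '%a' "$project/raw_data/raw_data.csv")
echo "directory: $dir_mode, file: $file_mode"   # directory: 700, file: 600
```

&lt;p&gt;Directories still need the execute bit so the owner can enter them, which is why they end up as 700 while the CSV file ends up as 600.&lt;/p&gt;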



&lt;p&gt;We will also restrict a sensitive file so that only its owner can read and write it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;600 raw_data/raw_data.csv 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To confirm the permissions, list the current working directory in long format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
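&lt;p&gt;The permission string in a long listing reads as one file-type character followed by three triplets: owner, group, and others. As a small illustration, assuming GNU &lt;code&gt;stat&lt;/code&gt; and a hypothetical temporary file, the same mode can be printed in both symbolic and octal form:&lt;/p&gt;

```shell
# Hypothetical temporary file standing in for a sensitive data file.
secret=$(mktemp)
chmod 600 "$secret"

# %A prints the ls-style permission string, %a the octal mode (GNU stat).
mode_summary=$(stat -c '%A %a' "$secret")
echo "$mode_summary"   # -rw------- 600
```

&lt;p&gt;Here the leading dash means a regular file, &lt;code&gt;rw-&lt;/code&gt; is the owner's read and write access (4 + 2 = 6), and the two empty triplets are the zeros for group and others.&lt;/p&gt;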



&lt;p&gt;A script needs execute permission before it can be run directly. Here the placeholder file in the scripts folder stands in for our ETL script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x scripts/scripts.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
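&lt;p&gt;To see what the execute bit changes, here is a minimal sketch with a real script file. The name &lt;code&gt;etl.sh&lt;/code&gt; and its one-line body are hypothetical, and the example assumes the temporary directory allows program execution.&lt;/p&gt;

```shell
# Hypothetical one-line ETL script written into a scratch directory.
script=$(mktemp -d)/etl.sh
printf '%s\n' '#!/bin/sh' 'echo "ETL step ran"' > "$script"

# A freshly created file has no execute bit, so it cannot be run directly yet.
if [ ! -x "$script" ]; then
  echo "not executable yet"
fi

# u+x grants execute permission to the owner only.
chmod u+x "$script"

# Now the script can be invoked by path; the shebang line selects the interpreter.
output=$("$script")
echo "$output"   # ETL step ran
```

&lt;p&gt;Using &lt;code&gt;u+x&lt;/code&gt; rather than a bare &lt;code&gt;+x&lt;/code&gt; keeps the script runnable by its owner without opening it up to other users.&lt;/p&gt;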



&lt;h2&gt;
  
  
  5. Navigation and Command History
&lt;/h2&gt;

&lt;p&gt;Data engineers constantly move between directories. Use relative and absolute paths to navigate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/fredrickMDataEngineering/scripts   &lt;span class="c"&gt;# full path: go to the scripts folder&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ../logs                             &lt;span class="c"&gt;# relative path: up one level, then into logs&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~                                   &lt;span class="c"&gt;# back to home&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
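&lt;p&gt;A short sketch of how the two path styles resolve, using a hypothetical scratch project. It also shows &lt;code&gt;cd -&lt;/code&gt;, which jumps back to the previous directory and is handy when switching between two folders.&lt;/p&gt;

```shell
# Hypothetical scratch project with two sibling folders.
project=$(mktemp -d)
mkdir -p "$project/scripts" "$project/logs"

cd "$project/scripts"      # absolute path: works from anywhere
cd ../logs                 # relative path: up one level, then into logs
after_relative=$(pwd)

cd - > /dev/null           # jump back to the previous directory (scripts)
after_dash=$(pwd)

echo "$after_relative"
echo "$after_dash"
```

&lt;p&gt;The first line printed ends in &lt;code&gt;/logs&lt;/code&gt; and the second in &lt;code&gt;/scripts&lt;/code&gt;; the redirect only hides the directory name that &lt;code&gt;cd -&lt;/code&gt; prints by itself.&lt;/p&gt;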



&lt;p&gt;Hidden files (e.g., .env for environment variables) are common in data projects. View them with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
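&lt;p&gt;As a quick illustration, assuming a scratch directory with a hypothetical &lt;code&gt;.env&lt;/code&gt; file whose content is a made-up placeholder, compare a plain listing with one that includes hidden entries:&lt;/p&gt;

```shell
# Hypothetical scratch project with one visible and one hidden file.
project=$(mktemp -d)
touch "$project/data.csv"
printf 'DB_PASSWORD=example-only\n' > "$project/.env"

# Files holding secrets should be readable by the owner only.
chmod 600 "$project/.env"

visible=$(ls "$project")            # hidden entries are skipped
all_entries=$(ls -A "$project")     # -A is like -a but leaves out . and ..

echo "ls:"
echo "$visible"
echo "ls -A:"
echo "$all_entries"
```

&lt;p&gt;The plain listing shows only &lt;code&gt;data.csv&lt;/code&gt;, while the second listing also shows &lt;code&gt;.env&lt;/code&gt;.&lt;/p&gt;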



&lt;p&gt;Your terminal history is a goldmine: it helps you reproduce exact commands and track what was done in the terminal. View it with this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;history&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweier25ev96pfo3nkvjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweier25ev96pfo3nkvjc.png" alt=" " width="714" height="778"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To re‑run a previous command by its history number (e.g., command number 2104), use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;2104
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18b7dwlkh4wpqv9bcky9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18b7dwlkh4wpqv9bcky9.png" alt=" " width="714" height="778"&gt;&lt;/a&gt;&lt;/p&gt;
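&lt;p&gt;If history is going to serve as a record of what was done, a few Bash settings make it more useful. The lines below are a sketch of what could go into &lt;code&gt;~/.bashrc&lt;/code&gt;, assuming Bash is the login shell; the specific values are illustrative, not recommendations from the original workflow.&lt;/p&gt;

```shell
# Candidate lines for ~/.bashrc (Bash-specific variables; values are illustrative).
HISTTIMEFORMAT='%F %T '      # show a date and time next to each entry in the history output
HISTSIZE=10000               # number of commands kept in memory for the session
HISTFILESIZE=20000           # number of lines kept in the history file
HISTCONTROL=ignoredups       # skip a command identical to the previous entry
```

&lt;p&gt;With timestamps enabled, each history entry shows when a command ran, which helps when reconstructing the order of steps in a pipeline run.&lt;/p&gt;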

&lt;h2&gt;
  
  
  6. Why These Skills Matter
&lt;/h2&gt;

&lt;p&gt;What you just practised is a miniature version of real‑world data engineering:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Structured folders mirror how data lakes or data warehouses are organised.&lt;/li&gt;
&lt;li&gt;File operations (copy, move, rename) simulate the stages of an ETL pipeline, from ingestion to transformation to archiving.&lt;/li&gt;
&lt;li&gt;Permissions protect sensitive data and ensure only authorised users (or processes) can modify critical files. &lt;/li&gt;
&lt;li&gt;Scripts automate repetitive tasks, and command history allows you to audit or replay steps.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>cli</category>
      <category>dataengineering</category>
      <category>linux</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
