<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Angellicah</title>
    <description>The latest articles on DEV Community by Angellicah (@angellicah_2ed8aa8f01f176).</description>
    <link>https://dev.to/angellicah_2ed8aa8f01f176</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3951197%2Fb9a782a5-035a-42ba-8998-316540fdcbb9.jpg</url>
      <title>DEV Community: Angellicah</title>
      <link>https://dev.to/angellicah_2ed8aa8f01f176</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/angellicah_2ed8aa8f01f176"/>
    <language>en</language>
    <item>
      <title>Stop Guessing, Start Modeling: Relationships, Schemas &amp; Joins in Power BI</title>
      <dc:creator>Angellicah</dc:creator>
      <pubDate>Tue, 30 Jun 2026 00:24:10 +0000</pubDate>
      <link>https://dev.to/angellicah_2ed8aa8f01f176/stop-guessing-start-modeling-relationships-schemas-joins-in-power-bi-3oep</link>
      <guid>https://dev.to/angellicah_2ed8aa8f01f176/stop-guessing-start-modeling-relationships-schemas-joins-in-power-bi-3oep</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A database without relationships is just a spreadsheet with delusions of grandeur.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you've ever stared at a Power BI report showing wrong numbers (totals that don't add up, filters that filter nothing), there's a good chance your data model is broken. Not a bug. Just two tables that should've been talking to each other… and weren't.&lt;/p&gt;

&lt;p&gt;This is your practical guide to data modeling, schemas, relationships, and joins in Power BI: what they are, how they connect, and how to stop getting burned by them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Data Modeling and How Does It Work?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data modeling&lt;/strong&gt; is the &lt;em&gt;process of defining how your tables connect to each other inside Power BI's engine&lt;/em&gt; (called VertiPaq). Think of it like drawing a map between your tables, telling Power BI this column in Table A is the same thing as this column in Table B.&lt;/p&gt;

&lt;p&gt;When you load multiple tables into Power BI, it doesn't automatically know they're related. A Sales table and a Products table, sitting separately, can't filter each other. &lt;em&gt;Data modeling builds the bridges.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Power BI's model view lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define relationships between tables&lt;/li&gt;
&lt;li&gt;Set cardinality and cross-filter direction&lt;/li&gt;
&lt;li&gt;Build star or snowflake schemas&lt;/li&gt;
&lt;li&gt;Create calculated columns and measures using DAX&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, Power BI &lt;em&gt;compresses&lt;/em&gt; and &lt;em&gt;stores each column separately&lt;/em&gt; (&lt;strong&gt;columnar storage&lt;/strong&gt;). Relationships are resolved in-memory at query time, which is why a well-structured model is blazing fast, and a messy one will bring your report to its knees.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Key Concepts
| Concept                  | What It Means                                      |
|--------------------------|-----------------------------------------------------|
| Fact Table               | Stores measurable events (sales, transactions, logs)|
| Dimension Table          | Stores descriptive context (products, customers)    |
| Primary Key (PK)         | Unique identifier column in a dimension table       |
| Foreign Key (FK)         | Column in a fact table referencing a PK in a dim    |
| Relationship             | The defined link between a PK and FK across tables  |
| Cardinality              | Describes how many rows on each side match          |
| Cross-filter Direction   | Controls which way filters flow across relationship |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Understanding Schemas (The Blueprint of Your Model)
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;schema&lt;/strong&gt; is simply the &lt;em&gt;structure, the layout, of your data model&lt;/em&gt;. It describes which tables exist, what columns they have, and how they relate to each other. Think of it as the floor plan of your data house.&lt;/p&gt;

&lt;p&gt;Power BI works best with two classic schema types:&lt;/p&gt;

&lt;h3&gt;
  
  
  Star Schema
&lt;/h3&gt;

&lt;p&gt;The star schema is Power BI's best friend. It has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One central Fact Table (big, skinny — lots of rows, few columns)&lt;/li&gt;
&lt;li&gt;Multiple Dimension Tables surrounding it (smaller, with descriptive attributes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;em&gt;&lt;strong&gt;fact table&lt;/strong&gt;&lt;/em&gt; holds numbers and foreign keys. The &lt;em&gt;&lt;strong&gt;dimension tables&lt;/strong&gt;&lt;/em&gt; hold the context. Every dimension connects directly to the fact table. No dimension connects to another dimension.&lt;/p&gt;

&lt;p&gt;Example: Sales Model&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
DimDate ──────┐
DimProduct ───┤──── FactSales
DimCustomer ──┤
DimRegion ────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is clean, fast, and DAX-friendly. Most of your models should look like this.&lt;/p&gt;
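&lt;p&gt;To make the star layout concrete, here is a minimal Python sketch (not Power BI code) of one fact table keyed to two dimensions. The rows, the column names such as &lt;code&gt;ProductKey&lt;/code&gt; and &lt;code&gt;RegionKey&lt;/code&gt;, and the &lt;code&gt;total_by&lt;/code&gt; helper are all hypothetical, chosen only to mirror the diagram above.&lt;/p&gt;

```python
# Hypothetical star schema: one fact table plus two small dimension tables.
dim_product = {
    1: {"ProductName": "Laptop", "Category": "Electronics"},
    2: {"ProductName": "Desk", "Category": "Furniture"},
}
dim_region = {
    10: {"RegionName": "East"},
    20: {"RegionName": "West"},
}
# The fact table holds only foreign keys and numbers.
fact_sales = [
    {"ProductKey": 1, "RegionKey": 10, "SalesAmount": 1200},
    {"ProductKey": 2, "RegionKey": 10, "SalesAmount": 300},
    {"ProductKey": 1, "RegionKey": 20, "SalesAmount": 900},
]


def total_by(fact_rows, dimension, key_column, attribute):
    """Sum SalesAmount grouped by one attribute of a dimension table."""
    totals = {}
    for row in fact_rows:
        # Follow the foreign key from the fact row to its dimension row.
        label = dimension[row[key_column]][attribute]
        totals[label] = totals.get(label, 0) + row["SalesAmount"]
    return totals


sales_by_category = total_by(fact_sales, dim_product, "ProductKey", "Category")
sales_by_region = total_by(fact_sales, dim_region, "RegionKey", "RegionName")
print(sales_by_category)  # {'Electronics': 2100, 'Furniture': 300}
print(sales_by_region)    # {'East': 1500, 'West': 900}
```

&lt;p&gt;Every lookup is a single hop from the fact row to a dimension row, which is roughly why star-shaped models stay simple to reason about.&lt;/p&gt;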

&lt;h3&gt;
  
  
  Snowflake Schema
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;snowflake schema&lt;/strong&gt; is a &lt;em&gt;star schema where the dimension tables are further normalized: they break into sub-dimensions&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DimProductCategory ──── DimProduct ──── FactSales

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;DimProduct&lt;/code&gt; relates to &lt;code&gt;FactSales&lt;/code&gt;, but &lt;code&gt;DimProductCategory&lt;/code&gt; relates to &lt;code&gt;DimProduct&lt;/code&gt; instead of directly to &lt;code&gt;FactSales&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Snowflake schemas &lt;em&gt;save storage space&lt;/em&gt; but are &lt;em&gt;harder to work with in DAX&lt;/em&gt; and &lt;em&gt;can slow down your model&lt;/em&gt;. Use them only when necessary (e.g. data comes from a normalized SQL database and you can't denormalize it).&lt;/p&gt;

&lt;h3&gt;
  
  
  Galaxy Schema (Fact Constellation)
&lt;/h3&gt;

&lt;p&gt;Multiple fact tables share the same dimension tables. This is common in enterprise models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DimDate ────┬──── FactSales
            └──── FactReturns
DimProduct ─┬──── FactSales
             └──── FactReturns

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Schema                     | Structure                   | Power BI Friendliness                 | Best Used When              |
|----------------------------|-----------------------------|------------------------|--------------------------------------|
| &lt;span class="gs"&gt;**Star**&lt;/span&gt;                   | 1 fact + many dimensions    | ⭐⭐⭐⭐⭐ Excellent | Default choice — always aim for this |
| &lt;span class="gs"&gt;**Snowflake**&lt;/span&gt;              | Normalized dimensions       | ⭐⭐⭐ Moderate  | Source data is already normalized    |
| &lt;span class="gs"&gt;**Galaxy / Constellation**&lt;/span&gt; | Multiple facts, shared dims | ⭐⭐⭐⭐ Good      | Enterprise multi-subject models      |
| &lt;span class="gs"&gt;**Flat (Wide Table)**&lt;/span&gt;      | Everything in one table     | ⭐ Poor              | Avoid — causes redundancy &amp;amp; slow DAX           |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Relationships in Power BI (How Tables Actually Talk)
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;relationship&lt;/strong&gt; in Power BI is a &lt;em&gt;defined connection between two tables based on a matching column&lt;/em&gt;. It's how Power BI knows that &lt;code&gt;ProductID&lt;/code&gt; in your &lt;code&gt;FactSales&lt;/code&gt; table and &lt;code&gt;ProductID&lt;/code&gt; in your &lt;code&gt;DimProduct&lt;/code&gt; table are the same thing.&lt;/p&gt;

&lt;p&gt;Without relationships, every table is an island. With relationships, filters and aggregations flow across your entire model like electricity through a circuit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Components of a Relationship
&lt;/h3&gt;

&lt;p&gt;Every relationship in Power BI has four components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Component                  | Description                                             | Options                    |
|----------------------------|-----------------------------------------------------------|----------|
| &lt;span class="gs"&gt;**From Table / Column**&lt;/span&gt;    | The table where the FK lives (usually the fact table)    | Any table |
| &lt;span class="gs"&gt;**To Table / Column**&lt;/span&gt;      | The table where the PK lives (usually the dimension)     | Any table |
| &lt;span class="gs"&gt;**Cardinality**&lt;/span&gt;            | How many rows on each side match             | One-to-Many, One-to-One, Many-to-Many |
| &lt;span class="gs"&gt;**Cross-Filter Direction**&lt;/span&gt; | Which way filters flow                     | Single, Both |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cardinality
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cardinality&lt;/strong&gt; &lt;em&gt;defines the nature of the match between your two key columns&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Cardinality            | Symbol       | Meaning                         | Example                |
|------------------------|--------------|--------------------------------------|-------------------|
| &lt;span class="gs"&gt;**One-to-Many (1:M)**&lt;/span&gt;  | &lt;span class="sb"&gt;`1`&lt;/span&gt; ──── &lt;span class="sb"&gt;`*`&lt;/span&gt; | One row in dim matches many in fact | 1 Product → many Sales rows |
| &lt;span class="gs"&gt;**Many-to-One (M:1)**&lt;/span&gt;  | &lt;span class="sb"&gt;`*`&lt;/span&gt; ──── &lt;span class="sb"&gt;`1`&lt;/span&gt; | Many fact rows match one dim row | Same as above, reversed|
| &lt;span class="gs"&gt;**One-to-One (1:1)**&lt;/span&gt;   | &lt;span class="sb"&gt;`1`&lt;/span&gt; ──── &lt;span class="sb"&gt;`1`&lt;/span&gt; | Each row matches exactly one row | Country codes ↔ Country names |
| &lt;span class="gs"&gt;**Many-to-Many (M:M)**&lt;/span&gt; | &lt;span class="sb"&gt;`*`&lt;/span&gt; ──── &lt;span class="sb"&gt;`*`&lt;/span&gt; | Many rows match many rows                  | Students ↔ Courses     |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Many-to-Many relationships are supported in Power BI but should be used with caution. They can cause ambiguous filter propagation and unexpected aggregation results. &lt;/p&gt;
&lt;/blockquote&gt;
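&lt;p&gt;One way to build intuition for cardinality is to check whether each key column contains repeats. The sketch below is a hypothetical Python helper, not how Power BI detects cardinality internally; the &lt;code&gt;infer_cardinality&lt;/code&gt; name and the sample key lists are assumptions for illustration.&lt;/p&gt;

```python
from collections import Counter


def is_unique(values):
    """True when no value appears more than once."""
    return all(count == 1 for count in Counter(values).values())


def infer_cardinality(left_keys, right_keys):
    """Classify a relationship by checking key uniqueness on each side."""
    left = "1" if is_unique(left_keys) else "*"
    right = "1" if is_unique(right_keys) else "*"
    return left + ":" + right


dim_product_ids = [1, 2, 3]                  # unique: one row per product
fact_sales_product_ids = [1, 1, 2, 3, 3, 3]  # repeats: many sales per product

print(infer_cardinality(dim_product_ids, fact_sales_product_ids))         # 1:*
print(infer_cardinality(dim_product_ids, dim_product_ids))                # 1:1
print(infer_cardinality(fact_sales_product_ids, fact_sales_product_ids))  # *:*
```

&lt;p&gt;If a column you expected to be the "one" side comes back as "many", that usually points to duplicate keys in the dimension table.&lt;/p&gt;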

&lt;h2&gt;
  
  
  Cross-Filter Direction
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Direction | Behaviour                                 | When to Use          |
|-----------|-------------------------------------------|-----------------------------------------------|
| &lt;span class="gs"&gt;**Single**&lt;/span&gt;| Filters flow from 1-side → many-side only | Default — use this 90% of the time          |
| &lt;span class="gs"&gt;**Both**&lt;/span&gt;  | Filters flow in both directions | Role-playing dimensions, bridge tables — use sparingly |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Bidirectional filters can create circular dependencies and ambiguous results. Only use them when you have a clear reason.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Active vs. Inactive Relationships
&lt;/h2&gt;

&lt;p&gt;Power BI allows only one active relationship between any two tables at a time. You can still define additional relationships between them; they sit inactive until a measure calls on them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Active Relationships
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Used automatically by all visuals and standard DAX measures&lt;/li&gt;
&lt;li&gt;Shown as a &lt;em&gt;&lt;strong&gt;solid line&lt;/strong&gt;&lt;/em&gt; in the model view&lt;/li&gt;
&lt;li&gt;You can only have one active relationship between any two tables&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Inactive Relationships
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Shown as a &lt;em&gt;&lt;strong&gt;dashed line&lt;/strong&gt;&lt;/em&gt; in model view&lt;/li&gt;
&lt;li&gt;Ignored by default — must be explicitly activated in DAX using &lt;code&gt;USERELATIONSHIP()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Useful for role-playing dimensions (e.g., a Date table used for both Order Date and Ship Date)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-World Example: Role-Playing Dates&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- `FactSales` has both `OrderDate` and `ShipDate`, both linking to `DimDate`&lt;/span&gt;
&lt;span class="c1"&gt;-- Only one can be active. Use `USERELATIONSHIP` for the other:&lt;/span&gt;

&lt;span class="n"&gt;ShippedSalesAmount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
&lt;span class="n"&gt;CALCULATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FactSales&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SalesAmount&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;USERELATIONSHIP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FactSales&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ShipDate&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;DimDate&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
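&lt;p&gt;As a rough analogy outside Power BI, switching relationships is like choosing which date column drives the grouping. The Python sketch below uses made-up rows and a hypothetical &lt;code&gt;sales_by_month&lt;/code&gt; helper; it only illustrates why the same sales can land in different months depending on the date role.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical fact rows with two date roles per sale.
fact_sales = [
    {"OrderDate": "2026-01-30", "ShipDate": "2026-02-02", "SalesAmount": 500},
    {"OrderDate": "2026-01-31", "ShipDate": "2026-01-31", "SalesAmount": 200},
    {"OrderDate": "2026-02-01", "ShipDate": "2026-02-03", "SalesAmount": 300},
]


def sales_by_month(rows, date_column):
    """Total SalesAmount per year-month, using the chosen date role."""
    totals = defaultdict(int)
    for row in rows:
        month = row[date_column][:7]  # "YYYY-MM" prefix of an ISO date string
        totals[month] += row["SalesAmount"]
    return dict(totals)


ordered = sales_by_month(fact_sales, "OrderDate")  # like the active relationship
shipped = sales_by_month(fact_sales, "ShipDate")   # like USERELATIONSHIP on ShipDate
print(ordered)  # {'2026-01': 700, '2026-02': 300}
print(shipped)  # {'2026-01': 200, '2026-02': 800}
```

&lt;p&gt;The grand total is 1000 either way; only the month each sale is attributed to changes.&lt;/p&gt;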



&lt;h2&gt;
  
  
  Active vs. Inactive Cheat Sheet
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;|                          | &lt;span class="gs"&gt;**Active Relationship**&lt;/span&gt;  | &lt;span class="gs"&gt;**Inactive Relationship**&lt;/span&gt;      |
|--------------------------|--------------------------|---------------------------|
| &lt;span class="gs"&gt;**Visual in Model View**&lt;/span&gt; | Solid line               | Dashed line         |
| &lt;span class="gs"&gt;**Used by default?**&lt;/span&gt;     | ✅ Yes                   | ❌ No |
| &lt;span class="gs"&gt;**Used in DAX?**&lt;/span&gt;         | Automatically            | Only with &lt;span class="sb"&gt;`USERELATIONSHIP()`&lt;/span&gt; |
| &lt;span class="gs"&gt;**How many allowed?**&lt;/span&gt;    | 1 between any two tables | Multiple |
| &lt;span class="gs"&gt;**Common use case**&lt;/span&gt;      | Standard lookups         | Role-playing dimensions          |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Joins vs. Relationships
&lt;/h2&gt;

&lt;p&gt;This trips up a lot of people who come from SQL. &lt;br&gt;
In SQL, you write &lt;code&gt;JOIN&lt;/code&gt; to &lt;em&gt;combine tables at query time&lt;/em&gt;. &lt;br&gt;
In Power BI, &lt;em&gt;relationships are defined once in the model and then used automatically&lt;/em&gt;. But both achieve similar results in different layers.&lt;/p&gt;
&lt;h2&gt;
  
  
  Joins (Power Query / M)
&lt;/h2&gt;

&lt;p&gt;Joins happen in &lt;strong&gt;Power Query (the M layer)&lt;/strong&gt; — before the data even loads into your model. They physically merge tables into one combined table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Join Type            | What It Returns                                                         |
|----------------------|--------------------------------------------------------|
| &lt;span class="gs"&gt;**Inner Join**&lt;/span&gt;       | Only rows with matches in BOTH tables            |
| &lt;span class="gs"&gt;**Left Outer Join**&lt;/span&gt;  | All rows from left table + matching rows from right |
| &lt;span class="gs"&gt;**Right Outer Join**&lt;/span&gt; | All rows from right table + matching rows from left  |
| &lt;span class="gs"&gt;**Full Outer Join**&lt;/span&gt;  | All rows from both tables, matched where possible |
| &lt;span class="gs"&gt;**Left Anti Join**&lt;/span&gt;   | Rows in left table with NO match in right         |
| &lt;span class="gs"&gt;**Right Anti Join**&lt;/span&gt;  | Rows in right table with NO match in left        |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
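&lt;p&gt;The join types in the table can be sketched in a few lines of plain Python. This is an illustration of the semantics only, not Power Query M; the sample rows and the function names are hypothetical, and the lookup side is assumed to have unique keys.&lt;/p&gt;

```python
# Left table: fact-like rows. Right table: a lookup with unique keys (assumed).
left_rows = [
    {"CurrencyCode": "USD", "Amount": 100},
    {"CurrencyCode": "KES", "Amount": 250},
    {"CurrencyCode": "XXX", "Amount": 40},   # no match in the lookup
]
currency_names = {"USD": "US Dollar", "KES": "Kenyan Shilling", "EUR": "Euro"}


def inner_join(rows, lookup):
    """Keep only rows whose key exists in the lookup."""
    return [
        dict(row, CurrencyName=lookup[row["CurrencyCode"]])
        for row in rows
        if row["CurrencyCode"] in lookup
    ]


def left_outer_join(rows, lookup):
    """Keep every left row; unmatched rows get None for the new column."""
    return [dict(row, CurrencyName=lookup.get(row["CurrencyCode"])) for row in rows]


def left_anti_join(rows, lookup):
    """Keep only left rows that have no match in the lookup."""
    return [row for row in rows if row["CurrencyCode"] not in lookup]


print(len(inner_join(left_rows, currency_names)))       # 2
print(len(left_outer_join(left_rows, currency_names)))  # 3
print(left_anti_join(left_rows, currency_names))        # [{'CurrencyCode': 'XXX', 'Amount': 40}]
```

&lt;p&gt;A left anti join like this is a handy way to find fact rows whose keys are missing from a dimension before you build the relationship.&lt;/p&gt;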



&lt;h2&gt;
  
  
  Relationships (Model Layer)
&lt;/h2&gt;

&lt;p&gt;Relationships stay as separate tables in the model and are resolved dynamically at query time by the DAX engine. Filters flow across them without physically merging data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Joins vs. Relationships
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| | &lt;span class="gs"&gt;**Joins (Power Query)**&lt;/span&gt; | &lt;span class="gs"&gt;**Relationships (Model)**&lt;/span&gt; |
|---------------------------|---------------------------|---------------------------|
| &lt;span class="gs"&gt;**Where it happens**&lt;/span&gt;      | Data transformation layer | Data model layer |
| &lt;span class="gs"&gt;**Result**&lt;/span&gt;                | Merged/flattened table    | Separate tables, linked |
| &lt;span class="gs"&gt;**Performance**&lt;/span&gt;           | Can increase data size    | Optimised by VertiPaq |
| &lt;span class="gs"&gt;**Flexibility**&lt;/span&gt;           | Fixed at load time        | Dynamic at query time |
| &lt;span class="gs"&gt;**DAX compatibility**&lt;/span&gt;     | Limited (flat table)      | Full DAX power |
| &lt;span class="gs"&gt;**Maintenance**&lt;/span&gt;           | Harder to update          | Easy to modify |
| &lt;span class="gs"&gt;**Best for**&lt;/span&gt;              | One-time lookups, data cleanup | Star schema models |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When to Use Joins vs. When to Use Relationships
&lt;/h2&gt;

&lt;p&gt;This is arguably the most practical question in Power BI data modeling, and the answer matters more than most tutorials admit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use a JOIN (Power Query) When&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to &lt;em&gt;&lt;strong&gt;enrich a table with a few lookup columns&lt;/strong&gt;&lt;/em&gt; (e.g., add Country Name to a table that only has Country Code)&lt;/li&gt;
&lt;li&gt;You are &lt;em&gt;&lt;strong&gt;cleaning or reshaping raw data&lt;/strong&gt;&lt;/em&gt; before modeling&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;&lt;strong&gt;two tables will never be used separately&lt;/strong&gt;&lt;/em&gt; in your model&lt;/li&gt;
&lt;li&gt;You &lt;em&gt;&lt;strong&gt;want to reduce the number of tables&lt;/strong&gt;&lt;/em&gt; in your model for simplicity&lt;/li&gt;
&lt;li&gt;You're &lt;em&gt;&lt;strong&gt;dealing with a very small lookup table&lt;/strong&gt;&lt;/em&gt; that &lt;em&gt;&lt;strong&gt;doesn't need to be its own dimension&lt;/strong&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: Merging a small "CurrencyCode → CurrencyName" lookup 
into your FactSales table in Power Query is fine — 
you don't need a separate DimCurrency table for 3 currency codes.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use a RELATIONSHIP (Model) When&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your tables have a &lt;em&gt;&lt;strong&gt;clear one-to-many structure&lt;/strong&gt;&lt;/em&gt; (Fact ↔ Dimension)&lt;/li&gt;
&lt;li&gt;You need &lt;em&gt;&lt;strong&gt;dynamic filtering&lt;/strong&gt;&lt;/em&gt; — visuals should filter each other&lt;/li&gt;
&lt;li&gt;You're &lt;em&gt;&lt;strong&gt;using time intelligence functions&lt;/strong&gt;&lt;/em&gt; (they require a proper Date relationship)&lt;/li&gt;
&lt;li&gt;You &lt;em&gt;&lt;strong&gt;plan to reuse a dimension across multiple fact tables&lt;/strong&gt;&lt;/em&gt; (e.g., &lt;code&gt;DimDate&lt;/code&gt; used by &lt;code&gt;FactSales&lt;/code&gt; AND &lt;code&gt;FactReturns&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;You want to &lt;em&gt;&lt;strong&gt;write clean, efficient DAX measures&lt;/strong&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;&lt;strong&gt;dimension table has many attributes&lt;/strong&gt;&lt;/em&gt; that would bloat your fact table if joined&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Decision Cheat Sheet: Join or Relationship?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Scenario                                         | Recommended Approach |
|--------------------------------------------------|----------------------------------------|
| Add country name from a 3-row lookup             | &lt;span class="gs"&gt;**Join**&lt;/span&gt; in Power Query                          |
| Connect Sales to a 50-column Product table       | &lt;span class="gs"&gt;**Relationship**&lt;/span&gt; in the model                      |
| Combine data from two systems for one flat table | &lt;span class="gs"&gt;**Join**&lt;/span&gt; in Power Query                          |
| Use one Date table for Order Date AND Ship Date  | &lt;span class="gs"&gt;**Two Relationships**&lt;/span&gt; (1 active, 1 inactive)         |
| One customer linked to many orders               | &lt;span class="gs"&gt;**Relationship**&lt;/span&gt; (1:M)                          |
| Many students enrolled in many courses           | &lt;span class="gs"&gt;**Relationship with bridge table**&lt;/span&gt; (M:M → two 1:M) |
| Snapshot table that's used once                  | &lt;span class="gs"&gt;**Join**&lt;/span&gt; in Power Query                          |
| Shared dimension across multiple fact tables     | &lt;span class="gs"&gt;**Relationship**&lt;/span&gt; in the model                      |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
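&lt;p&gt;The bridge-table row in the cheat sheet is easier to picture with data. Below is a minimal Python sketch, assuming made-up students, courses, and an &lt;code&gt;enrollments&lt;/code&gt; bridge; each side relates one-to-many to the bridge, so the many-to-many link never has to be stored directly.&lt;/p&gt;

```python
# Hypothetical dimension tables, each keyed by a unique ID.
students = {1: "Amina", 2: "Brian"}
courses = {101: "SQL", 102: "Power BI"}

# Bridge table: one row per enrollment, as (StudentID, CourseID).
# Students relate 1:M to enrollments, and courses relate 1:M to enrollments.
enrollments = [(1, 101), (1, 102), (2, 102)]


def courses_for(student_id):
    """Course names reached from a student through the bridge."""
    return [courses[c] for s, c in enrollments if s == student_id]


def students_in(course_id):
    """Student names reached from a course through the bridge."""
    return [students[s] for s, c in enrollments if c == course_id]


print(courses_for(1))    # ['SQL', 'Power BI']
print(students_in(102))  # ['Amina', 'Brian']
```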



&lt;h2&gt;
  
  
  Keys
&lt;/h2&gt;

&lt;p&gt;Every relationship depends on &lt;strong&gt;keys&lt;/strong&gt;: &lt;em&gt;columns that identify rows and link one table to another&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Key Type             | Description                                                | Example                                          |
|----------------------|-------------------------------------------------------------|---------------------------------------|
| &lt;span class="gs"&gt;**Primary Key (PK)**&lt;/span&gt; | Uniquely identifies each row — no nulls, no duplicates | &lt;span class="sb"&gt;`ProductID`&lt;/span&gt; in DimProduct             |
| &lt;span class="gs"&gt;**Foreign Key (FK)**&lt;/span&gt; | References a PK in another table — may have duplicates | &lt;span class="sb"&gt;`ProductID`&lt;/span&gt; in FactSales              |
| &lt;span class="gs"&gt;**Surrogate Key**&lt;/span&gt;    | System-generated key (usually an integer)           | Auto-incremented ID                              |
| &lt;span class="gs"&gt;**Natural Key**&lt;/span&gt;      | A real-world identifier used as a key | Email address, National ID                               |
| &lt;span class="gs"&gt;**Composite Key**&lt;/span&gt;    | Two or more columns together form the unique identifier             | &lt;span class="sb"&gt;`OrderID + LineNumber`&lt;/span&gt;    |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Power BI tip: Prefer surrogate integer keys for relationships over text-based natural keys wherever you can.&lt;/p&gt;
&lt;/blockquote&gt;
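&lt;p&gt;Before relying on a key column, it can help to verify the two properties from the table: a primary key has no nulls and no duplicates, and every foreign key value should exist on the primary key side. The following is a small Python sketch with hypothetical helper names and sample values, not a Power BI feature.&lt;/p&gt;

```python
def is_valid_primary_key(values):
    """A primary key column has no nulls and no duplicates."""
    return None not in values and len(set(values)) == len(values)


def orphan_foreign_keys(fk_values, pk_values):
    """Return the distinct foreign key values with no matching primary key."""
    known = set(pk_values)
    return sorted({value for value in fk_values if value not in known})


dim_product_ids = [1, 2, 3]        # ProductID in DimProduct (hypothetical data)
fact_product_ids = [1, 1, 3, 4]    # ProductID in FactSales (hypothetical data)

print(is_valid_primary_key(dim_product_ids))                   # True
print(is_valid_primary_key([1, 2, 2]))                         # False
print(orphan_foreign_keys(fact_product_ids, dim_product_ids))  # [4]
```

&lt;p&gt;Orphaned foreign keys are a common source of totals that don't match when a report is sliced by a dimension.&lt;/p&gt;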

&lt;h2&gt;
  
  
  Master Cheat Sheet — The Complete Power BI Modeling Reference
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Task                             | Where                   | What to Do |
|----------------------------------|-------------------------|------------|
| Connect two tables               | Model View              | Drag FK column onto PK column           |
| Check relationship type          | Model View → Click line | See cardinality &amp;amp; direction         |
| Fix ambiguous relationships      | Model View              | Deactivate one, use USERELATIONSHIP in DAX |
| Use inactive relationship in DAX | DAX Editor              | &lt;span class="sb"&gt;`CALCULATE([Measure], USERELATIONSHIP(FK, PK))`&lt;/span&gt; |
| Avoid many-to-many               | Power Query + Model     | Add bridge/junction table           |
| Build a star schema              | Model View              | 1 fact table, many dimension tables    |
| Improve performance              | Model View + Power Query| Use integer keys, remove unused columns     |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cardinality Quick Reference&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Type                   | When                      | Watch Out For |
|------------------------|---------------------------|---------------|
| &lt;span class="gs"&gt;**One-to-Many (1:M)**&lt;/span&gt;  | Standard Fact ↔ Dimension | Nothing — this is ideal                  |
| &lt;span class="gs"&gt;**One-to-One (1:1)**&lt;/span&gt;   | Splitting large tables    | May indicate tables should be merged       |
| &lt;span class="gs"&gt;**Many-to-Many (M:M)**&lt;/span&gt; | Shared attributes         | Use a bridge table instead where possible |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cross-Filter Quick Reference&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Setting    | Filter Flows    | Use When                                            |
|------------|-----------------|------------------------------------------|
| &lt;span class="gs"&gt;**Single**&lt;/span&gt; | Dim → Fact only | Standard star schema (default)              |
| &lt;span class="gs"&gt;**Both**&lt;/span&gt;   | Dim ↔ Fact      | Bridge tables, role-playing dims (use sparingly) |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;DAX Relationship Functions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Function          | Syntax                      | Purpose                       |
|-------------------|-----------------------------|-----------------------|
| &lt;span class="sb"&gt;`RELATED`&lt;/span&gt;         | &lt;span class="sb"&gt;`RELATED(DimTable[Column])`&lt;/span&gt; | Pull a value from the 1-side into the many-side  |
| &lt;span class="sb"&gt;`RELATEDTABLE`&lt;/span&gt;    | &lt;span class="sb"&gt;`RELATEDTABLE(FactTable)`&lt;/span&gt;   | Return related rows from the many-side         |
| &lt;span class="sb"&gt;`USERELATIONSHIP`&lt;/span&gt; | &lt;span class="sb"&gt;`USERELATIONSHIP(FK, PK)`&lt;/span&gt;   | Activate an inactive relationship in a measure  |
| &lt;span class="sb"&gt;`CROSSFILTER`&lt;/span&gt;     | &lt;span class="sb"&gt;`CROSSFILTER(FK, PK, Both)`&lt;/span&gt; | Override filter direction inside a measure |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Data modeling in Power BI isn't just a technical checkbox; it is &lt;em&gt;the architecture that determines whether your reports are fast, accurate, and maintainable, or slow, wrong, and painful to debug&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The golden rules to walk away with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Always &lt;em&gt;&lt;strong&gt;aim for a star schema&lt;/strong&gt;&lt;/em&gt;. One fact table, surrounded by clean dimension tables.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Relationships beat joins&lt;/strong&gt;&lt;/em&gt; for anything that needs to be dynamic, reusable, or DAX-friendly.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Use integer surrogate keys&lt;/strong&gt;&lt;/em&gt;. Text-based keys are slower and harder to manage.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Default to Single cross-filter direction&lt;/strong&gt;&lt;/em&gt;. Go bidirectional only when you have to.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Inactive relationships are not dead relationships&lt;/strong&gt;&lt;/em&gt; — they're tools. Use &lt;code&gt;USERELATIONSHIP()&lt;/code&gt; to unlock them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Many-to-many isn't always wrong&lt;/em&gt;&lt;/strong&gt; — but a bridge table is almost always cleaner.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The moment your model is clean, your DAX becomes simpler, your reports run faster, and those mysterious wrong numbers finally disappear. That's the power of modeling done right.&lt;/p&gt;

</description>
      <category>powerbi</category>
      <category>data</category>
      <category>beginners</category>
    </item>
    <item>
      <title>LINUX FUNDAMENTALS FOR DATA ENGINEERING.</title>
      <dc:creator>Angellicah</dc:creator>
      <pubDate>Sat, 06 Jun 2026 21:27:55 +0000</pubDate>
      <link>https://dev.to/angellicah_2ed8aa8f01f176/linux-fundamentals-for-data-engineering-28dm</link>
      <guid>https://dev.to/angellicah_2ed8aa8f01f176/linux-fundamentals-for-data-engineering-28dm</guid>
      <description>&lt;h2&gt;
  
  
  INTRODUCTION
&lt;/h2&gt;

&lt;p&gt;Data engineering is the backbone of modern data-driven organizations. Data engineers &lt;em&gt;design&lt;/em&gt;, &lt;em&gt;build&lt;/em&gt;, and &lt;em&gt;maintain&lt;/em&gt; &lt;em&gt;systems&lt;/em&gt; that collect, process, and store vast amounts of data. While programming languages such as &lt;strong&gt;Python&lt;/strong&gt; and &lt;strong&gt;SQL&lt;/strong&gt; often receive significant attention in data engineering discussions, Linux remains one of the most essential tools in a data engineer's toolkit.&lt;/p&gt;

&lt;p&gt;Most data platforms, cloud servers, databases, big data frameworks, and ETL pipelines run on Linux-based systems. Therefore, understanding Linux fundamentals is a necessity for any aspiring data engineer.&lt;/p&gt;

&lt;p&gt;This article explores the key Linux concepts every data engineer should master, including &lt;em&gt;file system navigation&lt;/em&gt;, &lt;em&gt;file management&lt;/em&gt;, &lt;em&gt;permissions&lt;/em&gt;, &lt;em&gt;process management&lt;/em&gt;, &lt;em&gt;networking&lt;/em&gt;, &lt;em&gt;shell scripting&lt;/em&gt;, and &lt;em&gt;automation&lt;/em&gt;. Practical examples are provided throughout to demonstrate how Linux is used in real-world data engineering tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  WHY LINUX MATTERS IN DATA ENGINEERING
&lt;/h3&gt;

&lt;p&gt;Linux dominates the server and cloud computing ecosystem. Technologies that are frequently used in data engineering and typically deployed on Linux servers include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Hadoop&lt;/li&gt;
&lt;li&gt;Apache Spark&lt;/li&gt;
&lt;li&gt;Apache Kafka&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;MySQL&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Kubernetes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a data engineer, you may need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access remote servers&lt;/li&gt;
&lt;li&gt;Monitor data pipelines&lt;/li&gt;
&lt;li&gt;Schedule automated jobs&lt;/li&gt;
&lt;li&gt;Manage data files&lt;/li&gt;
&lt;li&gt;Troubleshoot system issues&lt;/li&gt;
&lt;li&gt;Deploy applications&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  UNDERSTANDING THE LINUX FILE SYSTEM
&lt;/h3&gt;

&lt;p&gt;Unlike Windows, which gives each drive its own letter, Linux organizes everything under a single hierarchical directory tree that begins at the root directory (/).&lt;/p&gt;

&lt;p&gt;Common directories include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Directory&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Root directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/home&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;User files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/etc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Configuration files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/var&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Log files and variable data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/tmp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Temporary files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/usr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;User programs and utilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/bin&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Essential command binaries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To &lt;em&gt;view the current directory&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pwd&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Example Output:&lt;/p&gt;

&lt;p&gt;/home/student&lt;/p&gt;

&lt;p&gt;To &lt;em&gt;list files&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For &lt;em&gt;detailed information&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls -l&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To &lt;em&gt;view hidden files&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls -la&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;These commands are frequently used when locating datasets, scripts, logs, and configuration files.&lt;/p&gt;
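&lt;p&gt;As a rough sketch of how these listing options combine, the snippet below builds a throwaway directory and lists it with several flags at once. The file names &lt;code&gt;sales.csv&lt;/code&gt; and &lt;code&gt;.env&lt;/code&gt; are placeholders chosen for illustration.&lt;/p&gt;

```shell
# Build a scratch directory so the example does not touch real data.
workdir=$(mktemp -d)
touch "$workdir/sales.csv" "$workdir/.env"

# -l long format, -a include hidden entries, -t newest first, -h readable sizes.
ls -lath "$workdir"

# -A also shows hidden files but leaves out the "." and ".." entries.
entry_count=$(ls -A "$workdir" | wc -l)
echo "entries: $entry_count"
```

&lt;p&gt;Sorting by modification time is often the quickest way to spot the file a pipeline wrote most recently.&lt;/p&gt;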

&lt;h3&gt;
  
  
  NAVIGATING DIRECTORIES
&lt;/h3&gt;

&lt;p&gt;Directory navigation is one of the first Linux skills every data engineer should learn.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Move into a directory&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd data&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Move back one level&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd ..&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Return to home directory&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd ~&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Move to root directory&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd /&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Suppose a dataset is stored in:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/home/student/datasets/sales&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You can access it using:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd ~/datasets/sales&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Efficient navigation saves time when managing large data projects.&lt;/p&gt;
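&lt;p&gt;The sketch below shows absolute paths, relative paths, and the &lt;code&gt;cd -&lt;/code&gt; shortcut working together. It assumes a scratch directory tree that stands in for the dataset location above; the &lt;code&gt;scripts&lt;/code&gt; folder is a hypothetical addition.&lt;/p&gt;

```shell
# Scratch tree standing in for a datasets/sales layout (hypothetical).
base=$(mktemp -d)
mkdir -p "$base/datasets/sales" "$base/scripts"

cd "$base/datasets/sales"   # absolute path: works from anywhere
cd ../../scripts            # relative path: up two levels, then into scripts
cd - > /dev/null            # jump back to the previous directory

current=$(basename "$PWD")
echo "$current"             # prints: sales
```

&lt;p&gt;&lt;code&gt;cd -&lt;/code&gt; is handy when switching back and forth between a data directory and a scripts directory.&lt;/p&gt;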

&lt;h3&gt;
  
  
  CREATING AND MANAGING FILES
&lt;/h3&gt;

&lt;p&gt;Data engineers often create scripts, configuration files, and data storage directories.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Create a new directory&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mkdir project_data&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Create nested directories&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mkdir -p project_data/raw/2025&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Create an empty file&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;touch sales.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Copy a file&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cp sales.csv backup_sales.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Move&lt;/em&gt; or &lt;em&gt;rename&lt;/em&gt; a file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mv sales.csv monthly_sales.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Delete a file&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;rm backup_sales.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Delete&lt;/em&gt; a &lt;em&gt;directory&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;rm -r project_data&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Creating a project structure for a data pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; data_pipeline/&lt;span class="o"&gt;{&lt;/span&gt;raw,processed,scripts,logs&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output structure:&lt;/p&gt;

&lt;p&gt;data_pipeline/&lt;br&gt;
├── raw&lt;br&gt;
├── processed&lt;br&gt;
├── scripts&lt;br&gt;
└── logs&lt;/p&gt;

&lt;p&gt;This organization improves maintainability and scalability.&lt;/p&gt;
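&lt;p&gt;A minimal sketch of how the file commands above might be chained when a new file arrives. The dated backup name and the use of a temporary location are assumptions made for illustration, not part of the original project.&lt;/p&gt;

```shell
# Work inside a scratch location; directories are listed one by one
# so the snippet also runs in shells without brace expansion.
project=$(mktemp -d)/data_pipeline
mkdir -p "$project/raw" "$project/processed" "$project/scripts" "$project/logs"

# Stand-in for a freshly delivered file.
touch "$project/raw/sales.csv"

# Keep a dated copy before processing; -p preserves timestamps and mode.
stamp=$(date +%Y-%m-%d)
cp -p "$project/raw/sales.csv" "$project/raw/sales_$stamp.csv"

# -v reports what was moved and renamed.
mv -v "$project/raw/sales.csv" "$project/processed/monthly_sales.csv"

ls -R "$project"
```

&lt;p&gt;Copying before moving means the untouched original is still available if a later step goes wrong.&lt;/p&gt;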
&lt;h3&gt;
  
  
  VIEWING AND MANIPULATING FILE CONTENTS
&lt;/h3&gt;

&lt;p&gt;Data engineers regularly inspect datasets and log files.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Display file contents&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cat data.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;View large files&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;less data.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Display first 10 lines&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;head data.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Display last 10 lines&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tail data.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Monitor logs continuously&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tail -f pipeline.log&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Monitoring an ETL process:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tail -f etl_job.log&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This command helps identify errors in real time.&lt;/p&gt;
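&lt;p&gt;When inspecting a dataset, &lt;code&gt;head&lt;/code&gt; and &lt;code&gt;tail&lt;/code&gt; also accept an explicit line count. The sketch below uses a small made-up CSV (the column names and rows are sample data) to preview the header and count the data rows.&lt;/p&gt;

```shell
# Sample data written to a temporary file.
data=$(mktemp)
printf 'order_id,city,amount\n1,Nairobi,250\n2,Mombasa,300\n3,Kisumu,125\n' > "$data"

head -n 1 "$data"                 # header row only
tail -n +2 "$data" | head -n 2    # first two data rows, header skipped

# Number of data rows, not counting the header.
rows=$(tail -n +2 "$data" | wc -l)
echo "data rows: $rows"
```

&lt;p&gt;&lt;code&gt;tail -n +2&lt;/code&gt; starts output at line 2, which is a common way to skip a CSV header.&lt;/p&gt;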
&lt;h3&gt;
  
  
  SEARCHING FOR FILES AND DATA
&lt;/h3&gt;

&lt;p&gt;Data environments often contain thousands of files.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Find a file&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;find . -name "sales.csv"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Search for text inside files&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;grep "ERROR" pipeline.log&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Count occurrences&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;grep -c "ERROR" pipeline.log&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Finding failed records in a log&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;grep "FAILED" ingestion.log&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;FAILED: Record 1024&lt;br&gt;
FAILED: Record 2048&lt;br&gt;
FAILED: Record 3050&lt;/p&gt;

&lt;p&gt;This allows quick troubleshooting.&lt;/p&gt;
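&lt;p&gt;The same search can be widened to a whole directory. The following sketch assumes two small log files, whose names and contents are invented for the example, and counts the failed records across both.&lt;/p&gt;

```shell
# Two sample logs in a scratch directory.
logs=$(mktemp -d)
printf 'OK: Record 1\nFAILED: Record 1024\n' > "$logs/ingestion.log"
printf 'FAILED: Record 2048\nFAILED: Record 3050\n' > "$logs/retry.log"

# Find every .log file under the directory, whatever its depth.
find "$logs" -type f -name "*.log"

# -r searches recursively, -n prefixes each match with its line number.
grep -rn "FAILED" "$logs"

# -h drops the file names so only matching lines are counted.
failed_total=$(grep -rh "FAILED" "$logs" | wc -l)
echo "failed records: $failed_total"
```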
&lt;h3&gt;
  
  
  LINUX PERMISSIONS AND OWNERSHIP
&lt;/h3&gt;

&lt;p&gt;Linux uses permissions to control file access.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;View permissions&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls -l&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Example Output:&lt;/p&gt;

&lt;p&gt;-rw-r--r-- 1 student student 2450 sales.csv&lt;/p&gt;

&lt;p&gt;Permission categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Owner&lt;/li&gt;
&lt;li&gt;Group&lt;/li&gt;
&lt;li&gt;Others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Permission symbols:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symbol&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;r&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;w&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Change permissions&lt;/em&gt; (755 gives the owner read, write, and execute, and gives the group and others read and execute):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chmod 755 script.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Make script executable&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chmod +x script.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Change ownership&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chown user:user file.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Allowing an ETL script to execute&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chmod +x etl.sh&lt;/code&gt;&lt;/p&gt;

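&lt;p&gt;Without execute permission, the script cannot be run directly as &lt;code&gt;./etl.sh&lt;/code&gt;, although it can still be passed to an interpreter, for example &lt;code&gt;bash etl.sh&lt;/code&gt;.&lt;/p&gt;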
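&lt;p&gt;To make the octal notation concrete, here is a small sketch that applies the same permissions numerically and symbolically. The script is a temporary stand-in, and the example assumes the temporary directory allows execution.&lt;/p&gt;

```shell
# A one-line script in a temporary file, standing in for an ETL script.
script=$(mktemp)
printf '#!/bin/sh\necho "ETL step finished"\n' > "$script"

# Each octal digit is a sum of r=4, w=2, x=1 for owner, group, others.
# 7 = rwx for the owner, 5 = r-x for the group and for others.
chmod 755 "$script"
ls -l "$script"

# The same permissions written symbolically.
chmod u=rwx,go=rx "$script"

# With the execute bit set, the file can be run by its path.
result=$("$script")
echo "$result"
```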
&lt;h3&gt;
  
  
  PROCESS MANAGEMENT
&lt;/h3&gt;

&lt;p&gt;Data pipelines frequently run as Linux processes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;View running processes&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ps aux&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Monitor system activity&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;top&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Find process ID&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgrep python&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Terminate process&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kill PID&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Force termination&lt;/em&gt; (a last resort, because the process gets no chance to clean up):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kill -9 PID&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Suppose a Spark job becomes unresponsive.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Find it&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ps aux | grep spark&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Stop it&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kill PID&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Stopping the hung job frees the CPU and memory it was holding.&lt;/p&gt;
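&lt;p&gt;One possible way to combine &lt;code&gt;kill&lt;/code&gt; and &lt;code&gt;kill -9&lt;/code&gt; is to ask politely first and force only if needed. The helper below is a hypothetical sketch: the name &lt;code&gt;stop_gracefully&lt;/code&gt; and the five-second default are assumptions, not a standard tool.&lt;/p&gt;

```shell
# Ask a process to exit, and force it only if it ignores the request.
# Usage: stop_gracefully PID [seconds_to_wait]
stop_gracefully() {
  pid=$1
  wait_seconds=${2:-5}

  # kill -0 sends no signal; it only checks that the process exists.
  kill -0 "$pid" 2>/dev/null || return 0

  kill "$pid"                     # SIGTERM: lets the process clean up
  while [ "$wait_seconds" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    wait_seconds=$((wait_seconds - 1))
  done

  kill -9 "$pid"                  # SIGKILL: last resort, no cleanup possible
}
```

&lt;p&gt;It would be called with the PID reported by &lt;code&gt;ps aux&lt;/code&gt; or &lt;code&gt;pgrep&lt;/code&gt;, for example &lt;code&gt;stop_gracefully PID 10&lt;/code&gt;.&lt;/p&gt;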
&lt;h3&gt;
  
  
  DISK USAGE MONITORING
&lt;/h3&gt;

&lt;p&gt;Large datasets consume significant storage.&lt;/p&gt;

&lt;p&gt;Check disk space:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;df -h&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Check directory size:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;du -sh datasets/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Determining storage&lt;/em&gt; used by data files:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;du -sh raw_data/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;15G raw_data/&lt;/p&gt;

&lt;p&gt;This helps monitor storage requirements.&lt;/p&gt;
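&lt;p&gt;To find which directory is using the most space, the output of &lt;code&gt;du&lt;/code&gt; can be sorted. This sketch assumes a scratch layout with two folders filled with random filler data; the sizes are arbitrary.&lt;/p&gt;

```shell
# Scratch layout with one large and one small folder.
store=$(mktemp -d)
mkdir -p "$store/raw" "$store/processed"
head -c 204800 /dev/urandom > "$store/raw/big.bin"         # about 200 KB
head -c 20480 /dev/urandom > "$store/processed/small.bin"  # about 20 KB

# -s one total per argument, -k sizes in kilobytes; largest first.
du -sk "$store"/* | sort -rn

# du separates size and path with a tab, so cut -f2 keeps the path.
largest=$(du -sk "$store"/* | sort -rn | head -n 1 | cut -f2)
echo "largest: $largest"
```

&lt;p&gt;Using &lt;code&gt;-k&lt;/code&gt; with a numeric sort keeps the pipeline portable, since sorting the human-readable sizes from &lt;code&gt;-h&lt;/code&gt; needs a &lt;code&gt;sort&lt;/code&gt; that understands unit suffixes.&lt;/p&gt;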
&lt;h3&gt;
  
  
  NETWORKING FUNDAMENTALS
&lt;/h3&gt;

&lt;p&gt;Data engineers often work with remote servers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check IP address&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ip addr&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Test connectivity&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ping google.com&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Connect to remote server&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ssh user@server-ip&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Transfer files&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;scp data.csv user@server:/home/user/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Uploading a processed dataset:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;scp processed.csv admin@192.168.1.10:/data/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This enables data sharing between systems.&lt;/p&gt;
&lt;h3&gt;
  
  
  PACKAGE MANAGEMENT
&lt;/h3&gt;

&lt;p&gt;Linux distributions use package managers to install, update, and remove software.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Ubuntu/Debian&lt;/strong&gt;&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;python3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;&lt;strong&gt;Red Hat/CentOS&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo yum install python3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Installing PostgreSQL client:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo apt install postgresql-client&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This allows database interaction directly from the terminal.&lt;/p&gt;
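&lt;p&gt;Before installing anything, it can help to check what is already available. The sketch below defines a hypothetical helper named &lt;code&gt;have&lt;/code&gt; and checks for &lt;code&gt;python3&lt;/code&gt; and &lt;code&gt;psql&lt;/code&gt;, the command provided by the PostgreSQL client package.&lt;/p&gt;

```shell
# Succeeds when the named command is found on the PATH.
have() {
  command -v "$1" > /dev/null
}

for tool in python3 psql; do
  if have "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: missing, install it with the package manager"
  fi
done
```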

&lt;h3&gt;
  
  
  SHELL SCRIPTING FOR AUTOMATION
&lt;/h3&gt;

&lt;p&gt;Automation is a core responsibility of data engineers.&lt;/p&gt;

&lt;p&gt;Example shell script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/bash
# Stop at the first failing step so later steps never run on bad data.
set -e

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Starting Data Pipeline"&lt;/span&gt;

python extract.py
python transform.py
python load.py

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Pipeline Completed"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Save as&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pipeline.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Make executable&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chmod +x pipeline.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Run&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./pipeline.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces manual work&lt;/li&gt;
&lt;li&gt;Improves consistency&lt;/li&gt;
&lt;li&gt;Enables scheduling&lt;/li&gt;
&lt;/ul&gt;
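&lt;p&gt;Scripts that run unattended usually need to record what happened. The following is a sketch of one way to log each step; the function name &lt;code&gt;run_step&lt;/code&gt; and the log format are assumptions, and &lt;code&gt;true&lt;/code&gt; and &lt;code&gt;false&lt;/code&gt; stand in for the real Python steps.&lt;/p&gt;

```shell
log_file=$(mktemp)

# Run one pipeline step, append a timestamped line to the log,
# and pass the step's exit status back to the caller.
run_step() {
  step_name=$1
  shift
  if "$@"; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') OK $step_name" >> "$log_file"
  else
    status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED $step_name (exit $status)" >> "$log_file"
    return "$status"
  fi
}

# "true" always succeeds and "false" always fails.
run_step extract true
run_step transform false || echo "stopping: transform failed"

cat "$log_file"
```

&lt;p&gt;Because &lt;code&gt;run_step&lt;/code&gt; returns the step's own exit status, the caller can decide whether to stop or continue.&lt;/p&gt;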

&lt;h3&gt;
  
  
  SCHEDULING JOBS WITH CRON
&lt;/h3&gt;

&lt;p&gt;Data pipelines often run automatically.&lt;/p&gt;

&lt;p&gt;Open cron editor:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;crontab -e&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Run script every day at midnight:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;0 0 * * * /home/student/pipeline.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cron Format&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Minute, Hour, Day of month, Month, Day of week, followed by the command to run&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Execute data ingestion daily:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;30 2 * * * /home/student/scripts/ingest.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This runs at 2:30 AM every day.&lt;/p&gt;
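&lt;p&gt;To check how a cron entry will be read, the fields can be split and labelled. The function &lt;code&gt;describe_cron&lt;/code&gt; below is a hypothetical helper written for this article, and it assumes a well-formed entry with five schedule fields followed by a command.&lt;/p&gt;

```shell
# Print the five schedule fields and the command of a cron entry.
# The body runs in a subshell, so "set -f" does not leak to the caller.
describe_cron() (
  set -f        # keep "*" literal instead of expanding it to file names
  set -- $1     # split the entry on whitespace into positional parameters
  printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
  shift 5
  printf 'command=%s\n' "$*"
)

describe_cron '30 2 * * * /home/student/scripts/ingest.sh'
```

&lt;p&gt;For the ingestion entry this reports minute 30 and hour 2, with the remaining three fields left as wildcards.&lt;/p&gt;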

&lt;h3&gt;
  
  
  WORKING WITH COMPRESSED FILES
&lt;/h3&gt;

&lt;p&gt;Large datasets are commonly compressed.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Compress&lt;/em&gt; file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gzip data.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Decompress&lt;/em&gt; file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gunzip data.csv.gz&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Create archive&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tar -cvf archive.tar data/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Extract archive&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tar -xvf archive.tar&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Receiving compressed logs:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gunzip logs.gz&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then analyze them using Linux tools.&lt;/p&gt;
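&lt;p&gt;Compressed files can often be inspected without unpacking them on disk. This sketch assumes a tiny sample CSV in a scratch directory and shows compression that keeps the original, streaming decompression, and a gzip-compressed archive.&lt;/p&gt;

```shell
work=$(mktemp -d)
mkdir -p "$work/data"
printf 'order_id,city\n1,Nairobi\n' > "$work/data/sales.csv"

# -c writes to standard output, so the original file is kept.
gzip -c "$work/data/sales.csv" > "$work/data/sales.csv.gz"

# -dc decompresses to standard output without creating a file.
gzip -dc "$work/data/sales.csv.gz" | head -n 1

# -z compresses the archive with gzip; -C sets the directory to archive from.
tar -czf "$work/archive.tar.gz" -C "$work" data

# -t lists the archive contents without extracting them.
tar -tzf "$work/archive.tar.gz"
```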

&lt;h3&gt;
  
  
  USEFUL COMMANDS FOR DATA ENGINEERS
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Count lines in a file&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;wc -l sales.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sort data&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sort sales.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Remove duplicates&lt;/em&gt; (&lt;code&gt;uniq&lt;/code&gt; only collapses adjacent duplicate lines, so sort first):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sort sales.csv | uniq&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Display specific columns&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cut -d',' -f1,3 sales.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Combine commands&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cat sales.csv | grep Nairobi | wc -l&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This counts records containing "Nairobi".&lt;/p&gt;
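&lt;p&gt;These tools become more useful when chained. The sketch below assumes a small sample file in which the city is the second column and one row is duplicated, and it produces a per-city order count.&lt;/p&gt;

```shell
# Sample data with a header and one duplicated row.
sales=$(mktemp)
printf 'order_id,city,amount\n1,Nairobi,250\n2,Mombasa,300\n3,Nairobi,125\n3,Nairobi,125\n' > "$sales"

# Skip the header, drop duplicate rows, keep the city column,
# then count the rows per city with the largest count first.
tail -n +2 "$sales" | sort -u | cut -d',' -f2 | sort | uniq -c | sort -rn

# grep -c counts matching lines directly, with no need for wc -l.
nairobi_orders=$(tail -n +2 "$sales" | sort -u | grep -c "Nairobi")
echo "Nairobi orders: $nairobi_orders"
```

&lt;p&gt;After the duplicate row is removed, two Nairobi orders remain.&lt;/p&gt;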

&lt;h3&gt;
  
  
  PRACTICAL ASSIGNMENT EXAMPLE
&lt;/h3&gt;

&lt;p&gt;During this Linux fundamentals assignment, several commands were used to create and manage a data engineering workspace.&lt;/p&gt;

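&lt;p&gt;Creating project directories:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mkdir -p data_engineering/{raw,processed,scripts,logs}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Moving into the workspace so that the relative paths in the following steps resolve:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd data_engineering&lt;/code&gt;&lt;/p&gt;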

&lt;p&gt;Creating a sample dataset:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;touch raw/sales_data.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Viewing data:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;head raw/sales_data.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Creating a pipeline script:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;nano scripts/process_data.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Making it executable:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chmod +x scripts/process_data.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Running the pipeline:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./scripts/process_data.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Monitoring logs:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tail -f logs/pipeline.log&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;These activities simulate real-world data engineering operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  BEST PRACTICES FOR DATA ENGINEERS USING LINUX
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Organize files using structured directories.&lt;/li&gt;
&lt;li&gt;Use meaningful file names.&lt;/li&gt;
&lt;li&gt;Automate repetitive tasks with scripts.&lt;/li&gt;
&lt;li&gt;Monitor system resources regularly.&lt;/li&gt;
&lt;li&gt;Secure files using proper permissions.&lt;/li&gt;
&lt;li&gt;Maintain backups of critical data.&lt;/li&gt;
&lt;li&gt;Use version control systems such as Git.&lt;/li&gt;
&lt;li&gt;Document scripts and workflows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Following these practices improves reliability and maintainability.&lt;/p&gt;

&lt;h3&gt;
  
  
  CONCLUSION
&lt;/h3&gt;

&lt;p&gt;Linux is a foundational skill for data engineering. Whether managing datasets, monitoring ETL pipelines, deploying applications, or automating workflows, Linux provides the essential tools required to operate efficiently in modern data environments.&lt;/p&gt;

&lt;p&gt;Mastering Linux fundamentals such as file management, permissions, process control, networking, automation, and shell scripting significantly enhances a data engineer's productivity and effectiveness. As organizations continue to rely on cloud platforms and distributed data systems, Linux expertise will remain one of the most valuable technical skills in the data engineering profession.&lt;/p&gt;

&lt;p&gt;For aspiring data engineers, investing time in learning Linux builds the operational foundation necessary for handling real-world data challenges at scale.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>dataengineering</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
