<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Wangeci Ndovu</title>
    <description>The latest articles on DEV Community by Wangeci Ndovu (@wangeci_ndovu).</description>
    <link>https://dev.to/wangeci_ndovu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3708631%2F19d469d7-6962-4556-be89-39aeb3cec318.jpg</url>
      <title>DEV Community: Wangeci Ndovu</title>
      <link>https://dev.to/wangeci_ndovu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wangeci_ndovu"/>
    <language>en</language>
    <item>
      <title>SQL Joins and Window Functions: The Difference Between Combining Data and Analyzing It</title>
      <dc:creator>Wangeci Ndovu</dc:creator>
      <pubDate>Wed, 04 Mar 2026 02:27:32 +0000</pubDate>
      <link>https://dev.to/wangeci_ndovu/sql-joins-and-window-functions-the-difference-between-combining-data-and-analyzing-it-4d1b</link>
      <guid>https://dev.to/wangeci_ndovu/sql-joins-and-window-functions-the-difference-between-combining-data-and-analyzing-it-4d1b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzcpmx4n6ca8msr9nazj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzcpmx4n6ca8msr9nazj.jpg" alt=" " width="735" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Let us talk about joins and window functions in SQL
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Joins combine tables
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Window Functions analyze data without collapsing it
&lt;/h2&gt;

&lt;p&gt;Many beginners confuse the two. Let’s break them down properly, step by step, &lt;strong&gt;with real examples, clear explanations, and practical queries&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: SQL Joins (Combining Data Across Tables)
&lt;/h2&gt;

&lt;p&gt;Imagine you have two tables:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;customers&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customer_id | first_name | second_name
1           | Alice      | Johnson
2           | Bob        | Njagi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;orders&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;order_id | customer_id | order_amount
1        | 1           | 250
2        | 1           | 300
3        | 2           | 150
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to see who made which order, you need a &lt;strong&gt;JOIN&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a JOIN?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A JOIN allows you to combine rows from two or more tables based on a related column.&lt;/p&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;p&gt;A JOIN connects tables using a common key.&lt;/p&gt;

&lt;h2&gt;
  
  
  INNER JOIN
&lt;/h2&gt;

&lt;p&gt;Returns only matching rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT c.first_name, o.order_id, o.order_amount
FROM customers AS c
INNER JOIN orders AS o
  ON c.customer_id = o.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Alice | 1 | 250
Alice | 2 | 300
Bob   | 3 | 150
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;If there’s no match, the row is excluded&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  LEFT JOIN
&lt;/h2&gt;

&lt;p&gt;Returns all rows from the left table, even if there’s no match.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT c.first_name, o.order_id, o.order_amount
FROM customers AS c
LEFT JOIN orders AS o
  ON c.customer_id = o.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a customer has no orders, they still appear &lt;strong&gt;with NULL values&lt;/strong&gt; for order columns.&lt;/p&gt;
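&lt;p&gt;To see the NULL behaviour in practice, here is a minimal sketch that runs the LEFT JOIN in an in-memory SQLite database through Python's sqlite3 module. Carol is a hypothetical third customer added only for illustration, and the table layout is an assumption based on the sample tables above.&lt;/p&gt;

```python
import sqlite3

# In-memory database with the sample tables (second_name omitted for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_amount INTEGER);
    -- Carol is a hypothetical customer who has not placed any order.
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (1, 1, 250), (2, 1, 300), (3, 2, 150);
""")

# LEFT JOIN keeps every customer; missing order columns come back as NULL (None in Python).
left_rows = conn.execute("""
    SELECT c.first_name, o.order_id, o.order_amount
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# Filtering on the NULL side is a common way to find customers without orders.
no_orders = conn.execute("""
    SELECT c.first_name
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    WHERE o.order_id IS NULL
""").fetchall()

for row in left_rows:
    print(row)
print("No orders:", no_orders)
```

&lt;p&gt;With an INNER JOIN, Carol would not appear in the result at all.&lt;/p&gt;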

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhnr28pfoba9ly8wfs2z2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhnr28pfoba9ly8wfs2z2.jpg" alt=" " width="736" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  RIGHT JOIN
&lt;/h2&gt;

&lt;p&gt;The opposite of LEFT JOIN: it returns all rows from the right table, even if there’s no match in the left table.&lt;/p&gt;

&lt;h2&gt;
  
  
  FULL OUTER JOIN
&lt;/h2&gt;

&lt;p&gt;Returns all rows from both tables, matched where possible. Rows without a match get NULL values for the other table’s columns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Insight About Joins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Joins increase columns.&lt;/p&gt;

&lt;p&gt;They bring data from multiple tables into a single result set.&lt;/p&gt;

&lt;p&gt;They do NOT calculate rankings, running totals, or row-by-row analytics.&lt;/p&gt;

&lt;p&gt;That’s where &lt;strong&gt;Window Functions&lt;/strong&gt; come in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: Window Functions (Analyzing Without Collapsing Data)
&lt;/h2&gt;

&lt;p&gt;Window functions are different.&lt;/p&gt;

&lt;p&gt;They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Do NOT reduce rows (unlike GROUP BY)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Perform calculations across related rows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allow row-level analytics&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is extremely important.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Example question
&lt;/h3&gt;

&lt;p&gt;What if we want:&lt;/p&gt;

&lt;p&gt;Total spending per customer, but still show each individual order?&lt;/p&gt;

&lt;p&gt;If you use GROUP BY:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT customer_id,
       SUM(order_amount) AS total_spent
FROM orders
GROUP BY customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;You get&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 | 550
2 | 150
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But you lose individual orders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Enter Window Functions&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT order_id, customer_id, order_amount,
       SUM(order_amount) OVER (PARTITION BY customer_id) AS total_spent
FROM orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 | 1 | 250 | 550
2 | 1 | 300 | 550
3 | 2 | 150 | 150
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Now you have
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Each order&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AND&lt;/strong&gt; total per customer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Without collapsing rows&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a better way to do it.&lt;/p&gt;
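&lt;p&gt;The difference in row counts is easy to check. The sketch below runs both queries against the sample orders table in an in-memory SQLite database (window functions need SQLite 3.25 or newer, which is an assumption about your Python build).&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_amount INTEGER);
    INSERT INTO orders VALUES (1, 1, 250), (2, 1, 300), (3, 2, 150);
""")

# GROUP BY collapses the table to one row per customer.
grouped = conn.execute("""
    SELECT customer_id, SUM(order_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

# The window version keeps one row per order and repeats the customer total.
windowed = conn.execute("""
    SELECT order_id, customer_id, order_amount,
           SUM(order_amount) OVER (PARTITION BY customer_id) AS total_spent
    FROM orders
    ORDER BY order_id
""").fetchall()

print(len(grouped), "rows after GROUP BY")
print(len(windowed), "rows with the window function")
```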

&lt;h2&gt;
  
  
  Understanding OVER()
&lt;/h2&gt;

&lt;p&gt;The magic happens inside the &lt;strong&gt;OVER()&lt;/strong&gt; clause.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PARTITION BY&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Groups rows logically (like GROUP BY), but does not collapse them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ORDER BY&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Defines the order of rows within each partition, which is the order that rankings and running calculations follow.&lt;/p&gt;

&lt;p&gt;Example: &lt;strong&gt;Ranking orders by amount.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT order_id, customer_id, order_amount,
       RANK() OVER (PARTITION BY customer_id ORDER BY order_amount DESC) AS customer_rank
FROM orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This ranks each customer’s orders separately.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Common Window Functions You Should Know&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ROW_NUMBER()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gives unique row numbers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ROW_NUMBER() OVER (ORDER BY order_amount DESC)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;RANK()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gives same rank for ties, skips numbers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DENSE_RANK()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gives same rank for ties, does NOT skip numbers.&lt;/p&gt;
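&lt;p&gt;The three ranking functions only differ when there are ties, so here is a small sketch with a hypothetical orders table where two orders share the same amount. The data and column aliases are made up for illustration.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_amount INTEGER);
    -- Orders 1 and 2 tie on purpose.
    INSERT INTO orders VALUES (1, 300), (2, 300), (3, 250), (4, 150);
""")

ranked = conn.execute("""
    SELECT order_id, order_amount,
           -- order_id is added as a tie-breaker so the row numbers are deterministic
           ROW_NUMBER() OVER (ORDER BY order_amount DESC, order_id) AS row_num,
           RANK()       OVER (ORDER BY order_amount DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY order_amount DESC) AS dense_rnk
    FROM orders
    ORDER BY order_id
""").fetchall()

for row in ranked:
    print(row)
```

&lt;p&gt;The tied orders both get rank 1. RANK() then jumps to 3 for the next order, while DENSE_RANK() continues with 2.&lt;/p&gt;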

&lt;ul&gt;
&lt;li&gt;SUM() OVER()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Running totals&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT order_id, order_amount,
       SUM(order_amount) OVER (ORDER BY order_id) AS running_total
FROM orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;LAG() and LEAD()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compare rows to the ones before or the ones after.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT order_id, order_amount,
       LAG(order_amount) OVER (ORDER BY order_id) AS previous_amount
FROM orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Very useful for time-series analysis&lt;/strong&gt;&lt;/p&gt;
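&lt;p&gt;As a quick check of what these functions return, the sketch below computes a running total together with LAG() and LEAD() on the sample orders, using an in-memory SQLite database. Treat it as an illustration rather than a tuned query.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_amount INTEGER);
    INSERT INTO orders VALUES (1, 250), (2, 300), (3, 150);
""")

rows = conn.execute("""
    SELECT order_id, order_amount,
           SUM(order_amount)  OVER (ORDER BY order_id) AS running_total,
           LAG(order_amount)  OVER (ORDER BY order_id) AS previous_amount,
           LEAD(order_amount) OVER (ORDER BY order_id) AS next_amount
    FROM orders
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
```

&lt;p&gt;The first row has no previous order and the last row has no next order, so LAG() and LEAD() return NULL (None in Python) there.&lt;/p&gt;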

&lt;h2&gt;
  
  
  Joins vs Window Functions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Real Difference&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s the clearer distinction&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Joins
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Combine tables
&lt;/li&gt;
&lt;li&gt;Increase columns
&lt;/li&gt;
&lt;li&gt;Used to bring related data
&lt;/li&gt;
&lt;li&gt;Based on keys &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Window Functions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Analyze rows within a table&lt;/li&gt;
&lt;li&gt;Add calculated insights&lt;/li&gt;
&lt;li&gt;Do not collapse rows&lt;/li&gt;
&lt;li&gt;Based on partitions and order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69m3ri86jyl4kj6qfqjs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69m3ri86jyl4kj6qfqjs.jpg" alt=" " width="736" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When Should You Use Each?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use JOIN when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You need data from multiple tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You’re connecting facts and dimensions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You’re building analytical datasets&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Window Functions when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You need ranking&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need running totals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need comparisons between rows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You want aggregates without GROUP BY&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In everyday analytics and data engineering, you often use &lt;strong&gt;BOTH&lt;/strong&gt; together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT c.first_name, o.order_id, o.order_amount,
       SUM(o.order_amount) OVER (PARTITION BY c.customer_id) AS total_spent
FROM customers AS c
JOIN orders AS o
  ON c.customer_id = o.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This combines tables AND applies analytics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s production-level SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  In conclusion
&lt;/h2&gt;

&lt;p&gt;Joins connect tables using keys.&lt;/p&gt;

&lt;p&gt;Window functions perform analytics without collapsing rows.&lt;/p&gt;

&lt;p&gt;GROUP BY reduces rows, window functions do not.&lt;/p&gt;

&lt;p&gt;PARTITION BY is like GROUP BY, but keeps detail rows.&lt;/p&gt;

&lt;p&gt;Modern data work heavily relies on window functions.&lt;/p&gt;

&lt;p&gt;If you’re serious about becoming strong in SQL, especially as a Data Engineer, mastering both concepts is &lt;strong&gt;non-negotiable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can find more of my articles on LinkedIn: &lt;a href="https://www.linkedin.com/in/thomas-wangeci-065469194/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/thomas-wangeci-065469194/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>database</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power bi</title>
      <dc:creator>Wangeci Ndovu</dc:creator>
      <pubDate>Mon, 16 Feb 2026 14:30:16 +0000</pubDate>
      <link>https://dev.to/wangeci_ndovu/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-25ja</link>
      <guid>https://dev.to/wangeci_ndovu/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-25ja</guid>
      <description>&lt;p&gt;When I started working with large amounts of data, I quickly realized one thing: raw data isn’t valuable until it tells a story. This is where analysts step in, turning messy datasets into actionable intelligence. At the heart of this transformation is Microsoft's Power BI, a powerful analytics platform that helps analysts organize data, build logic with DAX, and deliver dashboards that drive decisions.&lt;/p&gt;

&lt;p&gt;In this article, we'll look at how analysts approach messy data, apply DAX (Data Analysis Expressions) to add intelligence, and design dashboards that empower stakeholders, who are often the key decision makers in organizations, to act with confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Messy Data
&lt;/h2&gt;

&lt;p&gt;Messy data is everywhere. Sometimes customers have inconsistent names, and at other times dates are stored as text. Decimals may use commas instead of full stops. The first step in any analytics initiative is understanding and cleaning the data, with the emphasis on &lt;strong&gt;cleaning&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Messy Data Issues
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Missing values&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inconsistent formats&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Duplicated records&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mis-typed entries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Irrelevant data&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Analysts Do First
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Go through the dataset&lt;/li&gt;
&lt;li&gt;Identify inconsistencies&lt;/li&gt;
&lt;li&gt;Transform the data in Power Query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before building anything in Power BI, analysts usually start in Power Query, &lt;strong&gt;cleaning&lt;/strong&gt; data using its intuitive UI or the M language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Transformation using Power Query
&lt;/h2&gt;

&lt;p&gt;Power BI’s Power Query Editor is where the heavy lifting actually happens. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysts use it to&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Split columns&lt;/li&gt;
&lt;li&gt;Change data types&lt;/li&gt;
&lt;li&gt;Replace inconsistent text&lt;/li&gt;
&lt;li&gt;Merge and append tables&lt;/li&gt;
&lt;li&gt;Handle missing values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key here is: “Prepare once, reuse many”. With Power Query steps, transformation logic persists every time the dataset refreshes.&lt;/p&gt;
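&lt;p&gt;To make those steps concrete, here is a rough sketch of the same kind of cleaning written in plain Python instead of Power Query or M. The rows and column names are hypothetical; the point is only to show the logic of trimming text, fixing decimal commas, converting text dates, and dropping duplicates.&lt;/p&gt;

```python
from datetime import datetime

# Hypothetical raw rows: messy name casing and spacing, dates stored as text,
# decimal commas, and a duplicate record.
raw_rows = [
    {"customer": " alice johnson ", "order_date": "2026-01-15", "amount": "250,50"},
    {"customer": "ALICE JOHNSON", "order_date": "2026-01-15", "amount": "250,50"},
    {"customer": "Bob Njagi", "order_date": "2026-01-20", "amount": "150.00"},
]

def clean_row(row):
    """Return a normalized copy of one raw row."""
    return {
        # collapse extra whitespace and standardize casing
        "customer": " ".join(row["customer"].split()).title(),
        # change the data type from text to a real date
        "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
        # replace the decimal comma, then convert to a number
        "amount": float(row["amount"].replace(",", ".")),
    }

cleaned = []
seen = set()
for row in raw_rows:
    fixed = clean_row(row)
    key = (fixed["customer"], fixed["order_date"], fixed["amount"])
    if key not in seen:  # drop rows that are duplicates after normalization
        seen.add(key)
        cleaned.append(fixed)

for row in cleaned:
    print(row)
```

&lt;p&gt;In Power Query, each of these operations would be a recorded step that runs again on every refresh.&lt;/p&gt;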

&lt;h2&gt;
  
  
  Adding Intelligence With DAX
&lt;/h2&gt;

&lt;p&gt;Once the data is clean and structured, it’s time for one of Power BI’s most powerful tools: DAX (Data Analysis Expressions).&lt;/p&gt;

&lt;p&gt;DAX is the language that fuels calculated columns, measures, time intelligence, and business logic within Power BI, and it is designed specifically for data analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how do analysts use DAX in real-life scenarios?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Creating Important Metrics
&lt;/h3&gt;

&lt;p&gt;Instead of relying on raw columns, analysts define business-critical metrics using DAX, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Sales = SUM(Sales[Amount])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a reusable measure that aggregates sales dynamically across filters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time Intelligence
&lt;/h2&gt;

&lt;p&gt;Common business questions involve time comparisons. With DAX, you can express these like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sales Last Year =
CALCULATE(
    [Total Sales],
    SAMEPERIODLASTYEAR('Calendar'[Date])
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Now you can compare year-over-year trends with ease&lt;/strong&gt;&lt;/p&gt;
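&lt;p&gt;If the idea of shifting a period back by one year feels abstract, the plain Python sketch below mimics it on a handful of hypothetical sales rows. It is a simplification: real DAX works with filter context and a proper date table, and the function names here are made up for the example.&lt;/p&gt;

```python
from datetime import date

# Hypothetical sales rows: (order date, amount).
sales = [
    (date(2025, 1, 10), 100),
    (date(2025, 2, 5), 200),
    (date(2026, 1, 12), 150),
    (date(2026, 2, 8), 260),
]

def total_sales(rows, year, month):
    """Sum the amounts for one calendar month (a stand-in for the current filter)."""
    return sum(amount for d, amount in rows if (d.year, d.month) == (year, month))

def sales_last_year(rows, year, month):
    """Same month, shifted back by one year."""
    return total_sales(rows, year - 1, month)

current = total_sales(sales, 2026, 1)
previous = sales_last_year(sales, 2026, 1)
yoy_change = (current - previous) / previous

print(current, previous, yoy_change)
```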

&lt;h2&gt;
  
  
  Designing Dashboards That Tell Stories
&lt;/h2&gt;

&lt;p&gt;Data without visualization rarely lands; it fails to tell the story it was meant to tell. Dashboards are where analytics actually communicate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Principles Analysts Follow
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start with questions — What business decisions do stakeholders need to make?&lt;/li&gt;
&lt;li&gt;Choose clear visuals — Bar charts for comparisons, line charts for trends, cards for key numbers.&lt;/li&gt;
&lt;li&gt;Use slicers thoughtfully — Let users filter context without clutter.&lt;/li&gt;
&lt;li&gt;Avoid noise — Too many visuals complicate and dilute the point of focus.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A typical Power BI dashboard must answer:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What happened?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why did it happen?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What might happen next?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This turns passive data into actionable insight&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Refreshing and Operationalizing Insights
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Power BI dashboards aren’t static reports; they’re living, refreshing assets. Analysts schedule data refreshes, connect to live data sources, and configure alerts on a daily basis so stakeholders don’t miss key changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is where analytics becomes practically actionable, not just descriptive&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  To put it at a glance
&lt;/h2&gt;

&lt;p&gt;Here’s an example of a simple step-by-step analytics workflow in Power BI:&lt;/p&gt;

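&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Phase&lt;/th&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;th&gt;Output&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Ingest&lt;/td&gt;&lt;td&gt;Power Query&lt;/td&gt;&lt;td&gt;Clean tables&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Logic&lt;/td&gt;&lt;td&gt;DAX&lt;/td&gt;&lt;td&gt;Dynamic measures&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Visualize&lt;/td&gt;&lt;td&gt;Reports &amp;amp; Dashboards&lt;/td&gt;&lt;td&gt;Actionable views&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Share&lt;/td&gt;&lt;td&gt;Power BI Service&lt;/td&gt;&lt;td&gt;Published insights&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;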

&lt;p&gt;Each step builds on the last, and skipping any of them leaves the resulting insights unreliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaway
&lt;/h2&gt;

&lt;p&gt;Power BI is powerful, but it’s the analyst’s mindset that turns raw data into actionable &lt;strong&gt;insights and logic&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Curiosity — What story is the data trying to tell?&lt;/li&gt;
&lt;li&gt;Precision — Is this metric calculated correctly?&lt;/li&gt;
&lt;li&gt;Clarity — Can a user understand this at a glance?&lt;/li&gt;
&lt;li&gt;Impact — Does this lead to better decisions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power BI is the tool — but it is virtually useless without the analyst&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  In conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Whether you’re building your first dashboard or optimizing complex enterprise analytics, the process remains the same:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Understand the data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clean and shape it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add intelligence with DAX&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visualize simply and with clarity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Empower users to take action&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>datascience</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>Schemas and Data Modelling in Power BI: A Practical Guide for Accurate and High-Performance Reporting</title>
      <dc:creator>Wangeci Ndovu</dc:creator>
      <pubDate>Mon, 02 Feb 2026 15:08:17 +0000</pubDate>
      <link>https://dev.to/wangeci_ndovu/schemas-and-data-modelling-in-power-bi-a-practical-guide-for-accurate-and-high-performance-3hjl</link>
      <guid>https://dev.to/wangeci_ndovu/schemas-and-data-modelling-in-power-bi-a-practical-guide-for-accurate-and-high-performance-3hjl</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dao2aczv0isald4cde7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dao2aczv0isald4cde7.jpg" alt=" " width="236" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When working with Power BI, most beginners focus heavily on &lt;strong&gt;visuals—charts, tables, slicers, and dashboards&lt;/strong&gt;. However, the real foundation of reliable, fast, and accurate Power BI reports is data modelling.&lt;/p&gt;

&lt;p&gt;A poorly designed model can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Slow reports&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Incorrect totals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Confusing relationships&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hard-to-maintain dashboards&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we’ll explore &lt;strong&gt;schemas and data modelling&lt;/strong&gt; in Power BI, focusing on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Star schema&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Snowflake schema&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fact and dimension tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relationships&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why good modelling is critical for performance and accurate reporting&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Is Data Modelling in Power BI?
&lt;/h2&gt;

&lt;p&gt;Data modelling is the process of structuring your data into tables and defining how those tables relate to each other.&lt;/p&gt;

&lt;p&gt;In Power BI, this happens in the Model view, where you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Organize tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create relationships&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decide filter directions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Design a structure that supports efficient analysis&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of data modelling as &lt;strong&gt;designing the blueprint&lt;/strong&gt; before building the house.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fact Tables vs Dimension Tables
&lt;/h2&gt;

&lt;p&gt;Before discussing schemas, it’s important to understand the two main table types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fact Tables
&lt;/h2&gt;

&lt;p&gt;Fact tables store measurable, numerical data.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Sales amount&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Quantity sold&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Revenue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Profit&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Characteristics:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Usually very large&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contain foreign keys to dimensions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contain metrics used in calculations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: &lt;strong&gt;Fact_Sales&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DateKey | ProductKey | CustomerKey | SalesAmount | Quantity&lt;/p&gt;

&lt;h2&gt;
  
  
  Dimension Tables
&lt;/h2&gt;

&lt;p&gt;Dimension tables store descriptive attributes used for filtering and grouping.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Product name&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customer name&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Region&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Category&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Date details&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Characteristics:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Smaller than fact tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contain descriptive columns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Used in slicers and axes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: &lt;strong&gt;Dim_Product&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ProductKey | ProductName | Category | Brand&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f9yp5c9j0mxih0esnzl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f9yp5c9j0mxih0esnzl.jpg" alt=" " width="236" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Star Schema?
&lt;/h2&gt;

&lt;p&gt;The star schema is the recommended and most efficient data model for Power BI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structure
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;One central fact table&lt;/li&gt;
&lt;li&gt;Multiple dimension tables&lt;/li&gt;
&lt;li&gt;Dimensions connect directly to the fact table&lt;/li&gt;
&lt;li&gt;The model visually resembles a &lt;strong&gt;star&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dim_Date     Dim_Product     Dim_Customer
       \          |          /
              Fact_Sales
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why Star Schema Is Best for Power BI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple relationships&lt;/li&gt;
&lt;li&gt;Faster performance&lt;/li&gt;
&lt;li&gt;Easier DAX calculations&lt;/li&gt;
&lt;li&gt;Clear filter flow&lt;/li&gt;
&lt;li&gt;Easier to understand and maintain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Power BI’s engine, &lt;strong&gt;VertiPaq&lt;/strong&gt;, is optimized for star schemas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Star Schema in Power BI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fact_Sales&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dim_Date&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dim_Product&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dim_Customer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dim_Region&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each dimension connects one-to-many to the fact table.&lt;/p&gt;
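&lt;p&gt;Power BI builds these relationships in the Model view, but the same shape can be sketched in SQL. The example below is only an analogy, run in SQLite from Python, with hypothetical products and sales figures: one fact table, one dimension, and a one-to-many link on ProductKey.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: one row per product (the "one" side).
    CREATE TABLE Dim_Product (ProductKey INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT);
    -- Fact: many rows per product (the "many" side), holding the measures.
    CREATE TABLE Fact_Sales (ProductKey INTEGER, SalesAmount INTEGER, Quantity INTEGER);

    INSERT INTO Dim_Product VALUES
        (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
    INSERT INTO Fact_Sales VALUES (1, 1000, 1), (1, 1000, 1), (2, 500, 2), (3, 300, 1);
""")

# The dimension supplies the grouping attribute; the fact supplies the numbers.
sales_by_category = conn.execute("""
    SELECT d.Category, SUM(f.SalesAmount) AS TotalSales
    FROM Fact_Sales AS f
    JOIN Dim_Product AS d ON f.ProductKey = d.ProductKey
    GROUP BY d.Category
    ORDER BY d.Category
""").fetchall()

print(sales_by_category)
```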

&lt;h2&gt;
  
  
  What Is a Snowflake Schema?
&lt;/h2&gt;

&lt;p&gt;A snowflake schema is a variation of the star schema where dimension tables are further normalized into sub-dimensions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structure
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fact table at the center&lt;/li&gt;
&lt;li&gt;Dimension tables split into multiple related tables&lt;/li&gt;
&lt;li&gt;More relationships&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dim_Product → Dim_Category
      \
       Fact_Sales
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When Snowflake Schema Appears&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data comes directly from normalized databases&lt;/li&gt;
&lt;li&gt;Dimensions have many hierarchical levels&lt;/li&gt;
&lt;li&gt;Storage optimization is a priority&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Drawbacks in Power BI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More complex relationships&lt;/li&gt;
&lt;li&gt;Slower performance&lt;/li&gt;
&lt;li&gt;Harder DAX calculations&lt;/li&gt;
&lt;li&gt;Confusing filter paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Power BI, &lt;strong&gt;&lt;em&gt;denormalizing dimensions back into a star schema is usually recommended&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Relationships in Power BI
&lt;/h2&gt;

&lt;p&gt;Relationships define how tables filter each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Relationship Type&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;One-to-Many (1:*)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dimension on the one (1) side&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fact on the many (*) side&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dim_Product[ProductKey] → Fact_Sales[ProductKey]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relationship Direction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Power BI relationships usually use:&lt;/p&gt;

&lt;p&gt;Single direction (Dimension → Fact)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bi-directional filters unless absolutely necessary&lt;/li&gt;
&lt;li&gt;Many-to-many relationships (performance risk)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Good Data Modelling Is Critical
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Star schema reduces joins&lt;/li&gt;
&lt;li&gt;Smaller, denormalized dimensions compress better&lt;/li&gt;
&lt;li&gt;Faster report loading and interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Accurate Calculations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bad models cause:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Double counting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Incorrect totals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Broken time intelligence&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Good models ensure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Correct aggregation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictable DAX behavior&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
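&lt;p&gt;Double counting is easiest to understand with a tiny example. The sketch below is a SQL analogy in SQLite, not Power BI itself, and the table names and values are hypothetical: a dimension that wrongly contains the same key twice inflates the total as soon as it is joined to the fact table.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fact_Sales (ProductKey INTEGER, SalesAmount INTEGER);
    -- A badly modelled dimension: ProductKey 1 appears twice.
    CREATE TABLE Dim_Product_Bad (ProductKey INTEGER, ProductName TEXT);

    INSERT INTO Fact_Sales VALUES (1, 100), (2, 50);
    INSERT INTO Dim_Product_Bad VALUES (1, 'Laptop'), (1, 'Laptop (duplicate)'), (2, 'Phone');
""")

# The real total, straight from the fact table.
true_total = conn.execute("SELECT SUM(SalesAmount) FROM Fact_Sales").fetchone()[0]

# After the join, the sale for ProductKey 1 matches two dimension rows and is counted twice.
joined_total = conn.execute("""
    SELECT SUM(f.SalesAmount)
    FROM Fact_Sales AS f
    JOIN Dim_Product_Bad AS d ON f.ProductKey = d.ProductKey
""").fetchone()[0]

print("true total:", true_total)
print("total after joining the bad dimension:", joined_total)
```

&lt;p&gt;Keeping keys unique on the dimension side is what prevents this kind of inflated total.&lt;/p&gt;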

&lt;h2&gt;
  
  
  Simpler DAX
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;With a clean star schema:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Measures are shorter&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Logic is clearer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Debugging is easier&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Total Sales = SUM(Fact_Sales[SalesAmount])&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No complex filters needed.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Easier Maintenance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Adding new visuals is straightforward&lt;/li&gt;
&lt;li&gt;New measures don’t break existing reports&lt;/li&gt;
&lt;li&gt;New data sources integrate cleanly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best Practices for Power BI Data Modelling&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use star schema whenever possible&lt;/li&gt;
&lt;li&gt;Separate facts and dimensions&lt;/li&gt;
&lt;li&gt;Avoid unnecessary bi-directional filters&lt;/li&gt;
&lt;li&gt;Use surrogate keys (IDs)&lt;/li&gt;
&lt;li&gt;Flatten snowflake dimensions when possible&lt;/li&gt;
&lt;li&gt;Validate relationships early&lt;/li&gt;
&lt;li&gt;Keep the model simple and readable&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In Power BI, great visuals come from great models.&lt;/p&gt;

&lt;p&gt;You can have the best charts in the world, but without:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Proper schemas&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clean relationships&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Well-defined fact and dimension tables&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;your reports will be slow, inaccurate, and difficult to trust.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mastering data modelling is what separates a Power BI user from a Power BI professional.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>microsoft</category>
      <category>performance</category>
    </item>
    <item>
      <title>Introduction to Linux for Data Engineers, Including Practical Use of Vi and Nano with Examples</title>
      <dc:creator>Wangeci Ndovu</dc:creator>
      <pubDate>Mon, 26 Jan 2026 16:34:43 +0000</pubDate>
      <link>https://dev.to/wangeci_ndovu/introduction-to-linux-for-data-engineers-including-practical-use-of-vi-and-nano-with-examples-27nj</link>
      <guid>https://dev.to/wangeci_ndovu/introduction-to-linux-for-data-engineers-including-practical-use-of-vi-and-nano-with-examples-27nj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3spu382g96momptohy6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3spu382g96momptohy6.jpg" alt=" " width="736" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Linux is one of the most important technologies behind modern data systems. While many beginners focus first on programming languages like Python or SQL, most real-world data engineering work happens on Linux-based systems. Understanding Linux basics—especially how to work with files using terminal editors—is a key step in becoming a confident data engineer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This article introduces Linux from a beginner’s perspective,&lt;/strong&gt; explains why it matters in data engineering, and demonstrates practical text editing using Vi and Nano, supported by real terminal examples.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Linux Is Important for Data Engineers
&lt;/h2&gt;

&lt;p&gt;Most data engineers do not work only on personal computers. Instead, they manage and maintain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cloud servers (AWS EC2, Google Compute Engine, Azure VMs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Big data platforms (Hadoop, Spark, Kafka)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workflow tools (Airflow, Luigi)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Databases and data warehouses&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;All these systems primarily run on Linux.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key benefits of Linux in data engineering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server dominance:&lt;/strong&gt; Linux is the most common operating system on servers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stability:&lt;/strong&gt; Data pipelines can run for days or weeks without interruption&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automation:&lt;/strong&gt; Linux supports scripting and scheduling with ease&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost-effective:&lt;/strong&gt; Open-source and widely supported&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Command-line power:&lt;/strong&gt; Often faster and more precise than graphical interfaces&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For these reasons, Linux skills are often listed as a core requirement in data engineering job descriptions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qlbesoup78s0c4r9neh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qlbesoup78s0c4r9neh.jpg" alt=" " width="736" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Comfortable with the Linux Terminal
&lt;/h2&gt;

&lt;p&gt;The Linux terminal allows users to interact with the system using text commands.&lt;/p&gt;

&lt;p&gt;Example terminal prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ndovu@NDOVU:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explanation:&lt;/p&gt;

&lt;p&gt;ndovu → username&lt;/p&gt;

&lt;p&gt;NDOVU → computer name&lt;/p&gt;

&lt;p&gt;~ → home directory&lt;/p&gt;

&lt;p&gt;$ → ready to accept commands&lt;/p&gt;

&lt;h2&gt;
  
  
  Essential Linux Commands for Beginners
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Checking Your Current Location&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pwd&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/home/ndovu&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This command shows the current directory you are working in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Viewing Files and Directories&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Sample output:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;data  scripts  notes.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To see detailed information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls -l&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating Directories&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mkdir pipelines&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating multiple levels at once&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mkdir -p data/raw data/processed&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating Empty Files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;touch readme.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moving Between Directories&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd data&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go back one level&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd ..&lt;/code&gt;&lt;/p&gt;
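
&lt;p&gt;To see how these commands fit together, here is a minimal sketch that runs them in sequence. It assumes you are happy to work in a throwaway directory created with mktemp; the variable name workdir is only an illustration and is not part of the commands themselves.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Work in a throwaway directory so nothing in your home folder changes
workdir=$(mktemp -d)
cd "$workdir"

mkdir pipelines                     # a single directory
mkdir -p data/raw data/processed    # parent and child directories in one step
touch readme.txt                    # an empty file

pwd      # prints the path of the throwaway directory
ls -l    # lists data, pipelines and readme.txt with details

cd data  # move into data
ls       # lists processed and raw
cd ..    # back up one level
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running the whole sequence in a scratch directory is a safe way to practise, because you can delete the directory afterwards without affecting real files.&lt;/p&gt;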

&lt;p&gt;&lt;strong&gt;Why Text Editors Matter in Linux&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data engineers frequently edit:&lt;/p&gt;

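&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Configuration files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell scripts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQL and Python files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Log files&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;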

&lt;p&gt;On Linux servers, graphical editors are often unavailable. This is why terminal-based editors such as Nano and Vi are essential.&lt;/p&gt;

&lt;h2&gt;
  
  
  Editing Files with Nano (Beginner Friendly)
&lt;/h2&gt;

&lt;p&gt;Nano is easy to learn and ideal for beginners.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opening a File with Nano&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;nano readme.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If the file does not exist, Nano opens an empty buffer and creates the file when you save.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing Content in Nano&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Type the following text&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This project contains data engineering examples.&lt;br&gt;
Linux is essential for managing pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Saving and Closing Nano&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the bottom of the screen, Nano shows helpful shortcuts:&lt;/p&gt;

&lt;p&gt;^O Write Out   ^X Exit&lt;/p&gt;

&lt;p&gt;Steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Press CTRL + O to save&lt;/li&gt;
&lt;li&gt;Press Enter to confirm&lt;/li&gt;
&lt;li&gt;Press CTRL + X to exit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Confirming the File Content&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cat readme.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Expected output&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This project contains data engineering examples.&lt;br&gt;
Linux is essential for managing pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Editing Files with Vi (Industry Standard)
&lt;/h2&gt;

&lt;p&gt;Vi (or Vim) is more complex than Nano but extremely powerful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opening a File Using Vi&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vi config.conf&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Vi starts in command mode, not insert mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Switching to Insert Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Press&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;i&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Now type&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;source=mysql&lt;br&gt;
format=csv&lt;br&gt;
target=hdfs&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Saving and Exiting Vi&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Press ESC to return to command mode&lt;/p&gt;

&lt;p&gt;Type:&lt;/p&gt;

&lt;p&gt;:wq&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Press Enter&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Vi Commands&lt;/strong&gt;&lt;/p&gt;

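&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;i&lt;/td&gt;
&lt;td&gt;Enter insert mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ESC&lt;/td&gt;
&lt;td&gt;Return to command mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:w&lt;/td&gt;
&lt;td&gt;Save file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:q&lt;/td&gt;
&lt;td&gt;Quit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:wq&lt;/td&gt;
&lt;td&gt;Save and quit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:q!&lt;/td&gt;
&lt;td&gt;Quit without saving&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;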

&lt;h2&gt;
  
  
  Practical Data Engineering Scenario
&lt;/h2&gt;

&lt;p&gt;A common task for a data engineer is editing pipeline configurations on a remote server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh user@analytics-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



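&lt;p&gt;Once connected, move to the configuration directory and open the file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd /etc/pipelines&lt;/code&gt;&lt;br&gt;
&lt;code&gt;vi ingestion.conf&lt;/code&gt;&lt;/p&gt;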

&lt;p&gt;&lt;em&gt;File content example&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;source=kafka&lt;br&gt;
format=json&lt;br&gt;
target=data_lake&lt;/p&gt;

&lt;p&gt;This simple task reflects real production work done daily by data engineers.&lt;/p&gt;
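
&lt;p&gt;After editing a configuration file, it often helps to confirm what was saved without reopening the editor. The sketch below is one possible way to do that, not a required step. It assumes a scratch directory stands in for /etc/pipelines, and the variable names confdir and target are made up for this illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Scratch directory standing in for /etc/pipelines (an assumption for this sketch)
confdir=$(mktemp -d)
cd "$confdir"

# Write the three settings; tee saves them to the file and echoes them back
printf 'source=kafka\nformat=json\ntarget=data_lake\n' | tee ingestion.conf

# Read a single setting: find its line, then keep the part after "="
target=$(grep '^target=' ingestion.conf | cut -d= -f2)
echo "Pipeline target: $target"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The last line prints Pipeline target: data_lake, which matches the value in the file content example above.&lt;/p&gt;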

&lt;h2&gt;
  
  
  Why Terminal Editors Are Still Relevant
&lt;/h2&gt;

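&lt;ul&gt;
&lt;li&gt;&lt;p&gt;They work on remote servers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No graphical interface required&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lightweight and fast&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Essential for troubleshooting production issues&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;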

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fky8993dfy4qn7h8jtzvq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fky8993dfy4qn7h8jtzvq.jpg" alt=" " width="735" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Linux is a foundational skill for data engineers. By learning basic commands and mastering text editors like Nano and Vi, beginners gain the confidence to work on real servers and real data systems.&lt;/p&gt;

&lt;p&gt;Starting with Nano and gradually learning Vi is a practical approach that prepares you for professional data engineering environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Learn Next
&lt;/h2&gt;

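&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Linux file permissions (chmod, chown)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell scripting basics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Running Python and SQL scripts on Linux&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exploring Spark and Airflow on Linux&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;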

&lt;p&gt;With consistent practice, Linux will become a powerful and natural tool in your data engineering journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Happy coding
&lt;/h2&gt;

</description>
    </item>
    <item>
      <title>Understanding Git: How to Track Changes, Push, and Pull Code Like a Pro</title>
      <dc:creator>Wangeci Ndovu</dc:creator>
      <pubDate>Fri, 16 Jan 2026 11:27:47 +0000</pubDate>
      <link>https://dev.to/wangeci_ndovu/understanding-git-how-to-track-changes-push-and-pull-code-like-a-pro-226e</link>
      <guid>https://dev.to/wangeci_ndovu/understanding-git-how-to-track-changes-push-and-pull-code-like-a-pro-226e</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuh8tvamof9372wted01.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuh8tvamof9372wted01.jpg" alt=" " width="735" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Code 101
&lt;/h1&gt;

&lt;p&gt;When you start writing code, you quickly realize something:&lt;br&gt;
&lt;strong&gt;&lt;em&gt;files change, mistakes happen, and things break&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So how do professional developers keep track of what changed, how it changed, and how to correct it? In other words, &lt;strong&gt;how do they go back in time when something breaks?&lt;/strong&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  This article simply explains:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;How Git tracks changes&lt;/li&gt;
&lt;li&gt;How to push code to GitHub&lt;/li&gt;
&lt;li&gt;How to pull code from GitHub&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;beginner friendly&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What is Version Control?
&lt;/h2&gt;

&lt;p&gt;Version control is a system that &lt;strong&gt;keeps track of:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Every change you make to your code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the change happened&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Who made the change&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What exactly was modified&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like Google Docs history for your code.&lt;/p&gt;

&lt;p&gt;If your code breaks, Git lets you roll back to a working version.&lt;/p&gt;

&lt;p&gt;Git also helps you collaborate with other like-minded individuals on shared projects.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is GitHub?
&lt;/h3&gt;

&lt;p&gt;GitHub is a cloud platform where Git repositories are stored online.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Git on your computer to track changes&lt;/li&gt;
&lt;li&gt;GitHub to back up your work and share it with others&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  How to create a Git Repository
&lt;/h3&gt;

&lt;p&gt;A repository is a folder that Git tracks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a hidden .git folder inside your project.&lt;br&gt;
Now Git is watching this directory.&lt;/p&gt;
&lt;h2&gt;
  
  
  Save a Version (Commit)
&lt;/h2&gt;

&lt;p&gt;A commit is a snapshot of your project. Stage the files you want to include with git add, then commit them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add .
git commit -m "Add file"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now Git has stored that version in the project's history.&lt;br&gt;
You can go back to it whenever you need to.&lt;/p&gt;
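
&lt;p&gt;Here is a minimal sketch of what going back can look like in practice. It assumes a scratch repository created with mktemp, a made-up file called app.txt, and a placeholder name and email so the commit works on a fresh machine. It uses git checkout to discard an uncommitted edit, which is only one of several ways Git can restore earlier content.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Scratch repository so the sketch does not touch a real project
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Create a file, stage it, and commit the snapshot
printf 'working version\n' | tee app.txt
git add app.txt
git commit -m "Add file"

# Simulate a mistake by overwriting the file
printf 'broken version\n' | tee app.txt

# Discard the uncommitted edit and bring back the committed snapshot
git checkout -- app.txt
cat app.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The final cat prints working version, because the file has been restored from the last commit.&lt;/p&gt;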
&lt;h2&gt;
  
  
  Push Code to GitHub
&lt;/h2&gt;

&lt;p&gt;First connect your project to GitHub:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git remote add origin git@github.com:yourname/yourrepo.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Push:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: The default branch name is often main. If yours is master, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push -u origin master
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your code is now safely stored online.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pull Code from GitHub
&lt;/h2&gt;

&lt;p&gt;If someone else updates the repository, or you work from another computer, start by navigating to your local repository directory using the &lt;em&gt;cd&lt;/em&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, if your repository is in a folder named &lt;strong&gt;Mombasa&lt;/strong&gt; then,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd Mombasa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure you are on the correct branch by using the &lt;em&gt;command&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull the latest changes from the remote repository using the git pull command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git pull
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This downloads the latest changes into your project.&lt;/p&gt;
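
&lt;p&gt;If you want to practise pushing and pulling without a GitHub account, the sketch below imitates the whole round trip on one machine. It rests on several assumptions: a bare repository named remote.git stands in for GitHub, the folders laptop and desktop stand in for two computers, and the name and email are placeholders. It is a practice setup rather than how a real project is hosted.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Everything lives in a scratch directory
base=$(mktemp -d)
cd "$base"

# A bare repository plays the role of GitHub in this sketch
git init --bare remote.git

# First computer: create a project, commit, and push
git init laptop
cd laptop
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
touch report.txt
git add report.txt
git commit -m "Add report"
git branch -M main                          # make sure the branch is called main
git remote add origin "$base/remote.git"
git push -u origin main

# Second computer: get a copy of the project
cd "$base"
git clone --branch main remote.git desktop

# First computer again: make another change and push it
cd "$base/laptop"
touch summary.txt
git add summary.txt
git commit -m "Add summary"
git push

# Second computer: pull the new commit
cd "$base/desktop"
git pull
ls    # now lists report.txt and summary.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a real GitHub repository, the only difference is the address you pass to git remote add origin and git clone.&lt;/p&gt;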

&lt;h2&gt;
  
  
  How Git Tracks Changes
&lt;/h2&gt;

&lt;p&gt;When you edit a file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Git will show:&lt;/p&gt;

&lt;p&gt;modified: Mombasa&lt;/p&gt;

&lt;p&gt;To save the change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add Mombasa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;then&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m "Mombasa"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



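&lt;p&gt;Each commit records a snapshot of your project. Files that have not changed are not stored again; Git reuses the copy it already has and compresses its storage behind the scenes.&lt;/p&gt;

&lt;p&gt;This keeps Git fast and efficient.&lt;/p&gt;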
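
&lt;p&gt;To watch Git notice a change, you can try the sketch below. It assumes a scratch repository, a made-up file named cities.txt, and a placeholder name and email. The short form of git status is used here to keep the output compact.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Scratch repository with one committed file
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
printf 'Mombasa\n' | tee cities.txt
git add cities.txt
git commit -m "Add cities list"

# Edit the tracked file by appending a line
printf 'Nairobi\n' | tee -a cities.txt

git status --short    # prints " M cities.txt": modified, not yet staged
git diff              # shows the added line prefixed with "+"

# Stage and commit the change
git add cities.txt
git commit -m "Add Nairobi"
git log --oneline     # two commits, newest first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checking git status and git diff before every commit is a good habit, because it shows exactly what is about to be saved.&lt;/p&gt;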

&lt;h3&gt;
  
  
  Why Git is a Superpower
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;With Git you can&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Undo mistakes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Work on features safely&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Collaborate with others&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Track project history&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Work on multiple versions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why Git is the most widely used version control system among professional developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Git is not just a tool — it is how software is built.&lt;/p&gt;

&lt;p&gt;Once you understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;add&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;commit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;push&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pull&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;you have the core workflow used on real-world engineering projects.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
