<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lorraine Njagi</title>
    <description>The latest articles on DEV Community by Lorraine Njagi (@lorraine_njagi_k).</description>
    <link>https://dev.to/lorraine_njagi_k</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3831265%2Fe5ce8432-41d0-4313-b95f-4d0c072c97e2.png</url>
      <title>DEV Community: Lorraine Njagi</title>
      <link>https://dev.to/lorraine_njagi_k</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lorraine_njagi_k"/>
    <language>en</language>
    <item>
      <title>Mastering SQL Fundamentals: From Data Definition to Data Transformation</title>
      <dc:creator>Lorraine Njagi</dc:creator>
      <pubDate>Tue, 14 Apr 2026 19:47:39 +0000</pubDate>
      <link>https://dev.to/lorraine_njagi_k/mastering-sql-fundamentals-from-data-definition-to-data-transformation-1mfg</link>
      <guid>https://dev.to/lorraine_njagi_k/mastering-sql-fundamentals-from-data-definition-to-data-transformation-1mfg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Structured Query Language (SQL) is a standardized language used for managing and interacting with relational databases. It enables users to define database structures, manipulate stored data, and retrieve information efficiently. SQL plays a critical role in modern data management systems, making it an essential skill in data-related fields.&lt;/p&gt;

&lt;p&gt;SQL commands are broadly categorized into different groups, among which &lt;strong&gt;Data Definition Language (DDL)&lt;/strong&gt; and &lt;strong&gt;Data Manipulation Language (DML)&lt;/strong&gt; are fundamental. This article explores these categories, demonstrates their application, and reflects on the learning experience gained through practical exercises.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Definition Language (DDL)
&lt;/h2&gt;

&lt;p&gt;Data Definition Language (DDL) refers to the set of SQL commands used to define and manage the structure of a database. These commands are concerned with creating, modifying, and deleting database objects.&lt;/p&gt;

&lt;p&gt;Key DDL commands include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CREATE&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This command is used to create database objects such as tables, schemas, and databases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ALTER&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The &lt;code&gt;ALTER&lt;/code&gt; command is used to modify the structure of existing database objects, such as renaming a column or changing a data type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DROP&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This command is used to permanently remove database objects such as tables or databases.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DDL operations focus on structure rather than the data itself.&lt;/p&gt;
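&lt;p&gt;A minimal sketch of the three DDL commands follows. The column names are assumptions based on the &lt;code&gt;students&lt;/code&gt; and &lt;code&gt;exam_results&lt;/code&gt; tables queried later in this article, and &lt;code&gt;temp_import&lt;/code&gt; is a hypothetical scratch table. The syntax should run in PostgreSQL, MySQL 8 and recent SQLite versions, though some &lt;code&gt;ALTER&lt;/code&gt; operations (such as changing a column's data type) differ between databases.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- CREATE: define new tables (hypothetical columns, assumed for illustration)
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name  VARCHAR(50) NOT NULL,
    city       VARCHAR(50)
);

CREATE TABLE exam_results (
    result_id  INTEGER PRIMARY KEY,
    student_id INTEGER REFERENCES students (student_id),
    marks      INTEGER
);

-- ALTER: change the structure of an existing table
ALTER TABLE students ADD COLUMN phone VARCHAR(20);
ALTER TABLE students RENAME COLUMN phone TO phone_number;

-- DROP: permanently remove an object together with all of its rows
CREATE TABLE temp_import (raw_line TEXT);
DROP TABLE temp_import;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;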




&lt;h2&gt;
  
  
  Data Manipulation Language (DML)
&lt;/h2&gt;

&lt;p&gt;Data Manipulation Language (DML) consists of SQL commands used to interact with and manipulate data within a database.&lt;/p&gt;

&lt;h3&gt;
  
  
  INSERT
&lt;/h3&gt;

&lt;p&gt;Used to add new records (rows) to a table.&lt;/p&gt;
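&lt;p&gt;A short sketch, assuming a simplified &lt;code&gt;students&lt;/code&gt; table; the column names and sample rows are hypothetical. Listing the target columns explicitly keeps the statement correct even if the table's column order changes later.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumed table so the example runs on its own
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    city       VARCHAR(50)
);

-- Insert one row, naming the target columns explicitly
INSERT INTO students (student_id, first_name, last_name, city)
VALUES (1, 'Esther', 'Akinyi', 'Kisumu');

-- Insert several rows in a single statement (hypothetical sample data)
INSERT INTO students (student_id, first_name, last_name, city)
VALUES
    (2, 'Brian', 'Otieno', 'Mombasa'),
    (3, 'Amina', 'Wanjiru', 'Nairobi');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;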

&lt;h3&gt;
  
  
  SELECT
&lt;/h3&gt;

&lt;p&gt;Used to retrieve data from one or more tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  UPDATE
&lt;/h3&gt;

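&lt;p&gt;Used to modify existing records. The &lt;code&gt;WHERE&lt;/code&gt; clause limits the change to matching rows; without it, every row in the table is updated.&lt;br&gt;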
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Esther'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Akinyi'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DELETE
&lt;/h3&gt;

&lt;p&gt;Used to remove records. Without a &lt;code&gt;WHERE&lt;/code&gt; clause, every row in the table is deleted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;result_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, these commands let you add, read, change, and remove the data stored in a table without altering the table's structure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Filtering Data Using the WHERE Clause
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;WHERE&lt;/code&gt; clause is used to filter records based on specific conditions, allowing for precise data retrieval.&lt;/p&gt;

&lt;p&gt;Common operators include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;=&lt;/code&gt; (equal to)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;gt;&lt;/code&gt; (greater than)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BETWEEN&lt;/code&gt; (range of values)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;IN&lt;/code&gt; (multiple values)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LIKE&lt;/code&gt; (pattern matching)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Mombasa'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Kisumu'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'A%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
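
&lt;p&gt;The operator list above also includes &lt;code&gt;=&lt;/code&gt; and &lt;code&gt;&amp;gt;&lt;/code&gt;. A minimal sketch of both, assuming a simplified &lt;code&gt;exam_results&lt;/code&gt; table with made-up rows, also shows that conditions can be combined with &lt;code&gt;AND&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumed table and hypothetical rows so the example runs on its own
CREATE TABLE exam_results (
    result_id  INTEGER PRIMARY KEY,
    student_id INTEGER,
    marks      INTEGER
);
INSERT INTO exam_results (result_id, student_id, marks)
VALUES (1, 1, 85), (2, 1, 45), (3, 2, 72);

-- = keeps rows whose value matches exactly (results 1 and 2)
SELECT * FROM exam_results
WHERE student_id = 1;

-- &amp;gt; keeps rows strictly above the threshold; AND adds a second condition (result 1 only)
SELECT * FROM exam_results
WHERE marks &amp;gt; 70 AND student_id = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;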



&lt;p&gt;Filtering enables users to extract meaningful subsets of data from larger datasets.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Transformation Using CASE WHEN
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;CASE WHEN&lt;/code&gt; expression applies conditional logic within a SQL query, transforming raw values into meaningful categories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Distinction'&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Merit'&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Pass'&lt;/span&gt;
        &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'Fail'&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;performance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



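&lt;p&gt;The &lt;code&gt;WHEN&lt;/code&gt; conditions are checked in order and the first true one wins, so a mark of 85 is labeled 'Distinction' even though it also satisfies &lt;code&gt;marks &amp;gt;= 60&lt;/code&gt;. This approach is particularly useful in reporting and data analysis.&lt;/p&gt;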




&lt;h2&gt;
  
  
  Reflection
&lt;/h2&gt;

&lt;p&gt;The learning experience throughout was both challenging and intellectually engaging. One of the main challenges was ensuring accuracy in SQL syntax, as even small mistakes could lead to errors. This emphasized the importance of precision when writing queries.&lt;/p&gt;

&lt;p&gt;Despite these challenges, I found querying highly enjoyable. I particularly enjoyed interacting with the database and observing how different SQL commands produce meaningful outputs, and each successful query provided a sense of accomplishment.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;CASE WHEN&lt;/code&gt; statement was especially insightful, as it demonstrated how raw data can be transformed into meaningful categories. This experience has significantly increased my interest in data manipulation and analysis, and I am motivated to continue improving my SQL skills.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;SQL provides powerful tools for defining and managing databases and manipulating data. Understanding the distinction between DDL and DML, along with filtering and transformation techniques, is essential for effective data management.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Publishing and Embedding Power BI Reports into a Website</title>
      <dc:creator>Lorraine Njagi</dc:creator>
      <pubDate>Mon, 06 Apr 2026 16:26:47 +0000</pubDate>
      <link>https://dev.to/lorraine_njagi_k/publishing-and-embedding-power-bi-reports-into-a-website-4eba</link>
      <guid>https://dev.to/lorraine_njagi_k/publishing-and-embedding-power-bi-reports-into-a-website-4eba</guid>
      <description>

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As an occasional Power BI user, I always found it an easy and simple tool to use. That perspective limited how much I used it: because I did not see it as much of a challenge, I did not put it into practice as much as I should have.&lt;/p&gt;

&lt;p&gt;The first time I used Power BI was during a bootcamp, where it was the easier part of the program before the more challenging and time-consuming machine learning modules.&lt;/p&gt;

&lt;p&gt;I recently returned to Power BI after enrolling in a course that includes it as one of its modules. Honestly, Power BI is easy, but like any other tool it requires practice and attention. Without regular practice, you are likely to forget what you learned and stay stuck in the beginner loop.&lt;/p&gt;

&lt;p&gt;Well, this article is not about me. I will cover how to &lt;strong&gt;publish and embed your Power BI reports into your websites or web pages&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Power BI?
&lt;/h2&gt;

&lt;p&gt;Power BI makes data visual. It makes it easy to communicate what your data shows to stakeholders; through Power BI, you tell a story about your data.&lt;/p&gt;

&lt;p&gt;After working on your report, you need to &lt;strong&gt;save it and publish&lt;/strong&gt; it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1 — Save Your Report
&lt;/h2&gt;

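&lt;p&gt;To save your report, click &lt;strong&gt;File&lt;/strong&gt; at the top left of the screen and select &lt;strong&gt;Save&lt;/strong&gt;. Power BI Desktop stores the report as a &lt;code&gt;.pbix&lt;/code&gt; file.&lt;/p&gt;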

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa86grrbij6o5v0pqlsvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa86grrbij6o5v0pqlsvy.png" alt="Save Report" width="61" height="46"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2 — Publish Your Report
&lt;/h2&gt;

&lt;p&gt;To publish:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;strong&gt;Home&lt;/strong&gt; on the ribbon&lt;/li&gt;
&lt;li&gt;At the far right, you will see the &lt;strong&gt;Publish&lt;/strong&gt; button&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkeywy0a89earpmp0i37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkeywy0a89earpmp0i37.png" alt="Publish Report" width="800" height="75"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After you select &lt;strong&gt;Publish&lt;/strong&gt;, you will be asked to choose a &lt;strong&gt;workspace&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
If you do not have a workspace, log in to your &lt;strong&gt;Power BI account&lt;/strong&gt; and create one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; A workspace is only available in &lt;strong&gt;Power BI Service&lt;/strong&gt; and not available in &lt;strong&gt;Power BI Desktop&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5it6mveuxudvjtnwb7s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5it6mveuxudvjtnwb7s.png" alt="Workspace" width="582" height="870"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3 — Embedding to a Website
&lt;/h2&gt;

&lt;p&gt;After publishing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your &lt;strong&gt;Workspace&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Open the &lt;strong&gt;Report&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;File&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;Embed report&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx50gwqak2x40mr1ijo7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx50gwqak2x40mr1ijo7e.png" alt="Embed Report" width="301" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After clicking &lt;strong&gt;Embed report&lt;/strong&gt;, select &lt;strong&gt;Website or portal&lt;/strong&gt; and copy the &lt;strong&gt;iframe HTML&lt;/strong&gt; to paste into your web page. With this option, viewers must sign in to Power BI and have permission to view the report.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F127jy29dl0lxvp6e6w86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F127jy29dl0lxvp6e6w86.png" alt="Embed Code" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Power BI is a powerful tool for data visualization and storytelling. Publishing reports to Power BI Service and embedding them into websites allows you to share insights with a wider audience in an interactive and visually appealing way.&lt;/p&gt;

&lt;p&gt;However, like any other tool, Power BI requires &lt;strong&gt;practice and consistency&lt;/strong&gt; to master and fully utilize its capabilities.&lt;/p&gt;

</description>
      <category>powerplatform</category>
      <category>dataengineering</category>
      <category>visualization</category>
      <category>luxdevhq</category>
    </item>
    <item>
      <title>Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained</title>
      <dc:creator>Lorraine Njagi</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:05:40 +0000</pubDate>
      <link>https://dev.to/lorraine_njagi_k/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-4klb</link>
      <guid>https://dev.to/lorraine_njagi_k/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-4klb</guid>
      <description>&lt;p&gt;Data modeling is how you structure your tables so Power BI understands how they connect. Get it right, and your reports are fast, accurate, and easy to maintain. Get it wrong, and you'll spend hours debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  SQL Joins in Power Query
&lt;/h2&gt;

&lt;p&gt;Joins combine tables into one and happen in &lt;strong&gt;Power Query&lt;/strong&gt; (Transform Data). Use them when you need a flat table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample tables:&lt;/strong&gt; Customers (the left table) and Orders (the right table).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CustomerID&lt;/th&gt;
&lt;th&gt;CustomerName&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;John Smith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Sarah Jones&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Mike Brown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Emma Wilson&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OrderID&lt;/th&gt;
&lt;th&gt;CustomerID&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  INNER JOIN
&lt;/h3&gt;

&lt;p&gt;Returns only matching rows from both tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Customers 1 and 2 with their orders.&lt;/p&gt;

&lt;h3&gt;
  
  
  LEFT JOIN
&lt;/h3&gt;

&lt;p&gt;Returns all rows from the left table, plus matching rows from the right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; All customers; customers 3 and 4 show NULL for orders.&lt;/p&gt;

&lt;h3&gt;
  
  
  RIGHT JOIN
&lt;/h3&gt;

&lt;p&gt;Returns all rows from the right table, plus matching rows from the left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; All orders; order 104 shows NULL for customer.&lt;/p&gt;

&lt;h3&gt;
  
  
  FULL OUTER JOIN
&lt;/h3&gt;

&lt;p&gt;Returns all rows from both tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; All customers and all orders; NULLs where no match.&lt;/p&gt;

&lt;h3&gt;
  
  
  LEFT ANTI JOIN
&lt;/h3&gt;

&lt;p&gt;Returns rows from the left table that have no match in the right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Customers 3 and 4 (no orders).&lt;/p&gt;

&lt;h3&gt;
  
  
  RIGHT ANTI JOIN
&lt;/h3&gt;

&lt;p&gt;Returns rows from the right table that have no match in the left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Order 104 (orphaned order).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to create:&lt;/strong&gt; Merge Queries &amp;gt; select tables &amp;gt; choose join kind &amp;gt; expand column.&lt;/p&gt;
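
&lt;p&gt;Power Query performs these joins through the Merge Queries dialog rather than SQL text, but the same logic can be sketched in standard SQL against the sample tables above, named &lt;code&gt;Customers&lt;/code&gt; and &lt;code&gt;Orders&lt;/code&gt; here as an assumption. The anti joins are written as a &lt;code&gt;LEFT JOIN&lt;/code&gt; plus an &lt;code&gt;IS NULL&lt;/code&gt; filter, a common pattern because standard SQL has no anti-join keyword. &lt;code&gt;RIGHT JOIN&lt;/code&gt; and &lt;code&gt;FULL OUTER JOIN&lt;/code&gt; follow the same shape in databases that support them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- The two sample tables from this article
CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName VARCHAR(50)
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER,
    Amount     INTEGER
);
INSERT INTO Customers VALUES (1, 'John Smith'), (2, 'Sarah Jones'), (3, 'Mike Brown'), (4, 'Emma Wilson');
INSERT INTO Orders VALUES (101, 1, 100), (102, 1, 150), (103, 2, 200), (104, 5, 75);

-- INNER JOIN: only customers with matching orders (orders 101, 102, 103)
SELECT c.CustomerID, c.CustomerName, o.OrderID, o.Amount
FROM Customers AS c
INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID;

-- LEFT JOIN: every customer; customers 3 and 4 get NULL order columns
SELECT c.CustomerID, c.CustomerName, o.OrderID, o.Amount
FROM Customers AS c
LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID;

-- LEFT ANTI JOIN: customers with no orders (3 and 4)
SELECT c.CustomerID, c.CustomerName
FROM Customers AS c
LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID
WHERE o.OrderID IS NULL;

-- RIGHT ANTI JOIN, written as a LEFT JOIN with the tables swapped: order 104
SELECT o.OrderID, o.CustomerID, o.Amount
FROM Orders AS o
LEFT JOIN Customers AS c ON c.CustomerID = o.CustomerID
WHERE c.CustomerID IS NULL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;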




&lt;h2&gt;
  
  
  Joins vs. Relationships
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Joins&lt;/th&gt;
&lt;th&gt;Relationships&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Power Query&lt;/td&gt;
&lt;td&gt;Model View&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Combine into one table&lt;/td&gt;
&lt;td&gt;Keep tables separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use for preparation&lt;/td&gt;
&lt;td&gt;Use for analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt; Use relationships unless you specifically need a flat table.&lt;/p&gt;




&lt;h2&gt;
  
  
  Power BI Relationships
&lt;/h2&gt;

&lt;p&gt;Create relationships in &lt;strong&gt;Model View&lt;/strong&gt; (drag between columns) or &lt;strong&gt;Manage Relationships&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Relationship Types
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;One-to-Many (1:M)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One row matches many rows. Most common.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Many-to-Many (M:M)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Many match many. Use bridge table if possible.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;One-to-One (1:1)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One matches one. Rare.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Active vs. Inactive
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Active (solid):&lt;/strong&gt; Used automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inactive (dotted):&lt;/strong&gt; Use with &lt;code&gt;USERELATIONSHIP&lt;/code&gt; in DAX&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cardinality &amp;amp; Cross-Filter
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cardinality:&lt;/strong&gt; Power BI sets it based on the data; fact-to-dimension relationships are usually Many-to-One&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-filter:&lt;/strong&gt; Single direction is the default for one-to-many; Both (bidirectional) can create ambiguous filter paths and slow down queries&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fact vs. Dimension Tables
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fact Tables&lt;/th&gt;
&lt;th&gt;Dimension Tables&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Numeric, measurable data&lt;/td&gt;
&lt;td&gt;Descriptive attributes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grows with transactions&lt;/td&gt;
&lt;td&gt;Relatively static&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Foreign keys&lt;/td&gt;
&lt;td&gt;Primary keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Examples: Sales, Orders&lt;/td&gt;
&lt;td&gt;Examples: Customers, Products, Date&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Schemas
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Star Schema
&lt;/h3&gt;

&lt;p&gt;Fact table in center, dimensions connected directly. &lt;strong&gt;Optimal for Power BI.&lt;/strong&gt;&lt;/p&gt;
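
&lt;p&gt;In SQL terms, a star schema can be sketched as one fact table holding measures and foreign keys, with each dimension exactly one join away. The table and column names below are hypothetical; in Power BI the joins are replaced by model relationships, but the shape is the same.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical star schema: two dimensions around one fact table
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name VARCHAR(50)
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    VARCHAR(50)
);
-- The fact table holds the measure plus a foreign key to each dimension
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    amount       INTEGER
);

INSERT INTO dim_customer VALUES (1, 'John Smith'), (2, 'Sarah Jones');
INSERT INTO dim_product VALUES (10, 'Books'), (20, 'Games');
INSERT INTO fact_sales VALUES (1, 1, 10, 100), (2, 1, 20, 150), (3, 2, 10, 200);

-- Every dimension joins directly to the fact table, then the measure is aggregated
SELECT d_c.customer_name, d_p.category, SUM(f.amount) AS total_amount
FROM fact_sales AS f
JOIN dim_customer AS d_c ON d_c.customer_key = f.customer_key
JOIN dim_product  AS d_p ON d_p.product_key  = f.product_key
GROUP BY d_c.customer_name, d_p.category;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;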

&lt;h3&gt;
  
  
  Snowflake Schema
&lt;/h3&gt;

&lt;p&gt;Dimensions normalized into sub-tables. Avoid if possible; the extra relationships can hurt performance and make the model harder to use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flat Table
&lt;/h3&gt;

&lt;p&gt;One table with all data. Only for simple exports; avoid for reports.&lt;/p&gt;




&lt;h2&gt;
  
  
  Role-Playing Dimensions
&lt;/h2&gt;

&lt;p&gt;One dimension used multiple ways (e.g., Date table used for Order Date and Ship Date).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt; One active relationship, one inactive, use &lt;code&gt;USERELATIONSHIP&lt;/code&gt; when needed.&lt;/p&gt;
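
&lt;p&gt;Outside Power BI, the same idea appears in SQL as joining one dimension table twice under different aliases. The sketch below uses hypothetical &lt;code&gt;dim_date&lt;/code&gt; and &lt;code&gt;fact_orders&lt;/code&gt; tables; in a Power BI model you would keep one active and one inactive relationship instead, as described above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical date dimension and a fact table with two date keys
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,  -- yyyymmdd, e.g. 20260330
    month_name VARCHAR(20)
);
CREATE TABLE fact_orders (
    order_id       INTEGER PRIMARY KEY,
    order_date_key INTEGER REFERENCES dim_date (date_key),
    ship_date_key  INTEGER REFERENCES dim_date (date_key)
);
INSERT INTO dim_date VALUES (20260330, 'March'), (20260402, 'April');
INSERT INTO fact_orders VALUES (1, 20260330, 20260402);

-- The same dim_date table plays two roles through two aliases
SELECT f.order_id,
       od.month_name AS order_month,
       sd.month_name AS ship_month
FROM fact_orders AS f
JOIN dim_date AS od ON od.date_key = f.order_date_key
JOIN dim_date AS sd ON sd.date_key = f.ship_date_key;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;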




&lt;h2&gt;
  
  
  Common Issues &amp;amp; Fixes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Circular dependencies&lt;/td&gt;
&lt;td&gt;Remove unnecessary relationship&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Many-to-many&lt;/td&gt;
&lt;td&gt;Add bridge table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bi-directional filtering&lt;/td&gt;
&lt;td&gt;Change to Single direction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Missing date table&lt;/td&gt;
&lt;td&gt;Create and mark date table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrong cardinality&lt;/td&gt;
&lt;td&gt;Clean duplicates in dimension&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
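
&lt;p&gt;For the "wrong cardinality" row, one quick way to find the duplicates is a &lt;code&gt;GROUP BY&lt;/code&gt; on the dimension's key, run against the source data or a SQL copy of the table. The sketch assumes a version of the &lt;code&gt;Customers&lt;/code&gt; sample table from earlier in this article with one hypothetical duplicated row; any key it returns means the table cannot be the "one" side of a one-to-many relationship until it is cleaned.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Dimension without a key constraint, so a duplicate row can slip in
CREATE TABLE Customers (
    CustomerID   INTEGER,
    CustomerName VARCHAR(50)
);
INSERT INTO Customers (CustomerID, CustomerName)
VALUES (1, 'John Smith'), (2, 'Sarah Jones'), (2, 'Sarah Jones'), (3, 'Mike Brown');

-- Keys that appear more than once break a one-to-many relationship
SELECT CustomerID, COUNT(*) AS copies
FROM Customers
GROUP BY CustomerID
HAVING COUNT(*) &amp;gt; 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;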




&lt;h2&gt;
  
  
  Quick Steps to Build a Model
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Load data&lt;/strong&gt; (Get Data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean in Power Query&lt;/strong&gt; (fix types, remove duplicates)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create relationships&lt;/strong&gt; (drag in Model View)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure&lt;/strong&gt; (Many-to-One, Single direction)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hide technical columns&lt;/strong&gt; (Properties &amp;gt; Is Hidden)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mark date table&lt;/strong&gt; (right-click &amp;gt; Mark as Date Table)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; (Manage Relationships)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;relationships&lt;/strong&gt; over joins&lt;/li&gt;
&lt;li&gt;Build &lt;strong&gt;star schema&lt;/strong&gt; with fact at center&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;one-to-many&lt;/strong&gt; with &lt;strong&gt;single&lt;/strong&gt; cross-filter&lt;/li&gt;
&lt;li&gt;Create a &lt;strong&gt;date table&lt;/strong&gt; for time intelligence&lt;/li&gt;
&lt;li&gt;Clean data in &lt;strong&gt;Power Query&lt;/strong&gt;, model in &lt;strong&gt;Model View&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A solid model saves hours of troubleshooting. Get the foundation right first.&lt;/p&gt;

</description>
      <category>schemas</category>
      <category>datamodelling</category>
      <category>data</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How Linux is Used in Real-World Data Engineering</title>
      <dc:creator>Lorraine Njagi</dc:creator>
      <pubDate>Sat, 28 Mar 2026 21:09:43 +0000</pubDate>
      <link>https://dev.to/lorraine_njagi_k/how-linux-is-used-in-real-world-data-engineering-3elk</link>
      <guid>https://dev.to/lorraine_njagi_k/how-linux-is-used-in-real-world-data-engineering-3elk</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;What is Data Engineering?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data engineering is the practice of transforming data and preparing it for analysis or use by data analysts and data scientists. It ensures that both the infrastructure and the data are in the right form: data engineers convert vast amounts of raw data into usable datasets.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why Is Linux Used in Data Engineering?&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Linux is the most widely used operating system for virtual machines on cloud platforms such as &lt;strong&gt;AWS, Azure and GCP&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Tools such as &lt;strong&gt;Kafka, Hadoop, Spark and Apache&lt;/strong&gt; are most commonly deployed on Linux and fit naturally into its open-source ecosystem.&lt;/li&gt;
&lt;li&gt;Linux offers the &lt;strong&gt;performance and stability&lt;/strong&gt; needed to run large data pipelines for long periods without reboots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation and scripting:&lt;/strong&gt;
Linux offers a command-line interface (CLI) and tools such as &lt;code&gt;cron&lt;/code&gt; that automate data tasks and Extract, Transform, Load (ETL) pipelines.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Linux Basics for Data Engineering&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are a few Linux basics that data engineers should be aware of.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1. The File System Structure&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Linux file system takes the structure of a &lt;strong&gt;tree&lt;/strong&gt;, with the starting point as the &lt;strong&gt;root (&lt;code&gt;/&lt;/code&gt;) directory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Important directories under the root are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/etc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/var&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/bin&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/tmp&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;/etc&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/etc&lt;/code&gt; contains configuration files and folders. It controls the configuration of the entire system: how the operating system, its services and user accounts behave. For example, &lt;code&gt;/etc/passwd&lt;/code&gt; contains details about user accounts.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;/var&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This folder contains variable data that changes continuously during system operations.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System logs&lt;/li&gt;
&lt;li&gt;Authorization logs&lt;/li&gt;
&lt;li&gt;Databases&lt;/li&gt;
&lt;li&gt;Runtime state files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It matters to a data engineer in several ways, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/var/log/              &lt;span class="c"&gt;# System and service logs (e.g. Spark, Kafka, Apache), depending on configuration&lt;/span&gt;
/var/lib/postgresql/   &lt;span class="c"&gt;# PostgreSQL data files (default location on Debian/Ubuntu)&lt;/span&gt;
/var/spool/cron        &lt;span class="c"&gt;# Per-user crontabs for scheduled cron jobs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;/bin&lt;/strong&gt;
&lt;/h3&gt;

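&lt;p&gt;This directory contains essential command-line programs that every user can run, even in single-user (recovery) mode, such as &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;cp&lt;/code&gt; and &lt;code&gt;mv&lt;/code&gt;. &lt;code&gt;cd&lt;/code&gt; is the exception. It is built into the shell, because a separate program cannot change the shell's own working directory. On many modern distributions, &lt;code&gt;/bin&lt;/code&gt; is simply a link to &lt;code&gt;/usr/bin&lt;/code&gt;.&lt;br&gt;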
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt;   &lt;span class="c"&gt;# Used to change directory&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt;   &lt;span class="c"&gt;# Used to list files and directories&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
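&lt;p&gt;The shell itself can tell you whether a command is a builtin or a program on disk. The following is a quick sketch. The exact wording of the output varies slightly between shells.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: find out whether a command is a shell builtin or a program on disk.
type cd          # reports that cd is a shell builtin
type ls          # reports the path of the ls program, e.g. /usr/bin/ls
command -v ls    # prints only that path
ls -ld /bin      # on merged-/usr systems this is a symlink: /bin -> usr/bin
```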



&lt;p&gt;&lt;strong&gt;ls options:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt;   &lt;span class="c"&gt;# Include hidden files&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;   &lt;span class="c"&gt;# Show file permissions and details&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
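&lt;p&gt;Hidden files are simply names that start with a dot, which is why plain &lt;code&gt;ls&lt;/code&gt; skips them. The following sketch shows the difference in a throwaway directory. The file names are hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: compare ls, ls -a and ls -l in a scratch directory.
demo=$(mktemp -d)
touch "$demo/data.csv" "$demo/.env"   # .env starts with a dot, so it is hidden

ls "$demo"        # shows only data.csv
ls -a "$demo"     # also shows ., .. and .env
ls -la "$demo"    # options combine: hidden files plus the long listing
ls -lh "$demo"    # -h prints sizes as K, M, G instead of raw bytes
```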






&lt;h3&gt;
  
  
  &lt;strong&gt;Single User / Recovery Mode&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Single-user&lt;/strong&gt; or &lt;strong&gt;recovery mode&lt;/strong&gt; is a special Linux boot mode used for repair and maintenance.&lt;/p&gt;

&lt;p&gt;In this mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only one user, the &lt;strong&gt;root user&lt;/strong&gt;, is logged in&lt;/li&gt;
&lt;li&gt;Networking is not started&lt;/li&gt;
&lt;li&gt;No graphical interface (GUI) is started&lt;/li&gt;
&lt;li&gt;The system can be repaired, e.g. by resetting a password&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. File Permissions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;File permissions are relevant to a data engineer because the role involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Moving data between systems&lt;/li&gt;
&lt;li&gt;Automatic running of scripts&lt;/li&gt;
&lt;li&gt;Handling sensitive credentials&lt;/li&gt;
&lt;li&gt;Maintaining data integrity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Permission Table&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Permission&lt;/th&gt;
&lt;th&gt;Symbol&lt;/th&gt;
&lt;th&gt;Numerical Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read&lt;/td&gt;
&lt;td&gt;r&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write&lt;/td&gt;
&lt;td&gt;w&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execute&lt;/td&gt;
&lt;td&gt;x&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

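&lt;p&gt;Permissions are changed with &lt;code&gt;chmod&lt;/code&gt; (change mode). In numeric form, each digit is the sum of the values above. The three digits apply to the owner, the group and everyone else (others), in that order:&lt;br&gt;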
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt;   &lt;span class="c"&gt;# change mode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
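&lt;p&gt;The arithmetic can be checked with a small shell function. &lt;code&gt;perm_to_octal&lt;/code&gt; is a hypothetical helper written for this illustration. It turns a symbolic string like the one &lt;code&gt;ls -l&lt;/code&gt; prints (without the leading file-type character) into the numeric mode that &lt;code&gt;chmod&lt;/code&gt; expects. It only understands r, w and x, not special bits such as setuid.&lt;/p&gt;

```shell
#!/bin/sh
# Convert a symbolic permission string such as rwxr-x--- into its octal form.
# Each triplet (owner, group, others) is summed: r=4, w=2, x=1.
perm_to_octal() {
    perms="$1"
    result=""
    for offset in 0 3 6; do
        digit=0
        # Take characters offset+1 to offset+3, e.g. 1-3 for the owner triplet
        triplet=$(printf '%s' "$perms" | cut -c "$((offset + 1))-$((offset + 3))")
        case "$triplet" in r??) digit=$((digit + 4)) ;; esac
        case "$triplet" in ?w?) digit=$((digit + 2)) ;; esac
        case "$triplet" in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-x---   # prints 750
perm_to_octal rw-r--r--   # prints 644
```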



&lt;h3&gt;
  
  
  &lt;strong&gt;Example&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Create a directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;Yourname
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;touch &lt;/span&gt;file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Give the owner full access (read, write and execute: 4 + 2 + 1 = 7) and everyone else none:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;700 file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep full access for the owner, and allow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Group → read &amp;amp; execute&lt;/li&gt;
&lt;li&gt;Others → read only
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;731 file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
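&lt;p&gt;To confirm what a mode actually grants, &lt;code&gt;stat&lt;/code&gt; can print it in both numeric and symbolic form. The following sketch assumes GNU coreutils &lt;code&gt;stat&lt;/code&gt; (macOS uses different flags). It also shows the symbolic way of writing the same mode. The &lt;code&gt;.env&lt;/code&gt; credentials file is a hypothetical example of why permissions matter in pipelines.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: set permissions and confirm them with stat.
# Assumes GNU coreutils stat (-c); file names are hypothetical.
demo=$(mktemp -d)
touch "$demo/file.txt"

chmod 754 "$demo/file.txt"
stat -c '%a %A' "$demo/file.txt"      # 754 -rwxr-xr--

# The same mode written symbolically
chmod u=rwx,g=rx,o=r "$demo/file.txt"
stat -c '%a %A' "$demo/file.txt"      # 754 -rwxr-xr--

# Credentials: the owner may read and write, nobody else gets anything
printf 'DB_PASSWORD=change-me\n' > "$demo/.env"
chmod 600 "$demo/.env"
stat -c '%a %A' "$demo/.env"          # 600 -rw-------
```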



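&lt;p&gt;To change a file's owner and group, use &lt;code&gt;chown&lt;/code&gt;. Handing a file to another user requires root privileges.&lt;br&gt;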
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chown&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;3. Disk Usage&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A data engineer should pay attention to disk usage because it directly affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance&lt;/li&gt;
&lt;li&gt;Capacity for incoming data&lt;/li&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;Pipeline failures (jobs fail when a disk fills up)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;du&lt;/span&gt; &lt;span class="nt"&gt;-sh&lt;/span&gt;   &lt;span class="c"&gt;# Check file size&lt;/span&gt;
&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;    &lt;span class="c"&gt;# Check disk space&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
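&lt;p&gt;Because a full disk is a common cause of pipeline failures, it can be worth checking free space before a heavy step runs. The following sketch assumes GNU coreutils (&lt;code&gt;df --output&lt;/code&gt;, &lt;code&gt;sort -h&lt;/code&gt;). The &lt;code&gt;check_disk&lt;/code&gt; helper and the 90% threshold are hypothetical choices for illustration.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: guard a pipeline step against a nearly full disk.
# Assumes GNU coreutils (df --output, sort -h); the 90% threshold is an arbitrary example.
check_disk() {
    target="${1:-/}"
    threshold="${2:-90}"
    # df --output=pcent prints a header, then a value such as " 42%"; keep only the digits
    used=$(df --output=pcent "$target" | tail -n 1 | tr -dc '0-9')
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $target is ${used}% full"
        return 1
    fi
    echo "OK: $target is ${used}% full"
}

# Five largest files or directories under /var/log (unreadable paths are skipped silently)
du -ah /var/log 2>/dev/null | sort -rh | head -n 5

check_disk / 90
```

&lt;p&gt;A calling script could run &lt;code&gt;check_disk /data 90 || exit 1&lt;/code&gt; so the job stops early instead of failing halfway through a write.&lt;/p&gt;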






&lt;h2&gt;
  
  
  &lt;strong&gt;4. Searching&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data engineers work with large files and deep directory trees, so being able to search them quickly is essential.&lt;/p&gt;

&lt;p&gt;Commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt;    &lt;span class="c"&gt;# Search within a file&lt;/span&gt;
find    &lt;span class="c"&gt;# Search for files and directories&lt;/span&gt;
locate  &lt;span class="c"&gt;# Fast search using index&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
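&lt;p&gt;The following sketch shows a few typical patterns, run against a throwaway directory so that the results are predictable. The log and CSV files are hypothetical stand-ins for real pipeline output.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: common search patterns on a scratch directory (file names are hypothetical).
workdir=$(mktemp -d)
printf 'INFO start\nERROR disk full\nINFO retry\nERROR timeout\n' > "$workdir/app.log"
printf 'id,amount\n1,20\n' > "$workdir/sales.csv"

grep -n 'ERROR' "$workdir/app.log"      # matching lines, prefixed with line numbers
grep -c 'ERROR' "$workdir/app.log"      # number of matching lines: 2
grep -rl 'ERROR' "$workdir"             # names of files that contain a match

find "$workdir" -type f -name '*.csv'   # files whose name matches a pattern
find "$workdir" -type f -mtime -1       # files modified in the last 24 hours
find "$workdir" -type f -size +100M     # files larger than 100 MiB (none here)
```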






&lt;h2&gt;
  
  
  &lt;strong&gt;5. Process Management&lt;/strong&gt;
&lt;/h2&gt;

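&lt;p&gt;A &lt;strong&gt;process&lt;/strong&gt; is a running instance of a program. Each process has a numeric process ID (PID), which commands such as &lt;code&gt;kill&lt;/code&gt; use to refer to it.&lt;/p&gt;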

&lt;p&gt;Commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ps                 &lt;span class="c"&gt;# View running processes&lt;/span&gt;
ps &lt;span class="nt"&gt;-u&lt;/span&gt; yourusername &lt;span class="c"&gt;# View your processes&lt;/span&gt;
top                &lt;span class="c"&gt;# Live process monitor&lt;/span&gt;
htop               &lt;span class="c"&gt;# Modern version of top&lt;/span&gt;
&lt;span class="nb"&gt;kill&lt;/span&gt;               &lt;span class="c"&gt;# End a process&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
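&lt;p&gt;PIDs are also useful inside scheduled jobs. A common pattern is a PID file, which stops a new run from starting while the previous one is still alive. The following is a minimal sketch. The PID file path is hypothetical. Checking and writing the file are not atomic, so &lt;code&gt;flock&lt;/code&gt; from util-linux is a more robust choice for production.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: a PID file that stops a scheduled job from overlapping with a previous run.
# The default PIDFILE path is a hypothetical example.
PIDFILE="${PIDFILE:-/tmp/etl_job.pid}"

# kill -0 sends no signal; it only succeeds if the process exists
# and we are allowed to signal it.
is_running() {
    kill -0 "$1" 2>/dev/null
}

if [ -f "$PIDFILE" ]; then
    old_pid=$(cat "$PIDFILE")
    if is_running "$old_pid"; then
        echo "Previous run (PID $old_pid) is still active; exiting."
        exit 0
    fi
fi

echo "$$" > "$PIDFILE"    # $$ expands to this script's own PID
echo "Running job as PID $$"
# ... the real work of the job would go here ...
rm -f "$PIDFILE"
```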






&lt;h2&gt;
  
  
  &lt;strong&gt;Real World Data Engineering Workflow on Linux&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Linux servers are used to manage the entire data pipeline.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;data pipeline&lt;/strong&gt; refers to the entire process of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data Collection → Cleaning → Formatting → Storage → Analysis → Presentation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;Data Pipelines&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Batch Pipelines&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Collect data over a period, then process it in chunks (batches)&lt;/li&gt;
&lt;li&gt;Run on a schedule (e.g. hourly or daily)&lt;/li&gt;
&lt;li&gt;Suitable for historical data&lt;/li&gt;
&lt;li&gt;Used for computationally expensive operations&lt;/li&gt;
&lt;li&gt;Tools: Apache Hadoop (processing), Apache Airflow (scheduling and orchestration)&lt;/li&gt;
&lt;/ul&gt;
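&lt;p&gt;At its simplest, a batch step processes every file that arrived since the last run and then archives it, so the next run does not reprocess it. The following sketch uses a hypothetical &lt;code&gt;run_batch&lt;/code&gt; function and directory layout. Its "processing" only counts data rows, standing in for real work.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of a batch step: process every new CSV file, then move it to an archive folder.
# Directory names and the "processing" (counting data rows) are hypothetical placeholders.
run_batch() {
    incoming="$1"
    processed="$2"
    mkdir -p "$processed"
    for f in "$incoming"/*.csv; do
        [ -e "$f" ] || continue                        # glob matched nothing: no new files
        rows=$(tail -n +2 "$f" | wc -l | tr -d ' ')    # data rows, excluding the header
        echo "$(basename "$f"): $rows rows"
        mv "$f" "$processed/"
    done
}

base=$(mktemp -d)
mkdir -p "$base/incoming"
printf 'id,amount\n1,10\n2,15\n' > "$base/incoming/sales_day1.csv"
run_batch "$base/incoming" "$base/processed"   # prints: sales_day1.csv: 2 rows
```

&lt;p&gt;On a server, a scheduler such as cron or Airflow would call a script like this at a fixed time instead of a person running it by hand.&lt;/p&gt;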

&lt;h3&gt;
  
  
  &lt;strong&gt;Real-Time Processing Pipelines&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data is analyzed continuously as it arrives&lt;/li&gt;
&lt;li&gt;Used in fraud detection and monitoring systems&lt;/li&gt;
&lt;li&gt;Requires low latency: results are typically needed within seconds of the event&lt;/li&gt;
&lt;/ul&gt;
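&lt;p&gt;The difference from batch processing is that each event is handled the moment it is read, not after a whole batch has been collected. The following toy sketch flags large transactions line by line. The &lt;code&gt;txn_id,amount&lt;/code&gt; format, the 1000 threshold and the log path in the comment are hypothetical, and amounts must be whole numbers. Real systems would use a streaming platform such as Kafka instead of a shell loop.&lt;/p&gt;

```shell
#!/bin/sh
# Toy stream processor: read one event per line from stdin and react immediately.
# Event format "txn_id,amount" (integer amounts) and the threshold are hypothetical.
flag_large_amounts() {
    threshold="${1:-1000}"
    while IFS=, read -r txn_id amount; do
        if [ "$amount" -gt "$threshold" ]; then
            echo "ALERT: transaction $txn_id amount $amount"
        fi
    done
}

# In a live setting, the input could be a growing file, e.g.:
#   tail -n 0 -F /var/log/transactions.csv | flag_large_amounts 1000
printf 't1,50\nt2,2500\nt3,999\n' | flag_large_amounts 1000
# prints: ALERT: transaction t2 amount 2500
```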




&lt;h2&gt;
  
  
  &lt;strong&gt;Stages of a Data Pipeline&lt;/strong&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Ingestion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ingestion is the process of bringing data from different sources into your own storage.&lt;/p&gt;

&lt;p&gt;Sources include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data lakes&lt;/li&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;IoT devices&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Two methods of ingestion&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Batch ingestion&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streaming ingestion&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
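&lt;p&gt;In batch ingestion, incoming files are often copied into a "landing" area organized by date, so each day's data stays separate and can be reprocessed if needed. The following sketch uses a hypothetical &lt;code&gt;ingest_file&lt;/code&gt; helper. The &lt;code&gt;dt=YYYY-MM-DD&lt;/code&gt; folder naming is a common convention, assumed here for illustration.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of batch ingestion into a date-partitioned landing area.
# The dt=YYYY-MM-DD folder layout is a common convention, assumed here for illustration.
ingest_file() {
    src="$1"                  # file exported by some source system
    landing="$2"              # root of the landing area
    day="${3:-$(date +%F)}"   # partition date, defaults to today (YYYY-MM-DD)
    dest="$landing/dt=$day"
    mkdir -p "$dest"
    cp "$src" "$dest/"
    echo "$dest/$(basename "$src")"
}

tmp=$(mktemp -d)
printf 'id,reading\n1,21.5\n' > "$tmp/sensors.csv"
ingest_file "$tmp/sensors.csv" "$tmp/landing" 2026-04-14
# prints the new path, ending in /landing/dt=2026-04-14/sensors.csv
```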

&lt;h3&gt;
  
  
  &lt;strong&gt;Ingestion Tool&lt;/strong&gt;
&lt;/h3&gt;

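&lt;p&gt;&lt;strong&gt;Apache Kafka&lt;/strong&gt;: a distributed event-streaming platform that sits between data sources (producers) and destinations (consumers). It stores records in topics, so each side can write and read at its own pace.&lt;/p&gt;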




&lt;h2&gt;
  
  
  &lt;strong&gt;2. Transformation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cleaning&lt;/li&gt;
&lt;li&gt;Restructuring&lt;/li&gt;
&lt;li&gt;Enriching&lt;/li&gt;
&lt;li&gt;Standardization&lt;/li&gt;
&lt;li&gt;Aggregation&lt;/li&gt;
&lt;li&gt;Validation&lt;/li&gt;
&lt;li&gt;Filtering&lt;/li&gt;
&lt;/ul&gt;
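&lt;p&gt;Several of these steps can be seen together in a small &lt;code&gt;awk&lt;/code&gt; program. The following sketch standardizes a text column, drops invalid rows and aggregates per group. The &lt;code&gt;region,amount&lt;/code&gt; layout, the sample rows and the &lt;code&gt;transform&lt;/code&gt; name are hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: clean, filter and aggregate a small CSV with awk.
# Column layout (region,amount) and the sample rows are hypothetical.
transform() {
    awk -F, '
        NR == 1 { next }                 # skip the header row
        {
            gsub(/^ +| +$/, "", $1)      # cleaning: trim spaces around the region
            $1 = tolower($1)             # standardization: lower-case the region
            if ($2 == "") next           # validation and filtering: drop rows with no amount
            total[$1] += $2              # aggregation: sum amounts per region
        }
        END { for (r in total) print r "," total[r] }
    ' "$1" | sort
}

f=$(mktemp)
printf 'region,amount\nNorth,10\n north ,5\nSouth,\nSouth,7\n' > "$f"
transform "$f"
# prints:
# north,15
# south,7
```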




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Storage&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Storage types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Relational databases (SQL)&lt;/li&gt;
&lt;li&gt;Data warehouses (Azure Synapse, Amazon Redshift)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;4. Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Descriptive&lt;/td&gt;
&lt;td&gt;What happened&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diagnostic&lt;/td&gt;
&lt;td&gt;Why it happened&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predictive&lt;/td&gt;
&lt;td&gt;What will happen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prescriptive&lt;/td&gt;
&lt;td&gt;What should be done&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Linux provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control&lt;/li&gt;
&lt;li&gt;Automation&lt;/li&gt;
&lt;li&gt;Scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, recurring jobs can be automated with the &lt;strong&gt;cron&lt;/strong&gt; scheduler.&lt;/p&gt;
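&lt;p&gt;Each cron entry has five time fields (minute, hour, day of month, month, day of week) followed by the command to run. You can edit entries with &lt;code&gt;crontab -e&lt;/code&gt; and list them with &lt;code&gt;crontab -l&lt;/code&gt;. The entries below are a hypothetical example, and the script paths are placeholders. Cron runs jobs with a minimal environment, so absolute paths are safer. A literal % in a crontab command must be escaped, because cron treats it as a newline.&lt;/p&gt;

```text
# m   h   dom  mon  dow  command
0     2   *    *    *    /opt/pipelines/daily_load.sh >> /tmp/daily_load.log 2>> /tmp/daily_load.err
*/15  *   *    *    *    /opt/pipelines/check_disk.sh
```

&lt;p&gt;The first entry runs every day at 02:00 and appends its output and errors to log files. The second runs every 15 minutes.&lt;/p&gt;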




&lt;h2&gt;
  
  
  &lt;strong&gt;Dealing With Large Files in Linux&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Example: tail&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /var/log/syslog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



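&lt;p&gt;This command works on systems that write a syslog file, such as Debian and Ubuntu. RHEL-based systems typically use &lt;code&gt;/var/log/messages&lt;/code&gt; instead. It:&lt;/p&gt;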

&lt;ul&gt;
&lt;li&gt;Prints the last 10 lines of the file&lt;/li&gt;
&lt;li&gt;Keeps the file open and watches it&lt;/li&gt;
&lt;li&gt;Prints new lines as soon as they are appended&lt;/li&gt;
&lt;li&gt;Stops when you press &lt;strong&gt;CTRL + C&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

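&lt;p&gt;On Linux, GNU &lt;code&gt;tail -f&lt;/code&gt; uses &lt;strong&gt;inotify&lt;/strong&gt;, a kernel mechanism that notifies a program when a file changes, so tail does not have to keep re-reading the file. Where inotify is unavailable, it falls back to checking the file about once per second.&lt;/p&gt;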
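&lt;p&gt;Other standard tools handle large files without loading them into an editor. &lt;code&gt;head&lt;/code&gt; and &lt;code&gt;tail -n&lt;/code&gt; show the start and end of a file, &lt;code&gt;wc -l&lt;/code&gt; counts its lines, and &lt;code&gt;split&lt;/code&gt; breaks it into chunks that can be processed in parallel. For logs that get rotated, &lt;code&gt;tail -F&lt;/code&gt; follows the file name rather than the original open file. In the following sketch, the large file is generated on the spot as a hypothetical stand-in for a real export.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: inspect and split a large file without opening it in an editor.
# big.csv is generated here as a hypothetical stand-in for a large export.
workdir=$(mktemp -d)
seq 1 250000 | sed 's/^/row,/' > "$workdir/big.csv"

head -n 3 "$workdir/big.csv"    # peek at the first rows
tail -n 3 "$workdir/big.csv"    # peek at the last rows
wc -l "$workdir/big.csv"        # count lines (250000 here)

# Split into 100000-line chunks named part_aa, part_ab, part_ac
split -l 100000 "$workdir/big.csv" "$workdir/part_"
ls "$workdir"
```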




&lt;h2&gt;
  
  
  &lt;strong&gt;Final Note&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This article does not cover every aspect of Linux used in data engineering, and I look forward to sharing more in future articles. There is a first time for everything!&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>linux</category>
      <category>linuxfordataengineering</category>
    </item>
  </channel>
</rss>
