<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: grace wambua</title>
    <description>The latest articles on DEV Community by grace wambua (@grace_wambua).</description>
    <link>https://dev.to/grace_wambua</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3831263%2F1bedc93a-8970-41f2-baae-080f9377930f.webp</url>
      <title>DEV Community: grace wambua</title>
      <link>https://dev.to/grace_wambua</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/grace_wambua"/>
    <language>en</language>
    <item>
      <title>The Python Data Analytics Handbook: A Guide for Beginners</title>
      <dc:creator>grace wambua</dc:creator>
      <pubDate>Sun, 17 May 2026 19:23:07 +0000</pubDate>
      <link>https://dev.to/grace_wambua/the-python-data-analytics-handbook-a-guide-for-beginners-4n0b</link>
      <guid>https://dev.to/grace_wambua/the-python-data-analytics-handbook-a-guide-for-beginners-4n0b</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Python is a computer programming language often used to build websites and software, automate tasks, and conduct data analysis.&lt;/p&gt;

&lt;p&gt;Python allows analysts to handle the entire data lifecycle, from collecting and cleaning raw data to performing complex statistical modeling and creating interactive visualizations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Python is popular in data analytics
&lt;/h2&gt;

&lt;p&gt;Python was designed to be human-friendly; easy to read and write.&lt;br&gt;
It has become the most popular tool in data analytics because it’s &lt;strong&gt;approachable&lt;/strong&gt; but &lt;strong&gt;powerful&lt;/strong&gt;. It does the heavy lifting for you, allowing you to focus on the story your data is telling rather than struggling with the tools themselves.&lt;br&gt;
For a beginner, Python isn't just another skill, it’s the bridge that takes you from being overwhelmed by numbers to actually making sense of them.&lt;/p&gt;
&lt;h2&gt;
  
  
  Essential Python Libraries for Data Analysis
&lt;/h2&gt;

&lt;p&gt;In Python, a &lt;em&gt;library&lt;/em&gt; is a collection of pre-written code that you can use to perform specific tasks without writing everything from scratch.&lt;br&gt;
These libraries act like ready-made toolkits, each designed for a specific part of the workflow, from data manipulation to visualization. Let's look at the most essential ones every analyst should know:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pandas&lt;/strong&gt; (&lt;em&gt;Python Data Analysis&lt;/em&gt;)&lt;br&gt;
Pandas is an open-source tool for data manipulation. &lt;br&gt;
It is the most popular Python library for working with tabular data. It is similar to Excel sheets, but with far more power and flexibility. You can use it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy data loading from CSV, Excel, SQL, and JSON.&lt;/li&gt;
&lt;li&gt;Filtering, grouping, and merging datasets.&lt;/li&gt;
&lt;li&gt;Handling missing data and time-series analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;student.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# load the file
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Math&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# print just the math column
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. NumPy&lt;/strong&gt; (&lt;em&gt;Numerical Python&lt;/em&gt;)&lt;br&gt;
NumPy is the foundational library for scientific computing in Python. &lt;br&gt;
The NumPy library contains multidimensional array data structures, such as the homogeneous, N-dimensional &lt;code&gt;ndarray&lt;/code&gt;, and a large library of functions that operate efficiently on these data structures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;# list of numbers
&lt;/span&gt;&lt;span class="n"&gt;new_prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prices&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;   &lt;span class="c1"&gt;# add 2 to every number
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_prices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# print new list
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SciPy&lt;/strong&gt; (&lt;em&gt;Scientific Python&lt;/em&gt;)&lt;strong&gt;:&lt;/strong&gt;
Extends NumPy for advanced scientific and engineering computations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Matplotlib&lt;/strong&gt; &lt;br&gt;
Matplotlib is the most widely used Python library for creating static, animated, and interactive data visualizations. It has become a fundamental tool in data science and machine learning for exploring patterns and trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;students&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alex&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;John&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Anna&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;grades&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;58&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;students&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grades&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skyblue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# create a bar chart
&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Student Math Grades&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# add labels to chart
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Student Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Math Score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# display chart
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Seaborn&lt;/strong&gt; &lt;br&gt;
While Matplotlib handles the basics, Seaborn builds on it with elegant styles and simpler syntax for statistical visualizations. It is perfect for visualizing relationships and distributions in your data. Seaborn automatically handles color palettes, aesthetics, and complex charts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;  &lt;span class="c1"&gt;# handles the graph layout and displays it on your screen
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;  &lt;span class="c1"&gt;# opens and structures your student.csv file
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;  &lt;span class="c1"&gt;# colors and styles the bars automatically based on your columns
&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;student.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# load student file
&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;palette&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# draw the chart (Seaborn automatically reads the columns from df)
&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Student Performance Comparison&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# add titles 
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# show the window
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Scikit-learn&lt;/strong&gt; (often imported as &lt;code&gt;sklearn&lt;/code&gt;) &lt;br&gt;
This is a Python open-source machine learning library. Built on top of core scientific libraries like &lt;code&gt;NumPy&lt;/code&gt;, &lt;code&gt;SciPy&lt;/code&gt;, and &lt;code&gt;Matplotlib&lt;/code&gt;, it provides simple and efficient tools for predictive data analysis.&lt;br&gt;
&lt;strong&gt;Note:&lt;/strong&gt; Scikit-learn requires data to be formatted as numerical arrays to perform its calculations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;  &lt;span class="c1"&gt;# it formats data into a 2D column grid so the machine learning model can read it
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;

&lt;span class="c1"&gt;# classes attended vs. final Score
&lt;/span&gt;&lt;span class="n"&gt;classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# the computer sees that more classes = higher score
&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# create and train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#  predict the score for a student who attended 25 classes
&lt;/span&gt;&lt;span class="n"&gt;predicted_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; 
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Requests&lt;/strong&gt; &lt;br&gt;
Requests library is a simple and powerful tool to send HTTP requests and interact with web resources. It allows you to easily send &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;post&lt;/code&gt;, &lt;code&gt;put&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;patch&lt;/code&gt;, &lt;code&gt;head&lt;/code&gt; requests to web servers, handle responses, and work with REST APIs and web scraping tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;githubusercontent.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# connect to the raw student data file link on GitHub
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# print the raw data content from the internet
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Beautiful Soup:&lt;/strong&gt; For parsing HTML and XML data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Django &amp;amp; Flask:&lt;/strong&gt; Frameworks used to build powerful web applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TensorFlow &amp;amp; PyTorch:&lt;/strong&gt; Leading frameworks for deep learning and neural networks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Python is used to clean, analyze, and visualize data
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data cleaning&lt;/strong&gt; is a critical step in any data analysis or machine learning project. &lt;br&gt;
Essential Python-specific steps include: &lt;strong&gt;&lt;em&gt;handling missing values&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;standardizing data types&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;removing duplicates&lt;/em&gt;&lt;/strong&gt;, and &lt;strong&gt;&lt;em&gt;identifying outliers&lt;/em&gt;&lt;/strong&gt; to ensure analysis accuracy.&lt;br&gt;
Here are some best practices to keep in mind as you streamline your data cleaning process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Store raw data separately&lt;/strong&gt;&lt;br&gt;
Always keep the original!&lt;br&gt;
This is the number one most important tip when cleaning data. Keep a copy of the raw data files separate from the cleaned and processed versions. This ensures that you always have a reference point and can easily revert to the original data if needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document your data-cleaning code&lt;/strong&gt;&lt;br&gt;
Add comments to your code to explain the purpose of each cleaning step and any assumptions made.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Look out for unintended consequences&lt;/strong&gt;&lt;br&gt;
Make sure your data cleaning efforts aren’t significantly changing the distribution or introducing any unintended biases. Repeated data exploration after your cleaning efforts can help ensure you are on the right track.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep a data cleaning log&lt;/strong&gt;&lt;br&gt;
If you have a long cleaning process or one that is automated, you may want to maintain a separate document where you record the details of each cleaning step.&lt;br&gt;
Details such as the date, the specific action taken, and any issues encountered may be helpful down the road.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Write reusable functions&lt;/strong&gt;&lt;br&gt;
Identify common data cleaning tasks and encapsulate them into reusable functions. This allows you to apply the same cleaning steps to multiple datasets. This is especially helpful if you have company-specific abbreviations you want to map.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Moving to the &lt;strong&gt;Analysis&lt;/strong&gt; stage, this is the bridge where you extract meaning from your cleaned rows.&lt;br&gt;
Using tools like &lt;strong&gt;&lt;em&gt;Pandas&lt;/em&gt;&lt;/strong&gt;, you calculate averages, find the highest and lowest values, and group items into categories to see how they compare. &lt;br&gt;
This process takes a giant pile of cleaned data and summarizes it to a few important facts, so you know exactly what to show in your final charts.&lt;/p&gt;

&lt;p&gt;The final phase &lt;strong&gt;Visualization&lt;/strong&gt;, is where you turn those rows of numbers into insights that can easily be understood. &lt;br&gt;
This is achieved by using &lt;strong&gt;&lt;em&gt;Matplotlib&lt;/em&gt;&lt;/strong&gt; for foundational control, &lt;strong&gt;&lt;em&gt;Seaborn&lt;/em&gt;&lt;/strong&gt; for polished statistical graphics, and &lt;strong&gt;&lt;em&gt;Plotly&lt;/em&gt;&lt;/strong&gt; for interactivity. &lt;br&gt;
The process involves aggregating your data into summaries, selecting a chart type that fits the relationship and refining the output with clear labels and titles to ensure the data is immediately understood by the viewer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An example use case for the process:&lt;/strong&gt;&lt;br&gt;
Python sorts through a Kenyan bank's records and organizes it automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cleaning:&lt;/strong&gt; fixing messy M-Pesa transaction texts and converting USD deposits into Kenya Shillings automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyzing:&lt;/strong&gt; calculating which customers qualify for loans based on how regularly a customer’s balance in M-Shwari or KCB M-Pesa is growing or shrinking over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualizing:&lt;/strong&gt; uses maps to show where to open new shops and gauges that warn managers the moment an ATM runs out of cash.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-world examples of Python in data analytics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Data Science and Data Analysis
&lt;/h3&gt;

&lt;p&gt;Python is the leading language in data science. It offers powerful tools for data manipulation, visualization, and machine learning.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  A data analyst uses Pandas to clean large CSV files and visualize sales trends with Matplotlib.  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Libraries:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Pandas:&lt;/em&gt; Data cleaning and manipulation
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;NumPy:&lt;/em&gt; Numerical computation
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Matplotlib / Seaborn:&lt;/em&gt; Data visualization
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Education &amp;amp; Research
&lt;/h3&gt;

&lt;p&gt;Python is the preferred language in academia and research due to its simplicity and vast scientific libraries.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  Researchers use Python for data modeling in climate studies or biological simulations.  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Libraries:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jupyter Notebooks for documentation and experiments &lt;/li&gt;
&lt;li&gt;SymPy for symbolic math
&lt;/li&gt;
&lt;li&gt;SciPy for computations
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Healthcare
&lt;/h3&gt;

&lt;p&gt;Python is used in healthcare for image-based diagnostics and predictive analysis, to enable health care professionals determine the information they need to make the best decision possible regarding treatment plans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  Algorithms help detect bone fractures, tumors, and early-stage breast cancer from mammograms.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Libraries:&lt;/em&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;NumPy &amp;amp; Pandas&lt;/em&gt; to clean, filter, and structure tabular patient metrics, lab stats, and vitals.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Scikit-learn&lt;/em&gt; to train standard medical risk-assessment machine learning algorithms.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;TensorFlow &amp;amp; PyTorch&lt;/em&gt; to power deep learning applications that automatically spot tumors in diagnostic imagery.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Machine Learning &amp;amp; Artificial Intelligence
&lt;/h3&gt;

&lt;p&gt;Python dominates ML and AI development. Its ecosystem allows you to train, test, and deploy intelligent models with ease.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  Netflix uses Python-based recommendation algorithms to personalize movie suggestions.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Libraries:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scikit-learn for traditional ML models
&lt;/li&gt;
&lt;li&gt;TensorFlow and PyTorch for deep learning
&lt;/li&gt;
&lt;li&gt;spaCy for NLP tasks
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Web Development
&lt;/h3&gt;

&lt;p&gt;Python simplifies web development with frameworks like Django, Flask, and FastAPI. These tools help build scalable and secure web applications quickly.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instagram: Uses Django for scalability.
&lt;/li&gt;
&lt;li&gt;Pinterest: Built with Flask for flexibility.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Libraries:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Django for rapid web app development
&lt;/li&gt;
&lt;li&gt;Flask for lightweight APIs
&lt;/li&gt;
&lt;li&gt;FastAPI for high-performance web services. &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Web Scraping and Data Extraction
&lt;/h3&gt;

&lt;p&gt;Python helps gather and analyze online information at scale. Businesses use it for price tracking, content aggregation, and market research.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  E-commerce teams use Python scripts to monitor competitor pricing in real time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Libraries:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests&lt;/li&gt;
&lt;li&gt;BeautifulSoup
&lt;/li&gt;
&lt;li&gt;Scrapy
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Software Development and Testing
&lt;/h3&gt;

&lt;p&gt;Python is used for backend software development and test automation.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  Developers use Python to test APIs automatically after every code change.  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Frameworks:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyTest and Unittest for testing
&lt;/li&gt;
&lt;li&gt;Sphinx for documentation
&lt;/li&gt;
&lt;li&gt;Buildbot for CI/CD pipelines
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8. Desktop GUI Applications
&lt;/h3&gt;

&lt;p&gt;Python supports GUI-based applications that run across platforms.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  Simple apps like file explorers or calculators are often built using Tkinter.  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Libraries:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tkinter for basic GUIs
&lt;/li&gt;
&lt;li&gt;PyQt and Kivy for advanced interfaces
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. Internet of Things (IoT)
&lt;/h3&gt;

&lt;p&gt;Python connects hardware and sensors in IoT systems. It’s used for collecting and processing real-time data.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  Home automation systems that monitor temperature and lighting using Raspberry Pi. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tools and Platforms:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raspberry Pi
&lt;/li&gt;
&lt;li&gt;MicroPython
&lt;/li&gt;
&lt;li&gt;Adafruit Python SDKs
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  10. Game Development
&lt;/h3&gt;

&lt;p&gt;Python supports simple and mid-level game creation. It’s ideal for building prototypes, 2D games, and educational projects.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Example:&lt;/strong&gt;  Several well-known titles have used Python for their core logic or extensive modding capabilities like: &lt;em&gt;The Sims 4&lt;/em&gt;, &lt;em&gt;Battlefield 2&lt;/em&gt;, &lt;em&gt;Pacman&lt;/em&gt;, &lt;em&gt;Sudoku&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tools and Frameworks:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pygame for 2D games
&lt;/li&gt;
&lt;li&gt;Panda3D for 3D graphics
&lt;/li&gt;
&lt;li&gt;Arcade for modern designs
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why beginners should learn Python
&lt;/h2&gt;

&lt;p&gt;Python is the perfect starting point because its syntax is as clear as plain English, allowing you to focus on logic rather than fighting complex code. Its massive global community means you’ll never get stuck, and its incredible versatility allows you to transition easily into high-paying fields like &lt;strong&gt;data science&lt;/strong&gt;, &lt;strong&gt;AI&lt;/strong&gt;, or &lt;strong&gt;web development&lt;/strong&gt;. &lt;br&gt;
By using powerful libraries that handle the heavy lifting for you, Python lets you build professional-grade projects like automating your bank statements or analyzing electricity bills, faster than almost any other language.&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataanalysis</category>
    </item>
    <item>
      <title>A Beginner’s Pocket Guide to SQL Data Analysis</title>
      <dc:creator>grace wambua</dc:creator>
      <pubDate>Thu, 23 Apr 2026 20:34:36 +0000</pubDate>
      <link>https://dev.to/grace_wambua/a-beginners-pocket-guide-to-sql-data-analysis-2579</link>
      <guid>https://dev.to/grace_wambua/a-beginners-pocket-guide-to-sql-data-analysis-2579</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When you first start working with databases, it can feel like you’re looking at a giant, overwhelming spreadsheet. Aggregate functions are the tools that turn that mountain of data into meaningful stories.&lt;br&gt;
This article highlights 5 fundamental SQL functions that every analyst should be familiar with, providing a brief overview of their purpose and usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. COUNT()&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;COUNT()&lt;/code&gt; function is used to count the number of rows that match a specified condition in a database table. It is particularly useful for aggregating data and understanding the size of datasets.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;Finance&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query returns the total number of employees in the Finance department&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. AVG()&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;AVG()&lt;/code&gt; function calculates the average value of a numeric column. This function is essential for analyzing trends and performance metrics over time.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;Engineering&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query provides the average salary of employees in the Engineering department.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. SUM()&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;SUM()&lt;/code&gt; function calculates the total sum of a numeric column. It is commonly used for financial data, sales records, or any other scenario where the sum of numeric values needs to be determined. &lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;GROUP BY&lt;/code&gt; clause tells the database how to group the rows before performing the calculation.&lt;br&gt;
This query calculates total sales per category. This shows which products are actually making the most money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. MIN() and MAX()&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;MIN()&lt;/code&gt; and &lt;code&gt;MAX()&lt;/code&gt; functions retrieve the smallest and largest  values from a column.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lowest_pay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;highest_pay&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query finds the highest and the lowest pay from the employees.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. CONCAT()&lt;/strong&gt;&lt;br&gt;
This is a String Function used to join two or more strings together into one.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
&lt;span class="n"&gt;CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;full_name&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query combines the &lt;code&gt;first_name&lt;/code&gt; and &lt;code&gt;last_name&lt;/code&gt; columns to create a single &lt;code&gt;full_name&lt;/code&gt; field for reports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Mastering these five functions takes you from simply "reading" data to actually analyzing it.&lt;br&gt;
This helps analysts transform raw, messy rows of information into clear insights; leading to more informed decision-making.  &lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>dataanalysis</category>
    </item>
    <item>
      <title>Managing Academic Data: A Practical Application of SQL in PostgreSQL</title>
      <dc:creator>grace wambua</dc:creator>
      <pubDate>Tue, 14 Apr 2026 01:16:43 +0000</pubDate>
      <link>https://dev.to/grace_wambua/managing-academic-data-a-practical-application-of-sql-in-postgresql-4e9d</link>
      <guid>https://dev.to/grace_wambua/managing-academic-data-a-practical-application-of-sql-in-postgresql-4e9d</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Databases are essential for storing and managing data in modern applications. Many organizations turn to PostgreSQL, a powerful, open-source object-relational database management system known for its reliability and data integrity. &lt;br&gt;
By leveraging Structured Query Language (SQL), we can build an interconnected system that not only stores information but also reveals meaningful insights through advanced relationships and filtering.&lt;br&gt;
Most DBMSs operate using two core components: &lt;strong&gt;Data Definition Language (DDL)&lt;/strong&gt; and &lt;strong&gt;Data Manipulation Language (DML)&lt;/strong&gt;. Together, DDL and DML handle the computation within a DBMS, while the database itself stores the data.&lt;br&gt;
Here are the key differences between DDL and DML:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;DDL&lt;/th&gt;
&lt;th&gt;DML&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Defines and manages the database &lt;strong&gt;schema&lt;/strong&gt; and structure.&lt;/td&gt;
&lt;td&gt;Manipulates and manages &lt;strong&gt;actual data&lt;/strong&gt; records.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Commands&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CREATE&lt;/code&gt;, &lt;code&gt;ALTER&lt;/code&gt;, &lt;code&gt;DROP&lt;/code&gt;, &lt;code&gt;TRUNCATE&lt;/code&gt;, &lt;code&gt;RENAME&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SELECT&lt;/code&gt;*, &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;, &lt;code&gt;MERGE&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Effect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Changes the structure of tables, indexes, or views.&lt;/td&gt;
&lt;td&gt;Changes only the rows or records within existing tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-Commit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Changes are &lt;strong&gt;permanent immediately&lt;/strong&gt; (auto-committed) in most databases.&lt;/td&gt;
&lt;td&gt;Changes are &lt;strong&gt;not permanent&lt;/strong&gt; until a &lt;code&gt;COMMIT&lt;/code&gt; command is issued.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rollback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generally &lt;strong&gt;cannot be undone&lt;/strong&gt; once executed.&lt;/td&gt;
&lt;td&gt;Can be &lt;strong&gt;rolled back&lt;/strong&gt; (undone) if not yet committed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WHERE Clause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Cannot&lt;/strong&gt; use a &lt;code&gt;WHERE&lt;/code&gt; clause.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Can&lt;/strong&gt; use a &lt;code&gt;WHERE&lt;/code&gt; clause to target specific records.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In my recent database assignment, I developed a structured system to manage student records, curriculum details, and examination performance. &lt;br&gt;
Using PostgreSQL, I implemented the four core SQL operations: &lt;strong&gt;CREATE&lt;/strong&gt;, &lt;strong&gt;INSERT&lt;/strong&gt;, &lt;strong&gt;UPDATE&lt;/strong&gt;, and &lt;strong&gt;DELETE&lt;/strong&gt;, to build and maintain a functional educational database.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CREATE&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first step involved defining the architecture of the database. I began by creating a dedicated schema to ensure all school data remained organized.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;schema&lt;/span&gt; &lt;span class="n"&gt;nairobi_academy&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Within this schema, I used the &lt;code&gt;CREATE TABLE&lt;/code&gt; command to build three interconnected entities:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;students&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;last_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;gender&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
&lt;span class="n"&gt;date_of_birth&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;subjects&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;subjects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="n"&gt;subject_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;subject_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;teacher_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;credits&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;exam_results&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="n"&gt;result_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;subject_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;exam_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;grade&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INSERT&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I used &lt;code&gt;INSERT INTO&lt;/code&gt; to transform the empty tables into a live database. I successfully migrated data for 10 students and 10 core subjects. This phase also included recording initial exam marks and grades, providing a guideline of data to work with.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subject_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exam_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grade&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;values&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-15'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'B'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-16'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-17'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'C'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;49&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-16'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'D'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;71&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-18'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'B'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-15'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-19'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'C'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;39&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-20'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'F'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-21'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;UPDATE&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I utilized the &lt;code&gt;UPDATE&lt;/code&gt; command to ensure the records remained accurate over time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;update&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt; 
&lt;span class="k"&gt;set&lt;/span&gt;  &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;update&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt; 
&lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;59&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;result_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DELETE&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final part involved data cleanup. Using the &lt;code&gt;DELETE&lt;/code&gt; statement, I removed obsolete entries from the system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;delete&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt; 
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;result_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, I used structural commands like &lt;code&gt;ALTER TABLE&lt;/code&gt; to drop unnecessary columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;alter&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Filtering with WHERE
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;WHERE&lt;/code&gt; clause only allows rows that meet specific criteria to appear in the results. To make these filters precise, I utilized several key SQL operators.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;=&lt;/code&gt; Operator:&lt;/strong&gt;
Used in this case to retrieve specific records.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt; 
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Form 4'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;&amp;gt;&lt;/code&gt; Operator:&lt;/strong&gt;
This is essential for analyzing performance and numerical data.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;BETWEEN&lt;/code&gt; Operator:&lt;/strong&gt;
Here &lt;code&gt;BETWEEN&lt;/code&gt; provides a cleaner way to filter within a specific range.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="k"&gt;between&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;IN&lt;/code&gt; Operator:&lt;/strong&gt;
This filters records where a column matches one of the specified values.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Mombasa'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Kisumu'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;LIKE&lt;/code&gt; Operator:&lt;/strong&gt;
is used within a &lt;code&gt;WHERE&lt;/code&gt; clause to search for a specified pattern in a column. It is specifically case-sensitive by default.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;subjects&lt;/span&gt; 
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;subject_name&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="s1"&gt;'%Studies%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also combined these filters for deeper insights. For instance, using &lt;code&gt;WHERE class = 'Form 3' AND city = 'Nairobi'&lt;/code&gt; allowed me to find a very specific subset of students: those in a particular year who also live in that city.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using CASE WHEN to Transform Data
&lt;/h3&gt;

&lt;p&gt;In a school setting, a score of "85" or "49" is more than just a number, it represents a student's progress. I used &lt;code&gt;CASE WHEN&lt;/code&gt; to automatically categorize performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; 
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'Distinction'&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'Merit'&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'Pass'&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'Fail'&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;performance&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, I used the &lt;code&gt;CASE&lt;/code&gt; statement combined with the &lt;code&gt;IN&lt;/code&gt; operator to define Student Levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Form 3'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Form 4'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'Senior'&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Form 2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Form 1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'Junior'&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;student_level&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A reflection on my study findings
&lt;/h3&gt;

&lt;p&gt;What I found most interesting was the use of the &lt;code&gt;CASE WHEN&lt;/code&gt; statement, being able to take a column of raw numbers and instantly turn them into human-readable labels like "Distinction" or "Senior" without actually changing the underlying data.&lt;br&gt;
The biggest challenge was the strictness of SQL syntax. One missing semicolon or a misspelled column name command can cause errors in the entire system.&lt;br&gt;
The transition from writing basic code to deploying a functional database was a rewarding challenge. This experience highlighted that SQL proficiency goes beyond technical syntax; it is about developing the logical framework required to communicate effectively with data and extract meaningful insights.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>databasemanagement</category>
      <category>postgres</category>
    </item>
    <item>
      <title>A Comprehensive Guide to Publishing and Embedding Power BI Reports on the Web with IFrames</title>
      <dc:creator>grace wambua</dc:creator>
      <pubDate>Tue, 07 Apr 2026 20:51:51 +0000</pubDate>
      <link>https://dev.to/grace_wambua/a-comprehensive-guide-to-publishing-and-embedding-power-bi-reports-on-the-web-with-iframes-47j3</link>
      <guid>https://dev.to/grace_wambua/a-comprehensive-guide-to-publishing-and-embedding-power-bi-reports-on-the-web-with-iframes-47j3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Power BI is Microsoft's business analytics platform that helps you turn data into actionable insights. Whether you're a business user, report creator, or developer, Power BI offers integrated tools and services to connect, visualize, and share data across your organization.&lt;/p&gt;

&lt;p&gt;Power BI is made up of 3 main elements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Power BI Desktop&lt;/strong&gt;: a free desktop application for building and designing reports&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power BI Service&lt;/strong&gt;: the online publishing service for viewing and sharing reports and dashboards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power BI mobile apps&lt;/strong&gt;: for viewing reports and dashboards on the go.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The purpose of BI is to track Key Performance Indicators (KPIs) and uncover insights in business data to better inform decision-making across the organization. &lt;/p&gt;

&lt;h3&gt;
  
  
  Publishing Reports to Power BI Service
&lt;/h3&gt;

&lt;p&gt;Reports and dashboards can be shared with others by users who have a Power BI pro license and have published them to a PBI workspace where others can view them.&lt;br&gt;
A workspace is essentially a centralized location for collaboration.&lt;br&gt;
Reports and dashboards can be distributed in many different ways including being downloaded as .pbix files, shared via teams, etc.&lt;/p&gt;

&lt;p&gt;Here are step-by-step instructions on  how to publish a Power BI report and embed it on a website using iframes, with an Electronic Sales Data report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Publish Report to Power BI Service&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Navigate to Workspaces on the BI Service &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvahr6n5fly8622oggrmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvahr6n5fly8622oggrmt.png" alt="Image to Navigate to Workspaces" width="436" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a workspace where you will publish your report.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0rdo0phrha869n6vfym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0rdo0phrha869n6vfym.png" alt="Image to Create a workspace" width="800" height="680"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open your report in Power BI Desktop.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click Publish on the Home ribbon.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hqz3j0v17isukvm2i5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hqz3j0v17isukvm2i5x.png" alt="Image to Click Publish on Home ribbon" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the workspace and click Select. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh373i4wkng4qrx0w96v5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh373i4wkng4qrx0w96v5.png" alt="Image to select workspace" width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generate Embed Code&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Log in to the Power BI Service and navigate to your report.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4krwdrwjs5smwkhzlyn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4krwdrwjs5smwkhzlyn.png" alt="Image to navigate to report" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go to File &amp;gt; Embed report &amp;gt; Website or portal (for secure embedding)&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hvz9nwlvmfyvbl7vouz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hvz9nwlvmfyvbl7vouz.png" alt="Image to embed file" width="726" height="544"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Embed in Website&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Copy the &lt;code&gt;&amp;lt;iframe&amp;gt;&lt;/code&gt; HTML code provided in the dialog box.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi013k0mq0ypmjw6kf2qb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi013k0mq0ypmjw6kf2qb.png" alt="Image to copy iframe code" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paste this code into the HTML editor of your website.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsc5sv3kolt8i77o4hlhh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsc5sv3kolt8i77o4hlhh.png" alt="Image to paste code on HTML editor" width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the HTML file on a web browser and your BI report will successfully be embedded on a website&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsien8b55ua373qqiyq58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsien8b55ua373qqiyq58.png" alt="Image to show successful embed on website" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Mastering the art of embedding BI reports not only enhances user experience but also positions your organization to thrive by transforming data into actionable strategies.&lt;/p&gt;

</description>
      <category>embedpowerbi</category>
      <category>datavisualization</category>
      <category>iframes</category>
    </item>
    <item>
      <title>Data Modeling in Power BI: Joins, Relationships, and Schemas</title>
      <dc:creator>grace wambua</dc:creator>
      <pubDate>Tue, 31 Mar 2026 20:02:02 +0000</pubDate>
      <link>https://dev.to/grace_wambua/data-modeling-in-power-bi-joins-relationships-and-schemas-566m</link>
      <guid>https://dev.to/grace_wambua/data-modeling-in-power-bi-joins-relationships-and-schemas-566m</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data modeling is the process of organizing your data into tables, defining relationships between them, and enhancing the data with calculated fields, measures, and hierarchies. This process ensures accurate analysis and sets you up to create clear, impactful Power BI reports.&lt;/p&gt;

&lt;h3&gt;
  
  
  Types of SQL Joins
&lt;/h3&gt;

&lt;p&gt;Joins are one of the most important features that SQL offers. Joins allow us to make use of the relationships we have set up between our tables. &lt;br&gt;
In this article, we’ll break down the core SQL join types:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. INNER JOIN
&lt;/h3&gt;

&lt;p&gt;The SQL &lt;code&gt;INNER JOIN&lt;/code&gt; statement joins two tables based on a common column and selects rows that have matching values in these columns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6f0qxj935hp8i915o3rj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6f0qxj935hp8i915o3rj.png" alt="Inner Join Venn Diagram" width="689" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Example:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this query:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;employees.department_id&lt;/code&gt; refers to the &lt;code&gt;department_id&lt;/code&gt; column from the &lt;code&gt;employees&lt;/code&gt; table.&lt;br&gt;
&lt;code&gt;departments.id&lt;/code&gt; refers to the &lt;code&gt;id&lt;/code&gt; column from the &lt;code&gt;departments table&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ON&lt;/code&gt; clause ensures that rows are matched based on these columns, creating a relationship between the two tables.&lt;br&gt;
The query above returns all the fields from both tables. The &lt;code&gt;INNER&lt;/code&gt; keyword only affects the number of rows returned, not the number of columns. The &lt;code&gt;INNER JOIN&lt;/code&gt; filters rows based on matching &lt;code&gt;department_id&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt;, while the &lt;code&gt;SELECT *&lt;/code&gt; ensures all columns from both tables are included.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. LEFT JOIN
&lt;/h3&gt;

&lt;p&gt;The SQL &lt;code&gt;LEFT JOIN&lt;/code&gt; combines two tables based on a common column. It then selects records having matching values in these columns and the remaining rows from the left table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43ime26tyqptmnt6cddv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43ime26tyqptmnt6cddv.png" alt="Left Join Venn Diagram" width="679" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Example:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- left join Customers and Orders tables based on their shared customer_id columns&lt;/span&gt;
&lt;span class="c1"&gt;-- Customers is the left table&lt;/span&gt;
&lt;span class="c1"&gt;-- Orders is the right table&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the SQL command combines data from the &lt;code&gt;Customers&lt;/code&gt; and &lt;code&gt;Orders&lt;/code&gt; tables.&lt;/p&gt;

&lt;p&gt;The query selects the &lt;code&gt;customer_id&lt;/code&gt; and &lt;code&gt;first_name&lt;/code&gt; from &lt;code&gt;Customers&lt;/code&gt; and the &lt;code&gt;amount&lt;/code&gt; from &lt;code&gt;Orders&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Hence, the result includes rows where &lt;code&gt;customer_id&lt;/code&gt; from &lt;code&gt;Customers&lt;/code&gt; matches customer from &lt;code&gt;Orders&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. RIGHT JOIN
&lt;/h3&gt;

&lt;p&gt;The SQL &lt;code&gt;RIGHT JOIN&lt;/code&gt; statement joins two tables based on a common column. It selects records that have matching values in these columns and the remaining rows from the right table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmobuu27gyhgxrlldejph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmobuu27gyhgxrlldejph.png" alt="Right Join Venn Diagram" width="679" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Case Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- join Customers and Orders tables&lt;/span&gt;
&lt;span class="c1"&gt;-- based on customer_id of Customers and customer of Orders&lt;/span&gt;
&lt;span class="c1"&gt;-- Customers is the left table&lt;/span&gt;
&lt;span class="c1"&gt;-- Orders is the right table&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;
&lt;span class="k"&gt;RIGHT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the SQL command selects the &lt;code&gt;customer_id&lt;/code&gt; and &lt;code&gt;first_name&lt;/code&gt; columns (from the &lt;code&gt;Customers&lt;/code&gt; table) and the &lt;code&gt;amount&lt;/code&gt; column (from the &lt;code&gt;Orders&lt;/code&gt; table).&lt;/p&gt;

&lt;p&gt;And, the result set will contain those rows where there is a match between &lt;code&gt;customer_id&lt;/code&gt; (of the &lt;code&gt;Customers&lt;/code&gt; table) and &lt;code&gt;customer&lt;/code&gt; (of the &lt;code&gt;Orders&lt;/code&gt; table), along with all the remaining rows from the &lt;code&gt;Orders&lt;/code&gt; table.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. FULL OUTER JOIN
&lt;/h3&gt;

&lt;p&gt;The SQL &lt;code&gt;FULL OUTER JOIN&lt;/code&gt; statement joins two tables based on a common column. It selects records that have matching values in these columns and the remaining rows from both of the tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9bqdc0u1y1pee6ogsd7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9bqdc0u1y1pee6ogsd7.png" alt="Full Outer Join Venn Diagram" width="679" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Case Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;
&lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the SQL command selects the &lt;code&gt;customer_id&lt;/code&gt; and &lt;code&gt;first_name&lt;/code&gt; columns (from the &lt;code&gt;Customers&lt;/code&gt; table) and the &lt;code&gt;amount&lt;/code&gt; column (from the &lt;code&gt;Orders&lt;/code&gt; table).&lt;/p&gt;

&lt;p&gt;The result set will contain &lt;strong&gt;all rows of both the tables&lt;/strong&gt;, regardless of whether there is a match between &lt;code&gt;customer_id&lt;/code&gt; (of the &lt;code&gt;Customers&lt;/code&gt; table) and &lt;code&gt;customer&lt;/code&gt; (of the &lt;code&gt;Orders&lt;/code&gt; table).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. ANTI JOIN
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; The SQL &lt;code&gt;LEFT ANTI JOIN&lt;/code&gt;returns rows in the left table that have no matching rows in the right table.
&lt;strong&gt;How it works:&lt;/strong&gt;
Achieved with &lt;code&gt;LEFT JOIN&lt;/code&gt; + &lt;code&gt;WHERE [key in right table] IS NULL&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The SQL &lt;code&gt;RIGHT ANTI JOIN&lt;/code&gt; returns rows in the right table that have no matching rows in the left table.
&lt;strong&gt;How it works:&lt;/strong&gt;
Achieved with &lt;code&gt;RIGHT JOIN&lt;/code&gt; + &lt;code&gt;WHERE [key in left table] IS NULL&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Case Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;tableA&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;tableB&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;tableA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tableB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tableB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query will return all rows from &lt;code&gt;tableA&lt;/code&gt; that do not have a corresponding row in &lt;code&gt;tableB&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  CARDINALITY
&lt;/h3&gt;

&lt;p&gt;In Power BI, the term “cardinality” describes the type of relationship between two tables according to how many related rows each table has. It specifies the relationships between rows in one table and rows in another.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;One-to-Many (1 : *):&lt;/strong&gt; In this relationship, one record in the first table connects to many records in the second table.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Each customer can have multiple orders, but each order belongs to only one customer.&lt;br&gt;
In the model view, this appears as 1 --&amp;gt; *&lt;br&gt;
The “1” side is usually a dimension table (like Customers).&lt;br&gt;
The “many” side is a fact table (like Orders).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Many-to-One (* : 1):&lt;/strong&gt; This is the reverse direction of a one-to-many relationship. It happens when the filter starts from the many side and moves to the one side.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
If you select a particular order in a visual (from the Orders table), Power BI can trace it back to the correct Customer in the Customers table.&lt;br&gt;
Multiple orders (many) point to one customer (one), that is a many-to-one relationship. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;One-to-One (1 : 1):&lt;/strong&gt; Each record in one table matches exactly one record in another table.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Assume, you have a Customer table and a Customer Profile table. Each customer ID appears only once in both tables.&lt;br&gt;
Useful when you split a large table into smaller parts for better performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Many-to-Many (* : *):&lt;/strong&gt; Both tables can have repeating values in their key columns&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Imagine you are a student and you want to join club, each student can join multiple clubs and each club can have multiple students.&lt;br&gt;
Power BI manages this scenario using a bridge table, which maps each student to each club.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Active Vs. Inactive Relationships
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Active&lt;/strong&gt;&lt;br&gt;
An active relationship in Power BI is the primary, default connection between two tables. Power BI automatically uses active relationships for filtering and calculations unless you specify otherwise.&lt;br&gt;
You can only have one active relationship between two tables at a time, even if there are multiple potential ways they could be related.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Consider a Sales table and a Dates table. You might have a relationship based on the OrderDate field. If this is the main date you want to use for your analysis, it will be marked as the active relationship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inactive&lt;/strong&gt; &lt;br&gt;
An inactive relationship is a secondary connection between two tables that Power BI does not automatically use for filtering or calculations. These relationships are useful when you need multiple ways to connect tables, but only one connection should be used by default.&lt;br&gt;
Inactive relationships can be activated manually in specific measures or calculations using DAX (Data Analysis Expressions).&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
In addition to OrderDate, your Sales table might also have a ShipDate field that relates to the Dates table. You can create an inactive relationship between ShipDate and Dates, which you can activate selectively when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  CROSS FILTER
&lt;/h3&gt;

&lt;p&gt;Cross filtering in Power BI determines how filters are applied across related tables in a relationship. It defines the direction in which the filter context flows between tables.&lt;br&gt;
There are two types of cross filter direction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Single Direction:&lt;/strong&gt;
Filters flow from one table to another, for example: from Customer to Orders.
This is the default and most efficient setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both Direction/Bidirectional:&lt;/strong&gt;
Filters flow both ways between tables.
Used when both tables should influence each other, for example: Region and Sales.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  How Cardinality and Cross Filtering work together
&lt;/h3&gt;

&lt;p&gt;Cardinality defines how tables are connected, while cross filtering defines how filters move through those connections. Together, they ensure that your visuals respond correctly to user actions.&lt;br&gt;
&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Cardinality: One-to-Many between Customers and Orders.&lt;br&gt;
Cross Filter Direction: Single from Customers to Orders.&lt;br&gt;
Result: Selecting a customer filters their orders but not the other way around.&lt;/p&gt;

&lt;h3&gt;
  
  
  Difference between Relationships and Joins
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Relationships:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are displayed as flexible noodles between logical tables&lt;/li&gt;
&lt;li&gt;Require you to select matching fields between two logical tables&lt;/li&gt;
&lt;li&gt;Do not require you to select join types&lt;/li&gt;
&lt;li&gt;Make all row and column data from related tables potentially available in the data source&lt;/li&gt;
&lt;li&gt;Maintain each table's level of detail in the data source and during analysis&lt;/li&gt;
&lt;li&gt;Create independent domains at multiple levels of detail. Tables aren't merged together in the data source.&lt;/li&gt;
&lt;li&gt;During analysis, create the appropriate joins automatically, based on the fields in use.&lt;/li&gt;
&lt;li&gt;Do not duplicate aggregate values (when Performance Options are set to Many-to-Many)&lt;/li&gt;
&lt;li&gt;Keep unmatched measure values (when Performance Options are set to Some Records Match)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Joins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are displayed with Venn diagram icons between physical tables&lt;/li&gt;
&lt;li&gt;Require you to select join types and join clauses&lt;/li&gt;
&lt;li&gt;Joined physical tables are merged into a single logical table with a fixed combination of data&lt;/li&gt;
&lt;li&gt;May drop unmatched measure values&lt;/li&gt;
&lt;li&gt;May duplicate aggregate values when fields are at different levels of detail&lt;/li&gt;
&lt;li&gt;Support scenarios that require a single table of data, such as extract filters and aggregation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Fact Vs. Dimension Tables
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Fact Table&lt;/th&gt;
&lt;th&gt;Dimension Table&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Definition&lt;/td&gt;
&lt;td&gt;It contains data (often transactional) that you want to analyze&lt;/td&gt;
&lt;td&gt;It accompanies the fact table and stores information that describe records in the fact table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Purpose&lt;/td&gt;
&lt;td&gt;It contains measures and is used for analysis and decision making&lt;/td&gt;
&lt;td&gt;It contains information about a business and its process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type of Data&lt;/td&gt;
&lt;td&gt;Numeric and textual format&lt;/td&gt;
&lt;td&gt;Textual format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary/Foreign Key&lt;/td&gt;
&lt;td&gt;A primary key for each dimension which is acts as a foreign key in the dimension table&lt;/td&gt;
&lt;td&gt;A foreign key associated with the primary key of the fact table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hierarchy&lt;/td&gt;
&lt;td&gt;No hierarchy&lt;/td&gt;
&lt;td&gt;Contains a hierarchy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attributes&lt;/td&gt;
&lt;td&gt;Less attributes&lt;/td&gt;
&lt;td&gt;More Attributes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Records&lt;/td&gt;
&lt;td&gt;More Records&lt;/td&gt;
&lt;td&gt;Less Records&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Table Growth&lt;/td&gt;
&lt;td&gt;Grows vertically&lt;/td&gt;
&lt;td&gt;Grows horizontally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Model&lt;/td&gt;
&lt;td&gt;Fewer fact tables in the data model&lt;/td&gt;
&lt;td&gt;More dimension tables in a data model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update Frequency&lt;/td&gt;
&lt;td&gt;Records added very frequently&lt;/td&gt;
&lt;td&gt;Records not added frequently&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Star Schema
&lt;/h3&gt;

&lt;p&gt;A star schema is a way to organize data in a database, especially in data warehouses, to make it easier and faster to analyze. At the center, there's a main table called the fact table, which holds measurable data. Around it are dimension tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiu25j5pkoyq42ulaldc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiu25j5pkoyq42ulaldc.png" alt="star schema diagram" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Snowflake Schema
&lt;/h3&gt;

&lt;p&gt;A snowflake schema splits dimension tables into smaller sub-dimensions to keep data more organized and detailed; just like snowflakes in a large lake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7029kg2k1ltqegbr490.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7029kg2k1ltqegbr490.jpg" alt="snowflake schema diagram" width="800" height="767"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Flat Table
&lt;/h3&gt;

&lt;p&gt;A flat table in Power BI is a single, wide table containing all data, including measures and descriptive attributes, without relationships, similar to a spreadsheet.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OrderID&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Customer&lt;/th&gt;
&lt;th&gt;Region&lt;/th&gt;
&lt;th&gt;Qty&lt;/th&gt;
&lt;th&gt;Unit Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1001&lt;/td&gt;
&lt;td&gt;01/01/26&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;North&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1002&lt;/td&gt;
&lt;td&gt;01/02/26&lt;/td&gt;
&lt;td&gt;Desk&lt;/td&gt;
&lt;td&gt;Furniture&lt;/td&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;South&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1003&lt;/td&gt;
&lt;td&gt;01/02/26&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Charlie&lt;/td&gt;
&lt;td&gt;North&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Role-playing Dimensions
&lt;/h3&gt;

&lt;p&gt;Role-playing dimensions in Power BI occur when a single dimension table (e.g. Date) connects to a fact table multiple times, representing different roles (e.g. Order Date, Ship Date). Power BI allows only one active relationship between tables; subsequent roles are inactive. &lt;/p&gt;

&lt;h3&gt;
  
  
  Common Modeling Issues
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Building Models from CSV exports. &lt;/li&gt;
&lt;li&gt;Missing Unique Keys. &lt;/li&gt;
&lt;li&gt;Using Multiple date tables. &lt;/li&gt;
&lt;li&gt;Overlapping attributes across tables. &lt;/li&gt;
&lt;li&gt;Lack of normalisation. &lt;/li&gt;
&lt;li&gt;Ignoring virtual relationships. &lt;/li&gt;
&lt;li&gt;Keeping unnecessary columns. &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope you found this blog helpful in understanding the significance of data modeling and the basic principles crucial to building an effective data model. By understanding these fundamental concepts, such as star schemas, cardinality, cross-filter direction, and active and inactive relationships, you should be well on your way to becoming a more proficient Power BI developer.&lt;/p&gt;

</description>
      <category>powerbi</category>
      <category>datamodeling</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Resource Monitoring for Data Pipelines</title>
      <dc:creator>grace wambua</dc:creator>
      <pubDate>Sat, 28 Mar 2026 09:21:30 +0000</pubDate>
      <link>https://dev.to/grace_wambua/resource-monitoring-for-data-pipelines-388o</link>
      <guid>https://dev.to/grace_wambua/resource-monitoring-for-data-pipelines-388o</guid>
      <description>&lt;p&gt;As a data engineering student, I came to a realization that sometimes the errors that slowly starve resources, don't always throw a code. Monitoring how our pipelines consume resources isn't just about performance, its about respect for the hardware. This article is about understanding the machines we built on, to maintain fast and efficient pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When running data pipelines, especially in production, resource monitoring is critical to prevent slowdowns, crashes, or system-wide failures. Simple Linux command-line tools like &lt;code&gt;top&lt;/code&gt;, &lt;code&gt;htop&lt;/code&gt;, &lt;code&gt;df -h&lt;/code&gt;, and &lt;code&gt;free -h&lt;/code&gt; provide real-time visibility into system health and help you catch issues before they escalate.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Monitoring CPU &amp;amp; Processes: &lt;code&gt;top&lt;/code&gt; and &lt;code&gt;htop&lt;/code&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;top&lt;/code&gt; (Built-in, lightweight)
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;top&lt;/code&gt; command gives a live view of system processes and CPU usage.&lt;/p&gt;

&lt;p&gt;Shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU utilization (user, system, idle time)&lt;/li&gt;
&lt;li&gt;Running processes and their CPU/memory consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27b0ovnkx73fw51adogr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27b0ovnkx73fw51adogr.png" alt=" raw `top` endraw  command output" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters for pipelines:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify CPU bottlenecks during heavy transformations (e.g., Spark jobs, ETL scripts)&lt;/li&gt;
&lt;li&gt;Detect runaway processes consuming excessive CPU&lt;/li&gt;
&lt;li&gt;Spot when multiple pipelines overload the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: Press P inside &lt;code&gt;top&lt;/code&gt; to sort by CPU usage.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;htop&lt;/code&gt; (Enhanced, user-friendly)
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;htop&lt;/code&gt; is an improved version of &lt;code&gt;top&lt;/code&gt; with a more intuitive interface.&lt;/p&gt;

&lt;p&gt;Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Color-coded CPU, memory, and swap usage&lt;/li&gt;
&lt;li&gt;Easy process management (kill, renice)&lt;/li&gt;
&lt;li&gt;Tree view of processes (great for pipeline dependencies)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxib2gpen8wjduz8jd1x9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxib2gpen8wjduz8jd1x9.png" alt=" raw `htop` endraw  command output" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visualize parallel jobs in distributed pipelines&lt;/li&gt;
&lt;li&gt;Quickly terminate stuck or zombie tasks&lt;/li&gt;
&lt;li&gt;Monitor thread-level activity in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Monitoring Memory Usage: &lt;code&gt;free -h&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;free -h&lt;/code&gt; command shows memory usage in a human-readable format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used &lt;/li&gt;
&lt;li&gt;Free
&lt;/li&gt;
&lt;li&gt;Buffers/cache&lt;/li&gt;
&lt;li&gt;Swap usage&lt;/li&gt;
&lt;li&gt;Available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbk7bm4t64etgmktugid7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbk7bm4t64etgmktugid7.png" alt=" raw `free -h` endraw  command output" width="800" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your data pipeline loads large datasets into memory (e.g. Pandas, Spark), watch the &lt;strong&gt;available memory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If it drops too low, the system may start &lt;strong&gt;swapping&lt;/strong&gt;, drastically slowing performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure pipelines don’t consume all RAM; leave headroom for the OS and other services&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Monitoring Disk Space: &lt;code&gt;df -h&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;df -h&lt;/code&gt; command displays disk usage across mounted filesystems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shows:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total, used, and available disk space&lt;/li&gt;
&lt;li&gt;Usage percentage per filesystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0r0yw7zsq6s8rf69uxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0r0yw7zsq6s8rf69uxv.png" alt=" raw `df -h` endraw  command output" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data pipelines often generate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temporary files&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Intermediate datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If disk fills up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jobs may fail unexpectedly&lt;/li&gt;
&lt;li&gt;Databases or services can crash&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common risk:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A pipeline writing large intermediate files (e.g. CSV/Parquet) can silently fill up disk,causing job failure or system instability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Watch for partitions approaching &lt;strong&gt;90–100% usage&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Clean up temp directories or rotate logs regularly&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Preventing Production Failures
&lt;/h3&gt;

&lt;p&gt;By combining these tools, you can proactively protect your system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High CPU usage&lt;/strong&gt; (&lt;code&gt;top&lt;/code&gt;/&lt;code&gt;htop&lt;/code&gt;)&lt;br&gt;
Indicates inefficient code or too many parallel jobs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low available memory&lt;/strong&gt; (&lt;code&gt;free -h&lt;/code&gt;)&lt;br&gt;
Risk of crashes or heavy swapping&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High disk usage&lt;/strong&gt; (&lt;code&gt;df -h&lt;/code&gt;)&lt;br&gt;
Risk of failed writes and system instability&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Practical Workflow for Data Engineers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Start your pipeline&lt;/li&gt;
&lt;li&gt;Open another terminal and run:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;htop&lt;/code&gt;: monitor CPU + processes&lt;br&gt;
&lt;code&gt;watch free -h&lt;/code&gt;: track memory over time&lt;br&gt;
&lt;code&gt;watch df -h&lt;/code&gt;: monitor disk growth&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look for abnormal spikes or steady resource exhaustion&lt;/li&gt;
&lt;li&gt;Adjust: Batch sizes, Parallelism, Memory allocation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;p&gt;These tools are lightweight, fast, and available on most Linux systems. They provide real-time insights into system health. Regular monitoring helps: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prevent crashes&lt;/li&gt;
&lt;li&gt;Optimize performance &lt;/li&gt;
&lt;li&gt;Ensure stable production pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Resource monitoring is about being a good steward of the infrastructure we use. Without proper monitoring, pipelines may crash unexpectedly and systems can become unresponsive.With these tools, you gain early warning signals, debug performance issues faster and ensure stable, reliable data processing.&lt;/p&gt;

</description>
      <category>resourcemonitoring</category>
      <category>datapipelines</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
