<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tom Chege</title>
    <description>The latest articles on DEV Community by Tom Chege (@tom_chege).</description>
    <link>https://dev.to/tom_chege</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3818282%2Faee19086-1cab-428d-af04-c2182bd89c5f.png</url>
      <title>DEV Community: Tom Chege</title>
      <link>https://dev.to/tom_chege</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tom_chege"/>
    <language>en</language>
    <item>
      <title>Introduction to Python in Data Analytics</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Sun, 10 May 2026 09:28:00 +0000</pubDate>
      <link>https://dev.to/tom_chege/-introduction-to-python-in-data-analytics-3kl8</link>
      <guid>https://dev.to/tom_chege/-introduction-to-python-in-data-analytics-3kl8</guid>
      <description>&lt;h2&gt;
  
  
  What is Python?
&lt;/h2&gt;

&lt;p&gt;Python is a programming language used to tell computers what to do. It is popular because it is easy to read and write, making it beginner-friendly. Python can be used to build websites, automate tasks, analyze data, and create machine learning applications.&lt;/p&gt;

&lt;p&gt;One of the reasons Python is widely used is because it is versatile. Instead of being limited to one area, it can be applied in many industries and careers such as software development, automation, artificial intelligence, cybersecurity, and data analytics.&lt;/p&gt;

&lt;p&gt;It's a tool that can be used for many different purposes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Python is Popular in Data Analytics
&lt;/h2&gt;

&lt;p&gt;Python is popular in data analytics because its syntax is simple and easy to read, making it easier for beginners to learn compared to many other programming languages.&lt;/p&gt;

&lt;p&gt;Another major reason for its popularity is the availability of powerful libraries that help analysts clean, organize, analyze, and visualize data efficiently.&lt;/p&gt;

&lt;p&gt;Python allows analysts to automate repetitive work, reduce human errors, and process large amounts of data much faster than manual methods.&lt;/p&gt;

&lt;p&gt;For businesses, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster reporting&lt;/li&gt;
&lt;li&gt;More accurate insights&lt;/li&gt;
&lt;li&gt;Improved decision-making&lt;/li&gt;
&lt;li&gt;Better understanding of customer behaviour and business performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python Libraries Used in Data Analytics
&lt;/h2&gt;

&lt;p&gt;A Python library is a collection of pre-written code that helps developers perform specific tasks more easily and efficiently.&lt;/p&gt;

&lt;p&gt;Instead of building everything from scratch, developers and analysts can use libraries to speed up their work.&lt;/p&gt;

&lt;p&gt;Some popular Python libraries used in data analytics include:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pandas
&lt;/h3&gt;

&lt;p&gt;Pandas is a Python library used to clean, organize, and analyze data quickly and efficiently.&lt;/p&gt;

&lt;p&gt;It helps analysts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove duplicates&lt;/li&gt;
&lt;li&gt;Handle missing values&lt;/li&gt;
&lt;li&gt;Filter and sort information&lt;/li&gt;
&lt;li&gt;Organize data into tables&lt;/li&gt;
&lt;li&gt;Work with CSV and Excel files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using Pandas increases speed and accuracy compared to manual data handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Requests
&lt;/h3&gt;

&lt;p&gt;Requests is a Python library used to communicate with websites and APIs over the internet.&lt;/p&gt;

&lt;p&gt;It helps analysts retrieve data from external systems and online services for analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Python is Used to Clean, Analyze, and Visualize Data
&lt;/h2&gt;

&lt;p&gt;Data used in Python can come from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databases&lt;/li&gt;
&lt;li&gt;Excel files&lt;/li&gt;
&lt;li&gt;CSV files&lt;/li&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;Websites&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Cleaning
&lt;/h3&gt;

&lt;p&gt;Data cleaning involves preparing raw data for analysis.&lt;/p&gt;

&lt;p&gt;During this stage, analysts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove duplicates&lt;/li&gt;
&lt;li&gt;Fix formatting issues&lt;/li&gt;
&lt;li&gt;Handle missing values&lt;/li&gt;
&lt;li&gt;Correct errors&lt;/li&gt;
&lt;li&gt;Organize the data into usable formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data cleaning is important because poor-quality data can produce misleading insights and lead businesses to make incorrect decisions.&lt;/p&gt;

&lt;p&gt;A common phrase in analytics is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Garbage in, garbage out."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means poor quality data produces poor quality analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Analysis
&lt;/h3&gt;

&lt;p&gt;After cleaning, analysts study the data to identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trends&lt;/li&gt;
&lt;li&gt;Patterns&lt;/li&gt;
&lt;li&gt;Customer behaviour&lt;/li&gt;
&lt;li&gt;Business performance&lt;/li&gt;
&lt;li&gt;Unusual activities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Python helps analysts process large datasets quickly and accurately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Visualization
&lt;/h3&gt;

&lt;p&gt;Visualization is used to present findings using charts and graphs that are easier to understand than large spreadsheet tables.&lt;/p&gt;

&lt;p&gt;Visualizations help businesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spot trends quickly&lt;/li&gt;
&lt;li&gt;Compare performance&lt;/li&gt;
&lt;li&gt;Identify problems&lt;/li&gt;
&lt;li&gt;Communicate insights clearly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of reading thousands of rows of data, management can quickly understand business performance through dashboards and charts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Examples of Python in Data Analytics
&lt;/h2&gt;

&lt;p&gt;Python is used in many industries to solve real business problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Banking
&lt;/h3&gt;

&lt;p&gt;Banks use Python to monitor customer transactions and identify unusual activities that may indicate fraud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supermarkets and Retail
&lt;/h3&gt;

&lt;p&gt;Retail businesses use Python to analyze customer buying behaviour and determine which aisles customers visit most often. This helps improve product placement and store organization.&lt;/p&gt;

&lt;p&gt;Businesses can also personalize customer experiences based on shopping behaviour to increase sales and customer retention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Healthcare
&lt;/h3&gt;

&lt;p&gt;Hospitals and healthcare providers use Python to analyze patient records, monitor disease trends, and improve service delivery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transport and Logistics
&lt;/h3&gt;

&lt;p&gt;Transport companies use Python to optimize delivery routes, monitor vehicle performance, and improve operational efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Beginners Should Learn Python
&lt;/h2&gt;

&lt;p&gt;Python is considered one of the best programming languages for beginners because it is easy to learn and has a large and active support community.&lt;/p&gt;

&lt;p&gt;Beginners can quickly find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tutorials&lt;/li&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Videos&lt;/li&gt;
&lt;li&gt;Forums&lt;/li&gt;
&lt;li&gt;Community support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Python is also a valuable career skill because it is versatile and used in many industries beyond data analytics.&lt;/p&gt;

&lt;p&gt;In data analytics specifically, Python helps analysts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean data&lt;/li&gt;
&lt;li&gt;Organize information&lt;/li&gt;
&lt;li&gt;Analyze trends&lt;/li&gt;
&lt;li&gt;Visualize insights&lt;/li&gt;
&lt;li&gt;Present findings clearly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes Python an important tool for turning raw data into useful business decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Python has become one of the most important tools in modern data analytics because of its simplicity, flexibility, and powerful libraries.&lt;/p&gt;

&lt;p&gt;With tools such as Pandas, Requests, Matplotlib, and Seaborn, analysts can clean, analyze, and visualize large amounts of data efficiently.&lt;/p&gt;

&lt;p&gt;More importantly, Python helps businesses transform raw information into meaningful insights that support smarter and faster decision-making.&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Understanding Subqueries and CTEs in SQL</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Sun, 19 Apr 2026 12:58:22 +0000</pubDate>
      <link>https://dev.to/tom_chege/understanding-subqueries-and-ctes-in-sql-12c5</link>
      <guid>https://dev.to/tom_chege/understanding-subqueries-and-ctes-in-sql-12c5</guid>
      <description>&lt;p&gt;After getting comfortable with joins and window functions, I’m now diving into more advanced query structuring techniques.&lt;/p&gt;

&lt;p&gt;Subqueries felt like a natural extension of filtering and aggregation. CTEs, on the other hand, introduced a cleaner way of breaking down complex logic into readable, step-by-step components.&lt;/p&gt;

&lt;p&gt;This guide provides a practical comparison of both concepts using a simple &lt;strong&gt;student exam results&lt;/strong&gt; dataset, highlighting when and why to choose one over the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is a Subquery?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;subquery&lt;/strong&gt; is a query nested inside another query. It runs first, and its result is used by the outer (main) query.&lt;/p&gt;

&lt;p&gt;Subqueries can be placed in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SELECT&lt;/code&gt; clause&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FROM&lt;/code&gt; clause&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WHERE&lt;/code&gt; or &lt;code&gt;HAVING&lt;/code&gt; clause&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Use Subqueries?
&lt;/h2&gt;

&lt;p&gt;Subqueries are useful when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perform intermediate calculations&lt;/li&gt;
&lt;li&gt;Apply dynamic filtering&lt;/li&gt;
&lt;li&gt;Break down complex logic without creating temporary tables&lt;/li&gt;
&lt;li&gt;Keep queries self-contained&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. Subqueries in the WHERE Clause
&lt;/h2&gt;

&lt;p&gt;This is the most common use case.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example: Students scoring above the overall average
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;marks&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What’s happening:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The inner query calculates the average marks across all students.&lt;/li&gt;
&lt;li&gt;The outer query filters for students who scored above that average.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Subqueries in the WHERE clause with IN and EXISTS
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Using IN (for value comparison)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Using EXISTS (for row existence check)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Difference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;IN&lt;/code&gt; → Compares values against a list&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EXISTS&lt;/code&gt; → Checks whether at least one matching row exists (usually more efficient with large datasets)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Subqueries in the SELECT Clause
&lt;/h2&gt;

&lt;p&gt;Used to compute additional values for each row.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example: Show each student’s marks alongside their personal average
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;student_avg&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is a &lt;strong&gt;correlated subquery&lt;/strong&gt; — it runs once for every row in the outer query, which can impact performance on large tables.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Subqueries in the FROM Clause (Derived Tables)
&lt;/h2&gt;

&lt;p&gt;Creates a temporary result set that can be treated like a table.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example: Average marks per student
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
        &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_marks&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;student_avg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Derived tables &lt;strong&gt;must&lt;/strong&gt; have an alias.&lt;/p&gt;




&lt;h2&gt;
  
  
  Types of Subqueries
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalar Subquery&lt;/strong&gt;: Returns a single value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column Subquery&lt;/strong&gt;: Returns multiple rows (usually one column)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlated Subquery&lt;/strong&gt;: References columns from the outer query and runs for each row&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What is a CTE (Common Table Expression)?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;Common Table Expression (CTE)&lt;/strong&gt; is a temporary named result set defined using the &lt;code&gt;WITH&lt;/code&gt; clause. It makes queries more readable by breaking them into logical steps.&lt;/p&gt;

&lt;p&gt;Think of it as creating a named mini-table that you can reference later in the same query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Use CTEs?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Improved Readability&lt;/strong&gt; — Break complex logic into named, logical steps instead of deeply nested subqueries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reusability&lt;/strong&gt; — Define a result set once and reference it multiple times in the same query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easier Debugging&lt;/strong&gt; — You can &lt;code&gt;SELECT *&lt;/code&gt; from any CTE individually to test and verify intermediate results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better Organization&lt;/strong&gt; — Complex queries become a series of clear building blocks rather than one giant nested statement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logical Flow&lt;/strong&gt; — You can build your query step by step, making it easier to understand the thought process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Repetition&lt;/strong&gt; — Avoid repeating the same subquery logic multiple times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CTEs are especially powerful when you need to perform multiple transformations or aggregations on the same dataset before joining everything together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic CTE Example: Student averages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;student_avg&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
        &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_marks&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;student_avg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Types and Use Cases of CTEs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Simple CTE
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;high_scores&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;high_scores&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Multiple CTEs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;student_avg&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
        &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_marks&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;top_students&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;student_avg&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;avg_marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;top_students&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Recursive CTE
&lt;/h3&gt;

&lt;p&gt;Used for hierarchical or recursive data (e.g., organizational charts, bill of materials). Advanced topic — not ccovered in this article.&lt;/p&gt;




&lt;h2&gt;
  
  
  Subqueries vs CTEs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Subqueries&lt;/th&gt;
&lt;th&gt;CTEs (Common Table Expressions)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Readability&lt;/td&gt;
&lt;td&gt;Can become hard to read when deeply nested&lt;/td&gt;
&lt;td&gt;Much cleaner and more structured&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reusability&lt;/td&gt;
&lt;td&gt;Logic must be repeated if used multiple times&lt;/td&gt;
&lt;td&gt;Defined once and can be reused multiple times&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Simple subqueries are fast; correlated ones can be slow&lt;/td&gt;
&lt;td&gt;Often better with modern optimizers; depends on usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;More difficult&lt;/td&gt;
&lt;td&gt;Easier (you can &lt;code&gt;SELECT&lt;/code&gt; from each CTE separately)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scope&lt;/td&gt;
&lt;td&gt;Limited to the immediate query part&lt;/td&gt;
&lt;td&gt;Available throughout the entire query&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use Each
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Recommended Choice&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple filtering or single value&lt;/td&gt;
&lt;td&gt;Subquery&lt;/td&gt;
&lt;td&gt;Quick and concise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex, multi-step logic&lt;/td&gt;
&lt;td&gt;CTE&lt;/td&gt;
&lt;td&gt;Better organization and readability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reusable intermediate results&lt;/td&gt;
&lt;td&gt;CTE&lt;/td&gt;
&lt;td&gt;Defined once, used multiple times&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Row-by-row comparisons&lt;/td&gt;
&lt;td&gt;Correlated Subquery&lt;/td&gt;
&lt;td&gt;Runs for each row and can reference outer query values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Readability &amp;amp; maintainability&lt;/td&gt;
&lt;td&gt;CTE&lt;/td&gt;
&lt;td&gt;Easier to read, debug, and maintain&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Both &lt;strong&gt;subqueries&lt;/strong&gt; and &lt;strong&gt;CTEs&lt;/strong&gt; help break down complex SQL problems, but they do so in different ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subqueries&lt;/strong&gt; are great for quick, inline logic that fits naturally inside another clause.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CTEs&lt;/strong&gt; shine when your query becomes more complex and you want maximum readability and maintainability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you progress, the goal shifts from “writing queries that work” to “writing queries that are clear, efficient, and easy to maintain.”&lt;/p&gt;

&lt;p&gt;Happy querying!&lt;/p&gt;

</description>
      <category>newbie</category>
      <category>sql</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Understanding SQL Window Functions</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:53:52 +0000</pubDate>
      <link>https://dev.to/tom_chege/understanding-sql-window-functions-2hp3</link>
      <guid>https://dev.to/tom_chege/understanding-sql-window-functions-2hp3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This week I’ve been learning a range of SQL concepts. Some clicked quickly, others took a bit more effort to fully grasp.&lt;/p&gt;

&lt;p&gt;Joins, for example, were relatively intuitive. Once I understood how tables connect, it became easier to reason about how data flows between them. Window functions, on the other hand, were a different story.&lt;/p&gt;

&lt;p&gt;I found them harder to wrap my head around. Not because they are overly complex, but because they require a shift in how you think about data — from grouping and reducing rows to enriching each row with additional context.&lt;/p&gt;

&lt;p&gt;That challenge is what led me to write this article. To keep things grounded, I’ll use a simple student exam results dataset throughout the article.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Are Window Functions?
&lt;/h2&gt;

&lt;p&gt;Window functions perform calculations across a set of rows related to the current row, without collapsing the dataset.&lt;/p&gt;

&lt;p&gt;Unlike &lt;code&gt;GROUP BY&lt;/code&gt;, which reduces multiple rows into a single summary row, window functions retain every row and add additional insights alongside the existing data.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Assigning Order with ROW_NUMBER()
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Requirement
&lt;/h3&gt;

&lt;p&gt;Assign a unique rank to each exam result from highest to lowest marks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;result_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;row_num&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The data is sorted by marks in descending order&lt;/li&gt;
&lt;li&gt;SQL assigns numbers sequentially (1, 2, 3…)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key behaviour
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Every row gets a unique number&lt;/li&gt;
&lt;li&gt;Ties are ignored&lt;/li&gt;
&lt;li&gt;Ordering strictly follows the defined rule&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;marks&lt;/th&gt;
&lt;th&gt;row_num&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Line everyone up based on a rule, then count them one by one&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  2. Handling Ties: RANK() vs DENSE_RANK()
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rank_marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DENSE_RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dense_rank&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;marks&lt;/th&gt;
&lt;th&gt;rank&lt;/th&gt;
&lt;th&gt;dense_rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Rank Behaviour
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Equal values share the same rank&lt;/li&gt;
&lt;li&gt;Gaps appear after ties&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dense_Rank Behaviour
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Equal values share the same rank&lt;/li&gt;
&lt;li&gt;No gaps in ranking&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;RANK counts positions, DENSE_RANK counts distinct values&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. Grouping into Buckets with NTILE()
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Rows are sorted by marks&lt;/li&gt;
&lt;li&gt;Data is divided into equal groups as much as possible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key behaviour
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Groups may not be equal&lt;/li&gt;
&lt;li&gt;Extra rows go to top groups first&lt;/li&gt;
&lt;li&gt;Ties are ignored
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;result_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;NTILE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;performance_band&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Group&lt;/th&gt;
&lt;th&gt;Rows&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Divide the list into equal buckets, then fill from the top in case of any remainders&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  4. Adding Context with AVG() OVER()
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Requirement
&lt;/h3&gt;

&lt;p&gt;Show each exam result alongside the student’s average mark.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;student_avg&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data is grouped per student using PARTITION BY&lt;/li&gt;
&lt;li&gt;Average is calculated per group&lt;/li&gt;
&lt;li&gt;Result is attached to each row&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key behaviour
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Rows are not collapsed&lt;/li&gt;
&lt;li&gt;Averages repeat per student&lt;/li&gt;
&lt;li&gt;Original detail is preserved&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;student_id&lt;/th&gt;
&lt;th&gt;marks&lt;/th&gt;
&lt;th&gt;student_avg&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;85.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;85.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;70.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why not GROUP BY?
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Returns one row per student&lt;/li&gt;
&lt;li&gt;Loses exam-level detail&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Calculate once per group, attach results everywhere&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;What it answers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ROW_NUMBER()&lt;/td&gt;
&lt;td&gt;Exact order of rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RANK()&lt;/td&gt;
&lt;td&gt;Position with gaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DENSE_RANK()&lt;/td&gt;
&lt;td&gt;Grouping without gaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NTILE()&lt;/td&gt;
&lt;td&gt;Bucket distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AVG() OVER()&lt;/td&gt;
&lt;td&gt;Context per row&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Conclusions
&lt;/h3&gt;

&lt;p&gt;Window functions are not just about computation — they change how you think about data.&lt;/p&gt;

&lt;p&gt;Instead of reducing rows, they allow you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add meaning&lt;/li&gt;
&lt;li&gt;Compare within context&lt;/li&gt;
&lt;li&gt;Build richer analysis directly in SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once that clicks, everything else becomes much easier, or does it?&lt;/p&gt;

</description>
      <category>newbie</category>
      <category>sql</category>
    </item>
    <item>
      <title>Understanding SQL: DDL, DML, and data manipulation</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Sat, 11 Apr 2026 06:20:44 +0000</pubDate>
      <link>https://dev.to/tom_chege/understanding-sql-ddl-dml-and-data-manipulation-4ad</link>
      <guid>https://dev.to/tom_chege/understanding-sql-ddl-dml-and-data-manipulation-4ad</guid>
      <description>&lt;h3&gt;
  
  
  What are DDL and DML?
&lt;/h3&gt;

&lt;p&gt;SQL commands are grouped based on &lt;strong&gt;what they act on&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of a database as a house:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DDL builds the house (structure)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DML furnishes and maintains the house (data inside it)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  DDL (data definition language) — building the structure
&lt;/h3&gt;

&lt;p&gt;DDL is used to define and manage the &lt;strong&gt;structure&lt;/strong&gt; of your database.&lt;/p&gt;

&lt;p&gt;It answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What tables exist?&lt;/li&gt;
&lt;li&gt;What columns do they have?&lt;/li&gt;
&lt;li&gt;What type of data goes into them?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Common DDL commands
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CREATE&lt;/code&gt; -&amp;gt; create new tables
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ALTER&lt;/code&gt; -&amp;gt; modify existing tables
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DROP&lt;/code&gt; -&amp;gt; delete tables
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Example: creating a table
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Key idea
&lt;/h4&gt;

&lt;p&gt;DDL works on the &lt;strong&gt;entire structure&lt;/strong&gt;, not individual rows.&lt;/p&gt;




&lt;h3&gt;
  
  
  DML (data manipulation language) — working with the data
&lt;/h3&gt;

&lt;p&gt;DML is used to interact with the &lt;strong&gt;actual data inside the tables&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What data do we have?&lt;/li&gt;
&lt;li&gt;How do we add, update, or remove it?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Common DML commands
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SELECT&lt;/code&gt; -&amp;gt; retrieve data
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;INSERT&lt;/code&gt; -&amp;gt; add data
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;UPDATE&lt;/code&gt; -&amp;gt; modify data
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DELETE&lt;/code&gt; -&amp;gt; remove data
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Examples of DML in action
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Inserting data
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Alice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nakuru'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Retrieving data
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Updating data
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Deleting data
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  DDL vs DML — key differences
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;aspect&lt;/th&gt;
&lt;th&gt;DDL (data definition language)&lt;/th&gt;
&lt;th&gt;DML (data manipulation language)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Purpose&lt;/td&gt;
&lt;td&gt;defines and manages database structure&lt;/td&gt;
&lt;td&gt;works with the data inside tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Focus&lt;/td&gt;
&lt;td&gt;tables, schemas, columns&lt;/td&gt;
&lt;td&gt;rows and records&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level of impact&lt;/td&gt;
&lt;td&gt;affects entire objects (tables, schemas)&lt;/td&gt;
&lt;td&gt;affects specific rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Common commands&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CREATE&lt;/code&gt;, &lt;code&gt;ALTER&lt;/code&gt;, &lt;code&gt;DROP&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SELECT&lt;/code&gt;, &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transaction behaviour&lt;/td&gt;
&lt;td&gt;auto-committed (usually cannot rollback)&lt;/td&gt;
&lt;td&gt;transactional (can rollback before commit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use of WHERE&lt;/td&gt;
&lt;td&gt;not used (operates on full objects)&lt;/td&gt;
&lt;td&gt;heavily used to filter specific rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-world analogy&lt;/td&gt;
&lt;td&gt;building or modifying a house&lt;/td&gt;
&lt;td&gt;adding, changing, or removing furniture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk level&lt;/td&gt;
&lt;td&gt;high (can remove entire structures)&lt;/td&gt;
&lt;td&gt;moderate (usually scoped to selected data)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Filtering data with the where clause
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;WHERE&lt;/code&gt; clause helps you &lt;strong&gt;target specific rows&lt;/strong&gt; instead of working on everything.&lt;/p&gt;

&lt;p&gt;Think of it as a filter.&lt;/p&gt;




&lt;h3&gt;
  
  
  Examples
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Equality '='
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  greater than '&amp;gt;'
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  between(inclusive of the numbers in the range)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;79&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  in (multiple values)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Form 2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Form 3'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  like (pattern matching)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'A%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Transforming data with CASE WHEN
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;CASE WHEN&lt;/code&gt; statement is a powerful tool for adding conditional logic to queries, transforming raw data into more meaningful information.&lt;/p&gt;




&lt;h3&gt;
  
  
  For instance, I used it to convert numerical marks into a simpler way that shows the performance at a glance.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;marks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; 
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Distinction'&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Merit'&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;marks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt;  &lt;span class="s1"&gt;'Pass'&lt;/span&gt;
        &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'Fail'&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;grade&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;exam_results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Reflection
&lt;/h2&gt;

&lt;p&gt;This week, I found the distinction between DDL and DML to be the most interesting.&lt;/p&gt;

&lt;p&gt;The simplest way to think about SQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DDL builds the system&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DML brings it to life&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If DDL creates the classroom,&lt;br&gt;&lt;br&gt;
DML is the students, lessons, and exams happening inside it.&lt;/p&gt;

&lt;p&gt;Let's see what the next lessons bring.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>buildinpublic</category>
      <category>data</category>
      <category>newbie</category>
    </item>
    <item>
      <title>How to publish a Power BI report and embed it into a website</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Sun, 05 Apr 2026 09:43:14 +0000</pubDate>
      <link>https://dev.to/tom_chege/how-to-publish-a-power-bi-report-and-embed-it-into-a-website-181k</link>
      <guid>https://dev.to/tom_chege/how-to-publish-a-power-bi-report-and-embed-it-into-a-website-181k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Power BI is a tool used for data cleaning, analysis, and presentation. Once you build your report locally, the next step is making it accessible to others.&lt;/p&gt;

&lt;p&gt;In this guide, I will walk you through the process step by step. The end goal is to generate a shareable dashboard that can be viewed in a browser using an embedded iframe.&lt;/p&gt;




&lt;h2&gt;
  
  
  Items Needed
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your organization’s Power BI credentials: &lt;a href="https://app.powerbi.com" rel="noopener noreferrer"&gt;https://app.powerbi.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A completed dashboard report available locally on your machine
&lt;/li&gt;
&lt;li&gt;A text editor such as VS Code or Sublime
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you have the above, you can follow through the process to completion.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Open Your Dashboard Report
&lt;/h2&gt;

&lt;p&gt;Open your Power BI report on your local machine.&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Sign in&lt;/strong&gt; on the far right of the screen as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnmtv8tjjwbhmxriwez0.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnmtv8tjjwbhmxriwez0.PNG" alt=" " width="800" height="184"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Sign In to Power BI
&lt;/h2&gt;

&lt;p&gt;Proceed and provide the necessary credentials.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Power BI Service requires an organization email. Personal emails such as Gmail will not work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once successful, your account will appear on the top right.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Select or Create a Workspace
&lt;/h2&gt;

&lt;p&gt;On the left navigation bar in the Power BI Service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on &lt;strong&gt;Workspaces&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Either:

&lt;ul&gt;
&lt;li&gt;Create a new workspace, or
&lt;/li&gt;
&lt;li&gt;Use an existing one provided by your organization
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Workspaces are where your reports are stored after publishing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Publish the Report
&lt;/h2&gt;

&lt;p&gt;On Power BI Desktop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click &lt;strong&gt;Publish&lt;/strong&gt; on the top navigation&lt;/li&gt;
&lt;li&gt;Choose your workspace&lt;/li&gt;
&lt;li&gt;Select from the dropdown or search for it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrw000zvgm0zge7txjuu.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrw000zvgm0zge7txjuu.PNG" alt=" " width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: Confirm Publishing
&lt;/h2&gt;

&lt;p&gt;Once the workspace is selected, proceed to publish.&lt;/p&gt;

&lt;p&gt;You will receive a confirmation screen with a link indicating that the report has been successfully published.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 6: Generate the Embed Code
&lt;/h2&gt;

&lt;p&gt;Go to the Power BI Service in your browser:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refresh your workspace&lt;/li&gt;
&lt;li&gt;Open the report you just uploaded&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;File → Embed report&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You will see the following options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SharePoint Online
&lt;/li&gt;
&lt;li&gt;Website or portal
&lt;/li&gt;
&lt;li&gt;Publish to web (public)
&lt;/li&gt;
&lt;li&gt;Developer playground
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this tutorial, select &lt;strong&gt;Website or portal&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Copy the iFrame code that is generated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 7: Create an HTML File
&lt;/h2&gt;

&lt;p&gt;Open your text editor and create a new file such as &lt;code&gt;index.html&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can use a basic HTML boilerplate such as the one from:&lt;br&gt;&lt;br&gt;
&lt;a href="https://www.w3schools.com/html/default.asp" rel="noopener noreferrer"&gt;https://www.w3schools.com/html/default.asp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Sample Dashboard&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;My First Power BI Dashboard Website&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;iframe&lt;/span&gt; &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"Kenya crop data"&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"1140"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"541.25"&lt;/span&gt; &lt;span class="na"&gt;src=
&lt;/span&gt;&lt;span class="s"&gt;"https://app.powerbi.com/reportEmbed?reportId=YOUR EMBEDDED LINK"&lt;/span&gt; 
&lt;span class="na"&gt;frameborder=&lt;/span&gt;&lt;span class="s"&gt;"0"&lt;/span&gt; &lt;span class="na"&gt;allowFullScreen=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/iframe&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste your copied iFrame into the HTML file.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 8: Open and Test
&lt;/h2&gt;

&lt;p&gt;Save the file.&lt;/p&gt;

&lt;p&gt;Right-click on &lt;code&gt;index.html&lt;/code&gt; and open it with your browser.&lt;/p&gt;

&lt;p&gt;It should render a page similar to the image below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdbsx6qxhw4859ozdwbc.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdbsx6qxhw4859ozdwbc.PNG" alt=" " width="800" height="631"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You may be prompted to sign in to Power BI to access the dashboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You have successfully:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Published a Power BI report to the cloud
&lt;/li&gt;
&lt;li&gt;Generated an embedded link
&lt;/li&gt;
&lt;li&gt;Displayed the report in a web page using an iFrame
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach allows you to share dashboards in a more interactive and accessible way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Users will need Power BI access to view the embedded report
&lt;/li&gt;
&lt;li&gt;For public sharing, &lt;strong&gt;Publish to web&lt;/strong&gt; can be used, but this should be handled carefully since it exposes your data
&lt;/li&gt;
&lt;li&gt;Ensure the right workspace permissions are set if other users cannot access the report
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Congratulations — you have now built and shared your first embedded Power BI dashboard.&lt;/p&gt;

</description>
      <category>data</category>
      <category>powerfuldevs</category>
      <category>newbie</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Understanding Data Modelling in Power BI: Joins, Relationships, and Schemas Explained</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Sun, 29 Mar 2026 18:10:19 +0000</pubDate>
      <link>https://dev.to/tom_chege/understanding-data-modelling-in-power-bi-joins-relationships-and-schemas-explained-e3g</link>
      <guid>https://dev.to/tom_chege/understanding-data-modelling-in-power-bi-joins-relationships-and-schemas-explained-e3g</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Working with raw data often introduces challenges such as duplication, inconsistency, and difficulty in extracting meaningful insights.&lt;/p&gt;

&lt;p&gt;Data modelling in Power BI addresses these challenges by structuring data into related tables, enabling efficient querying, accurate calculations, and scalable reporting.&lt;/p&gt;

&lt;p&gt;This article explores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL joins and their behaviour&lt;/li&gt;
&lt;li&gt;Power BI relationships and how they differ from joins&lt;/li&gt;
&lt;li&gt;Data modelling schemas&lt;/li&gt;
&lt;li&gt;Practical implementation in Power BI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll use a simple school dataset to ground these concepts.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Sample Dataset&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We will work with three core tables:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Students (Dimension Table)&lt;/strong&gt;
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Form 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Brian&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Form 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;Catherine&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Form 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;David&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Form 4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h4&gt;
  
  
  &lt;strong&gt;Test Scores (Fact Table)&lt;/strong&gt;
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ScoreID&lt;/th&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;105&lt;/td&gt;
&lt;td&gt;Science&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;⚠️ Note: Student &lt;strong&gt;105&lt;/strong&gt; does not exist in the Students table—this represents a referential integrity issue.&lt;/p&gt;

&lt;h5&gt;
  
  
  📊 View Dataset Online
&lt;/h5&gt;

&lt;p&gt;Explore the dataset directly: 👉&lt;a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vQJi3siC_IO_hsG7fUmNr-CKfB9aLifLiyCNCxhjRMuzpn5TTRuJ4SZNt2hcDbO3NhwAO3TEBTdC30t/pubhtml" rel="noopener noreferrer"&gt;here!&lt;/a&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  &lt;strong&gt;Teachers (Dimension Table)&lt;/strong&gt;
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;TeacherID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;T1&lt;/td&gt;
&lt;td&gt;Mr. Kim&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T2&lt;/td&gt;
&lt;td&gt;Ms. Asha&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T3&lt;/td&gt;
&lt;td&gt;Mr. Otieno&lt;/td&gt;
&lt;td&gt;Science&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Data Roles&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact Table:&lt;/strong&gt; Test Scores (transactional/measurable data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dimension Tables:&lt;/strong&gt; Students, Teachers (descriptive attributes)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. What is Data Modelling?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Data modelling is the process of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Organizing data into structured tables and defining relationships that enable efficient querying and accurate analysis.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In analytical systems like Power BI, this typically involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separating &lt;strong&gt;facts&lt;/strong&gt; from &lt;strong&gt;dimensions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Establishing &lt;strong&gt;keys&lt;/strong&gt; (e.g., &lt;code&gt;StudentID&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Designing a schema optimized for performance&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Joins vs Relationships&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Joins (Power Query Layer)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A join is a &lt;strong&gt;data transformation operation&lt;/strong&gt; that combines rows from two or more tables based on a matching condition.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Characteristics:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Produces a &lt;strong&gt;new table&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Data is &lt;strong&gt;physically merged&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Can introduce &lt;strong&gt;duplication&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Example:
&lt;/h4&gt;

&lt;p&gt;Joining Students and Test Scores results in repeated student records for each score.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Relationships (Model Layer)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A relationship is a &lt;strong&gt;logical link&lt;/strong&gt; between tables based on matching columns.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Characteristics:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Tables remain &lt;strong&gt;separate&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Data is combined &lt;strong&gt;at query time&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enables efficient filtering and aggregation via DAX&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Critical Difference&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Join&lt;/th&gt;
&lt;th&gt;Relationship&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Type&lt;/td&gt;
&lt;td&gt;Physical transformation&lt;/td&gt;
&lt;td&gt;Logical connection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Location&lt;/td&gt;
&lt;td&gt;Power Query&lt;/td&gt;
&lt;td&gt;Model View&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data duplication&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance impact&lt;/td&gt;
&lt;td&gt;Higher memory usage&lt;/td&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. SQL Joins Explained (with Examples)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Joins determine how rows from two tables are combined based on a matching condition.&lt;/p&gt;

&lt;p&gt;We will use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Students table (left table)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test Scores table (right table)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;INNER JOIN&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Definition
&lt;/h4&gt;

&lt;p&gt;Returns only rows where a match exists in both tables.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQL Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;TestScores&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Result
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Brian&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;92&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Student 103 and 104 are excluded (no scores)&lt;/li&gt;
&lt;li&gt;Score with StudentID 105 is excluded (no matching student)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;LEFT JOIN&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Definition
&lt;/h4&gt;

&lt;p&gt;Returns all rows from the left table, with matching rows from the right.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQL Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;TestScores&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Result
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Brian&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;Catherine&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;David&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;All students are included&lt;/li&gt;
&lt;li&gt;Missing scores appear as NULL&lt;/li&gt;
&lt;li&gt;Useful for identifying missing activity&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;RIGHT JOIN&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Definition
&lt;/h4&gt;

&lt;p&gt;Returns all rows from the right table and matching rows from the left.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQL Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;RIGHT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;TestScores&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Result
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Brian&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;Science&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Score for StudentID 105 appears&lt;/li&gt;
&lt;li&gt;Missing student info is NULL&lt;/li&gt;
&lt;li&gt;Highlights data inconsistencies&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;FULL OUTER JOIN&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Definition
&lt;/h4&gt;

&lt;p&gt;Returns all rows from both tables, matched where possible.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQL Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;TestScores&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Result
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Brian&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;Catherine&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;David&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;Science&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Combines LEFT and RIGHT JOIN behavior&lt;/li&gt;
&lt;li&gt;Shows all matched and unmatched records&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;LEFT ANTI JOIN&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Definition
&lt;/h4&gt;

&lt;p&gt;Returns rows from the left table with no match in the right.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQL Equivalent
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;TestScores&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Result
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;Catherine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;David&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;These students have no test scores&lt;/li&gt;
&lt;li&gt;Useful for identifying inactivity&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;RIGHT ANTI JOIN&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Definition
&lt;/h4&gt;

&lt;p&gt;Returns rows from the right table with no match in the left.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQL Equivalent
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Subject&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;TestScores&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StudentID&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Result
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;StudentID&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;105&lt;/td&gt;
&lt;td&gt;Science&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;This score has no valid student&lt;/li&gt;
&lt;li&gt;Indicates a data quality issue&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Key Insight&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Each join answers a specific business question:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Join Type&lt;/th&gt;
&lt;th&gt;Business Question&lt;/th&gt;
&lt;th&gt;Example from Our Dataset&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;INNER&lt;/td&gt;
&lt;td&gt;What data exists in both tables?&lt;/td&gt;
&lt;td&gt;Which students have recorded test scores? → Alice and Brian&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LEFT&lt;/td&gt;
&lt;td&gt;What records are missing related data?&lt;/td&gt;
&lt;td&gt;Which students are in the system but have not taken any tests? → Catherine and David&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RIGHT&lt;/td&gt;
&lt;td&gt;What data exists on the secondary side?&lt;/td&gt;
&lt;td&gt;Are there any test scores without valid student records? → Score for StudentID 105&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FULL&lt;/td&gt;
&lt;td&gt;What does the complete dataset look like?&lt;/td&gt;
&lt;td&gt;What is the full picture of students and scores, including missing or unmatched data?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LEFT ANTI&lt;/td&gt;
&lt;td&gt;What records in the main table have no matches?&lt;/td&gt;
&lt;td&gt;Which students have no test scores at all? → Catherine and David&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RIGHT ANTI&lt;/td&gt;
&lt;td&gt;What records are invalid or inconsistent?&lt;/td&gt;
&lt;td&gt;Which test scores do not belong to any student? → StudentID 105&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Power BI Relationships&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cardinality&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Defines how rows relate between tables.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1:M&lt;/td&gt;
&lt;td&gt;One dimension record relates to many fact records&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M:M&lt;/td&gt;
&lt;td&gt;Multiple matches on both sides&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:1&lt;/td&gt;
&lt;td&gt;Unique mapping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;👉 Best practice:&lt;br&gt;
Use &lt;strong&gt;1:M from Dimension → Fact&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Cross-Filter Direction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Defines how filters propagate between tables.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Behaviour&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single&lt;/td&gt;
&lt;td&gt;One-way filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Bi-directional filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;⚠️ Overusing bi-directional filtering can introduce ambiguity.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Active vs Inactive Relationships&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Only one relationship can be active between two tables&lt;/li&gt;
&lt;li&gt;Inactive relationships can be used via DAX (&lt;code&gt;USERELATIONSHIP&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;6. Fact vs Dimension Tables&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Fact Table&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Contains measurable data&lt;/li&gt;
&lt;li&gt;Often large&lt;/li&gt;
&lt;li&gt;Example: Test Scores&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Dimension Table&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Contains descriptive attributes&lt;/li&gt;
&lt;li&gt;Used for filtering and grouping&lt;/li&gt;
&lt;li&gt;Example: Students&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;7. Schemas&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Star Schema&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          Students
             |
Teachers — Test Scores — Subjects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Characteristics:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Central fact table&lt;/li&gt;
&lt;li&gt;Denormalized dimensions&lt;/li&gt;
&lt;li&gt;Optimized for performance&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Snowflake Schema&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Students
   |
Address
   |
Region
   |
Test Scores
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Characteristics:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Normalized dimensions&lt;/li&gt;
&lt;li&gt;Reduced redundancy&lt;/li&gt;
&lt;li&gt;Increased complexity&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Flat Table (Denormalized Model)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;All data stored in one table.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Simpler structure&lt;/li&gt;
&lt;li&gt;Poor scalability&lt;/li&gt;
&lt;li&gt;High duplication&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;8. Implementation in Power BI&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Power Query (Data Preparation)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Clean data&lt;/li&gt;
&lt;li&gt;Perform joins&lt;/li&gt;
&lt;li&gt;Handle missing values&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Model View (Data Modelling)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create relationships&lt;/li&gt;
&lt;li&gt;Define cardinality&lt;/li&gt;
&lt;li&gt;Configure filters&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Best Practice Workflow&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Load data&lt;/li&gt;
&lt;li&gt;Clean in Power Query&lt;/li&gt;
&lt;li&gt;Avoid unnecessary joins&lt;/li&gt;
&lt;li&gt;Build relationships in Model View&lt;/li&gt;
&lt;li&gt;Validate using visuals&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;9. Common Modelling Issues&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data duplication from excessive joins&lt;/li&gt;
&lt;li&gt;Incorrect cardinality&lt;/li&gt;
&lt;li&gt;Ambiguous filter paths&lt;/li&gt;
&lt;li&gt;Missing or inconsistent keys&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion and Key Takeaways&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Data modelling&lt;/strong&gt; is the foundation of effective analysis in Power BI. It defines how data is structured, interpreted, and ultimately how insights are derived.&lt;/p&gt;

&lt;p&gt;Throughout this article, we’ve seen how:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Joins&lt;/strong&gt; are used to reshape and prepare data by combining tables in Power Query&lt;br&gt;
&lt;strong&gt;Relationships&lt;/strong&gt; allow Power BI to connect tables logically, enabling efficient analysis without duplication&lt;br&gt;
&lt;strong&gt;Schemas&lt;/strong&gt;, particularly the &lt;strong&gt;star schema&lt;/strong&gt;, provide a scalable structure for organizing fact and dimension tables&lt;br&gt;
&lt;strong&gt;Data integrity&lt;/strong&gt; (e.g., handling missing or invalid records like StudentID 105) directly impacts the quality of insights&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Practical Takeaways&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;em&gt;Power Query&lt;/em&gt; for data cleaning and controlled joins&lt;/li&gt;
&lt;li&gt;Use &lt;em&gt;Model View&lt;/em&gt; to define relationships and control how data interacts&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Prefer 1:M relationships&lt;/em&gt; from dimension to fact tables&lt;/li&gt;
&lt;li&gt;Default to &lt;em&gt;single-direction filtering&lt;/em&gt; unless there is a clear need for bi-directional filtering&lt;/li&gt;
&lt;li&gt;Adopt a &lt;em&gt;star schema&lt;/em&gt; for most analytical scenarios&lt;/li&gt;
&lt;li&gt;Treat &lt;em&gt;missing or unmatched data&lt;/em&gt; as valuable signals, not just errors&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>data</category>
      <category>powerfuldevs</category>
      <category>newbie</category>
    </item>
    <item>
      <title>How Excel is Used in Real-World Data Analysis</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Tue, 24 Mar 2026 06:13:39 +0000</pubDate>
      <link>https://dev.to/tom_chege/excel-data-cleaning-31p2</link>
      <guid>https://dev.to/tom_chege/excel-data-cleaning-31p2</guid>
      <description>&lt;p&gt;Working with real-world data is rarely clean. Unlike textbook examples, datasets often contain inconsistencies like text mixed with numbers, duplicate values, missing fields, and ambiguous identifiers.&lt;/p&gt;

&lt;p&gt;Over the past week, I’ve been exploring key Excel concepts such as error handling, aggregation, lookup functions, and data cleaning. Each of these plays a critical role in transforming raw, messy data into reliable insights.&lt;/p&gt;

&lt;p&gt;In this article, I’ll focus specifically on &lt;strong&gt;error handling&lt;/strong&gt; which is a foundational skill that helps ensure your formulas remain robust even when your data isn’t.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are Excel errors and why do they matter?
&lt;/h2&gt;

&lt;p&gt;Excel errors are often seen as problems to eliminate, but they are actually signals. Each error tells you something specific about your data or logic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;#DIV/0!&lt;/code&gt; → You’re dividing by zero or an empty cell&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;#N/A&lt;/code&gt; → A lookup couldn’t find a match&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;#VALUE!&lt;/code&gt; → The data type is incorrect (e.g., text instead of a number)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;#NAME?&lt;/code&gt; → Excel doesn’t recognize part of your formula&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ignoring these errors or masking them without understanding their cause can lead to misleading results. Instead, they should be treated as indicators of underlying data issues.&lt;/p&gt;




&lt;h2&gt;
  
  
  The role of IFERROR
&lt;/h2&gt;

&lt;p&gt;One of the most commonly used tools for handling errors in Excel is the &lt;code&gt;IFERROR&lt;/code&gt; function. It allows you to return an alternative result when a formula fails.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IFERROR(Bonus/Salary, 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that if Salary is missing or zero, the formula does not break but instead returns &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;⚠️ While this improves usability, it introduces a new responsibility: ensuring that the fall back value does not distort the meaning of your analysis.&lt;/p&gt;




&lt;h2&gt;
  
  
  The risk of hiding errors
&lt;/h2&gt;

&lt;p&gt;A common mistake is using &lt;code&gt;IFERROR&lt;/code&gt; to suppress errors without considering their implications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IFERROR(Salary * 0.1, 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Salary is missing, this formula returns &lt;code&gt;0&lt;/code&gt;, which may incorrectly suggest that the calculated value is genuinely zero.&lt;/p&gt;

&lt;p&gt;A more transparent approach would be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IF(ISBLANK(Salary), "Missing Salary", Salary * 0.1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the issue visible and preserves the integrity of your data.&lt;/p&gt;




&lt;h2&gt;
  
  
  Handling specific errors more intelligently
&lt;/h2&gt;

&lt;p&gt;Not all errors should be treated the same. In many cases, it is better to target specific error types.&lt;/p&gt;

&lt;p&gt;For example, when working with lookups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IF(ISNA(VLOOKUP(A2, A:B, 2, FALSE)), "Not Found", VLOOKUP(A2, A:B, 2, FALSE))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This distinguishes between a missing value (&lt;code&gt;#N/A&lt;/code&gt;) and other potential issues, allowing for more precise handling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Combining logic and error handling
&lt;/h2&gt;

&lt;p&gt;Robust formulas often combine conditional logic with error handling to cover multiple scenarios.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IFERROR(
  IF(ISBLANK(A2), "No Input", XLOOKUP(A2, A:A, B:B)),
  "Not Found"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks for empty input -- &lt;code&gt;ISBLANK&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Performs a lookup when valid -- &lt;code&gt;XLOOKUP&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Returns a clear message if the lookup fails -- &lt;code&gt;"No Input"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Such layered logic makes your formulas more resilient and easier to interpret.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I have learnt about error handling
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Treat errors as signals, not just problems&lt;/li&gt;
&lt;li&gt;Avoid blindly replacing errors with default/placeholder values&lt;/li&gt;
&lt;li&gt;Use targeted checks such as &lt;code&gt;ISNA&lt;/code&gt; or &lt;code&gt;ISBLANK&lt;/code&gt; where appropriate&lt;/li&gt;
&lt;li&gt;Ensure default/placeholder values do not distort analysis&lt;/li&gt;
&lt;li&gt;Design formulas with real-world data imperfections in mind&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Final takeaway:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Errors are feedback. By understanding the types of errors, applying functions like &lt;code&gt;IFERROR&lt;/code&gt; thoughtfully, and combining them with logical checks, you can build Excel models that are both reliable and transparent.&lt;/p&gt;




</description>
      <category>learning</category>
      <category>data</category>
    </item>
    <item>
      <title>Creating a windows server VM on Azure</title>
      <dc:creator>Tom Chege</dc:creator>
      <pubDate>Thu, 19 Mar 2026 06:51:07 +0000</pubDate>
      <link>https://dev.to/tom_chege/creating-a-windows-server-vm-on-azure-3h6m</link>
      <guid>https://dev.to/tom_chege/creating-a-windows-server-vm-on-azure-3h6m</guid>
      <description>&lt;p&gt;Some problems in software are complex and then there are the other peculiar problems.&lt;/p&gt;

&lt;p&gt;The ones that appear simple until they quietly refuse to cooperate.&lt;/p&gt;

&lt;p&gt;Recently, I needed to spin up a virtual machine on &lt;strong&gt;Microsoft Azure&lt;/strong&gt;. Nothing fancy. Just a small server running &lt;strong&gt;Windows Server 2025 Datacenter&lt;/strong&gt; for a bit of experimentation.&lt;/p&gt;

&lt;p&gt;In theory, creating a VM in the cloud should be one of the most straightforward tasks in modern computing. Click a few buttons, choose an operating system, wait a moment, and somewhere in a distant datacenter a machine materializes.&lt;/p&gt;

&lt;p&gt;Reality, as it often does, had a slightly different plan.&lt;/p&gt;

&lt;p&gt;The deployment failed with a message that looked innocuous but carried just enough ambiguity to slow things down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The selected image is not valid for the specified location.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, it reads like a configuration mistake. Choose a different option and try again. But cloud systems rarely fail randomly; they fail for reasons that are usually hidden one layer deeper than the interface suggests.&lt;/p&gt;

&lt;p&gt;In this case, the issue was geographical. I discovered this after tinkering with various images and location settings.&lt;/p&gt;

&lt;p&gt;Azure operates dozens of data centers across the world. Each region has slightly different hardware, capacity, and image availability. When you select an operating system image, you are implicitly asking a very physical question: &lt;em&gt;Does this datacenter actually have that image available right now?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Sometimes the answer is no.&lt;/p&gt;

&lt;p&gt;Once that realization clicks, the error message suddenly makes perfect sense.&lt;/p&gt;

&lt;p&gt;With that small puzzle resolved, I thought it would be useful to write down the exact process for creating a VM in Azure. If nothing else, it may spare someone else a few minutes of head scratching.&lt;/p&gt;




&lt;h2&gt;
  
  
  Creating a Virtual Machine in Azure
&lt;/h2&gt;

&lt;p&gt;The starting point is the Azure portal.&lt;/p&gt;

&lt;p&gt;Open the dashboard and log in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://portal.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This interface is essentially the command center for everything running on &lt;strong&gt;Microsoft&lt;/strong&gt;’s cloud infrastructure.&lt;/p&gt;

&lt;p&gt;Once inside, use the search bar and type &lt;strong&gt;Virtual Machines&lt;/strong&gt;. Select the service and click:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create → Azure Virtual Machine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That opens the configuration wizard where the real work begins.&lt;/p&gt;




&lt;h2&gt;
  
  
  Basic Configuration
&lt;/h2&gt;

&lt;p&gt;The first section defines the environment where the VM will live.&lt;/p&gt;

&lt;p&gt;Start by creating or selecting a &lt;strong&gt;resource group&lt;/strong&gt;. Resource groups act as containers that hold related infrastructure components.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vm-lab-rg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, give the machine a name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-first-vm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Naming may seem trivial, but good naming conventions save time later when environments grow larger.&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing the Right Region
&lt;/h2&gt;

&lt;p&gt;Azure requires you to pick a geographic region where the VM will run.&lt;/p&gt;

&lt;p&gt;Some commonly reliable regions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;West Europe&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;UK South&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;UAE North&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The region choice affects latency, pricing, and as I learned sometimes which operating system images are available.&lt;/p&gt;




&lt;h2&gt;
  
  
  Selecting the Operating System
&lt;/h2&gt;

&lt;p&gt;This is the step where many deployments quietly succeed or fail.&lt;/p&gt;

&lt;p&gt;A widely compatible option is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Windows Server 2025 Datacenter: Azure Edition – x64 Gen2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image integrates well with Azure’s virtualization environment and is generally available across most regions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing a VM Size
&lt;/h2&gt;

&lt;p&gt;Next comes the machine size. In cloud terms, this defines CPU power, memory, and &lt;code&gt;cost&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For experimentation or lightweight workloads, something modest works well.&lt;/p&gt;

&lt;p&gt;Choosing a VM size is essentially selecting how powerful you want your rented computer to be.&lt;/p&gt;




&lt;h2&gt;
  
  
  Administrator Credentials
&lt;/h2&gt;

&lt;p&gt;Azure then asks you to create an administrator account for the machine.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Username: azureadmin
Password: ********
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are creating a Windows machine, enable the option that allows &lt;strong&gt;Remote Desktop (RDP)&lt;/strong&gt; access. This allows you to connect to the server after deployment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Storage and Networking
&lt;/h2&gt;

&lt;p&gt;For most use cases, the default settings here work perfectly well.&lt;/p&gt;

&lt;p&gt;Azure automatically provisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a virtual network&lt;/li&gt;
&lt;li&gt;a subnet&lt;/li&gt;
&lt;li&gt;a public IP address&lt;/li&gt;
&lt;li&gt;a network security rule for RDP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For storage, the default &lt;strong&gt;Standard SSD&lt;/strong&gt; disk is usually a good balance between performance and cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;With the configuration complete, click &lt;strong&gt;Review + Create&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Azure validates the settings and then begins the deployment process.&lt;/p&gt;

&lt;p&gt;Within a couple of minutes the platform allocates compute resources, attaches storage, configures networking, and boots the virtual machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connecting to the Machine
&lt;/h2&gt;

&lt;p&gt;Once the deployment completes, the next step is actually getting into the server.&lt;/p&gt;

&lt;p&gt;Azure provides a few connection options through the portal, but the simplest approach is to connect using &lt;strong&gt;Microsoft Remote Desktop&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After the VM is created, Azure automatically assigns a &lt;strong&gt;public IP address&lt;/strong&gt; to the machine. You can find it in the VM’s overview page inside &lt;strong&gt;Microsoft Azure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In my case, I simply copied that public IP and opened a connection through the Microsoft Remote Desktop client.&lt;/p&gt;

&lt;p&gt;The process looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Host: &amp;lt;public-ip-address&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted, enter the administrator credentials you created during the VM setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Username: azureadmin
Password: ********
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything is configured correctly and the RDP port was allowed during deployment the connection should establish within a few seconds.&lt;/p&gt;

&lt;p&gt;At that point, you are no longer looking at the Azure portal. You are looking at the desktop of a Windows machine running somewhere inside &lt;strong&gt;Microsoft’s global datacenter network.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yet there it is, responding to your keyboard and mouse as if it were sitting right in front of you.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Small Reflection
&lt;/h2&gt;

&lt;p&gt;Cloud computing often feels magical. Servers appear and disappear on command. Storage scales endlessly. Networks stretch across continents.&lt;/p&gt;

&lt;p&gt;But every now and then the illusion cracks just enough to remind you what is happening underneath.&lt;/p&gt;

&lt;p&gt;And sometimes, all it takes to see that reality is &lt;code&gt;a small error message about a missing image in chosen location&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
