<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lawrence Murithi</title>
    <description>The latest articles on DEV Community by Lawrence Murithi (@lawrence_murithi).</description>
    <link>https://dev.to/lawrence_murithi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3713327%2Fe1187555-8b89-4a2c-9168-be280e1c6b86.png</url>
      <title>DEV Community: Lawrence Murithi</title>
      <link>https://dev.to/lawrence_murithi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lawrence_murithi"/>
    <language>en</language>
    <item>
      <title>ETL vs ELT: Which One Should You Use and Why?</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Sat, 11 Apr 2026 17:51:32 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/etl-vs-elt-which-one-should-you-use-and-why-4and</link>
      <guid>https://dev.to/lawrence_murithi/etl-vs-elt-which-one-should-you-use-and-why-4and</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Imagine you are running a massive kitchen. Every day, trucks arrive carrying raw ingredients from different farms. Some boxes have dirty potatoes, some tomatoes are bruised, and the meat needs to be separated from the bone.&lt;br&gt;
Can you just throw all of this straight onto a customer’s plate? Definitely not. You have to wash, chop, season, and cook the ingredients first.&lt;/p&gt;

&lt;p&gt;In the business world, data works the same way. Every day, companies generate tons of raw data from apps, websites, payment gateways, customer service logs etc. This raw data is usually dirty and messy. It has errors, missing fields and mismatched formats. Before it can be used for reporting or decision-making, it needs to be moved, processed and organized. This process of moving and cleaning data is called &lt;strong&gt;data integration&lt;/strong&gt;. &lt;br&gt;
The two main approaches are used in data integration are ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform). Although both methods aim to prepare data for analysis, they follow different steps and are suited for different situations. &lt;br&gt;
If you are just stepping into data engineering, software engineering or backend development, ETL and ELT are common terms you will encounter. &lt;br&gt;
This article explains both approaches in detail, compares them, and helps you understand when to use each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ETL?
&lt;/h2&gt;

&lt;p&gt;ETL stands for Extract, Transform, Load. It is the traditional method used to move and prepare data.&lt;br&gt;
The key idea in ETL is that data is cleaned and transformed before it is stored in the final system. This means that by the time the data reaches the data warehouse, it is already structured, organized, and ready for use.&lt;br&gt;
This approach was developed at a time when computing resources were limited, and companies had to be very careful about what data they stored.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps in ETL
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Extract&lt;/strong&gt;&lt;br&gt;
This step involves collecting raw data from different sources such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databases&lt;/li&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;Excel files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real-world scenarios, data rarely comes from a single source. A company may have customer data in one system, sales data in another, and marketing data in a third system. This extraction step pulls all this data together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Transform&lt;/strong&gt;&lt;br&gt;
In this stage, data is processed in a separate system before being stored. This transformation step ensures that all data is consistent, accurate, and usable.&lt;/p&gt;

&lt;p&gt;Common transformations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standardizing data formats&lt;/li&gt;
&lt;li&gt;Handling missing values&lt;/li&gt;
&lt;li&gt;Removing duplicate records&lt;/li&gt;
&lt;li&gt;Fixing errors in data&lt;/li&gt;
&lt;li&gt;Masking sensitive data such as credit card numbers&lt;/li&gt;
&lt;li&gt;Combining data from different sources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This step is where raw data is made meaningful. Without transformation, data would remain inconsistent and difficult to analyze.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Load&lt;/strong&gt;&lt;br&gt;
After transformation, the cleaned data is loaded into a data warehouse or database.&lt;br&gt;
At this stage, the data is ready for carrying out analysis, creating dashboards and reporting. &lt;br&gt;
&lt;strong&gt;Simple Diagram of ETL&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3blgiapqpgd16xihdio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3blgiapqpgd16xihdio.png" alt="ETL" width="341" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why ETL Was Popular
&lt;/h3&gt;

&lt;p&gt;In the past, data warehouses were physical servers sitting in basements. Storage space was incredibly expensive and computing power was very limited. Companies, therefore, could not afford to store raw, useless data. They had to clean it up and shrink it down before loading it into the warehouse.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ELT?
&lt;/h2&gt;

&lt;p&gt;ELT stands for Extract, Load, Transform. It is a modern approach made possible by cloud computing. Here data is loaded first and transformed later inside the data lake.&lt;br&gt;
This approach takes advantage of modern systems that can store large amounts of data and process it quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps in ELT
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Extract&lt;/strong&gt;&lt;br&gt;
Data is collected from different sources just like in ETL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Load&lt;/strong&gt;&lt;br&gt;
This is a major shift from ETL. Instead of first cleaning the data, you load the raw data directly into your target data lake without any changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Transform&lt;/strong&gt;&lt;br&gt;
The transformation happens inside the data lake. This means analysts can use the warehouse's own computing power to clean, format, and organize the data..&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple Diagram of ELT&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdv7cxynv3ek5ljfic0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdv7cxynv3ek5ljfic0f.png" alt="ELT" width="795" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why ELT Became Popular
&lt;/h3&gt;

&lt;p&gt;The emergence of modern cloud data warehouses such as Snowflake, Google BigQuery, and Amazon Redshift changed the game. Today, storing data in the cloud is incredibly cheap. Furthermore, these cloud warehouses have massive, scalable computing power.&lt;br&gt;
Instead of buying a separate, expensive server just to transform data (like in ETL), companies no longer need to clean data before storing it. They can store everything and process it later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Differences Between ETL and ELT
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Order of Steps&lt;/strong&gt;&lt;br&gt;
in ETL, transformation happens before loading while in ELT transformation happens after loading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Where the Transformation Happens&lt;/strong&gt;&lt;br&gt;
In ETL, transformation happens in a separate server outside the warehouse while in ELT, the transformation happens right inside the destination data warehouse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Speed of Loading&lt;/strong&gt;&lt;br&gt;
ELT is usually much faster at the loading stage since there is no cleaning of the data. ETL takes longer because the data has to wait in line to be processed before it can be loaded into the warehouse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Maintenance and Flexibility&lt;/strong&gt;&lt;br&gt;
ETL is less flexible and changes require rebuilding pipelines. If a mistake is made in an ETL pipeline, or if you want to format the data differently, you have to go back to the source, re-extract the data, and run it through the whole pipeline again.&lt;br&gt;
With ELT, the raw data is already sitting in your warehouse. Any mistake during transformation, you simply write a new SQL query and transform the raw data afresh. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The Skills Required&lt;/strong&gt;&lt;br&gt;
ETL often requires specialized tools and programming such as software engineers who know Java, Python or drag-and-drop tools. ELT uses SQL and since the data is transformed inside a database, it is accessible to analysts.&lt;br&gt;
&lt;strong&gt;NB:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ETL focuses on control, structure, and quality before storage&lt;/li&gt;
&lt;li&gt;ELT focuses on speed, flexibility, and scalability after storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advantages and Disadvantages
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ETL
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Security and Compliance&lt;/strong&gt; - If you are dealing with highly sensitive data (like medical records or credit cards), ETL allows you to strip out/mask the sensitive parts before storage in the main warehouse. &lt;br&gt;
&lt;strong&gt;Reduced and cheaper Storage&lt;/strong&gt; - Because you are only loading refined data, you take up much less storage space in your destination database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Rigid&lt;/strong&gt; - Setting up an ETL pipeline takes a lot of time. If a source system needs to make a change, the whole ETL pipeline might break and need to be rewritten.&lt;br&gt;
&lt;strong&gt;Bottlenecks&lt;/strong&gt; - If you have massive amounts of data, the processing server can easily get overwhelmed and slow down the whole operation.&lt;/p&gt;

&lt;h3&gt;
  
  
  ELT
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Agility&lt;/strong&gt; - Since raw data is loaded quickly and directly into the warehouse, analysts do not have to wait for engineers to build complex pipelines to access the raw data.&lt;br&gt;
&lt;strong&gt;Future-Proof&lt;/strong&gt; - Because you keep a copy of the exact raw data, reprocessing of raw data is always possible. You can also go back and answer new business questions that you hadn't thought of previously.&lt;br&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; - Cloud warehouses are designed to scale automatically thus are able to support large datasets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Security Risks&lt;/strong&gt; - Since you are loading raw, unfiltered data into your warehouse, you have to be careful about who has access to the warehouse if that data contains sensitive information such as passwords, personal addresses or financial details.&lt;br&gt;
&lt;strong&gt;Higher computing costs&lt;/strong&gt; - While cloud storage is cheap, cloud computing can get expensive. If you have bad SQL code running inefficient transformations inside your warehouse every hour, your monthly cloud bill will skyrocket.&lt;/p&gt;

&lt;h3&gt;
  
  
  ETL Tools
&lt;/h3&gt;

&lt;p&gt;These tools are designed for structured, enterprise-level data pipelines.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Informatica&lt;/li&gt;
&lt;li&gt;IBM DataStage&lt;/li&gt;
&lt;li&gt;Talend&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ELT Tools
&lt;/h3&gt;

&lt;p&gt;Modern ELT uses different tools for each step:&lt;br&gt;
These tools allow analysts to work directly with data using SQL.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fivetran / Airbyte → Extract and Load&lt;/li&gt;
&lt;li&gt;dbt (Data Build Tool) → Transform&lt;/li&gt;
&lt;li&gt;Cloud Warehouses → Snowflake, BigQuery, Redshift&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-World Use Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Banking System (ETL)&lt;/strong&gt;&lt;br&gt;
A bank handles sensitive data from mobile app banking, ATMs and physical branch locations. This data contains raw account numbers, account balances, passwords and PIN, personal details and financial transactions thus must be secured before storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E-commerce Startup (ELT)&lt;/strong&gt;&lt;br&gt;
An online store that wants to track user behavior will generate large amounts of data daily just from people clicking around their website, viewing products, adding items to carts etc. The marketing team thus has to constantly change what they want to measure. One week they may want to track abandoned carts while the following week they may want to track how long people look at a specific product. The business has to frequently change what it wants to analyze.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which One Should You Use and Why?
&lt;/h2&gt;

&lt;p&gt;If you are starting a new project and trying to choose between ETL and ELT, here is a practical guide to help you decide.&lt;br&gt;
&lt;strong&gt;Choose ETL if&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- You are bound by strict privacy laws&lt;/strong&gt; - If you work with sensitive data (healthcare, banking), the ability to scrub data before it lands in a database should be key.&lt;br&gt;
&lt;strong&gt;- Your system uses on-premise databases&lt;/strong&gt; - If your company still keeps its servers in a physical server room, your database may not have high processing power required to do transformations internally hence you will need a separate ETL server.&lt;br&gt;
&lt;strong&gt;- Your data source is unstructured&lt;/strong&gt; - If you are extracting data from highly complex, old mainframes that output weird file types, standard ELT tools might not know how to read them. You will need a custom ETL script to decode and format the data before it can be saved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose ELT if&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- You are using a cloud data warehouse&lt;/strong&gt; - If you have Snowflake, BigQuery, or Redshift, ELT is most convinient since it takes advantage of what you are already paying for.&lt;br&gt;
&lt;strong&gt;- You work with large volumes of diverse data&lt;/strong&gt; - If you are tracking millions of tiny events (like website clicks, product views or IoT sensor readings), pushing it directly to the cloud is the only way to keep up with the volume.&lt;br&gt;
&lt;strong&gt;- You need flexibility in analysis and fast data processing&lt;/strong&gt; - ELT allows data engineers to focus purely on moving data from point A to point B, while empowering data analysts to handle the business logic and formatting using SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The debate between ETL and ELT is less about which one is better and more about matching your business needs, data size, and system architecture. Understanding both approaches helps you design better data pipelines and make smarter decisions when working with data.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>data</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Advanced SQL Techniques for Data Analytics Every Data Analyst Should Know</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:21:19 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/advanced-sql-techniques-for-data-analytics-every-data-analyst-should-know-53c8</link>
      <guid>https://dev.to/lawrence_murithi/advanced-sql-techniques-for-data-analytics-every-data-analyst-should-know-53c8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In today’s data-driven environment, organizations rely heavily on data to make decisions. Businesses collect large amounts of information from different sources such as sales systems, customer platforms, and operational databases. However, raw data alone is not useful unless it can be analyzed and transformed into meaningful insights.&lt;/p&gt;

&lt;p&gt;SQL (Structured Query Language) plays a central role in this process. It allows analysts to retrieve, clean, and analyze data stored in relational databases. While basic SQL skills are important, advanced SQL techniques are what truly enable analysts to solve complex business problems.&lt;/p&gt;

&lt;p&gt;This article explains advanced SQL concepts in simple terms and shows how they are applied in real-world data analytics scenarios. The goal is to help you understand not just how to write SQL queries, but how to use them effectively in practical situations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Role of SQL in Data Analytics
&lt;/h3&gt;

&lt;p&gt;SQL is the foundation of data analytics. Most business data is stored in databases, and SQL is the language used to interact with that data.&lt;/p&gt;

&lt;p&gt;Data analysts use SQL to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract data from databases&lt;/li&gt;
&lt;li&gt;Filter and clean datasets&lt;/li&gt;
&lt;li&gt;Combine data from multiple tables&lt;/li&gt;
&lt;li&gt;Perform calculations and aggregations&lt;/li&gt;
&lt;li&gt;Prepare data for reporting tools like Power BI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SQL is often the first step before using any visualization tools. If the data is not properly prepared using SQL, the final reports may be inaccurate or misleading.&lt;/p&gt;

&lt;h3&gt;
  
  
  Working with Complex Queries
&lt;/h3&gt;

&lt;p&gt;As data becomes more complex, simple queries are not enough to handle it. Advanced SQL, therefore, introduces techniques that help break down complex problems into manageable steps. &lt;br&gt;
In real-world data analysis, datasets are often large and contain multiple tables with different relationships. Moreover, analysts are expected to answer questions that involve comparisons, calculations and multiple layers of logic. These techniques therefore allow analysts to solve the problems step by step instead of trying to do everything in one single query.&lt;br&gt;
Complex query techniques thus help analysts organize their queries in a way that is easier to understand, maintain, and scale. &lt;/p&gt;

&lt;p&gt;They are useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comparing values against aggregated results&lt;/li&gt;
&lt;li&gt;Reusing part of a query&lt;/li&gt;
&lt;li&gt;Working with multi-step transformations&lt;/li&gt;
&lt;li&gt;Simplifying long and confusing SQL statements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of the advanced SQL techniques include:&lt;/p&gt;
&lt;h4&gt;
  
  
  Subqueries
&lt;/h4&gt;

&lt;p&gt;A subquery is a query inside another query. Subqueries are useful when you need to perform a calculation first and then use that result in another query. They allow you to embed logic directly inside your main query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explanation:&lt;br&gt;
_- The inner query calculates the average salary&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The outer query returns employees earning above average_&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Subqueries can be used in different parts of a query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the WHERE clause&lt;/li&gt;
&lt;li&gt;In the SELECT clause&lt;/li&gt;
&lt;li&gt;In the FROM clause (called derived tables)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real-World Case Scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify high-performing employees based on salary or performance metrics. &lt;/li&gt;
&lt;li&gt;Finding customers who spend more than the average customer&lt;/li&gt;
&lt;li&gt;Identifying products priced above the average price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NB:&lt;/strong&gt; While subqueries are powerful, they can become slow if used incorrectly, especially with large datasets.&lt;/p&gt;
&lt;h4&gt;
  
  
  Common Table Expressions (CTEs)
&lt;/h4&gt;

&lt;p&gt;A CTE is a temporary result in an SQL query that helps improves readability and organization(temporary table that exists only while the query is running).&lt;/p&gt;

&lt;p&gt;CTEs allow you to define a query once and then use it in the main query. This makes complex queries easier to read and understand, especially when working with multiple steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;sales_summary&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales_summary&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Types of CTEs:&lt;br&gt;
&lt;strong&gt;- Recursive CTE&lt;/strong&gt;: A specialized CTE that references itself, which is essential for querying hierarchical data like organizational charts or family trees.&lt;br&gt;
&lt;strong&gt;- Non-Recursive CTE&lt;/strong&gt;: The most common type, used to simplify standard queries by creating manageable logical steps.&lt;br&gt;
Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes queries clean and easier to read&lt;/li&gt;
&lt;li&gt;Breaks complex logic into steps, thus easier to debug and modify&lt;/li&gt;
&lt;li&gt;Improves maintainability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NB:&lt;/strong&gt; You can also have multiple CTEs in one query, which is useful for complex data transformations.&lt;/p&gt;

&lt;p&gt;In business reporting, analysts often build layered queries. CTEs allow them to structure their logic clearly when working with large datasets.&lt;br&gt;
Step 1: Calculate total sales per product&lt;br&gt;
Step 2: Filter high-performing products&lt;br&gt;
Step 3: Join with other tables for reporting&lt;/p&gt;
&lt;h3&gt;
  
  
  Advanced Joins
&lt;/h3&gt;

&lt;p&gt;Joins are used to combine data from multiple tables. In advanced SQL, joins become more powerful when dealing with complex relationships.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a retail company:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customers table stores customer details&lt;/li&gt;
&lt;li&gt;Orders table stores transactions&lt;/li&gt;
&lt;li&gt;Products table stores product information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using joins, analysts can create a full view of customer purchases.&lt;/p&gt;

&lt;p&gt;Poor joins can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate data&lt;/li&gt;
&lt;li&gt;Incorrect totals&lt;/li&gt;
&lt;li&gt;Misleading reports&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Window Functions
&lt;/h3&gt;

&lt;p&gt;Window functions allow us to perform advanced calculations across a group of related rows while keeping the original data. They are useful for ranking, running totals, moving averages, and analytical reporting.&lt;br&gt;
Window functions often remove the need for complex self-joins and provide an analytical layer within SQL.&lt;br&gt;
Window functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep every row&lt;/li&gt;
&lt;li&gt;Add calculated values to each row
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;column_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;
           &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;output_column&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Window functions are widely used in business intelligence and reporting for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rankings within a group&lt;/li&gt;
&lt;li&gt;Calculating running totals&lt;/li&gt;
&lt;li&gt;Compare rows (current vs previous)&lt;/li&gt;
&lt;li&gt;Analyzing trends over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies use ranking to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify top performers&lt;/li&gt;
&lt;li&gt;Allocate bonuses&lt;/li&gt;
&lt;li&gt;Compare employee performance
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="n"&gt;Ranking&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Businesses use running totals to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track revenue growth&lt;/li&gt;
&lt;li&gt;Monitor daily or monthly performance&lt;/li&gt;
&lt;li&gt;Forecast future trends
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="n"&gt;Running&lt;/span&gt; &lt;span class="n"&gt;totals&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;running_total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Aggregations and Grouping
&lt;/h3&gt;

&lt;p&gt;Aggregation helps summarize large datasets. Raw data is often too detailed to understand directly. Aggregation thus helps turn large datasets into meaningful summaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aggregation allows analysts to answer questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total sales by region&lt;/li&gt;
&lt;li&gt;Sales by product category&lt;/li&gt;
&lt;li&gt;Monthly revenue trends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aggregation is often used together with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filtering (HAVING)&lt;/li&gt;
&lt;li&gt;Sorting (ORDER BY)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Cleaning and Transformation
&lt;/h3&gt;

&lt;p&gt;Data cleaning is one of the most important steps in analytics. Since raw data is usually dirty and messy, SQL helps clean and prepare it before analysis. &lt;br&gt;
Raw data may contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicates&lt;/li&gt;
&lt;li&gt;Missing values&lt;/li&gt;
&lt;li&gt;Incorrect formats&lt;/li&gt;
&lt;li&gt;Inconsistent entries&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Removing Duplicates
&lt;/h4&gt;

&lt;p&gt;Removes repeated values and ensures each entry appears only once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Handling Missing Values
&lt;/h4&gt;

&lt;p&gt;Replaces NULL values with a default value thus preventing errors in reports&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Not Available'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Data Transformation
&lt;/h4&gt;

&lt;p&gt;Creates a new calculated column&lt;br&gt;
Data transformation also includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Changing data types&lt;/li&gt;
&lt;li&gt;Formatting dates&lt;/li&gt;
&lt;li&gt;Standardizing values
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Using SQL for Real-World Business Problems
&lt;/h3&gt;

&lt;p&gt;Advanced SQL is not just about writing queries but solving real problems.&lt;br&gt;
In organizations, SQL is used daily to answer business questions and support decisions.&lt;/p&gt;
&lt;h4&gt;
  
  
  Customer Segmentation
&lt;/h4&gt;

&lt;p&gt;Businesses use customer segmentation to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Target high-value customers&lt;/li&gt;
&lt;li&gt;Design marketing strategies&lt;/li&gt;
&lt;li&gt;Improve customer retention
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="k"&gt;Grouping&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="n"&gt;based&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;spending&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;CASE&lt;/span&gt; 
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'High Value'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Medium Value'&lt;/span&gt;
    &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'Low Value'&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;segment&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer_sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Sales Performance Analysis
&lt;/h4&gt;

&lt;p&gt;Total sales are calculated per product and sorted products by performance to identify best-selling products.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Segmentation helps organizations to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand performance&lt;/li&gt;
&lt;li&gt;Identify opportunities&lt;/li&gt;
&lt;li&gt;Solve operational problems&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance Optimization
&lt;/h3&gt;

&lt;p&gt;SQL queries must be clean, easy to understand and efficient.&lt;br&gt;
In large databases, poor queries can slow down systems and delay reports.&lt;/p&gt;

&lt;p&gt;Best Practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use indexes on important columns to speed up data retrieval&lt;/li&gt;
&lt;li&gt;Avoid selecting unnecessary columns&lt;/li&gt;
&lt;li&gt;Filter data early to reduces data size&lt;/li&gt;
&lt;li&gt;Use CTEs instead of repeated subqueries&lt;/li&gt;
&lt;li&gt;Avoid unnecessary joins&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Advanced SQL is a critical skill for data analysts. It goes beyond basic queries and allows analysts to work with complex datasets, perform advanced calculations and solve real-world business problems.&lt;/p&gt;

&lt;p&gt;In this article, we explored key advanced SQL techniques such as subqueries, CTEs, joins, window functions, aggregations, and data transformation and how they are applied in real business scenarios&lt;/p&gt;

&lt;p&gt;In data analytics, SQL is not just a tool but is a core skill that connects raw data to meaningful insights. Mastering advanced SQL allows analysts to move from basic reporting to deeper, more impactful analysis&lt;/p&gt;

</description>
      <category>sql</category>
      <category>luxdev</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Connecting Power BI to SQL Databases: A Practical Guide for Data Analysts</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Tue, 17 Mar 2026 12:03:47 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/connecting-power-bi-to-sql-databases-a-practical-guide-for-data-analysts-5745</link>
      <guid>https://dev.to/lawrence_murithi/connecting-power-bi-to-sql-databases-a-practical-guide-for-data-analysts-5745</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In most modern organizations, data is one of the most valuable assets. Companies collect large amounts of information from sales systems, websites, customer platforms, and operational databases. To make sense of this information, businesses use tools that can transform this raw data into clear insights. One of the most widely used tools for this purpose is the Microsoft Power BI platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Power BI
&lt;/h2&gt;

&lt;p&gt;Power BI is a business intelligence and data visualization tool developed by Microsoft. It allows users to connect to different data sources, analyze data, and create interactive dashboards and reports. These reports help organizations monitor performance, understand trends, and support decision-making among other uses.&lt;/p&gt;

&lt;p&gt;Power BI is commonly used by data analysts, business managers, and decision makers because it can present complex data in simple visual forms such as charts, tables, maps, and dashboards.&lt;/p&gt;

&lt;p&gt;Most organizations store their operational and analytical data in SQL databases. SQL databases are designed to store large amounts of structured data in tables. They allow users to query, filter, update, and analyze data efficiently using Structured Query Language (SQL). SQL databases are reliable, secure, and scalable, hence they are widely used in business systems such as sales platforms, customer management systems, and inventory systems.&lt;/p&gt;

&lt;p&gt;Connecting Power BI to a database allows analysts to access this stored data directly. Instead of manually exporting data into spreadsheets, Power BI can retrieve the data automatically, refresh it when the database changes, and build dashboards that always reflect the latest information.&lt;/p&gt;

&lt;p&gt;This article explains how Power BI connects to SQL databases, how to connect to a local PostgreSQL database, how to connect to a cloud database such as Aiven PostgreSQL, and how the loaded data is modeled for analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding the Power BI Interface
&lt;/h3&gt;

&lt;p&gt;Before connecting to a database, it is helpful to understand the Power BI Desktop interface. Power BI Desktop is the main application used for building reports and dashboards.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccc89cx73itso4ddn7vt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccc89cx73itso4ddn7vt.jpg" alt="BI Desktop" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
The Power BI Desktop interface includes several sections such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ribbon (Top Menu) – Contains commands and tabs such as Get Data, Transform Data, and Publish.&lt;/li&gt;
&lt;li&gt;Report Canvas – The workspace where charts and dashboards are created.&lt;/li&gt;
&lt;li&gt;Visualizations Pane – Used to select and customize charts.&lt;/li&gt;
&lt;li&gt;Fields Pane – Displays the tables and columns loaded into Power BI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can download Power BI desktop app &lt;a href="https://www.microsoft.com/en-us/power-platform/products/power-bi/desktop" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Connecting Power BI to a Local PostgreSQL Database
&lt;/h3&gt;

&lt;p&gt;PostgreSQL is one of the most popular open-source relational databases used in data analytics. Many organizations run databases locally on their own servers. &lt;br&gt;
The steps below explain how to connect Power BI to a local PostgreSQL database.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 1: Open Power BI Desktop
&lt;/h4&gt;

&lt;p&gt;Start by opening Power BI Desktop on your computer.&lt;br&gt;
When the application opens, a blank report canvas appears. This is where you will build your report after loading the data.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 2: Get Data
&lt;/h4&gt;

&lt;p&gt;On the Home tab of the ribbon, click Get Data.&lt;br&gt;
This button opens a list of available data sources. Power BI supports many data sources including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excel&lt;/li&gt;
&lt;li&gt;SQL Server&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;Web APIs
The Get Data feature is the starting point for connecting Power BI to any data source. Other data sources are as shown on the image.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsqplwvo6mi1vf0n859m.jpg" alt="Get Data" width="800" height="450"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Step 3: Select PostgreSQL Database
&lt;/h4&gt;

&lt;p&gt;From the list of available data connectors, click &lt;strong&gt;more&lt;/strong&gt; to view more options. Scroll down, select &lt;strong&gt;PostgreSQL Database&lt;/strong&gt; and click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 4: Enter the Database Connection Details
&lt;/h4&gt;

&lt;p&gt;After selecting PostgreSQL and clicking connect, Power BI opens a connection window that requires connection details for the connection to go through.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwa7subvjsrzicalek1n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwa7subvjsrzicalek1n.jpg" alt="Credentials" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Server&lt;/strong&gt; – The location of the database server. If the database is on your computer, use localhost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database&lt;/strong&gt; – The name of the database you want to connect to.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 5: Provide Login Credentials
&lt;/h4&gt;

&lt;p&gt;After a connection is made, Power BI will ask for authentication details.&lt;br&gt;
You will need to provide:&lt;br&gt;
&lt;strong&gt;Username&lt;br&gt;
Password&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These credentials were set up during installation of the PostgreSQL and allows Power BI to securely access the database.&lt;/p&gt;

&lt;p&gt;Once the credentials are entered, click Connect.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 6: Select Tables to Import
&lt;/h4&gt;

&lt;p&gt;After connecting successfully, Power BI opens the Navigator Window which displays all available tables in the database.&lt;br&gt;
You can preview the contents of each table before loading them.&lt;br&gt;
There are two options:&lt;br&gt;
&lt;strong&gt;Load&lt;/strong&gt; – Import the data directly.&lt;br&gt;
&lt;strong&gt;Transform Data&lt;/strong&gt; – Clean or modify the data before loading it.&lt;/p&gt;
&lt;h3&gt;
  
  
  Connecting Power BI to a Cloud Database (Aiven PostgreSQL)
&lt;/h3&gt;

&lt;p&gt;Many organizations now store their databases in the cloud. Cloud databases are accessible through the internet and provide benefits such as scalability, backups, and easier management.&lt;br&gt;
Aiven is a cloud platform that provides managed PostgreSQL databases.&lt;br&gt;
Connecting Power BI to a cloud database is not different to connecting to a local database, only that additional security steps are required.&lt;/p&gt;

&lt;p&gt;The steps below explain how to connect Power BI to an Aiven PostgreSQL database.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 1: Get the Database Connection Details from Aiven
&lt;/h4&gt;

&lt;p&gt;Login to Aiven and inside the dashboard, you will find the connection information for your PostgreSQL service. These details are used by Power BI to locate and connect to the database.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgniq9zxf2r4e78vijr8d.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgniq9zxf2r4e78vijr8d.jpg" alt="Connection Details" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 2: Download and install the SSL Certificate
&lt;/h4&gt;

&lt;p&gt;Cloud database providers often require SSL encryption to secure the connection.&lt;br&gt;
An SSL certificate ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data transferred between Power BI and the database is encrypted&lt;/li&gt;
&lt;li&gt;Unauthorized users cannot intercept the connection&lt;/li&gt;
&lt;li&gt;The database server identity is verified&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Aiven, download the certificate file(CA Certificate) from the Connection Information section of the service dashboard.&lt;br&gt;
Rename the downloaded file from &lt;strong&gt;ca.pem&lt;/strong&gt; to &lt;strong&gt;ca.crt&lt;/strong&gt; and install the Certificate on your PC.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzahcljjsav3r40i5mh96.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzahcljjsav3r40i5mh96.jpg" alt="SSL Certificate" width="597" height="763"&gt;&lt;/a&gt;&lt;br&gt;
Choose Local Machine as the location of the installation and click next.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx30qrvf497dbag9gsbad.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx30qrvf497dbag9gsbad.jpg" alt="Local Machine" width="793" height="775"&gt;&lt;/a&gt;&lt;br&gt;
Choose &lt;strong&gt;place all certificates in the following store&lt;/strong&gt; and browse certificate store to &lt;strong&gt;Trusted Root Certification Authorities&lt;/strong&gt;. &lt;br&gt;
Click &lt;strong&gt;ok&lt;/strong&gt; and &lt;strong&gt;finish&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg4l0upupjwxaz519po5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg4l0upupjwxaz519po5.jpg" alt="Store" width="783" height="811"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 3: Connect Power BI
&lt;/h4&gt;

&lt;p&gt;Open Power BI desktop as before, click &lt;strong&gt;Get Data&lt;/strong&gt;, click &lt;strong&gt;more&lt;/strong&gt;, scroll down and select &lt;strong&gt;PostgreSQL Database&lt;/strong&gt;.&lt;br&gt;
Copy the &lt;strong&gt;Server Name&lt;/strong&gt; from the service URL(&lt;strong&gt;host_name:port_number&lt;/strong&gt;) on Connection Information and paste on Power BI.&lt;br&gt;
Input the name of your database and click &lt;strong&gt;ok&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu18dtkx45ilpu9hofwse.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu18dtkx45ilpu9hofwse.jpg" alt="server &amp;amp; db" width="800" height="425"&gt;&lt;/a&gt;&lt;br&gt;
Copy the username and password from Aiven, input them on the Power BI credentials window that opens and click &lt;strong&gt;connect&lt;/strong&gt;.&lt;br&gt;
Once the connection is successful, a navigator window opens and displays all tables in the database. &lt;br&gt;
Select the tables you want to work with and click on &lt;strong&gt;load/transfrom data&lt;/strong&gt; depending on what you wish to do with the data.&lt;br&gt;
Transform data option is used to clean raw data e.g delete any duplicates and address null values using the most appropriate method.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj384cz97geuk12zvbe1s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj384cz97geuk12zvbe1s.jpg" alt="Load data" width="800" height="424"&gt;&lt;/a&gt;&lt;br&gt;
Successfully loaded data displays on the data pane as shown on figure below. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoqoi3v0etj70udifbs3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoqoi3v0etj70udifbs3.jpg" alt="Tables" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Creating Relationships Between Tables
&lt;/h4&gt;

&lt;p&gt;Once loaded, Power BI automatically detects relationships between tables based on matching columns using primary and foreign keys. Relationships not created can be manually created by dragging a column from one table onto the matching column in another table.&lt;br&gt;
These relationships allow Power BI to combine information across multiple tables. &lt;br&gt;
For example:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17mn1jzfmmcng1bk54zg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17mn1jzfmmcng1bk54zg.jpg" alt="Connection" width="543" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the Model View, the tables appear as connected boxes. The relationships show how data flows between tables.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ebu7s8y3tr1j5io4sqr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ebu7s8y3tr1j5io4sqr.jpg" alt="Relatioships" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Modeling and Why its Important
&lt;/h2&gt;

&lt;p&gt;Data modeling is the process of defining how data is stored, structured, and related within a database. It ensures that Power BI understands how different tables are related.&lt;/p&gt;

&lt;p&gt;Good data modeling allows Power BI to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter data correctly&lt;/li&gt;
&lt;li&gt;Calculate totals accurately&lt;/li&gt;
&lt;li&gt;Create meaningful visualizations&lt;/li&gt;
&lt;li&gt;Avoid duplicated values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, when analyzing sales:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The sales table stores transaction records.&lt;/li&gt;
&lt;li&gt;The customers table provides customer information.&lt;/li&gt;
&lt;li&gt;The products table describes the items sold.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  why SQL skills are important for Power BI analysts
&lt;/h3&gt;

&lt;p&gt;Power BI is a powerful tool for building reports and dashboards, but it does not replace the need for strong data handling skills. Most business data is stored in SQL databases, and before that data can be visualized in Power BI, it must first be retrieved, cleaned, and structured properly. &lt;br&gt;
SQL skills give Power BI analysts a real edge by providing an easier way to grab just what you need without pulling everything into Power BI. &lt;br&gt;
Without SQL, analysts may rely too much on raw data, which can lead to slow reports, incorrect results, and inefficient workflows.&lt;br&gt;
SQL allows analysts to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Retrieve Data&lt;/strong&gt;&lt;br&gt;
Analysts can write queries to select specific rows and columns relevant to their analysis from a database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- selecting only products name and price columns from products table&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters in Power BI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces the amount of data imported&lt;/li&gt;
&lt;li&gt;Improves performance&lt;/li&gt;
&lt;li&gt;Makes the model easier to manage&lt;/li&gt;
&lt;li&gt;Avoids unnecessary columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Filter Data&lt;/strong&gt;&lt;br&gt;
In real-world scenarios, not all data is useful for analysis. Analysts often need to focus on specific time periods, regions, or business conditions. SQL thus makes it easy to filter datasets based on a specific criteria before loading them into Power BI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Retrieving only sales from 2024 onwards.&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces dataset size&lt;/li&gt;
&lt;li&gt;Speeds up report loading&lt;/li&gt;
&lt;li&gt;Focuses analysis on relevant data&lt;/li&gt;
&lt;li&gt;Avoids unnecessary processing inside Power BI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Perform Aggregations&lt;/strong&gt;&lt;br&gt;
Aggregation is the process of summarizing data. In business analysis, analysts often need totals, averages, counts, and other summary metrics. SQL can summarize large datasets quickly by using functions such as GROUP BY, SUM, COUNT, and AVG.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Calculating total sales per product&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_quantity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why aggregation in SQL is important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces data volume before loading&lt;/li&gt;
&lt;li&gt;Improves Power BI performance&lt;/li&gt;
&lt;li&gt;Simplifies data models&lt;/li&gt;
&lt;li&gt;Avoids heavy calculations in DAX&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Preparing data for Analysis
&lt;/h3&gt;

&lt;p&gt;Raw data must be cleaned or transformed before it is ready for visualization. &lt;br&gt;
SQL can be used to:&lt;/p&gt;
&lt;h4&gt;
  
  
  Joining Tables and Combining Data
&lt;/h4&gt;

&lt;p&gt;Business data is usually stored in multiple tables.&lt;br&gt;
SQL allows analysts to combine these tables using joins. Joined datasets in SQL can simplify the data model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sales_amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this is important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combines related data into one dataset&lt;/li&gt;
&lt;li&gt;Reduces the need for complex relationships in Power BI&lt;/li&gt;
&lt;li&gt;Makes analysis easier&lt;/li&gt;
&lt;li&gt;Prevents duplication errors&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Data Cleaning and Preparation
&lt;/h4&gt;

&lt;p&gt;Raw data is often messy. It may contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing values&lt;/li&gt;
&lt;li&gt;Duplicate records&lt;/li&gt;
&lt;li&gt;Incorrect formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SQL helps clean and prepare the data before it is loaded into Power BI hence leading to better insights.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Eliminating duplicates&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;--Handling missing values&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Not Provided'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why data cleaning matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensures data accuracy&lt;/li&gt;
&lt;li&gt;Improves report reliability&lt;/li&gt;
&lt;li&gt;Reduces cleaning work in Power BI&lt;/li&gt;
&lt;li&gt;Prevents errors in calculations&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Creating Calculated Fields
&lt;/h4&gt;

&lt;p&gt;SQL allows analysts to create new columns based on existing data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Calculate total sales&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why calculated fields are useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prepares key metrics before loading&lt;/li&gt;
&lt;li&gt;Reduces need for DAX calculations&lt;/li&gt;
&lt;li&gt;Keeps logic centralized in the database&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Supporting Advanced Analysis
&lt;/h4&gt;

&lt;p&gt;SQL also supports more advanced operations such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Window functions (running totals, ranking)&lt;/li&gt;
&lt;li&gt;Subqueries&lt;/li&gt;
&lt;li&gt;Common Table Expressions (CTEs)&lt;/li&gt;
&lt;li&gt;Data transformations
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sales_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;running_total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Power BI is a powerful tool that helps organizations transform raw data into meaningful insights. By connecting directly to SQL databases, Power BI allows analysts to access structured data stored in business systems and convert it into interactive dashboards and reports.&lt;br&gt;
SQL prepares the foundation, and Power BI builds the story on top of it. Strong SQL skills allow analysts to work more efficiently, produce accurate reports, and deliver better insights for decision-making.&lt;br&gt;
When SQL and Power BI are used together, they provide a powerful combination for modern data analysis and business intelligence.&lt;/p&gt;

</description>
      <category>luxdev</category>
      <category>dataengineering</category>
      <category>sql</category>
      <category>database</category>
    </item>
    <item>
      <title>Mastering SQL Joins and Window Functions</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Tue, 03 Mar 2026 10:36:56 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/mastering-sql-joins-and-window-functions-1f30</link>
      <guid>https://dev.to/lawrence_murithi/mastering-sql-joins-and-window-functions-1f30</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;SQL (Structured Query Language) is a powerful tool used to search, manage, and analyze large amounts of data. It is widely used by data enthusiasts, software developers and even marketing professionals.&lt;br&gt;
In real-world databases, data is not stored in one large table. It is divided into multiple related tables. This makes storage efficient and avoids duplication. To work effectively with such data, you must understand SQL joins and window functions. These two features allow you to combine data correctly and perform advanced analysis without losing important details.&lt;/p&gt;
&lt;h2&gt;
  
  
  SQL Joins
&lt;/h2&gt;

&lt;p&gt;A JOIN in SQL is used to combine rows from two or more tables based on a related column. This relationship is usually created using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A primary key (unique identifier in one table)&lt;/li&gt;
&lt;li&gt;A foreign key (reference to that key in another table)
Joins are essential when working with relational databases because data is often split across multiple tables.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Importance of Joins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Combining related data from multiple tables&lt;/li&gt;
&lt;li&gt;Maintaining relational integrity&lt;/li&gt;
&lt;li&gt;Supporting one-to-many and many-to-many relationships&lt;/li&gt;
&lt;li&gt;Building meaningful reports and analytics&lt;/li&gt;
&lt;li&gt;Preventing unnecessary duplication of data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The type of join you use directly affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The number of rows returned&lt;/li&gt;
&lt;li&gt;Whether NULL values appear&lt;/li&gt;
&lt;li&gt;How business logic is interpreted
NB: Choosing the wrong join can lead to missing data, duplicated records, or incorrect analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Types of SQL Joins
&lt;/h3&gt;
&lt;h4&gt;
  
  
  INNER JOIN
&lt;/h4&gt;

&lt;p&gt;The INNER JOIN returns only the rows that have matching values in both tables.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combines records based on a related column&lt;/li&gt;
&lt;li&gt;Returns only matching rows&lt;/li&gt;
&lt;li&gt;Excludes non-matching rows
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijwwf41l1fwvnmt9kowo.jpg" alt="Inner Join" width="800" height="105"&gt;
INNER JOIN is used when:&lt;/li&gt;
&lt;li&gt;You only need matched data&lt;/li&gt;
&lt;li&gt;You want to exclude incomplete relationships&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  LEFT (OUTER) JOIN
&lt;/h4&gt;

&lt;p&gt;The LEFT (OUTER) JOIN returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All rows from the left table&lt;/li&gt;
&lt;li&gt;Matching rows from the right table&lt;/li&gt;
&lt;li&gt;NULL values if no match exists
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftfmhtuhwawp91s4otip.jpg" alt="Left Join" width="800" height="85"&gt;
LEFT JOIN is used when:&lt;/li&gt;
&lt;li&gt;You want all records from the main table&lt;/li&gt;
&lt;li&gt;You want to identify missing matches&lt;/li&gt;
&lt;li&gt;You need complete reporting from one side&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  RIGHT (OUTER) JOIN
&lt;/h4&gt;

&lt;p&gt;The RIGHT (OUTER) JOIN returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All rows from the right table&lt;/li&gt;
&lt;li&gt;Matching rows from the left table&lt;/li&gt;
&lt;li&gt;NULL where no match exists on the left
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjn4qq9vkqqibun7ys4i9.jpg" alt="Right Join" width="800" height="83"&gt;
NB: RIGHT JOIN works like LEFT JOIN but from the opposite direction.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  FULL (OUTER) JOIN
&lt;/h4&gt;

&lt;p&gt;The FULL JOIN returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All rows from both tables&lt;/li&gt;
&lt;li&gt;Matching records where possible&lt;/li&gt;
&lt;li&gt;NULL values where no match exists
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7lesr3ojfu51sn1gj8g.jpg" alt="Full Join" width="800" height="83"&gt;
The FULL JOIN is used when:&lt;/li&gt;
&lt;li&gt;Comparing two datasets&lt;/li&gt;
&lt;li&gt;Identifying differences between systems&lt;/li&gt;
&lt;li&gt;Performing reconciliation tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  CROSS JOIN
&lt;/h4&gt;

&lt;p&gt;A CROSS JOIN returns all possible combinations of rows thus can create very large results.&lt;br&gt;
If Table A has 5 rows and Table B has 10 rows:&lt;br&gt;
Result = 50 rows.&lt;br&gt;
It does not use a matching condition.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbk5gfm1fjjiemhjcp68.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbk5gfm1fjjiemhjcp68.jpg" alt="Cross Join" width="800" height="112"&gt;&lt;/a&gt;&lt;br&gt;
A CROSS JOIN is used to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate combinations&lt;/li&gt;
&lt;li&gt;Create calendar expansions&lt;/li&gt;
&lt;li&gt;Test scenarios&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  SELF JOIN
&lt;/h4&gt;

&lt;p&gt;A self join joins a table to itself. Aliases are used to refer to the same tale&lt;br&gt;
Example:&lt;br&gt;
Employee table:&lt;br&gt;
| EmployeeID | ManagerID |&lt;br&gt;
To show each employee and their manager name, the table is joined to itself.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkfb28zynixzjr1xj2wm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkfb28zynixzjr1xj2wm.jpg" alt="Self Join" width="800" height="103"&gt;&lt;/a&gt;&lt;br&gt;
Self joins are useful for hierarchical data.&lt;/p&gt;
&lt;h4&gt;
  
  
  NATURAL JOIN
&lt;/h4&gt;

&lt;p&gt;A natural join Joins all tables using columns that have the same name.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7932nmokz5hfoco4y1k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7932nmokz5hfoco4y1k.jpg" alt="Natural Join" width="800" height="47"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Performance Considerations for Joins.
&lt;/h3&gt;

&lt;p&gt;Joins can affect performance, especially in large databases.&lt;br&gt;
Best practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index join columns (primary and foreign keys)&lt;/li&gt;
&lt;li&gt;Avoid unnecessary joins&lt;/li&gt;
&lt;li&gt;Filter data early using WHERE&lt;/li&gt;
&lt;li&gt;Understand execution plans&lt;/li&gt;
&lt;li&gt;Be careful with joins that multiply rows unintentionally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Improper joins can cause:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate results&lt;/li&gt;
&lt;li&gt;Data inflation&lt;/li&gt;
&lt;li&gt;Slow query execution&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Window Functions
&lt;/h2&gt;

&lt;p&gt;Window functions allow us to perform advanced calculations across a group of related rows while keeping the original data. They are useful for ranking, running totals, moving averages, and analytical reporting.&lt;br&gt;
Window functions often remove the need for complex self-joins and provide an analytical layer within SQL.&lt;br&gt;
Window functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Keep every row&lt;/li&gt;
&lt;li&gt;Add calculated values to each row&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Structure of a window function:
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_1,
       function() OVER (
           PARTITION BY column
           ORDER BY column
       ) AS output_column
FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  1.  OVER()
&lt;/h4&gt;

&lt;p&gt;The OVER() clause defines how the window function operates and controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partitioning&lt;/li&gt;
&lt;li&gt;Ordering&lt;/li&gt;
&lt;li&gt;Optional frame boundaries&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2.  PARTITION BY()
&lt;/h4&gt;

&lt;p&gt;The PARTITION BY divides rows into logical groups. If omitted, the entire dataset is treated as one group.&lt;/p&gt;
&lt;h4&gt;
  
  
  3.  ORDER BY()
&lt;/h4&gt;

&lt;p&gt;ORDER BY defines the sequence of rows inside each partition.&lt;/p&gt;

&lt;p&gt;It is essential for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ranking&lt;/li&gt;
&lt;li&gt;Running totals&lt;/li&gt;
&lt;li&gt;Time-based comparisons&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If ORDER BY is omitted, row processing order is undefined.&lt;/p&gt;
&lt;h4&gt;
  
  
  4. Frame Clause (ROWS vs RANGE)
&lt;/h4&gt;

&lt;p&gt;Used to define a range of rows(boundary) and commonly used for moving averages and cumulative calculations.&lt;br&gt;
In the ROWS subclause, the frame is defined by beginning and ending row positions while in the RANGE subclause, the frame is defined by a value range.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ROWS BETWEEN lower_bound AND upper_bound
RANGE BETWEEN lower_bound AND upper_bound
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Types of SQL Window Functions
&lt;/h3&gt;

&lt;p&gt;Window functions fall into three main categories.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Aggregate Window Functions
&lt;/h4&gt;

&lt;p&gt;These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AVG() - Calculates moving averages.&lt;/li&gt;
&lt;li&gt;SUM() - Creates running totals.&lt;/li&gt;
&lt;li&gt;COUNT() - calculates the number of items found in a group.&lt;/li&gt;
&lt;li&gt;MIN() - returns the minimum value.&lt;/li&gt;
&lt;li&gt;MAX() - returns the maximum value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some use cases of Aggregate window functions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Department totals&lt;/li&gt;
&lt;li&gt;Running totals&lt;/li&gt;
&lt;li&gt;Moving averages&lt;/li&gt;
&lt;li&gt;Cumulative metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Ranking Window Functions
&lt;/h4&gt;

&lt;p&gt;They are used to assign position or rank.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ROW_NUMBER() - Assigns a unique number to each row. &lt;/li&gt;
&lt;li&gt;RANK() - Assigns rank with gaps when ties exist.&lt;/li&gt;
&lt;li&gt;DENSE_RANK() - Similar to RANK but does not skip numbers and better for ranking reports where gaps are not desired.&lt;/li&gt;
&lt;li&gt;PERCENT_RANK() - calculates the relative rank of a row within a group of rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some use cases of Ranking window functions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Top N per group&lt;/li&gt;
&lt;li&gt;Performance ranking&lt;/li&gt;
&lt;li&gt;Leaderboards&lt;/li&gt;
&lt;li&gt;Percentile analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Offset (Value) Window Functions
&lt;/h4&gt;

&lt;p&gt;They are used to access data from other rows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LAG() - shows previous row value and used in time-based analysis.&lt;/li&gt;
&lt;li&gt;LEAD() - shows next row value and used in time-based analysis.&lt;/li&gt;
&lt;li&gt;FIRST_VALUE() - returns the first value in an ordered set of values within a partition.&lt;/li&gt;
&lt;li&gt;LAST_VALUE() - returns the last value in an ordered set of values within a partition.&lt;/li&gt;
&lt;li&gt;NTH_VALUE() - Divides rows into equal groups and useful in performance analysis and segmentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some use cases of Offset window functions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Month-over-month growth&lt;/li&gt;
&lt;li&gt;Time-series comparison&lt;/li&gt;
&lt;li&gt;Trend detection&lt;/li&gt;
&lt;li&gt;Sequential analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;SQL joins and window functions are core tools for designing efficient and powerful queries.&lt;br&gt;
Joins allow you to combine data from multiple tables using defined relationships while Window functions provide an advanced analytical layer in SQL.&lt;/p&gt;

</description>
      <category>luxdev</category>
      <category>sql</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>The Power of BI; Translating Messy Data, DAX, and Dashboards into Action</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Mon, 09 Feb 2026 17:59:06 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/the-power-of-bi-translating-messy-data-dax-and-dashboards-into-action-3kmj</link>
      <guid>https://dev.to/lawrence_murithi/the-power-of-bi-translating-messy-data-dax-and-dashboards-into-action-3kmj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the real world, data rarely comes in a clean and perfect format. Most of the time, it comes from multiple systems, created by different people, and maintained with different rules. It may have missing values, inconsistent naming, or outdated records. Databases may store the same information in different ways. This is where analysts come in. Their work is not just about building reports, but about turning raw, messy data into clear insights that drive real business actions. Power BI is one of the main tools that helps them do this effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Business Before Touching the Data
&lt;/h2&gt;

&lt;p&gt;Good analysts first try to understand what problem they want to solve before embarking on the data analysis journey. They ask questions like: What decision needs to be made? Who will use the report? What actions should the dashboard support? Without this context, even the best-looking dashboard can fail.&lt;br&gt;
Understanding who your target is and what their needs are could help analysts decide what data to use, what calculations matter, and what level of detail is required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Sense of Messy Data
&lt;/h2&gt;

&lt;p&gt;Most data comes in a rough and messy state, therefore, a majority of an analyst’s time is spent cleaning and preparing the data to make sure it is reliable. Power BI’s Power Query tool is designed for this task. Analysts use it to load data from many sources such as Excel files, SQL databases, APIs, and cloud platforms. Power Query allows analysts to apply repeatable step-by-step transformations to clean the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common data problems analysts handle
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Missing values&lt;/li&gt;
&lt;li&gt;Duplicate records that inflate totals&lt;/li&gt;
&lt;li&gt;Different spellings or codes for the same category&lt;/li&gt;
&lt;li&gt;Incomplete dates or incorrect data types&lt;/li&gt;
&lt;li&gt;Columns that mix multiple values in one field&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building a Strong Data Model
&lt;/h2&gt;

&lt;p&gt;After cleaning the data, analysts focus on building a proper data model. This includes defining relationships between tables, choosing the correct granularity, and organizing data in a way that supports accurate analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Characteristics of a well-designed data model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Improves performance&lt;/li&gt;
&lt;li&gt;Makes DAX calculations easier&lt;/li&gt;
&lt;li&gt;Reduces confusion for report users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Analysts often use star schemas, separating fact tables from dimension tables, to keep the model simple and efficient. This step is invisible to most users, but is critical for reliable results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using DAX to Add Meaning to the Data
&lt;/h2&gt;

&lt;p&gt;Raw data alone does not answer business questions. DAX(Data Analysis Expressions), therefore, helps analysts turn raw numbers into useful metrics/insight. Analysts use DAX to create measures that reflect real performance. For example, instead of showing total sales, DAX can show how sales compare over different months or years thus leading to better decision-making.&lt;/p&gt;

&lt;h3&gt;
  
  
  Examples of Insights obtained using DAX
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Year-over-year growth&lt;/li&gt;
&lt;li&gt;Employee turnover rate&lt;/li&gt;
&lt;li&gt;Running totals and averages&lt;/li&gt;
&lt;li&gt;Comparison of current performance to past periods&lt;/li&gt;
&lt;li&gt;Percentages, ratios, and growth rates&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building Dashboards That Tell a Story
&lt;/h2&gt;

&lt;p&gt;Once the data and calculations are ready, analysts design dashboards with the end user in mind. Dashboards are not just charts but are tools for communication. They should guide users toward key insights without overwhelming them. The goal is clarity, not complexity. A good dashboard should have the right visuals, avoid clutter and highlight the most important numbers hence telling a clear story. Filters and slicers allow users to explore the data without needing technical skills.&lt;br&gt;
A good dashboard should have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The right chart type for each metric&lt;/li&gt;
&lt;li&gt;Logical layout and flow of information&lt;/li&gt;
&lt;li&gt;Clear labels, titles, and tooltips&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good Dashboards help users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quickly see what is going well&lt;/li&gt;
&lt;li&gt;Spot problems early&lt;/li&gt;
&lt;li&gt;Ask better questions&lt;/li&gt;
&lt;li&gt;Drill down to find the cause&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Turning Insights into Real Business Actions
&lt;/h2&gt;

&lt;p&gt;The real success of a dashboard is determined by the actions it supports. A good dashboard helps teams respond faster and make better decisions. This could include setting targets, tracking performance, or identifying risks early.&lt;br&gt;
Since dashboards can refresh automatically, decisions are based on up-to-date information rather than static reports. Alerts and shared reports also ensure that insights reach the right people at the right time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring Impact and Improving Over Time
&lt;/h2&gt;

&lt;p&gt;Analyst work does not end after publishing a dashboard. They gather feedback, monitor usage, and refine reports over time. As business needs change, dashboards must evolve with them.&lt;/p&gt;

&lt;p&gt;Over time, Power BI reports can show measurable impact, such as reduced costs, improved performance, faster reporting cycles, or better accountability. These outcomes demonstrate how technical skills translate into real business value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Analysts act as a bridge between raw data and business decisions. They clean messy data, use DAX to add meaning, and design dashboards that help people understand what is happening and what to do next. While Power BI provides the tools, it is the analyst’s understanding of data and business that turns information into action.&lt;/p&gt;

&lt;p&gt;The value of Power BI, therefore, is not in the visuals or formulas alone but lies in how analysts use it to support smarter decisions and create measurable impact across the organization.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>luxdev</category>
    </item>
    <item>
      <title>Schemas and Data Modelling in Power BI</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Tue, 03 Feb 2026 15:19:33 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/schemas-and-data-modelling-in-power-bi-2ja6</link>
      <guid>https://dev.to/lawrence_murithi/schemas-and-data-modelling-in-power-bi-2ja6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Power BI is a business intelligence tool used to turn raw data into meaningful reports and dashboards. It allows organizations to analyze data, track performance, and make informed decisions. However, the quality of insights produced by Power BI depends heavily on how the data is structured behind the scenes. Good visuals and advanced calculations cannot compensate for poorly designed data.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are Schemas and Data Modelling
&lt;/h3&gt;

&lt;p&gt;Data modelling is the process of organizing data into tables and defining how those tables relate to each other. A schema is the structure or design of this data model. &lt;br&gt;
Schemas and data modelling define how data is organized, connected, and interpreted in Power BI. They determine how tables relate to each other, how filters flow across the model, and how calculations are performed. &lt;/p&gt;
&lt;h4&gt;
  
  
  Characteristics of a data model.
&lt;/h4&gt;

&lt;p&gt;A good data model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes reports faster and more responsive&lt;/li&gt;
&lt;li&gt;Produces correct totals and calculations&lt;/li&gt;
&lt;li&gt;Is easy to understand and maintain&lt;/li&gt;
&lt;li&gt;Works naturally with DAX measures and visuals.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A bad data model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slows down reports&lt;/li&gt;
&lt;li&gt;Produces wrong or inconsistent numbers&lt;/li&gt;
&lt;li&gt;Forces complex and hard-to-read DAX formulas&lt;/li&gt;
&lt;li&gt;Confuses report users and developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article explains schemas and data modelling in Power BI, focusing on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Star schema&lt;/li&gt;
&lt;li&gt;Snowflake schema&lt;/li&gt;
&lt;li&gt;Fact and dimension tables&lt;/li&gt;
&lt;li&gt;Relationships&lt;/li&gt;
&lt;li&gt;Importance of good modelling for performance and accurate reporting&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Understanding Data Modelling in Power BI
&lt;/h2&gt;

&lt;p&gt;Data modelling happens after data is loaded from sources such as Excel, databases, or cloud systems. The model is built in the Model view, where tables and relationships are defined.&lt;br&gt;
A Power BI data model usually includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fact tables (business events or measurements)&lt;/li&gt;
&lt;li&gt;Dimension tables (descriptive information)&lt;/li&gt;
&lt;li&gt;Relationships between tables&lt;/li&gt;
&lt;li&gt;A schema design (such as star or snowflake)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Fact Tables and Dimension Tables
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fact tables&lt;/strong&gt; store measurable business data (i.e) numerical data that you want to analyze and measure.&lt;br&gt;
Characteristics of fact tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Very large, with many rows&lt;/li&gt;
&lt;li&gt;Contain numeric values used in calculations&lt;/li&gt;
&lt;li&gt;Contain keys that link to dimension tables&lt;/li&gt;
&lt;li&gt;Grow over time as new transactions are added.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of fact data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sales amount&lt;/li&gt;
&lt;li&gt;Quantity sold&lt;/li&gt;
&lt;li&gt;Profit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dimension tables&lt;/strong&gt; store descriptive information that helps explain the facts. Dimension tables are used for filtering, grouping, and slicing data in reports.&lt;/p&gt;

&lt;p&gt;Characteristics of dimension tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller than fact tables&lt;/li&gt;
&lt;li&gt;Mostly text and categorical data&lt;/li&gt;
&lt;li&gt;Used for filtering, grouping, and slicing&lt;/li&gt;
&lt;li&gt;Change less frequently than facts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of dimension data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product name&lt;/li&gt;
&lt;li&gt;Customer name&lt;/li&gt;
&lt;li&gt;Region&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Relationships in Power BI
&lt;/h2&gt;

&lt;p&gt;Relationships in Power BI define how tables are connected and how data flows between them. A relationship is usually created between a key column in one table and a matching column in another table. These keys allow Power BI to link descriptive data from dimension tables to numerical data in fact tables. Relationships are mainly defined by cardinality, direction, and status.&lt;/p&gt;
&lt;h3&gt;
  
  
  Types of Relationships
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;One-to-Many&lt;/strong&gt; - This is the most common and recommended relationship where one record in a dimension table matches many records in a fact table.&lt;br&gt;
&lt;strong&gt;One-to-One&lt;/strong&gt; - One row in one table matches one row in another table.&lt;br&gt;
&lt;strong&gt;Many-to-One&lt;/strong&gt; - Many rows in the fact table match one row in the dimension table.&lt;br&gt;
&lt;strong&gt;Many-to-Many&lt;/strong&gt; - multiple rows in one table match multiple rows in another table and is often used when there is no unique key.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Relationships Matter
&lt;/h3&gt;

&lt;p&gt;Good relationships:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure correct totals and aggregations&lt;/li&gt;
&lt;li&gt;Control how slicers and filters behave&lt;/li&gt;
&lt;li&gt;Improve report performance&lt;/li&gt;
&lt;li&gt;Make DAX measures simpler and easier to maintain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the contrary, poorly defined relationships often result in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong numbers&lt;/li&gt;
&lt;li&gt;Missing data in visuals&lt;/li&gt;
&lt;li&gt;Confusing filter behavior&lt;/li&gt;
&lt;li&gt;Slow reports&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Best Practices for Relationships in Power BI
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use one-to-many relationships wherever possible&lt;/li&gt;
&lt;li&gt;Connect dimension tables to fact tables, not dimension to dimension&lt;/li&gt;
&lt;li&gt;Use numeric surrogate keys instead of text&lt;/li&gt;
&lt;li&gt;Avoid unnecessary many-to-many relationships&lt;/li&gt;
&lt;li&gt;Use single-direction filtering by default&lt;/li&gt;
&lt;li&gt;Keep the model simple and clear&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Star Schema
&lt;/h2&gt;

&lt;p&gt;A star schema is the recommended data model in Power BI and consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One central fact table&lt;/li&gt;
&lt;li&gt;Multiple dimension tables connected directly to the fact table.
A star schema structure looks like a star, with the fact table in the center and dimension tables branching out around it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        Date
         |
Product — Sales — Customer
         |
       Region
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Benefits of Star Schema
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simple and easy to understand&lt;/li&gt;
&lt;li&gt;Faster query performance&lt;/li&gt;
&lt;li&gt;Fewer relationships&lt;/li&gt;
&lt;li&gt;Easier DAX calculations&lt;/li&gt;
&lt;li&gt;Better compatibility with Power BI’s engine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Power BI can process queries more efficiently because dimension tables are not connected to each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Snowflake Schema
&lt;/h2&gt;

&lt;p&gt;A snowflake schema is a more complex version of the star schema. In this structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dimension tables are normalized&lt;/li&gt;
&lt;li&gt;Dimension tables are connected to other dimension tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sales → Product → Category → Department
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Benefits of Snowflake Schema
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reduces data redundancy&lt;/li&gt;
&lt;li&gt;Useful for very large or complex dimensions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Challenges of Snowflake Schema:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;More complex relationships&lt;/li&gt;
&lt;li&gt;Slower performance due to extra joins&lt;/li&gt;
&lt;li&gt;More complex DAX measures&lt;/li&gt;
&lt;li&gt;Harder for users to understand&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Good Data Modelling is Critical
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt; - Power BI uses an in-memory engine. A clean star schema reduces joins and improves query speed. Poor models can cause reports to load slowly or even fail.&lt;br&gt;
&lt;strong&gt;Accurate Reporting&lt;/strong&gt;  - Correct relationships and proper table design ensure that filters and totals behave correctly. Bad modelling often leads to duplicated values or missing data.&lt;br&gt;
&lt;strong&gt;Simpler DAX&lt;/strong&gt; - DAX formulas are easier to write and maintain when the model is simple. Complex schemas often require complicated formulas, which are harder to debug.&lt;br&gt;
&lt;strong&gt;Better User Experience&lt;/strong&gt; - Business users prefer models that are easy to understand. Clear table names, logical relationships, and simple structures help users create reports without confusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Modelling Mistakes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Using many-to-many relationships unnecessarily&lt;/li&gt;
&lt;li&gt;Mixing transactional and lookup data in one table&lt;/li&gt;
&lt;li&gt;Using bi-directional relationships everywhere&lt;/li&gt;
&lt;li&gt;Not creating a proper date dimension&lt;/li&gt;
&lt;li&gt;Loading unnecessary columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NB:&lt;/strong&gt; Avoiding these mistakes improves both performance and reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Good data modelling is the foundation of effective Power BI reporting. Visuals and calculations only work well when the underlying model is designed correctly. A clean star schema with clear fact and dimension tables leads to faster performance, accurate results, and reports that are easy to build and maintain. &lt;br&gt;
 Good data modelling improves performance, ensures accurate reporting, simplifies DAX, and makes reports easier to use. Without proper modelling, even the best visuals cannot deliver correct insights.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>schema</category>
      <category>datamodelling</category>
    </item>
    <item>
      <title>Introduction to Linux for Data Engineers</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Mon, 26 Jan 2026 16:28:35 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/introduction-to-linux-for-data-engineers-29jn</link>
      <guid>https://dev.to/lawrence_murithi/introduction-to-linux-for-data-engineers-29jn</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Linux is one of the most important tools for data engineers. Most data systems today run on Linux servers, including cloud platforms, databases, and big data tools like Hadoop and Spark. Understanding Linux basics is, therefore, a key skill for anyone starting a career in data engineering.&lt;/p&gt;

&lt;p&gt;This article introduces Linux in a simple way. It explains why Linux is important for data engineers, shows basic Linux commands, and demonstrates how to create and edit files using Vi and Nano, which are common Linux text editors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Linux for Data Engineers
&lt;/h3&gt;

&lt;p&gt;Linux is important for data engineers for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most data pipelines run on Linux servers&lt;/li&gt;
&lt;li&gt;Cloud platforms like AWS, Azure, and Google Cloud use Linux&lt;/li&gt;
&lt;li&gt;Tools such as Hadoop, Spark, Airflow, and Kafka are built for Linux&lt;/li&gt;
&lt;li&gt;Linux is stable, secure, and efficient for large data processing&lt;/li&gt;
&lt;li&gt;Data engineers often work with Log files, Configuration files and Scripts written in Python, SQL, or Bash. Linux makes it easy to manage these files directly from the terminal.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Basic Linux Commands For Beginners
&lt;/h3&gt;

&lt;p&gt;Linux commands are instructions typed in the terminal to tell the operating system what to do, such as creating files, moving between folders, or running programs. They allow users to interact directly with the system in a fast and efficient way. Linux commands help manage files, automate tasks and work effectively on servers, which is essential in data engineering and software development.&lt;br&gt;
Below are some of the beginner linux commands.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ssh root@IP&lt;/strong&gt; - connects to the server
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgnls3mkhgft5boas1i2.jpg" alt="connect to server" width="800" height="202"&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;pwd&lt;/strong&gt; - Shows current directory&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F587xtbaj2j82213ilg0k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F587xtbaj2j82213ilg0k.jpg" alt="current directory" width="800" height="66"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ls&lt;/strong&gt; - shows all files and folders in the current directory.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3mpamitb90f2ygfrxgn2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3mpamitb90f2ygfrxgn2.jpg" alt="list of files" width="800" height="79"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;cd&lt;/strong&gt; - Changes directory and &lt;strong&gt;cd ..&lt;/strong&gt; moves back one level&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev19c7uurtluu6dnt6kr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev19c7uurtluu6dnt6kr.jpg" alt="Changes directory" width="800" height="54"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;mkdir&lt;/strong&gt;- Creates a new directory&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6et04w276yxlpbo7uqus.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6et04w276yxlpbo7uqus.jpg" alt="Creates a new folder" width="800" height="88"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;touch&lt;/strong&gt; - Creates an empty file&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzail6y4d6jb2n1dxk3fd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzail6y4d6jb2n1dxk3fd.jpg" alt="Creates a new file" width="800" height="58"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;cp&lt;/strong&gt; - Copies files and &lt;strong&gt;cp -r&lt;/strong&gt; copies folder&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16obwrr1ufd3g8ybl85c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16obwrr1ufd3g8ybl85c.jpg" alt="copy files" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;mv&lt;/strong&gt; - Moves or renames files&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywehezjr2voj3kb9omxn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywehezjr2voj3kb9omxn.jpg" alt="rename file/folder" width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;rm&lt;/strong&gt; - Deletes a file and &lt;strong&gt;rm -r&lt;/strong&gt; deletes a folder&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70xipdqhtewyj4jpe7dg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70xipdqhtewyj4jpe7dg.jpg" alt="delete file/folder" width="800" height="145"&gt;&lt;/a&gt;&lt;br&gt;
-&lt;strong&gt;cat&lt;/strong&gt; - display file content&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8kkeo6w9ctdxvc2z6g4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8kkeo6w9ctdxvc2z6g4.jpg" alt="display file content" width="800" height="60"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Linux Vi and Nano Text Editors&lt;/strong&gt;&lt;br&gt;
Linux editors are programs used to create, open, and edit text files directly from the terminal. They are important because many configuration files, scripts, and logs in Linux are text-based. Data engineers and developers often use Linux editors when working on servers where graphical tools are not available.&lt;br&gt;
Some of the common Linux text editors are Vi and Nano. &lt;br&gt;
&lt;strong&gt;1. Nano Editor&lt;/strong&gt;&lt;br&gt;
Nano is a simple and beginner-friendly editor.&lt;br&gt;
To open or create a file with Nano:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nano filename.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22el82mc6eivsybispy4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22el82mc6eivsybispy4.jpg" alt="opening nano" width="800" height="161"&gt;&lt;/a&gt;&lt;br&gt;
The command opens the window below.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xzfjrwjr1ts9jcapa8f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xzfjrwjr1ts9jcapa8f.jpg" alt="nano editor" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Other nano commands
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Command    What it does
&lt;/h5&gt;

&lt;p&gt;Ctrl + O    Saves the file&lt;br&gt;
Ctrl + X    Exits Nano&lt;br&gt;
Ctrl + G    Shows help&lt;br&gt;
Ctrl + W    Searches for text&lt;br&gt;
Ctrl + K    Cuts (removes) a line&lt;br&gt;
Ctrl + U    Pastes a cut line&lt;br&gt;
Ctrl + A    Moves cursor to start of line&lt;br&gt;
Ctrl + E    Moves cursor to end of line&lt;br&gt;
Ctrl + C    Shows current line and column&lt;br&gt;
Ctrl + _    Go to a specific line number  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Vi/Vim Editor&lt;/strong&gt;&lt;br&gt;
Vi is a powerful editor and widely used in professional environments.&lt;br&gt;
Vi has 3 main modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normal mode - navigation and commands&lt;/li&gt;
&lt;li&gt;Insert mode - typing text&lt;/li&gt;
&lt;li&gt;Visual mode - selecting text
To open a file in Vi editor:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi filename.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xlo6qvida5587lliorm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xlo6qvida5587lliorm.jpg" alt="open vi editor" width="800" height="199"&gt;&lt;/a&gt;&lt;br&gt;
The below window opens when the command is prompted.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefoqg9hnzcivrrm3ce4t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefoqg9hnzcivrrm3ce4t.jpg" alt="vi editor" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Other vi commands
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Entering Insert Mode
&lt;/h5&gt;

&lt;p&gt;i - insert before cursor&lt;br&gt;
a - append after cursor&lt;br&gt;
o - open new line below&lt;br&gt;
I - insert at beginning of line&lt;br&gt;
A - append at end of line&lt;/p&gt;

&lt;h5&gt;
  
  
  Saving and Exiting/quiting
&lt;/h5&gt;

&lt;p&gt;:w - save (write)&lt;br&gt;
:q - quit&lt;br&gt;
:wq or ZZ - save and quit&lt;br&gt;
:q! - quit without saving&lt;br&gt;
:w filename - save as new file&lt;/p&gt;

&lt;h5&gt;
  
  
  Navigation Commands
&lt;/h5&gt;

&lt;p&gt;h - Move left&lt;br&gt;
l - Move right&lt;br&gt;
j - Move down&lt;br&gt;
k - Move up&lt;br&gt;
gg - Go to start of file&lt;br&gt;
G - Go to end of file&lt;br&gt;
0 - Start of line&lt;br&gt;
$ - End of line&lt;/p&gt;

&lt;h5&gt;
  
  
  Editing Commands
&lt;/h5&gt;

&lt;p&gt;x - delete character&lt;br&gt;
dd - delete line&lt;br&gt;
yy - copy line&lt;br&gt;
p - paste below&lt;br&gt;
P - paste above&lt;br&gt;
u - undo&lt;br&gt;
Ctrl+r - redo&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Linux is a core skill for data engineers because it is used in servers, cloud platforms, and data tools. Basic Linux commands help you move around the system and manage files. &lt;br&gt;
Learning Linux early makes it easier to work with data pipelines, scripts, and production systems.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>dataengineering</category>
      <category>vim</category>
      <category>luxdev</category>
    </item>
    <item>
      <title>A Simple beginners Guide to Git &amp; GitHub</title>
      <dc:creator>Lawrence Murithi</dc:creator>
      <pubDate>Sun, 25 Jan 2026 12:37:27 +0000</pubDate>
      <link>https://dev.to/lawrence_murithi/a-simple-beginners-guide-to-git-github-j29</link>
      <guid>https://dev.to/lawrence_murithi/a-simple-beginners-guide-to-git-github-j29</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;What is Git, GitHub and why version control is important&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. What is Git &amp;amp; GitHub&lt;/strong&gt;&lt;br&gt;
Git is a distributed version control system (VCS)/tool that helps you save, track, and manage changes in your code. It keeps a history of your work so you can see what changed, when it changed, and who changed it.&lt;br&gt;
Think of Git like a save system for your code, but much better than “Save As”.&lt;br&gt;
GitHub is an online platform where you store Git projects.&lt;br&gt;
GitHub allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Back up your code&lt;/li&gt;
&lt;li&gt;Share code with others&lt;/li&gt;
&lt;li&gt;Work on the same project from different computers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What is version control?&lt;/strong&gt;&lt;br&gt;
Version control is a system that tracks and manages changes to software code and files over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is version control important?&lt;/strong&gt;&lt;br&gt;
Version control helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track changes in your work&lt;/li&gt;
&lt;li&gt;Go back to an older version if something breaks&lt;/li&gt;
&lt;li&gt;Work safely without fear of losing files&lt;/li&gt;
&lt;li&gt;Collaborate with other people on the same project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Set Up the Git &amp;amp; GitHub Environments&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Step 1: Create a GitHub Account&lt;/strong&gt;&lt;br&gt;
Sign up on &lt;a href="https://github.com" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; using you credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Install Git&lt;/strong&gt;&lt;br&gt;
Download &lt;a href="https://git-scm.com" rel="noopener noreferrer"&gt;Git&lt;/a&gt;.&lt;br&gt;
Install the application using the downloaded file.&lt;br&gt;
Open &lt;strong&gt;Git Bash&lt;/strong&gt; (Windows) or &lt;strong&gt;Terminal&lt;/strong&gt; (Mac/Linux).&lt;br&gt;
Configure your identity(Name and Email) to help Git Identify who is making the changes any time the changes are made.&lt;br&gt;
&lt;strong&gt;NB:&lt;/strong&gt; Use the email address used to sign up on GitHub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config -global user.name "your name"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config -global user.email "youremail@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check to ensure your configuration has been set up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config -global --list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Connect Git to Your GitHub Account&lt;/strong&gt;&lt;br&gt;
One of the easiest ways to connect your Git to your GitHub Account is using an SSH key which provides(digital identity) a secure password-less way to connect both to avoid the need for inputing a password every time.&lt;br&gt;
To generate the SSH key on GitBash, run the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh-keygen -t ed25519 -C "youremail@example.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, you need to generate an agent; a helper program that holds your key in memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eval "$(ssh-agent -s)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the key generated to the agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh-add ~/.ssh/id_ed25519
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lastly, print the public key and use it to connect your GitHub account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat ~/.ssh/id_ed25519.pub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy the key, navigate to the SSH and GHG keys on settings in you GitHub account and add a new SSH key.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8s16vtnmc45jlvs01ohb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8s16vtnmc45jlvs01ohb.jpg" alt="SSH key" width="800" height="440"&gt;&lt;/a&gt;&lt;br&gt;
Finally, authenticate your GitHub account in Git by running this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -T git@github.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Set up a Repository in GitHub and Project folder in Git&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a repository on GitHub&lt;/li&gt;
&lt;li&gt;Create a project folder in Git
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir Myproject
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Initialize Git in your project. Tells Git to start tracking this project.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Add files in your folder. Tells Git which files to track.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Save changes (commit)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m "your message"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Connect your project to GitHub)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin https://github.com/yourusername/repositoryname.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Push code to GitHub (send code from your computer to GitHub)
Pushing refers to uploading local commits to a remote server to make your code accessible to others.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Pull code from GitHub (send code from GitHub to your computer).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/username/repositoryname.git

git pull origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;clone&lt;/strong&gt; creates/downloads a new copy on your local computer while &lt;strong&gt;pull&lt;/strong&gt; updates your local files with any changes made.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check the status of your file.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checking status shows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Modified files&lt;/li&gt;
&lt;li&gt;New files&lt;/li&gt;
&lt;li&gt;Files ready to be committed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Summary Cheatsheet&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokt7ogphm09i1x7ywgct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokt7ogphm09i1x7ywgct.png" alt="Cheat sheet" width="514" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Git, GitHub, and version control help developers and data professionals manage their work in a clear and organized way. While Git allows users to track changes, save progress through commits, and return to earlier versions when needed, GitHub provides an online platform to store projects, share code, and collaborate with others. Together, they reduce the risk of losing work, make teamwork easier, and support professional development by keeping code history clear and well documented.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>github</category>
      <category>git</category>
      <category>luxdev</category>
    </item>
  </channel>
</rss>
