<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Samit Paudel</title>
    <description>The latest articles on DEV Community by Samit Paudel (@samit_paudel_5d330ffcfb22).</description>
    <link>https://dev.to/samit_paudel_5d330ffcfb22</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3369255%2F59e555c7-17ea-49ec-b32f-1167f2b79457.jpg</url>
      <title>DEV Community: Samit Paudel</title>
      <link>https://dev.to/samit_paudel_5d330ffcfb22</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/samit_paudel_5d330ffcfb22"/>
    <language>en</language>
    <item>
      <title>Building a Modern Data Warehouse in SQL Server with Medallion Architecture</title>
      <dc:creator>Samit Paudel</dc:creator>
      <pubDate>Sat, 26 Jul 2025 10:53:10 +0000</pubDate>
      <link>https://dev.to/samit_paudel_5d330ffcfb22/building-a-modern-data-warehouse-in-sql-server-with-medallion-architecture-1234</link>
      <guid>https://dev.to/samit_paudel_5d330ffcfb22/building-a-modern-data-warehouse-in-sql-server-with-medallion-architecture-1234</guid>
      <description>&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/4rLd4M7G2n8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;A data warehouse is essential for consolidating and transforming data from multiple sources into a single, reliable source for business intelligence, reporting, and analytical insights. It is a subject-oriented (focused on specific business areas like sales, customers, or products), integrated (combines data from multiple sources into a unified view), time-variant (stores historical data), non-volatile (data is usually not deleted from a warehouse) collection of data to support decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Data Warehouse?
&lt;/h2&gt;

&lt;p&gt;Implementing a data warehouse offers many benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replaces manual data gathering, significantly reducing human error, and increasing speed through ETL (Extract, Transform, Load) processes.&lt;/li&gt;
&lt;li&gt;Becomes the single source of truth for all analysis and reporting, ensuring consistency across the organization.&lt;/li&gt;
&lt;li&gt;Enables combining data from various systems.&lt;/li&gt;
&lt;li&gt;Historical Data Preservation&lt;/li&gt;
&lt;li&gt;Ensures all reports reflect the same, consistent data.&lt;/li&gt;
&lt;li&gt;Capable of handling large volumes of data, including "Big Data."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this project, the objective was to develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making. This project aimed to cleanse and resolve data quality issues, combine ERP and CRM sources into a single data model requiring historization of all data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Medallion Architecture
&lt;/h2&gt;

&lt;p&gt;Given the need for both structured data processing and a desire for a modern, scalable approach, this project opted for the Medallion Architecture. This multi-layered architecture provides a clear and robust framework for data processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxdfxhkybvp96c6zosar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxdfxhkybvp96c6zosar.png" alt="Medallion Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Medallion Architecture consists of three distinct layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bronze Layer (Raw):&lt;/strong&gt; Ingests raw, unprocessed data directly from source systems. Data is stored "as-is" to ensure traceability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silver Layer (Cleaned &amp;amp; Transformed):&lt;/strong&gt; This layer holds cleaned, transformed, and integrated data. This is where data quality issues are resolved, duplicates are removed, formats are standardized, and data from different sources is joined.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gold Layer (Aggregated &amp;amp; Business-Ready):&lt;/strong&gt; Contains aggregated, business-level data optimized for business intelligence, analytics and reporting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layered approach ensures data quality and provides a structured path from raw input to analytical insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  ETL:
&lt;/h2&gt;

&lt;p&gt;The ETL process is the backbone of every data warehouse project. It is where the data is moved from its raw state to usable format.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extraction:&lt;/strong&gt; The first step involves identifying and extracting the necessary subset of data from the source systems (in our case, CSV files from ERP and CRM systems). At this stage, the data remains unchanged. Extraction happens in the Bronze layer of the Medallion architecture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformation:&lt;/strong&gt; This is the most critical phase, where extracted data undergoes various manipulations to meet business requirements. This includes data cleansing (handling missing values, correcting errors), data integration (combining data from multiple sources), formatting, and normalization. Transformation takes place in Silver layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loading:&lt;/strong&gt; Finally, the transformed data is loaded into the target database tables within the data warehouse. It happens in the Gold layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfmaqhqks20igzwrrudu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfmaqhqks20igzwrrudu.png" alt="High Level Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Details:
&lt;/h2&gt;

&lt;p&gt;All the source files for the project are in csv format from two source systems: CRM and ERP. CRM is taken as the primary data source, in case of conflicts encounter between the data. An example of the csv file is provided in the image below. This is for the source file crm_cust_info.csv.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjcd3wh8syt9gabrnn62p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjcd3wh8syt9gabrnn62p.png" alt="csv"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following six files are handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CRM&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;crm_cust_info&lt;/li&gt;
&lt;li&gt;prd_info&lt;/li&gt;
&lt;li&gt;sales_details&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;ERM&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;CUST_AZ12&lt;/li&gt;
&lt;li&gt;LOC_A101&lt;/li&gt;
&lt;li&gt;PX_CAT_G1V2&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;All the examples will primarily focus on the table crm_cust_info from CRM. If there are important considerations and drastic changes in procedure of handling the problem, then those implementations are also provided.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bronze Layer: Raw Data Ingestion
&lt;/h3&gt;

&lt;p&gt;This is the landing zone for the raw data. Here the data from the source is stored in the database – without changing it. Raw data Ingestion is handled in two steps: creating a table in SQL server exactly similar to the source csv file inside the bronze schema and loading all the data to these tables. An example of an ingestion of crm_cust_info.csv into our database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;OBJECT_ID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'bronze.crm_cust_info'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'U'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cst_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_key&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;cst_firstname&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;cst_lastname&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;cst_marital_status&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;cst_gndr&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;cst_create_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the data is bulk loaded into this table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;TRUNCATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;BULK&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;the_file_location&amp;gt;'&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;FIRSTROW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- Skips the header row in the CSV file&lt;/span&gt;
    &lt;span class="n"&gt;FIELDTERMINATOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="c1"&gt;-- Specifies comma as the field delimiter&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same process is repeated for all other files. These two processes are then consolidated into a stored procedure with some logging, error handling and some performance monitoring. Following is a snippet of stored procedure to load data into gold layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_bronze&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;DECLARE&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;start_bronze_layer&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;end_bronze_layer&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;-- These variables can be used for checking how much time each table took to truncate and load,&lt;/span&gt;
    &lt;span class="c1"&gt;-- and the total time it took to load the batch.&lt;/span&gt;

    &lt;span class="k"&gt;BEGIN&lt;/span&gt; &lt;span class="n"&gt;TRY&lt;/span&gt;
            &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'---------------------'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'Loading Bronze Layer '&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'---------------------'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'Loading CRM Tables...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;start_bronze_layer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'&amp;gt;&amp;gt; Truncating Table: bronze.crm_cust_info'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

         &lt;span class="c1"&gt;-- PRINT statements log into console for debugging&lt;/span&gt;
         &lt;span class="c1"&gt;-- ... (The Bulk Load Queries for each of the 6 tables)&lt;/span&gt;
        &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;TRY&lt;/span&gt;
    &lt;span class="k"&gt;BEGIN&lt;/span&gt; &lt;span class="n"&gt;CATCH&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'--------------------------------------------'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'ERROR OCCURRED DURING LOADING BRONZE LAYER...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'--------------------------------------------'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'Error: '&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ERROR_MESSAGE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'Error No.'&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ERROR_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="s1"&gt;'Error State.'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ERROR_STATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;CATCH&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Silver Layer: Data Cleaning and Transformation:
&lt;/h3&gt;

&lt;p&gt;The Silver layer is where the core data transformation and integration take place. This is crucial for preparing data for analytical consumption. A data integration model was created before proceeding through our transformation mainly with the aim of creating appropriate data for the joins later on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc7r6rnumzq44hqlv89n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc7r6rnumzq44hqlv89n.png" alt="Data Integration Model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Again, transformation took place in two steps, which was handle by single query. Each tables presented unique problems requiring respective transformation techniques to handle it. Similar to bronze layer, the initial step was to create corresponding tables. The table structures were kept similar to the bronze layer when starting which was subjected to change as we solidify the transformation approach by working on the bronze layer tables.&lt;/p&gt;

&lt;h4&gt;
  
  
  Some Transformation Problems Handled
&lt;/h4&gt;

&lt;p&gt;Let's look at the cleaning and transformation steps for key tables:&lt;/p&gt;

&lt;h4&gt;
  
  
  crm_cust_info Cleaning
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcdnbrqqknfrtpctbqwr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcdnbrqqknfrtpctbqwr.png" alt="initial state of crm_cust_info"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Handling Duplicate and NULL Primary Keys:&lt;/strong&gt; We identified duplicate &lt;code&gt;cst_id&lt;/code&gt; values in the bronze layer. To resolve this, we used &lt;code&gt;ROW_NUMBER()&lt;/code&gt; with &lt;code&gt;PARTITION BY cst_id ORDER BY cst_create_date DESC&lt;/code&gt; to select the latest record for each customer.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Query to identify duplicates and NULLs&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;cst_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Occurance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cst_id&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;cst_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;Occurance&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw5yuacspd2fzq0lhsqy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw5yuacspd2fzq0lhsqy.png" alt="Final Table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The `flag_last` column helps us pick the most recent record:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sql
SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY cst_id ORDER BY cst_create_date DESC) AS flag_last
FROM bronze.crm_cust_info;
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtghj93n1b4o1ve84p4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtghj93n1b4o1ve84p4i.png" alt="Ssadsad"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Removing Unwanted Spaces:&lt;/strong&gt; For text columns like &lt;code&gt;cst_firstname&lt;/code&gt;, we used &lt;code&gt;TRIM()&lt;/code&gt; to remove leading/trailing spaces.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Normalization (Low Cardinality Columns):&lt;/strong&gt; For columns like &lt;code&gt;cst_marital_status&lt;/code&gt; and &lt;code&gt;cst_gndr&lt;/code&gt;, we standardized values using &lt;code&gt;CASE&lt;/code&gt; statements to ensure consistency (e.g., 'M' to 'Male', 'F' to 'Female', 'n/a' for NULLs).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;cst_gndr&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sql
-- Example of gender normalization
…CASE WHEN UPPER(TRIM(cst_gndr)) = 'M' THEN 'Male'
     WHEN UPPER(TRIM(cst_gndr)) = 'F' THEN 'Female'
     ELSE 'n/a'
     END AS cst_gndr
…
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Final Transformed Query for silver.crm_cust_info:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cst_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_firstname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_lastname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_marital_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_gndr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_create_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;cst_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cst_firstname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cst_firstname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cst_lastname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cst_lastname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cst_marital_status&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'S'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Single'&lt;/span&gt;
         &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cst_marital_status&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'M'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Married'&lt;/span&gt;
         &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'n/a'&lt;/span&gt;
     &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cst_marital_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cst_gndr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'M'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Male'&lt;/span&gt;
         &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cst_gndr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'F'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Female'&lt;/span&gt;
         &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'n/a'&lt;/span&gt;
     &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cst_gndr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cst_create_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cst_id&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cst_create_date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;flag_last&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;flag_last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;cst_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcffor0vydl9tomagl5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcffor0vydl9tomagl5o.png" alt="final table crm_cust_info"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  crm_prd_info Cleaning
&lt;/h4&gt;

&lt;p&gt;Similar checks for PKs and spaces were performed. Key transformations for &lt;code&gt;crm_prd_info&lt;/code&gt; included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Derived Columns:&lt;/strong&gt; Extracting &lt;code&gt;cat_id&lt;/code&gt; and &lt;code&gt;prd_key&lt;/code&gt; from the &lt;code&gt;prd_key&lt;/code&gt; column in the source to facilitate joins with other tables. We used &lt;code&gt;SUBSTRING&lt;/code&gt; and &lt;code&gt;REPLACE&lt;/code&gt; to format these keys consistently.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Example of derived columns&lt;/span&gt;
&lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'_'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cat_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LEN&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;])),&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'_'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Handling Integers:&lt;/strong&gt; Using &lt;code&gt;COALESCE(prd_cost, 0)&lt;/code&gt; to replace NULL product costs with 0.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Start and End Date Logic:&lt;/strong&gt; Business logic dictated that a NULL &lt;code&gt;prd_end_dt&lt;/code&gt; signifies the current price. We derived &lt;code&gt;prd_end_dt&lt;/code&gt; for historical records using &lt;code&gt;LEAD()&lt;/code&gt; to set the end date as one day before the next product's start date, ensuring non-overlapping periods.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Example for prd_end_dt derivation&lt;/span&gt;
&lt;span class="n"&gt;DATEADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LEAD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prd_start_dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;prd_key&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;prd_start_dt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prd_end_dt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxa35h4meltlwf9p6u21l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxa35h4meltlwf9p6u21l.png" alt="Handling Start and End Date"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Final Transformed Query for silver.crm_prd_info:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;].[&lt;/span&gt;&lt;span class="n"&gt;crm_prd_info&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cat_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_nm&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_cost&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_line&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_start_dt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_end_dt&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'_'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cat_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LEN&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;])),&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'_'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_nm&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prd_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prd_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prd_line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'M'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Mounting'&lt;/span&gt;
      &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prd_line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'R'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Road'&lt;/span&gt;
            &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prd_line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'S'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Other Sales'&lt;/span&gt;
            &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prd_line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'T'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Touring'&lt;/span&gt;
            &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'n/a'&lt;/span&gt;
        &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prd_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prd_start_dt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prd_start_date&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="n"&gt;DATEADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;LEAD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prd_start_dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;prd_key&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;prd_start_dt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prd_end_dt&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DataWarehouse&lt;/span&gt;&lt;span class="p"&gt;].[&lt;/span&gt;&lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;].[&lt;/span&gt;&lt;span class="n"&gt;crm_prd_info&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1szcr3onluyy0wibj3rw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1szcr3onluyy0wibj3rw.png" alt="Final State of crm_prd_info"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  crm_sales_details Cleaning
&lt;/h4&gt;

&lt;p&gt;The initial state of &lt;code&gt;crm_sales_details&lt;/code&gt; table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72wg8gc7u6i271x2s2ve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72wg8gc7u6i271x2s2ve.png" alt="Initial stage of crm_sales_details"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following transformations were applied to this table:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Date Column Conversion and Validation:&lt;/strong&gt; Sales dates were sometimes 0 or in an incorrect format. We converted these to NULL and then cast valid entries to DATE format. We also implicitly validated that Order Date &amp;lt;= Shipping Date &amp;lt;= Due Date by ensuring proper casting.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Example for date conversion&lt;/span&gt;
&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sls_order_dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;LEN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
     &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Checking Derived Values:&lt;/strong&gt; For &lt;code&gt;sls_sales&lt;/code&gt;, &lt;code&gt;sls_quantity&lt;/code&gt;, and &lt;code&gt;sls_price&lt;/code&gt;, we ensured logical consistency (e.g., &lt;code&gt;sls_sales = sls_quantity * sls_price&lt;/code&gt;) and handled NULL or negative values using &lt;code&gt;COALESCE&lt;/code&gt; and &lt;code&gt;CASE&lt;/code&gt; statements.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Final Transformed Query for silver.crm_sales_details:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;].[&lt;/span&gt;&lt;span class="n"&gt;crm_sales_details&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_ord_num&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_prd_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_cust_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_ship_dt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_due_dt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_sales&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_quantity&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="n"&gt;sls_price&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_ord_num&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_prd_key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_cust_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sls_order_dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;LEN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
         &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sls_ship_dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;LEN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_ship_dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
         &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_ship_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_ship_dt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sls_due_dt&lt;/span&gt;
&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;LEN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_due_dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
     &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sls_due_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_due_dt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;sls_sales&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_price&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_quantity&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_sales&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;sls_quantity&lt;/span&gt;&lt;span class="p"&gt;],[&lt;/span&gt;&lt;span class="n"&gt;sls_sales&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_price&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_quantity&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_price&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_sales&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_quantity&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;sls_price&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_sales&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_quantity&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sls_price&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bronze&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_sales_details&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The remaining ERP tables (erp_cust_az12, erp_loc_a101, erp_px_cat_g1v2) underwent similar cleaning processes, including handling NULLs, duplicates, date formats, and data normalization. All these silver layer insertions were then consolidated into another stored procedure, &lt;code&gt;silver.load_silver&lt;/code&gt;, for automated execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gold Layer: Analytical Data Model for Business Ready Data
&lt;/h3&gt;

&lt;p&gt;The Gold layer is designed for optimal analytical performance and ease of understanding for business users. We adopted a Star Schema model, which is highly effective for BI and reporting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faelp7tpmmlrnddbyyycu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faelp7tpmmlrnddbyyycu.png" alt="Star Schema"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a Star Schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A central fact table contains quantitative information (measures) about business events or transactions (e.g., sales amount, order quantity). It answers, "How Much?" or "How Many?".&lt;/li&gt;
&lt;li&gt;It is surrounded by dimension tables, which provide descriptive context to the data (e.g., customer names, product categories, dates, locations). They answer "Who?" "What?" "Where?".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We created views in the gold schema to represent our dimension and fact tables. Views are excellent for the Gold layer as they provide a logical abstraction without duplicating data, always reflecting the latest transformations from the Silver layer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer Dimension (gold.dim_customers)
&lt;/h4&gt;

&lt;p&gt;This dimension integrates customer information from CRM and ERP sources. A key challenge was reconciling two gender columns from different sources. We prioritized the CRM source and defaulted to 'n/a' if both were missing. We also added a surrogate key (&lt;code&gt;customer_key&lt;/code&gt;) using &lt;code&gt;ROW_NUMBER()&lt;/code&gt; for efficient joining with fact tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;gold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim_customers&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customer_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- Add surrogate key&lt;/span&gt;
    &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_key&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customer_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_firstname&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_lastname&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cntry&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_marital_status&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;marital_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- Data integration for gender: CRM is primary, then ERP, then 'n/a'&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_gndr&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'n/a'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_gndr&lt;/span&gt;
         &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'n/a'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="k"&gt;END&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;gender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bdate&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;birth_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_create_date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_cust_info&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;erp_cust_az12&lt;/span&gt; &lt;span class="n"&gt;ca&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cid&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;erp_loc_a101&lt;/span&gt; &lt;span class="n"&gt;cl&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cst_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0nvnolbb0bzwqc84jqrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0nvnolbb0bzwqc84jqrh.png" alt="Customers Dimension table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Product Dimension (gold.dim_products)
&lt;/h4&gt;

&lt;p&gt;This dimension provides descriptive information about products. Based on the project scope, we focused on the latest product information, filtering out historical records using &lt;code&gt;WHERE pi.prd_end_dt IS NULL&lt;/code&gt;. A &lt;code&gt;product_key&lt;/code&gt; surrogate key was also added.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;gold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim_products&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_start_dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;product_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- Surrogate key&lt;/span&gt;
    &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_key&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;product_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_nm&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cat_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;category_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subcat&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;subcategory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maintenance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_cost&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;product_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_line&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_start_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_prd_info&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;erp_px_cat_g1v2&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cat_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prd_end_dt&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;-- Filters for current product information&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpqo69dmx9vyem4asalf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpqo69dmx9vyem4asalf.png" alt="Products Dimension table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Sales Fact (gold.fact_sales)
&lt;/h4&gt;

&lt;p&gt;The fact table contains the core sales transaction data. It links to our dimension tables using their respective surrogate keys (&lt;code&gt;product_key&lt;/code&gt; and &lt;code&gt;customer_key&lt;/code&gt;), allowing for flexible and performant analytical queries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;gold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fact_sales&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_ord_num&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- Surrogate key from dim_products&lt;/span&gt;
    &lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- Surrogate key from dim_customers&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_order_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_ship_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;shipping_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_due_dt&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;due_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_sales&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_quantity&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_price&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;silver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm_sales_details&lt;/span&gt; &lt;span class="n"&gt;sd&lt;/span&gt;
&lt;span class="c1"&gt;-- JOIN with dimension tables using surrogate keys&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;gold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim_products&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_prd_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_number&lt;/span&gt; &lt;span class="c1"&gt;-- Join on business key, but select surrogate key&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;gold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim_customers&lt;/span&gt; &lt;span class="n"&gt;dc&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sls_cust_id&lt;/span&gt;
&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Join on business key, but select surrogate key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Final Data Model and Flow
&lt;/h3&gt;

&lt;p&gt;The resulting Star Schema provides a clear and intuitive structure for reporting:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5ohy7fuznevefnfkjfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5ohy7fuznevefnfkjfg.png" alt="Integration Data Model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the complete data flow from source to consumption:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gdb1h3kgts0uadgq8tp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gdb1h3kgts0uadgq8tp.png" alt="Data Flow Diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion:
&lt;/h2&gt;

&lt;p&gt;This project successfully built a modern data warehouse in SQL Server using the Medallion Architecture, consolidating sales data from ERP and CRM systems into a unified, clean, and structured resource for analytics. I gained hands-on experience in raw data ingestion (Bronze layer) , resolving data quality issues and transforming data (Silver layer) , and designing an optimized analytical data model using a Star Schema in the Gold layer for business intelligence and reporting. This experience significantly enhanced my skills in SQL Server, ETL processes, and data warehousing best practices, demonstrating the Medallion Architecture's effectiveness in creating reliable data pipelines and a "single source of truth".&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/samit-paudel-573a6b279/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for any queries! You can find the full project code and resources on my &lt;a href="https://github.com/Samizen/sql_data_warehouse" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; repository.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>sql</category>
      <category>etl</category>
      <category>datawarehouse</category>
    </item>
    <item>
      <title>From Requirements to a Data Model in MSSQL</title>
      <dc:creator>Samit Paudel</dc:creator>
      <pubDate>Sat, 19 Jul 2025 12:44:57 +0000</pubDate>
      <link>https://dev.to/samit_paudel_5d330ffcfb22/from-requirements-to-a-data-model-in-mssql-3927</link>
      <guid>https://dev.to/samit_paudel_5d330ffcfb22/from-requirements-to-a-data-model-in-mssql-3927</guid>
      <description>&lt;h3&gt;
  
  
  Designing a Database from Barely-There Requirements
&lt;/h3&gt;

&lt;p&gt;Like many ideas born in Kathmandu, this one emerged from a tea shop: two of my friends discussed a room-rental app to bridge a market gap for a direct-to-owner platform eliminating brokers, a concept surprisingly absent in Nepal's popular app scene. &lt;br&gt;
I am currently learning Data Engineering and was genuinely interested in finding out what might go in the backend of such an application. What kind of data is involved? What tables would be needed? This post explains my process of translating a few bullet points to a fully-fledged database schema, offering insights into how to approach database design when the requirements are barely there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Initial Requirements
&lt;/h2&gt;

&lt;p&gt;This was the original brief I received:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpkl23h5kzjy3ns2u1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpkl23h5kzjy3ns2u1j.png" alt="Initial Requirement" width="800" height="725"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The initial requirements expressed a desire to build an application, but in terms of actual details, they were almost non-existent. I urged my friends to go a bit deeper into each requirement, but there was no follow-up. So, I took it as a chance to build something realistic, grounded in what I had seen in the real world. &lt;/p&gt;

&lt;h2&gt;
  
  
  Refining the Requirement
&lt;/h2&gt;

&lt;p&gt;The first step was research. The initial document referenced "NoBroker," a popular Indian app. Given that rental practices are similar in India and Nepal, I used it as a starting point. I also explored local apps like Basobas, which had clearer classifications and were more relevant to the Nepali market. &lt;/p&gt;

&lt;p&gt;From the research, I wrote my own detailed requirements. After laying down the requirements in detail, the design became a lot easier. The core entities and their attributes began to emerge naturally. &lt;br&gt;
Here is an example of one of my requirements from the perspective of a tenant: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb3wq2dltg3214rec8gx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb3wq2dltg3214rec8gx.png" alt="A requirement for the application" width="800" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can view the full requirements &lt;a href="https://docs.google.com/document/d/1U6aH24MBhSfkSTpMTIhEoM3bmH3W7gyXZKKPn0UfnQE/edit?usp=sharing" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Conceptual Data Model
&lt;/h2&gt;

&lt;p&gt;With better-defined requirements, I began creating a Conceptual Data Model. This included identifying the key entities and how they relate to each other. An example of this for the above requirement would be: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entities:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Listing&lt;/strong&gt; – with attributes like Location, Title, Description, Property Type (Residential or Commercial), Rental Type, Layout, Price, Availability, Preferred Tenant Type, Is Pet Allowed?, Is Smoking Allowed? &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amenity&lt;/strong&gt; – with attributes like Amenity’s Category, Name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Relationship between Entities&lt;/strong&gt;&lt;br&gt;
Listings contains Amenities&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8p6mpxbge6b47bgd0mqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8p6mpxbge6b47bgd0mqk.png" alt="Relationship between Listing and Amenity" width="680" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Conceptual Model served two main purposes for me – it defined the scope of the application, and it clarified the implementation of written requirements. I tried to clearly identify many-to-many relationships, like Listing and Amenity, which would introduce a bridge table between them to convert many-to-many relationships to multiple one-to-many relationships. I visualized these using pen and paper before moving to a tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5j5k7rk9dixiljtqnk7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5j5k7rk9dixiljtqnk7.png" alt="Turning Many-to-Many to One-to-Many" width="672" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Transitioning into Logical Data Model:
&lt;/h2&gt;

&lt;p&gt;Then, I translated my conceptual model into a Logical Data Model. A logical model is a conceptual model to which data normalization techniques have been applied and for which all of the attributes have been identified. For me, this meant adding more details to my entities, including attributes and their data types, relationships and cardinality, and normalization of tables.&lt;/p&gt;

&lt;p&gt;I found it much easier to accomplish this using MySQL workbench’s EER Diagram creator. It also helps visualize the tables and their relationships well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6r945wkfoina0y5byee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6r945wkfoina0y5byee.png" alt="MySQL Workbench" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Initially, I used ENUMs for controlled fields (e.g., rental type), but soon converted these into lookup tables to keep the database flexible and scalable for the future. I focused on normalizing the tables in this stage, applying the Third Normal Form. This Logical Data Modeling step completed with the creation of a complete schema that included 37 tables, properly defined keys and constraints, and logical implementation of lookup, transaction, and system tables. Lastly, I indexed all the foreign keys for performance optimization. &lt;/p&gt;

&lt;p&gt;The final &lt;a href="https://github.com/Samizen/RoomRental/blob/main/images/Rentals_PNG.png" rel="noopener noreferrer"&gt;ERD&lt;/a&gt; for the Logical Data model is displayed below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum2pi6navfliig617u50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum2pi6navfliig617u50.png" alt="Final ERD" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Implementation in SQL Server
&lt;/h2&gt;

&lt;p&gt;I implemented the final schema in MS SQL Server using DDL queries. While the initial modeling was easier with MySQL Workbench because of its UI-based modeling, my experience and comfort with MSSQL was the only reason I chose it. &lt;/p&gt;

&lt;p&gt;You can find the following in my &lt;a href="https://github.com/Samizen/RoomRental" rel="noopener noreferrer"&gt;Github Repository&lt;/a&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full DDL Script &lt;/li&gt;
&lt;li&gt;Dummy Data Insertion Script &lt;/li&gt;
&lt;li&gt;ERD&lt;/li&gt;
&lt;li&gt;Data Dictionary&lt;/li&gt;
&lt;li&gt;Requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I learned and Implemented
&lt;/h2&gt;

&lt;p&gt;Starting with vague requirements, I created a detailed, actionable set of specifications using design thinking with minimal direction. I improved my research skills, doing it with the intention of implementation. I was able to apply Conceptual and Logical Data Modeling techniques to a real-life scenario and normalize a database. &lt;/p&gt;

&lt;p&gt;I also got to implement these SQL concepts during the project: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;DROP TABLE IF EXISTS&lt;/code&gt; – to define and reset schema&lt;/li&gt;
&lt;li&gt;Use of various data types and their actual impact on a real database&lt;/li&gt;
&lt;li&gt;Primary and Foreign Keys – to define relationships and integrity&lt;/li&gt;
&lt;li&gt;Other constraints like &lt;code&gt;NOT NULL&lt;/code&gt;, &lt;code&gt;DEFAULT&lt;/code&gt;, &lt;code&gt;UNIQUE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Implementing Relationships &lt;/li&gt;
&lt;li&gt;Lookup Tables – for flexible ENUM alternatives&lt;/li&gt;
&lt;li&gt;Bridge Tables – to resolve many-to-many relationships&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;IDENTITY&lt;/code&gt; – for auto-increment primary keys&lt;/li&gt;
&lt;li&gt;Indexing – applied on foreign keys for performance optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're learning SQL or starting in Data Engineering, I encourage you to use my schema as a playground. It’s full of real-world entities and dense enough to practice joins, aggregations, constraints, and more.&lt;/p&gt;

&lt;p&gt;I hope this post helps others see that getting started is more important than getting it perfect. If you've ever built something from vague specs, I’d love to hear how you approached it. Feel free to reach out or comment on &lt;a href="https://www.linkedin.com/in/samit-paudel-573a6b279/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;! &lt;/p&gt;

</description>
    </item>
    <item>
      <title>From Requirements to a Data Model in MSSQL</title>
      <dc:creator>Samit Paudel</dc:creator>
      <pubDate>Sat, 19 Jul 2025 12:44:57 +0000</pubDate>
      <link>https://dev.to/samit_paudel_5d330ffcfb22/from-requirements-to-a-data-model-in-mssql-3n5j</link>
      <guid>https://dev.to/samit_paudel_5d330ffcfb22/from-requirements-to-a-data-model-in-mssql-3n5j</guid>
      <description>&lt;h3&gt;
  
  
  Designing a Database from Barely-There Requirements
&lt;/h3&gt;

&lt;p&gt;Like many ideas born in Kathmandu, this one emerged from a tea shop: two of my friends discussed a room-rental app to bridge a market gap for a direct-to-owner platform eliminating brokers, a concept surprisingly absent in Nepal's popular app scene. &lt;br&gt;
I am currently learning Data Engineering and was genuinely interested in finding out what might go in the backend of such an application. What kind of data is involved? What tables would be needed? This post explains my process of translating a few bullet points to a fully-fledged database schema, offering insights into how to approach database design when the requirements are barely there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Initial Requirements
&lt;/h2&gt;

&lt;p&gt;This was the original brief I received:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpkl23h5kzjy3ns2u1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpkl23h5kzjy3ns2u1j.png" alt="Initial Requirement" width="800" height="725"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The initial requirements expressed a desire to build an application, but in terms of actual details, they were almost non-existent. I urged my friends to go a bit deeper into each requirement, but there was no follow-up. So, I took it as a chance to build something realistic, grounded in what I had seen in the real world. &lt;/p&gt;

&lt;h2&gt;
  
  
  Refining the Requirement
&lt;/h2&gt;

&lt;p&gt;The first step was research. The initial document referenced "NoBroker," a popular Indian app. Given that rental practices are similar in India and Nepal, I used it as a starting point. I also explored local apps like Basobas, which had clearer classifications and were more relevant to the Nepali market. &lt;/p&gt;

&lt;p&gt;From the research, I wrote my own detailed requirements. After laying down the requirements in detail, the design became a lot easier. The core entities and their attributes began to emerge naturally. &lt;br&gt;
Here is an example of one of my requirements from the perspective of a tenant: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb3wq2dltg3214rec8gx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb3wq2dltg3214rec8gx.png" alt="A requirement for the application" width="800" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can view the full requirements &lt;a href="https://docs.google.com/document/d/1U6aH24MBhSfkSTpMTIhEoM3bmH3W7gyXZKKPn0UfnQE/edit?usp=sharing" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Conceptual Data Model
&lt;/h2&gt;

&lt;p&gt;With better-defined requirements, I began creating a Conceptual Data Model. This included identifying the key entities and how they relate to each other. An example of this for the above requirement would be: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entities:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Listing&lt;/strong&gt; – with attributes like Location, Title, Description, Property Type (Residential or Commercial), Rental Type, Layout, Price, Availability, Preferred Tenant Type, Is Pet Allowed?, Is Smoking Allowed? &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amenity&lt;/strong&gt; – with attributes like Amenity’s Category, Name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Relationship between Entities&lt;/strong&gt;&lt;br&gt;
Listings contains Amenities&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8p6mpxbge6b47bgd0mqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8p6mpxbge6b47bgd0mqk.png" alt="Relationship between Listing and Amenity" width="680" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Conceptual Model served two main purposes for me – it defined the scope of the application, and it clarified the implementation of written requirements. I tried to clearly identify many-to-many relationships, like Listing and Amenity, which would introduce a bridge table between them to convert many-to-many relationships to multiple one-to-many relationships. I visualized these using pen and paper before moving to a tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5j5k7rk9dixiljtqnk7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5j5k7rk9dixiljtqnk7.png" alt="Turning Many-to-Many to One-to-Many" width="672" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Transitioning into Logical Data Model:
&lt;/h2&gt;

&lt;p&gt;Then, I translated my conceptual model into a Logical Data Model. A logical model is a conceptual model to which data normalization techniques have been applied and for which all of the attributes have been identified. For me, this meant adding more details to my entities, including attributes and their data types, relationships and cardinality, and normalization of tables.&lt;/p&gt;

&lt;p&gt;I found it much easier to accomplish this using MySQL workbench’s EER Diagram creator. It also helps visualize the tables and their relationships well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6r945wkfoina0y5byee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6r945wkfoina0y5byee.png" alt="MySQL Workbench" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Initially, I used ENUMs for controlled fields (e.g., rental type), but soon converted these into lookup tables to keep the database flexible and scalable for the future. I focused on normalizing the tables in this stage, applying the Third Normal Form. This Logical Data Modeling step completed with the creation of a complete schema that included 37 tables, properly defined keys and constraints, and logical implementation of lookup, transaction, and system tables. Lastly, I indexed all the foreign keys for performance optimization. &lt;/p&gt;

&lt;p&gt;The final &lt;a href="https://github.com/Samizen/RoomRental/blob/main/images/Rentals_PNG.png" rel="noopener noreferrer"&gt;ERD&lt;/a&gt; for the Logical Data model is displayed below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum2pi6navfliig617u50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum2pi6navfliig617u50.png" alt="Final ERD" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Implementation in SQL Server
&lt;/h2&gt;

&lt;p&gt;I implemented the final schema in MS SQL Server using DDL queries. While the initial modeling was easier with MySQL Workbench because of its UI-based modeling, my experience and comfort with MSSQL was the only reason I chose it. &lt;/p&gt;

&lt;p&gt;You can find the following in my &lt;a href="https://github.com/Samizen/RoomRental" rel="noopener noreferrer"&gt;Github Repository&lt;/a&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full DDL Script &lt;/li&gt;
&lt;li&gt;Dummy Data Insertion Script &lt;/li&gt;
&lt;li&gt;ERD&lt;/li&gt;
&lt;li&gt;Data Dictionary&lt;/li&gt;
&lt;li&gt;Requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I learned and Implemented
&lt;/h2&gt;

&lt;p&gt;Starting with vague requirements, I created a detailed, actionable set of specifications using design thinking with minimal direction. I improved my research skills, doing it with the intention of implementation. I was able to apply Conceptual and Logical Data Modeling techniques to a real-life scenario and normalize a database. &lt;/p&gt;

&lt;p&gt;I also got to implement these SQL concepts during the project: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;DROP TABLE IF EXISTS&lt;/code&gt; – to define and reset schema&lt;/li&gt;
&lt;li&gt;Use of various data types and their actual impact on a real database&lt;/li&gt;
&lt;li&gt;Primary and Foreign Keys – to define relationships and integrity&lt;/li&gt;
&lt;li&gt;Other constraints like &lt;code&gt;NOT NULL&lt;/code&gt;, &lt;code&gt;DEFAULT&lt;/code&gt;, &lt;code&gt;UNIQUE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Implementing Relationships &lt;/li&gt;
&lt;li&gt;Lookup Tables – for flexible ENUM alternatives&lt;/li&gt;
&lt;li&gt;Bridge Tables – to resolve many-to-many relationships&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;IDENTITY&lt;/code&gt; – for auto-increment primary keys&lt;/li&gt;
&lt;li&gt;Indexing – applied on foreign keys for performance optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're learning SQL or starting in Data Engineering, I encourage you to use my schema as a playground. It’s full of real-world entities and dense enough to practice joins, aggregations, constraints, and more.&lt;/p&gt;

&lt;p&gt;I hope this post helps others see that getting started is more important than getting it perfect. If you've ever built something from vague specs, I’d love to hear how you approached it. Feel free to reach out or comment on &lt;a href="https://www.linkedin.com/in/samit-paudel-573a6b279/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;! &lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
