<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: muzzamilanis</title>
    <description>The latest articles on DEV Community by muzzamilanis (@muzzamilanis).</description>
    <link>https://dev.to/muzzamilanis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904600%2Ffb771dd8-0869-44df-8f35-3cff26962ea6.png</url>
      <title>DEV Community: muzzamilanis</title>
      <link>https://dev.to/muzzamilanis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/muzzamilanis"/>
    <language>en</language>
    <item>
      <title>Building a PSX Data Pipeline: How I Got Introduced to dbt</title>
      <dc:creator>muzzamilanis</dc:creator>
      <pubDate>Wed, 29 Apr 2026 16:22:16 +0000</pubDate>
      <link>https://dev.to/muzzamilanis/building-a-psx-data-pipeline-how-i-get-introduced-to-dbt-2l22</link>
      <guid>https://dev.to/muzzamilanis/building-a-psx-data-pipeline-how-i-get-introduced-to-dbt-2l22</guid>
      <description>&lt;p&gt;I've been in software development since 2017. I started as a .NET developer, grew into an SDE-III role, and for the past four years I've worked as a data engineer. Eight years in tech means I'm not new to building things, but data engineering was a different world for me: largely SQL-centric work, yet with a character all its own.&lt;/p&gt;

&lt;p&gt;I knew dbt existed. I'd seen it in job descriptions. But I had no real idea what it actually did or why companies treated it like a big deal. My mental model was basically: "Probably just another tool, like most others on the market."&lt;/p&gt;

&lt;p&gt;Spoiler: I was wrong. Here's what I built and what changed my mind.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem I Was Trying to Solve
&lt;/h2&gt;

&lt;p&gt;I wanted a real data engineering project I could point to. Not a tutorial I followed on YouTube. Not a dataset from Kaggle. Something with actual data, actual decisions, and actual engineering challenges.&lt;/p&gt;

&lt;p&gt;I'm based in Pakistan, and the Pakistan Stock Exchange (PSX) had no decent open data tooling: 285 shariah-compliant listed stocks, updated daily, and nothing clean to work with. That felt like a real problem worth solving.&lt;/p&gt;

&lt;p&gt;So I built a pipeline from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt; — scraping daily snapshots from PSX&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL on Neon&lt;/strong&gt; — cloud database, free tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dbt-core&lt;/strong&gt; — transformations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows Task Scheduler&lt;/strong&gt; — automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing exotic. Deliberately. I wanted to prove the architecture matters more than the tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Medallion
&lt;/h2&gt;

&lt;p&gt;If you've read anything about modern data engineering you've seen this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bronze → Silver → Gold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Raw data → Cleaned data → Analytics-ready data&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice for this project:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bronze&lt;/strong&gt; — &lt;code&gt;PsxAllShr&lt;/code&gt; table in PostgreSQL. Raw scraper output. Every field is TEXT. No constraints except a primary key. Nothing gets rejected here. This is the immutable source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silver&lt;/strong&gt; — &lt;code&gt;stg_psx_daily_snapshot&lt;/code&gt; dbt view. This is where the actual work happens: strip commas from numbers like &lt;code&gt;15,616&lt;/code&gt;, strip &lt;code&gt;%&lt;/code&gt; signs from change percentages, cast everything to proper numeric types, deduplicate in case the scraper runs twice.&lt;/p&gt;
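&lt;p&gt;The Silver-layer cleanup can be sketched in plain Python (a sketch of the semantics, not the actual dbt SQL; the field names here are illustrative):&lt;/p&gt;

```python
def clean_number(raw: str) -> float:
    """Strip thousands separators and % signs from a raw TEXT field,
    then cast to float -- the same cleanup the staging view does in SQL."""
    return float(raw.replace(",", "").replace("%", "").strip())


def clean_row(row: dict) -> dict:
    """Cast the TEXT fields of one Bronze row into proper numeric types."""
    return {
        "symbol": row["symbol"].strip(),
        "price": clean_number(row["price"]),
        "change_pct": clean_number(row["change_pct"]),
        "volume": int(clean_number(row["volume"])),
    }


raw = {"symbol": "TRSM ", "price": "17.24", "change_pct": "+10.02%", "volume": "2,082,863"}
print(clean_row(raw))
```

&lt;p&gt;Deduplication is the one piece this sketch omits; in the view it's a &lt;code&gt;row_number()&lt;/code&gt;-style filter over the natural key.&lt;/p&gt;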

&lt;p&gt;&lt;strong&gt;Gold&lt;/strong&gt; — Two mart tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mart_top_movers&lt;/code&gt; — all 285 stocks ranked by daily change %&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mart_sector_summary&lt;/code&gt; — market cap, volume, average change grouped by sector per day&lt;/li&gt;
&lt;/ul&gt;
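&lt;p&gt;The gist of &lt;code&gt;mart_sector_summary&lt;/code&gt; — aggregate per sector per day — is an ordinary group-by; here is a dict-based Python sketch with illustrative column names and data:&lt;/p&gt;

```python
from collections import defaultdict


def sector_summary(rows):
    """Group cleaned daily rows by sector and compute total volume and
    average change % -- the shape of the mart_sector_summary table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["sector"]].append(row)
    return {
        sector: {
            "total_volume": sum(r["volume"] for r in members),
            "avg_change_pct": sum(r["change_pct"] for r in members) / len(members),
        }
        for sector, members in groups.items()
    }


rows = [
    {"sector": "Steel", "volume": 1000, "change_pct": 2.0},
    {"sector": "Steel", "volume": 3000, "change_pct": 4.0},
    {"sector": "Food", "volume": 500, "change_pct": -1.0},
]
print(sector_summary(rows))
```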




&lt;h2&gt;
  
  
  Where dbt Changed My Thinking
&lt;/h2&gt;

&lt;p&gt;My background is SSIS. In SSIS, if you have 10 packages and one depends on another, you wire them together manually: sequence containers, precedence constraints, all done by hand in a GUI.&lt;/p&gt;

&lt;p&gt;dbt replaces all of that with one function call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stg_psx_daily_snapshot'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single &lt;code&gt;ref()&lt;/code&gt; tells dbt: this model depends on &lt;code&gt;stg_psx_daily_snapshot&lt;/code&gt;. Run that first. If it fails, don't run this. Draw the lineage graph. Document the dependency. Generate the docs.&lt;/p&gt;

&lt;p&gt;All of that from one line.&lt;/p&gt;

&lt;p&gt;Coming from SSIS, where I'd spend hours managing package execution order, this felt almost too simple. But that's the point: dbt isn't doing anything you couldn't do manually; it's just making it impossible to skip the discipline.&lt;/p&gt;
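&lt;p&gt;Under the hood, every &lt;code&gt;ref()&lt;/code&gt; becomes an edge in a dependency graph, and models run in topological order. A minimal sketch of that idea (not dbt's actual implementation):&lt;/p&gt;

```python
def topo_order(deps):
    """Return a run order where every model comes after its dependencies.
    `deps` maps a model name to the list of models it ref()s."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for upstream in deps.get(node, []):  # run upstream models first
            visit(upstream)
        order.append(node)

    for node in deps:
        visit(node)
    return order


deps = {
    "mart_top_movers": ["stg_psx_daily_snapshot"],
    "stg_psx_daily_snapshot": ["raw.PsxAllShr"],
    "mart_sector_summary": ["stg_psx_daily_snapshot"],
}
print(topo_order(deps))
```

&lt;p&gt;The same graph is what powers the lineage view and the "if upstream fails, skip downstream" behaviour.&lt;/p&gt;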




&lt;h2&gt;
  
  
  The Part Nobody Talks About: Data Quality
&lt;/h2&gt;

&lt;p&gt;Before this project my answer to "how do you ensure data quality" would have been: "I check the data manually after the run."&lt;/p&gt;

&lt;p&gt;That's not good enough.&lt;/p&gt;

&lt;p&gt;dbt has built-in tests. A few lines of YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;symbol&lt;/span&gt;
    &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;not_null&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;id&lt;/span&gt;
    &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;unique&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;not_null&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every time the pipeline runs, dbt automatically checks: are there nulls in symbol? Are IDs unique? If any test fails, the run fails. Bad data never reaches the gold layer.&lt;/p&gt;
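&lt;p&gt;What those two generic tests actually assert is simple enough to state in a few lines of Python (a sketch of the semantics, not dbt's code — dbt compiles each test to a SQL query that must return zero rows):&lt;/p&gt;

```python
def not_null(rows, column):
    """Return the failing rows -- dbt's not_null test passes when this is empty."""
    return [r for r in rows if r[column] is None]


def unique(rows, column):
    """Return duplicated values -- dbt's unique test passes when this is empty."""
    seen, dupes = set(), set()
    for r in rows:
        value = r[column]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


rows = [{"id": 1, "symbol": "TRSM"}, {"id": 2, "symbol": None}, {"id": 2, "symbol": "MSCL"}]
print(not_null(rows, "symbol"))  # one failing row
print(unique(rows, "id"))        # [2]
```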

&lt;p&gt;When I ran &lt;code&gt;dbt test&lt;/code&gt; and saw this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;5 of 5 PASS .................. [PASS]
Completed successfully
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was the moment it clicked. This isn't "just SQL." This is SQL with the same discipline software engineers apply to code — testing, documentation, version control, dependency management.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Lineage Graph
&lt;/h2&gt;

&lt;p&gt;One command generates a full documentation site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dbt docs generate
dbt docs serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what you get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6so3le5oac8ki718u9s4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6so3le5oac8ki718u9s4.png" alt="Lineage Graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;raw.PsxAllShr → stg_psx_daily_snapshot → mart_top_movers&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;psx_sector_mapping → mart_sector_summary&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Anyone who clones this repo can run &lt;code&gt;dbt docs serve&lt;/code&gt; and immediately understand the entire pipeline without reading a single line of code. In SSIS you'd write a Word document for this and it would be outdated within a week.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Pipeline Actually Produces
&lt;/h2&gt;

&lt;p&gt;Top movers on any given day:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symbol&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Change %&lt;/th&gt;
&lt;th&gt;Volume&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TRSM&lt;/td&gt;
&lt;td&gt;Trust Modaraba&lt;/td&gt;
&lt;td&gt;17.24&lt;/td&gt;
&lt;td&gt;+10.02%&lt;/td&gt;
&lt;td&gt;2,082,863&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MSCL&lt;/td&gt;
&lt;td&gt;Metropolitan Steel&lt;/td&gt;
&lt;td&gt;26.25&lt;/td&gt;
&lt;td&gt;+10.02%&lt;/td&gt;
&lt;td&gt;2,021,553&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FCEPL&lt;/td&gt;
&lt;td&gt;Frieslandcampina Engro&lt;/td&gt;
&lt;td&gt;85.87&lt;/td&gt;
&lt;td&gt;+10.01%&lt;/td&gt;
&lt;td&gt;1,842,353&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Multiple stocks hitting exactly 10% on the same day? That's PSX's upper circuit breaker triggering — a signal you'd miss if you were just eyeballing a spreadsheet.&lt;/p&gt;
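&lt;p&gt;Spotting that cluster programmatically is a one-line filter over the movers table. A sketch with illustrative data, assuming a roughly 10% daily cap (PSX's actual circuit-breaker rules have more detail than this):&lt;/p&gt;

```python
def at_upper_cap(rows, cap_pct=10.0, tolerance=0.05):
    """Flag symbols whose daily change sits at (or just past) the cap --
    the circuit-breaker cluster visible in mart_top_movers."""
    return [r["symbol"] for r in rows if r["change_pct"] >= cap_pct - tolerance]


movers = [
    {"symbol": "TRSM", "change_pct": 10.02},
    {"symbol": "MSCL", "change_pct": 10.02},
    {"symbol": "FCEPL", "change_pct": 10.01},
    {"symbol": "OTHR", "change_pct": 4.5},
]
print(at_upper_cap(movers))  # ['TRSM', 'MSCL', 'FCEPL']
```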




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The pipeline is running daily. Data is accumulating. Once I hit 30 days of history I'll add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7-day and 30-day moving averages&lt;/li&gt;
&lt;li&gt;Relative strength by sector&lt;/li&gt;
&lt;li&gt;A live dashboard (Metabase or Superset)&lt;/li&gt;
&lt;li&gt;Prefect for proper orchestration&lt;/li&gt;
&lt;/ul&gt;
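&lt;p&gt;The first item on that list is straightforward once history accumulates. A trailing moving-average sketch over a per-symbol price series (illustrative data; in the pipeline this would live in a dbt model as a window function):&lt;/p&gt;

```python
def moving_average(prices, window):
    """Trailing moving average; the first window-1 days have no value."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = prices[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


prices = [10, 11, 12, 13, 14, 15, 16, 17]
print(moving_average(prices, 7))
```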

&lt;p&gt;The GitHub repo is open if you want to look at the code or build something similar:&lt;br&gt;
🔗 &lt;a href="https://github.com/muzzamilanis/psx-data-pipeline" rel="noopener noreferrer"&gt;github.com/muzzamilanis/psx-data-pipeline&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  One Thing I'd Tell Anyone Transitioning into Data Engineering
&lt;/h2&gt;

&lt;p&gt;You don't need a certification to call yourself a data engineer. You need a project you built yourself, that solves a real problem, that you can explain and defend in an interview.&lt;/p&gt;

&lt;p&gt;Build something. Break it. Fix it. Document it. Ship it.&lt;/p&gt;

&lt;p&gt;That's the portfolio.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>python</category>
      <category>dbt</category>
      <category>postgres</category>
    </item>
  </channel>
</rss>
