<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yaw Opoku Mensah Baffoe</title>
    <description>The latest articles on DEV Community by Yaw Opoku Mensah Baffoe (@yaw_baffoe).</description>
    <link>https://dev.to/yaw_baffoe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3888866%2F30547046-daea-4d32-affb-57dee83c1bed.jpg</url>
      <title>DEV Community: Yaw Opoku Mensah Baffoe</title>
      <link>https://dev.to/yaw_baffoe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yaw_baffoe"/>
    <language>en</language>
    <item>
      <title>Pre-Code Planning Stopped Me From Getting Stuck on a 3-Hour ETL Pipeline</title>
      <dc:creator>Yaw Opoku Mensah Baffoe</dc:creator>
      <pubDate>Sun, 31 May 2026 10:55:25 +0000</pubDate>
      <link>https://dev.to/yaw_baffoe/pre-code-planning-stopped-me-from-getting-stuck-on-a-3-hour-etl-pipeline-2922</link>
      <guid>https://dev.to/yaw_baffoe/pre-code-planning-stopped-me-from-getting-stuck-on-a-3-hour-etl-pipeline-2922</guid>
      <description>&lt;p&gt;I recently finished building a multi-channel data pipeline to solve a specific problem: consolidating fragmented monthly sales data from an in-store till, Uber Eats, and Deliveroo into a single master source for Looker Studio.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://datastudio.google.com/reporting/c54dd374-fe50-45c5-a88b-d4fd22aafdad" rel="noopener noreferrer"&gt;My Dashboard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The technical work took me slightly under 3 hours to build. But the biggest takeaway for me wasn't the speed, it was the shift in how I approached the build.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with "Figuring it out as you go"
&lt;/h2&gt;

&lt;p&gt;On previous projects, my default habit was to open a blank IDE and just start typing. I’d try to figure out the data schemas and transformations while mid script.&lt;/p&gt;

&lt;p&gt;When I did that, I ended up hitting a wall. I’d confuse myself, lose track of the data flow, and get stuck in a loop of debugging things I hadn't fully defined yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Did Differently This Time
&lt;/h2&gt;

&lt;p&gt;Before touching a single line of Python for this dashboard, I forced myself to properly define the parameters of the project in a document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The Exact Inputs: Mapping out the column mismatches between the three different CSV exports (Till vs. Uber vs. Deliveroo).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Exact Transformation Steps: Deciding how to handle commission deductions and currency formatting before writing the cleaning functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Destination: Defining how the final master source needed to be structured for Looker Studio to read it cleanly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;Because the blueprint was already written, the actual coding process was just pure execution. I stayed hyper-focused for the full 3 hours because there was zero guesswork involved.&lt;/p&gt;

&lt;p&gt;It showed me that my biggest bottleneck usually isn't my technical ability or my understanding of code; it's the framework I use to approach a problem. Spending time upfront to structure the data pipeline logically turned the coding phase into a straightforward task rather than a guessing game.&lt;/p&gt;

&lt;p&gt;For anyone else working through data projects or personal builds: if you find yourself constantly hitting mental blocks mid code, try closing the IDE and explicitly defining the data flow first. It saves a massive amount of cognitive energy.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>dataengineering</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building Dashboard: A Beginner's Case Study with E‑commerce Data</title>
      <dc:creator>Yaw Opoku Mensah Baffoe</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:35:30 +0000</pubDate>
      <link>https://dev.to/yaw_baffoe/building-dashboard-a-beginners-case-study-with-e-commerce-data-dg8</link>
      <guid>https://dev.to/yaw_baffoe/building-dashboard-a-beginners-case-study-with-e-commerce-data-dg8</guid>
      <description>&lt;p&gt;I wanted to learn and grow as a data analyst by analysing a well known multi tabled dataset, to grow my portfolio and prove to myself and others my skills as a data analyst. I learned a lot along the way, here’s the step by step process I used, the tools involved, and the surprising insight I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting the data and defining the goal
&lt;/h2&gt;

&lt;p&gt;The O-list dataset from Kaggle is one of the first big datasets I've ever seen or used, but I never went in depth like I would for this. I decided to use it because it had lots of data with multiple tables, it was far beyond the scope of skills that I thought clients would expect from me. I picked overarching questions that I thought would push me to develop as a data analyst.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which product categories should Olist prioritize? &lt;/li&gt;
&lt;li&gt;When should marketing be scheduled?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cleaning and Organizing the Data with SQL
&lt;/h2&gt;

&lt;p&gt;I used PostgreSQL to clean and organise the dataset. I made tables and uploaded the correct schema into the server and I used pgAdmin to query my database. Throughout the project I used smaller scale queries and excel to validate my SQL logic. &lt;/p&gt;

&lt;h2&gt;
  
  
  Calculating key metrics
&lt;/h2&gt;

&lt;p&gt;I joined various tables depending on what question I was looking for. For example, when I wanted to know the top performing products, I joined the order_items, orders and products tables, and used aggregate functions when needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_category_name_english&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freight_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;order_items&lt;/span&gt; &lt;span class="n"&gt;oi&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;category_name_translation&lt;/span&gt; &lt;span class="n"&gt;n_trns&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_category_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n_trns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_category_name&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;n_trns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_category_name_english&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;I calculated other key statistics such as the most popular hours of business, and the order funnel, showing where the business would lose customers in the ordering process.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'1. Placed'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_purchase_timestamp&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'2. Approved'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_approved_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'3. Shipped'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_delivered_carrier_date&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'4. Delivered'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'delivered'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Building the dashboard
&lt;/h2&gt;

&lt;p&gt;The dashboard is organized into two pages: a high-level overview, and a deep dive into product performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldft398xhwncv1tzc1p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldft398xhwncv1tzc1p2.png" alt=" " width="800" height="486"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4um1zwxh8wegjtiy7dz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4um1zwxh8wegjtiy7dz.png" alt=" " width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the top KPI cards show key metrics, like total revenue, and the week in which the most orders were made. The page also contains key graphs like the order funnel and the hour order count statistic.&lt;/p&gt;

&lt;p&gt;The next page showcases the statistics the items are generating for the business. I researched and made an equation for theoretical profit based on the profit that businesses in the same sector are likely to make, making it 25% of the revenue. I also showcase the top performing items and the percentage of revenue each item brings in for the business. I also showcase the overall growth of the business overtime.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Key Insight &amp;amp; Why It Matters
&lt;/h2&gt;

&lt;p&gt;The first page shows that only roughly 6% of customers return for repeat purchases, this was easily my most surprising find. Why? Because a healthy business will retain roughly 30% of customers, higher end-businesses will want to retain closer to 40%, but despite the massive growth and revenue, the business is failing to bring customers back onto their website or into their shops. This would showcase a massive retention opportunity for the business.&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion and takeaways
&lt;/h2&gt;

&lt;p&gt;This project taught me that data analysis isn't just about writing SQL queries or building charts. It's about asking the right questions and translating numbers into business decisions.&lt;/p&gt;

&lt;p&gt;The O-list dataset gave me a sandbox to practice every step of the analyst workflow. Cleaning messy data, joining multiple tables, validating results, designing a dashboard, and most importantly, finding an insight that actually matters.&lt;/p&gt;

&lt;p&gt;That insight only 6% of customers return for a second purchase is not just a number. It's a signal. It tells the business that their marketing budget might be focused on acquiring new customers when the real opportunity is keeping the ones they already have.&lt;/p&gt;

&lt;p&gt;For me, this project was proof. Proof that I can take a raw dataset, turn it into something useful, and deliver a clear recommendation. That's what I want to do for e‑commerce brands: help them see what's hidden in their data so they can grow smarter.&lt;/p&gt;

&lt;p&gt;If you're an e‑commerce owner struggling to understand why customers don't come back, or a fellow data analyst just starting out, I'd love to connect. You can find my dashboard here and my portfolio on GitHub.&lt;/p&gt;

&lt;p&gt;Thanks for reading. Let me know what you think.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://datastudio.google.com/reporting/62dc59b6-4dfb-4a01-848e-e4ab55bcc328" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fdatastudio.google.com%2Freporting%2F62dc59b6-4dfb-4a01-848e-e4ab55bcc328%2Fthumbnail%3Fsz%3Dfull%26feature%3Dopengraph" height="240" class="m-0" width="320"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://datastudio.google.com/reporting/62dc59b6-4dfb-4a01-848e-e4ab55bcc328" rel="noopener noreferrer" class="c-link"&gt;
            Olist Sales Analysis
          &lt;/a&gt;
        &lt;/h2&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
          datastudio.google.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>data</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
