<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Phylis Jepchumba, MSc</title>
    <description>The latest articles on DEV Community by Phylis Jepchumba, MSc (@phylis).</description>
    <link>https://dev.to/phylis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671287%2F415e9ff5-6626-4176-a3a8-35873136b50f.jpg</url>
      <title>DEV Community: Phylis Jepchumba, MSc</title>
      <link>https://dev.to/phylis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/phylis"/>
    <language>en</language>
    <item>
      <title>Power BI Desktop vs Power BI Service vs Mobile — Which One Do You Actually Need?</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Mon, 15 Jun 2026 08:59:56 +0000</pubDate>
      <link>https://dev.to/phylis/power-bi-desktop-vs-power-bi-service-vs-mobile-which-one-do-you-actually-need-1o90</link>
      <guid>https://dev.to/phylis/power-bi-desktop-vs-power-bi-service-vs-mobile-which-one-do-you-actually-need-1o90</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; 30 Days of Power BI | &lt;strong&gt;Day 2 of 30&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Level:&lt;/strong&gt; Beginner | &lt;strong&gt;Read time:&lt;/strong&gt; ~7 minutes&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Three products. One ecosystem. Let's clear the confusion.
&lt;/h2&gt;

&lt;p&gt;If you've visited the &lt;a href="https://powerbi.microsoft.com/" rel="noopener noreferrer"&gt;Power BI website&lt;/a&gt;, you've probably noticed that "Power BI" isn't just one thing. There's a Desktop app, a web service, a mobile app — and if you're new, it's easy to wonder: &lt;em&gt;which one do I actually need?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today we answer that question clearly, so that when we install Power BI Desktop on Day 3, you know exactly what you're installing and why.&lt;/p&gt;




&lt;h2&gt;
  
  
  The quick answer
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;What you use it for&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power BI Desktop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Building and designing reports&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power BI Service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Publishing, sharing, and collaborating&lt;/td&gt;
&lt;td&gt;Free (limited) / Paid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power BI Mobile&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Viewing reports on your phone or tablet&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it like Microsoft Word and OneDrive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Desktop&lt;/strong&gt; = Word (where you do the work)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service&lt;/strong&gt; = OneDrive/SharePoint (where you store and share it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile&lt;/strong&gt; = reading a document on your phone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of the work in this series happens in &lt;strong&gt;Power BI Desktop&lt;/strong&gt;. The other two come into play once you're ready to share your work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Power BI Desktop — the workhorse
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://powerbi.microsoft.com/en-us/desktop/" rel="noopener noreferrer"&gt;Power BI Desktop&lt;/a&gt; is a &lt;strong&gt;free Windows application&lt;/strong&gt; you download and install on your computer. It's where you will spend 90% of your time as you learn.&lt;/p&gt;

&lt;h3&gt;
  
  
  What you can do in Desktop:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Connect to data sources (Excel, SQL, web, APIs)&lt;/li&gt;
&lt;li&gt;Clean and transform data using Power Query&lt;/li&gt;
&lt;li&gt;Build relationships between tables&lt;/li&gt;
&lt;li&gt;Write DAX formulas and create measures&lt;/li&gt;
&lt;li&gt;Design reports with charts, tables, maps, and cards&lt;/li&gt;
&lt;li&gt;Format and theme your visuals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key things to know:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Completely free&lt;/strong&gt; — no account required to use it&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Works offline&lt;/strong&gt; — no internet connection needed to build reports&lt;/li&gt;
&lt;li&gt;⚠️ &lt;strong&gt;Windows only&lt;/strong&gt; — there is no native macOS version. Mac users can access Power BI through the browser-based &lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;Power BI Service&lt;/a&gt; or run Windows via a virtual machine&lt;/li&gt;
&lt;li&gt;⚠️ &lt;strong&gt;Not for sharing&lt;/strong&gt; — reports built in Desktop live on your machine. To share with others, you need the Service&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;📥 &lt;strong&gt;Download link:&lt;/strong&gt; &lt;a href="https://apps.microsoft.com/store/detail/power-bi-desktop/9NTXR16HNW1T" rel="noopener noreferrer"&gt;Power BI Desktop — Microsoft Store&lt;/a&gt; or directly from &lt;a href="https://powerbi.microsoft.com/en-us/desktop/" rel="noopener noreferrer"&gt;powerbi.microsoft.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Power BI Service — the collaboration hub
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;Power BI Service&lt;/a&gt; is the &lt;strong&gt;browser-based platform&lt;/strong&gt; where you publish, share, and manage your reports. You access it at &lt;code&gt;app.powerbi.com&lt;/code&gt; — no installation required.&lt;/p&gt;

&lt;h3&gt;
  
  
  What you can do in the Service:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Publish reports from Desktop to the web&lt;/li&gt;
&lt;li&gt;Share reports with colleagues and stakeholders&lt;/li&gt;
&lt;li&gt;Set up &lt;strong&gt;scheduled data refresh&lt;/strong&gt; (your report updates automatically)&lt;/li&gt;
&lt;li&gt;Create &lt;strong&gt;dashboards&lt;/strong&gt; by pinning visuals from multiple reports&lt;/li&gt;
&lt;li&gt;Manage &lt;strong&gt;workspaces&lt;/strong&gt; for team collaboration&lt;/li&gt;
&lt;li&gt;Embed reports into SharePoint, Teams, or websites&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://learn.microsoft.com/en-us/power-bi/create-reports/copilot-introduction" rel="noopener noreferrer"&gt;Power BI Copilot&lt;/a&gt; (AI features)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Free vs paid tiers:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Free&lt;/th&gt;
&lt;th&gt;Pro (paid)&lt;/th&gt;
&lt;th&gt;Premium&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Publish reports&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Share with others&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Collaborate in workspaces&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduled refresh&lt;/td&gt;
&lt;td&gt;8x/day&lt;/td&gt;
&lt;td&gt;8x/day&lt;/td&gt;
&lt;td&gt;48x/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paginated reports&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
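
&lt;p&gt;To make the refresh limits in the table concrete, here is a minimal Python sketch that converts a daily refresh cap into the gap between refreshes. The function name &lt;code&gt;min_refresh_interval_minutes&lt;/code&gt; is made up for illustration, and the even spacing is an assumption: in the Service you pick the refresh times yourself.&lt;/p&gt;

```python
MINUTES_PER_DAY = 24 * 60


def min_refresh_interval_minutes(refreshes_per_day: int) -> float:
    """Gap in minutes between refreshes if they are spread evenly over a day."""
    if not refreshes_per_day > 0:
        raise ValueError("refreshes_per_day must be positive")
    return MINUTES_PER_DAY / refreshes_per_day


# Daily caps taken from the table above.
caps = {"Free / Pro": 8, "Premium": 48}
intervals = {tier: min_refresh_interval_minutes(n) for tier, n in caps.items()}

for tier, minutes in intervals.items():
    print(f"{tier}: about one refresh every {minutes:.0f} minutes")
```

&lt;p&gt;Spread evenly, 8 refreshes a day works out to one every three hours, and 48 a day to one every 30 minutes.&lt;/p&gt;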

&lt;blockquote&gt;
&lt;p&gt;The &lt;strong&gt;free tier&lt;/strong&gt; is perfect for learning. You can publish and view your own reports without paying anything. You only need Pro when you want to share reports with other people.&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Students and educators:&lt;/strong&gt; Microsoft offers &lt;a href="https://www.microsoft.com/en-us/education/products/office" rel="noopener noreferrer"&gt;free Power BI Pro through Microsoft 365 Education&lt;/a&gt; at many universities. Check if your institution qualifies.&lt;/p&gt;

&lt;p&gt;📖 Full pricing breakdown: &lt;a href="https://powerbi.microsoft.com/en-us/pricing/" rel="noopener noreferrer"&gt;Power BI pricing page&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Power BI Mobile — reports on the go
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://powerbi.microsoft.com/en-us/mobile/" rel="noopener noreferrer"&gt;Power BI Mobile&lt;/a&gt; is a free app available on &lt;a href="https://apps.apple.com/app/microsoft-power-bi/id929738808" rel="noopener noreferrer"&gt;iOS&lt;/a&gt; and &lt;a href="https://play.google.com/store/apps/details?id=com.microsoft.powerbim" rel="noopener noreferrer"&gt;Android&lt;/a&gt;. It connects to your Power BI Service account and lets you view reports and dashboards from your phone or tablet.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it's good for:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Viewing and exploring published reports on a phone&lt;/li&gt;
&lt;li&gt;Receiving data alerts and notifications&lt;/li&gt;
&lt;li&gt;Presenting dashboards in meetings from a tablet&lt;/li&gt;
&lt;li&gt;Annotating reports and sharing snapshots&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What it's NOT good for:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Building or editing reports (not possible on mobile)&lt;/li&gt;
&lt;li&gt;Replacing Desktop or Service for any analytical work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this series, &lt;strong&gt;you don't need the mobile app at all&lt;/strong&gt;. It's a viewer — useful once you're building things worth showing to others.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the three work together
&lt;/h2&gt;

&lt;p&gt;Here's a typical real-world workflow showing how the three products connect:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Your data source]
      ↓
[Power BI Desktop]        ← You build the report here
      ↓  (Publish)
[Power BI Service]        ← Your team views it here
      ↓  (Sync)
[Power BI Mobile]         ← Your manager checks it on their phone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A real-world example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You're an analyst at an NGO. Every month, you pull a donor data CSV, clean it in Power BI Desktop, build a report showing donations by region and campaign, and publish it to the Service. Your programme director gets an email notification that the report is refreshed and opens it on their phone using the mobile app — no spreadsheet attachment needed.&lt;/p&gt;
&lt;/blockquote&gt;
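
&lt;p&gt;The "donations by region and campaign" part of that example is a grouped sum. In Power BI you would build it with visuals and measures rather than code, but a rough Python sketch shows what the report computes. The column names (&lt;code&gt;region&lt;/code&gt;, &lt;code&gt;campaign&lt;/code&gt;, &lt;code&gt;amount&lt;/code&gt;) and the sample rows are invented for illustration.&lt;/p&gt;

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for the monthly donor CSV export.
SAMPLE_CSV = """region,campaign,amount
Nairobi,Clean Water,250
Nairobi,School Meals,100
Kisumu,Clean Water,75
Nairobi,Clean Water,50
"""


def donations_by_region_and_campaign(csv_text: str) -> dict[tuple[str, str], float]:
    """Sum the amount column for every (region, campaign) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[(row["region"], row["campaign"])] += float(row["amount"])
    return dict(totals)


totals = donations_by_region_and_campaign(SAMPLE_CSV)
for (region, campaign), amount in sorted(totals.items()):
    print(f"{region:8} {campaign:13} {amount:8.2f}")
```

&lt;p&gt;Each printed line corresponds to one bar or table row in the published report.&lt;/p&gt;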




&lt;h2&gt;
  
  
  What about Power BI Embedded and Report Server?
&lt;/h2&gt;

&lt;p&gt;You may come across two more terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://azure.microsoft.com/en-us/products/power-bi-embedded" rel="noopener noreferrer"&gt;Power BI Embedded&lt;/a&gt;&lt;/strong&gt; — for developers who want to embed Power BI visuals inside their own applications (websites, internal tools). This is an Azure product aimed at developers, not analysts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://powerbi.microsoft.com/en-us/report-server/" rel="noopener noreferrer"&gt;Power BI Report Server&lt;/a&gt;&lt;/strong&gt; — an on-premise version of the Service for organisations that can't store data in the cloud. Think banks, governments, or hospitals with strict data residency policies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You won't need either of these as a beginner. They're worth knowing exist, but they're not part of this series.&lt;/p&gt;




&lt;h2&gt;
  
  
  What about Microsoft Fabric?
&lt;/h2&gt;

&lt;p&gt;If you've been Googling Power BI recently, you've probably seen &lt;a href="https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview" rel="noopener noreferrer"&gt;Microsoft Fabric&lt;/a&gt; mentioned. Fabric is Microsoft's unified analytics platform that &lt;em&gt;includes&lt;/em&gt; Power BI alongside tools for data engineering, data warehousing, and real-time analytics.&lt;/p&gt;

&lt;p&gt;Think of Fabric as the bigger house that Power BI now lives in. Everything you learn about Power BI in this series is fully applicable inside Fabric. If you're just starting out, focus on Power BI — you'll naturally encounter Fabric as you grow.&lt;/p&gt;




&lt;h2&gt;
  
  
  So, which one do you actually need right now?
&lt;/h2&gt;

&lt;p&gt;For this series: &lt;strong&gt;Power BI Desktop, plus a free Power BI Service account.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Download Power BI Desktop (we do this in Day 3)&lt;/li&gt;
&lt;li&gt;✅ Create a free Power BI Service account at &lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;app.powerbi.com&lt;/a&gt; (you'll need a work or school email; personal Gmail/Hotmail addresses aren't accepted at sign-up)&lt;/li&gt;
&lt;li&gt;⏭️ Mobile app — skip it for now&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;No work or school email?&lt;/strong&gt; You can sign up for a &lt;a href="https://developer.microsoft.com/en-us/microsoft-365/dev-program" rel="noopener noreferrer"&gt;free Microsoft 365 Developer account&lt;/a&gt;, which, if you qualify for a sandbox subscription, gives you a work email address and access to Power BI Pro for 90 days. That is plenty for learning.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Key takeaways from Day 2
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Desktop&lt;/strong&gt; is where you build reports — free, Windows-only, works offline&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Service&lt;/strong&gt; is where you share and collaborate — browser-based, free tier available&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Mobile&lt;/strong&gt; is for viewing reports on your phone — not needed for building&lt;/li&gt;
&lt;li&gt;✅ The three products work together as a pipeline: build → publish → view&lt;/li&gt;
&lt;li&gt;✅ For this series, Power BI Desktop + a free Service account is all you need&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Useful resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📌 &lt;a href="https://powerbi.microsoft.com/en-us/desktop/" rel="noopener noreferrer"&gt;Download Power BI Desktop&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;Sign up for Power BI Service (free)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://powerbi.microsoft.com/en-us/pricing/" rel="noopener noreferrer"&gt;Power BI pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://apps.apple.com/app/microsoft-power-bi/id929738808" rel="noopener noreferrer"&gt;Power BI Mobile — iOS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://play.google.com/store/apps/details?id=com.microsoft.powerbim" rel="noopener noreferrer"&gt;Power BI Mobile — Android&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview" rel="noopener noreferrer"&gt;Microsoft Fabric overview&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://developer.microsoft.com/en-us/microsoft-365/dev-program" rel="noopener noreferrer"&gt;Microsoft 365 Developer Program (free work email)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Up next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Day 3: Installing Power BI Desktop in 5 minutes — a step-by-step setup guide&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're going hands-on. By the end of Day 3 you'll have Power BI Desktop installed, your Service account ready, and the sample data loaded — all set to build your first report.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this helpful? Drop a ❤️ and follow along — a new article every day for 30 days. Questions or stuck somewhere? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>powerplatform</category>
      <category>datavisualization</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>#powerbi</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Wed, 10 Jun 2026 08:35:38 +0000</pubDate>
      <link>https://dev.to/phylis/powerbi-3p3n</link>
      <guid>https://dev.to/phylis/powerbi-3p3n</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/phylis/what-is-power-bi-a-complete-beginners-guide-to-microsofts-data-platform-2ilf" class="crayons-story__hidden-navigation-link"&gt;What is Power BI? A Complete Beginner's Guide to Microsoft's Data Platform&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/phylis" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671287%2F415e9ff5-6626-4176-a3a8-35873136b50f.jpg" alt="phylis profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/phylis" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Phylis Jepchumba, MSc
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Phylis Jepchumba, MSc
                
              
              &lt;div id="story-author-preview-content-3864020" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/phylis" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671287%2F415e9ff5-6626-4176-a3a8-35873136b50f.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Phylis Jepchumba, MSc&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/phylis/what-is-power-bi-a-complete-beginners-guide-to-microsofts-data-platform-2ilf" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 10&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/phylis/what-is-power-bi-a-complete-beginners-guide-to-microsofts-data-platform-2ilf" id="article-link-3864020"&gt;
          What is Power BI? A Complete Beginner's Guide to Microsoft's Data Platform
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/powerplatform"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;powerplatform&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/datavisualization"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;datavisualization&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/beginners"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;beginners&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/tutorial"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;tutorial&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/phylis/what-is-power-bi-a-complete-beginners-guide-to-microsofts-data-platform-2ilf" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;6&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/phylis/what-is-power-bi-a-complete-beginners-guide-to-microsofts-data-platform-2ilf#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            7 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>data</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>What is Power BI? A Complete Beginner's Guide to Microsoft's Data Platform</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Wed, 10 Jun 2026 08:22:32 +0000</pubDate>
      <link>https://dev.to/phylis/what-is-power-bi-a-complete-beginners-guide-to-microsofts-data-platform-2ilf</link>
      <guid>https://dev.to/phylis/what-is-power-bi-a-complete-beginners-guide-to-microsofts-data-platform-2ilf</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; 30 Days of Power BI | &lt;strong&gt;Day 1 of 30&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Level:&lt;/strong&gt; Beginner | &lt;strong&gt;Read time:&lt;/strong&gt; ~8 minutes&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  You've heard the term. Now let's make it click.
&lt;/h2&gt;

&lt;p&gt;If you've spent any time around data, analytics, or business intelligence, you've probably heard someone mention Power BI. Maybe your manager asked for a "Power BI dashboard." Maybe you saw it on a job posting. Maybe you're just curious what it actually &lt;em&gt;does&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;By the end of this article, you'll know exactly what Power BI is, why so many organisations rely on it, and whether it's worth learning for your career. No jargon. No fluff.&lt;/p&gt;

&lt;p&gt;Let's get into it.&lt;/p&gt;




&lt;h2&gt;
  
  
  So, what exactly is Power BI?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://powerbi.microsoft.com/" rel="noopener noreferrer"&gt;Power BI&lt;/a&gt;&lt;/strong&gt; is a business intelligence and data visualisation tool built by Microsoft. It lets you connect to data — from Excel files, databases, cloud services, and more — and turn that data into interactive reports and dashboards.&lt;/p&gt;

&lt;p&gt;Think of it this way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Raw data is like a pile of ingredients. Power BI is the kitchen that helps you cook it into something people can actually consume.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of staring at rows and columns in a spreadsheet, your stakeholders get a clean, visual, interactive report they can explore with a few clicks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8sbcbw61iuw429mh46q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8sbcbw61iuw429mh46q.png" alt="Power BI Desktop Canvas"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why do organisations use Power BI?
&lt;/h2&gt;

&lt;p&gt;Here are the key reasons Power BI has become one of the most widely used BI tools in the world:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. It connects to almost everything
&lt;/h3&gt;

&lt;p&gt;Power BI can pull data from &lt;a href="https://learn.microsoft.com/en-us/power-bi/connect-data/power-bi-data-sources" rel="noopener noreferrer"&gt;hundreds of sources&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excel and CSV files&lt;/li&gt;
&lt;li&gt;SQL Server, MySQL, PostgreSQL&lt;/li&gt;
&lt;li&gt;Cloud platforms (Azure, AWS, Google BigQuery)&lt;/li&gt;
&lt;li&gt;Web services (Salesforce, Google Analytics, SharePoint)&lt;/li&gt;
&lt;li&gt;APIs and web pages&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. It's part of the Microsoft ecosystem
&lt;/h3&gt;

&lt;p&gt;If your organisation already uses &lt;a href="https://www.microsoft.com/en-us/microsoft-365" rel="noopener noreferrer"&gt;Microsoft 365&lt;/a&gt; (Excel, Teams, SharePoint), Power BI fits right in. Data from Excel or Azure flows into Power BI naturally, and reports can be &lt;a href="https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-embed-report-microsoft-teams" rel="noopener noreferrer"&gt;embedded directly in Teams&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. It's powerful but accessible
&lt;/h3&gt;

&lt;p&gt;You don't need to be a software engineer to use Power BI. Most tasks are drag-and-drop. But when you're ready to go deeper, Power BI has a full formula language called &lt;a href="https://learn.microsoft.com/en-us/dax/dax-overview" rel="noopener noreferrer"&gt;DAX (Data Analysis Expressions)&lt;/a&gt; and supports &lt;a href="https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-python-scripts" rel="noopener noreferrer"&gt;Python&lt;/a&gt; and &lt;a href="https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-r-scripts" rel="noopener noreferrer"&gt;R scripts&lt;/a&gt; for advanced analytics.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. It's affordable
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://powerbi.microsoft.com/en-us/desktop/" rel="noopener noreferrer"&gt;Power BI Desktop&lt;/a&gt; — the main tool for building reports — is &lt;strong&gt;completely free to download and use&lt;/strong&gt;. The paid tiers (&lt;a href="https://powerbi.microsoft.com/en-us/pricing/" rel="noopener noreferrer"&gt;Pro and Premium&lt;/a&gt;) unlock collaboration and enterprise features, but you can learn everything without spending a cent.&lt;/p&gt;




&lt;h2&gt;
  
  
  The three faces of Power BI
&lt;/h2&gt;

&lt;p&gt;Power BI isn't just one product. It's a family of tools that work together:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it is&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://powerbi.microsoft.com/en-us/desktop/" rel="noopener noreferrer"&gt;Power BI Desktop&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Windows app for building reports&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;Power BI Service&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web platform for publishing and sharing reports&lt;/td&gt;
&lt;td&gt;Free (limited) / Paid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://powerbi.microsoft.com/en-us/mobile/" rel="noopener noreferrer"&gt;Power BI Mobile&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;iOS and Android app to view reports on the go&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For this series, we'll spend most of our time in &lt;strong&gt;Power BI Desktop&lt;/strong&gt; — it's where all the building happens.&lt;/p&gt;




&lt;h2&gt;
  
  
  What can you actually build with it?
&lt;/h2&gt;

&lt;p&gt;Here are some real-world examples of what people build in Power BI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sales dashboards&lt;/strong&gt; — track revenue, targets, and regional performance at a glance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HR analytics&lt;/strong&gt; — monitor headcount, attrition rates, and hiring pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial reports&lt;/strong&gt; — profit and loss summaries, budget vs actuals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketing dashboards&lt;/strong&gt; — campaign performance, website traffic, lead conversions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations reports&lt;/strong&gt; — supply chain tracking, inventory levels, delivery timelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Student/academic dashboards&lt;/strong&gt; — enrolment trends, performance by cohort, survey results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying skills are the same across all of these. Once you learn Power BI, you can apply it to virtually any domain.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Tip:&lt;/strong&gt; Microsoft offers &lt;a href="https://learn.microsoft.com/en-us/power-bi/create-reports/sample-datasets" rel="noopener noreferrer"&gt;free sample datasets and reports&lt;/a&gt; you can download and explore right now — no setup required.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How does Power BI actually work?
&lt;/h2&gt;

&lt;p&gt;At a high level, building a report in Power BI follows three stages:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ Connect to data ]  →  [ Transform &amp;amp; model ]  →  [ Visualise &amp;amp; share ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stage 1: Connect to data
&lt;/h3&gt;

&lt;p&gt;You point Power BI at your data source — a file on your computer, a database, or a cloud service. Power BI pulls the data in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Transform and model
&lt;/h3&gt;

&lt;p&gt;This is where you clean and shape the data. Remove blank rows. Rename confusing columns. Combine tables. Define relationships between tables. Power BI has a built-in tool called &lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/power-query/power-query-what-is-power-query" rel="noopener noreferrer"&gt;Power Query&lt;/a&gt;&lt;/strong&gt; that handles this without writing code.&lt;/p&gt;
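
&lt;p&gt;If you think better in code, here is a rough Python sketch of what three of those steps do to the data: removing blank rows, renaming columns, and combining tables. It is not Power Query or its M language, only an illustration. The table contents, column names, and helper functions are all hypothetical.&lt;/p&gt;

```python
# Hypothetical raw tables, as they might arrive from two source files.
raw_sales = [
    {"cust_id": "C1", "amt": "120"},
    {"cust_id": "", "amt": ""},  # blank row to be removed
    {"cust_id": "C2", "amt": "80"},
]
customers = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "West"},
]

# Confusing source names mapped to clearer ones.
COLUMN_NAMES = {"cust_id": "customer_id", "amt": "amount"}


def remove_blank_rows(rows):
    """Keep only rows where at least one value is non-empty."""
    return [row for row in rows if any(value.strip() for value in row.values())]


def rename_columns(rows, mapping):
    """Rename columns listed in the mapping; other columns keep their names."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


def combine(left, right, key):
    """Attach matching columns from right to each left row (a left join on key)."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


clean_sales = rename_columns(remove_blank_rows(raw_sales), COLUMN_NAMES)
report_rows = combine(clean_sales, customers, "customer_id")
print(report_rows)
```

&lt;p&gt;In Power Query you click through the same steps in the editor, and it records each one so the whole sequence reruns whenever the data refreshes.&lt;/p&gt;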

&lt;h3&gt;
  
  
  Stage 3: Visualise and share
&lt;/h3&gt;

&lt;p&gt;Now you build the actual report — charts, tables, maps, KPI cards. You add filters and slicers so viewers can explore the data themselves. Then you publish it to &lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;Power BI Service&lt;/a&gt; and share it with your team.&lt;/p&gt;
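&lt;p&gt;For readers who think in code, the three stages can be loosely mirrored in a few lines of Python. This is only an analogy, not how Power BI works internally: the inline CSV text, the column names, and the text bar chart are all made up for illustration, and in Power BI you would do each step through the interface instead.&lt;/p&gt;

```python
import csv
import io

# Stage 1: connect. A made-up inline CSV stands in for a file, database, or cloud source.
RAW_CSV = """Region,Sales Amt
North,1200
South,800
West,
North,300
East,500
"""
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Stage 2: transform and model. Drop rows with a blank amount, rename the
# confusing column, convert text to numbers, then total the sales per region.
clean = [
    {"region": r["Region"], "sales": int(r["Sales Amt"])}
    for r in rows
    if r["Sales Amt"].strip()
]
totals = {}
for r in clean:
    totals[r["region"]] = totals.get(r["region"], 0) + r["sales"]

# Stage 3: visualise. A text bar chart, one '#' per 100 units of sales.
for region, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region:6} {'#' * (total // 100)} {total}")
```

&lt;p&gt;The row for West is dropped in stage 2 because its amount is blank, which is the same kind of clean-up you would do in Power Query by removing blank rows.&lt;/p&gt;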




&lt;h2&gt;
  
  
  Power BI vs Excel — aren't they the same thing?
&lt;/h2&gt;

&lt;p&gt;This is one of the most common questions beginners ask. Short answer: &lt;strong&gt;no, but they complement each other well&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Excel&lt;/th&gt;
&lt;th&gt;Power BI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Calculations, ad-hoc analysis, small datasets&lt;/td&gt;
&lt;td&gt;Interactive dashboards, large datasets, sharing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data volume&lt;/td&gt;
&lt;td&gt;A worksheet holds at most 1,048,576 rows&lt;/td&gt;
&lt;td&gt;Handles tens of millions of rows easily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualisation&lt;/td&gt;
&lt;td&gt;Basic charts&lt;/td&gt;
&lt;td&gt;Rich, interactive, drill-through visuals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Collaboration&lt;/td&gt;
&lt;td&gt;Email attachments&lt;/td&gt;
&lt;td&gt;Live, always-updated shared reports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low to start, deeper as you grow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Many professionals use &lt;strong&gt;both&lt;/strong&gt; — Excel for quick calculations and Power BI for reporting and dashboards. Knowing both is a strong combination on a CV.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 Microsoft has a great official comparison here: &lt;a href="https://learn.microsoft.com/en-us/power-bi/connect-data/service-excel-workbook-files" rel="noopener noreferrer"&gt;Power BI and Excel — better together&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Is Power BI worth learning in 2026?
&lt;/h2&gt;

&lt;p&gt;Several signals point to yes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Power BI is consistently ranked among the &lt;strong&gt;top BI tools in the world&lt;/strong&gt; by &lt;a href="https://powerbi.microsoft.com/en-us/blog/microsoft-named-a-leader-in-the-2024-gartner-magic-quadrant-for-analytics-and-bi-platforms/" rel="noopener noreferrer"&gt;Gartner's Magic Quadrant for Analytics and BI Platforms&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;It is one of the &lt;strong&gt;most in-demand data skills&lt;/strong&gt; on job boards across Africa, Europe, and North America&lt;/li&gt;
&lt;li&gt;Microsoft has invested heavily in adding &lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/power-bi/create-reports/copilot-introduction" rel="noopener noreferrer"&gt;AI and Copilot features&lt;/a&gt;&lt;/strong&gt; to Power BI, making it even more relevant going forward&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/credentials/certifications/power-bi-data-analyst-associate/" rel="noopener noreferrer"&gt;PL-300 certification&lt;/a&gt;&lt;/strong&gt; (Microsoft Power BI Data Analyst) is widely recognised and can significantly boost your employability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you're a business analyst, data analyst, product manager, HR professional, or someone pivoting into data — Power BI is one of the highest-ROI skills you can pick up right now.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you'll need to follow this series
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ A Windows PC (Power BI Desktop is Windows-only; Mac users can use a &lt;a href="https://learn.microsoft.com/en-us/power-bi/fundamentals/desktop-get-the-desktop" rel="noopener noreferrer"&gt;virtual machine&lt;/a&gt; or the browser-based &lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;Power BI Service&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;✅ Power BI Desktop installed — we'll cover this step by step in &lt;strong&gt;Day 3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Curiosity and consistency — that's genuinely all it takes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No prior experience with data, coding, or analytics is required. We start from zero.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's coming up in this series
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Days&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Days 1–7&lt;/td&gt;
&lt;td&gt;Getting started — installation, interface tour, connecting data, first report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Days 8–14&lt;/td&gt;
&lt;td&gt;Power Query and data modeling — cleaning, combining, structuring data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Days 15–21&lt;/td&gt;
&lt;td&gt;DAX and visualisations — formulas, interactivity, and chart design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Days 22–28&lt;/td&gt;
&lt;td&gt;Advanced features — dashboards, security, scheduling, AI tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Days 29–30&lt;/td&gt;
&lt;td&gt;Real-world projects and career tips&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every article includes practical steps you can follow along. By the end of 30 days, you'll have a genuine skill and a portfolio piece to show for it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key takeaways from Day 1
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Power BI is Microsoft's BI platform for connecting data, building reports, and sharing insights&lt;/li&gt;
&lt;li&gt;✅ It's made up of three tools: &lt;a href="https://powerbi.microsoft.com/en-us/desktop/" rel="noopener noreferrer"&gt;Desktop&lt;/a&gt; (free), &lt;a href="https://app.powerbi.com/" rel="noopener noreferrer"&gt;Service&lt;/a&gt; (web), and &lt;a href="https://powerbi.microsoft.com/en-us/mobile/" rel="noopener noreferrer"&gt;Mobile&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;✅ The workflow is: &lt;strong&gt;connect → transform → visualise → share&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ It's different from Excel — they complement rather than replace each other&lt;/li&gt;
&lt;li&gt;✅ It's one of the most in-demand data skills on the job market right now&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Useful resources to bookmark
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📌 &lt;a href="https://learn.microsoft.com/en-us/power-bi/" rel="noopener noreferrer"&gt;Power BI official documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://powerbi.microsoft.com/en-us/desktop/" rel="noopener noreferrer"&gt;Download Power BI Desktop&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://community.fabric.microsoft.com/t5/Power-BI-forums/ct-p/powerbi" rel="noopener noreferrer"&gt;Power BI community forum&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://www.youtube.com/@GuyInACube" rel="noopener noreferrer"&gt;Guy in a Cube (YouTube)&lt;/a&gt; — one of the best free Power BI channels&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://www.sqlbi.com/" rel="noopener noreferrer"&gt;SQLBI (DAX resource)&lt;/a&gt; — for when you're ready to go deep on DAX&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://learn.microsoft.com/en-us/training/powerplatform/power-bi" rel="noopener noreferrer"&gt;Microsoft Learn — Power BI learning paths&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📌 &lt;a href="https://learn.microsoft.com/en-us/credentials/certifications/power-bi-data-analyst-associate/" rel="noopener noreferrer"&gt;PL-300 exam overview&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Up next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Day 2: Power BI Desktop vs Power BI Service vs Mobile — which one do you actually need?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We'll break down the three products in detail so you know exactly what you're working with before we install anything.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this helpful? Drop a ❤️ and follow the series — a new article posts every day for 30 days. Got a question? Leave it in the comments and I'll answer every one.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>powerplatform</category>
      <category>datavisualization</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Data Engineer vs. Data Scientist: What's the Difference? (2026 Guide for Beginners)</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Fri, 05 Jun 2026 10:16:55 +0000</pubDate>
      <link>https://dev.to/phylis/data-engineer-vs-data-scientist-whats-the-difference-2026-guide-for-beginners-46md</link>
      <guid>https://dev.to/phylis/data-engineer-vs-data-scientist-whats-the-difference-2026-guide-for-beginners-46md</guid>
      <description>&lt;p&gt;If you're exploring a career in data, you've probably seen both titles everywhere — job boards, LinkedIn, bootcamp brochures. They both work with data, often sit on the same team, and sometimes even share the same tech stack.&lt;/p&gt;

&lt;p&gt;So what's the actual difference?&lt;/p&gt;

&lt;p&gt;This guide breaks it down simply, so you can figure out which path fits your skills and interests.&lt;/p&gt;




&lt;h2&gt;
  
  
  The One-Line Version
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Data Engineer&lt;/strong&gt; → builds the systems that collect, store, and move data.&lt;br&gt;
&lt;strong&gt;Data Scientist&lt;/strong&gt; → analyzes data and builds models to find patterns and make predictions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of it like building a city vs. navigating it. Data engineers lay the roads and pipelines. Data scientists drive on them to find answers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Side-by-Side Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Data Engineer&lt;/th&gt;
&lt;th&gt;Data Scientist&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Focus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Infrastructure &amp;amp; pipelines&lt;/td&gt;
&lt;td&gt;Analysis &amp;amp; ML models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Skills&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SQL, Python, Spark, Kafka&lt;/td&gt;
&lt;td&gt;Python/R, statistics, ML&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Day-to-Day&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ETL, data warehouses, orchestration&lt;/td&gt;
&lt;td&gt;Experiments, model training, dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reliable, scalable data systems&lt;/td&gt;
&lt;td&gt;Insights, predictions, reports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Key Tools&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;dbt, Snowflake, Airflow, Databricks&lt;/td&gt;
&lt;td&gt;Jupyter, scikit-learn, Tableau, PyTorch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Avg. US Salary (2026)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$130k – $165k&lt;/td&gt;
&lt;td&gt;$120k – $160k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Works Closely With&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data scientists, DevOps, Analysts&lt;/td&gt;
&lt;td&gt;Data engineers, business stakeholders&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What Does a Data Engineer Actually Do?
&lt;/h2&gt;

&lt;p&gt;A data engineer's job is to make sure data is &lt;strong&gt;available, clean, and accessible&lt;/strong&gt; for everyone who needs it — analysts, data scientists, and business teams.&lt;/p&gt;

&lt;p&gt;Their typical day includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Designing and building &lt;strong&gt;ETL/ELT pipelines&lt;/strong&gt; (Extract, Transform, Load, or Extract, Load, Transform)&lt;/li&gt;
&lt;li&gt;Managing &lt;strong&gt;data warehouses&lt;/strong&gt; like Snowflake, BigQuery, or Redshift&lt;/li&gt;
&lt;li&gt;Orchestrating workflows with tools like &lt;strong&gt;Apache Airflow&lt;/strong&gt; or Prefect&lt;/li&gt;
&lt;li&gt;Ensuring &lt;strong&gt;data quality, reliability, and freshness&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Optimizing queries and storage for performance and cost&lt;/li&gt;
&lt;/ul&gt;
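&lt;p&gt;To make the pipeline idea concrete, here is a minimal ETL sketch using only the Python standard library. It is a toy under stated assumptions: the &lt;code&gt;RAW_ORDERS&lt;/code&gt; text, the &lt;code&gt;orders&lt;/code&gt; table, and the quality rules are hypothetical, and an in-memory SQLite database stands in for a real warehouse such as Snowflake or BigQuery.&lt;/p&gt;

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from an API, a file, or a source database.
RAW_ORDERS = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,,12.50
4,carol,not_a_number
"""

def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise names and drop rows that fail basic quality checks."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        if not customer:
            continue  # skip rows with a missing customer
        cleaned.append((int(row["order_id"]), customer, amount))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_ORDERS)), conn)
loaded = conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

&lt;p&gt;Only the two valid orders reach the table. In an ELT design the order flips: the raw rows would be loaded first and cleaned inside the warehouse, typically with SQL.&lt;/p&gt;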

&lt;p&gt;In 2026, data engineers are also increasingly expected to support AI/ML workloads — building feature stores, managing vector databases, and deploying real-time streaming pipelines with tools like Apache Flink or Kafka Streams.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Does a Data Scientist Actually Do?
&lt;/h2&gt;

&lt;p&gt;A data scientist turns raw data into &lt;strong&gt;actionable insights&lt;/strong&gt;. They use statistical methods and machine learning to answer complex business questions.&lt;/p&gt;

&lt;p&gt;Their typical day includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exploratory data analysis (EDA) to uncover patterns&lt;/li&gt;
&lt;li&gt;Building and evaluating &lt;strong&gt;machine learning models&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Running &lt;strong&gt;A/B tests&lt;/strong&gt; and statistical experiments&lt;/li&gt;
&lt;li&gt;Creating dashboards and data visualizations&lt;/li&gt;
&lt;li&gt;Translating findings into plain language for non-technical stakeholders&lt;/li&gt;
&lt;/ul&gt;
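&lt;p&gt;As one small illustration of the A/B testing item above, the sketch below runs a two-proportion z-test with nothing but the standard library. The visitor and conversion counts are made up, and the helper name &lt;code&gt;two_proportion_z_test&lt;/code&gt; is hypothetical; in practice a data scientist would more likely reach for a statistics library and would also check sample-size and independence assumptions.&lt;/p&gt;

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: variant A converts 200 of 2000 visitors, variant B 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

&lt;p&gt;A small p-value suggests the difference between the variants is unlikely to be chance alone, which is the kind of finding that then has to be translated into plain language for stakeholders.&lt;/p&gt;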

&lt;p&gt;In 2026, many data scientists are also working with &lt;strong&gt;LLMs and generative AI&lt;/strong&gt; — fine-tuning models, building RAG pipelines, and evaluating AI outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Skills Overlap
&lt;/h2&gt;

&lt;p&gt;Both roles share some common ground, but differ significantly in depth:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Data Engineer&lt;/th&gt;
&lt;th&gt;Data Scientist&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Core&lt;/td&gt;
&lt;td&gt;✅ Core&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Advanced&lt;/td&gt;
&lt;td&gt;✅ Intermediate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Statistics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic awareness&lt;/td&gt;
&lt;td&gt;✅ Advanced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Machine Learning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Helpful to know&lt;/td&gt;
&lt;td&gt;✅ Core skill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Modeling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Core&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud Platforms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Core&lt;/td&gt;
&lt;td&gt;Useful&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Visualization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The biggest takeaway: &lt;strong&gt;Python and SQL are table stakes for both roles.&lt;/strong&gt; Where they diverge is in statistical depth (scientists) vs. systems design (engineers).&lt;/p&gt;




&lt;h2&gt;
  
  
  Which Role Is Right for You?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choose Data Engineering if you…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enjoy building systems and infrastructure&lt;/li&gt;
&lt;li&gt;Have a background in software or backend development&lt;/li&gt;
&lt;li&gt;Like writing production-grade code with clear outputs&lt;/li&gt;
&lt;li&gt;Prefer reliability engineering over statistical experimentation&lt;/li&gt;
&lt;li&gt;Get satisfaction from things running smoothly at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Data Science if you…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Love statistics, math, and finding patterns in messy data&lt;/li&gt;
&lt;li&gt;Enjoy experimentation and hypothesis-driven work&lt;/li&gt;
&lt;li&gt;Want to work closely with business teams on strategy&lt;/li&gt;
&lt;li&gt;Are excited by machine learning and AI&lt;/li&gt;
&lt;li&gt;Like telling stories through data and visualization&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Can You Do Both?
&lt;/h2&gt;

&lt;p&gt;Yes — and the &lt;strong&gt;hybrid data professional&lt;/strong&gt; is one of the fastest-growing archetypes in 2026. Titles like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ML Engineer&lt;/strong&gt; (builds the systems that serve ML models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics Engineer&lt;/strong&gt; (sits between data engineering and analysis — think dbt-heavy work)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/Data Platform Engineer&lt;/strong&gt; (builds infrastructure specifically for AI workloads)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...all sit at the intersection of both roles.&lt;/p&gt;

&lt;p&gt;If you're just starting out, pick one lane and go deep first. Most practitioners naturally branch out after 2–3 years of hands-on experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Neither role is more important than the other — they're &lt;strong&gt;complementary&lt;/strong&gt;. One builds the foundation, the other extracts the value. Both are in high demand, well-compensated, and at the forefront of how modern companies operate.&lt;/p&gt;

&lt;p&gt;The best way to choose? Ask yourself: do you get more excited about &lt;strong&gt;building reliable systems&lt;/strong&gt; (engineer) or &lt;strong&gt;discovering patterns and building models&lt;/strong&gt; (scientist)?&lt;/p&gt;

&lt;p&gt;Either answer leads to a great career.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this helpful? Drop a 🦄 or leave a comment — I'm writing a whole series on navigating data careers in 2026.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>datascience</category>
      <category>dataengineering</category>
      <category>beginners</category>
      <category>career</category>
    </item>
    <item>
      <title>Understanding Underfitting and Overfitting: An Introduction</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Fri, 05 Jun 2026 06:37:07 +0000</pubDate>
      <link>https://dev.to/phylis/understanding-underfitting-and-overfitting-an-introduction-1agh</link>
      <guid>https://dev.to/phylis/understanding-underfitting-and-overfitting-an-introduction-1agh</guid>
      <description>&lt;p&gt;Have you ever trained a model that performed beautifully on your training data but fell apart the moment it saw new data? Or perhaps you built something so simple it couldn't even learn the training data properly? These are the classic traps of &lt;strong&gt;overfitting&lt;/strong&gt; and &lt;strong&gt;underfitting&lt;/strong&gt; — and every machine learning practitioner runs into them.&lt;/p&gt;

&lt;p&gt;In this article, we'll cover what they are, how to detect them, how to fix them, and where the bias-variance tradeoff ties it all together — with real-world examples and code throughout.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Model Fitting?
&lt;/h2&gt;

&lt;p&gt;Model fitting is the process of training a predictive model on a dataset to find the parameters that best capture the underlying patterns in the data.&lt;/p&gt;

&lt;p&gt;The goal is simple: the model should &lt;strong&gt;generalize well to unseen data&lt;/strong&gt; — not just memorize the training examples.&lt;/p&gt;

&lt;p&gt;There are three possible outcomes when fitting a model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Good fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Captures underlying patterns, generalizes well&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Underfitting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Too simple, misses patterns even in training data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overfitting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Too complex, memorizes noise, fails on new data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What is Underfitting?
&lt;/h2&gt;

&lt;p&gt;Underfitting occurs when a model is &lt;strong&gt;too simple&lt;/strong&gt; to capture the underlying patterns in the data. It performs poorly on both the training set and on new, unseen data.&lt;/p&gt;

&lt;p&gt;Think of it like this: imagine asking a child to predict house prices and they only use the rule &lt;em&gt;"all houses cost $100,000."&lt;/em&gt; That model ignores all relevant features (size, location, age) and will be wrong almost every time.&lt;/p&gt;
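&lt;p&gt;The analogy can be checked numerically. The sketch below compares the fixed "$100,000" rule with a simple least-squares line on a handful of made-up houses; the sizes and prices are invented for illustration, and only one feature (size) is used to keep it short.&lt;/p&gt;

```python
# Made-up toy data: house size in square metres and price in dollars.
sizes = [50, 80, 100, 120, 150]
prices = [110_000, 170_000, 205_000, 250_000, 310_000]

def mse(actual, predicted):
    """Mean squared error between two equal-length lists."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Underfit "model": the same guess for every house, ignoring all features.
constant_preds = [100_000 for _ in sizes]

# Slightly richer model: a least-squares line, price = slope * size + intercept.
mean_x = sum(sizes) / len(sizes)
mean_y = sum(prices) / len(prices)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / sum(
    (x - mean_x) ** 2 for x in sizes
)
intercept = mean_y - slope * mean_x
line_preds = [slope * x + intercept for x in sizes]

constant_mse = mse(prices, constant_preds)
line_mse = mse(prices, line_preds)
print(f"Constant rule MSE: {constant_mse:,.0f}")
print(f"Fitted line MSE:   {line_mse:,.0f}")
```

&lt;p&gt;Both errors are measured on the same data the models were built from, so the constant rule's large error is the signature of underfitting: it is wrong even on examples it has already seen.&lt;/p&gt;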

&lt;h3&gt;
  
  
  Why Does Underfitting Occur?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model is too simple&lt;/strong&gt;: A linear model trying to fit a curved, nonlinear relationship&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Too few features&lt;/strong&gt;: Important variables are left out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Too much regularization&lt;/strong&gt;: Penalizing complexity so heavily that the model can't learn anything meaningful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insufficient training&lt;/strong&gt;: The model hasn't been trained for enough iterations or epochs to converge&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-World Example
&lt;/h3&gt;

&lt;p&gt;Suppose you're predicting whether an email is spam. If you only use the feature &lt;em&gt;"email length"&lt;/em&gt; and ignore word content, sender, and links, your model will underfit — it simply doesn't have enough signal to make good predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detecting Underfitting
&lt;/h3&gt;

&lt;p&gt;A model that underfits will show &lt;strong&gt;high error on both training and validation data&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Generate non-linear data
&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Underfit model: linear model on non-linear data
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;test_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Train MSE: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;train_error&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# High
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test MSE:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;test_error&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Also high → underfitting
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to Fix Underfitting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Use a more complex model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PolynomialFeatures&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;

&lt;span class="c1"&gt;# Upgrade to polynomial regression
&lt;/span&gt;&lt;span class="n"&gt;poly_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PolynomialFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;degree&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;poly_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;poly_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;test_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;poly_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Train MSE: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;train_error&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Lower
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test MSE:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;test_error&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Also lower → better fit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Add more relevant features&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Before: only one feature
&lt;/span&gt;&lt;span class="n"&gt;df_underfit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email_length&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

&lt;span class="c1"&gt;# After: add meaningful features
&lt;/span&gt;&lt;span class="n"&gt;df_better&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email_length&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_links&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contains_free&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sender_known&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Reduce regularization strength&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Ridge&lt;/span&gt;

&lt;span class="c1"&gt;# Too much regularization → underfitting
&lt;/span&gt;&lt;span class="n"&gt;model_overreg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Reduced regularization → better balance
&lt;/span&gt;&lt;span class="n"&gt;model_balanced&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What is Overfitting?
&lt;/h2&gt;

&lt;p&gt;Overfitting occurs when a model learns the training data &lt;strong&gt;too well&lt;/strong&gt; — including its noise and random fluctuations — rather than the true underlying pattern. It performs great on training data but poorly on new data.&lt;/p&gt;

&lt;p&gt;Think of a student who memorizes every answer in a practice exam word-for-word, but can't answer anything when the wording changes slightly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Does Overfitting Occur?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model is too complex&lt;/strong&gt;: Too many parameters relative to the amount of training data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Too little training data&lt;/strong&gt;: The model memorizes rather than generalizes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noisy data&lt;/strong&gt;: Random patterns in the data get learned as if they're real&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training too long&lt;/strong&gt;: The model starts fitting noise over time&lt;/li&gt;
&lt;/ul&gt;
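
&lt;p&gt;The "noisy data" cause in the list above can be made concrete with a small experiment. The sketch below is an illustration under assumptions rather than a definitive benchmark: the data is synthetic, the helper name &lt;code&gt;train_test_gap&lt;/code&gt; is hypothetical, and &lt;code&gt;flip_y&lt;/code&gt; is the fraction of labels that &lt;code&gt;make_classification&lt;/code&gt; assigns at random.&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_test_gap(label_noise, seed=0):
    """Fit an unrestricted tree and return (train accuracy, test accuracy)."""
    # flip_y is the fraction of labels that make_classification assigns at random
    X, y = make_classification(
        n_samples=1000, n_features=20, n_informative=5,
        flip_y=label_noise, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # No depth limit: the tree is free to memorize every training label
    tree = DecisionTreeClassifier(max_depth=None, random_state=seed)
    tree.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train))
    test_acc = accuracy_score(y_test, tree.predict(X_test))
    return train_acc, test_acc


# Compare clean labels with heavily noisy labels
results = {noise: train_test_gap(noise) for noise in (0.0, 0.3)}
for noise, (train_acc, test_acc) in results.items():
    print(f"label noise {noise:.1f}: train={train_acc:.3f}  test={test_acc:.3f}  "
          f"gap={train_acc - test_acc:.3f}")
```

&lt;p&gt;The unrestricted tree still fits the noisy training labels perfectly, so the training score says nothing about the noise; only the held-out score reveals it. With noisier labels the gap between the two scores will typically widen.&lt;/p&gt;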

&lt;h3&gt;
  
  
  Real-World Example
&lt;/h3&gt;

&lt;p&gt;You're building a fraud detection model. If the model memorizes the specific transactions in your training set (exact amounts, timestamps, merchant IDs), it learns nothing that transfers to new data: it may flag legitimate transactions as fraud simply because they look unfamiliar, and it will miss new fraud patterns it was never explicitly trained on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detecting Overfitting
&lt;/h3&gt;

&lt;p&gt;An overfit model shows &lt;strong&gt;low training error but high validation error&lt;/strong&gt;: in accuracy terms, a near-perfect training score with a clearly lower score on held-out data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_classification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_features&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Overfit model: very deep decision tree
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# No limit = memorizes everything
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Train Accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Near 1.0
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test Accuracy:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Much lower
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Plotting learning curves&lt;/strong&gt; is one of the best visual tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;learning_curve&lt;/span&gt;

&lt;span class="n"&gt;train_sizes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;learning_curve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_sizes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_sizes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Training Accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_sizes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Validation Accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Training Set Size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Learning Curve — Detecting Overfitting&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# A large gap between the two lines = overfitting
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to Fix Overfitting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Use Cross-Validation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cross_val_score&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CV Scores: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mean: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ± &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Apply Regularization (L1 / L2)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Lasso&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="c1"&gt;# L1 (Lasso) — drives some feature weights to zero
&lt;/span&gt;&lt;span class="n"&gt;lasso&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Lasso&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# L2 (Ridge) — shrinks all weights, prevents large coefficients
&lt;/span&gt;&lt;span class="n"&gt;ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Logistic Regression with L2 regularization
&lt;/span&gt;&lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;l2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Lower C = more regularization
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
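
&lt;p&gt;To see the difference between the two penalties described in the comments above, it can help to fit both and count the coefficients that end up exactly zero. This is a minimal sketch under assumptions: the data is synthetic (20 features, only 5 of them informative) and the &lt;code&gt;alpha&lt;/code&gt; values are chosen for illustration, not tuned.&lt;/p&gt;

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption for illustration): 20 features, only 5 informative
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

lasso = Lasso(alpha=5.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=5.0).fit(X, y)   # L2 penalty

# Count coefficients that were set to exactly zero by each penalty
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print(f"Lasso: {n_zero_lasso} of {lasso.coef_.size} coefficients are exactly zero")
print(f"Ridge: {n_zero_ridge} of {ridge.coef_.size} coefficients are exactly zero")
```

&lt;p&gt;Lasso removes features from the model entirely, which makes it useful as a rough feature selector; Ridge keeps every feature but with smaller weights.&lt;/p&gt;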



&lt;p&gt;&lt;strong&gt;3. Limit Model Complexity&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Constrain tree depth instead of letting it grow freely
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;

&lt;span class="n"&gt;good_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_samples_leaf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;good_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Train: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;good_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;good_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Gap is now much smaller
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
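
&lt;p&gt;The values &lt;code&gt;max_depth=5&lt;/code&gt; and &lt;code&gt;min_samples_leaf=10&lt;/code&gt; above are reasonable starting points, but the right limits depend on the data. One common option is to let cross-validation choose them. The sketch below assumes the same kind of synthetic data used earlier; the candidate values in &lt;code&gt;param_grid&lt;/code&gt; are illustrative guesses, not recommendations.&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Candidate complexity limits (illustrative values); None means no depth limit
param_grid = {
    "max_depth": [2, 3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10, 20],
}

# Each combination is scored with 5-fold cross-validation
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid, cv=5, scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.4f}")
```

&lt;p&gt;If the search is used to pick the final model, keep a separate test set out of it so the reported score is not optimistic.&lt;/p&gt;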



&lt;p&gt;&lt;strong&gt;4. Data Augmentation (Image Example)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.preprocessing.image&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ImageDataGenerator&lt;/span&gt;

&lt;span class="n"&gt;datagen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ImageDataGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;rotation_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;width_shift_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;height_shift_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;horizontal_flip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;zoom_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Artificially increases training diversity, reducing overfitting
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Dropout (Neural Networks)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   &lt;span class="c1"&gt;# Drop 40% of neurons during training
&lt;/span&gt;    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Early Stopping&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;early_stop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;EarlyStopping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;patience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# Stop if val_loss doesn't improve for 5 epochs
&lt;/span&gt;    &lt;span class="n"&gt;restore_best_weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;early_stop&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Bias-Variance Tradeoff
&lt;/h2&gt;

&lt;p&gt;To truly understand underfitting and overfitting, you need to understand the &lt;strong&gt;bias-variance tradeoff&lt;/strong&gt; — one of the most fundamental concepts in machine learning.&lt;/p&gt;

&lt;p&gt;For squared-error loss, a model's expected prediction error on unseen data can be broken down as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Error = Bias² + Variance + Irreducible Noise
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;th&gt;Connection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bias&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Error from wrong assumptions; model misses patterns&lt;/td&gt;
&lt;td&gt;High bias → underfitting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Variance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sensitivity to fluctuations in training data&lt;/td&gt;
&lt;td&gt;High variance → overfitting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Irreducible noise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Noise inherent in the data; can't be reduced&lt;/td&gt;
&lt;td&gt;Always present&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
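
&lt;p&gt;One way to see the first two terms is to simulate them. The sketch below is a minimal illustration under stated assumptions: a known target &lt;code&gt;sin(2πx)&lt;/code&gt;, Gaussian noise with standard deviation 0.2, and a fixed grid of 30 inputs. It refits a polynomial on many fresh noisy samples and measures how far the average fit is from the truth (bias²) and how much the individual fits scatter (variance). The helper names &lt;code&gt;true_fn&lt;/code&gt; and &lt;code&gt;bias_variance&lt;/code&gt; and all of their parameters are made up for this example.&lt;/p&gt;

```python
import numpy as np
from numpy.polynomial import polynomial as P


def true_fn(x):
    """Noise-free target (an assumption of this sketch)."""
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_datasets=200, n_points=30, noise_sd=0.2, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n_points)           # fixed inputs; only the noise changes
    preds = np.empty((n_datasets, n_points))
    for i in range(n_datasets):
        y = true_fn(x) + rng.normal(0, noise_sd, n_points)  # fresh noisy sample
        coefs = P.polyfit(x, y, degree)
        preds[i] = P.polyval(x, coefs)
    mean_pred = preds.mean(axis=0)             # the "average model" over all samples
    bias_sq = np.mean((mean_pred - true_fn(x)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in results.items():
    print(f"degree={d}: bias^2={b:.4f}  variance={v:.4f}")
```

&lt;p&gt;Under these assumptions the degree-1 fit should show the largest squared bias and the degree-9 fit the largest variance, which is the pattern the table describes.&lt;/p&gt;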

&lt;h3&gt;
  
  
  The Tradeoff in Practice
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Simple model  →  High Bias, Low Variance  →  Underfitting
Complex model →  Low Bias, High Variance  →  Overfitting
Optimal model →  Balanced Bias &amp;amp; Variance →  Good generalization
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PolynomialFeatures&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;

&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

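&lt;span class="c1"&gt;# Shuffle before splitting: X is sorted, so X[70:] would only test extrapolation beyond x = 0.7
&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;permutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;train_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;train_idx&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;test_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;train_idx&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;test_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;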

&lt;span class="n"&gt;degrees&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;train_errors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;degrees&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PolynomialFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;train_errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="n"&gt;test_errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;degrees&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_errors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Training Error (Bias↓)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;degrees&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_errors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Test Error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model Complexity (Polynomial Degree)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mean Squared Error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bias-Variance Tradeoff&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# Sweet spot is where test error is lowest
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is to find the &lt;strong&gt;sweet spot&lt;/strong&gt; — a model complex enough to capture real patterns but not so complex it learns the noise.&lt;/p&gt;
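
&lt;p&gt;A common way to look for that sweet spot without touching the test set is k-fold cross-validation. The following is a minimal sketch, assuming synthetic sine data of the same form as the example above and scikit-learn's &lt;code&gt;KFold&lt;/code&gt; and &lt;code&gt;cross_val_score&lt;/code&gt;. The names &lt;code&gt;cv_mse&lt;/code&gt; and &lt;code&gt;best_degree&lt;/code&gt; are illustrative, and the candidate degrees are simply reused from that example.&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: one sine period plus Gaussian noise (assumed for illustration)
rng = np.random.default_rng(42)
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 100)

# Shuffled folds, because X is sorted
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for d in [1, 3, 5, 10, 20]:
    model = make_pipeline(PolynomialFeatures(d), LinearRegression())
    # scikit-learn reports negated MSE so that higher is always better
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[d] = -scores.mean()

best_degree = min(cv_mse, key=cv_mse.get)
for d, mse in cv_mse.items():
    print(f"degree={d}: CV MSE={mse:.4f}")
print("Lowest cross-validated error at degree", best_degree)
```

&lt;p&gt;Which degree wins depends on the noise level and the random seed, so treat the selected value as a starting point and not as a fixed answer.&lt;/p&gt;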




&lt;h2&gt;
  
  
  Quick Reference: Underfitting vs Overfitting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Underfitting&lt;/th&gt;
&lt;th&gt;Overfitting&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Also called&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High bias&lt;/td&gt;
&lt;td&gt;High variance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training error&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation error&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (close to training error)&lt;/td&gt;
&lt;td&gt;High (well above training error)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Too simple&lt;/td&gt;
&lt;td&gt;Too complex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fix&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;More complexity, more features&lt;/td&gt;
&lt;td&gt;Regularization, more data, dropout&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Getting model fitting right is at the heart of machine learning. The key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Underfitting&lt;/strong&gt; = model too simple → increase complexity or add features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overfitting&lt;/strong&gt; = model too complex → regularize, add data, or simplify&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bias-variance tradeoff&lt;/strong&gt; = the fundamental tension between the two&lt;/li&gt;
&lt;li&gt;Always evaluate on a &lt;strong&gt;held-out validation set&lt;/strong&gt; — training accuracy alone tells you nothing about generalization&lt;/li&gt;
&lt;/ul&gt;
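
&lt;p&gt;For the last point, one simple arrangement is a three-way split: a training set to fit the model, a validation set to tune it, and a test set that is used only once at the end. The sketch below assumes scikit-learn's &lt;code&gt;train_test_split&lt;/code&gt; and a placeholder dataset; the 60/20/20 proportions are just an example.&lt;/p&gt;

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 2 features each (illustrative only)
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# First carve off 20% as the final test set
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Then split the remaining 80% into 60% train / 20% validation (0.25 of 80 = 20)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # prints: 60 20 20
```

&lt;p&gt;Model selection and early stopping would then look only at the validation set, so the test score stays an honest estimate of generalization.&lt;/p&gt;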

&lt;p&gt;The sweet spot between underfitting and overfitting is where the most useful, reliable models live. With the detection techniques and fixes in this article, you have everything you need to find it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this helpful, drop a ❤️ and feel free to share! Questions or ideas? Leave a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>💡 Curious about Power Apps, Power Automate, Power BI, and Power Pages? I just published a beginner-friendly guide on why Microsoft Power Platform is reshaping the future of work. Check it out 👉</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Sat, 04 Oct 2025 13:05:43 +0000</pubDate>
      <link>https://dev.to/phylis/-45pd</link>
      <guid>https://dev.to/phylis/-45pd</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/phylis/why-everyone-is-talking-about-microsoft-power-platform-4fjj" class="crayons-story__hidden-navigation-link"&gt;Why Everyone Is Talking About Microsoft Power Platform&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/phylis" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671287%2F415e9ff5-6626-4176-a3a8-35873136b50f.jpg" alt="phylis profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/phylis" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Phylis Jepchumba, MSc
            &lt;/a&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/phylis/why-everyone-is-talking-about-microsoft-power-platform-4fjj" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Oct 4 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/phylis/why-everyone-is-talking-about-microsoft-power-platform-4fjj" id="article-link-2891771"&gt;
          Why Everyone Is Talking About Microsoft Power Platform
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/microsoft"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;microsoft&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/powerautomate"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;powerautomate&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/powerapps"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;powerapps&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/powerplatform"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;powerplatform&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>microsoft</category>
      <category>powerautomate</category>
      <category>powerapps</category>
      <category>powerplatform</category>
    </item>
    <item>
      <title>Why Everyone Is Talking About Microsoft Power Platform</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Sat, 04 Oct 2025 13:05:16 +0000</pubDate>
      <link>https://dev.to/phylis/why-everyone-is-talking-about-microsoft-power-platform-4fjj</link>
      <guid>https://dev.to/phylis/why-everyone-is-talking-about-microsoft-power-platform-4fjj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: The Buzz Around Digital Transformation
&lt;/h2&gt;

&lt;p&gt;The world of work is changing faster than ever. Businesses are under pressure to digitize processes, cut costs, and deliver results quickly. Yet, traditional software development—relying solely on professional developers and lengthy coding cycles—can’t always keep up. This gap has created a demand for low-code/no-code platforms, enabling anyone, not just IT experts, to build apps, automate workflows, and make data-driven decisions.&lt;/p&gt;

&lt;p&gt;Microsoft Power Platform has quickly become one of the most talked-about toolkits for organizations of all sizes, because it empowers both technical and non-technical users to create powerful business solutions with little to no coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is the Microsoft Power Platform?
&lt;/h2&gt;

&lt;p&gt;The Microsoft Power Platform is not just one tool—it’s an ecosystem of integrated applications and services designed to help people build apps, automate processes, analyze data, and create digital experiences. Below is a breakdown of its major components and their features, uses, and subcategories:&lt;/p&gt;

&lt;p&gt;A. &lt;a href="https://www.microsoft.com/en-us/power-platform/products/power-apps" rel="noopener noreferrer"&gt;Power Apps&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Power Apps is Microsoft’s low-code app development environment. It allows users to create applications for web and mobile devices with minimal coding. It is built to integrate data from Microsoft Dataverse, SharePoint, Dynamics 365, and hundreds of other data sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fam1jlnmhhw97g4nnluvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fam1jlnmhhw97g4nnluvc.png" alt="Power Apps" width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key Types of Power Apps:&lt;/p&gt;

&lt;p&gt;Canvas Apps&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Drag-and-drop interface, pixel-perfect design control, customizable user interface.&lt;/li&gt;
&lt;li&gt;Uses: Best for apps where design flexibility is key (e.g., employee feedback forms, inspection apps, event registration apps).&lt;/li&gt;
&lt;li&gt;Composed of: Controls (text boxes, buttons, galleries), data connectors, formulas (similar to Excel).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Model-Driven Apps&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Data-first approach, automatically generates responsive layouts based on Dataverse.&lt;/li&gt;
&lt;li&gt;Uses: Great for business process apps like case management, service requests, or CRM extensions.&lt;/li&gt;
&lt;li&gt;Composed of: Dataverse tables, business logic, pre-built components.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Portals (now the standalone Power Pages product; see section D)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Allows external-facing websites with secure login.&lt;/li&gt;
&lt;li&gt;Uses: Customer portals, partner onboarding, grant application systems.&lt;/li&gt;
&lt;li&gt;Composed of: Page templates, Dataverse integration, role-based access controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;B. &lt;a href="https://www.microsoft.com/en/power-platform/products/power-automate" rel="noopener noreferrer"&gt;Power Automate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Power Automate is the workflow automation tool in the platform. It helps automate repetitive tasks and connect systems together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fagu350ydzj1p89ztnjci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fagu350ydzj1p89ztnjci.png" alt="Power Automate" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key Types of Flows in Power Automate:&lt;/p&gt;

&lt;p&gt;Cloud Flows&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Automates processes between online apps and services.&lt;/li&gt;
&lt;li&gt;Uses: Sending automatic emails, creating notifications in Teams, syncing files across OneDrive and SharePoint.&lt;/li&gt;
&lt;li&gt;Composed of: Triggers (events that start the flow), actions (the steps the flow runs), and conditions (branching logic).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Desktop Flows (Robotic Process Automation – RPA)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Automates tasks on desktop applications, mimics human clicks/inputs.&lt;/li&gt;
&lt;li&gt;Uses: Automating legacy systems with no APIs, extracting data from PDFs or spreadsheets.&lt;/li&gt;
&lt;li&gt;Composed of: Recorded actions, scripts, connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Process Mining&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Analyzes existing business processes by capturing and visualizing real workflows.&lt;/li&gt;
&lt;li&gt;Uses: Identify bottlenecks, optimize operations, and discover where automation can save time.&lt;/li&gt;
&lt;li&gt;Composed of: Process maps, KPIs, dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;C. &lt;a href="https://www.microsoft.com/en-us/power-platform/products/power-bi" rel="noopener noreferrer"&gt;Power BI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Power BI is the data visualization and analytics tool of the Power Platform. It transforms raw data into interactive insights.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fla1tf8ruej4aa272079x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fla1tf8ruej4aa272079x.png" alt="Power BI" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Core Components of Power BI:&lt;/p&gt;

&lt;p&gt;Dashboards &amp;amp; Reports&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Real-time visualizations, customizable tiles, filters, and drill-downs.&lt;/li&gt;
&lt;li&gt;Uses: Executive dashboards, sales tracking, financial reporting.&lt;/li&gt;
&lt;li&gt;Composed of: Visual elements (charts, graphs, KPIs), data models, DAX formulas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dataflows &amp;amp; Datamarts&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Prepares and stores cleaned datasets for reporting.&lt;/li&gt;
&lt;li&gt;Uses: Centralizing data sources, enabling self-service analytics across teams.&lt;/li&gt;
&lt;li&gt;Composed of: ETL (Extract, Transform, Load) pipelines, storage in Dataverse or Azure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI Visuals and Analytics&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features: Built-in machine learning and predictive analytics.&lt;/li&gt;
&lt;li&gt;Uses: Forecasting sales trends, sentiment analysis, anomaly detection.&lt;/li&gt;
&lt;li&gt;Composed of: AI models, natural language queries (“Q&amp;amp;A”), and cognitive services integrations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;D. &lt;a href="https://www.microsoft.com/en-us/power-platform/products/power-pages" rel="noopener noreferrer"&gt;Power Pages&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Power Pages is Microsoft’s low-code web development platform, designed for creating secure business websites.&lt;/p&gt;

&lt;p&gt;Features: Easy-to-use templates, responsive design, enterprise-grade security.&lt;/p&gt;

&lt;p&gt;Uses: Customer self-service portals, partner collaboration platforms, vendor onboarding portals.&lt;/p&gt;

&lt;p&gt;Composed of: Page designer, Dataverse integration, authentication and security layers (Microsoft Entra ID, formerly Azure AD, and other identity providers).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters: The Key Benefits
&lt;/h2&gt;

&lt;p&gt;So why is the Power Platform making so much noise in the business world? Here are the main reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Accessibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anyone can build—whether it’s an HR officer creating a leave-tracking app, or a finance manager designing a budget approval flow. This democratizes innovation and reduces IT bottlenecks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional app development can take months; Power Platform allows prototypes and production apps to be built in days or even hours.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It connects smoothly with tools businesses already use—Outlook, Teams, SharePoint, Excel, and countless others. Data flows seamlessly across systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Cost-effectiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of paying for expensive custom development, organizations can empower staff to create their own solutions, lowering costs while maintaining control.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works in Real Life
&lt;/h2&gt;

&lt;p&gt;Practical examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Power Apps in HR: An HR team creates a leave request app where employees submit requests through their phones. Managers receive notifications and approve or decline with one click. Data is stored in SharePoint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Power Automate in Finance: Expense approvals are automated. When an employee uploads a receipt to SharePoint, Power Automate triggers a workflow that routes it for approval and sends notifications through Teams.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Power BI in Sales: A sales manager views real-time dashboards showing regional sales performance. Trends are visualized clearly, making it easier to adjust strategies instantly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Power Pages for Customer Service: An NGO builds a grant application portal where applicants apply online, and staff manage reviews from a single dashboard powered by Dataverse.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Power Platform vs. Traditional Development
&lt;/h2&gt;

&lt;p&gt;In the past, building business applications meant hiring developers, writing thousands of lines of code, and waiting months for a launch. Power Platform flips this script by enabling “citizen developers”—non-technical staff who know the business best—to build solutions themselves.&lt;/p&gt;

&lt;p&gt;Of course, IT departments still play a crucial role in ensuring security, governance, and scalability. But instead of being gatekeepers, they become enablers, helping business users innovate faster while maintaining oversight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges &amp;amp; Considerations
&lt;/h2&gt;

&lt;p&gt;No platform is perfect. While the Power Platform is powerful, organizations must consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Licensing costs: Depending on the number of users and apps, costs can add up. Careful planning is essential.&lt;/li&gt;
&lt;li&gt;Governance &amp;amp; security: Citizen development can create risks if apps are built without oversight. Microsoft provides strong tools for role-based access, data loss prevention, and audit logging, but IT must implement them.&lt;/li&gt;
&lt;li&gt;Training needs: While it’s low-code, there’s still a learning curve. Training ensures users build effective, scalable solutions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Future of Work with Power Platform
&lt;/h2&gt;

&lt;p&gt;The future of the Power Platform is tied closely to AI and automation. With the rise of Microsoft Copilot, users will increasingly be able to build apps, automate workflows, and analyze data through natural language prompts—making low-code even more accessible.&lt;/p&gt;

&lt;p&gt;Globally, low-code adoption is skyrocketing, and the Power Platform is positioned as one of the leaders. As businesses look for agility, cost savings, and empowerment, these tools will continue to dominate conversations around digital transformation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Why Everyone Is Talking About It
&lt;/h2&gt;

&lt;p&gt;Microsoft Power Platform is more than just a set of tools—it’s a shift in how organizations build, automate, and analyze. By combining Power Apps, Power Automate, Power BI, and Power Pages, it gives everyone the ability to innovate, from frontline staff to executives. Whether it’s simplifying HR processes, improving customer engagement, or unlocking data insights, the Power Platform is empowering people to create solutions faster, smarter, and more cost-effectively.&lt;/p&gt;

&lt;p&gt;Everyone is talking about it because it democratizes technology—putting digital transformation in the hands of all.&lt;/p&gt;

&lt;p&gt;✨ Thanks for reading! If you found this guide useful, feel free to drop a comment, share your thoughts, or let me know which Power Platform tool you’d like me to dive deeper into.&lt;/p&gt;

&lt;p&gt;📌 Stay tuned for my next article, where we’ll take a closer look at each component—Power Apps, Power Automate, Power BI, and Power Pages—exploring their features, use cases, and practical tips to help you get started.&lt;/p&gt;

</description>
      <category>microsoft</category>
      <category>powerautomate</category>
      <category>powerapps</category>
      <category>powerplatform</category>
    </item>
    <item>
      <title>🚀 How to Load Datasets Efficiently in Pandas: A Complete Guide 📊 Want to master data loading in Pandas? Whether you're working with CSV, Excel, JSON, SQL, or Parquet files, knowing how to efficiently read datasets is essential for data analytics.</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Tue, 18 Feb 2025 09:42:04 +0000</pubDate>
      <link>https://dev.to/phylis/how-to-load-datasets-efficiently-in-pandas-a-complete-guide-want-to-master-data-loading-in-4cp8</link>
      <guid>https://dev.to/phylis/how-to-load-datasets-efficiently-in-pandas-a-complete-guide-want-to-master-data-loading-in-4cp8</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/phylis" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671287%2F415e9ff5-6626-4176-a3a8-35873136b50f.jpg" alt="phylis"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/phylis/how-to-load-datasets-efficiently-in-pandas-a-complete-guide-2id9" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;How to Load Datasets Efficiently in Pandas: A Complete Guide&lt;/h2&gt;
      &lt;h3&gt;Phylis Jepchumba, MSc ・ Feb 18&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#datascience&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#pandas&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#machinelearning&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#bigdata&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>datascience</category>
      <category>pandas</category>
      <category>machinelearning</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>How to Load Datasets Efficiently in Pandas: A Complete Guide</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Tue, 18 Feb 2025 09:39:10 +0000</pubDate>
      <link>https://dev.to/phylis/how-to-load-datasets-efficiently-in-pandas-a-complete-guide-2id9</link>
      <guid>https://dev.to/phylis/how-to-load-datasets-efficiently-in-pandas-a-complete-guide-2id9</guid>
      <description>&lt;p&gt;&lt;em&gt;"Without data, you're just another person with an opinion."&lt;/em&gt; — &lt;strong&gt;W. Edwards Deming&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In today’s data-driven world, the ability to efficiently handle, analyze, and extract insights from large datasets is a key skill for data analysts, scientists, and engineers. The volume of data is growing exponentially, and making sense of it requires powerful tools that can handle structured and unstructured data seamlessly.&lt;/p&gt;

&lt;p&gt;Pandas is one of Python’s most powerful data analysis libraries. It simplifies working with structured data by providing robust tools for reading, manipulating, and analyzing datasets with minimal effort. Whether you're working with small datasets for exploratory analysis or massive datasets requiring performance optimization, Pandas ensures you can load and process data efficiently.&lt;/p&gt;

&lt;p&gt;Pandas also offers various functions to read datasets from multiple sources such as CSV, Excel, JSON, SQL, and Parquet files—each with unique advantages and performance considerations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What You'll Learn in This Guide:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to read different types of datasets into Pandas DataFrames.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end of this article, you will have a solid understanding of how to efficiently load datasets into Pandas, setting a strong foundation for your data analytics and machine learning projects.&lt;/p&gt;

&lt;p&gt;Let’s get started! 🚀&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Installing and Importing Pandas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before we start loading datasets, make sure you have Pandas installed in your Python environment. If you haven’t installed it yet, you can do so using &lt;strong&gt;pip&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, import Pandas in your script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pandas is now ready to help us load datasets efficiently!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Reading Different Types of Datasets in Pandas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pandas provides built-in functions to read various data formats and load them into a DataFrame—a structured, tabular representation of data with labeled rows and columns. Let’s explore how to read datasets from different sources into Pandas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1 Reading CSV Files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CSV (Comma-Separated Values) is one of the most common formats for structured data. It is widely used because it’s lightweight, easy to share, and readable by both humans and machines.&lt;/p&gt;

&lt;p&gt;To load a CSV file into Pandas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with the actual file path
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# Display the first 5 rows
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 Key Parameters for read_csv():&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;delimiter=';' – Use if your file is semicolon-separated instead of commas.&lt;/li&gt;
&lt;li&gt;nrows=100 – Read only the first 100 rows for quick inspection.&lt;/li&gt;
&lt;li&gt;usecols=['Column1', 'Column2'] – Load specific columns instead of the entire dataset.&lt;/li&gt;
&lt;li&gt;dtype={'id': 'int32', 'price': 'float32'} – Define column data types to optimize memory usage.&lt;/li&gt;
&lt;/ul&gt;
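
&lt;p&gt;As a rough sketch, the parameters above can be combined in a single call. The example assumes a small semicolon-separated dataset held in memory rather than a real file, and the column names (id, price, note) are made up for illustration:&lt;/p&gt;

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data standing in for a file on disk.
raw = io.StringIO(
    "id;price;note\n"
    "1;9.5;first\n"
    "2;12.0;second\n"
    "3;7.25;third\n"
)

df = pd.read_csv(
    raw,
    delimiter=";",                              # semicolon-separated input
    usecols=["id", "price"],                    # skip the 'note' column
    dtype={"id": "int32", "price": "float32"},  # smaller numeric types
    nrows=2,                                    # only the first 2 data rows
)
print(df.dtypes)
print(df)
```

&lt;p&gt;Calling df.info(memory_usage='deep') before and after setting dtype is one way to check whether the narrower types actually save memory on your own data.&lt;/p&gt;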

&lt;p&gt;📌 Handling Large CSV Files Efficiently&lt;/p&gt;

&lt;p&gt;For large files, reading everything at once can cause memory issues. A better approach is to load data in chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;large_data.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Load in chunks of 10,000 rows
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df_chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Process each chunk separately
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
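
&lt;p&gt;Printing each chunk's shape only confirms that chunking works. In practice you would usually keep a running result so the full file never has to sit in memory. A minimal sketch, assuming a hypothetical single-column dataset named amount and a deliberately tiny chunksize:&lt;/p&gt;

```python
import io

import pandas as pd

# Hypothetical data standing in for a large CSV file: the numbers 1 to 10.
raw = io.StringIO("amount\n" + "\n".join(str(i) for i in range(1, 11)))

total = 0
row_count = 0
# chunksize=4 yields DataFrames of at most 4 rows each.
for chunk in pd.read_csv(raw, chunksize=4):
    total += int(chunk["amount"].sum())  # accumulate a running sum
    row_count += len(chunk)              # and a running row count

print(total, row_count)
```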



&lt;p&gt;&lt;strong&gt;2.2 Reading Excel Files&lt;/strong&gt;&lt;/p&gt;

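&lt;p&gt;Excel files (XLS, XLSX) are commonly used for business and financial data. Pandas allows you to load Excel files using read_excel(). Reading .xlsx files relies on an optional engine such as openpyxl, which may need to be installed separately.&lt;br&gt;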
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_excel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.xlsx&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sheet1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 Key Parameters for read_excel():&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;sheet_name&lt;/strong&gt;=None – Load all sheets as a dictionary of DataFrames.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;usecols&lt;/strong&gt;="A:D" – Load only specific columns (e.g., columns A to D).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;skiprows&lt;/strong&gt;=5 – Skip the first 5 rows if they contain metadata instead of actual data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Tip:&lt;/strong&gt; Excel files are slower to read compared to CSVs. If possible, convert your files to CSV or Parquet for better performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.3 Reading JSON Files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;JSON (JavaScript Object Notation) is a structured format commonly used in web applications and APIs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 Handling Different JSON Structures:&lt;/p&gt;

&lt;ul&gt;
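&lt;li&gt;orient='records' – Use when the JSON is a list of objects, one per row (a list of dictionaries).&lt;/li&gt;
&lt;li&gt;orient='columns' – Use when the JSON is an object keyed by column name, with each value mapping row labels to cell values. This is the default for DataFrames.&lt;/li&gt;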
&lt;/ul&gt;
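
&lt;p&gt;To make the two layouts concrete, here is a small sketch that parses the same hypothetical data (the names and ages are invented) in both orientations. The JSON is wrapped in io.StringIO so the example runs without a file on disk:&lt;/p&gt;

```python
import io

import pandas as pd

# orient='records': a list of row objects.
records_json = '[{"name": "Ann", "age": 30}, {"name": "Ben", "age": 25}]'
df_records = pd.read_json(io.StringIO(records_json), orient="records")

# orient='columns': an object keyed by column name, then by row label.
columns_json = '{"name": {"0": "Ann", "1": "Ben"}, "age": {"0": 30, "1": 25}}'
df_columns = pd.read_json(io.StringIO(columns_json), orient="columns")

print(df_records)
print(df_columns)
```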

&lt;p&gt;&lt;strong&gt;2.4 Reading SQL Databases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pandas allows you to read data directly from SQL databases using read_sql_query().&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;database.db&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Connect to the database
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_sql_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM table_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 For Large Databases:&lt;/p&gt;

&lt;p&gt;Use chunksize to process data in smaller parts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df_iter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_sql_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM table_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df_iter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Process each chunk separately
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
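
&lt;p&gt;Another way to keep SQL loads small is to let the database do the filtering. The sketch below assumes a hypothetical orders table in an in-memory SQLite database. It selects only the needed columns and passes the filter value through params rather than building the query with string formatting:&lt;/p&gt;

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical 'orders' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 20.0), (3, "east", 5.5)],
)
conn.commit()

# Select only the needed columns and bind the filter value via params.
df = pd.read_sql_query(
    "SELECT id, amount FROM orders WHERE region = ?",
    conn,
    params=("east",),
)
conn.close()
print(df)
```

&lt;p&gt;The ? placeholder is SQLite's parameter style. Other database drivers may use a different placeholder syntax.&lt;/p&gt;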



&lt;p&gt;&lt;strong&gt;2.5 Reading Parquet Files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Parquet is a columnar storage format that is typically much faster to read than CSV for large datasets. Reading it in Pandas requires an engine such as pyarrow or fastparquet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.parquet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 Why Use Parquet Over CSV?&lt;/p&gt;

&lt;p&gt;✔ Faster read/write speeds.&lt;br&gt;
✔ Supports compression, reducing file size.&lt;br&gt;
✔ Better for big data workflows (e.g., Apache Spark, AWS Athena).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.6 Reading Text and TSV Files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For raw text files or Tab-Separated Values (TSV) files, use read_csv() with a custom delimiter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.txt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delimiter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Tab-separated values
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 For Space-Separated Data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.txt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delimiter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
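
&lt;p&gt;A single-space delimiter only works when fields are separated by exactly one space. If the columns are padded with a varying number of spaces, one option is a regular-expression separator. A minimal sketch, assuming made-up name and score columns:&lt;/p&gt;

```python
import io

import pandas as pd

# Hypothetical text with uneven runs of spaces between fields.
raw = io.StringIO(
    "name   score\n"
    "Ann    90\n"
    "Ben  85\n"
)

# A regular-expression separator treats any run of whitespace as one delimiter.
df = pd.read_csv(raw, sep=r"\s+")
print(df)
```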



&lt;p&gt;Efficient data loading is the foundation of data analytics and machine learning projects. Pandas provides powerful tools to read datasets from multiple sources, optimize performance, and handle large datasets efficiently.&lt;/p&gt;

&lt;p&gt;🚀 In our next article, we will explore how to handle missing values in Pandas! Stay tuned.&lt;/p&gt;

&lt;p&gt;👉 Have questions? Drop them in the comments below!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>pandas</category>
      <category>machinelearning</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Are you a data analyst or aspiring to be one? Here are 7 must-know Python libraries that will help you clean, analyze, visualize, and model data like a pro! From Pandas for data manipulation to Scikit-learn for machine learning</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Fri, 07 Feb 2025 08:45:11 +0000</pubDate>
      <link>https://dev.to/phylis/master-data-analytics-with-python-are-you-a-data-analyst-or-aspiring-to-be-one-here-are-7-31m6</link>
      <guid>https://dev.to/phylis/master-data-analytics-with-python-are-you-a-data-analyst-or-aspiring-to-be-one-here-are-7-31m6</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/phylis" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671287%2F415e9ff5-6626-4176-a3a8-35873136b50f.jpg" alt="phylis"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/phylis/top-7-python-libraries-every-data-analyst-should-know-in-2025-2bce" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Top 7 Python Libraries Every Data Analyst Should Know in 2025&lt;/h2&gt;
      &lt;h3&gt;Phylis Jepchumba, MSc ・ Feb 7&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#machinelearning&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#datascience&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#data&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>data</category>
    </item>
    <item>
      <title>Top 7 Python Libraries Every Data Analyst Should Know in 2025</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Fri, 07 Feb 2025 08:43:05 +0000</pubDate>
      <link>https://dev.to/phylis/top-7-python-libraries-every-data-analyst-should-know-in-2025-2bce</link>
      <guid>https://dev.to/phylis/top-7-python-libraries-every-data-analyst-should-know-in-2025-2bce</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Python has become the go-to language for data analytics due to its simplicity, flexibility, and powerful ecosystem of libraries. In 2025, data analysts need to be well-versed with the best tools to handle large datasets, perform statistical analysis, and create meaningful visualizations. This article explores the top 7 Python libraries that every data analyst should master for efficient and insightful data analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://pandas.pydata.org/docs/user_guide/index.html" rel="noopener noreferrer"&gt;Pandas&lt;/a&gt;: The Backbone of Data Manipulation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pandas is the most widely used library for data manipulation and analysis in Python. It provides powerful data structures, such as DataFrames and Series, which allow analysts to clean, transform, and explore data efficiently.&lt;/p&gt;

&lt;p&gt;Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles missing data seamlessly&lt;/li&gt;
&lt;li&gt;Powerful data filtering, grouping, and aggregation functions&lt;/li&gt;
&lt;li&gt;Supports various file formats (CSV, Excel, SQL, JSON)&lt;/li&gt;
&lt;li&gt;Integration with NumPy for high-performance data operations&lt;/li&gt;
&lt;/ul&gt;
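
&lt;p&gt;As a quick illustration of the first two features, the sketch below fills a missing value and then aggregates by group. The region and sales columns are invented for the example:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical sales data with one missing value.
df = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west"],
        "sales": [100.0, None, 50.0, 70.0],
    }
)

# Replace the missing value with 0, then total sales per region.
df["sales"] = df["sales"].fillna(0)
totals = df.groupby("region")["sales"].sum()
print(totals)
```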

&lt;p&gt;&lt;strong&gt;&lt;a href="https://numpy.org/" rel="noopener noreferrer"&gt;NumPy &lt;/a&gt;– The Foundation of Numerical Computing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;NumPy (Numerical Python) is a fundamental library that supports large, multi-dimensional arrays and mathematical functions for array-based operations.&lt;br&gt;
Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast numerical computations using vectorized operations&lt;/li&gt;
&lt;li&gt;Supports linear algebra, Fourier transforms, and random number generation&lt;/li&gt;
&lt;li&gt;Forms the base for many data science libraries, including Pandas and SciPy&lt;/li&gt;
&lt;/ul&gt;
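
&lt;p&gt;A small sketch of what "vectorized" means in practice: the arithmetic is applied to whole arrays at once instead of looping over elements in Python. The prices and quantities are made-up values:&lt;/p&gt;

```python
import numpy as np

# Hypothetical prices and quantities for four items.
prices = np.array([2.5, 4.0, 1.25, 10.0])
quantities = np.array([4, 1, 8, 2])

# Element-wise multiply, then sum: no explicit Python loop is needed.
revenue = prices * quantities
total = revenue.sum()
print(revenue, total)
```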

&lt;p&gt;&lt;strong&gt;&lt;a href="https://matplotlib.org/stable/users/index" rel="noopener noreferrer"&gt;Matplotlib&lt;/a&gt; – The Classic Visualization Library&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Matplotlib is a versatile library for creating static, animated, and interactive visualizations in Python. It gives analysts full control over chart customization.&lt;br&gt;
Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wide range of plot types (line, bar, scatter, histogram, etc.)&lt;/li&gt;
&lt;li&gt;Highly customizable plots with labels, titles, and legends&lt;/li&gt;
&lt;li&gt;Supports multiple file formats (PNG, PDF, SVG)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://seaborn.pydata.org/[](url)" rel="noopener noreferrer"&gt;Seaborn &lt;/a&gt;– Statistical Data Visualization Made Easy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Seaborn is built on top of Matplotlib and is specialized in statistical data visualization. It makes it easy to generate visually appealing and informative plots.&lt;br&gt;
Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Elegant default styles for beautiful charts&lt;/li&gt;
&lt;li&gt;Built-in support for categorical, distribution, and regression plots&lt;/li&gt;
&lt;li&gt;Works seamlessly with Pandas DataFrames&lt;/li&gt;
&lt;li&gt;Heatmaps and pair plots for exploratory data analysis (EDA)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://scipy.org/" rel="noopener noreferrer"&gt;SciPy &lt;/a&gt;– Advanced Statistical and Mathematical Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SciPy (Scientific Python) extends NumPy and provides powerful tools for scientific computing and advanced analytics. It is widely used for statistical modeling and optimization.&lt;br&gt;
Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Functions for linear algebra, optimization, signal processing, and interpolation&lt;/li&gt;
&lt;li&gt;Built-in statistical distributions for hypothesis testing&lt;/li&gt;
&lt;li&gt;Image processing and fast Fourier transforms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://scikit-learn.org/stable/" rel="noopener noreferrer"&gt;Scikit-learn&lt;/a&gt; – Machine Learning for Data Analysts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scikit-learn is the most popular Python library for machine learning and predictive analytics. While it's primarily used for ML, many data analysts use it for clustering, regression, and classification.&lt;br&gt;
Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wide range of ML algorithms (decision trees, random forests, SVMs, etc.)&lt;/li&gt;
&lt;li&gt;Simple and intuitive API for data preprocessing and model training&lt;/li&gt;
&lt;li&gt;Tools for dimensionality reduction, feature selection, and hyperparameter tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.statsmodels.org/stable/index.html" rel="noopener noreferrer"&gt;Statsmodels&lt;/a&gt; – In-depth Statistical Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Statsmodels is designed for performing statistical tests and estimating models. It is essential for analysts working with regression analysis and hypothesis testing.&lt;br&gt;
Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear and generalized linear models (OLS, logistic regression)&lt;/li&gt;
&lt;li&gt;Time series analysis (AR, ARMA, ARIMA models)&lt;/li&gt;
&lt;li&gt;Extensive hypothesis testing functions (t-tests, ANOVA, chi-square tests)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These seven Python libraries provide the essential tools every data analyst needs to process, visualize, and analyze data efficiently in 2025. Whether you’re working on business intelligence, research, or predictive analytics, mastering these libraries will help you make data-driven decisions with confidence.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this article. We will explore each library in-depth in the next articles! Stay tuned. 🚀&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>data</category>
    </item>
    <item>
      <title>Crafting Compelling Data Narratives: A Comprehensive Look at Chart Types in Power BI</title>
      <dc:creator>Phylis Jepchumba, MSc</dc:creator>
      <pubDate>Mon, 18 Mar 2024 14:24:47 +0000</pubDate>
      <link>https://dev.to/phylis/crafting-compelling-data-narrativesa-comprehensive-look-at-chart-types-in-power-bi-4740</link>
      <guid>https://dev.to/phylis/crafting-compelling-data-narrativesa-comprehensive-look-at-chart-types-in-power-bi-4740</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Did you know that plain rows and columns of data can be used to convey compelling stories, narratives, and insights? This article explains what data storytelling and data visualization are, and surveys the chart types available in Power BI. Let's explore how these tools can unlock the potential within your data, turning raw numbers into impactful narratives that drive understanding and action.&lt;/p&gt;

&lt;h4&gt;
  
  
  What is data storytelling?
&lt;/h4&gt;

&lt;p&gt;Data storytelling is the practice of using data visualizations, narratives, and insights to communicate a coherent and compelling story from data. It goes beyond simply presenting data points and statistics; instead, it involves crafting a narrative that contextualizes the data, highlights key findings, and guides the audience through a meaningful interpretation of the information.&lt;/p&gt;

&lt;h4&gt;
  
  
  What is data visualization?
&lt;/h4&gt;

&lt;p&gt;Data visualization is the graphical representation of data and information. It involves the use of visual elements such as charts, graphs, maps, and dashboards to communicate complex datasets in a clear and accessible manner.&lt;/p&gt;

&lt;h4&gt;
  
  
  The importance of data visualization in data analysis.
&lt;/h4&gt;

&lt;p&gt;Enhanced Understanding: Data visualization makes complex data more understandable, aiding comprehension and interpretation.&lt;/p&gt;

&lt;p&gt;Insight Generation: Visual representations facilitate the extraction of insights and actionable intelligence from data, guiding decision-making processes.&lt;/p&gt;

&lt;p&gt;Effective Communication: Visualizations serve as powerful communication tools, enabling stakeholders to grasp key findings quickly and intuitively.&lt;/p&gt;

&lt;p&gt;Improved Decision Making: Data visualization supports informed and data-driven decision-making, leading to better outcomes and strategic planning.&lt;/p&gt;

&lt;h4&gt;
  
  
  How Power BI facilitates effective data visualization.
&lt;/h4&gt;

&lt;p&gt;Power BI is a business analytics service developed by Microsoft that enables users to visualize and analyze data from various sources in order to derive actionable insights.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39h74fsfl7132lxgqytq.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39h74fsfl7132lxgqytq.PNG" alt="Power BI Interface" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It pairs a user-friendly interface with a wide range of visualization options. Its features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Drag-and-drop functionality that enables effortless creation and customization of charts, graphs, and dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactive capabilities, such as filtering, slicing, and drill-down, that let users explore the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Seamless integration with various data sources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Robust sharing and collaboration features that facilitate effective communication of insights across teams and organizations, fostering a culture of data-driven decision-making.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Types of Charts in Power BI
&lt;/h4&gt;

&lt;p&gt;Power BI offers a wide array of charts to effectively represent data and derive actionable insights. Let's explore the rich variety of chart types available within Power BI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn2s6dusfhumdj8rxi3h.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn2s6dusfhumdj8rxi3h.PNG" alt="Visualization Charts" width="525" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Bar Chart
&lt;/h5&gt;

&lt;p&gt;A bar chart is a type of graph that represents categorical data with rectangular bars. The length or height of each bar corresponds to the frequency or value of the category it represents.&lt;br&gt;
Bar charts are best for comparing values across categories or for showing changes in data over time.&lt;/p&gt;

&lt;p&gt;Types of Bar Charts&lt;/p&gt;

&lt;h5&gt;
  
  
  Vertical/Column Chart
&lt;/h5&gt;

&lt;p&gt;The bars are oriented vertically, with each bar representing a category along the x-axis and the height of the bar indicating the value associated with that category. They are effective for visualizing discrete data and highlighting differences between categories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrd6dl0b3mjhxcwn1tl2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrd6dl0b3mjhxcwn1tl2.png" alt="Vertical/Column Chart" width="521" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Column charts work best when the category labels are short, since the labels have to fit beneath the bars.&lt;/p&gt;

&lt;h5&gt;
  
  
  Horizontal/Bar Chart
&lt;/h5&gt;

&lt;p&gt;In a horizontal bar chart, the bars are oriented horizontally, with each bar representing a category along the y-axis and the length of the bar indicating the value associated with that category.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdyfdojg2s5xrk5zrzix.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdyfdojg2s5xrk5zrzix.PNG" alt="Horizontal/Bar Chart" width="796" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Horizontal bar charts work best when the category labels are long, since the labels can run alongside the bars.&lt;/p&gt;

&lt;p&gt;Clustered Column Chart&lt;/p&gt;

&lt;p&gt;A clustered column chart is a specific type of column chart in which multiple data series are displayed side by side within each category.&lt;/p&gt;

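&lt;p&gt;Clustered Bar Chart&lt;/p&gt;

&lt;p&gt;A clustered bar chart is the horizontal counterpart of the clustered column chart: multiple data series are displayed side by side within each category, with the bars running horizontally.&lt;/p&gt;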

&lt;h5&gt;
  
  
  Stacked Column chart
&lt;/h5&gt;

&lt;p&gt;A stacked column chart is a type of data visualization that displays multiple series of data as vertical bars. Each series is drawn as a segment stacked on top of the previous one, so the full height of each bar represents the total value for that category or group.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp13kme4yzxq4bcf3qug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp13kme4yzxq4bcf3qug.png" alt="Stacked Column Chart" width="603" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Pie Chart
&lt;/h5&gt;

&lt;p&gt;A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. Each slice represents a proportion of the whole, and the size of each slice is proportional to the quantity it represents. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjdxhm2s4hi1tg53v3rs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjdxhm2s4hi1tg53v3rs.png" alt="Pie Chart" width="505" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pie charts are typically used to show the composition or distribution of a categorical variable and to compare proportions.&lt;/p&gt;
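
&lt;p&gt;To make the idea of proportional slices concrete, here is a minimal Python sketch of the arithmetic behind a pie chart. The region names and sales figures are made up for illustration, and the helper name pie_slices is hypothetical; it is not part of Power BI.&lt;/p&gt;

```python
# Hypothetical sales figures per region (illustrative data, not from the article).
sales = {"North": 50, "South": 30, "East": 20}


def pie_slices(values):
    """Return each category's share of the total and its slice angle in degrees."""
    total = sum(values.values())
    return {
        name: {"share": value / total, "angle": 360 * value / total}
        for name, value in values.items()
    }


slices = pie_slices(sales)
for name, info in slices.items():
    print(f"{name}: {info['share']:.0%} of the pie, {info['angle']:.0f} degrees")
```

&lt;p&gt;With these numbers, North takes half of the circle (180 degrees), which is exactly the part-to-whole relationship a pie chart is meant to show.&lt;/p&gt;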

&lt;h5&gt;
  
  
  Doughnut Chart
&lt;/h5&gt;

&lt;p&gt;A doughnut chart is a variant of the pie chart that displays data in a ring shape with a hole in the center. Similar to a pie chart, a doughnut chart divides the circle into segments to represent different categories or subgroups of data. However, unlike a pie chart where the entire circle is filled, a doughnut chart leaves a blank space in the center, creating a "doughnut" shape.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F145a1wfdpynadsy0b0ww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F145a1wfdpynadsy0b0ww.png" alt="doughnut chart" width="505" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A doughnut chart is best suited for visualizing the relative proportions of different categories or subgroups within a dataset.&lt;/p&gt;

&lt;h5&gt;
  
  
  Line Chart
&lt;/h5&gt;

&lt;p&gt;A line chart is a type of data visualization that displays information as a series of data points (markers) connected by straight lines. These charts are particularly useful for showing trends or changes over time, as they allow viewers to see the progression of data points along an axis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst79g0s5ri1bsztcedjd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst79g0s5ri1bsztcedjd.png" alt="Line chart" width="645" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Funnel Chart
&lt;/h5&gt;

&lt;p&gt;A funnel chart is a type of data visualization that resembles a funnel, with progressively decreasing or increasing values represented by segments of varying sizes. Funnel charts are typically used to visualize stages in a process, such as a sales pipeline, marketing conversion funnel, or customer journey, where the number of items or the value decreases or increases as it moves through each stage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8a9b5ou5vmd4vtco3we4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8a9b5ou5vmd4vtco3we4.png" alt="Funnel Chart" width="521" height="366"&gt;&lt;/a&gt;&lt;/p&gt;
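
&lt;p&gt;The numbers a funnel chart communicates are usually conversion rates between stages. The Python sketch below shows one way to compute them, assuming a made-up sales pipeline; the stage names, counts, and the helper stage_conversion are hypothetical and only illustrate the calculation.&lt;/p&gt;

```python
# Hypothetical sales-pipeline counts (illustrative, not from the article).
stages = [("Leads", 1000), ("Qualified", 400), ("Proposals", 200), ("Won", 50)]


def stage_conversion(stages):
    """Return (stage name, share of previous stage, share of first stage)."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count / previous, count / first))
        previous = count
    return rows


funnel = stage_conversion(stages)
for name, vs_previous, vs_first in funnel:
    print(f"{name}: {vs_previous:.0%} of previous stage, {vs_first:.0%} of first stage")
```

&lt;p&gt;Reading the output stage by stage shows where the largest drop-off happens, which is the question a funnel chart is usually asked to answer.&lt;/p&gt;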

&lt;h5&gt;
  
  
  Area Chart
&lt;/h5&gt;

&lt;p&gt;An area chart is a type of data visualization that represents data points on a graph, with the area below the line filled in with color to emphasize the magnitude of change over time or other categories. It's similar to a line chart, but the space between the line and the horizontal axis is filled, creating a visual representation of the cumulative total or the volume of data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyexx6k7zsx9pb0uy0da.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyexx6k7zsx9pb0uy0da.png" alt="Area Chart" width="603" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Scatter Chart
&lt;/h5&gt;

&lt;p&gt;A scatter chart, also known as a scatter plot or scattergram, is a type of data visualization that displays individual data points as dots on a two-dimensional plane. Each dot represents the values of two variables, one plotted along the horizontal axis (X-axis) and the other plotted along the vertical axis (Y-axis). Scatter charts are commonly used to visualize the relationship or correlation between two variables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbicdrugy2n87au30eh1p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbicdrugy2n87au30eh1p.png" alt="Scatter Chart" width="715" height="295"&gt;&lt;/a&gt;&lt;/p&gt;
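
&lt;p&gt;A scatter chart shows a relationship visually; the Pearson correlation coefficient is one common way to put a number on it. The following Python sketch computes it by hand for two made-up series. The variable names and values are assumptions chosen for illustration, not data from this article.&lt;/p&gt;

```python
from math import sqrt

# Hypothetical paired observations (illustrative, not from the article).
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 41, 62, 79, 103]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(ad_spend, revenue)
print(f"Pearson r = {r:.3f}")
```

&lt;p&gt;A value close to 1 means the dots rise together along a roughly straight line, a value close to -1 means one variable falls as the other rises, and a value near 0 means no clear linear pattern.&lt;/p&gt;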

&lt;h5&gt;
  
  
  Gauge Chart
&lt;/h5&gt;

&lt;p&gt;A gauge chart, also known as a dial chart or speedometer chart, is a type of data visualization that resembles a speedometer or gauge. It's used to display a single value within a predefined range, typically representing progress towards a goal or a key performance indicator (KPI). Gauge charts provide a visual representation of how close the current value is to the target or threshold.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpf92nnhupp63kniric5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpf92nnhupp63kniric5v.png" alt="Gauge chart" width="500" height="300"&gt;&lt;/a&gt;&lt;/p&gt;
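
&lt;p&gt;The position of a gauge needle is just the current value expressed as a fraction of the gauge's range. Here is a minimal Python sketch of that calculation; the function name gauge_position and the KPI numbers are hypothetical and serve only as an illustration.&lt;/p&gt;

```python
def gauge_position(value, minimum, maximum):
    """Fraction of the gauge range covered by value, clamped to the range 0 to 1."""
    span = maximum - minimum
    fraction = (value - minimum) / span
    return min(1.0, max(0.0, fraction))


# Hypothetical KPI: 45,000 in sales on a gauge running from 0 to 60,000,
# with a target of 50,000 (illustrative numbers, not from the article).
needle = gauge_position(45_000, 0, 60_000)
target = gauge_position(50_000, 0, 60_000)
print(f"needle at {needle:.0%} of the range, target at {target:.0%}")
```

&lt;p&gt;Clamping keeps the needle on the dial when a value falls outside the predefined range.&lt;/p&gt;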

&lt;h4&gt;
  
  
  Best Practices for Data Visualization in Power BI
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understand Your Audience&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tailor your visualizations to the needs and preferences of your audience. Consider their level of expertise, the questions they need to answer, and the insights they are seeking.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplify and Clarify&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep your visualizations simple and easy to understand. Avoid cluttering your visuals with unnecessary details or decorations. Focus on conveying the most important information clearly and concisely.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose the Right Chart Type&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Select the most appropriate chart type for your data and the message you want to convey. Consider factors such as the type of data, the relationships you want to highlight, and the insights you want to communicate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Color Effectively&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use color strategically to draw attention to important information and highlight key trends or patterns. Avoid using too many colors or overly bright colors, as this can distract from the data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provide Context&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Include titles, labels, and annotations to provide context for your visualizations. Clearly label axes, provide explanations for any abbreviations or acronyms, and include relevant contextual information to help viewers understand the data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ensure Accessibility&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Make sure your visualizations are accessible to all users, including those with visual impairments or color vision deficiencies. Use high-contrast colors, provide alternative text for images, and avoid relying solely on color to convey information.&lt;/p&gt;
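
&lt;p&gt;"High contrast" can be checked numerically. The Python sketch below follows the contrast-ratio formula from the Web Content Accessibility Guidelines (WCAG) 2.x, which asks for a ratio of at least 4.5:1 for normal-size text. The function names are hypothetical, and applying this check to report colors is a suggestion rather than something Power BI does for you.&lt;/p&gt;

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color given as '#RRGGBB'."""
    channels = []
    for i in (1, 3, 5):
        c = int(hex_color[i:i + 2], 16) / 255
        # Linearize the gamma-encoded sRGB channel.
        if 0.03928 >= c:
            channels.append(c / 12.92)
        else:
            channels.append(((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    lum = sorted([relative_luminance(foreground), relative_luminance(background)])
    return (lum[1] + 0.05) / (lum[0] + 0.05)


# Black text on a white background gives the maximum possible ratio.
print(round(contrast_ratio("#000000", "#FFFFFF"), 1))
```

&lt;p&gt;Running the same function on a label color and its background gives a quick way to spot combinations that may be hard to read.&lt;/p&gt;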

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standardize Formatting&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Maintain consistency in formatting across your reports to create a cohesive and professional look. Use consistent fonts, colors, and styles for titles, labels, and other elements to make your visualizations easier to read and understand.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Interactivity Wisely&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Take advantage of Power BI's interactive features, such as tooltips, filters, and slicers, to enable users to explore the data in more detail. However, be mindful not to overwhelm users with too much interactivity or complexity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Iterate and Test&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Experiment with different visualizations and layouts to find the most effective way to present your data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tell a Story&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Arrange your visualizations in a logical order to tell a cohesive story and guide viewers through the data. Use narrative elements such as titles, annotations, and captions to provide context and guide interpretation.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>data</category>
      <category>womenintech</category>
    </item>
  </channel>
</rss>
