<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jin</title>
    <description>The latest articles on DEV Community by Jin (@luca1iu).</description>
    <link>https://dev.to/luca1iu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg</url>
      <title>DEV Community: Jin</title>
      <link>https://dev.to/luca1iu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/luca1iu"/>
    <language>en</language>
    <item>
      <title>Why I Left China as a Data Analyst</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Sun, 05 Jul 2026 05:00:01 +0000</pubDate>
      <link>https://dev.to/luca1iu/why-i-left-china-as-a-data-analyst-2f1g</link>
      <guid>https://dev.to/luca1iu/why-i-left-china-as-a-data-analyst-2f1g</guid>
      <description>&lt;p&gt;In 2021, I graduated with my Master’s degree in &lt;em&gt;Industrial Engineering&lt;/em&gt; in Germany and decided to move back to China. During the final year of my degree, I taught myself Python and SQL on DataCamp. I used those skills to pass a data case study and landed my first job at a small SaaS startup in Shanghai. A year later, I moved to an American company, RRD, also in Shanghai.&lt;/p&gt;

&lt;p&gt;I worked there from 2022 to 2024. During those two years, I noticed a few undeniable trends in the data and tech industry. Eventually, these trends made me realize I needed to leave. Here is why.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Two Separate Software Ecosystems
&lt;/h2&gt;

&lt;p&gt;At my job, I used Microsoft Teams and Power BI. However, many of my friends in domestic companies used local Chinese office suites and BI tools.&lt;/p&gt;

&lt;p&gt;China has built its own independent software ecosystem. It works perfectly fine for those used to it, but it is completely separate from the global market. Because my skills and habits were rooted in global tools like Power BI, my employment options in China were almost entirely limited to foreign companies. That instantly shrank my job market.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Great Tech Decoupling
&lt;/h2&gt;

&lt;p&gt;Between 2022 and 2024, the decoupling of global and domestic tech became obvious. Salesforce shut down its direct China operations, and Tableau made similar moves. Many companies were forced to adopt domestic ERP software.&lt;/p&gt;

&lt;p&gt;For a Data Analyst, the ERP system is your foundation. Domestic ERPs and SAP run on completely different logics. The same divide is happening with cloud infrastructure—global players like AWS, Azure, and GCP versus domestic Chinese clouds.&lt;/p&gt;

&lt;p&gt;I realized I was standing at a crossroads. I had to choose a path: adapt entirely to the Chinese software ecosystem, or stick with the international one. Jumping back and forth between the two costs a great deal of time and relearning.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Budget Constraints Over Value Creation
&lt;/h2&gt;

&lt;p&gt;Profit margins for many companies in China are tight. Even in multinational companies, the high-profit departments usually stay abroad, leaving the Chinese branches with strict cost constraints.&lt;/p&gt;

&lt;p&gt;For example, we did not have the budget to give everyone a Power BI Pro license. Because of this, a significant part of my job turned into finding cheap workarounds. I had to figure out how to set up local servers for Power BI or build wrappers for Tableau just to save money. Instead of spending my time analyzing data and creating real business value, I was wasting energy trying to bypass budget rules using cheap alternatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The AI Barrier
&lt;/h2&gt;

&lt;p&gt;When the AI boom started, the tools were immediately inaccessible in China. Using them requires extra effort: setting up VPNs, buying virtual foreign phone numbers, and navigating blocks.&lt;/p&gt;

&lt;p&gt;On top of that, a $20 monthly subscription for AI tools is expensive relative to local salaries. AI is developing at lightning speed. I didn't want my first step with every new technology to be researching how to secretly bypass regulations just to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Decision to Leave
&lt;/h2&gt;

&lt;p&gt;Ultimately, I decided to leave China. The choice was half for my career and half for my family.&lt;/p&gt;

&lt;p&gt;Today, I am back in Germany, working as a Data Analyst. Looking back, I am happy with my decision. I can focus my time on creating real value, and most importantly, I am staying seamlessly connected to the global tech frontier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Jin&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Jin, a Business Intelligence Developer with a passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Luca_DataTeam" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Connect with me on X&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>job</category>
      <category>career</category>
      <category>dataanalyst</category>
      <category>ai</category>
    </item>
    <item>
      <title>Where to Write Python in Azure - Building the Python ETL Pipeline</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Sun, 05 Jul 2026 05:00:00 +0000</pubDate>
      <link>https://dev.to/luca1iu/where-to-write-python-in-azure-building-the-python-etl-pipeline-2d73</link>
      <guid>https://dev.to/luca1iu/where-to-write-python-in-azure-building-the-python-etl-pipeline-2d73</guid>
      <description>&lt;p&gt;Many data analysts know how to read and process Excel files using Python and Pandas locally. But what happens when you move to the Azure cloud?&lt;/p&gt;

&lt;p&gt;When building a recent ETL pipeline, the target database was Azure SQL Database. Suddenly, running Python on my local machine was no longer an option because local scripts couldn't easily or securely connect to the cloud database via ODBC. I needed a place to write and execute Python directly in Azure, read Excel files, and schedule daily tasks.&lt;/p&gt;

&lt;p&gt;Here is the architecture I built, the services I tested, and the exact costs of my final solution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F25v0qb3k03xt6trexz86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F25v0qb3k03xt6trexz86.png" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Getting Business Data into the Cloud
&lt;/h3&gt;

&lt;p&gt;The data sources for this pipeline were monthly Excel files and mapping tables that business users manually updated.&lt;/p&gt;

&lt;p&gt;To bridge the gap between business operations and the cloud, I used &lt;strong&gt;Power Automate&lt;/strong&gt;. I set up a flow that automatically syncs the users' OneDrive folders to an Azure Storage Account every day. This allows business users to update mapping tables in a familiar environment (OneDrive), while seamlessly feeding the latest data into the data engineering pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: The Quest for the Right Compute
&lt;/h3&gt;

&lt;p&gt;Once the data was in the Azure Storage Account, I needed a compute service to process it and write the results to Azure SQL Database. I went through four Azure services in total: three that fell short and the one I kept.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Azure Synapse Analytics&lt;/strong&gt;&lt;br&gt;
Synapse is powerful, but it is expensive. According to Microsoft’s documentation, Synapse uses a Massively Parallel Processing (MPP) architecture. For medium-sized Excel data, this is massive overkill. Paying for distributed computation when you don't need it simply isn't cost-effective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Azure Machine Learning (Virtual Machine)&lt;/strong&gt;&lt;br&gt;
Next, I tried creating a VM in Azure ML. The developer experience was fantastic. By connecting via VS Code, I could easily read data from the Storage Account and write it to the SQL Database. However, it had one fatal flaw: scheduling. Setting up a simple daily automated run for a notebook in Azure ML is unnecessarily complicated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Azure Functions&lt;/strong&gt;&lt;br&gt;
Azure Functions are incredibly cheap. But as the data processing logic grew, I hit its limitations. Functions are great for lightweight, event-driven tasks, but they are not designed for managing complex ETL dependencies and heavy data transformations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Final Solution: Azure Databricks Serverless
&lt;/h3&gt;

&lt;p&gt;Ultimately, I moved to Azure Databricks. Initially, I used a standard hybrid workspace, but the idle costs of keeping VMs running (or waiting for them to spin up) were too high.&lt;/p&gt;

&lt;p&gt;Then, I switched to &lt;strong&gt;Databricks Serverless&lt;/strong&gt; (hosted in the Germany West Central region). This solved everything. I had an excellent environment to write Python, seamless connections to Azure Storage and SQL Database, and built-in, reliable scheduling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transparency: What Does It Actually Cost?
&lt;/h3&gt;

&lt;p&gt;One of the biggest concerns with Databricks is the cost. For this production pipeline, my Databricks service costs about &lt;strong&gt;€52 per month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the breakdown of my real Azure bill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Premium Interactive Serverless Compute DBU:&lt;/strong&gt; €42.24&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium Automated Serverless Compute DBU:&lt;/strong&gt; €8.27&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium Databricks Storage Unit DSU:&lt;/strong&gt; €0.11&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The largest chunk (€42.24) comes from &lt;em&gt;Interactive Compute&lt;/em&gt;—this is the cost generated when I am actively writing, testing, and debugging code.&lt;/p&gt;

&lt;p&gt;The actual production run (the &lt;em&gt;Automated Compute&lt;/em&gt;) only costs €8.27 per month. The pipeline is scheduled with a Quartz cron expression (&lt;code&gt;0 0 5 ? * MON-FRI&lt;/code&gt;) to run every weekday at 5:00 AM. Because it is serverless, I pay only for the seconds the compute is actually processing data, with no idle costs between runs or on weekends.&lt;/p&gt;
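
&lt;p&gt;As a quick sanity check on the arithmetic, the sketch below adds up the three line items listed above and estimates a per-run cost for the scheduled job. The figure of 22 runs per month is an assumption for a typical month of weekday runs, not a number taken from the bill.&lt;/p&gt;

```python
from decimal import Decimal

# Line items copied from the bill breakdown above (EUR per month).
line_items = {
    "interactive_serverless_dbu": Decimal("42.24"),
    "automated_serverless_dbu": Decimal("8.27"),
    "storage_dsu": Decimal("0.11"),
}

total = sum(line_items.values(), Decimal("0"))

# Assumption: about 22 weekday runs in a typical month (MON-FRI schedule).
ASSUMED_RUNS_PER_MONTH = 22
cost_per_run = (
    line_items["automated_serverless_dbu"] / ASSUMED_RUNS_PER_MONTH
).quantize(Decimal("0.01"))

print(f"Listed line items total: EUR {total}")
print(f"Approximate cost per scheduled run: EUR {cost_per_run}")
```

&lt;p&gt;With these inputs the listed items add up to €50.62, and the scheduled run works out to roughly €0.38 per execution under the 22-run assumption.&lt;/p&gt;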

&lt;h3&gt;
  
  
  The Takeaway
&lt;/h3&gt;

&lt;p&gt;When building a data pipeline in Azure, finding the right place to write Python isn't just about code execution. It is a balancing act between developer experience (like VS Code integration), operational ease (simple scheduling), and cost control. For medium data workloads, Databricks Serverless currently hits that sweet spot perfectly.&lt;/p&gt;





</description>
      <category>azure</category>
      <category>data</category>
      <category>python</category>
      <category>pipeline</category>
    </item>
    <item>
      <title>Stop Using Spark for Your Small Data - Why Azure Functions is the Right Tool for the Job</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Wed, 06 May 2026 09:22:57 +0000</pubDate>
      <link>https://dev.to/luca1iu/stop-using-spark-for-your-small-data-why-azure-functions-is-the-right-tool-for-the-job-4j66</link>
      <guid>https://dev.to/luca1iu/stop-using-spark-for-your-small-data-why-azure-functions-is-the-right-tool-for-the-job-4j66</guid>
      <description>&lt;p&gt;As a data analyst, my job is to get data from A to B, cleaned and ready for use. A common workflow for my team involves users uploading Excel files to a &lt;a href="https://www.microsoft.com/de-de/microsoft-365/onedrive/online-cloud-storage?market=de" rel="noopener noreferrer"&gt;OneDrive&lt;/a&gt; folder. A &lt;a href="//microsoft.com/de-de/power-platform/products/power-automate"&gt;Power Automate&lt;/a&gt; flow then syncs these files daily to a container in our &lt;a href="https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview" rel="noopener noreferrer"&gt;Azure Storage Account&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From there, my responsibility begins:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the new Excel file from Blob Storage using Python.&lt;/li&gt;
&lt;li&gt;Process the data (clean, transform, apply business logic).&lt;/li&gt;
&lt;li&gt;Write the final data to an Azure SQL Database.&lt;/li&gt;
&lt;/ol&gt;
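
&lt;p&gt;The three steps above can be sketched as a small pipeline function. This is only an illustration of the shape, not the code used in the post: &lt;code&gt;clean_rows&lt;/code&gt; and &lt;code&gt;run_pipeline&lt;/code&gt; are hypothetical names, and the reader and writer are passed in as plain callables so the sketch runs without any Azure resources.&lt;/p&gt;

```python
def clean_rows(rows):
    """Step 2: trim whitespace from string cells and drop fully empty rows."""
    cleaned = []
    for row in rows:
        trimmed = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Skip rows where every cell is empty or missing.
        if any(value not in ("", None) for value in trimmed.values()):
            cleaned.append(trimmed)
    return cleaned


def run_pipeline(read_rows, write_rows):
    """Run read, process, write; returns the number of rows written."""
    rows = read_rows()          # Step 1: e.g. parse the Excel file from Blob Storage
    cleaned = clean_rows(rows)  # Step 2: apply cleaning and business logic
    write_rows(cleaned)         # Step 3: e.g. insert into Azure SQL Database
    return len(cleaned)


# Stand-in reader and writer (hypothetical sample data, not from the post).
sample = [{"supplier": " Acme ", "amount": 10}, {"supplier": "", "amount": None}]
written = []
count = run_pipeline(lambda: sample, written.extend)
print(count, written)
```

&lt;p&gt;Keeping the read and write steps behind callables means the same processing logic can be tested locally and then wired to a timer or blob trigger later.&lt;/p&gt;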

&lt;p&gt;I needed this to run on two triggers: a &lt;strong&gt;time schedule&lt;/strong&gt; (e.g., every morning at 7 AM) and an &lt;strong&gt;event-driven&lt;/strong&gt; trigger (i.e., as soon as a new file lands in the container).&lt;/p&gt;

&lt;p&gt;My first thought was to use the "big data" tools I'd heard of: &lt;a href="https://azure.microsoft.com/de-de/products/databricks" rel="noopener noreferrer"&gt;&lt;strong&gt;Azure Databricks&lt;/strong&gt;&lt;/a&gt; or &lt;a href="https://azure.microsoft.com/de-de/products/synapse-analytics" rel="noopener noreferrer"&gt;&lt;strong&gt;Azure Synapse Analytics&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  The "Big Tool" Trap
&lt;/h1&gt;

&lt;p&gt;On the surface, Databricks and Synapse are perfect.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They let me write Python in a &lt;strong&gt;Notebook&lt;/strong&gt;, which I'm very comfortable with.&lt;/li&gt;
&lt;li&gt;They have easy-to-use &lt;strong&gt;trigger&lt;/strong&gt; and &lt;strong&gt;monitoring&lt;/strong&gt; tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I set up a proof-of-concept, and it worked. But I quickly realized a problem. My Excel files are 10MB, not 10TB.&lt;/p&gt;

&lt;p&gt;Using a full Spark cluster (which is what both Databricks and Synapse Notebooks run on) was like &lt;strong&gt;using a sledgehammer to crack a nut&lt;/strong&gt;. I was paying for a powerful, multi-node cluster (which took 5-10 minutes to "cold start") just to run a Python script that finished in 30 seconds. The cost was going to be far too high for such a simple task.&lt;/p&gt;

&lt;h1&gt;
  
  
  The "Right Tool": Azure Functions
&lt;/h1&gt;

&lt;p&gt;After some research, I found the perfect tool for small-to-medium data tasks: &lt;strong&gt;Azure Functions&lt;/strong&gt;.&lt;br&gt;
Azure Functions, when used on a "Consumption Plan," is a true "serverless" service. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's cheap:&lt;/strong&gt; You get a generous free grant every month, and after that, you pay &lt;em&gt;only&lt;/em&gt; for the seconds your code is actually running. For my task, the cost is practically $0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's fast:&lt;/strong&gt; It starts in seconds (or less), not minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's perfect for triggers:&lt;/strong&gt; It has built-in triggers for exactly my needs (Timer and Blob Storage).&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  The (Small) Learning Curve
&lt;/h1&gt;

&lt;p&gt;The one trade-off is that it's &lt;em&gt;slightly&lt;/em&gt; more complex than a notebook. You can't just write and run your code in a web browser. The modern, recommended workflow is to use &lt;strong&gt;Visual Studio Code (VS Code)&lt;/strong&gt; to develop your code locally and then "deploy" (push) it to the cloud.&lt;/p&gt;

&lt;p&gt;This "local development" workflow is a best practice. It means you have a copy of your code, can use source control (like Git), and can test everything on your machine before it goes live.&lt;/p&gt;
&lt;h1&gt;
  
  
  More Than Just Timers
&lt;/h1&gt;

&lt;p&gt;My needs were simple, but Azure Functions has triggers for almost anything. The most popular ones include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timer Trigger:&lt;/strong&gt; Runs on a schedule (e.g., &lt;code&gt;0 0 7 * * 1&lt;/code&gt; for 7 AM every Monday; the documented NCRONTAB format has six fields, starting with seconds).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blob Trigger:&lt;/strong&gt; Runs when a new file is uploaded to a storage container.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Trigger:&lt;/strong&gt; Runs when it receives a web request (creating a simple API).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queue Trigger:&lt;/strong&gt; Runs when a new message is added to a storage queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can see the full list on the official &lt;a href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings" rel="noopener noreferrer"&gt;Microsoft Azure Functions Triggers and Bindings documentation&lt;/a&gt;.&lt;/p&gt;
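
&lt;p&gt;Timer schedules in Azure Functions are documented as six-field NCRONTAB expressions, with seconds as the first field. The helper below is a hypothetical illustration that simply names the six fields of such an expression; it checks only the field count and does not validate ranges.&lt;/p&gt;

```python
# Field names in the order the NCRONTAB format documents them.
NCRONTAB_FIELDS = ("second", "minute", "hour", "day", "month", "day_of_week")


def split_ncrontab(expression):
    """Split a six-field NCRONTAB expression into named fields."""
    parts = expression.split()
    if len(parts) != len(NCRONTAB_FIELDS):
        raise ValueError(
            f"expected {len(NCRONTAB_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(NCRONTAB_FIELDS, parts))


# 7:00 AM every Monday, written with the leading seconds field.
monday_7am = split_ncrontab("0 0 7 * * 1")
print(monday_7am)
```

&lt;p&gt;Reading the schedule field by field makes it easier to spot a missing seconds field before deploying.&lt;/p&gt;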
&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Databricks and Synapse are amazing, powerful tools, but they are not the answer for everything. For our team's daily Excel processing, using them was costing us time and money.&lt;/p&gt;

&lt;p&gt;By investing a little time to learn the VS Code + Azure Functions workflow, we built a solution that is faster, more efficient, and costs a fraction of the price. &lt;strong&gt;Don't pay for a Spark cluster when all you need is a 30-second Python script.&lt;/strong&gt;&lt;/p&gt;



</description>
      <category>azure</category>
      <category>dataanalyst</category>
      <category>functions</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Data Analyst: Does Your Work Actually Matter?</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Wed, 06 May 2026 09:22:37 +0000</pubDate>
      <link>https://dev.to/luca1iu/data-analyst-does-your-work-actually-matter-3in2</link>
      <guid>https://dev.to/luca1iu/data-analyst-does-your-work-actually-matter-3in2</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I recently saw a question on Reddit that stopped me in my tracks: "Do you feel your work in data analysis is valuable to the organization you work for?"&lt;/p&gt;

&lt;p&gt;It is the question that haunts every data analyst.&lt;/p&gt;

&lt;p&gt;We spend hours cleaning data and building complex dashboards. We send them out into the void. And then... silence. We wonder: Is anyone actually reading this? Does this dashboard change anything?&lt;/p&gt;

&lt;p&gt;If you are just answering ad-hoc requests, the answer is often "no."&lt;/p&gt;

&lt;h1&gt;
  
  
  The Trap of "Saving Time"
&lt;/h1&gt;

&lt;p&gt;Many analysts get stuck in the "automation trap." A colleague from another department asks you to automate their manual workflow. You do it. They are happy because they save two hours a week.&lt;/p&gt;

&lt;p&gt;You feel useful. But does the company see the value?&lt;/p&gt;

&lt;p&gt;Often, they don't. From a management perspective, that colleague’s salary is already paid. Unless that saved time is directly used to generate new revenue, your automation didn't change the company's bottom line. You just made someone's life easier.&lt;/p&gt;

&lt;p&gt;That is nice, but it isn't necessarily &lt;em&gt;valuable&lt;/em&gt; in a way leaders notice.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Shift: Stop Doing Projects, Start Building Products
&lt;/h1&gt;

&lt;p&gt;If you want your work to matter, you need to stop acting like an IT support desk and start acting like a Product Owner.&lt;/p&gt;

&lt;p&gt;What is the difference?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Data Project&lt;/strong&gt; has a start and an end date. It is usually a one-time request. The goal is "delivery." Once you hand over the dashboard or report, you are done. It quickly becomes outdated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Data Product&lt;/strong&gt; is a living tool. It doesn't just report the past; it helps shape future decisions. It evolves. Its goal is not "delivery," but measurable "business impact" (like saving money or reducing risk).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Real-World Example: The SpendCube
&lt;/h1&gt;

&lt;p&gt;Let’s look at a real example from my work with a purchasing department.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Project" Approach:&lt;/strong&gt; &lt;br&gt;
The department asks for a report on last month's spending. I pull the data, send an Excel file, and close the ticket. &lt;br&gt;
&lt;em&gt;Result:&lt;/em&gt; They look at what happened. Nothing changes. The value is low.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Product" Approach (The SpendCube Dashboard):&lt;/strong&gt; &lt;br&gt;
I build a live dashboard that doesn't just show &lt;em&gt;what&lt;/em&gt; was spent, but actively highlights &lt;em&gt;where&lt;/em&gt; we are overspending against budget in real-time. It identifies specific suppliers where we could negotiate better contracts tomorrow. &lt;br&gt;
&lt;em&gt;Result:&lt;/em&gt; The dashboard isn't just a report; it is a tool they use to actively save the company money. It contributes directly to the P&amp;amp;L (Profit and Loss).&lt;/p&gt;
&lt;h1&gt;
  
  
  How to Make Your Work Valuable
&lt;/h1&gt;

&lt;p&gt;If you are tired of wondering if your work matters, change your approach.&lt;/p&gt;

&lt;p&gt;Don't just accept tasks. When someone asks for a dashboard, ask them: "What decision will you make with this data?" If they can't answer, the dashboard probably isn't necessary.&lt;/p&gt;

&lt;p&gt;Move away from automating tasks and start building data products that solve real business problems. When your work directly helps the company save money or make money, you never have to ask if you are valuable. You already know the answer.&lt;/p&gt;



</description>
      <category>analytics</category>
      <category>career</category>
      <category>data</category>
      <category>dataanalyst</category>
    </item>
    <item>
      <title>How to Fix "command 'claude-vscode.editor.openLast' not found" in VS Code</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Wed, 06 May 2026 08:06:22 +0000</pubDate>
      <link>https://dev.to/luca1iu/how-to-fix-command-claude-vscodeeditoropenlast-not-found-in-vs-code-13e9</link>
      <guid>https://dev.to/luca1iu/how-to-fix-command-claude-vscodeeditoropenlast-not-found-in-vs-code-13e9</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When trying to use the Claude Code extension (version 2.1.129) in VS Code, you might run into this error, which prevents the extension from opening:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;command 'claude-vscode.editor.openLast' not found&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;The fix is simple: you need to downgrade the extension to a specific stable version (2.1.128).&lt;/p&gt;

&lt;p&gt;Here are the exact steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Uninstall your current Claude VS Code extension.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click the Gear (Settings) icon on the Claude extension page in VS Code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select "Install Another Version..." from the dropdown menu.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose version 2.1.128 from the list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reload VS Code.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! The error should be gone and Claude will work properly again.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vscode</category>
      <category>claude</category>
    </item>
    <item>
      <title>How to Store JSON and XML in SQL Databases</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Fri, 13 Mar 2026 15:37:17 +0000</pubDate>
      <link>https://dev.to/luca1iu/how-to-store-json-and-xml-in-sql-databases-491m</link>
      <guid>https://dev.to/luca1iu/how-to-store-json-and-xml-in-sql-databases-491m</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the era of big data and diverse data formats, the ability to store and query semi-structured data like JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) in SQL databases has become increasingly important. This article explores how to effectively store and manage JSON and XML data in SQL databases, along with the pros and cons of each approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding JSON and XML
&lt;/h2&gt;

&lt;h4&gt;
  
  
  JSON
&lt;/h4&gt;

&lt;p&gt;JSON is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It is often used in web applications for data exchange between clients and servers.&lt;/p&gt;

&lt;h4&gt;
  
  
  XML
&lt;/h4&gt;

&lt;p&gt;XML is a markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable. It is widely used for data representation and exchange, especially in web services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing JSON in SQL Databases
&lt;/h2&gt;

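&lt;p&gt;Many modern SQL databases support JSON. PostgreSQL and MySQL provide a dedicated JSON column type, while SQL Server has traditionally stored JSON as text (for example in an &lt;code&gt;NVARCHAR&lt;/code&gt; column) and queried it with built-in JSON functions.&lt;/p&gt;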

&lt;h3&gt;
  
  
  How to Store JSON
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Using JSON Data Type: Some databases allow you to define a column with a JSON data type.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;ProductID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ProductData&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
   &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Inserting JSON Data:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ProductID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ProductData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "Laptop", "price": 999.99}'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Querying JSON Data
&lt;/h3&gt;

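&lt;p&gt;You can use built-in operators and functions to query JSON data. The example below uses PostgreSQL's &lt;code&gt;-&amp;gt;&amp;gt;&lt;/code&gt; operator, which returns a field as text. MySQL's version of the operator expects a JSON path (&lt;code&gt;ProductData-&amp;gt;&amp;gt;'$.name'&lt;/code&gt;), and SQL Server uses &lt;code&gt;JSON_VALUE(ProductData, '$.name')&lt;/code&gt; instead.&lt;/p&gt;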
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ProductData&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ProductName&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ProductID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
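
&lt;p&gt;From application code, the same store-and-read round trip can be sketched as follows. This is a minimal illustration rather than a production pattern: it assumes Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in database (SQLite keeps the JSON document in a &lt;code&gt;TEXT&lt;/code&gt; column), and the helper name &lt;code&gt;store_and_read_product_name&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import json
import sqlite3


def store_and_read_product_name():
    """Store one JSON document and read a single field back (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite stands in for the real database, so the JSON is kept as TEXT.
    conn.execute(
        "CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductData TEXT)"
    )
    # json.dumps guarantees the stored string is valid JSON.
    payload = json.dumps({"name": "Laptop", "price": 999.99})
    # Parameter placeholders keep the JSON text out of the SQL statement itself.
    conn.execute(
        "INSERT INTO Products (ProductID, ProductData) VALUES (?, ?)",
        (1, payload),
    )
    row = conn.execute(
        "SELECT ProductData FROM Products WHERE ProductID = ?", (1,)
    ).fetchone()
    conn.close()
    # Here the field is extracted in Python rather than by a database JSON operator.
    return json.loads(row[0])["name"]
```

&lt;p&gt;Serializing with &lt;code&gt;json.dumps&lt;/code&gt; and passing the result as a query parameter avoids hand-built JSON strings and quoting mistakes.&lt;/p&gt;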

&lt;h2&gt;
  
  
  Storing XML in SQL Databases
&lt;/h2&gt;

&lt;p&gt;Several SQL databases, including SQL Server and PostgreSQL, also provide an XML data type, allowing you to store and query XML documents.&lt;/p&gt;
&lt;h3&gt;
  
  
  How to Store XML
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Using XML Data Type: Define a column with an XML data type.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;OrderID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;OrderDetails&lt;/span&gt; &lt;span class="n"&gt;xml&lt;/span&gt;
   &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Inserting XML Data:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OrderID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OrderDetails&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;order&amp;gt;&amp;lt;item&amp;gt;Book&amp;lt;/item&amp;gt;&amp;lt;quantity&amp;gt;2&amp;lt;/quantity&amp;gt;&amp;lt;/order&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Querying XML Data
&lt;/h3&gt;

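&lt;p&gt;You can use XPath and XQuery to extract data from XML columns. The example below uses SQL Server's &lt;code&gt;value()&lt;/code&gt; method, which takes an XQuery path and the SQL type to return. PostgreSQL offers the &lt;code&gt;xpath()&lt;/code&gt; function for the same purpose.&lt;/p&gt;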
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;OrderDetails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'(/order/item)[1]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'varchar(100)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ItemName&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;OrderID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
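
&lt;p&gt;On the application side, the order document can be built and read with Python's standard &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; module. The sketch below is only an illustration: the helper names &lt;code&gt;build_order_xml&lt;/code&gt; and &lt;code&gt;read_order&lt;/code&gt; are hypothetical, and the lookup by element name plays the role that the XQuery path plays inside the database.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def build_order_xml(item, quantity):
    """Build the order document as a string, ready to insert into an XML column."""
    order = ET.Element("order")
    ET.SubElement(order, "item").text = item
    ET.SubElement(order, "quantity").text = str(quantity)
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(order, encoding="unicode")


def read_order(order_xml):
    """Parse an order document and return (item, quantity)."""
    root = ET.fromstring(order_xml)
    # findtext looks up a direct child element by name
    return root.findtext("item"), int(root.findtext("quantity"))
```

&lt;p&gt;Building the document through the element API, rather than by string concatenation, lets the library handle escaping of special characters in the values.&lt;/p&gt;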

&lt;h2&gt;
  
  
  Pros and Cons of Storing JSON and XML
&lt;/h2&gt;
&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Flexibility: Both JSON and XML allow for flexible data structures, making it easy to store complex data.&lt;/li&gt;
&lt;li&gt;Interoperability: They are widely used formats, making it easier to integrate with other systems and APIs.&lt;/li&gt;
&lt;li&gt;Schema-less: You can store data without a predefined schema, which is useful for evolving data models.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Querying semi-structured data can be slower than querying structured data, especially for large datasets.&lt;/li&gt;
&lt;li&gt;Complexity: Managing and querying JSON and XML data can add complexity to your database operations.&lt;/li&gt;
&lt;li&gt;Storage Overhead: JSON and XML formats can consume more storage space compared to traditional relational data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Storing JSON and XML in SQL databases provides a powerful way to handle semi-structured data. By leveraging the native support for these formats in modern SQL databases, you can efficiently store, query, and manage complex data structures. Understanding the advantages and limitations of each format will help you make informed decisions about how to best utilize them in your applications.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Jin&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Jin, a Business Intelligence Developer with a passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Luca_DataTeam" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Connect with me on X&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>tutorial</category>
      <category>data</category>
    </item>
    <item>
      <title>Fixing Azure SQL Connection Errors in Azure Scheduled Python Job</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Fri, 27 Feb 2026 13:37:00 +0000</pubDate>
      <link>https://dev.to/luca1iu/fixing-azure-sql-connection-errors-in-azure-scheduled-python-job-3ldk</link>
      <guid>https://dev.to/luca1iu/fixing-azure-sql-connection-errors-in-azure-scheduled-python-job-3ldk</guid>
      <description>&lt;p&gt;As a Data Analyst, I recently faced a frustrating issue while automating a daily data processing task in Azure.&lt;/p&gt;

&lt;p&gt;The goal was simple: run a scheduled job every morning to process data and sync it to an Azure SQL Database. When I ran the code manually, it worked perfectly. But when the scheduled job (via Azure Functions or Synapse) triggered at 6:00 AM, it crashed immediately.&lt;/p&gt;

&lt;p&gt;Here is how to fix the "Database not available" error without increasing your Azure bill.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Problem
&lt;/h1&gt;

&lt;p&gt;The job failed consistently with &lt;strong&gt;Error 40613&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(pyodbc.Error) ('HY000', "[HY000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Database 'xxxxxxx' on server 'xxxxxxxxxxxxxxxxxx' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '{...}'. (40613) (SQLDriverConnect)") (Background on this error at: https://sqlalche.me/e/20/dbapi)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;I am using the &lt;strong&gt;Azure SQL Database Serverless&lt;/strong&gt; tier. To save costs, this tier features &lt;strong&gt;Auto-pause&lt;/strong&gt;. If no one uses the database for a set period (e.g., 1 hour), Azure puts it to sleep.&lt;/p&gt;

&lt;p&gt;When my scheduled job runs in the morning, the database is cold. It takes approximately &lt;strong&gt;60 to 90 seconds&lt;/strong&gt; for Azure to spin the compute back up. The first login attempt triggers the resume but is rejected with error 40613, so a job that tries to connect only once fails before the database is ready.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Expensive Fix (Don't do this)
&lt;/h1&gt;

&lt;p&gt;My first instinct was to disable Auto-pause.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Azure Portal&lt;/strong&gt; &amp;gt; &lt;strong&gt;SQL Database&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Compute + storage&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Uncheck &lt;strong&gt;Enable auto-pause&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; The error stopped, but my costs tripled. I was paying for compute 24/7 for a job that only runs for 10 minutes a day. This is not efficient.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Smart Fix: Intelligent Retry Logic
&lt;/h1&gt;

&lt;p&gt;Instead of keeping the server running all night, we should write code that is patient enough to wait for the server to wake up.&lt;/p&gt;

&lt;p&gt;I wrote a custom wrapper for the SQLAlchemy engine that handles the specific behavior of Azure Serverless cold starts.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Code
&lt;/h3&gt;

&lt;p&gt;Here is the robust connection function. It attempts to connect, and if it detects the database is sleeping, it waits and retries until the server is back online.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.exc&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OperationalError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InterfaceError&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect_sql_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delay_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Attempts to connect to the database. 
    If the database is in serverless pause state, it retries until it wakes up.

    max_retries: Default 10. Covers ~5 minutes of startup time.
    delay_seconds: Default 30s. Wait time between attempts.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Replace with your credentials or use Environment Variables (Recommended)
&lt;/span&gt;    &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-server.database.windows.net&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-database&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; 

    &lt;span class="c1"&gt;# LoginTimeout=30 gives the driver time to negotiate the handshake
&lt;/span&gt;    &lt;span class="n"&gt;connection_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mssql+pyodbc://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;?driver=ODBC+Driver+18+for+SQL+Server&amp;amp;LoginTimeout=30&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create the engine with connection pooling enabled
&lt;/span&gt;    &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;fast_executemany&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Optimized for bulk inserts
&lt;/span&gt;        &lt;span class="n"&gt;pool_pre_ping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Checks connection health before usage
&lt;/span&gt;        &lt;span class="n"&gt;pool_recycle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1800&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempting to connect to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Try to execute a simple query to wake the DB
&lt;/span&gt;            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Success: Database is connected and awake!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;

        &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OperationalError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InterfaceError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed. Database might be auto-paused.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error details: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Waiting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;delay_seconds&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds for wake-up...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delay_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# If we reach here, the database is genuinely down or credentials are wrong
&lt;/span&gt;    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Failed to wake up the database after multiple attempts.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
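
&lt;p&gt;The comment in the function above recommends environment variables for the credentials. One way to do that is sketched below, using only the standard library. The variable names (&lt;code&gt;AZURE_SQL_SERVER&lt;/code&gt; and so on) and the helper &lt;code&gt;build_connection_string&lt;/code&gt; are hypothetical. The sketch also URL-encodes the username and password, on the assumption that they may contain characters such as &lt;code&gt;@&lt;/code&gt; or &lt;code&gt;/&lt;/code&gt; that would otherwise break the connection URL.&lt;/p&gt;

```python
import os
from urllib.parse import quote_plus


def build_connection_string(env=None):
    """Assemble the SQLAlchemy URL from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    # quote_plus escapes characters such as "@" and "/" that would break the URL
    username = quote_plus(env["AZURE_SQL_USERNAME"])
    password = quote_plus(env["AZURE_SQL_PASSWORD"])
    server = env["AZURE_SQL_SERVER"]
    database = env["AZURE_SQL_DATABASE"]
    return (
        f"mssql+pyodbc://{username}:{password}@{server}/{database}"
        "?driver=ODBC+Driver+18+for+SQL+Server"
    )
```

&lt;p&gt;A missing variable raises a &lt;code&gt;KeyError&lt;/code&gt; immediately, which is usually easier to diagnose than a failed login later in the job.&lt;/p&gt;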

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Loop:&lt;/strong&gt; It opens a connection and runs &lt;code&gt;SELECT 1&lt;/code&gt;, a lightweight query. The connection attempt itself is what prompts Azure to start resuming a paused database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Trap:&lt;/strong&gt; If the attempt raises one of the SQLAlchemy errors named in the &lt;code&gt;except&lt;/code&gt; clause, the script logs the failure and waits &lt;code&gt;delay_seconds&lt;/code&gt; (30 seconds by default) using &lt;code&gt;time.sleep()&lt;/code&gt; before trying again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Success:&lt;/strong&gt; Once Azure allocates the compute (usually after attempt 2 or 3), the connection succeeds, and the function returns the active &lt;code&gt;engine&lt;/code&gt; object for your pipeline to use.&lt;/li&gt;
&lt;/ol&gt;
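
&lt;p&gt;The three steps above can also be sketched as a small generic helper that is independent of SQLAlchemy. This is an illustration under stated assumptions, not the function from this article: &lt;code&gt;retry_until_ready&lt;/code&gt;, &lt;code&gt;connect&lt;/code&gt;, &lt;code&gt;retry_on&lt;/code&gt; and the injectable &lt;code&gt;sleep&lt;/code&gt; parameter are hypothetical names. Unlike the loop above, it skips the wait after the final attempt and keeps the last error attached to the exception it raises.&lt;/p&gt;

```python
import time


def retry_until_ready(connect, max_retries=10, delay_seconds=30,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call connect() until it succeeds or max_retries attempts have failed."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except retry_on as error:
            last_error = error
            if attempt == max_retries:
                break  # no point waiting after the final attempt
            sleep(delay_seconds)
    # Chain the last failure so the original error stays visible in the traceback
    raise RuntimeError(f"Gave up after {max_retries} attempts") from last_error
```

&lt;p&gt;Passing &lt;code&gt;sleep&lt;/code&gt; as a parameter makes the helper easy to test without real waiting, and narrowing &lt;code&gt;retry_on&lt;/code&gt; to the specific connection errors avoids retrying on unrelated bugs.&lt;/p&gt;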
&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Don't change your infrastructure to fit your code; change your code to fit the infrastructure. By handling the "cold start" in Python, you keep the cost benefits of Serverless architecture while maintaining the reliability of a Production environment.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;



</description>
      <category>azure</category>
      <category>database</category>
      <category>automation</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Install Python Package in Azure Synapse for Apache Spark pools</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Tue, 06 Jan 2026 21:58:00 +0000</pubDate>
      <link>https://dev.to/luca1iu/how-to-install-python-package-in-azure-synapse-for-apache-spark-pools-4pjj</link>
      <guid>https://dev.to/luca1iu/how-to-install-python-package-in-azure-synapse-for-apache-spark-pools-4pjj</guid>
      <description>&lt;h2&gt;
  
  
  Efficiently Installing Python Packages in Azure Synapse Analytics
&lt;/h2&gt;

&lt;p&gt;When working in Azure Synapse notebooks, you can run the &lt;code&gt;%pip&lt;/code&gt; command (e.g., &lt;code&gt;%pip install pandas&lt;/code&gt;) in a code cell to install packages. However, this method is temporary: the package is installed only for the current notebook session and must be reinstalled every time a new session starts.&lt;/p&gt;

&lt;p&gt;This repetition can lead to significant delays in notebook execution and is inefficient for frequently run jobs.&lt;/p&gt;

&lt;p&gt;A more permanent and efficient solution is to install packages directly onto the Apache Spark pool. This approach ensures the libraries are pre-installed and automatically available in every session attached to that pool.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Install Packages at the Spark Pool Level
&lt;/h2&gt;

&lt;p&gt;This method involves uploading a requirements.txt file that specifies the packages and versions you need.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your Azure Synapse workspace in the Azure portal.&lt;/li&gt;
&lt;li&gt;Navigate to the "Manage" section on the left-hand side.&lt;/li&gt;
&lt;li&gt;Select "Apache Spark pools" under the "Analytics pools" section.&lt;/li&gt;
&lt;li&gt;Choose the Spark pool where you want to install the package.&lt;/li&gt;
&lt;li&gt;Hover your mouse over the three dots on the right side of the Spark pool and click "Packages".&lt;/li&gt;
&lt;li&gt;Upload a &lt;code&gt;requirements.txt&lt;/code&gt; file that lists the packages you want to install.&lt;/li&gt;
&lt;li&gt;Click Apply to save the changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futjmsqs39tv57h4az884.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futjmsqs39tv57h4az884.png" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Spark pool will update and automatically install the specified packages. This may take a few minutes. Once complete, all notebooks attached to this pool will have access to these libraries by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Generate a &lt;code&gt;requirements.txt&lt;/code&gt; File
&lt;/h2&gt;

&lt;p&gt;The requirements.txt file is a simple text file that lists the packages to be installed. You can easily generate this file from your local Python environment.&lt;/p&gt;

&lt;p&gt;Open your terminal or command prompt and run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip freeze &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This command captures every package installed in your current environment, with its exact version, and saves the list to a file named requirements.txt. Uploading this file pins the same package versions in your Synapse environment, which keeps local and pool runs consistent. Because &lt;code&gt;pip freeze&lt;/code&gt; lists everything in the environment, consider trimming the file down to the packages your notebooks actually import.&lt;/p&gt;
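
&lt;p&gt;If you only need a handful of packages, a shorter file can be generated from Python instead. The sketch below is one possible approach, assuming the standard &lt;code&gt;importlib.metadata&lt;/code&gt; module. The helper name &lt;code&gt;pinned_requirements&lt;/code&gt; and its &lt;code&gt;get_version&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

```python
from importlib import metadata


def pinned_requirements(package_names, get_version=metadata.version):
    """Return requirements.txt lines pinning each package to its installed version."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Leave a comment line so the gap is visible in the generated file
            lines.append(f"# {name} is not installed locally")
    return lines
```

&lt;p&gt;Writing the result with &lt;code&gt;"\n".join(lines)&lt;/code&gt; produces a file in the same &lt;code&gt;name==version&lt;/code&gt; format that &lt;code&gt;pip freeze&lt;/code&gt; emits.&lt;/p&gt;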



</description>
      <category>azure</category>
      <category>tutorial</category>
      <category>python</category>
      <category>data</category>
    </item>
    <item>
      <title>How to Calculate a Dynamic Truncated Mean in Power BI Using DAX</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Tue, 06 Jan 2026 21:57:00 +0000</pubDate>
      <link>https://dev.to/luca1iu/how-to-calculate-a-dynamic-truncated-mean-in-power-bi-using-dax-gij</link>
      <guid>https://dev.to/luca1iu/how-to-calculate-a-dynamic-truncated-mean-in-power-bi-using-dax-gij</guid>
      <description>&lt;h2&gt;
  
  
  Why You Need a Truncated Mean
&lt;/h2&gt;

&lt;p&gt;In data analysis, the standard AVERAGE function is a workhorse, but it has a significant weakness: it is highly susceptible to distortion from outliers. A single extreme value, whether high or low, can skew the entire result, misrepresenting the data's true central tendency.&lt;/p&gt;

&lt;p&gt;This is where the truncated mean becomes essential. It provides a more robust measure of average by excluding a specified percentage of the smallest and largest values from the calculation.&lt;/p&gt;
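
&lt;p&gt;To make the effect concrete, here is a small Python sketch of the classic truncated mean, which drops a fixed share of the sorted values from each end before averaging. The function name &lt;code&gt;truncated_mean&lt;/code&gt; and the sample numbers are made up for illustration.&lt;/p&gt;

```python
def truncated_mean(values, trim_fraction=0.10):
    """Average the values after dropping trim_fraction of them from each end."""
    ordered = sorted(values)
    # Number of values to drop from the bottom and from the top
    drop = int(len(ordered) * trim_fraction)
    kept = ordered[drop:len(ordered) - drop]
    if not kept:
        raise ValueError("trim_fraction removes every value")
    return sum(kept) / len(kept)


# Hypothetical sample: nine ordinary values and one extreme outlier
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 500]
plain_mean = sum(sample) / len(sample)       # 60.4, dominated by the outlier
robust_mean = truncated_mean(sample, 0.10)   # 11.75, outlier and lowest value dropped
```

&lt;p&gt;With ten values and a 10% trim, one value is removed from each end, so the single outlier no longer affects the result.&lt;/p&gt;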

&lt;p&gt;While modern Power BI models have a built-in TRIMMEAN function, this function is often unavailable when using a Live Connection to an older Analysis Services (SSAS) model. This article provides a robust, manual DAX pattern that replicates this functionality and remains fully dynamic, responding to all slicers and filters in your report.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DAX Solution for a Dynamic Truncated Mean
&lt;/h2&gt;

&lt;p&gt;This measure calculates a 20% truncated mean by removing the bottom 10% and top 10% of values before averaging the remaining 80%.&lt;/p&gt;

&lt;p&gt;You can paste this code directly into the "New Measure" formula bar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trimmed Mean (20%) = 
VAR TargetTable = 'FactTable'
VAR TargetColumn = 'FactTable'[MeasureColumn]
VAR LowerPercentile = 0.10 // Defines the bottom 10% to trim
VAR UpperPercentile = 0.90 // Defines the top 10% to trim (1.0 - 0.10)

// 1. Find the value at the 10th percentile
VAR MinThreshold =
    PERCENTILEX.INC(
        FILTER( 
            TargetTable, 
            NOT( ISBLANK( TargetColumn ) ) 
        ),
        TargetColumn,
        LowerPercentile
    )

// 2. Find the value at the 90th percentile
VAR MaxThreshold =
    PERCENTILEX.INC(
        FILTER( 
            TargetTable, 
            NOT( ISBLANK( TargetColumn ) ) 
        ),
        TargetColumn,
        UpperPercentile
    )

// 3. Calculate the average, including only values between the thresholds
RETURN
CALCULATE(
    AVERAGEX(
        FILTER(
            TargetTable,
            TargetColumn &amp;gt;= MinThreshold &amp;amp;&amp;amp;
            TargetColumn &amp;lt;= MaxThreshold
        ),
        TargetColumn
    )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Deconstructing the DAX Logic
&lt;/h2&gt;

&lt;p&gt;This formula works in three distinct steps, all of which execute within the current filter context (e.g., whatever slicers the user has selected).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define Key Variables&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TargetTable&lt;/code&gt; &amp;amp; &lt;code&gt;TargetColumn&lt;/code&gt;: We assign the table and column names to variables for clean, reusable code. You must change 'FactTable'[MeasureColumn] to match your data model.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LowerPercentile&lt;/code&gt; / &lt;code&gt;UpperPercentile&lt;/code&gt;: We define the boundaries. 0.10 and 0.90 mean we are trimming the bottom 10% and top 10%. To trim 5% from each end (a 10% total trim), you would use 0.05 and 0.95.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  2. Find the Percentile Thresholds
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MinThreshold&lt;/code&gt; &amp;amp; &lt;code&gt;MaxThreshold&lt;/code&gt;: These variables store the actual values that correspond to our percentile boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PERCENTILEX.INC&lt;/code&gt;: We use this "iterator" function because it allows us to first FILTER the table.&lt;/li&gt;
&lt;li&gt;`FILTER(..., NOT(ISBLANK(...))): This is a crucial step. We calculate the percentiles only for rows where our target column is not blank. This prevents BLANK() values from skewing the percentile calculation.&lt;/li&gt;
&lt;li&gt;The result is that &lt;code&gt;MinThreshold&lt;/code&gt; holds the value of the 10th percentile (e.g., 4.5) and &lt;code&gt;MaxThreshold&lt;/code&gt; holds the value of the 90th percentile (e.g., 88.2) for the currently visible data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3. Calculate the Final Average
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;RETURN CALCULATE(...)&lt;/code&gt;: The measure is evaluated in the current filter context, so the entire calculation respects the filters applied by any slicers or visuals in the report. Because &lt;code&gt;CALCULATE&lt;/code&gt; is called here without filter arguments, it adds no filters of its own.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;AVERAGEX(FILTER(...))&lt;/code&gt;: The core of the calculation. We use AVERAGEX to iterate over a table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;FILTER(...)&lt;/code&gt;: We filter our TargetTable a final time. This filter is the "trim." It keeps only the rows where the value in TargetColumn is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Greater than or equal to&lt;/strong&gt; our MinThreshold&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AND&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less than or equal to&lt;/strong&gt; our MaxThreshold&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;AVERAGEX(..., TargetColumn)&lt;/code&gt;: &lt;code&gt;AVERAGEX&lt;/code&gt; then calculates the simple average of &lt;code&gt;TargetColumn&lt;/code&gt; for only the rows that passed the filter.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
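&lt;p&gt;To sanity-check the measure outside Power BI, the same three steps can be sketched in plain Python. This is a minimal sketch, not a translation of the DAX engine: the function name &lt;code&gt;trimmed_mean&lt;/code&gt; and the sample values are made up for illustration, and it assumes that the "inclusive" method of &lt;code&gt;statistics.quantiles&lt;/code&gt; interpolates the same way as &lt;code&gt;PERCENTILEX.INC&lt;/code&gt;.&lt;/p&gt;

```python
import statistics


def trimmed_mean(values, lower=0.10, upper=0.90):
    """Average the values that lie between two inclusive percentile thresholds."""
    # Drop missing values first, mirroring the NOT(ISBLANK(...)) filter.
    data = sorted(v for v in values if v is not None)

    # With n=100 and method="inclusive", cuts[i - 1] is the i-th percentile.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    min_threshold = cuts[round(lower * 100) - 1]
    max_threshold = cuts[round(upper * 100) - 1]

    # The "trim": keep only values between the two thresholds (inclusive).
    kept = [v for v in data if max_threshold >= v >= min_threshold]
    return statistics.fmean(kept)


# Made-up sample: one extreme outlier (100) and one missing value (None).
sample = [2, 4, 4, 5, 5, 6, 6, 7, 8, 100, None]
print(trimmed_mean(sample))
```

&lt;p&gt;The plain average of the ten non-missing sample values is 14.7, pulled up by the single outlier. The trimmed version drops both the smallest and the largest value before averaging.&lt;/p&gt;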
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By implementing this DAX pattern, you create a robust, dynamic, and outlier-resistant KPI. This measure provides a more accurate picture of your data's central tendency and will correctly re-calculate on the fly as users interact with your Power BI report.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Jin&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Jin, a Business Intelligence Developer with a passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Luca_DataTeam" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🎃 Connect with me on X&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>powerbi</category>
      <category>tutorial</category>
      <category>dax</category>
      <category>data</category>
    </item>
    <item>
      <title>Data Security in SQL: Encryption, Roles, and Permissions</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Tue, 09 Dec 2025 16:45:00 +0000</pubDate>
      <link>https://dev.to/luca1iu/data-security-in-sql-encryption-roles-and-permissions-17g</link>
      <guid>https://dev.to/luca1iu/data-security-in-sql-encryption-roles-and-permissions-17g</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In today's digital age, data security is paramount. SQL databases often store sensitive information, making it crucial to implement robust security measures. This article explores three key strategies for securing data in SQL: encryption, roles, and permissions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Encrypting Sensitive Columns
&lt;/h2&gt;

&lt;p&gt;Encryption is the process of converting data into a coded format to prevent unauthorized access. In SQL, encrypting sensitive columns such as passwords and credit card data is essential.&lt;/p&gt;

&lt;h4&gt;
  
  
  How to Encrypt Data in SQL
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Choose an Encryption Algorithm&lt;/strong&gt;: Common algorithms include AES (Advanced Encryption Standard) and RSA (Rivest-Shamir-Adleman).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Column-Level Encryption&lt;/strong&gt;: Use SQL commands to encrypt specific columns. For example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
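&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="c1"&gt;-- SQL Server Always Encrypted syntax; MyCEK is a placeholder for an existing column encryption key&lt;/span&gt;
   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;UserID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;Username&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;Password&lt;/span&gt; &lt;span class="nb"&gt;varbinary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ENCRYPTED&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;COLUMN_ENCRYPTION_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MyCEK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ENCRYPTION_TYPE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RANDOMIZED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;ALGORITHM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'AEAD_AES_256_CBC_HMAC_SHA_256'&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;);&lt;/span&gt;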
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Manage Encryption Keys&lt;/strong&gt;: Store and manage encryption keys securely, using a key management system.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Using Roles and Permissions Effectively
&lt;/h2&gt;

&lt;p&gt;Roles and permissions control who can access or modify data within the database. Properly configured roles and permissions are vital for data security.&lt;/p&gt;
&lt;h4&gt;
  
  
  Setting Up Roles
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define Roles&lt;/strong&gt;: Identify different user roles (e.g., admin, user, guest) and their access needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create Roles in SQL&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="c1"&gt;-- USER is a reserved word in several SQL dialects, so the roles get distinct names&lt;/span&gt;
   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_admin&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Assigning Permissions
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Grant Permissions&lt;/strong&gt;: Assign specific permissions to roles. For example:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_admin&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Review and Update Regularly&lt;/strong&gt;: Regularly audit permissions to ensure they align with current security policies.&lt;/li&gt;
&lt;/ol&gt;
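&lt;p&gt;A regular audit can be as simple as comparing what each role is supposed to have against what is actually granted. The sketch below assumes you have already exported both lists from your database. The role names, privileges, and the &lt;code&gt;audit_grants&lt;/code&gt; helper are all hypothetical and only illustrate the comparison.&lt;/p&gt;

```python
def audit_grants(expected, actual):
    """Return, per role, the privileges that are granted but not expected."""
    findings = {}
    for role, granted in actual.items():
        # Anything granted beyond the policy is a finding; unknown roles have no policy.
        extra = granted - expected.get(role, set())
        if extra:
            findings[role] = sorted(extra)
    return findings


# Hypothetical policy: what each role should be allowed to do on the Users table.
expected = {
    "user": {"SELECT", "INSERT"},
    "admin": {"SELECT", "INSERT", "UPDATE", "DELETE"},
}

# Hypothetical export of the grants that currently exist in the database.
actual = {
    "user": {"SELECT", "INSERT", "DELETE"},
    "admin": {"SELECT", "INSERT", "UPDATE", "DELETE"},
    "guest": {"SELECT"},
}

findings = audit_grants(expected, actual)
print(findings)
```

&lt;p&gt;Any role that appears in the result either holds a privilege the policy does not list or is not covered by the policy at all, and deserves a closer look.&lt;/p&gt;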
&lt;h2&gt;
  
  
  Masking Sensitive Data with Views
&lt;/h2&gt;

&lt;p&gt;Data masking involves creating a version of the data that obscures sensitive information, allowing users to work with data without exposing sensitive details.&lt;/p&gt;

&lt;h4&gt;
  
  
  Implementing Data Masking
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create Views&lt;/strong&gt;: Use SQL views to present masked data. For example:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;MaskedUsers&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;UserID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'****'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Password&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Control Access to Views&lt;/strong&gt;: Ensure only authorized users can access the views.&lt;/li&gt;
&lt;/ol&gt;
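&lt;p&gt;To see the masking view in action without a database server, here is a minimal sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module with an in-memory database. SQLite is only a stand-in chosen for convenience: it has no roles or &lt;code&gt;GRANT&lt;/code&gt;, so this demonstrates the view itself, not access control. The sample row is made up.&lt;/p&gt;

```python
import sqlite3

# In-memory database, discarded when the connection closes.
conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE Users (UserID INTEGER, Username TEXT, Password BLOB)")
conn.execute(
    "INSERT INTO Users (UserID, Username, Password) VALUES (?, ?, ?)",
    (1, "alice", b"\x00\xff"),  # made-up sample row
)

# The view exposes the same columns but replaces the password with a fixed mask.
conn.execute(
    "CREATE VIEW MaskedUsers AS "
    "SELECT UserID, Username, '****' AS Password FROM Users"
)

masked_rows = conn.execute(
    "SELECT UserID, Username, Password FROM MaskedUsers"
).fetchall()
print(masked_rows)
conn.close()
```

&lt;p&gt;Queries against &lt;code&gt;MaskedUsers&lt;/code&gt; never see the stored password bytes. In a real database you would then grant access to the view instead of the base table.&lt;/p&gt;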
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Securing data in SQL databases requires a multi-faceted approach. By encrypting sensitive columns, using roles and permissions effectively, and masking data with views, you can significantly enhance your database's security. Implement these strategies to protect your data from unauthorized access and breaches.&lt;/p&gt;


</description>
      <category>database</category>
      <category>tutorial</category>
      <category>sql</category>
      <category>data</category>
    </item>
    <item>
      <title>Stuck in a Version Trap - How I Used Azure ML to Deploy an Azure Function</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Mon, 08 Dec 2025 09:52:00 +0000</pubDate>
      <link>https://dev.to/luca1iu/stuck-in-a-version-trap-how-i-used-azure-ml-to-deploy-an-azure-function-19ke</link>
      <guid>https://dev.to/luca1iu/stuck-in-a-version-trap-how-i-used-azure-ml-to-deploy-an-azure-function-19ke</guid>
      <description>&lt;p&gt;As a developer, there is no worse feeling than being completely blocked. This is the story of how I got stuck in a "version trap" between my company PC, VS Code, and Azure... and how I used a cloud VM to escape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; November 17, 2025&lt;/p&gt;

&lt;h1&gt;
  
  
  The Version Trap
&lt;/h1&gt;

&lt;p&gt;My goal was to create a new Azure Function in Python. I checked the Azure Portal, and I was excited to see that the Function App runtime now &lt;strong&gt;supports Python 3.13&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;My company laptop has Python 3.13 installed, so I thought this would be easy. I opened VS Code, installed the Azure Functions extension, and tried to create a new project.&lt;/p&gt;

&lt;p&gt;When the extension asked me to select my Python interpreter, I pointed it to my &lt;code&gt;Python313\python.exe&lt;/code&gt;. Immediately, I hit a wall:&lt;/p&gt;

&lt;p&gt;Error: &lt;code&gt;Python version 3.13.8 does not match supported versions...&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;The problem is that the &lt;strong&gt;cloud runtime&lt;/strong&gt; (in Azure) is updated &lt;em&gt;before&lt;/em&gt; the &lt;strong&gt;local development tools&lt;/strong&gt; (the VS Code extension and Core Tools). My local tools were out of sync with the cloud and didn't recognize 3.13 as valid yet.&lt;/p&gt;
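&lt;p&gt;Conceptually, the local tooling performs a check like the one sketched below before it lets you create a project. This is an illustration only, not the extension's actual code: the &lt;code&gt;SUPPORTED_MINORS&lt;/code&gt; set and the &lt;code&gt;is_supported&lt;/code&gt; helper are hypothetical, and the real list of accepted versions depends on the tool version you have installed.&lt;/p&gt;

```python
import sys

# Hypothetical set of (major, minor) versions the local tools accept.
# The real list lives in the tooling and changes between releases.
SUPPORTED_MINORS = {(3, 10), (3, 11), (3, 12)}


def is_supported(version, supported=SUPPORTED_MINORS):
    """Return True when the (major, minor) part of `version` is in `supported`."""
    return tuple(version[:2]) in supported


# Check the interpreter that is running this script.
print(sys.version_info[:3], is_supported(sys.version_info))
```

&lt;p&gt;With a list like this, an interpreter reporting 3.13.8 is rejected even though the cloud runtime already accepts it, which is exactly the mismatch described above.&lt;/p&gt;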

&lt;h1&gt;
  
  
  The Real-World Constraint: The Corporate PC
&lt;/h1&gt;

&lt;p&gt;The standard solution is simple: "Just install a supported version, like Python 3.11."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My problem:&lt;/strong&gt; I can't. This is a locked-down company laptop. Installing new software requires a multi-day approval process with the IT department. (My &lt;em&gt;other&lt;/em&gt; local Python 3.11 installation was also broken and missing key modules like &lt;code&gt;pip&lt;/code&gt; and &lt;code&gt;venv&lt;/code&gt;, but I couldn't get admin rights to fix it.)&lt;/p&gt;

&lt;p&gt;I was completely blocked. I couldn't develop locally.&lt;/p&gt;

&lt;h1&gt;
  
  
  The "Aha!" Moment: Use a Cloud Dev Box
&lt;/h1&gt;

&lt;p&gt;As a Data Analyst, I already have access to an &lt;strong&gt;Azure ML (Machine Learning) Compute Instance&lt;/strong&gt;. I realized: &lt;em&gt;that compute instance is just a fully-featured Linux VM in the cloud that I control.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What if I treated my Azure ML instance as my &lt;em&gt;new&lt;/em&gt; "local" development machine?&lt;/p&gt;

&lt;h1&gt;
  
  
  The Solution: Deploying from Azure ML to Azure Functions
&lt;/h1&gt;

&lt;p&gt;This workflow completely bypassed my locked-down company PC and was surprisingly simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Connect VS Code to the Azure ML Instance&lt;/strong&gt; This is the most important step. In VS Code, I installed the &lt;strong&gt;Azure Machine Learning&lt;/strong&gt; extension. In its panel, I found my Compute Instance, right-clicked, and selected "Connect to Compute Instance." VS Code reloaded in a "Remote SSH" session, and my VS Code terminal was now a terminal &lt;em&gt;inside my cloud VM&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Create the Project &lt;em&gt;on the ML Instance&lt;/em&gt;&lt;/strong&gt; Now, inside this remote session, I opened a folder &lt;em&gt;on the ML instance&lt;/em&gt; and ran the &lt;code&gt;F1&lt;/code&gt; &amp;gt; &lt;code&gt;Azure Functions: Create New Project...&lt;/code&gt; command. The VM already had Python 3.10 installed, so the tools were perfectly happy. I also created my &lt;code&gt;TimerTrigger&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Set Up the Environment (The "F5" Fix)&lt;/strong&gt; My code needs &lt;code&gt;pandas&lt;/code&gt; and &lt;code&gt;pyodbc&lt;/code&gt;. I opened the VS Code terminal (which is connected to my ML instance) and ran these commands to create a virtual environment and install my packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a virtual environment using the VM's Python 3.10&lt;/span&gt;
python3.10 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv

&lt;span class="c"&gt;# Activate it&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate

&lt;span class="c"&gt;# Install my packages&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Step 4: Debug "Remotely"&lt;/strong&gt; This is the magic part. I pressed &lt;strong&gt;F5&lt;/strong&gt;. The code &lt;em&gt;ran on the ML instance&lt;/em&gt;, but the debugger connected to my local VS Code. I could set breakpoints and inspect variables just as if it were running on my own laptop. I successfully debugged my function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Deploy from Cloud to Cloud&lt;/strong&gt; Once I was happy with my code, I clicked on the Azure extension icon (inside my remote VS Code session). I found my target Function App, right-clicked, and selected &lt;strong&gt;"Deploy to Function App..."&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;VS Code packaged all the code &lt;em&gt;from my Azure ML instance&lt;/em&gt; and deployed it directly &lt;em&gt;to my Azure Functions app&lt;/em&gt;. My local PC was just a "thin client" for the whole process.&lt;/p&gt;
&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Don't let a locked-down corporate PC block you from getting work done. If your local tools are out of date or broken, you can use any cloud VM (like an Azure ML Compute Instance) as a powerful, modern development environment. By using the VS Code Remote-SSH features, you can get the best of both worlds.&lt;/p&gt;


</description>
      <category>azure</category>
      <category>dataanalyst</category>
      <category>dataengineering</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>10 Essential Data Science Algorithms &amp; Techniques</title>
      <dc:creator>Jin</dc:creator>
      <pubDate>Mon, 08 Dec 2025 09:51:00 +0000</pubDate>
      <link>https://dev.to/luca1iu/10-essential-data-science-algorithms-techniques-58bp</link>
      <guid>https://dev.to/luca1iu/10-essential-data-science-algorithms-techniques-58bp</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;The world of data science can seem intimidating, filled with complex equations and advanced statistical concepts. Many aspiring data scientists feel they need to be a "math master" before even beginning. But here's a secret: while a deep understanding of the mathematical foundations of every algorithm is certainly powerful, it's not a prerequisite to becoming an effective data scientist.&lt;/p&gt;

&lt;p&gt;What truly matters is developing an intuitive understanding of what these powerful algorithms do, when to unleash them, and why one might be chosen over another. Think of it less like building an engine from scratch, and more like knowing which tool to pick from a well-stocked toolbox to get the job done right. This article will cut through the jargon and introduce you to 10 essential algorithms and techniques—the workhorses of data science—equipping you with the practical knowledge you need to start building intelligent solutions today.&lt;/p&gt;

&lt;h1&gt;
  
  
  I. Foundational Supervised Learning
&lt;/h1&gt;

&lt;p&gt;Supervised Learning is the most common type of machine learning. It's like learning with a teacher or flashcards. You give the algorithm a dataset where you already know the correct answers (called "labels").&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Linear Regression
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is&lt;/strong&gt;: Linear Regression is a fundamental algorithm that finds the best-fit straight line showing the relationship between variables. Its goal is to predict a continuous numerical value (e.g., a house price, a person's weight, or sales) based on one or more input features (e.g., house size, a person's height, or ad spending).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When your goal is to predict a continuous number (e.g., forecasting sales, estimating a price).&lt;/li&gt;
&lt;li&gt;When you need to understand the strength and direction of the relationship between variables (e.g., "How much does ad spending really impact sales?").&lt;/li&gt;
&lt;li&gt;As a simple, fast baseline to compare against more complex models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of Linear Regression immediately when your primary question is "How much...?" or "What value...?" and you have a numerical target to predict. If you suspect the relationship between your inputs and output is relatively simple (e.g., "more square footage = higher house price"), and you value speed and interpretability (it's easy to explain why it made a prediction), it's your perfect starting point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features (e.g., [[square_feet, num_bedrooms]])
# y = your target (e.g., [price])
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Check the relationship (e.g., the slope of the line)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coefficients: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coef_&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
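&lt;p&gt;If you want to see what "finding the best-fit line" means without any library, the one-feature case has a simple closed form (ordinary least squares). The sketch below uses only the standard library. The &lt;code&gt;fit_line&lt;/code&gt; helper and the house data are made up for illustration.&lt;/p&gt;

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up data: house size in square feet and price in thousands.
square_feet = [1000, 1500, 2000, 2500]
price = [200, 300, 400, 500]

slope, intercept = fit_line(square_feet, price)
predicted = slope * 1800 + intercept  # prediction for an 1800 sq ft house
print(slope, intercept, predicted)
```

&lt;p&gt;The slope plays the same role as &lt;code&gt;model.coef_&lt;/code&gt; in the scikit-learn example: it tells you how much the prediction changes for each extra unit of the input.&lt;/p&gt;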

&lt;h2&gt;
  
  
  2. Logistic Regression
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Despite its name, Logistic Regression is used for classification tasks. Its goal is to predict the probability that an input belongs to a specific category (e.g., spam vs. not spam, disease vs. no disease) based on input features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When your goal is to predict a category (e.g., spam/not spam, fraud/not fraud, pass/fail). This is most common for binary problems.&lt;/li&gt;
&lt;li&gt;When you need the probability of an outcome (e.g., what is the likelihood this customer will click the ad?).&lt;/li&gt;
&lt;li&gt;As a simple, fast and highly interpretable baseline for classification. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of Logistic Regression immediately when your primary question is "Is it A or B?", "Will this happen?", or "What's the probability of...?" for a categorical outcome. It's the classification equivalent of Linear Regression: your first, most straightforward tool for the job. Its ability to provide probabilities makes it more useful than just a "yes" or "no" answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features (e.g., [[hours_studied, past_failures]])
# y = your target (e.g., [pass, fail])
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions (e.g., 'pass' or 'fail')
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get the probabilities
&lt;/span&gt;&lt;span class="n"&gt;probabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
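&lt;p&gt;Under the hood, the probability comes from passing a weighted sum of the features through the sigmoid function. Here is a minimal sketch of just that prediction step. The weights and bias are invented for illustration (a real model learns them during &lt;code&gt;fit&lt;/code&gt;), and &lt;code&gt;predict_probability&lt;/code&gt; is a hypothetical helper, not a scikit-learn function.&lt;/p&gt;

```python
import math


def predict_probability(features, weights, bias):
    """Sigmoid of the weighted sum: a value strictly between 0 and 1."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


# Invented coefficients for [hours_studied, past_failures]; a trained model would learn these.
weights = [0.8, -1.2]
bias = -2.0

# A student who studied 5 hours and has 1 past failure.
p_pass = predict_probability([5, 1], weights, bias)

# Turn the probability into a class with the usual 0.5 cut-off.
label = "pass" if p_pass >= 0.5 else "fail"
print(round(p_pass, 3), label)
```

&lt;p&gt;A score of exactly zero maps to a probability of 0.5, which is why 0.5 is the natural default threshold between the two classes.&lt;/p&gt;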

&lt;h2&gt;
  
  
  3. K-Nearest Neighbors (KNN)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; KNN is a simple and intuitive algorithm that classifies a new data point based on its 'neighbors'. It finds the 'k' closest data points in the training set and predicts the class by majority vote. If k=5 and 3 of the 5 neighbors are 'spam', the new point is classified as 'spam'.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For classification (and regression) tasks where the underlying data relationships are complex but "similarity" is a good predictor (e.g., "birds of a feather flock together").&lt;/li&gt;
&lt;li&gt;As a simple "non-parametric" model, meaning it makes no assumptions about the underlying data distribution. It is also a "lazy" learner: it doesn't "learn" a line; it just memorizes the training data and does all the work at prediction time.&lt;/li&gt;
&lt;li&gt;For tasks like recommendation engines (e.g., "users similar to you also liked...").&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense"&lt;/strong&gt;: You should think of KNN when your features are on a similar scale (e.g., all numbers from 1-10) and you believe the core idea "tell me who your friends are, and I'll tell you who you are" applies to your data. It's great when you have well-defined, distinct clusters in your data. It's often outperformed by more advanced models but is a fantastic, simple baseline, especially if you don't have a lot of features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.neighbors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KNeighborsClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target classes
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., we'll look at 5 neighbors)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KNeighborsClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_neighbors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model (it just stores the data)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
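&lt;p&gt;The majority vote itself is short enough to write by hand. The sketch below is a bare-bones version in pure Python, using Euclidean distance and made-up 2D points. &lt;code&gt;knn_predict&lt;/code&gt; is a hypothetical helper for illustration, and it skips everything a real implementation adds (feature scaling, tie handling, fast neighbor search).&lt;/p&gt;

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label, closest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training data: three 'spam' points close together, four 'not spam' points.
points = [(1, 1), (2, 1), (1, 2), (3, 3), (4, 4), (9, 9), (10, 10)]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam", "not spam"]

prediction = knn_predict(points, labels, query=(2, 2), k=5)
print(prediction)
```

&lt;p&gt;For the query point (2, 2), the five nearest neighbors are the three 'spam' points plus (3, 3) and (4, 4), so the vote is 3 to 2, the same situation as the k=5 example in the description.&lt;/p&gt;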

&lt;h2&gt;
  
  
  4. Support Vector Machines (SVM)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; SVM is a powerful classification algorithm that finds the optimal "hyperplane" (a boundary line) that best separates data points into different classes. Its main goal is to find the line that has the largest possible "margin" or buffer zone between the closest points of each class. These closest points are called the "support vectors."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For complex classification tasks where classes are well-defined but may not be separable by a simple straight line.&lt;/li&gt;
&lt;li&gt;In high-dimensional spaces (data with many features), such as text classification (where every word is a feature) or image recognition.&lt;/li&gt;
&lt;li&gt;When you need a model that is robust against overfitting, especially in cases with many features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of SVM when you need a highly accurate classifier and believe a clear separating boundary exists, even if it's complex. If Logistic Regression is too simple, but a Neural Network seems like overkill, SVM is your strong, sophisticated middle-ground. It's particularly powerful for text classification and other "wide" data problems (more columns/features than rows).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.svm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SVC&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target classes
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
# (kernel='linear' fits a straight-line boundary; 'rbf' can fit a curved one)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SVC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rbf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
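&lt;p&gt;To make the kernel choice concrete, here is a minimal sketch that compares a linear kernel with an RBF kernel on data that a straight line cannot separate. The dataset is synthetic (generated with make_circles) and the parameter values are illustrative assumptions, not recommendations.&lt;/p&gt;

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration): one class forms a ring
# around the other, so no straight line can separate them.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the held-out data
    scores[kernel] = model.score(X_test, y_test)

print(scores)
```

&lt;p&gt;On ring-shaped data like this, the RBF kernel should score far higher than the linear one, which is the situation the "not separable by a simple straight line" bullet describes.&lt;/p&gt;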

&lt;h1&gt;
  
  
  II. Ensemble Methods (The Power-Players)
&lt;/h1&gt;

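&lt;p&gt;Ensemble Methods are techniques that combine multiple machine learning models to produce a single, stronger model. Instead of relying on one "expert," an ensemble gets the "opinion" (prediction) from a diverse group of models and combines them. This part starts with the Decision Tree, which is not an ensemble on its own but is the building block for the ensembles that follow.&lt;/p&gt;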
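&lt;p&gt;The "group of experts" idea can be sketched with scikit-learn's VotingClassifier, which lets several different models vote on each prediction. The choice of the three models and the synthetic dataset below are assumptions made for illustration only.&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Three different "experts"; voting="hard" means each model casts one vote
# and the majority class wins.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)

predictions = ensemble.predict(X_test)
accuracy = ensemble.score(X_test, y_test)
print(accuracy)
```

&lt;p&gt;Random Forests and Gradient Boosting apply the same principle, but they combine many decision trees rather than a mix of different model types.&lt;/p&gt;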
&lt;h2&gt;
  
  
  5. Decision Trees
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Decision Tree is an intuitive algorithm that works like a flowchart. It asks a series of sequential "if-then-else" questions about your data's features, splitting the data at each step. This process continues until it reaches a "leaf node" that provides a final prediction (either a class or a numerical value).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For both classification (e.g., "survived" or "died") and regression (e.g., "predict price") tasks.&lt;/li&gt;
&lt;li&gt;When the most important requirement is interpretability. You can visually see and explain every step the model took to reach its decision.&lt;/li&gt;
&lt;li&gt;As the fundamental building block for more powerful ensemble models like Random Forests and XGBoost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of a Decision Tree whenever a non-technical stakeholder needs to understand why a prediction is being made. It's the "white-box" model. While often not the most accurate on its own (it can easily "overfit" or memorize the data), it's the perfect tool for explaining complex relationships in a simple, visual way and serves as a great baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For classification:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# For regression:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeRegressor&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target classes
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., limit depth to prevent overfitting)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  6. Random Forests
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Random Forest is an ensemble algorithm. It builds a large number of individual Decision Trees during training. For a new prediction, each tree "votes," and the Random Forest outputs the most popular class (for classification) or the average (for regression) from all the trees. It adds randomness when building the trees (each tree trains on a random sample of the rows and considers a random subset of the features at each split) so that the trees differ from one another, which makes the combined model more accurate and more stable than any single tree.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For both classification and regression tasks where you need high accuracy and robustness.&lt;/li&gt;
&lt;li&gt;When you want to prevent overfitting, which is a common problem with single Decision Trees.&lt;/li&gt;
&lt;li&gt;To get a good "out-of-the-box" model with very little tuning required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; This is the go-to, workhorse algorithm. You should think of Random Forest when a single Decision Tree isn't accurate enough. It's the "wisdom of the crowd" approach: one tree might be wrong, but the combined vote of 1,000 trees is usually far more reliable. It's almost always a strong first choice when you need a high-performance model and don't want to spend a lot of time on complex tuning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For classification:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# For regression:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestRegressor&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., build 100 trees)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
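&lt;p&gt;A quick way to check the "wisdom of the crowd" claim on your own data is to cross-validate a single tree and a forest side by side. The sketch below uses a synthetic dataset as a stand-in; the forest usually scores higher, but that is not guaranteed on every dataset.&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for your own X and y (illustrative assumption)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation: each call returns one accuracy score per fold
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print("single tree:  ", tree_scores.mean())
print("random forest:", forest_scores.mean())
```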

&lt;h2&gt;
  
  
  7. Gradient Boosting Machines (GBM)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; GBM is a powerful ensemble technique that builds models (typically decision trees) sequentially. Unlike Random Forest, which builds its trees independently, GBM builds one tree at a time, and each new tree is trained to correct the errors left by the trees that came before it. It's a "boosting" method because it incrementally "boosts" the model's performance by focusing on its past mistakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For classification and regression tasks where high accuracy is the top priority.&lt;/li&gt;
&lt;li&gt;When you are willing to spend more time tuning parameters to get the best possible performance.&lt;/li&gt;
&lt;li&gt;When a Random Forest model is performing well, but you need an extra performance boost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of GBM when "good" isn't good enough and you need "great." It's the "team of experts" approach: the first tree makes a guess, the second tree corrects the first tree's mistakes, the third corrects the remaining mistakes, and so on. It's extremely powerful but can overfit if not tuned carefully (e.g., by limiting the number of trees or their depth). It's the direct predecessor to XGBoost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For classification:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GradientBoostingClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# For regression:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GradientBoostingRegressor&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., build 100 trees sequentially)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GradientBoostingClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
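&lt;p&gt;To see what "each tree corrects the previous mistakes" means in practice, here is a hand-rolled sketch of boosting for regression with squared error, where the "mistakes" are simply the residuals. It is a simplified illustration under assumed settings (synthetic data, a learning rate of 0.1, 50 shallow trees), not how scikit-learn implements GradientBoostingRegressor internally.&lt;/p&gt;

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: a noisy sine wave (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant guess
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(50):
    residuals = y - prediction                   # what is still wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                       # learn the remaining error
    prediction += learning_rate * tree.predict(X)  # take a small corrective step
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(initial_mse, final_mse)
```

&lt;p&gt;The training error shrinks with every round, which is also why boosting can overfit: given enough trees it will keep chasing noise unless you limit the number of trees, their depth, or the learning rate.&lt;/p&gt;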

&lt;h2&gt;
  
  
  8. XGBoost (Extreme Gradient Boosting)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; XGBoost is not a new algorithm, but a specific implementation of Gradient Boosting (GBM) that has been heavily optimized for speed, efficiency, and performance. Like GBM, it builds trees sequentially to correct errors, but it includes several clever tricks (like parallel processing and built-in "regularization") that make it faster and generally more accurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When maximum predictive accuracy is the absolute top priority.&lt;/li&gt;
&lt;li&gt;On structured or tabular data (like spreadsheets or database tables).&lt;/li&gt;
&lt;li&gt;In data science competitions (like Kaggle), where it is famous for being a dominant, winning algorithm.&lt;/li&gt;
&lt;li&gt;When you need a model that's both high-performing and computationally efficient (faster than standard GBM).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of XGBoost as the default "go-to" algorithm for high-performance modeling on tabular data. It's the "race car" version of Gradient Boosting. If your Random Forest or basic GBM model is good, XGBoost is what you use to make it great. It's the first thing most data scientists try when they are serious about winning a competition or squeezing every last drop of accuracy out of their data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: It uses its own dedicated library, xgboost.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;xgboost&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
# (XGBoost has many tuning parameters, but the defaults are a reasonable start)
# For regression, use xgb.XGBRegressor() instead.
# use_label_encoder is deprecated in recent XGBoost releases, so it is omitted.
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eval_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;logloss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  III. Unsupervised Learning &amp;amp; Deep Learning
&lt;/h1&gt;

&lt;p&gt;Unsupervised Learning is a type of machine learning where the algorithm is given data without any labels or correct answers. It's like "learning without a teacher."&lt;/p&gt;

&lt;p&gt;Deep Learning is a specific, advanced subfield of machine learning that uses "deep" Neural Networks, meaning networks with many layers. These layers allow the model to learn incredibly complex, hierarchical patterns directly from raw data.&lt;/p&gt;
&lt;h2&gt;
  
  
  9. K-Means Clustering
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; K-Means is one of the most popular unsupervised algorithms. "Unsupervised" means it's used when you don't have a target variable or pre-defined labels. Its goal is to find hidden structures in data by automatically grouping similar data points into "K" (a number you choose) distinct clusters. It works by placing "centroids" (the center point of each cluster), assigning each data point to the nearest one, and then moving each centroid to the average of its assigned points, repeating until the assignments stop changing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you have unlabeled data and want to discover its natural groupings.&lt;/li&gt;
&lt;li&gt;For customer segmentation (e.g., finding different types of shoppers).&lt;/li&gt;
&lt;li&gt;For anomaly detection (points far from any cluster center can be outliers).&lt;/li&gt;
&lt;li&gt;To simplify a dataset by grouping similar items.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of K-Means immediately when your primary question is "What are the natural groups in my data?" or "How can I segment this?" It's not for predicting a known answer, but for discovering unknown patterns. It's the go-to tool for exploratory analysis when you need to understand your data's inherent structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.cluster&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KMeans&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features (unlabeled data)
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., we want to find 3 clusters)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KMeans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_clusters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model (it finds the clusters)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get the cluster labels for each data point
&lt;/span&gt;&lt;span class="n"&gt;cluster_labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels_&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get the center point of each cluster
&lt;/span&gt;&lt;span class="n"&gt;centroids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_centers_&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
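&lt;p&gt;Because K is a number you choose, a common heuristic is the "elbow" check: fit K-Means for several values of K and watch how the inertia (the sum of squared distances from each point to its centroid) drops. The sketch below assumes synthetic data with three true groups; the range of K values is arbitrary.&lt;/p&gt;

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 underlying groups (illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

k_values = list(range(1, 7))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    # inertia_ = sum of squared distances of points to their nearest centroid
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

&lt;p&gt;Look for the K after which the inertia stops falling sharply. That bend is a reasonable first guess for the number of clusters, although it is a heuristic rather than a definitive answer.&lt;/p&gt;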

&lt;h2&gt;
  
  
  10. Neural Networks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Neural Network is a powerful algorithm inspired by the structure of the human brain. It's built from layers of interconnected "nodes" or "neurons" that process information. "Deep Learning" simply refers to Neural Networks that have many layers ("deep" networks), allowing them to learn extremely complex, hierarchical patterns from vast amounts of data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When working with unstructured data like images (e.g., object recognition), text (e.g., translation, sentiment analysis), and audio (e.g., speech-to-text).&lt;/li&gt;
&lt;li&gt;For highly complex problems where other models (like XGBoost) are not powerful enough.&lt;/li&gt;
&lt;li&gt;When peak performance is the primary goal, and "explainability" (interpretability) is less of a concern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of Neural Networks as your heavy-duty, specialized tool. While XGBoost dominates on tabular (spreadsheet) data, Deep Learning is the undisputed champion for perception and language tasks. If your problem involves "seeing" (images), "hearing" (audio), or "understanding" (text), a Neural Network is almost always the right choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most popular libraries are Keras (often with TensorFlow) and PyTorch.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# A simple example using Keras (with TensorFlow backend)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Sequential&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (a simple, sequential stack of layers)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],)))&lt;/span&gt; &lt;span class="c1"&gt;# Input layer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                            &lt;span class="c1"&gt;# Hidden layer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                          &lt;span class="c1"&gt;# Output layer (for classification)
&lt;/span&gt;
&lt;span class="c1"&gt;# 2. Compile the model (set up the learning process)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;adam&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;binary_crossentropy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

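&lt;span class="c1"&gt;# 4. Get predictions (the sigmoid output is a probability between 0 and 1)
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Convert probabilities to class labels with a 0.5 threshold
&lt;/span&gt;&lt;span class="n"&gt;predicted_classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;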
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;We've journeyed through 10 essential algorithms and techniques, from the foundational simplicity of Linear Regression to the advanced power of Deep Learning. Remember, the goal isn't to become a theoretical mathematician overnight, but to cultivate a practical intuition for these tools.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Jin&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Jin, a Business Intelligence Developer with a passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>algorithms</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
