<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhanvina N</title>
    <description>The latest articles on DEV Community by Dhanvina N (@ndhanvina).</description>
    <link>https://dev.to/ndhanvina</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1976429%2F54c250a2-1a38-4d6c-bb7e-4f1978bee4a2.jpeg</url>
      <title>DEV Community: Dhanvina N</title>
      <link>https://dev.to/ndhanvina</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ndhanvina"/>
    <language>en</language>
    <item>
      <title>Machine Learning Roadmap</title>
      <dc:creator>Dhanvina N</dc:creator>
      <pubDate>Tue, 02 Dec 2025 16:46:58 +0000</pubDate>
      <link>https://dev.to/ndhanvina/machine-learning-roadmap-4d7o</link>
      <guid>https://dev.to/ndhanvina/machine-learning-roadmap-4d7o</guid>
      <description>&lt;p&gt;&lt;em&gt;A Complete Foundation for Becoming a Strong ML Engineer&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj4jcjs0bafvynt9dq5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj4jcjs0bafvynt9dq5i.png" alt=" " width="800" height="723"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Machine Learning stands on &lt;strong&gt;three fundamental pillars&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Mathematics&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Statistics&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Programming&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Without these foundations, ML becomes just black-box code.&lt;br&gt;&lt;br&gt;
With them, you understand how models work — and how to optimize them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Math You Actually Need
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Linear Algebra — Matrix Manipulation&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;vectors
&lt;/li&gt;
&lt;li&gt;matrices
&lt;/li&gt;
&lt;li&gt;dot products
&lt;/li&gt;
&lt;li&gt;eigenvalues
&lt;/li&gt;
&lt;li&gt;PCA &amp;amp; SVD
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Calculus — Optimization &amp;amp; Gradients&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;derivatives
&lt;/li&gt;
&lt;li&gt;partial derivatives
&lt;/li&gt;
&lt;li&gt;chain rule
&lt;/li&gt;
&lt;li&gt;gradient descent
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Probability — Modelling Uncertainty&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;distributions
&lt;/li&gt;
&lt;li&gt;random variables
&lt;/li&gt;
&lt;li&gt;Bayes' theorem
&lt;/li&gt;
&lt;/ul&gt;
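&lt;p&gt;A quick feel for Bayes' theorem in plain Python. The numbers below (99% sensitivity, 95% specificity, 1% prevalence) are made up purely for illustration:&lt;/p&gt;

```python
# Hypothetical medical-test numbers, for illustration only.
p_disease = 0.01                 # prevalence: P(disease)
p_pos_given_disease = 0.99       # sensitivity: P(positive | disease)
p_pos_given_healthy = 0.05       # false-positive rate: P(positive | healthy)

# Total probability of testing positive.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(disease | positive).
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))   # roughly 0.167: a positive test is far from certain
```

Even with an accurate test, the low prevalence keeps the posterior surprisingly small; this is exactly why distributions and conditional probability belong in the foundation.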

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Statistics — Understanding Data&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;mean, variance
&lt;/li&gt;
&lt;li&gt;hypothesis testing
&lt;/li&gt;
&lt;li&gt;confidence intervals
&lt;/li&gt;
&lt;li&gt;correlation
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Data Manipulation Skills
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;NumPy&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;vectorization
&lt;/li&gt;
&lt;li&gt;broadcasting
&lt;/li&gt;
&lt;li&gt;matrix ops
&lt;/li&gt;
&lt;/ul&gt;
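&lt;p&gt;A minimal sketch of all three ideas at once (the arrays are invented for illustration):&lt;/p&gt;

```python
import numpy as np

# Vectorization: operate on whole arrays at once instead of Python loops.
sizes = np.array([1.0, 2.0, 3.0])     # house sizes
prices = np.array([1.5, 2.2, 2.9])    # house prices

# Broadcasting: the scalars 0.7 and 0.8 are stretched across the whole array.
predictions = 0.7 * sizes + 0.8

# Matrix ops: dot product of two vectors.
total = np.dot(sizes, prices)

print(predictions)   # matches the real prices (within float rounding)
print(total)         # close to 14.6
```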

&lt;h3&gt;
  
  
  &lt;strong&gt;Pandas&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;cleaning data
&lt;/li&gt;
&lt;li&gt;merging
&lt;/li&gt;
&lt;li&gt;grouping
&lt;/li&gt;
&lt;li&gt;time-series ops
&lt;/li&gt;
&lt;/ul&gt;
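&lt;p&gt;Cleaning, grouping, and merging in a few lines, on a tiny hypothetical sales table:&lt;/p&gt;

```python
import pandas as pd

# A tiny made-up sales table with one missing value.
sales = pd.DataFrame({
    "city": ["Bengaluru", "Bengaluru", "Mumbai", "Mumbai"],
    "units": [10, None, 7, 5],
})

# Cleaning: fill the missing value with 0.
sales["units"] = sales["units"].fillna(0)

# Grouping: total units per city.
totals = sales.groupby("city", as_index=False)["units"].sum()

# Merging: attach a region lookup table.
regions = pd.DataFrame({"city": ["Bengaluru", "Mumbai"],
                        "region": ["South", "West"]})
report = totals.merge(regions, on="city")
print(report)
```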

&lt;h3&gt;
  
  
  &lt;strong&gt;Matplotlib&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;histograms
&lt;/li&gt;
&lt;li&gt;2D/3D plots
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Seaborn&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;heatmaps
&lt;/li&gt;
&lt;li&gt;pairplots
&lt;/li&gt;
&lt;li&gt;correlations
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Machine Learning Branches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Supervised Learning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Regression, classification, neural networks.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Unsupervised Learning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Clustering, dimensionality reduction, anomaly detection.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Reinforcement Learning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agents, robotics, decision-making.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Learn ML the Right Way
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Project-Based Learning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Build small projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classifiers
&lt;/li&gt;
&lt;li&gt;clustering visualizations
&lt;/li&gt;
&lt;li&gt;NLP pipelines
&lt;/li&gt;
&lt;li&gt;recommender systems
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Read Latest Hugging Face Research&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Stay updated with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new models
&lt;/li&gt;
&lt;li&gt;tutorials
&lt;/li&gt;
&lt;li&gt;research summaries
&lt;/li&gt;
&lt;li&gt;benchmarks
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏁 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Machine Learning is built on math, statistics, programming, data skills, and real-world projects.&lt;br&gt;&lt;br&gt;
Master the foundations and you become a strong ML engineer.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>python</category>
      <category>software</category>
    </item>
    <item>
      <title>AI Engineering Roadmap</title>
      <dc:creator>Dhanvina N</dc:creator>
      <pubDate>Tue, 02 Dec 2025 16:16:34 +0000</pubDate>
      <link>https://dev.to/ndhanvina/ai-engineering-roadmap-264i</link>
      <guid>https://dev.to/ndhanvina/ai-engineering-roadmap-264i</guid>
      <description>&lt;p&gt;&lt;em&gt;A complete, practical, industry-level roadmap for becoming an AI engineer who can build real products using LLMs, RAG, agents, fine-tuning, and cloud deployment.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa97mttnx6p4cza4mb0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa97mttnx6p4cza4mb0w.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AI Engineering is evolving faster than any other technical field — and the role today is very different from the classic “data scientist” or “ML researcher.”&lt;br&gt;&lt;br&gt;
In 2025, &lt;strong&gt;AI Engineers are builders.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
They take powerful pretrained models and turn them into real, production-ready AI systems.&lt;/p&gt;

&lt;p&gt;If you’re starting your journey, this roadmap breaks down &lt;strong&gt;exactly what you need to learn&lt;/strong&gt;, &lt;strong&gt;what you don’t&lt;/strong&gt;, and &lt;strong&gt;how to build a portfolio that gets you hired.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No fluff. Only actionable skills.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔥 What AI Engineering &lt;em&gt;Really&lt;/em&gt; Is
&lt;/h2&gt;

&lt;p&gt;AI Engineering is &lt;strong&gt;not&lt;/strong&gt; about training huge models from scratch.&lt;br&gt;&lt;br&gt;
You do &lt;strong&gt;not&lt;/strong&gt; need deep mathematical knowledge, GPU clusters, or research-level ML.&lt;/p&gt;

&lt;p&gt;Instead, AI Engineering focuses on:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Adapting Pretrained Models&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Modern AI models (GPT, Llama, Mistral, CLIP, SAM, Whisper, etc.) are already incredibly powerful.&lt;br&gt;&lt;br&gt;
Your job is to integrate them, adapt them, and make them useful for real-world problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Prompt Engineering&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Knowing how to write precise, structured, reproducible prompts is a core engineering skill — not a trick.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Retrieval-Augmented Generation (RAG)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Connecting LLMs to external data sources to produce reliable, up-to-date answers.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Fine-Tuning &amp;amp; LoRA&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Lightweight, efficient ways of customizing a model for a specific domain without retraining everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. AI Agents &amp;amp; Orchestration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agents that reason, plan, take actions, call tools, and work with other agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Core Skills Every AI Engineer Must Have
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Programming (Python — Production Level)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You must write clean, modular, scalable code.&lt;br&gt;&lt;br&gt;
Understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OOP fundamentals
&lt;/li&gt;
&lt;li&gt;Async programming
&lt;/li&gt;
&lt;li&gt;Dependency management
&lt;/li&gt;
&lt;li&gt;Testing (pytest)
&lt;/li&gt;
&lt;li&gt;Code quality &amp;amp; engineering patterns
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Version Control (Git)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Not optional.&lt;br&gt;&lt;br&gt;
Branches, PRs, merge strategies, semantic commits.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. APIs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Using&lt;/strong&gt; external APIs (OpenAI, HuggingFace, Replicate, Gemini, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating&lt;/strong&gt; your own REST APIs using FastAPI or Flask
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Machine Learning Basics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Enough to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Types of models
&lt;/li&gt;
&lt;li&gt;Overfitting/underfitting
&lt;/li&gt;
&lt;li&gt;Train/val/test splits
&lt;/li&gt;
&lt;li&gt;Evaluation metrics
&lt;/li&gt;
&lt;li&gt;When and why to use fine-tuning
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Experimentation with Models&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Try different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLMs
&lt;/li&gt;
&lt;li&gt;Vision models
&lt;/li&gt;
&lt;li&gt;Speech-to-text &amp;amp; text-to-speech models
&lt;/li&gt;
&lt;li&gt;Embedding models
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Know their strengths, weaknesses, and costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6. Deployment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You should know how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containerize apps (Docker)
&lt;/li&gt;
&lt;li&gt;Build scalable inference APIs
&lt;/li&gt;
&lt;li&gt;Use load balancers &amp;amp; autoscaling
&lt;/li&gt;
&lt;li&gt;Handle model caching &amp;amp; batching
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7. Cloud Platforms&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Choose one and get good at it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS
&lt;/li&gt;
&lt;li&gt;Azure
&lt;/li&gt;
&lt;li&gt;GCP
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Focus on the services that matter for AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 / Blob Storage
&lt;/li&gt;
&lt;li&gt;Lambda
&lt;/li&gt;
&lt;li&gt;EC2
&lt;/li&gt;
&lt;li&gt;ECS / EKS
&lt;/li&gt;
&lt;li&gt;API Gateway
&lt;/li&gt;
&lt;li&gt;CloudWatch
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;8. Monitoring &amp;amp; Logging&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A real AI system must log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input/output
&lt;/li&gt;
&lt;li&gt;Latencies
&lt;/li&gt;
&lt;li&gt;Failures
&lt;/li&gt;
&lt;li&gt;Drift
&lt;/li&gt;
&lt;li&gt;Usage analytics
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Preferred tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus
&lt;/li&gt;
&lt;li&gt;Grafana
&lt;/li&gt;
&lt;li&gt;Langfuse
&lt;/li&gt;
&lt;li&gt;MLflow
&lt;/li&gt;
&lt;li&gt;Weights &amp;amp; Biases (optional)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🌟 Building a Strong Portfolio (Your Golden Ticket to Jobs)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. End-to-End Projects&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Employers love projects that show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UI
&lt;/li&gt;
&lt;li&gt;API
&lt;/li&gt;
&lt;li&gt;Model adaptation
&lt;/li&gt;
&lt;li&gt;Deployment
&lt;/li&gt;
&lt;li&gt;Monitoring
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build real, useful AI systems such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF chatbot with AI search
&lt;/li&gt;
&lt;li&gt;AI video analysis tool
&lt;/li&gt;
&lt;li&gt;Multi-agent workflow automations
&lt;/li&gt;
&lt;li&gt;Voice assistant for your domain
&lt;/li&gt;
&lt;li&gt;AI dashboard with monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. UI + API Skills&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A good AI engineer builds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clean, functional frontend (React/Next.js)&lt;/li&gt;
&lt;li&gt;A robust backend (FastAPI/Django)&lt;/li&gt;
&lt;li&gt;A scalable inference system (Docker + Cloud)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. GitHub&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Your GitHub should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean
&lt;/li&gt;
&lt;li&gt;Documented
&lt;/li&gt;
&lt;li&gt;Organized by projects
&lt;/li&gt;
&lt;li&gt;With clear READMEs
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Technical Blog Posts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Writing is a superpower.&lt;br&gt;&lt;br&gt;
Publish what you learn on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Medium
&lt;/li&gt;
&lt;li&gt;Dev.to
&lt;/li&gt;
&lt;li&gt;Hashnode
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Topics you can write about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How you built your system
&lt;/li&gt;
&lt;li&gt;Mistakes you made
&lt;/li&gt;
&lt;li&gt;What you learned
&lt;/li&gt;
&lt;li&gt;Costs &amp;amp; optimizations
&lt;/li&gt;
&lt;li&gt;Benchmarks
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏁 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;AI Engineering is &lt;strong&gt;not&lt;/strong&gt; an academic field.&lt;br&gt;&lt;br&gt;
It’s a &lt;strong&gt;builder’s discipline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you know how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pick the right pretrained model
&lt;/li&gt;
&lt;li&gt;adapt it
&lt;/li&gt;
&lt;li&gt;deploy it
&lt;/li&gt;
&lt;li&gt;scale it
&lt;/li&gt;
&lt;li&gt;monitor it
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’re already ahead of 95% of people.&lt;/p&gt;

&lt;p&gt;This roadmap is your guide. Now start building.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>software</category>
    </item>
    <item>
      <title>Linear Regression</title>
      <dc:creator>Dhanvina N</dc:creator>
      <pubDate>Tue, 02 Dec 2025 07:32:54 +0000</pubDate>
      <link>https://dev.to/ndhanvina/linear-regression-k23</link>
      <guid>https://dev.to/ndhanvina/linear-regression-k23</guid>
      <description>&lt;h1&gt;
  
  
  Linear Regression Explained Simply (Using Only 3 Houses)
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Step 1: Imagine you have only 3 houses
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;House size (x)&lt;/th&gt;
&lt;th&gt;Real price (y)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;x = size in thousands of square feet&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;y = price in hundreds of thousands of dollars&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your goal: &lt;strong&gt;predict the price from the size using a straight line.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: What does a straight line look like?
&lt;/h2&gt;

&lt;p&gt;Every straight-line prediction model follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;predicted price = (some number) × size + (another number)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We name these numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;w&lt;/strong&gt; → weight/slope&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;b&lt;/strong&gt; → bias/intercept&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the model is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ŷ = w × x + b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;strong&gt;ŷ&lt;/strong&gt; means &lt;em&gt;predicted y&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Pick a random line to start
&lt;/h2&gt;

&lt;p&gt;Let's guess:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = 0.5
b = 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now compute predictions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x (size)&lt;/th&gt;
&lt;th&gt;Prediction ŷ = 0.5x + 1.0&lt;/th&gt;
&lt;th&gt;Real y&lt;/th&gt;
&lt;th&gt;Error (ŷ − y)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;-0.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.5&lt;/td&gt;
&lt;td&gt;2.9&lt;/td&gt;
&lt;td&gt;-0.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The line predicts house 1 exactly but underestimates the other two.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Convert “a bit wrong” into ONE number
&lt;/h2&gt;

&lt;p&gt;We need a single value describing how bad the line is.&lt;br&gt;
But positive and negative errors can cancel each other out, so simply summing them would hide how wrong the line really is.&lt;/p&gt;

&lt;p&gt;So we use two tricks:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Square the errors
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;0² = 0&lt;/li&gt;
&lt;li&gt;(-0.2)² = 0.04&lt;/li&gt;
&lt;li&gt;(-0.4)² = 0.16&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Take the average
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MSE = (0 + 0.04 + 0.01) / 3
    = 0.05 / 3
    ≈ 0.0167
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is the &lt;strong&gt;Mean Squared Error (MSE).&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Smaller MSE = better line.&lt;/p&gt;
&lt;/blockquote&gt;
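&lt;p&gt;The whole recipe fits in a few lines of Python, recomputed straight from the 3-house table:&lt;/p&gt;

```python
# Mean Squared Error, computed directly from the 3-house dataset.
xs = [1, 2, 3]          # house sizes
ys = [1.5, 2.2, 2.9]    # real prices

def mse(w, b):
    squared_errors = [(y - (w * x + b)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / len(xs)

print(round(mse(0.5, 1.0), 4))   # the first guessed line → 0.0667
print(round(mse(0.8, 0.7), 4))   # the second guess → 0.0167
print(round(mse(0.7, 0.8), 4))   # the line that fits all three houses exactly → 0.0
```

Note that the line w = 0.7, b = 0.8 passes through all three points, so its MSE is zero: that is the target the search is heading toward.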


&lt;h2&gt;
  
  
  Step 5: Try another line and compare
&lt;/h2&gt;

&lt;p&gt;New guess:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = 0.8
b = 0.7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x&lt;/th&gt;
&lt;th&gt;Prediction ŷ = 0.8x + 0.7&lt;/th&gt;
&lt;th&gt;Real y&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Squared Error&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2.3&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;+0.1&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3.1&lt;/td&gt;
&lt;td&gt;2.9&lt;/td&gt;
&lt;td&gt;+0.2&lt;/td&gt;
&lt;td&gt;0.04&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MSE = (0 + 0.01 + 0.04) / 3
    ≈ 0.0167
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lower than before (0.0167 vs 0.0667), so this line is better.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 6: The goal
&lt;/h2&gt;

&lt;p&gt;Try many combinations of &lt;strong&gt;w&lt;/strong&gt; and &lt;strong&gt;b&lt;/strong&gt; until you find the ones that give the &lt;strong&gt;smallest possible MSE&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That best pair:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(best w, best b)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is the &lt;strong&gt;optimal straight line for your data&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;Linear regression is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Find the straight line that makes the average squared error as small as possible.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  Why Not Brute Force Linear Regression? Introducing Gradient Descent
&lt;/h1&gt;

&lt;p&gt;When we try millions of combinations of &lt;strong&gt;w&lt;/strong&gt; and &lt;strong&gt;b&lt;/strong&gt; to find the best line, we are doing &lt;strong&gt;brute force&lt;/strong&gt; search.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is brute force a bad idea?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;It takes too long.&lt;/strong&gt;&lt;br&gt;
Trying millions of pairs of parameters becomes extremely slow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Small datasets → maybe okay.&lt;br&gt;
Real datasets → impossible.&lt;/strong&gt;&lt;br&gt;
With 3 houses and two parameters, brute force is fine.&lt;br&gt;
With 100,000 houses and many features, every candidate pair needs a full pass over the data and the grid of combinations explodes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So instead of guessing randomly, we use a far smarter method.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Clever Trick: Ask the Loss Function for Directions
&lt;/h1&gt;

&lt;p&gt;We treat the &lt;strong&gt;MSE (Mean Squared Error)&lt;/strong&gt; like a &lt;strong&gt;landscape&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every pair &lt;code&gt;(w, b)&lt;/code&gt; is a point on the surface.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;height&lt;/strong&gt; of that point is the MSE at those parameter values.&lt;/li&gt;
&lt;li&gt;The lowest height = the best line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;bowl-shaped valley&lt;/strong&gt;.&lt;br&gt;
Your job is to walk to the bottom.&lt;/p&gt;

&lt;p&gt;But here’s the key idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Mathematics can tell you exactly which direction is &lt;em&gt;downhill&lt;/em&gt; from where you stand.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That direction is called the &lt;strong&gt;gradient&lt;/strong&gt;.&lt;/p&gt;


&lt;h1&gt;
  
  
  What the Gradient Tells Us
&lt;/h1&gt;

&lt;p&gt;At your current values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = 0.5  
b = 1.0  
MSE ≈ 0.0667
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We ask:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. “If I increase &lt;strong&gt;w&lt;/strong&gt; a tiny bit (+0.01), does MSE go up or down?”
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;MSE goes &lt;strong&gt;down&lt;/strong&gt; → the slope is negative → move &lt;strong&gt;w upward&lt;/strong&gt; (increase w).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. “If I increase &lt;strong&gt;b&lt;/strong&gt; a tiny bit (+0.01), does MSE go up or down?”
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;MSE goes &lt;strong&gt;down&lt;/strong&gt; → the slope is negative → move &lt;strong&gt;b upward&lt;/strong&gt; (increase b).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the gradient tells us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move &lt;strong&gt;w&lt;/strong&gt; slightly up.&lt;/li&gt;
&lt;li&gt;Move &lt;strong&gt;b&lt;/strong&gt; slightly up.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  The Update Rule (Gradient Descent)
&lt;/h1&gt;

&lt;p&gt;We update both parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new w = current w − (learning rate × slope_w)
new b = current b − (learning rate × slope_b)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Recalculate the new MSE&lt;/li&gt;
&lt;li&gt;Recalculate the slopes&lt;/li&gt;
&lt;li&gt;Take another step downhill&lt;/li&gt;
&lt;li&gt;Repeat 20–100 times&lt;/li&gt;
&lt;/ol&gt;
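&lt;p&gt;That loop, sketched in Python on the same 3-house data (the learning rate and step count are illustrative choices):&lt;/p&gt;

```python
# Gradient descent: repeatedly step w and b downhill on the MSE surface.
xs = [1, 2, 3]
ys = [1.5, 2.2, 2.9]
w, b = 0.5, 1.0          # starting guess
learning_rate = 0.1

for step in range(500):
    # Errors of the current line, using error = y - predicted y.
    errors = [y - (w * x + b) for x, y in zip(xs, ys)]
    # Slopes of the MSE with respect to w and b.
    slope_w = -2 * sum(x * e for x, e in zip(xs, errors)) / len(xs)
    slope_b = -2 * sum(errors) / len(xs)
    # Step downhill.
    w = w - learning_rate * slope_w
    b = b - learning_rate * slope_b

print(round(w, 2), round(b, 2))   # prints 0.7 0.8, the line that fits all three houses
```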

&lt;p&gt;Instead of testing millions of combinations, we follow the downhill slope directly to the minimum.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Linear Regression Still Feels Like a Mystery (And What Is Actually Happening)
&lt;/h1&gt;

&lt;p&gt;Now you know the basic idea of linear regression, but the internal mechanics can still feel mysterious.&lt;br&gt;
This walkthrough removes the mystery by showing exactly what is happening inside gradient descent, step by step, using the same 3-house example.&lt;/p&gt;


&lt;h1&gt;
  
  
  Our Dataset
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x (size)&lt;/th&gt;
&lt;th&gt;y (real price)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We start with a random guess:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = 0.5
b = 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Step 1: Make Predictions
&lt;/h1&gt;

&lt;p&gt;Using ŷ = w·x + b:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;House 1 → 0.5×1 + 1.0 = 1.5&lt;/li&gt;
&lt;li&gt;House 2 → 0.5×2 + 1.0 = 2.0&lt;/li&gt;
&lt;li&gt;House 3 → 0.5×3 + 1.0 = 2.5&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Step 2: Compute Errors
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error = y − ŷ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;House 1: 1.5 − 1.5 = 0&lt;/li&gt;
&lt;li&gt;House 2: 2.2 − 2.0 = +0.2&lt;/li&gt;
&lt;li&gt;House 3: 2.9 − 2.5 = +0.4&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These errors determine how we must adjust w and b.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 3: What Happens if w Changes a Little?
&lt;/h1&gt;

&lt;p&gt;Increase w slightly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new w = 0.51
b stays = 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New predictions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;House 1 → 1.51&lt;/li&gt;
&lt;li&gt;House 2 → 2.02&lt;/li&gt;
&lt;li&gt;House 3 → 2.53&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;House 1: −0.01&lt;/li&gt;
&lt;li&gt;House 2: +0.18&lt;/li&gt;
&lt;li&gt;House 3: +0.37&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The overall squared error becomes smaller (house 3 improves far more than house 1 worsens).&lt;br&gt;
Conclusion: &lt;strong&gt;increasing w makes the model better → w should be increased&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That “how much better” is exactly the &lt;strong&gt;gradient with respect to w&lt;/strong&gt;.&lt;/p&gt;


&lt;h1&gt;
  
  
  Step 4: The Gradient Formula (No Mystery Anymore)
&lt;/h1&gt;

&lt;p&gt;For linear regression, the exact slope (gradient) of MSE tells us how to update w:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gradient_w = −2 × average(x × error)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compute it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;House 1 → 1 × 0   = 0&lt;/li&gt;
&lt;li&gt;House 2 → 2 × 0.2 = 0.4&lt;/li&gt;
&lt;li&gt;House 3 → 3 × 0.4 = 1.2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sum = 0 + 0.4 + 1.2 = +1.6&lt;br&gt;
Average = 1.6 / 3 ≈ 0.533&lt;br&gt;
Apply −2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gradient_w ≈ −0.066
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This negative gradient means: increasing w reduces the error.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 5: Gradient for b
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gradient_b = −2 × average(error)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Average error = (0 + 0.2 + 0.4) / 3 = 0.2&lt;br&gt;
Apply −2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gradient_b ≈ −0.066
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same direction: increasing b reduces the error.&lt;/p&gt;
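&lt;p&gt;You can sanity-check both gradient formulas against a direct numerical nudge of the MSE, recomputed straight from the dataset:&lt;/p&gt;

```python
# Verify the gradient formulas with a tiny finite-difference nudge.
xs = [1, 2, 3]
ys = [1.5, 2.2, 2.9]
w, b = 0.5, 1.0

def mse(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Analytic gradients from the formulas above.
errors = [y - (w * x + b) for x, y in zip(xs, ys)]
gradient_w = -2 * sum(x * e for x, e in zip(xs, errors)) / len(xs)
gradient_b = -2 * sum(errors) / len(xs)

# Numerical slope: nudge each parameter by h and see how much the MSE moves.
h = 1e-6
numeric_w = (mse(w + h, b) - mse(w, b)) / h
numeric_b = (mse(w, b + h) - mse(w, b)) / h

print(round(gradient_w, 3), round(numeric_w, 3))   # formula and nudge agree
print(round(gradient_b, 3), round(numeric_b, 3))   # formula and nudge agree
```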




&lt;h1&gt;
  
  
  Step 6: Update w and b
&lt;/h1&gt;

&lt;p&gt;General update rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_value = old_value − learning_rate × gradient
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a learning rate of 0.1 (chosen for demonstration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w_new = 0.5 − 0.1 × (−1.067) ≈ 0.61
b_new = 1.0 − 0.1 × (−0.4) = 1.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After just one update step, the MSE drops from about &lt;strong&gt;0.0667&lt;/strong&gt; to about &lt;strong&gt;0.0087&lt;/strong&gt;.&lt;br&gt;
The model is already noticeably better.&lt;/p&gt;
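&lt;p&gt;Here is that single update step written out in Python, computed directly from the 3-house data (learning rate 0.1, an illustrative choice):&lt;/p&gt;

```python
# One gradient-descent step from the starting guess.
xs = [1, 2, 3]
ys = [1.5, 2.2, 2.9]
w, b = 0.5, 1.0
learning_rate = 0.1

def mse(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Gradients at the current (w, b), using error = y - predicted y.
errors = [y - (w * x + b) for x, y in zip(xs, ys)]
gradient_w = -2 * sum(x * e for x, e in zip(xs, errors)) / len(xs)
gradient_b = -2 * sum(errors) / len(xs)

# Apply the update rule once.
w_new = w - learning_rate * gradient_w
b_new = b - learning_rate * gradient_b

print(round(w_new, 2), round(b_new, 2))                    # 0.61 1.04
print(round(mse(w, b), 4), round(mse(w_new, b_new), 4))    # 0.0667 0.0087
```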


&lt;h1&gt;
  
  
  What Gradient Descent Is Really Doing
&lt;/h1&gt;

&lt;p&gt;Gradient descent repeatedly performs these simple steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compute each prediction ŷ&lt;/li&gt;
&lt;li&gt;Compute each error (y − ŷ)&lt;/li&gt;
&lt;li&gt;Multiply errors by x to understand how each house influences w&lt;/li&gt;
&lt;li&gt;Average those influence values&lt;/li&gt;
&lt;li&gt;Adjust w toward lower error&lt;/li&gt;
&lt;li&gt;Adjust b using the average error&lt;/li&gt;
&lt;li&gt;Repeat 50–200 times&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the entire mechanism behind linear regression training.&lt;/p&gt;


&lt;h1&gt;
  
  
  Linear Regression Explained in Complete Beginner Mode
&lt;/h1&gt;


&lt;h2&gt;
  
  
  Part 1: What Linear Regression Is Trying to Do
&lt;/h2&gt;

&lt;p&gt;We have houses.&lt;br&gt;
For each house we know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;x&lt;/strong&gt; = size of the house&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;y&lt;/strong&gt; = real selling price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We want a straight line that predicts price from size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;predicted price = w × x + b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;w&lt;/strong&gt; = how much price increases when size increases by 1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;b&lt;/strong&gt; = base price when size is zero&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our goal is simple:&lt;br&gt;
&lt;strong&gt;Find the best possible w and b.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Part 2: How We Measure “Best”
&lt;/h2&gt;

&lt;p&gt;We measure how wrong our line is using &lt;strong&gt;MSE (Mean Squared Error)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For each house:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Predict the price: ŷ = w×x + b&lt;/li&gt;
&lt;li&gt;Compute error: y − ŷ&lt;/li&gt;
&lt;li&gt;Square the error: (y − ŷ)²&lt;/li&gt;
&lt;li&gt;Add squared errors for all houses&lt;/li&gt;
&lt;li&gt;Divide by number of houses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MSE = (1/N) × Σ (y − ŷ)²
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Smaller MSE = a better line.&lt;/p&gt;

&lt;p&gt;This is the only quantity we try to minimize.&lt;/p&gt;
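&lt;p&gt;Those five steps translate directly into Python (a minimal sketch; the function name and toy numbers are illustrative):&lt;/p&gt;

```python
# Mean Squared Error for a candidate line ŷ = w*x + b
def mse(xs, ys, w, b):
    errors_sq = [(y - (w * x + b)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors_sq) / len(xs)

# Toy data: house sizes and prices
xs = [1, 2, 3]
ys = [1.5, 2.2, 2.9]

print(mse(xs, ys, 0.0, 0.0))   # a bad line: large error
print(mse(xs, ys, 0.7, 0.8))   # a good line: near-zero error
```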




&lt;h2&gt;
  
  
  Part 3: The Key Idea — Nudge w and b in the Right Direction
&lt;/h2&gt;

&lt;p&gt;We want to adjust w and b so that MSE gets smaller.&lt;/p&gt;

&lt;p&gt;Imagine nudging w slightly upward (by something tiny like +0.0001).&lt;br&gt;
Two possibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If MSE increases → wrong direction; w should move down&lt;/li&gt;
&lt;li&gt;If MSE decreases → correct direction; w should move up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The amount MSE changes when w changes slightly is the &lt;strong&gt;slope&lt;/strong&gt; or &lt;strong&gt;gradient&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Same idea applies to b.&lt;/p&gt;


&lt;h2&gt;
  
  
  Part 4: Deriving the Gradient in Simple Arithmetic
&lt;/h2&gt;

&lt;p&gt;Start with one house.&lt;br&gt;
Its squared error is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(y − (w×x + b))²
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;e = y − (w×x + b)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then squared error = e².&lt;/p&gt;

&lt;p&gt;How does e² change when w changes slightly?&lt;/p&gt;

&lt;p&gt;A basic math rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;change in (e²) = 2 × e × (change in e)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What is the change in e when w increases?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;e = y − w×x − b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If w increases, w×x increases (house sizes x are positive), so e decreases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;change in e = −x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;change in (e²) = 2 × e × (−x) = −2 × e × x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is for one house.&lt;/p&gt;

&lt;p&gt;For all houses, we sum and average:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gradient_w = (1/N) × Σ [ −2 × (y − ŷ) × x ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gradient for &lt;strong&gt;b&lt;/strong&gt; is easier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;change in e when b changes = −1
gradient_b = (1/N) × Σ [ −2 × (y − ŷ) ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the standard gradient formulas used whenever linear regression is trained by gradient descent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 5: Final Gradient Formulas
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gradient_w = −(2/N) × Σ [ (y − ŷ) × x ]
gradient_b = −(2/N) × Σ (y − ŷ)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To reduce MSE, we move &lt;strong&gt;opposite&lt;/strong&gt; the gradient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w ← w − learning_rate × gradient_w
b ← b − learning_rate × gradient_b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, expanding the negatives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w ← w + learning_rate × (2/N) × Σ [ (y − ŷ) × x ]
b ← b + learning_rate × (2/N) × Σ (y − ŷ)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the complete update rule used in gradient descent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 6: Full Example Done Completely by Hand
&lt;/h2&gt;

&lt;p&gt;Our dataset:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x&lt;/th&gt;
&lt;th&gt;y&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start with a very poor guess:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = 0
b = 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1: Predictions
&lt;/h3&gt;

&lt;p&gt;All predictions are zero:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ŷ1 = 0
ŷ2 = 0
ŷ3 = 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1.5, 2.2, 2.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Update w
&lt;/h3&gt;

&lt;p&gt;Compute average of (error × x):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(1.5×1 + 2.2×2 + 2.9×3) / 3
= (1.5 + 4.4 + 8.7) / 3
= 14.6 / 3
≈ 4.867
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With learning rate = 0.1 and factor 2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new w ≈ 0 + 0.1 × 2 × 4.867 ≈ 0.973
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Update b
&lt;/h3&gt;

&lt;p&gt;Average error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(1.5 + 2.2 + 2.9) / 3 = 7.6 / 3 ≈ 2.533
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new b ≈ 0 + 0.1 × 2 × 2.533 ≈ 0.507
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After just one update, parameters jump from (0, 0) → approximately (0.97, 0.51).&lt;br&gt;
This is already much closer to the optimal line.&lt;/p&gt;

&lt;p&gt;Repeat for a few hundred steps and the updates stabilize.&lt;br&gt;
Those final w and b define the best-fitting straight line for the data.&lt;/p&gt;

&lt;p&gt;This is essentially what happens inside a machine learning library when you call &lt;code&gt;.fit()&lt;/code&gt; on a gradient-based model.&lt;/p&gt;
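&lt;p&gt;The whole hand-worked loop fits in a short script (a sketch of a gradient-descent &lt;code&gt;.fit()&lt;/code&gt;; the learning rate matches the example above and the step count is illustrative):&lt;/p&gt;

```python
# Gradient descent on the three-house dataset, exactly as computed by hand
xs = [1, 2, 3]
ys = [1.5, 2.2, 2.9]
w, b = 0.0, 0.0
lr = 0.1

for step in range(1000):
    n = len(xs)
    errors = [y - (w * x + b) for x, y in zip(xs, ys)]
    w += lr * (2 / n) * sum(e * x for e, x in zip(errors, xs))
    b += lr * (2 / n) * sum(errors)
    if step == 0:
        print(f"after 1 step: w ≈ {w:.3f}, b ≈ {b:.3f}")  # matches the hand calculation

print(f"final: w ≈ {w:.2f}, b ≈ {b:.2f}")
```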


&lt;h2&gt;
  
  
  Summary in Plain Language
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Start with random w and b.&lt;/li&gt;
&lt;li&gt;Compute predictions for all houses.&lt;/li&gt;
&lt;li&gt;Compute errors (y − ŷ).&lt;/li&gt;
&lt;li&gt;To update &lt;strong&gt;w&lt;/strong&gt;: multiply each error by its x, average them, and nudge w in that direction.&lt;/li&gt;
&lt;li&gt;To update &lt;strong&gt;b&lt;/strong&gt;: average all errors and nudge b in that direction.&lt;/li&gt;
&lt;li&gt;Repeat until nothing changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There is no hidden machinery.&lt;br&gt;
Only simple arithmetic repeated many times.&lt;/p&gt;


&lt;h1&gt;
  
  
  Two Ways to Solve Linear Regression: Gradient Descent vs the Closed-Form Formula
&lt;/h1&gt;

&lt;p&gt;Now that gradient descent makes sense, it is important to know that there is actually another method to compute the best line. In fact, for simple linear regression, there is a formula that gives the perfect answer in one step with no looping at all.&lt;/p&gt;

&lt;p&gt;There are two approaches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Gradient Descent → takes many small steps, works for any model&lt;/li&gt;
&lt;li&gt;Closed-Form Solution (Ordinary Least Squares) → gives exact w and b instantly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For basic linear regression, the closed-form method is faster, simpler, and exact.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Closed-Form Formula (One-Step Solution)
&lt;/h2&gt;

&lt;p&gt;For simple linear regression with one feature x, the optimal slope and intercept are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = Σ[(x − x_mean)(y − y_mean)] / Σ[(x − x_mean)²]
b = y_mean − w × x_mean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This computes the best-fit line in one calculation.&lt;/p&gt;
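&lt;p&gt;The same two formulas in Python (a minimal sketch using the dataset from this article):&lt;/p&gt;

```python
# Closed-form (OLS) slope and intercept for one feature
def fit_closed_form(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    w = num / den
    b = y_mean - w * x_mean
    return w, b

w, b = fit_closed_form([1, 2, 3], [1.5, 2.2, 2.9])
print(w, b)  # ≈ 0.7 and 0.8, with no iteration
```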




&lt;h2&gt;
  
  
  Applying the Formula to Our Example
&lt;/h2&gt;

&lt;p&gt;Dataset:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x&lt;/th&gt;
&lt;th&gt;y&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 1: Compute Means
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x_mean = (1 + 2 + 3) / 3 = 2
y_mean = (1.5 + 2.2 + 2.9) / 3 = 2.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Build the Deviation Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x&lt;/th&gt;
&lt;th&gt;y&lt;/th&gt;
&lt;th&gt;x−2&lt;/th&gt;
&lt;th&gt;y−2.2&lt;/th&gt;
&lt;th&gt;(x−2)(y−2.2)&lt;/th&gt;
&lt;th&gt;(x−2)²&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;-0.7&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.9&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 3: Sum the Required Columns
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Σ(x−mean)(y−mean) = 1.4
Σ(x−mean)² = 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Apply the Formula
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = 1.4 / 2 = 0.7
b = 2.2 − 0.7 × 2 = 0.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Final model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;price = 0.7 × size + 0.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Check Against Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;x = 1 → 0.7 + 0.8 = 1.5&lt;/li&gt;
&lt;li&gt;x = 2 → 1.4 + 0.8 = 2.2&lt;/li&gt;
&lt;li&gt;x = 3 → 2.1 + 0.8 = 2.9&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The line fits all three points exactly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Formula Works (Intuition)
&lt;/h2&gt;

&lt;p&gt;The numerator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Σ[(x − x_mean)(y − y_mean)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;measures how much x and y move together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If x is above average and y is also above average → positive contribution&lt;/li&gt;
&lt;li&gt;If they move in opposite directions → negative contribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The denominator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Σ[(x − x_mean)²]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;measures how much x varies on its own.&lt;/p&gt;

&lt;p&gt;Thus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = (movement together) / (movement of x alone)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the slope is fixed, the intercept b simply shifts the line so it passes through the point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(x_mean, y_mean)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Matrix Version (for multiple features)
&lt;/h2&gt;

&lt;p&gt;In general linear algebra form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w = (XᵀX)⁻¹ Xᵀ y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the Ordinary Least Squares (OLS) solution, where X carries a column of ones for the intercept.&lt;br&gt;
For one feature, it reduces exactly to the two formulas we computed.&lt;/p&gt;
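&lt;p&gt;With NumPy this is a few lines (a sketch using &lt;code&gt;numpy.linalg.lstsq&lt;/code&gt;, which solves the least-squares problem without explicitly inverting XᵀX):&lt;/p&gt;

```python
import numpy as np

# Design matrix with a column of ones for the intercept
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 2.2, 2.9])
X = np.column_stack([np.ones_like(x), x])   # columns: [1, x]

# Least-squares solution of X·θ ≈ y; θ = [b, w]
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # ≈ [0.8, 0.7]
```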


&lt;h2&gt;
  
  
  Summary: Gradient Descent vs Closed-Form
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Steps Required&lt;/th&gt;
&lt;th&gt;Loop Needed&lt;/th&gt;
&lt;th&gt;Exact?&lt;/th&gt;
&lt;th&gt;Works for Huge Data?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gradient Descent&lt;/td&gt;
&lt;td&gt;Many small updates&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Approximate&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Closed-Form OLS&lt;/td&gt;
&lt;td&gt;One computation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Exact&lt;/td&gt;
&lt;td&gt;Only if data fits RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For simple linear regression, the closed-form method is ideal.&lt;br&gt;
For complex models (neural networks, large datasets, many parameters), gradient descent is required.&lt;/p&gt;


&lt;h1&gt;
  
  
  Why We Still Use Gradient Descent When a Perfect Closed-Form Formula Exists
&lt;/h1&gt;

&lt;p&gt;After learning the closed-form solution for linear regression, it is natural to wonder:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“If we can compute w and b instantly, why do we ever bother with gradient descent?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The short answer:&lt;br&gt;
The closed-form formula is excellent for small problems, but it breaks down completely once the model or dataset becomes large.&lt;br&gt;
Gradient descent, in contrast, scales to extremely large modern machine-learning problems.&lt;/p&gt;


&lt;h2&gt;
  
  
  Comparison: Closed-Form vs Gradient Descent
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Closed-Form (OLS)&lt;/th&gt;
&lt;th&gt;Gradient Descent&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 feature, 100 data points&lt;/td&gt;
&lt;td&gt;Instant, exact&lt;/td&gt;
&lt;td&gt;Works but slower&lt;/td&gt;
&lt;td&gt;Closed-form&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10 features, 1M data points&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;td&gt;Both fine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000+ features&lt;/td&gt;
&lt;td&gt;Must compute a large XᵀX matrix → high memory&lt;/td&gt;
&lt;td&gt;Computes updates step-by-step → efficient&lt;/td&gt;
&lt;td&gt;Gradient descent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,000+ features (e.g., text embeddings)&lt;/td&gt;
&lt;td&gt;XᵀX is enormous → cannot fit in RAM&lt;/td&gt;
&lt;td&gt;Still works with manageable memory&lt;/td&gt;
&lt;td&gt;Gradient descent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Neural networks (millions/billions of parameters)&lt;/td&gt;
&lt;td&gt;No closed-form solution exists&lt;/td&gt;
&lt;td&gt;Designed to optimize such models&lt;/td&gt;
&lt;td&gt;Gradient descent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming/online data&lt;/td&gt;
&lt;td&gt;Must recompute from scratch&lt;/td&gt;
&lt;td&gt;Updates incrementally&lt;/td&gt;
&lt;td&gt;Gradient descent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add regularization (L1/L2)&lt;/td&gt;
&lt;td&gt;Closed-form becomes more complex&lt;/td&gt;
&lt;td&gt;Gradient descent only needs a small modification&lt;/td&gt;
&lt;td&gt;Gradient descent (usually simpler)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Why Closed-Form Breaks in Real Life
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Example: Large tabular dataset
&lt;/h3&gt;

&lt;p&gt;A housing dataset with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,000,000 houses&lt;/li&gt;
&lt;li&gt;500 features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;XᵀX becomes a &lt;strong&gt;500 × 500&lt;/strong&gt; matrix → manageable.&lt;/p&gt;

&lt;p&gt;But modern machine learning rarely has 500 features.&lt;br&gt;
Instead, consider:&lt;/p&gt;
&lt;h3&gt;
  
  
  Example: Image or text models
&lt;/h3&gt;

&lt;p&gt;A feature vector might have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100,000 dimensions (e.g., bag-of-words, embeddings)&lt;/li&gt;
&lt;li&gt;or millions of parameters (neural networks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The closed-form formula requires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(XᵀX)⁻¹
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But XᵀX becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a 100,000 × 100,000 matrix (10 billion entries, roughly 80 GB in float64)&lt;/li&gt;
&lt;li&gt;impractical to store, let alone invert, on ordinary hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gradient descent does &lt;strong&gt;not&lt;/strong&gt; require any matrix inversion.&lt;br&gt;
It only needs to compute simple operations on the dataset in batches.&lt;/p&gt;

&lt;p&gt;This is why every modern machine learning framework—TensorFlow, PyTorch, JAX—uses gradient-based optimization.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Simple Way to Remember It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Closed-form (OLS):&lt;/strong&gt;&lt;br&gt;
Works perfectly, but only for small, simple linear models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gradient descent:&lt;/strong&gt;&lt;br&gt;
Works for linear models, logistic regression, deep learning, transformers, large-scale systems—essentially everything.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;scikit-learn’s &lt;code&gt;LinearRegression()&lt;/code&gt; uses a direct least-squares solver (no gradient descent) because typical tabular datasets are small enough.&lt;br&gt;
TensorFlow and PyTorch use gradient-based methods because they target large, complex models.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>beginners</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Automate Your Data Workflows: Why Pressing the Download Button Isn’t Always Enough!</title>
      <dc:creator>Dhanvina N</dc:creator>
      <pubDate>Sun, 25 Aug 2024 14:22:57 +0000</pubDate>
      <link>https://dev.to/ndhanvina/automate-your-data-workflows-why-pressing-download-button-isnt-always-enough-1cj7</link>
      <guid>https://dev.to/ndhanvina/automate-your-data-workflows-why-pressing-download-button-isnt-always-enough-1cj7</guid>
      <description>&lt;p&gt;Ever found yourself downloading datasets from Kaggle or other online sources, only to get bogged down by repetitive tasks like data cleaning and splitting? Imagine if you could automate these processes, making data management as breezy as a click of a button! That’s where Apache Airflow comes into play. Let’s dive into how you can set up an automated pipeline for handling massive datasets, complete with a NAS (Network-Attached Storage) for seamless data management. 🚀&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Automate?
&lt;/h3&gt;

&lt;p&gt;Before we dive into the nitty-gritty, let’s explore why automating data workflows can save you time and sanity:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduce Repetition:&lt;/strong&gt; Automate repetitive tasks to focus on more exciting aspects of your project.&lt;br&gt;
&lt;strong&gt;Increase Efficiency:&lt;/strong&gt; Quickly handle updates or new data without manual intervention.&lt;br&gt;
&lt;strong&gt;Ensure Consistency:&lt;/strong&gt; Maintain consistent data processing standards every time.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step-by-Step Guide to Your Data Pipeline
&lt;/h3&gt;

&lt;p&gt;Let’s walk through setting up a data pipeline using Apache Airflow, focusing on automating dataset downloads, data cleaning, and splitting—all while leveraging your NAS for storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File structure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/your_project/
│
├── dags/
│   └── kaggle_data_pipeline.py      # Airflow DAG script for automation
│
├── scripts/
│   ├── cleaning_script.py           # Data cleaning script
│   └── split_script.py              # Data splitting script
│
├── data/
│   ├── raw/                        # Raw dataset files
│   ├── processed/                 # Cleaned and split dataset files
│   └── external/                  # External files or archives
│
├── airflow_config/
│   └── airflow.cfg                 # Airflow configuration file (if customized)
│
├── Dockerfile                       # Optional: Dockerfile for containerizing
├── docker-compose.yml               # Optional: Docker Compose configuration
├── requirements.txt                # Python dependencies for your project
└── README.md                       # Project documentation

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;1. Set Up Apache Airflow&lt;/strong&gt;&lt;br&gt;
First things first, let’s get Airflow up and running.&lt;/p&gt;

&lt;p&gt;Install Apache Airflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create and activate a virtual environment
python3 -m venv airflow_env
source airflow_env/bin/activate

# Install Airflow
pip install apache-airflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the Airflow Database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;airflow db init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an Admin User:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start Airflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;airflow webserver --port 8080
airflow scheduler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access Airflow UI: Go to &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt; in your web browser.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Connect Your NAS&lt;/strong&gt;&lt;br&gt;
Mount NAS Storage: Ensure your NAS is mounted on your system. For instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount -t nfs &amp;lt;NAS_IP&amp;gt;:/path/to/nas /mnt/nas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Create Your Data Pipeline DAG&lt;/strong&gt;&lt;br&gt;
Create a Python file (e.g., kaggle_data_pipeline.py) in the ~/airflow/dags directory with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import os
import subprocess

# Default arguments
default_args = {
    'owner': 'your_name',
    'depends_on_past': False,
    'start_date': datetime(2024, 8, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Define the DAG
dag = DAG(
    'kaggle_data_pipeline',
    default_args=default_args,
    description='Automated Pipeline for Kaggle Datasets',
    schedule_interval=timedelta(days=1),
)

# Define Python functions for each task
def download_data(**kwargs):
    # Replace with your Kaggle dataset URL and credentials
    subprocess.run(["kaggle", "datasets", "download", "-d", "&amp;lt;DATASET_ID&amp;gt;", "-p", "/mnt/nas/data"])

def extract_data(**kwargs):
    # Extract data if it's in a compressed format
    subprocess.run(["unzip", "/mnt/nas/data/dataset.zip", "-d", "/mnt/nas/data"])

def clean_data(**kwargs):
    # Example cleaning script call
    subprocess.run(["python", "/path/to/cleaning_script.py", "--input", "/mnt/nas/data"])

def split_data(**kwargs):
    # Example splitting script call
    subprocess.run(["python", "/path/to/split_script.py", "--input", "/mnt/nas/data"])

# Define tasks
download_task = PythonOperator(
    task_id='download_data',
    python_callable=download_data,
    dag=dag,
)

extract_task = PythonOperator(
    task_id='extract_data',
    python_callable=extract_data,
    dag=dag,
)

clean_task = PythonOperator(
    task_id='clean_data',
    python_callable=clean_data,
    dag=dag,
)

split_task = PythonOperator(
    task_id='split_data',
    python_callable=split_data,
    dag=dag,
)

# Set task dependencies
download_task &amp;gt;&amp;gt; extract_task &amp;gt;&amp;gt; clean_task &amp;gt;&amp;gt; split_task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create Data Processing Scripts&lt;/strong&gt;&lt;br&gt;
scripts/cleaning_script.py&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import argparse
import os

def clean_data(input_path):
    # Implement your data cleaning logic here
    print(f"Cleaning data in {input_path}...")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', required=True, help="Path to the data directory")
    args = parser.parse_args()

    clean_data(args.input)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;scripts/split_script.py&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import argparse
import os

def split_data(input_path):
    # Implement your data splitting logic here
    print(f"Splitting data in {input_path}...")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', required=True, help="Path to the data directory")
    args = parser.parse_args()

    split_data(args.input)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dockerize Your Setup&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM apache/airflow:2.5.1

USER root

# Install any additional packages
RUN pip install kaggle

# Copy DAGs and scripts
COPY dags/ /opt/airflow/dags/
COPY scripts/ /opt/airflow/scripts/

USER airflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;docker-compose.yml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3'
services:
  airflow-webserver:
    image: apache/airflow:2.5.1
    ports:
      - "8080:8080"
    environment:
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////opt/airflow/airflow.db
      - AIRFLOW__CORE__EXECUTOR=SequentialExecutor  # SQLite supports only the SequentialExecutor
    volumes:
      - ./dags:/opt/airflow/dags
      - ./scripts:/opt/airflow/scripts
    command: webserver

  airflow-scheduler:
    image: apache/airflow:2.5.1
    environment:
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////opt/airflow/airflow.db
      - AIRFLOW__CORE__EXECUTOR=SequentialExecutor  # SQLite supports only the SequentialExecutor
    volumes:
      - ./dags:/opt/airflow/dags
      - ./scripts:/opt/airflow/scripts
    command: scheduler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run Your Pipeline&lt;br&gt;
Start Airflow Services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitor Pipeline:&lt;br&gt;
Access the Airflow UI at &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt; to trigger and monitor the pipeline.&lt;/p&gt;



&lt;p&gt;GitHub Actions Setup&lt;br&gt;
GitHub Actions allows you to automate workflows directly within your GitHub repository. Here’s how you can set it up to run your Dockerized pipeline:&lt;/p&gt;

&lt;p&gt;Create GitHub Actions Workflow&lt;br&gt;
Create a .github/workflows Directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p .github/workflows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a Workflow File:&lt;/p&gt;

&lt;p&gt;.github/workflows/ci-cd.yml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: CI/CD Pipeline

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: your_dockerhub_username/your_image_name:latest

      - name: Run Docker container
        run: |
          docker run -d --name airflow_container -p 8080:8080 your_dockerhub_username/your_image_name:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. What’s Happening Here?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;download_data: Automatically downloads the dataset from Kaggle to your NAS.&lt;/li&gt;
&lt;li&gt;extract_data: Unzips the dataset if needed.&lt;/li&gt;
&lt;li&gt;clean_data: Cleans the data using your custom script.&lt;/li&gt;
&lt;li&gt;split_data: Splits the data into training, validation, and testing sets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Run and Monitor Your Pipeline&lt;/strong&gt;&lt;br&gt;
Access the Airflow UI to manually trigger the DAG or monitor its execution.&lt;br&gt;
Check Logs for detailed information on each task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Optimize and Scale&lt;/strong&gt;&lt;br&gt;
As your dataset grows or your needs change:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adjust Task Parallelism: Configure Airflow to handle multiple tasks concurrently.&lt;/li&gt;
&lt;li&gt;Enhance Data Cleaning: Update your cleaning and splitting scripts as needed.&lt;/li&gt;
&lt;li&gt;Add More Tasks: Integrate additional data processing steps into your pipeline.&lt;/li&gt;
&lt;/ul&gt;
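&lt;p&gt;For the parallelism point, the relevant knobs live in &lt;code&gt;airflow.cfg&lt;/code&gt; (or matching environment variables); the values below are illustrative, not recommendations:&lt;/p&gt;

```ini
[core]
# Maximum task instances running across the whole Airflow installation
parallelism = 32
# Maximum task instances allowed to run at once per DAG
max_active_tasks_per_dag = 16
# How many runs of a single DAG may be active at once
max_active_runs_per_dag = 1
```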

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Automating your data workflows with Apache Airflow can transform how you manage and process datasets. From downloading and cleaning to splitting and scaling, Airflow’s orchestration capabilities streamline your data pipeline, allowing you to focus on what really matters—analyzing and deriving insights from your data.&lt;/p&gt;

&lt;p&gt;So, set up your pipeline today, kick back, and let Airflow do the heavy lifting!&lt;/p&gt;

</description>
      <category>dataops</category>
      <category>aiops</category>
      <category>mlops</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
