<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Viswa M</title>
    <description>The latest articles on DEV Community by Viswa M (@viswa_m_09).</description>
    <link>https://dev.to/viswa_m_09</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3824317%2Fcdcb3351-6bbf-4ac6-a04b-e538dc6e9d79.png</url>
      <title>DEV Community: Viswa M</title>
      <link>https://dev.to/viswa_m_09</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/viswa_m_09"/>
    <language>en</language>
    <item>
      <title>Understanding a Tiny Two‑Layer Neural Network that Learns XOR</title>
      <dc:creator>Viswa M</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:48:40 +0000</pubDate>
      <link>https://dev.to/viswa_m_09/understanding-a-tiny-two-layer-neural-network-that-learns-xor-19f9</link>
      <guid>https://dev.to/viswa_m_09/understanding-a-tiny-two-layer-neural-network-that-learns-xor-19f9</guid>
      <description>&lt;h2&gt;
  
  
  Tiny Two‑Layer Neural Network that Learns XOR
&lt;/h2&gt;





&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;The exclusive‑or (XOR) problem is a classic benchmark for neural networks. It is easy to describe, but a single linear neuron cannot solve it. In this post we walk through a compact NumPy implementation of a two‑layer (one hidden layer) network that learns the XOR truth table from scratch. You will see how the data are prepared, how the parameters are initialized, how the forward and backward passes are performed, and why the hidden layer is essential.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Program Does – In a Nutshell
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Builds a tiny feed‑forward network with one hidden layer of four sigmoid units.
&lt;/li&gt;
&lt;li&gt;Trains it on the four possible binary inputs of XOR using gradient descent.
&lt;/li&gt;
&lt;li&gt;After 10 000 epochs the network’s predictions are close to the target values (≈ 0 for &lt;strong&gt;False&lt;/strong&gt;, ≈ 1 for &lt;strong&gt;True&lt;/strong&gt;).
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The printed output after training looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[[0.02]
 [0.97]
 [0.96]
 [0.03]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These correspond to the XOR results for the inputs (0,0), (0,1), (1,0) and (1,1).&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Preparation – The XOR Truth Table
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x1  x2  XOR
0   0   0
0   1   1
1   0   1
1   1   0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code the inputs are stored in a &lt;code&gt;4 × 2&lt;/code&gt; NumPy array &lt;code&gt;X&lt;/code&gt; and the targets in a &lt;code&gt;4 × 1&lt;/code&gt; array &lt;code&gt;y&lt;/code&gt;.&lt;/p&gt;
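&lt;p&gt;A minimal sketch of that setup (the array contents follow the truth table above):&lt;/p&gt;

```python
import numpy as np

# XOR inputs: one row per example, one column per input bit
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]], dtype=float)            # shape (4, 2)

# XOR targets as a column vector
y = np.array([[0], [1], [1], [0]], dtype=float)  # shape (4, 1)

print(X.shape, y.shape)   # (4, 2) (4, 1)
```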




&lt;h2&gt;
  
  
  Parameter Initialization – Weights and Biases
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;W1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# input → hidden (2 inputs, 4 hidden units)
&lt;/span&gt;&lt;span class="n"&gt;B1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;        &lt;span class="c1"&gt;# bias for each hidden unit
&lt;/span&gt;
&lt;span class="n"&gt;W2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# hidden → output (4 hidden, 1 output)
&lt;/span&gt;&lt;span class="n"&gt;B2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;        &lt;span class="c1"&gt;# bias for the output unit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Random weights break symmetry; zero biases are a simple, common choice.&lt;/p&gt;
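&lt;p&gt;If you want runs to be repeatable, you can seed NumPy before drawing the weights (optional, not in the original code; the seed value here is arbitrary):&lt;/p&gt;

```python
import numpy as np

np.random.seed(42)          # fix the RNG so the initial weights are repeatable

W1 = np.random.randn(2, 4)  # input -> hidden (2 inputs, 4 hidden units)
B1 = np.zeros((1, 4))       # bias for each hidden unit
W2 = np.random.randn(4, 1)  # hidden -> output (4 hidden, 1 output)
B2 = np.zeros((1, 1))       # bias for the output unit

print(W1.shape, W2.shape)   # (2, 4) (4, 1)
```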




&lt;h2&gt;
  
  
  Hyper‑parameters – Epochs and Learning Rate
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;   &lt;span class="c1"&gt;# number of full passes over the training set
&lt;/span&gt;&lt;span class="n"&gt;lr&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;    &lt;span class="c1"&gt;# step size for gradient descent
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More epochs give the network time to converge; the learning rate controls how large each update is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sigmoid Activation and Its Derivative
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sigmoid_derivative&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# a = sigmoid(z); derivative = a * (1 - a)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sigmoid maps any real number to the interval &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C%25E2%2580%25AF1%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C%25E2%2580%25AF1%2529" alt="(0, 1)" width="42" height="13"&gt;&lt;/a&gt;. Its derivative can be expressed directly in terms of the activation, which keeps the back‑propagation code concise.&lt;/p&gt;
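&lt;p&gt;The identity σ′(z) = σ(z)(1 − σ(z)) is easy to confirm numerically with a central finite difference (a quick sanity check, not part of the original program):&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(a):
    # a is an activation, i.e. a = sigmoid(z)
    return a * (1 - a)

z = 0.7
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_derivative(sigmoid(z))
print(abs(numeric - analytic))   # tiny, on the order of 1e-11 or smaller
```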




&lt;h2&gt;
  
  
  Training Loop – Forward Pass, Loss, Back‑Propagation, and Gradient Descent
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_network&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;                     &lt;span class="c1"&gt;# number of training examples (4)
&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# ---- forward pass -------------------------------------------------
&lt;/span&gt;        &lt;span class="n"&gt;Z1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;B1&lt;/span&gt;
        &lt;span class="n"&gt;A1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Z1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;Z2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;B2&lt;/span&gt;
        &lt;span class="n"&gt;A2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Z2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="c1"&gt;# predictions
&lt;/span&gt;
        &lt;span class="c1"&gt;# ---- error at output (MSE loss) -----------------------------------
&lt;/span&gt;        &lt;span class="n"&gt;DZ2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;                    &lt;span class="c1"&gt;# ∂L/∂Z2
&lt;/span&gt;
        &lt;span class="c1"&gt;# ---- gradients for output layer ------------------------------------
&lt;/span&gt;        &lt;span class="n"&gt;DW2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DZ2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;DB2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DZ2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# ---- back‑propagation to hidden layer -------------------------------
&lt;/span&gt;        &lt;span class="n"&gt;DA1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DZ2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;DZ1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DA1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;sigmoid_derivative&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;DW1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DZ1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;DB1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DZ1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# ---- gradient‑descent update ----------------------------------------
&lt;/span&gt;        &lt;span class="n"&gt;W2&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;DW2&lt;/span&gt;
        &lt;span class="n"&gt;B2&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;DB2&lt;/span&gt;
        &lt;span class="n"&gt;W1&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;DW1&lt;/span&gt;
        &lt;span class="n"&gt;B1&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;DB1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What the Loop Does
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Forward pass&lt;/strong&gt; – computes hidden activations &lt;code&gt;A1&lt;/code&gt; and final output &lt;code&gt;A2&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loss&lt;/strong&gt; – mean‑squared error; its gradient w.r.t. the output activation &lt;code&gt;A2&lt;/code&gt; is &lt;code&gt;A2 - y&lt;/code&gt;, which the code uses directly as the error term &lt;code&gt;DZ2&lt;/code&gt; (the σ′ factor is dropped, as discussed in the derivation section).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Back‑propagation&lt;/strong&gt; – uses the chain rule to obtain gradients for every parameter.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient descent&lt;/strong&gt; – moves each weight and bias opposite to its gradient, scaled by &lt;code&gt;lr&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All operations are vectorised, so the training runs without explicit Python loops over the four examples.&lt;/p&gt;
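&lt;p&gt;For a rough sense of scale: the mean‑squared error of the final predictions shown earlier is already well below one percent (the prediction values are copied from the output above):&lt;/p&gt;

```python
import numpy as np

y  = np.array([[0], [1], [1], [0]], dtype=float)
A2 = np.array([[0.02], [0.97], [0.96], [0.03]])  # trained predictions from the post

loss = np.mean((A2 - y) ** 2)
print(loss)   # ≈ 0.00095
```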




&lt;h2&gt;
  
  
  Mathematical Derivation of the Gradients
&lt;/h2&gt;

&lt;p&gt;Written in matrix form for the whole batch of examples, the network equations are  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cbegin%257Baligned%257D%250AZ%255E%257B%25281%2529%257D%2520%2526%253D%2520XW%255E%257B%25281%2529%257D%2520%252B%2520b%255E%257B%25281%2529%257D%2520%255C%255C%250AA%255E%257B%25281%2529%257D%2520%2526%253D%2520%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25281%2529%257D%255Cbigr%2529%2520%255C%255C%250AZ%255E%257B%25282%2529%257D%2520%2526%253D%2520A%255E%257B%25281%2529%257DW%255E%257B%25282%2529%257D%2520%252B%2520b%255E%257B%25282%2529%257D%2520%255C%255C%250A%255Chat%257By%257D%253DA%255E%257B%25282%2529%257D%2520%2526%253D%2520%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25282%2529%257D%255Cbigr%2529%250A%255Cend%257Baligned%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cbegin%257Baligned%257D%250AZ%255E%257B%25281%2529%257D%2520%2526%253D%2520XW%255E%257B%25281%2529%257D%2520%252B%2520b%255E%257B%25281%2529%257D%2520%255C%255C%250AA%255E%257B%25281%2529%257D%2520%2526%253D%2520%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25281%2529%257D%255Cbigr%2529%2520%255C%255C%250AZ%255E%257B%25282%2529%257D%2520%2526%253D%2520A%255E%257B%25281%2529%257DW%255E%257B%25282%2529%257D%2520%252B%2520b%255E%257B%25282%2529%257D%2520%255C%255C%250A%255Chat%257By%257D%253DA%255E%257B%25282%2529%257D%2520%2526%253D%2520%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25282%2529%257D%255Cbigr%2529%250A%255Cend%257Baligned%257D" alt="math" width="147" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The loss (mean‑squared‑error) is  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cmathcal%257BL%257D%253D%2520%255Cfrac%257B1%257D%257B2n%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257D%2528%255Chat%257By%257D_i-y_i%2529%255E2%2520." class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cmathcal%257BL%257D%253D%2520%255Cfrac%257B1%257D%257B2n%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257D%2528%255Chat%257By%257D_i-y_i%2529%255E2%2520." alt="math" width="120" height="36"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Derivative w.r.t. the output activation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520A%255E%257B%25282%2529%257D%257D%2520%253D%2520%255Chat%257By%257D%2520-%2520y%2520." class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520A%255E%257B%25282%2529%257D%257D%2520%253D%2520%255Chat%257By%257D%2520-%2520y%2520." alt="math" width="82" height="28"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Csigma%2527%2528z%2529%253D%255Csigma%2528z%2529%25281-%255Csigma%2528z%2529%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Csigma%2527%2528z%2529%253D%255Csigma%2528z%2529%25281-%255Csigma%2528z%2529%2529" alt="\sigma'(z)=\sigma(z)(1-\sigma(z))" width="127" height="14"&gt;&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D%257D%2520%253D%2520%2528%255Chat%257By%257D-y%2529%255Codot%250A%2520%2520%2520%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25282%2529%257D%255Cbigr%2529%255Cbigl%25281-%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25282%2529%257D%255Cbigr%2529%255Cbigr%2529%2520." class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D%257D%2520%253D%2520%2528%255Chat%257By%257D-y%2529%255Codot%250A%2520%2520%2520%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25282%2529%257D%255Cbigr%2529%255Cbigl%25281-%255Csigma%255C%2521%255Cbigl%2528Z%255E%257B%25282%2529%257D%255Cbigr%2529%255Cbigr%2529%2520." alt="math" width="226" height="28"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the code the factor &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Csigma%2527%2528Z%255E%257B%25282%2529%257D%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Csigma%2527%2528Z%255E%257B%25282%2529%257D%2529" alt="\sigma'(Z^{(2)})" width="42" height="15"&gt;&lt;/a&gt; is omitted and &lt;code&gt;DZ2 = A2 - y&lt;/code&gt; is used directly. Strictly speaking, &lt;code&gt;A2 - y&lt;/code&gt; is the exact gradient of the binary cross‑entropy loss with respect to &lt;code&gt;Z2&lt;/code&gt;, not of the MSE, so the code is best read as following that loss. This is a common and deliberate simplification: the update still pushes each output toward its target, and it avoids the vanishing factor that the sigmoid derivative introduces when the output saturates near 0 or 1.&lt;/p&gt;

&lt;p&gt;Back‑propagating to the hidden layer:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cbegin%257Baligned%257D%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520A%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D%257D%2520W%255E%257B%25282%2529%255Ctop%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520A%255E%257B%25281%2529%257D%257D%2520%255Codot%250A%2520%2520%2520%2520%2520%2520%255Csigma%2527%255C%2521%255Cbigl%2528Z%255E%257B%25281%2529%257D%255Cbigr%2529%2520.%250A%255Cend%257Baligned%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cbegin%257Baligned%257D%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520A%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D%257D%2520W%255E%257B%25282%2529%255Ctop%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520A%255E%257B%25281%2529%257D%257D%2520%255Codot%250A%2520%2520%2520%2520%2520%2520%255Csigma%2527%255C%2521%255Cbigl%2528Z%255E%257B%25281%2529%257D%255Cbigr%2529%2520.%250A%255Cend%257Baligned%257D" alt="math" width="145" height="60"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Averaged over the batch, the gradients for the weight matrices and biases are  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cbegin%257Baligned%257D%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520W%255E%257B%25282%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%2520A%255E%257B%25281%2529%255Ctop%257D%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520b%255E%257B%25282%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%255Csum_i%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D_i%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520W%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%2520X%255E%255Ctop%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25281%2529%257D%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520b%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%255Csum_i%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25281%2529%257D_i%257D%2520.%250A%255Cend%257Baligned%257D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cbegin%257Baligned%257D%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520W%255E%257B%25282%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%2520A%255E%257B%25281%2529%255Ctop%257D%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520b%255E%257B%25282%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%255Csum_i%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25282%2529%257D_i%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520W%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%2520X%255E%255Ctop%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25281%2529%257D%257D%2520%255C%255C%250A%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520b%255E%257B%25281%2529%257D%257D%2520%2526%253D%2520%250A%2520%2520%2520%2520%2520%2520%255Cfrac%257B1%257D%257Bn%257D%255Csum_i%2520%255Cfrac%257B%255Cpartial%255Cmathcal%2520L%257D%257B%255Cpartial%2520Z%255E%257B%25281%2529%257D_i%257D%2520.%250A%255Cend%257Baligned%257D" alt="math" width="126" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These formulas correspond exactly to the NumPy statements in the training loop.&lt;/p&gt;
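&lt;p&gt;You can verify these formulas against finite differences. The sketch below checks the &lt;code&gt;W2&lt;/code&gt; gradient of the MSE loss, with the σ′ factor kept as in the derivation above (the helper names here are illustrative, not from the original code):&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])
W1, B1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, B2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
n = X.shape[0]

def loss(W2_):
    # forward pass with a candidate W2; MSE with the 1/(2n) convention
    A1 = sigmoid(X @ W1 + B1)
    A2 = sigmoid(A1 @ W2_ + B2)
    return np.sum((A2 - y) ** 2) / (2 * n)

# analytic gradient, sigma' factor included as in the derivation
A1 = sigmoid(X @ W1 + B1)
A2 = sigmoid(A1 @ W2 + B2)
DZ2 = (A2 - y) * A2 * (1 - A2)
DW2 = (1 / n) * A1.T @ DZ2

# central finite differences, one entry of W2 at a time
eps = 1e-6
numeric = np.zeros_like(W2)
for i in range(W2.shape[0]):
    Wp, Wm = W2.copy(), W2.copy()
    Wp[i, 0] += eps
    Wm[i, 0] -= eps
    numeric[i, 0] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(numeric - DW2)))   # very small: the formulas agree
```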




&lt;h2&gt;
  
  
  Training the Network
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_network&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 10 000 passes the parameters have been adjusted so that the network produces a high confidence (~1) for the true XOR cases and a low confidence (~0) for the false cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Forward Pass – Inspecting the Predictions
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Z1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;B1&lt;/span&gt;
&lt;span class="n"&gt;A1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Z1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Z2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;B2&lt;/span&gt;
&lt;span class="n"&gt;A2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Z2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typical output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[[0.02]
 [0.97]
 [0.96]
 [0.03]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Interpretation&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Network output&lt;/th&gt;
&lt;th&gt;XOR truth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;(0,0)&lt;/td&gt;
&lt;td&gt;0.02 → &lt;strong&gt;False&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(0,1)&lt;/td&gt;
&lt;td&gt;0.97 → &lt;strong&gt;True&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(1,0)&lt;/td&gt;
&lt;td&gt;0.96 → &lt;strong&gt;True&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(1,1)&lt;/td&gt;
&lt;td&gt;0.03 → &lt;strong&gt;False&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The network has successfully learned the XOR mapping.&lt;/p&gt;
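&lt;p&gt;To turn these soft probabilities into hard 0/1 predictions, a common convention (an add‑on, not shown in the article’s code) is to round at the 0.5 threshold:&lt;/p&gt;

```python
import numpy as np

# Outputs copied from the "Typical output" block above
A2 = np.array([[0.02], [0.97], [0.96], [0.03]])

# Round each probability to the nearest class label (threshold at 0.5)
labels = np.round(A2).astype(int)
print(labels.ravel())  # [0 1 1 0], matching the XOR truth table
```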




&lt;h2&gt;
  
  
  Core Machine‑Learning Concepts Explained
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Neuron (linear part)&lt;/strong&gt; – computes a weighted sum of inputs plus a bias.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Activation function&lt;/strong&gt; – adds non‑linearity; sigmoid maps to &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C%25E2%2580%25AF1%2529" alt="(0, 1)" width="42" height="13"&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loss function&lt;/strong&gt; – measures prediction error; we use mean squared error (MSE).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient&lt;/strong&gt; – direction of steepest increase of the loss; we move opposite to it.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Back‑propagation&lt;/strong&gt; – systematic use of the chain rule to compute all gradients efficiently.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient descent&lt;/strong&gt; – updates parameters by a small step proportional to the negative gradient.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Epoch&lt;/strong&gt; – one full sweep over the training set; multiple epochs let the model converge.&lt;/li&gt;
&lt;/ul&gt;
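&lt;p&gt;The gradient‑descent bullet above can be made concrete on a one‑parameter toy problem. The quadratic loss here is an illustrative assumption, not the network’s MSE, but the update rule is identical:&lt;/p&gt;

```python
# Minimal gradient-descent sketch on a toy loss L(w) = (w - 3)**2,
# whose minimum is at w = 3 and whose gradient is dL/dw = 2*(w - 3).
w = 0.0       # initial parameter
lr = 0.1      # learning rate

for _ in range(100):          # 100 update steps
    grad = 2 * (w - 3)        # gradient of the loss at the current w
    w -= lr * grad            # move opposite to the gradient

print(round(w, 4))  # converges to 3.0, the minimum
```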




&lt;h2&gt;
  
  
  Why a Hidden Layer Is Necessary for XOR
&lt;/h2&gt;

&lt;p&gt;A single linear neuron computes  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D%253D%2520%255Csigma%255C%2521%255Cbigl%2528w_1%2520x_1%2520%252B%2520w_2%2520x_2%2520%252B%2520b%255Cbigr%2529%2520." class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D%253D%2520%255Csigma%255C%2521%255Cbigl%2528w_1%2520x_1%2520%252B%2520w_2%2520x_2%2520%252B%2520b%255Cbigr%2529%2520." alt="math" width="138" height="16"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Its decision boundary is a straight line in the &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%2528x_1%252Cx_2%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%2528x_1%252Cx_2%2529" alt="(x_1,x_2)" width="39" height="13"&gt;&lt;/a&gt; plane. XOR’s positive points &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C1%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C1%2529" alt="(0,1)" width="26" height="13"&gt;&lt;/a&gt; and &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25281%252C0%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25281%252C0%2529" alt="(1,0)" width="26" height="13"&gt;&lt;/a&gt; are diagonally opposite; no straight line can separate them from the negative points &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C0%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C0%2529" alt="(0,0)" width="28" height="13"&gt;&lt;/a&gt; and &lt;a 
href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25281%252C1%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25281%252C1%2529" alt="(1,1)" width="28" height="13"&gt;&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;Adding a hidden layer with sigmoid units creates intermediate features such as “&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fx_1%2520%255Cneq%2520x_2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fx_1%2520%255Cneq%2520x_2" alt="x_1 \neq x_2" width="42" height="12"&gt;&lt;/a&gt;”. After training, some hidden neurons fire only for the mixed inputs, enabling the final linear combination to separate the two classes. Thus a single hidden layer gives the network a non‑linear decision surface that can represent XOR, demonstrating the power of depth.&lt;/p&gt;
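&lt;p&gt;A quick way to see this is to plug in hand‑picked (not learned) weights in which one hidden unit behaves like OR and the other like NAND; the output unit then ANDs them, which is exactly XOR. A trained network will find different numbers, but of the same qualitative shape:&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hand-picked illustrative weights (not produced by training):
# hidden unit 1 acts like OR(x1, x2), hidden unit 2 like NAND(x1, x2),
# and the output unit like AND(h1, h2), giving XOR overall.
W1 = np.array([[ 20.0, -20.0],
               [ 20.0, -20.0]])
B1 = np.array([[-10.0,  30.0]])
W2 = np.array([[20.0],
               [20.0]])
B2 = np.array([[-30.0]])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
A1 = sigmoid(np.dot(X, W1) + B1)   # hidden features
A2 = sigmoid(np.dot(A1, W2) + B2)  # final prediction
print(np.round(A2, 3).ravel())  # [0. 1. 1. 0.]
```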




&lt;h2&gt;
  
  
  Complete Code Implementation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# ---- data -------------------------------------------------
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])                     # shape (4, 2)

y = np.array([[0], [1], [1], [0]])   # shape (4, 1)

# ---- parameter initialization -----------------------------
W1 = np.random.randn(2, 4)
B1 = np.zeros((1, 4))

W2 = np.random.randn(4, 1)
B2 = np.zeros((1, 1))

# ---- hyper‑parameters --------------------------------------
epochs = 10000
lr = 0.1

# ---- helper functions --------------------------------------
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(a):
    return a * (1 - a)

# ---- training function -------------------------------------
def train_network(X, y, W1, W2, B1, B2, epochs, lr):
    n = X.shape[0]

    for _ in range(epochs):
        # forward
        Z1 = np.dot(X, W1) + B1
        A1 = sigmoid(Z1)

        Z2 = np.dot(A1, W2) + B2
        A2 = sigmoid(Z2)

        # backward: gradients of the MSE loss via the chain rule
        dZ2 = (A2 - y) * sigmoid_derivative(A2)
        dW2 = np.dot(A1.T, dZ2) / n
        dB2 = np.sum(dZ2, axis=0, keepdims=True) / n

        dZ1 = np.dot(dZ2, W2.T) * sigmoid_derivative(A1)
        dW1 = np.dot(X.T, dZ1) / n
        dB1 = np.sum(dZ1, axis=0, keepdims=True) / n

        # gradient descent updates
        W1 -= lr * dW1
        B1 -= lr * dB1
        W2 -= lr * dW2
        B2 -= lr * dB2

    return W1, B1, W2, B2

# ---- train -------------------------------------------------
W1, B1, W2, B2 = train_network(X, y, W1, W2, B1, B2, epochs, lr)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>xor</category>
      <category>neuralnetwork</category>
      <category>twolayer</category>
      <category>numpy</category>
    </item>
    <item>
      <title>🚀 A Gentle Walk‑Through of Logistic Regression in Python</title>
      <dc:creator>Viswa M</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:43:20 +0000</pubDate>
      <link>https://dev.to/viswa_m_09/a-gentle-walk-through-of-logistic-regression-in-python-1011</link>
      <guid>https://dev.to/viswa_m_09/a-gentle-walk-through-of-logistic-regression-in-python-1011</guid>
      <description>&lt;h1&gt;
  
  
  🚀 A Gentle Walk‑Through of Logistic Regression in Python
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Meta description&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Learn logistic regression in Python from scratch using NumPy. Step‑by‑step guide to build, train, and predict without heavy libraries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
logisticregression, python, numpy, machinelearning, dataanalysis, classification, gradientdescent, crossentropy, sigmoid, tutorial&lt;/p&gt;


&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When you think of &lt;em&gt;classification&lt;/em&gt;, imagine questions like “Is this email spam?” or “Will this customer churn?” The answer is a binary label (&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F1%253D%2520%255Ctext%257Byes%257D%252C%25200%253D%2520%255Ctext%257Bno%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F1%253D%2520%255Ctext%257Byes%257D%252C%25200%253D%2520%255Ctext%257Bno%257D" alt="1= \text{yes}, 0= \text{no}" width="84" height="11"&gt;&lt;/a&gt;). Logistic regression turns a linear model into a probability estimate, allowing us to quantify confidence in the decision. Because it relies on a simple sigmoid function, we can write the whole algorithm in a few lines while preserving intuition.&lt;/p&gt;


&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data&lt;/strong&gt;: features &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cmathbf%257BX%257D" alt="\mathbf{X}" width="10" height="9"&gt;, binary labels &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cmathbf%257By%257D" alt="\mathbf{y}" width="7" height="8"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters&lt;/strong&gt;: a scalar weight &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm" alt="m" width="11" height="6"&gt; and bias &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb" alt="b" width="5" height="9"&gt; for one feature; a vector &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cmathbf%257BW%257D" alt="\mathbf{W}" width="15" height="9"&gt; and bias &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb" alt="b" width="5" height="9"&gt; for many
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training&lt;/strong&gt;: 1 000 epochs of gradient descent
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prediction&lt;/strong&gt;: sigmoid applied to the linear combination of inputs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same equations work whether we have a single feature or several; the only difference is that the weight becomes a vector.&lt;/p&gt;
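&lt;p&gt;The equivalence is easy to verify: treating the scalar weight as a length‑one vector reproduces the same scores. A small sanity check (not part of the original code):&lt;/p&gt;

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6])
m, b = 0.5, -1.0   # arbitrary example parameters

# Scalar form: z = m*x + b
z_scalar = m * X + b

# Vector form: the same computation with a length-1 weight vector
W = np.array([m])
z_vector = np.dot(X.reshape(-1, 1), W) + b

print(np.allclose(z_scalar, z_vector))  # True
```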


&lt;h2&gt;
  
  
  Imports and Data
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;  &lt;span class="c1"&gt;# progress bar
&lt;/span&gt;
&lt;span class="c1"&gt;# One‑dimensional toy data
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Two‑dimensional toy data
&lt;/span&gt;&lt;span class="n"&gt;X2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
               &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
               &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These tiny arrays let us step through the whole learning process without any external data files.&lt;/p&gt;


&lt;h2&gt;
  
  
  Initialisation
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1‑D parameters
&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

&lt;span class="c1"&gt;# 2‑D parameters
&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

&lt;span class="c1"&gt;# Common hyper‑parameters
&lt;/span&gt;&lt;span class="n"&gt;lr&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;   &lt;span class="c1"&gt;# learning rate
&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;   &lt;span class="c1"&gt;# full passes over the dataset
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Learning rate&lt;/strong&gt; (&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Flr" alt="lr" width="9" height="9"&gt;) controls the step size in gradient descent.
Too high, and we overshoot; too low, and training stalls.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Epochs&lt;/strong&gt; is the number of times we loop over the entire dataset.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Sigmoid (Logistic) Function
&lt;/h2&gt;

&lt;p&gt;The sigmoid squashes any real number into the interval &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C%255C%252C1%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C%255C%252C1%2529" alt="(0,\,1)" width="29" height="13"&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fz" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fz" alt="z" width="6" height="6"&gt;&lt;/a&gt; is very negative, the output is close to 0; when &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fz" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fz" alt="z" width="6" height="6"&gt;&lt;/a&gt; is very positive, it approaches 1.&lt;/p&gt;
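&lt;p&gt;A few sample values make the saturation behaviour concrete:&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Sigmoid saturates: very negative z gives ~0, very positive z gives ~1,
# and z = 0 sits exactly at 0.5.
for z in (-10, -2, 0, 2, 10):
    print(f"sigmoid({z}) = {sigmoid(z):.5f}")
```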




&lt;h2&gt;
  
  
  1‑D Logistic Regression
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# number of samples
&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;leave&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Forward pass
&lt;/span&gt;        &lt;span class="n"&gt;z&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Gradients of cross‑entropy loss
&lt;/span&gt;        &lt;span class="n"&gt;dm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Gradient descent updates
&lt;/span&gt;        &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dm&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What the loop does&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compute the linear score &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fz" alt="z" width="6" height="6"&gt;.
&lt;/li&gt;
&lt;li&gt;Convert &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fz" alt="z" width="6" height="6"&gt; into a probability &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D" alt="\hat{y}" width="6" height="12"&gt; with the sigmoid.
&lt;/li&gt;
&lt;li&gt;Calculate how much each parameter should change (&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fdm" alt="dm" width="17" height="9"&gt;, &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fdb" alt="db" width="11" height="9"&gt;).
&lt;/li&gt;
&lt;li&gt;Move the parameters a little toward the minimum.
&lt;/li&gt;
&lt;/ol&gt;
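&lt;p&gt;Working the first iteration by hand (with the toy data above and m = b = 0) shows the mechanics: every initial prediction is sigmoid(0) = 0.5, so the very first step already nudges m in the right direction.&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0, 0, 0, 1, 1, 1])
m, b, lr = 0.0, 0.0, 0.01
n = len(X)

# First iteration, step by step
y_hat = sigmoid(m * X + b)              # all 0.5, since z = 0 everywhere
dm = (1 / n) * np.sum((y_hat - y) * X)  # (3.0 - 7.5) / 6 = -0.75
db = (1 / n) * np.sum(y_hat - y)        # the errors cancel: 0.0
m -= lr * dm                            # 0.0 - 0.01 * (-0.75) = 0.0075
b -= lr * db                            # unchanged
print(round(m, 4), round(b, 4))         # 0.0075 0.0
```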

&lt;p&gt;After the loop, &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm" alt="m" width="11" height="6"&gt;&lt;/a&gt; and &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb" alt="b" width="5" height="9"&gt;&lt;/a&gt; hold the trained model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prediction (1‑D)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;logisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;new_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;
&lt;span class="n"&gt;prob&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;new_x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Probability that x = 9 is class 1:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is a confidence score between 0 and 1, indicating how likely the point belongs to the positive class.&lt;/p&gt;
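&lt;p&gt;To visualise that confidence, we can evaluate the model over a range of inputs. The parameters below are hypothetical stand‑ins for a trained model (the real values depend on the run); they place the decision boundary at x = 3.5, between the two classes of the toy data:&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical trained parameters (actual values depend on the run);
# the toy labels switch from 0 to 1 between x = 3 and x = 4, so the
# boundary z = 0 is placed at x = 3.5.
m, b = 1.5, -5.25

for x in (1, 3, 3.5, 4, 9):
    print(x, round(float(sigmoid(m * x + b)), 3))
```

The probabilities rise smoothly from near 0 to near 1 as x crosses the boundary, with exactly 0.5 at x = 3.5.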




&lt;h2&gt;
  
  
  Multi‑Feature Logistic Regression
&lt;/h2&gt;

&lt;p&gt;The only change is that we replace the scalar weight with a vector and use matrix operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logisticRegressionMultipleFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;leave&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Forward pass
&lt;/span&gt;        &lt;span class="n"&gt;z&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Gradients
&lt;/span&gt;        &lt;span class="n"&gt;dw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Updates
&lt;/span&gt;        &lt;span class="n"&gt;W&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dw&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gradients &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fdw" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fdw" alt="dw" width="15" height="9"&gt;&lt;/a&gt; and &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fdb" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fdb" alt="db" width="11" height="9"&gt;&lt;/a&gt; are derived exactly as in the 1‑D case, just expressed in vector form.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prediction (Multi‑D)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;logisticRegressionMultipleFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sample&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;prob&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Probability that sample [40, 70] is class 1:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, the result is a probability that can be thresholded (e.g., 0.5) to obtain a hard class label.&lt;/p&gt;
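&lt;p&gt;Here is a minimal sketch of that thresholding step. The &lt;code&gt;W&lt;/code&gt; and &lt;code&gt;bias&lt;/code&gt; values below are made up for illustration; in practice they come out of the training loop above.&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(sample, W, bias, threshold=0.5):
    """Threshold the sigmoid probability into a hard 0/1 label."""
    prob = sigmoid(np.dot(sample, W) + bias)
    return int(prob >= threshold), prob

# Illustrative parameters only; real values come from training.
W, bias = np.array([0.1, 0.05]), -6.0
label, prob = predict_label(np.array([40, 70]), W, bias)
print(label, round(prob, 3))
```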




&lt;h2&gt;
  
  
  Key Concepts &amp;amp; Math
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linear model&lt;/strong&gt;: &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fz%2520%253D%2520%255Cmathbf%257Bw%257D%255E%255Ctop%2520%255Cmathbf%257Bx%257D%2520%252B%2520b" alt="z = \mathbf{w}^\top \mathbf{x} + b" width="72" height="12"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sigmoid&lt;/strong&gt;: &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Csigma%2528z%2529%2520%253D%2520%255Cfrac%257B1%257D%257B1%2520%252B%2520e%255E%257B-z%257D%257D" alt="\sigma(z) = \frac{1}{1 + e^{-z}}" width="85" height="28"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross‑entropy loss&lt;/strong&gt;:
$$ L = -\frac{1}{n} \sum_i \Big[ y_i \log \sigma(z_i) + (1 - y_i) \log\big(1 - \sigma(z_i)\big) \Big] $$&lt;/li&gt;
&lt;/ul&gt;
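&lt;p&gt;The loss above translates directly into NumPy. Below is a minimal sketch; the clipping constant &lt;code&gt;eps&lt;/code&gt; is my addition, guarding against &lt;code&gt;log(0)&lt;/code&gt; when the model is very confident:&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, z, eps=1e-12):
    """Mean binary cross-entropy between labels y and logits z."""
    p = np.clip(sigmoid(z), eps, 1 - eps)  # keep log() finite
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([0, 1, 1, 0])
z = np.array([-2.0, 1.5, 3.0, -0.5])
print(round(cross_entropy(y, z), 4))  # mean loss over the four toy points
```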

</description>
      <category>logisticregression</category>
      <category>python</category>
      <category>numpy</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building a Linear Regression Model from Scratch with Gradient Descent in Python</title>
      <dc:creator>Viswa M</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:35:36 +0000</pubDate>
      <link>https://dev.to/viswa_m_09/building-a-linear-regression-model-from-scratch-with-gradient-descent-in-python-1p19</link>
      <guid>https://dev.to/viswa_m_09/building-a-linear-regression-model-from-scratch-with-gradient-descent-in-python-1p19</guid>
      <description>&lt;h1&gt;
  
  
  Gradient Descent Linear Regression in Python
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Meta description:&lt;/strong&gt; Learn how to build a linear regression model from scratch using gradient descent in Python. Step‑by‑step code, math, and practical tips.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; linearregression, gradientdescent, python, machinelearning, dataanalysis, codingtutorial, algorithm, mse, supervisedlearning  &lt;/p&gt;




&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;Linear regression is usually the first model you build when learning machine learning. It introduces the essential concepts of &lt;strong&gt;parameters, loss, gradients, and optimisation&lt;/strong&gt; in the simplest setting: a straight‑line fit.&lt;br&gt;&lt;br&gt;
In this post we’ll walk through a compact Python script that learns a line from five data points using &lt;strong&gt;gradient descent&lt;/strong&gt;. We’ll explain the maths, step through the code, and predict a new value. By the end you’ll understand why the parameters change and how to tweak the algorithm for your own data.  &lt;/p&gt;


&lt;h2&gt;
  
  
  2. What the program does in a nutshell
&lt;/h2&gt;

&lt;p&gt;The script trains a linear model  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D%253Dm%255C%252Cx%252Bb" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D%253Dm%255C%252Cx%252Bb" alt="math" width="67" height="12"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;to minimise mean‑squared error between predictions &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D" alt="\hat{y}" width="6" height="12"&gt;&lt;/a&gt; and the true outputs &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fy" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fy" alt="y" width="6" height="8"&gt;&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Starting from &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm%253D0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm%253D0" alt="m=0" width="35" height="9"&gt;&lt;/a&gt;, &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb%253D0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb%253D0" alt="b=0" width="29" height="9"&gt;&lt;/a&gt;, it repeatedly updates these two numbers until the loss stops improving, then prints the learned slope, intercept, and a prediction for a new input.  &lt;/p&gt;


&lt;h2&gt;
  
  
  3. Code Implementation
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="c1"&gt;# Data
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Initial parameters
&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;          &lt;span class="c1"&gt;# slope
&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;          &lt;span class="c1"&gt;# intercept
&lt;/span&gt;
&lt;span class="c1"&gt;# Hyper‑parameters
&lt;/span&gt;&lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;      &lt;span class="c1"&gt;# learning rate
&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="c1"&gt;# number of iterations
&lt;/span&gt;
&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# number of training samples
&lt;/span&gt;
&lt;span class="c1"&gt;# Gradient descent loop
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

    &lt;span class="n"&gt;dm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;   &lt;span class="c1"&gt;# gradient wrt m
&lt;/span&gt;    &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# gradient wrt b
&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dm&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slope:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Intercept:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Prediction for a new input
&lt;/span&gt;&lt;span class="n"&gt;input_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
&lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;input_value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Predicted y for x=6:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Step‑by‑step walk‑through
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Imports&lt;/strong&gt; – &lt;code&gt;numpy&lt;/code&gt; for vector maths; &lt;code&gt;tqdm&lt;/code&gt; for a progress bar.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data&lt;/strong&gt; – &lt;code&gt;X&lt;/code&gt; (inputs) and &lt;code&gt;y&lt;/code&gt; (targets).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters&lt;/strong&gt; – start with &lt;code&gt;m = 0&lt;/code&gt;, &lt;code&gt;b = 0&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hyper‑parameters&lt;/strong&gt; – learning rate (&lt;code&gt;lr&lt;/code&gt;) controls the step size; &lt;code&gt;epochs&lt;/code&gt; limits how many updates we perform.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training loop&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;Compute predictions: &lt;code&gt;y_hat = m * X + b&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;Compute gradients:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dm = (-2 / n) * np.sum(X * (y - y_hat))&lt;/code&gt; (slope).
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db = (-2 / n) * np.sum(y - y_hat)&lt;/code&gt; (intercept).
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Update parameters: move a fraction (&lt;code&gt;lr&lt;/code&gt;) of the negative gradient.
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After training&lt;/strong&gt; – print the final &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;, then predict &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fy" alt="y" width="6" height="8"&gt; for &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fx%2520%253D%25206" alt="x = 6" width="30" height="9"&gt;.
&lt;/li&gt;
&lt;/ul&gt;
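&lt;p&gt;The loop above can be condensed into a self‑contained sketch that also records the MSE each epoch, which is handy for checking convergence (I drop &lt;code&gt;tqdm&lt;/code&gt; here so the snippet runs anywhere):&lt;/p&gt;

```python
import numpy as np

# Same data and hyper-parameters as the script above.
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
m, b = 0.0, 0.0
lr, epochs, n = 0.01, 1000, len(X)

losses = []  # MSE before each update, for inspection

for _ in range(epochs):
    y_hat = m * X + b
    losses.append(np.mean((y - y_hat) ** 2))
    dm = (-2 / n) * np.sum(X * (y - y_hat))  # gradient wrt slope
    db = (-2 / n) * np.sum(y - y_hat)        # gradient wrt intercept
    m -= lr * dm
    b -= lr * db

print(round(m, 3), round(b, 3))                   # learned slope and intercept
print(round(losses[0], 2), round(losses[-1], 2))  # loss falls from 17.2 toward 0.48
```

Watching `losses` shrink is the quickest way to see whether `lr` and `epochs` are in a reasonable range.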


&lt;h2&gt;
  
  
  5. Key concepts (with maths)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  5.1 Linear regression model
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D%2520%253D%2520m%255C%252Cx%2520%252B%2520b" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Chat%257By%257D%2520%253D%2520m%255C%252Cx%2520%252B%2520b" alt="math" width="67" height="12"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm" alt="m" width="11" height="6"&gt; – slope (rate of change).
&lt;/li&gt;
&lt;li&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb" alt="b" width="5" height="9"&gt; – intercept (value at &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fx%253D0" alt="x=0" width="31" height="9"&gt;).
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  5.2 Loss function (mean‑squared error)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Ctext%257BMSE%257D%2528m%252Cb%2529%2520%253D%2520%255Cfrac%257B1%257D%257Bn%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257D%2528y_i%2520-%2520%255Chat%257By%257D_i%2529%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Ctext%257BMSE%257D%2528m%252Cb%2529%2520%253D%2520%255Cfrac%257B1%257D%257Bn%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257D%2528y_i%2520-%2520%255Chat%257By%257D_i%2529%255E2" alt="math" width="163" height="36"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  5.3 Gradient of the MSE
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520m%257D%250A%2520%2520%2520%253D%2520-%255Cfrac%257B2%257D%257Bn%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257Dx_i%2528y_i%2520-%2520%255Chat%257By%257D_i%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520m%257D%250A%2520%2520%2520%253D%2520-%255Cfrac%257B2%257D%257Bn%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257Dx_i%2528y_i%2520-%2520%255Chat%257By%257D_i%2529" alt="math" width="157" height="36"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520b%257D%250A%2520%2520%2520%253D%2520-%255Cfrac%257B2%257D%257Bn%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257D%2528y_i%2520-%2520%255Chat%257By%257D_i%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520b%257D%250A%2520%2520%2520%253D%2520-%255Cfrac%257B2%257D%257Bn%257D%255Csum_%257Bi%253D1%257D%255E%257Bn%257D%2528y_i%2520-%2520%255Chat%257By%257D_i%2529" alt="math" width="144" height="36"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These match the &lt;code&gt;dm&lt;/code&gt; and &lt;code&gt;db&lt;/code&gt; formulas in the code.  &lt;/p&gt;
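&lt;p&gt;A quick way to convince yourself the formulas are right is a central finite‑difference check. The helper names below are mine; the gradient expressions are the ones from the script:&lt;/p&gt;

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
n = len(X)

def mse(m, b):
    return np.mean((y - (m * X + b)) ** 2)

def analytic_grads(m, b):
    y_hat = m * X + b
    dm = (-2 / n) * np.sum(X * (y - y_hat))
    db = (-2 / n) * np.sum(y - y_hat)
    return dm, db

# Compare analytic gradients with central finite differences.
m0, b0, eps = 0.3, 0.1, 1e-6
dm, db = analytic_grads(m0, b0)
dm_num = (mse(m0 + eps, b0) - mse(m0 - eps, b0)) / (2 * eps)
db_num = (mse(m0, b0 + eps) - mse(m0, b0 - eps)) / (2 * eps)
print(round(dm - dm_num, 6), round(db - db_num, 6))  # both should be ~0
```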
&lt;h3&gt;
  
  
  5.4 Gradient descent update
&lt;/h3&gt;

&lt;p&gt;With learning rate &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Calpha" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%255Calpha" alt="\alpha" width="7" height="6"&gt;&lt;/a&gt;:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm%2520%255Cleftarrow%2520m%2520-%2520%255Calpha%255C%252C%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520m%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm%2520%255Cleftarrow%2520m%2520-%2520%255Calpha%255C%252C%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520m%257D" alt="math" width="108" height="28"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb%2520%255Cleftarrow%2520b%2520-%2520%255Calpha%255C%252C%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520b%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb%2520%255Cleftarrow%2520b%2520-%2520%255Calpha%255C%252C%255Cfrac%257B%255Cpartial%2520%255Ctext%257BMSE%257D%257D%257B%255Cpartial%2520b%257D" alt="math" width="96" height="28"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeating these two updates moves the parameters steadily toward the minimum of the loss surface.  &lt;/p&gt;


&lt;h2&gt;
  
  
  6. Example section
&lt;/h2&gt;

&lt;p&gt;Here’s what the first few iterations look like (values rounded for clarity):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fm" alt="m" width="11" height="6"&gt;&lt;/th&gt;
&lt;th&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fb" alt="b" width="5" height="9"&gt;&lt;/th&gt;
&lt;th&gt;MSE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.000&lt;/td&gt;
&lt;td&gt;0.000&lt;/td&gt;
&lt;td&gt;17.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.264&lt;/td&gt;
&lt;td&gt;0.080&lt;/td&gt;
&lt;td&gt;10.49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.465&lt;/td&gt;
&lt;td&gt;0.143&lt;/td&gt;
&lt;td&gt;6.58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After 1 000 epochs the parameters settle at approximately&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Slope: 0.6177
Intercept: 2.1361
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;close to the exact least‑squares fit (&lt;code&gt;m = 0.6&lt;/code&gt;, &lt;code&gt;b = 2.2&lt;/code&gt;); running more epochs moves them the rest of the way.&lt;/p&gt;



&lt;p&gt;Predicting for &lt;code&gt;x = 6&lt;/code&gt; gives  &lt;/p&gt;

&lt;p&gt;&lt;code&gt;ŷ ≈ 0.6177 × 6 + 2.1361 ≈ 5.84&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;so the program prints “Predicted y for x=6: 5.84…”.  &lt;/p&gt;
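&lt;p&gt;As a sanity check, NumPy can compute the closed‑form least‑squares line directly; that is the optimum gradient descent is heading toward:&lt;/p&gt;

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# np.polyfit with deg=1 returns the (slope, intercept) of the exact
# least-squares line for this data.
slope, intercept = np.polyfit(X, y, 1)
print(round(slope, 2), round(intercept, 2))  # the exact optimum
print(round(slope * 6 + intercept, 2))       # least-squares prediction at x = 6
```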




&lt;h2&gt;
  
  
  7. Quick sanity checks
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Observation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Small learning rate&lt;/strong&gt; (&lt;code&gt;lr = 0.0001&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Converges slowly; more epochs needed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Large learning rate&lt;/strong&gt; (&lt;code&gt;lr = 1&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Updates overshoot the optimum; loss may diverge.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choosing a suitable &lt;code&gt;lr&lt;/code&gt; and &lt;code&gt;epochs&lt;/code&gt; is a standard tuning step in any gradient‑based optimisation.  &lt;/p&gt;
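&lt;p&gt;These observations are easy to reproduce. The &lt;code&gt;train&lt;/code&gt; helper below is my re‑packaging of the loop above; only the learning rate changes between runs:&lt;/p&gt;

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
n = len(X)

def train(lr, epochs):
    """Run the gradient-descent loop and report the final MSE."""
    m = b = 0.0
    for _ in range(epochs):
        y_hat = m * X + b
        m -= lr * (-2 / n) * np.sum(X * (y - y_hat))
        b -= lr * (-2 / n) * np.sum(y - y_hat)
    return np.mean((y - (m * X + b)) ** 2)

mse_small = train(0.0001, 1000)  # too small: still far from the optimum
mse_ok    = train(0.01, 1000)    # reasonable: close to the minimum MSE (~0.48)
mse_big   = train(1.0, 10)       # too large: the loss explodes within a few steps
print(round(mse_small, 3), round(mse_ok, 3), f"{mse_big:.1e}")
```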




&lt;h2&gt;
  
  
  8. Take‑away
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;This code implements ordinary least‑squares regression with gradient descent.
&lt;/li&gt;
&lt;li&gt;Gradient descent is a generic optimisation routine that underpins logistic regression, neural networks, and more.
&lt;/li&gt;
&lt;li&gt;Understanding the update equations clarifies &lt;em&gt;why&lt;/em&gt; the parameters evolve and how to troubleshoot training.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feel free to experiment:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Swap in a new dataset.
&lt;/li&gt;
&lt;li&gt;Try different learning rates or epoch counts.
&lt;/li&gt;
&lt;li&gt;Normalise your inputs or add a bias term.
&lt;/li&gt;
&lt;/ul&gt;
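&lt;p&gt;On the last point: if you swap in a dataset with large feature values, standardising the inputs first keeps the gradients well scaled, so the same learning rate still works. A sketch (remember to apply the identical transform to new inputs at prediction time):&lt;/p&gt;

```python
import numpy as np

X_raw = np.array([100, 200, 300, 400, 500], dtype=float)

# Shift to zero mean and rescale to unit variance before training;
# store mu and sigma so new inputs can be transformed the same way.
mu, sigma = X_raw.mean(), X_raw.std()
X_scaled = (X_raw - mu) / sigma
print(round(X_scaled.mean(), 10), round(X_scaled.std(), 10))  # ~0 and ~1
```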

&lt;p&gt;Happy coding and keep building!  &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Slug:&lt;/strong&gt; linear-regression-gradient-descent-python&lt;/p&gt;

</description>
      <category>linearregression</category>
      <category>gradientdescent</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building a Simple Logistic Regression from Scratch (Python Edition)</title>
      <dc:creator>Viswa M</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:33:10 +0000</pubDate>
      <link>https://dev.to/viswa_m_09/building-a-simple-logistic-regression-from-scratch-python-edition-4pik</link>
      <guid>https://dev.to/viswa_m_09/building-a-simple-logistic-regression-from-scratch-python-edition-4pik</guid>
      <description>&lt;h1&gt;
  
  
  Building a Simple Logistic Regression from Scratch (Python Edition)
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Meta description:&lt;/strong&gt; Learn to build a simple logistic regression model in Python with NumPy and gradient descent, no machine‑learning framework needed. Step‑by‑step guide, code snippets, predictions.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; logisticregression, python, gradientdescent, machinelearning, purepython, classification, tutorial, datamanipulation  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slug:&lt;/strong&gt; build-logistic-regression-from-scratch-in-python  &lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;In this post we’ll hand‑craft a logistic‑regression classifier in vanilla NumPy, without any machine‑learning framework.&lt;br&gt;&lt;br&gt;
We’ll:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train a one‑feature model.&lt;/li&gt;
&lt;li&gt;Scale the same idea to two features.&lt;/li&gt;
&lt;li&gt;See how gradient descent iteratively lowers the cross‑entropy loss.&lt;/li&gt;
&lt;li&gt;Finally, predict the probability that a new sample belongs to the positive class.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is fully transparent, so you can trace every math step and every line of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What the Code Does
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create toy data&lt;/strong&gt; for a binary classification problem.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define a one‑feature logistic‑regression function&lt;/strong&gt; that trains by gradient descent.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predict&lt;/strong&gt; the probability for a new single‑feature sample.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define a multi‑feature version&lt;/strong&gt; of the same algorithm.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predict&lt;/strong&gt; the probability for a new two‑feature sample.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of this is implemented in plain NumPy, so you can see exactly what happens during training.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Step‑by‑Step Walk‑Through
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Imports &amp;amp; Data Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="c1"&gt;# 1‑D toy data
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;          &lt;span class="c1"&gt;# feature values
&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;          &lt;span class="c1"&gt;# binary labels
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;numpy&lt;/code&gt; handles vectorised math.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tqdm&lt;/code&gt; shows a progress bar during the training loop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.2 Hyperparameters &amp;amp; Initial Parameters
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;          &lt;span class="c1"&gt;# weight (slope)
&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;          &lt;span class="c1"&gt;# bias (intercept)
&lt;/span&gt;&lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;      &lt;span class="c1"&gt;# learning rate
&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="c1"&gt;# number of gradient steps
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Parameters start at zero.
&lt;/li&gt;
&lt;li&gt;The learning rate determines the step size.
&lt;/li&gt;
&lt;li&gt;More epochs mean more passes over the data.&lt;/li&gt;
&lt;/ul&gt;
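Because the sigmoid maps 0 to 0.5, zero‑initialised parameters mean every training point initially gets probability 0.5. A quick check on the toy data above (not in the original post) makes that concrete:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6])
m, b = 0, 0          # zero-initialised parameters

z = m * X + b        # all zeros before any training step
y_hat = 1 / (1 + np.exp(-z))
print(y_hat)         # every prediction starts at 0.5
```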

&lt;h3&gt;
  
  
  2.3 One‑Feature Logistic Regression – Core Function
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="c1"&gt;# Linear part
&lt;/span&gt;        &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

        &lt;span class="c1"&gt;# Sigmoid activation
&lt;/span&gt;        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Gradients
&lt;/span&gt;        &lt;span class="n"&gt;dm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Gradient descent update
&lt;/span&gt;        &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dm&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;z = m * X + b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Linear combination of feature and bias.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;σ(z) = 1/(1+e^{-z})&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Squashes any real number into the interval &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3F%25280%252C%255C%252C1%2529" alt="(0,\,1)" width="29" height="13"&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;dm&lt;/code&gt; &amp;amp; &lt;code&gt;db&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Partial derivatives of cross‑entropy loss w.r.t. &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Update rules&lt;/td&gt;
&lt;td&gt;Move parameters toward the minimum of the loss.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
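As a sanity check on the `dm` and `db` formulas in the table (this check is not in the original post), the analytic gradients of the mean cross‑entropy loss can be compared against central finite differences:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

def loss(m, b):
    # Mean binary cross-entropy for the 1-D model
    z = m * X + b
    p = 1 / (1 + np.exp(-z))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

m, b = 0.3, -0.1                    # arbitrary test point
p = 1 / (1 + np.exp(-(m * X + b)))
dm = np.mean((p - y) * X)           # analytic gradient w.r.t. m
db = np.mean(p - y)                 # analytic gradient w.r.t. b

eps = 1e-6                          # central finite differences
dm_num = (loss(m + eps, b) - loss(m - eps, b)) / (2 * eps)
db_num = (loss(m, b + eps) - loss(m, b - eps)) / (2 * eps)

print(abs(dm - dm_num), abs(db - db_num))  # both tiny
```

The two pairs agree to many decimal places, confirming that the update rules in the training loop really do descend the cross‑entropy loss.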

&lt;h3&gt;
  
  
  2.4 Training &amp;amp; Prediction for 1‑D Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;logisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Predict probability for a new input
&lt;/span&gt;&lt;span class="n"&gt;inp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;
&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;inp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="n"&gt;prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Probability:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After training on the six points, the model estimates how likely &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fx%2520%253D%25209" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fsvg.image%3Fx%2520%253D%25209" alt="x = 9" width="30" height="9"&gt;&lt;/a&gt; belongs to the positive class.&lt;/p&gt;
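To turn that probability into a hard class label, a common choice is a 0.5 threshold. The parameter values below are illustrative stand‑ins (the decision boundary for this data sits near x = 3.5), not the exact numbers this training run produces:

```python
import numpy as np

# Illustrative trained parameters (assumed values, not from the post;
# chosen so the boundary sits near x = 3.5)
m, b = 1.1, -3.9

def predict(x, threshold=0.5):
    prob = 1 / (1 + np.exp(-(m * x + b)))
    return prob, int(prob >= threshold)

prob, label = predict(9)
print(prob, label)      # high probability, so label 1
print(predict(1))       # low probability, so label 0
```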

&lt;h3&gt;
  
  
  2.5 Multi‑Feature Logistic Regression – Scaling Up
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 2‑D toy data
&lt;/span&gt;&lt;span class="n"&gt;X2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# one weight per feature
&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
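Before the multi‑feature training loop, it can help to confirm the shapes involved: each row of `X2` is one sample, and `np.dot(X2, weights)` produces one logit per sample:

```python
import numpy as np

X2 = np.array([[25, 30], [35, 60], [45, 80]])
y2 = np.array([0, 1, 1])
weights = np.zeros(X2.shape[1])    # shape (2,): one weight per feature
bias = 0

print(X2.shape)                    # (3, 2): three samples, two features
z = np.dot(X2, weights) + bias     # shape (3,): one logit per sample
print(z)                           # all zeros before training
```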



&lt;h3&gt;
  
  
  2.6 Core Function for Multiple Features
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
def logisticRegressionMultipleFeatures(X, y, W, b, lr, epochs):
    n = len(X)

    for _ in tqdm(range(epochs)):
        # Linear part
        z = np.dot(X, W) + b

        # Sigmoid
        y_hat = 1 / (1 + np.exp(-z))

        # Gradients
        dw = (1 / n
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
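To round out step 5 of the overview — predicting the probability for a new two‑feature sample — here is a minimal sketch. The `W` and `b` values below are assumed illustrative numbers, not output from the post:

```python
import numpy as np

# Assumed illustrative trained parameters (not from the post)
W = np.array([0.05, 0.08])
b = -4.0

new = np.array([40, 70])            # new two-feature sample
z = np.dot(new, W) + b              # 0.05*40 + 0.08*70 - 4.0 = 3.6
prob = 1 / (1 + np.exp(-z))
print("Probability:", prob)         # sigmoid(3.6) ≈ 0.973
```

The prediction step is identical to the 1‑D case; only the linear part changes from `m * x + b` to `np.dot(x, W) + b`.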

</description>
      <category>logisticregression</category>
      <category>python</category>
      <category>gradientdescent</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
