<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Suraj J</title>
    <description>The latest articles on DEV Community by Suraj J (@suraj47).</description>
    <link>https://dev.to/suraj47</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F267039%2F727836b4-05e6-45d5-9e42-54c432bc32ae.png</url>
      <title>DEV Community: Suraj J</title>
      <link>https://dev.to/suraj47</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/suraj47"/>
    <language>en</language>
    <item>
      <title>MLS.1.b Gradient Descent in Linear Regression</title>
      <dc:creator>Suraj J</dc:creator>
      <pubDate>Tue, 12 Nov 2019 06:13:28 +0000</pubDate>
      <link>https://dev.to/ml_scratch/mls-1-b-gradient-descent-in-linear-regression-53dl</link>
      <guid>https://dev.to/ml_scratch/mls-1-b-gradient-descent-in-linear-regression-53dl</guid>
      <description>&lt;h1&gt;
  
  
  Gradient Descent in Linear Regression
&lt;/h1&gt;

&lt;p&gt;Gradient Descent is a first-order optimization algorithm for finding the minimum of a function. It finds a (local) minimum by repeatedly moving in the direction of steepest descent (downhill). This lets us update the parameters of the model (weights and bias) more accurately. &lt;/p&gt;

&lt;p&gt;To reach the local minimum we can't just jump straight to that point on the graph. We need to descend in small steps, check the slope, and take another step in the direction of descent, repeating until we reach the desired local minimum.&lt;/p&gt;

&lt;p&gt;The size of these small steps is called the learning rate. A very small learning rate gives more precision but is very time consuming, while a large learning rate may overshoot and miss the minimum. A common strategy is to use a higher learning rate while the slope of the curve is steep, and switch to smaller learning rates once the slope starts to flatten (less time and more precision).&lt;/p&gt;
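&lt;p&gt;As a toy illustration (not from the article) of how the learning rate trades speed against stability, here is a minimal sketch that minimises f(x) = x&lt;sup&gt;2&lt;/sup&gt; with three different rates; the function, step counts, and rates are all illustrative choices:&lt;/p&gt;

```python
def descend(lr, n_steps=50, x0=5.0):
    """Run gradient descent on f(x) = x**2 (gradient 2*x) and return the final x."""
    x = x0
    for _ in range(n_steps):
        x -= lr * 2 * x  # step against the gradient of f(x) = x**2
    return x

# A tiny rate converges slowly, a moderate rate converges fast,
# and a too-large rate (here > 1.0) diverges by overshooting.
for lr in (0.01, 0.1, 1.1):
    print(lr, descend(lr))
```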

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Jy3knlqQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/q2iywvf7yjidq4gtht97.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Jy3knlqQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/q2iywvf7yjidq4gtht97.png" alt="Gradient descending over a slope"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;&lt;small&gt;Gradient descending over a slope&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;The cost function helps us evaluate how well our model is predicting. It is a &lt;em&gt;&lt;strong&gt;loss function&lt;/strong&gt;&lt;/em&gt; with its own curve over the parameters (weights and bias), and the slope of that curve tells us how to update the parameters. The lower the cost, the better the model's predictions.&lt;/p&gt;

&lt;p&gt;In the training phase we compute the model's prediction for each sample to see how much it deviates from the given output. Then, in the second phase, we calculate the cost from those errors using the cost (mean squared error) formula.&lt;/p&gt;


&lt;center&gt;&lt;strong&gt;&lt;br&gt;
&lt;code&gt;y_hat = w * x&lt;sub&gt;i&lt;/sub&gt; + b&lt;/code&gt;

&lt;p&gt;&lt;code&gt;cost = (1/N) * ∑(y&lt;sub&gt;i&lt;/sub&gt; − y_hat)&lt;sup&gt;2&lt;/sup&gt;  {i from 1 to N}&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;&lt;/strong&gt;&lt;/center&gt;
&lt;br&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_iters&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;#Training phase 
&lt;/span&gt;    &lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;

    &lt;span class="c1"&gt;#Cost error calculating Phase
&lt;/span&gt;    &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;costs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Now we update the weights and bias to decrease the error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;#Updating the weight and bias derivatives
&lt;/span&gt;    &lt;span class="n"&gt;Delta_w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;Delta_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 

    &lt;span class="c1"&gt;#Updating weights
&lt;/span&gt;    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;learn_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Delta_w&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;learn_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Delta_b&lt;/span&gt;

    &lt;span class="c1"&gt;# end of loop
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
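&lt;p&gt;As a quick sanity check (a sketch, not part of the original article; the toy data and variable names are illustrative), the analytic gradient used above, (2/N)·Xᵀ(y_hat − y), can be compared against a finite-difference estimate of the cost:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))                  # toy data, purely illustrative
y = 3.0 * X + 1.0 + rng.normal(scale=0.1, size=(20, 1))
w, b = np.zeros((1, 1)), 0.0
n_samples = X.shape[0]

def cost(w, b):
    y_hat = X @ w + b
    return (1 / n_samples) * np.sum((y_hat - y) ** 2)

# Analytic gradient, as in the update step above
y_hat = X @ w + b
Delta_w = (2 / n_samples) * X.T @ (y_hat - y)

# Central finite difference: (cost(w + h) - cost(w - h)) / (2h)
h = 1e-6
e = np.zeros_like(w)
e[0, 0] = h
numeric = (cost(w + e, b) - cost(w - e, b)) / (2 * h)
print(Delta_w[0, 0], numeric)   # the two estimates should agree closely
```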



&lt;p&gt;Plotting the cost function against the number of iterations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kZTBKKrx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/1h7gmh057q6gqx3jq7v1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kZTBKKrx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/1h7gmh057q6gqx3jq7v1.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;center&gt;Cost against iterations&lt;/center&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Above is the cost function curve against the number of iterations. As the number of iterations (steps) increases, the cost drops drastically and approaches zero, meaning the minimum is nearby. We repeat the above updates until the error becomes &lt;em&gt;negligible&lt;/em&gt; or the minimum is reached.&lt;/p&gt;
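&lt;p&gt;One simple way to implement that stopping rule (a sketch, not from the article's class; the tolerance parameter &lt;code&gt;tol&lt;/code&gt; and the toy data are assumed additions) is to stop once consecutive costs barely change:&lt;/p&gt;

```python
import numpy as np

def gradient_descent_until_converged(X, y, learn_rate=0.01, tol=1e-8, max_iters=10000):
    """Plain gradient descent that stops once the cost change is negligible.

    `tol` (the convergence tolerance) is an illustrative addition, not part
    of the article's class, which runs for a fixed n_iters instead.
    """
    n_samples, n_features = X.shape
    w = np.zeros((n_features, 1))
    b = 0.0
    prev_cost = np.inf
    for i in range(max_iters):
        y_hat = X @ w + b
        cost = (1 / n_samples) * np.sum((y_hat - y) ** 2)
        if prev_cost - cost < tol:   # error change is negligible: stop early
            break
        prev_cost = cost
        w -= learn_rate * (2 / n_samples) * X.T @ (y_hat - y)
        b -= learn_rate * (2 / n_samples) * np.sum(y_hat - y)
    return w, b, i

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = 2.0 * X + 0.5                    # noiseless line: w=2.0, b=0.5
w, b, iters = gradient_descent_until_converged(X, y)
print(w[0, 0], b, iters)             # converges near w=2.0, b=0.5
```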
&lt;h2&gt;
  
  
  Source code from Scratch
&lt;/h2&gt;


&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LinearModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="s"&gt;"""
    Linear Regression Model Class
    """&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;gradient_descent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;learn_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_iters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="s"&gt;"""
        Trains a linear regression model using gradient descent
        """&lt;/span&gt;
        &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
        &lt;span class="n"&gt;costs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_iters&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="s"&gt;""""
            Training Phase
            """&lt;/span&gt;
            &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;
            &lt;span class="s"&gt;"""
            Cost error Phase
            """&lt;/span&gt;
            &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;costs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="s"&gt;"""
            Verbose: Description of cost at each iteration
            """&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cost at iteration {0}: {1}"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="s"&gt;"""
            Updating the derivative
            """&lt;/span&gt;
            &lt;span class="n"&gt;Delta_w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;Delta_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 

            &lt;span class="s"&gt;""""
            Updating weights and bias
            """&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;learn_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Delta_w&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;learn_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Delta_b&lt;/span&gt;

            &lt;span class="s"&gt;"""
            Save the weights for visualisation
            """&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_weights&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_bias&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;costs&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="s"&gt;"""
        Predicting the values by using Linear Model
        """&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# We have created our Linear Model class. Now we need to create and load our model.
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LinearModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;w_trained&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b_trained&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;costs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gradient_descent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;learn_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_iters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OaW8XTUm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ya2x6ov129op3xc9v55g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OaW8XTUm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ya2x6ov129op3xc9v55g.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;visualize_training&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="s"&gt;"""
        Visualizing the line against the dataset        
        """&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'red'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;animate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line_data&lt;/span&gt;
            &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_ydata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# update the data
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_next_weight_and_bias&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_weights&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_bias&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FuncAnimation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;animate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_next_weight_and_bias&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;init_func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Visualization of training phase to get the best fit line
&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ani&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize_training&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b0h3sKUR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/6km0j9dyj36yhhbz85w0.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b0h3sKUR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/6km0j9dyj36yhhbz85w0.gif" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Prediction Phase to test our model  
&lt;/span&gt;&lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;span class="n"&gt;n_samples_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;

&lt;span class="n"&gt;y_p_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_p_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;error_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_p_train&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;error_test&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_samples_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_p_test&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error on training set: {}"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error on test set: {}"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--msoial9p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/vjgmluaf12ffqou8x48c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--msoial9p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/vjgmluaf12ffqou8x48c.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Plotting predicted best fit line
&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_p_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8BclMOb9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/wiybp7ij6p976921wm91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8BclMOb9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/wiybp7ij6p976921wm91.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;center&gt;&lt;small&gt;Predicted Output&lt;/small&gt;&lt;/center&gt;

&lt;blockquote&gt;
&lt;p&gt;Check out the full source code for &lt;a href="https://github.com/ML-Scratch/ML_Code_From_Scratch/blob/master/MLS.1.Linear%20Regression/LinearModel_Gradient_Descent.ipynb"&gt;Gradient Descent on GitHub&lt;/a&gt;&lt;br&gt;
and also check out the other approaches in &lt;a href="https://github.com/ML-Scratch/ML_Code_From_Scratch/tree/master/MLS.1.Linear%20Regression"&gt;Linear Regression by ML-Scratch&lt;/a&gt;&lt;/p&gt;


&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Contributors
&lt;/h2&gt;

&lt;p&gt;This series is made possible by help from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pranav (&lt;a class="comment-mentioned-user" href="https://dev.to/devarakondapranav"&gt;@devarakondapranav&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Ram (&lt;a class="comment-mentioned-user" href="https://dev.to/r0mflip"&gt;@r0mflip&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Devika (&lt;a class="comment-mentioned-user" href="https://dev.to/devikamadupu1"&gt;@devikamadupu1&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Pratyusha (&lt;a class="comment-mentioned-user" href="https://dev.to/prathyushakallepu"&gt;@prathyushakallepu&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Pranay (&lt;a class="comment-mentioned-user" href="https://dev.to/pranay9866"&gt;@pranay9866&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Subhasri (&lt;a class="comment-mentioned-user" href="https://dev.to/subhasrir"&gt;@subhasrir&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Laxman (&lt;a class="comment-mentioned-user" href="https://dev.to/lmn"&gt;@lmn&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Vaishnavi (&lt;a class="comment-mentioned-user" href="https://dev.to/vaishnavipulluri"&gt;@vaishnavipulluri&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Suraj (&lt;a class="comment-mentioned-user" href="https://dev.to/suraj47"&gt;@suraj47&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gradientdescent</category>
      <category>python</category>
      <category>linearregression</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>MLS.1.a Concepts for Linear Regression</title>
      <dc:creator>Suraj J</dc:creator>
      <pubDate>Tue, 12 Nov 2019 06:13:16 +0000</pubDate>
      <link>https://dev.to/ml_scratch/mls-1-a-concepts-for-linear-regression-1n9f</link>
      <guid>https://dev.to/ml_scratch/mls-1-a-concepts-for-linear-regression-1n9f</guid>
      <description>&lt;p&gt;The idea behind simple linear regression is to "fit" the observations of two variables into a linear relationship between them. Graphically, the task is to draw the line that is "best-fitting" or "closest" to the points.&lt;/p&gt;

&lt;p&gt;The equation of a straight line is written as &lt;strong&gt;&lt;code&gt;y = mx + b&lt;/code&gt;&lt;/strong&gt;, where &lt;strong&gt;&lt;code&gt;m&lt;/code&gt;&lt;/strong&gt; is the slope (gradient) and &lt;strong&gt;&lt;code&gt;b&lt;/code&gt;&lt;/strong&gt; is the y-intercept (the bias, where the line crosses the Y axis). In calculating the slope and y-intercept, we use some of the mathematical concepts explained below:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Mean&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This term is used to describe properties of statistical distributions. It is determined by adding all the data points in a population and then dividing the total by the number of points. The resulting number is known as the mean or the average.&lt;br&gt;
 &lt;/p&gt;
&lt;center&gt;&lt;strong&gt;&lt;code&gt;x̄ = Sum of observations / number of observations&lt;/code&gt;&lt;/strong&gt;&lt;/center&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Variance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Variance (&lt;strong&gt;&lt;code&gt;σ&lt;sup&gt;2&lt;/sup&gt;&lt;/code&gt;&lt;/strong&gt;) is a measurement of the spread between numbers in a data set. That is, it measures how far each number in the set is from the mean and therefore from every other number in the set.&lt;br&gt;
&lt;strong&gt;&lt;center&gt;&lt;code&gt;Variance = (1 / n) * Σ (xi − x̄)&lt;sup&gt;2&lt;/sup&gt;&lt;/code&gt;&lt;/center&gt;&lt;/strong&gt;&lt;br&gt;
Where:&lt;br&gt;
     &lt;code&gt;&lt;strong&gt;xi&lt;/strong&gt;&lt;/code&gt; = i&lt;sup&gt;th&lt;/sup&gt; data point&lt;br&gt;
     &lt;code&gt;&lt;strong&gt;x̄&lt;/strong&gt;&lt;/code&gt; = the mean of all data points&lt;br&gt;
     &lt;code&gt;&lt;strong&gt;n&lt;/strong&gt;&lt;/code&gt; = the number of data points&lt;/p&gt;
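&lt;p&gt;The mean and variance formulas above translate directly into a short sketch (a minimal NumPy example; the data points are made up purely for illustration):&lt;/p&gt;

```python
import numpy as np

# Hypothetical data points, used only for illustration
x = np.array([2.0, 4.0, 6.0, 8.0])
n = len(x)

# Mean: sum of observations / number of observations
x_bar = np.sum(x) / n

# Variance: (1 / n) * sum of (xi - x_bar)^2
variance = np.sum((x - x_bar) ** 2) / n

print(x_bar)     # 5.0
print(variance)  # 5.0
```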

&lt;h3&gt;
  
  
  &lt;strong&gt;Co-variance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Covariance is a measure of how much two random variables vary together. It’s similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together. The square root of variance is called the &lt;code&gt;Standard Deviation&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;center&gt; &lt;strong&gt;&lt;code&gt;Cov(X,Y) = Σ (xi − μ) * (yi − ν) / (n − 1)&lt;/code&gt;&lt;/strong&gt; &lt;/center&gt;
&lt;br&gt;
Where&lt;br&gt;
     &lt;strong&gt;&lt;code&gt;xi&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;yi&lt;/code&gt;&lt;/strong&gt; are the i&lt;sup&gt;th&lt;/sup&gt; observations of the random variables &lt;strong&gt;&lt;code&gt;X&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;Y&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
     &lt;strong&gt;&lt;code&gt;μ = E(X)&lt;/code&gt;&lt;/strong&gt; is the expected value (the mean) of &lt;strong&gt;&lt;code&gt;X&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
     &lt;strong&gt;&lt;code&gt;ν = E(Y)&lt;/code&gt;&lt;/strong&gt; is the expected value (the mean) of &lt;strong&gt;&lt;code&gt;Y&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
     &lt;strong&gt;&lt;code&gt;n&lt;/code&gt;&lt;/strong&gt; = the number of items in the data set
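&lt;p&gt;A minimal sketch of the covariance formula (the paired observations are invented for illustration; &lt;code&gt;np.cov&lt;/code&gt; uses the same &lt;code&gt;n − 1&lt;/code&gt; denominator, so it can serve as a cross-check):&lt;/p&gt;

```python
import numpy as np

# Hypothetical paired observations, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

mu = np.mean(x)  # E(X)
nu = np.mean(y)  # E(Y)

# Sample covariance: sum of (xi - mu) * (yi - nu) / (n - 1)
cov_xy = np.sum((x - mu) * (y - nu)) / (len(x) - 1)

print(cov_xy)  # same value as np.cov(x, y)[0, 1]
```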
&lt;h3&gt;
  
  
  &lt;strong&gt;Correlation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Correlation(r)&lt;/strong&gt; is a statistical technique that can show whether and how strongly pairs of variables are related.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1IrPEkCw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/70yu9ljtxsx0o45qs90j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1IrPEkCw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/70yu9ljtxsx0o45qs90j.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;&lt;small&gt;&lt;strong&gt;&lt;code&gt;Sx&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;Sy&lt;/code&gt;&lt;/strong&gt; = Standard deviation of &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;y&lt;/code&gt;&lt;/small&gt;&lt;/center&gt;
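&lt;p&gt;Correlation can be sketched from the pieces defined above, as covariance divided by the product of the standard deviations (the data is made up for illustration; &lt;code&gt;np.corrcoef&lt;/code&gt; computes the same quantity):&lt;/p&gt;

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# r = Cov(X, Y) / (Sx * Sy), using sample statistics throughout
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(r)  # identical to np.corrcoef(x, y)[0, 1]
```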
&lt;h3&gt;
  
  
  &lt;strong&gt;Root Mean Square Error (RMSE)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far the data points are from the regression line.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JIl_QuP5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kgllofd5bfihljduw61m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JIl_QuP5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kgllofd5bfihljduw61m.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Calculation of Slope and Bias&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The slope of the line is calculated as the change in y divided by change in x.&lt;/p&gt;


&lt;center&gt;&lt;strong&gt;&lt;code&gt;slope m = change in y / change in x&lt;/code&gt;&lt;/strong&gt;&lt;/center&gt;

&lt;p&gt;The y-intercept (or bias) can be calculated using the point-slope form of the line&lt;/p&gt;


&lt;center&gt;&lt;strong&gt;&lt;code&gt;y = m(x - x1) + y1&lt;/code&gt;&lt;/strong&gt;&lt;/center&gt;

&lt;p&gt;These values differ from what was actually in the training set, and if we plot this (x, y) line against the original graph, the straight line will be way off the original points. This difference between the actual points and the points on the straight line is the error. Ideally, we’d like a straight line where the error is minimized across all points. Error can be reduced in many mathematical ways; one such method is "Least Square Regression".&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Least Square Regression&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Least Square Regression is a method which minimizes the error in such a way that the sum of all square error is minimized.&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;
&lt;strong&gt;&lt;br&gt;
&lt;code&gt;m = Σ ((x - x̄) * (y - ȳ)) / Σ (x - x̄)&lt;sup&gt;2&lt;/sup&gt;&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;(or)&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;&lt;code&gt;m = r(Sy / Sx)&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;(and we get the y-intercept)&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;&lt;code&gt;b = ȳ - m * x̄&lt;/code&gt;&lt;br&gt;
&lt;/strong&gt;
&lt;/center&gt;

&lt;p&gt;Where&lt;br&gt;
     &lt;code&gt;&lt;strong&gt;Sx&lt;/strong&gt;&lt;/code&gt; is standard deviation of &lt;code&gt;x&lt;/code&gt;&lt;br&gt;
     &lt;code&gt;&lt;strong&gt;Sy&lt;/strong&gt;&lt;/code&gt; is standard deviation of &lt;code&gt;y&lt;/code&gt;&lt;br&gt;
     &lt;strong&gt;&lt;code&gt;r&lt;/code&gt;&lt;/strong&gt; is correlation between &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;&lt;br&gt;
     &lt;strong&gt;&lt;code&gt;m&lt;/code&gt;&lt;/strong&gt; is slope&lt;br&gt;
     &lt;strong&gt;&lt;code&gt;b&lt;/code&gt;&lt;/strong&gt; is the y-intercept&lt;/p&gt;

&lt;p&gt;This method minimizes the sum of the squares of all error values: the lower the error, the smaller the overall deviation from the original points.&lt;/p&gt;
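&lt;p&gt;The least squares formulas for &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; translate directly into code (a sketch with invented, exactly linear data so the recovered slope and intercept are easy to verify):&lt;/p&gt;

```python
import numpy as np

# Invented, exactly linear data (y = 2x + 1) so the result is easy to verify
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

x_bar, y_bar = x.mean(), y.mean()

# m = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# b = y_bar - m * x_bar
b = y_bar - m * x_bar

print(m, b)  # 2.0 1.0
```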

&lt;h3&gt;
  
  
  &lt;strong&gt;Cost Function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The cost function calculates the square of the error for each example in the dataset, sums it up, and divides this value by the number of examples in the dataset (denoted by &lt;code&gt;m&lt;/code&gt;). This cost function helps in determining the best fit line. The cost function for two variables &lt;strong&gt;&lt;code&gt;θ0&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1&lt;/code&gt;&lt;/strong&gt; is denoted by &lt;strong&gt;&lt;code&gt;J&lt;/code&gt;&lt;/strong&gt; and is given as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0Nw5xkCX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/9s039516h7vc4ty8i2me.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0Nw5xkCX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/9s039516h7vc4ty8i2me.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_kiMn6dP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ymej6u66w2gueseg9x7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_kiMn6dP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ymej6u66w2gueseg9x7a.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we have to make use of cost function to adjust our parameters  &lt;strong&gt;&lt;code&gt;θ0&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1&lt;/code&gt;&lt;/strong&gt; such that they result in the least cost function value. We make use of a technique called Gradient Descent to minimize the cost function. &lt;/p&gt;
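&lt;p&gt;The cost function described above can be sketched as follows (the θ values and data are made up for illustration; the extra factor of 2 in the denominator is the convention used in the formula, which simplifies the derivative):&lt;/p&gt;

```python
import numpy as np

def cost(theta0, theta1, X, Y):
    # J = (1 / (2m)) * sum((theta0 + theta1 * X - Y)^2)
    m = len(X)
    predictions = theta0 + theta1 * X
    return np.sum((predictions - Y) ** 2) / (2 * m)

# Hypothetical data where Y is exactly 1 + 2 * X
X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 7.0])

print(cost(1.0, 2.0, X, Y))  # 0.0 for the perfect parameters
print(cost(0.0, 0.0, X, Y))  # a much larger cost for a bad line
```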

&lt;h3&gt;
  
  
  &lt;strong&gt;Read On 📝&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/ml_scratch/mls-1-b-gradient-descent-in-linear-regression-53dl"&gt;MLS.1.b Gradient Descent in Linear regression&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Contributors
&lt;/h2&gt;

&lt;p&gt;This series is made possible by help from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pranav (&lt;a class="comment-mentioned-user" href="https://dev.to/devarakondapranav"&gt;@devarakondapranav&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Ram (&lt;a class="comment-mentioned-user" href="https://dev.to/r0mflip"&gt;@r0mflip&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Devika (&lt;a class="comment-mentioned-user" href="https://dev.to/devikamadupu1"&gt;@devikamadupu1&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Pratyusha (&lt;a class="comment-mentioned-user" href="https://dev.to/prathyushakallepu"&gt;@prathyushakallepu&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Pranay (&lt;a class="comment-mentioned-user" href="https://dev.to/pranay9866"&gt;@pranay9866&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Subhasri (&lt;a class="comment-mentioned-user" href="https://dev.to/subhasrir"&gt;@subhasrir&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Laxman (&lt;a class="comment-mentioned-user" href="https://dev.to/lmn"&gt;@lmn&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Vaishnavi (&lt;a class="comment-mentioned-user" href="https://dev.to/vaishnavipulluri"&gt;@vaishnavipulluri&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Suraj (&lt;a class="comment-mentioned-user" href="https://dev.to/suraj47"&gt;@suraj47&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>algebra</category>
      <category>linearregression</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>MLS.1 Linear Regression</title>
      <dc:creator>Suraj J</dc:creator>
      <pubDate>Tue, 12 Nov 2019 06:12:55 +0000</pubDate>
      <link>https://dev.to/ml_scratch/mls-1-linear-regression-1eo3</link>
      <guid>https://dev.to/ml_scratch/mls-1-linear-regression-1eo3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The source code for the topics discussed in the post can be found at &lt;a href="https://github.com/ML-Scratch/ML_Code_From_Scratch"&gt;https://github.com/ML-Scratch/ML_Code_From_Scratch&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Linear regression is a very basic supervised learning model. It is used when there is a linear relationship between the feature vector and the target, or in simple terms between the input and the output we are trying to predict. Linear regression serves as the starting point for many machine learning enthusiasts, and understanding this model can greatly help in mastering the more complex models in ML.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;When should you use Linear Regression?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As the name suggests, Linear Regression involves fitting the best fit straight line through the data. Consider a dataset consisting of information about used cars and the prices they were sold for. For example, it contains the number of kilometers each car traveled and the price it was sold for. As one might realize, there could be a linear relationship between the number of kilometers traveled and the selling price.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VhkHGO4J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/odkr516ke6ufgpith1hj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VhkHGO4J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/odkr516ke6ufgpith1hj.png" alt="Visualization of data traveled"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A visualisation of the data like the one in the above image makes it clear that a straight line can be fit for this kind of data. One should also note that in most cases it is impossible to fit a line that passes through all the points in the dataset. The best we can do is fit a straight line that passes close to most of the points, and we will see in the coming sections how to do so. Once we fit a line through this data, i.e. generate a line equation, we can start predicting prices by plugging the number of kilometers a car has traveled into the line equation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Understanding the math&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The math behind the working of Linear Regression is not at all complicated. For simplicity let’s assume the dataset has only one feature, the number of kilometers traveled (let's call this X), and one column with the selling prices of these cars (let's call this Y).&lt;/p&gt;

&lt;p&gt;Our job is to create a line equation like &lt;strong&gt;&lt;code&gt;Y = mX + c&lt;/code&gt;&lt;/strong&gt; . When the value of &lt;strong&gt;&lt;code&gt;X&lt;/code&gt;&lt;/strong&gt; (i.e the number of kilometers traveled) from the dataset is plugged into this equation it should calculate  &lt;strong&gt;&lt;code&gt;Y&lt;/code&gt;&lt;/strong&gt;  (i.e the predicted selling price) that is either equal to the  selling price value from the dataset or some close enough value. &lt;/p&gt;

&lt;p&gt;As you might infer, the variables in the above line equation are &lt;strong&gt;&lt;code&gt;m&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;c&lt;/code&gt;&lt;/strong&gt;, which are nothing but the slope of the line and the &lt;strong&gt;&lt;code&gt;y&lt;/code&gt;&lt;/strong&gt; intercept of the line. Remember that &lt;strong&gt;&lt;code&gt;X&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;Y&lt;/code&gt;&lt;/strong&gt; are not the variables in our case, as they are nothing but constants from our dataset that we will use in creating the best fit line. &lt;/p&gt;

&lt;p&gt;So our job is now to find out the right &lt;strong&gt;&lt;code&gt;m&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;c&lt;/code&gt;&lt;/strong&gt; values so that we can make an ideal straight line that passes through most of the points in the dataset.  &lt;/p&gt;

&lt;p&gt;Let's modify the above equation slightly to &lt;strong&gt;&lt;code&gt;Y = θ0 + θ1X&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Where &lt;strong&gt;&lt;code&gt;θ0 = c&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1 = m&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to decide if a line is good enough?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now that we understand the line equation, how should we decide whether the line equation we are using is the best fit line or not? An obvious way to do this is to plot the line against the dataset and decide visually. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x6g0I0Xg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/gb6mvfr4xh17c2ccjht4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x6g0I0Xg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/gb6mvfr4xh17c2ccjht4.png" alt="Good and bad lines"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However this is not practically possible for huge datasets with a large number of features, which is often the case with real world datasets. Hence we use a simple mathematical formula called the cost function to decide if a given line is a good fit or a bad fit to the data. &lt;/p&gt;

&lt;p&gt;Consider the following mini dataset:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;S.no&lt;/th&gt;
&lt;th&gt;Kms traveled&lt;/th&gt;
&lt;th&gt;Price (in lakhs)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;2.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2000&lt;/td&gt;
&lt;td&gt;1.7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Suppose we start with random values for &lt;strong&gt;&lt;code&gt;θ0 = 10&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1 = 20&lt;/code&gt;&lt;/strong&gt;. Let us plug in &lt;strong&gt;&lt;code&gt;X = 1000&lt;/code&gt;&lt;/strong&gt; as per the first example in the dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;center&gt;
&lt;code&gt;Y = 10 + 20(1000)&lt;/code&gt;&lt;br&gt;&lt;code&gt;Y = 20010&lt;/code&gt;
&lt;/center&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The predicted value according to the above equation is 20010 rupees, whereas the selling price according to the dataset is 210000 rupees (2.1 lakhs). This is definitely a bad prediction. The magnitude of the badness of this prediction, or technically the &lt;strong&gt;error&lt;/strong&gt;, is the difference between the predicted value (denoted by &lt;code&gt;Ŷ&lt;/code&gt;) and the actual value (denoted by &lt;code&gt;Y&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;In this case the predicted value is &lt;strong&gt;&lt;code&gt;Ŷ = 20010&lt;/code&gt;&lt;/strong&gt;, whereas the actual value (the value from the dataset) is &lt;strong&gt;&lt;code&gt;Y = 210000&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
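&lt;p&gt;The arithmetic of this small example can be reproduced in a few lines (a sketch; the price is taken as 2.1 lakhs = 210000 rupees, matching the figure quoted above):&lt;/p&gt;

```python
# Worked example: theta0 = 10, theta1 = 20, X = 1000 km traveled
theta0, theta1 = 10, 20
X = 1000

Y_hat = theta0 + theta1 * X  # predicted selling price
print(Y_hat)                 # 20010

Y = 210000                   # actual price (2.1 lakhs of rupees)
error = Y_hat - Y
print(error)                 # -189990: a very bad prediction
```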

&lt;p&gt;The cost function for two variables &lt;strong&gt;&lt;code&gt;θ0&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1&lt;/code&gt;&lt;/strong&gt; is denoted by &lt;strong&gt;&lt;code&gt;J&lt;/code&gt;&lt;/strong&gt; and is given as follows&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--w36hY7MF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/mryokvqs0p2bw040d7ek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--w36hY7MF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/mryokvqs0p2bw040d7ek.png" alt="Cost function"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cost function calculates the square of the error for each example in the dataset, sums it up and divides this value by the number of examples in the dataset (denoted by &lt;code&gt;m&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This cost function helps in determining the best fit line. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The division with 2 is to simplify calculations involving the first order differentials &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Arriving at the best fit line&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now that we have defined the cost function, we have to make use of it to adjust our parameters &lt;strong&gt;&lt;code&gt;θ0&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1&lt;/code&gt;&lt;/strong&gt; such that they result in the least cost function value. We make use of a technique called Gradient Descent to minimize the value of the cost function. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CyVGamQr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/coda6mut2ch1zgrqamsa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CyVGamQr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/coda6mut2ch1zgrqamsa.png" alt="Derivation of Gradient Descent"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;&lt;small&gt;Source: &lt;a href="https://mccormickml.com/2014/03/04/gradient-descent-derivation/"&gt;https://mccormickml.com/2014/03/04/gradient-descent-derivation/&lt;/a&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;Gradient descent makes small changes to the existing &lt;strong&gt;&lt;code&gt;θ0&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1&lt;/code&gt;&lt;/strong&gt; values such that they result in progressively smaller cost function values. The changes to &lt;strong&gt;&lt;code&gt;θ0&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1&lt;/code&gt;&lt;/strong&gt; are performed as follows. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_55fvKgJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/1onp5hst6fybupkrf6ww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_55fvKgJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/1onp5hst6fybupkrf6ww.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Where &lt;strong&gt;&lt;code&gt;j = 0&lt;/code&gt;&lt;/strong&gt; or &lt;strong&gt;&lt;code&gt;j = 1&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
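&lt;p&gt;One iteration of this update rule can be sketched as follows (a minimal illustration; the data, starting θ values, and learning rate α are all made up, and the derivative expressions come from differentiating the cost function):&lt;/p&gt;

```python
import numpy as np

# Hypothetical data (Y = 1 + 2 * X exactly) and made-up starting values
X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 7.0])
theta0, theta1 = 0.0, 0.0
alpha = 0.1  # learning rate

m = len(X)
errors = (theta0 + theta1 * X) - Y

# Partial derivatives of the cost function J
grad0 = np.sum(errors) / m
grad1 = np.sum(errors * X) / m

# Simultaneous update of both parameters
theta0 = theta0 - alpha * grad0
theta1 = theta1 - alpha * grad1

print(theta0, theta1)
```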

&lt;h3&gt;
  
  
  Let's try to understand what this updating of &lt;strong&gt;&lt;code&gt;θ0&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;θ1&lt;/code&gt;&lt;/strong&gt; means
&lt;/h3&gt;

&lt;p&gt;The differential part of this equation determines whether we have to increment or decrement the value of &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt;. If this differential is a positive value then &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt; is decremented, and if this differential is a negative value then &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt; is incremented, as can be observed from the above equation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AosJ7vJ_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/dfn8x4568fbun8ap78h6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AosJ7vJ_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/dfn8x4568fbun8ap78h6.jpg" alt="θ vs Cost function (J(θ))"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;&lt;small&gt;θ vs Cost function (J(θ))&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;Now that we know whether to increment or decrement &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt;, next we have to determine by how much &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt; should be changed. This is what &lt;strong&gt;&lt;code&gt;α&lt;/code&gt;&lt;/strong&gt;, or the learning rate, indicates. The larger the &lt;strong&gt;&lt;code&gt;α&lt;/code&gt;&lt;/strong&gt; value, the larger the update to &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt;, and vice versa. The value of &lt;strong&gt;&lt;code&gt;α&lt;/code&gt;&lt;/strong&gt; should not be too small, as that results in very slow convergence to the best fit line, and it should not be too large, as we might overshoot the values of &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt; that result in the best fit line. &lt;/p&gt;

&lt;p&gt;One set of updates to &lt;strong&gt;&lt;code&gt;θj&lt;/code&gt;&lt;/strong&gt; is called an iteration of Gradient Descent. &lt;/p&gt;

&lt;p&gt;This update process is repeated until the cost function value remains largely unchanged. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--75WypS1o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/fw1xjvobzuxol2ckrqvp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--75WypS1o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/fw1xjvobzuxol2ckrqvp.png" alt="Cost function vs number of iterations"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a sufficient number of iterations of gradient descent, we can visually check the performance of the line by plotting it against the values in the dataset. If everything goes right, you should have a pretty decent line. You can now use this line equation to make predictions for any given &lt;strong&gt;&lt;code&gt;X&lt;/code&gt;&lt;/strong&gt; value (or the number of kilometers traveled).&lt;/p&gt;
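&lt;p&gt;Putting the pieces together, the whole procedure can be sketched as a short loop (illustrative only: the data, learning rate, and iteration count are assumptions, with the data generated around the line &lt;code&gt;Y = 1 + 2X&lt;/code&gt; plus a little noise):&lt;/p&gt;

```python
import numpy as np

# Hypothetical data scattered around the line Y = 1 + 2 * X
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

theta0, theta1 = 0.0, 0.0
alpha = 0.05  # learning rate (an assumption)
m = len(X)

for _ in range(5000):  # iterations of gradient descent
    errors = (theta0 + theta1 * X) - Y
    grad0 = np.sum(errors) / m
    grad1 = np.sum(errors * X) / m
    theta0 -= alpha * grad0  # simultaneous update
    theta1 -= alpha * grad1

print(theta0, theta1)  # close to the underlying intercept and slope
```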

&lt;h3&gt;
  
  
  &lt;strong&gt;Pros&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Space complexity is very low: it just needs to save the weights at the end of training. Hence it's a low latency algorithm&lt;/li&gt;
&lt;li&gt;It's very simple to understand&lt;/li&gt;
&lt;li&gt;Good interpretability&lt;/li&gt;
&lt;li&gt;Feature importance is generated at the time of model building&lt;/li&gt;
&lt;li&gt;With the help of the hyperparameter lambda, you can handle feature selection and hence achieve dimensionality reduction&lt;/li&gt;
&lt;li&gt;Small number of hyperparameters&lt;/li&gt;
&lt;li&gt;Can be regularized to avoid overfitting and this is intuitive &lt;/li&gt;
&lt;li&gt;Lasso regression can provide feature importances&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cons&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The algorithm assumes the data (more precisely, the residuals) are normally distributed, which real-world data often are not&lt;/li&gt;
&lt;li&gt;Multi-collinearity should be removed before building the model.&lt;/li&gt;
&lt;li&gt;Sensitive to outliers. &lt;/li&gt;
&lt;li&gt;Input data needs to be scaled, and there are a range of ways to do this.&lt;/li&gt;
&lt;li&gt;May not work well when the hypothesis function is non-linear.&lt;/li&gt;
&lt;li&gt;A complex hypothesis function is really difficult to fit. This can be done by using quadratic and higher-order features, but the number of these grows rapidly with the number of original features and may become very computationally expensive.&lt;/li&gt;
&lt;li&gt;Prone to overfitting when a large number of features are present.&lt;/li&gt;
&lt;li&gt;May not handle irrelevant features well &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far so good: we have covered an overview of Linear Regression. Our next post revolves around the math concepts involved in Linear Regression.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Read On 📝&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/ml_scratch/mls-1-a-concepts-for-linear-regression-1n9f"&gt;MLS.1.a Concepts for Linear regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ml_scratch/mls-1-b-gradient-descent-in-linear-regression-53dl"&gt;MLS.1.b Gradient Descent in Linear regression&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Contributors
&lt;/h2&gt;

&lt;p&gt;This series is made possible by help from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pranav (&lt;a class="comment-mentioned-user" href="https://dev.to/devarakondapranav"&gt;@devarakondapranav&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Ram (&lt;a class="comment-mentioned-user" href="https://dev.to/r0mflip"&gt;@r0mflip&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Devika (&lt;a class="comment-mentioned-user" href="https://dev.to/devikamadupu1"&gt;@devikamadupu1&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Pratyusha (&lt;a class="comment-mentioned-user" href="https://dev.to/prathyushakallepu"&gt;@prathyushakallepu&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Pranay (&lt;a class="comment-mentioned-user" href="https://dev.to/pranay9866"&gt;@pranay9866&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Subhasri (&lt;a class="comment-mentioned-user" href="https://dev.to/subhasrir"&gt;@subhasrir&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Laxman (&lt;a class="comment-mentioned-user" href="https://dev.to/lmn"&gt;@lmn&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Vaishnavi (&lt;a class="comment-mentioned-user" href="https://dev.to/vaishnavipulluri"&gt;@vaishnavipulluri&lt;/a&gt;
)&lt;/li&gt;
&lt;li&gt;Suraj (&lt;a class="comment-mentioned-user" href="https://dev.to/suraj47"&gt;@suraj47&lt;/a&gt;
)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>scratch</category>
      <category>regression</category>
      <category>linear</category>
    </item>
    <item>
      <title>Introduction</title>
      <dc:creator>Suraj J</dc:creator>
      <pubDate>Tue, 12 Nov 2019 06:12:08 +0000</pubDate>
      <link>https://dev.to/ml_scratch/introduction-4di4</link>
      <guid>https://dev.to/ml_scratch/introduction-4di4</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;What is Machine Learning?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Machine Learning is an application of Artificial Intelligence(AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. This involves the task of learning from data with specific inputs to the machine.&lt;/p&gt;

&lt;p&gt;It’s important to understand what makes Machine Learning work and, thus, how it can be used in the future. This blog helps in understanding each concept of ML from the basics, along with the mathematics associated with it. Math concepts are an integral part of ML. &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is ML-Scratch?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;ML-Scratch is an organisation that focuses on teaching machine learning algorithms from the primitive level. &lt;/p&gt;

&lt;p&gt;We provide detailed explanations of different concepts, so that one can code from the start (from scratch, as we say) without using any imported functions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Read On 📝&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/ml_scratch/mls-1-linear-regression-1eo3"&gt;MLS.1 Linear Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ml_scratch/mls-1-a-concepts-for-linear-regression-1n9f"&gt;MLS.1.a Concepts for Linear regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ml_scratch/mls-1-b-gradient-descent-in-linear-regression-53dl"&gt;MLS.1.b Gradient Descent in Linear regression&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
