<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nitin Kendre</title>
    <description>The latest articles on DEV Community by Nitin Kendre (@newbie_coder).</description>
    <link>https://dev.to/newbie_coder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F495646%2F4b88cce5-b073-4046-8ee1-86f167a1c180.jpg</url>
      <title>DEV Community: Nitin Kendre</title>
      <link>https://dev.to/newbie_coder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/newbie_coder"/>
    <language>en</language>
    <item>
      <title>Python : Linear search and Binary Search</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Fri, 31 May 2024 19:13:54 +0000</pubDate>
      <link>https://dev.to/newbie_coder/python-linear-search-and-binary-search-2fbk</link>
      <guid>https://dev.to/newbie_coder/python-linear-search-and-binary-search-2fbk</guid>
      <description>&lt;p&gt;&lt;strong&gt;Practicing Python From Basics&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Linear Search:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Linear search, also known as sequential search, checks each element in a collection one by one until the target element is found or the end of the collection is reached.&lt;/li&gt;
&lt;li&gt;It's a simple but inefficient algorithm, especially for large datasets, as it has a &lt;strong&gt;time complexity of O(n)&lt;/strong&gt; in the worst case.&lt;/li&gt;
&lt;li&gt;Linear search is applicable to both sorted and unsorted collections.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;linear_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Calling function
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;linear_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Element &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; found at index &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Element not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Element 3 found at index 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  2nd calling
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;key1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
&lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;linear_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Element &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; found at index &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Element not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Element not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;linear_search&lt;/code&gt; function takes a list &lt;code&gt;arr&lt;/code&gt; and a target value &lt;code&gt;key&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It iterates through each element of the list using a for loop.&lt;/li&gt;
&lt;li&gt;For each element, it checks if it matches the target value.&lt;/li&gt;
&lt;li&gt;If a match is found, it returns the index of the element. If not found, it returns 0.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Binary Search
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Binary search is a more efficient algorithm for finding a target value within a sorted array.&lt;/li&gt;
&lt;li&gt;It repeatedly divides the search interval in half until the target is found or the interval is empty.&lt;/li&gt;
&lt;li&gt;Binary search has a &lt;strong&gt;time complexity of O(log n)&lt;/strong&gt;, making it significantly faster than linear search for large datasets.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It requires the array to be sorted beforehand.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;binary_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;mid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Calling Binary search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;binary_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Element &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; found at index &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Element not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Element 12 found at index 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  2nd Calling
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;binary_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Element &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; found at index &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Element not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Element not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;binary_search&lt;/code&gt; function takes a sorted array &lt;code&gt;arr&lt;/code&gt; and a target value &lt;code&gt;key&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It initializes &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; pointers to the start and end of the array, respectively.&lt;/li&gt;
&lt;li&gt;It repeatedly calculates the &lt;code&gt;mid&lt;/code&gt; index and compares the element at &lt;code&gt;mid&lt;/code&gt; with the &lt;code&gt;key&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Based on the comparison, it updates &lt;code&gt;start&lt;/code&gt; or &lt;code&gt;end&lt;/code&gt; pointers to narrow down the search interval.&lt;/li&gt;
&lt;li&gt;It continues until the target is found or the search interval is empty, returning the index of the target or 0 if not found.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>algorithms</category>
      <category>python</category>
      <category>learning</category>
    </item>
    <item>
      <title>Recursion : Python</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Thu, 30 May 2024 18:45:03 +0000</pubDate>
      <link>https://dev.to/newbie_coder/recursion-python-1jeg</link>
      <guid>https://dev.to/newbie_coder/recursion-python-1jeg</guid>
      <description>&lt;h2&gt;
  
  
  Recursion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Recursion is a programming technique where a function calls itself in order to solve a problem. &lt;/li&gt;
&lt;li&gt;The recursive approach breaks a problem down into smaller, more manageable sub-problems of the same type.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Components of Recursion:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Base Case:&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;The condition under which the recursion terminates.&lt;/li&gt;
&lt;li&gt;It prevents infinite recursion by providing a simple, non-recursive solution to the smallest instance of the problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Recursive Case:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;The part of the function where the function calls itself with a modified argument, gradually approaching the base case.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Types of Recursion:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Direct Recursion:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A function calls itself directly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Indirect Recursion:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A function calls another function, which in turn calls the original function.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Example (Direct Recursion)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Factorial Function:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# base case
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Recursive case
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Factorial of 5&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;factorial of 5 is : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;factorial of 5 is : 120
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Here, &lt;code&gt;factorial(5)&lt;/code&gt; calls &lt;code&gt;factorial(4)&lt;/code&gt;, which calls &lt;code&gt;factorial(3)&lt;/code&gt;, and so on, until it reaches &lt;code&gt;factorial(1)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example (Indirect Recursion)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# variable to count how many times function called
&lt;/span&gt;&lt;span class="n"&gt;function_call_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;functionA&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Telling function's local space that i am using global variable.
&lt;/span&gt;    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;function_call_count&lt;/span&gt; 

    &lt;span class="c1"&gt;# Counting function call
&lt;/span&gt;    &lt;span class="n"&gt;function_call_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Printing from functionA().&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Base case to break the function call otherwise it will go infinitely calling functions.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_call_count&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="c1"&gt;# function call
&lt;/span&gt;    &lt;span class="nf"&gt;functionB&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;functionB&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Printing from functionB().&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Function call
&lt;/span&gt;    &lt;span class="nf"&gt;functionA&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Calling functionA()
&lt;/span&gt;&lt;span class="nf"&gt;functionA&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Printing from functionA().
Printing from functionB().
Printing from functionA().
Printing from functionB().
Printing from functionA().
Printing from functionB().
Printing from functionA().
Printing from functionB().
Printing from functionA().
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Advantages of Recursion:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simplifies the code for problems that can naturally be divided into similar sub-problems, like tree traversals, and certain mathematical computations (e.g., Fibonacci sequence).&lt;/li&gt;
&lt;li&gt;Provides a clear and straightforward approach to solve problems like backtracking and divide-and-conquer algorithms.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Disadvantages of Recursion:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Can lead to high memory usage due to the depth of the call stack, especially if the recursion depth is large.&lt;/li&gt;
&lt;li&gt;May result in slower performance compared to iterative solutions because of the overhead of multiple function calls.&lt;/li&gt;
&lt;li&gt;Risk of stack overflow if the base case is not properly defined or if the problem size is too large.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Tail Recursion:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A special form of recursion where the recursive call is the last operation in the function.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tail recursion can be optimized by the compiler to avoid increasing the call stack, converting the recursion into iteration internally.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tail_recursive_factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="c1"&gt;# Recursive case (only recursive case)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;tail_recursive_factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="nf"&gt;tail_recursive_factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CPU times: total: 0 ns&lt;br&gt;
Wall time: 0 ns

&lt;p&gt;120&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h5&gt;
&lt;br&gt;
  &lt;br&gt;
  &lt;br&gt;
  &lt;em&gt;Most Important Python Does not optimize tail recursion. Some languages does not optimizes the tail recursion, and python is one of them.&lt;/em&gt;&lt;br&gt;
&lt;/h5&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Recursion is a powerful tool in programming, enabling elegant solutions for complex problems by breaking them down into simpler sub-problems. &lt;/li&gt;
&lt;li&gt;However, it requires careful implementation to manage memory and performance effectively.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Unleashing the Potential of Random Forest Regression : A Python Implementation Guide with Hyperparameter Tuning.</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Fri, 09 Jun 2023 11:19:23 +0000</pubDate>
      <link>https://dev.to/newbie_coder/unleashing-the-potential-of-random-forest-regression-a-python-implementation-guide-with-hyperparameter-tuning-167o</link>
      <guid>https://dev.to/newbie_coder/unleashing-the-potential-of-random-forest-regression-a-python-implementation-guide-with-hyperparameter-tuning-167o</guid>
      <description>&lt;p&gt;In the field of machine learning, regression is a widely used technique for predicting continuous numerical values. And random forest regression is most versatile and effective algorithm in regression. In this article we will learn how to implement random forest regression using python language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Random Forest Regressor
&lt;/h2&gt;

&lt;p&gt;Random Forest Regressor is an ensemble learning algorithm which combines decision trees and the concept of randomness. It is belongs to the supervised learning algorithm family. While working on data this algorithm create multiple decision trees and combines the predictions of all trees to give final output.&lt;/p&gt;

&lt;p&gt;The name &lt;code&gt;Random Forest&lt;/code&gt; comes from two concepts: &lt;code&gt;Randomness and forests&lt;/code&gt;. Forest meaning collection of trees, which this model creates by generating multiple decision trees and combining them all.&lt;/p&gt;

&lt;p&gt;In decision tree algorithm all data is used to create only one tree and predict using it. But in random forest algorithm creates trees using input given by user &lt;code&gt;n_estimators&lt;/code&gt; and predicts or gives output by combining output from all these trees.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Random Forest Regressor Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Preparation&lt;/strong&gt; :&lt;br&gt;
First step is preparing the training data. Each data point should be represented by a set of features and a corresponding numerical target value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generating Trees&lt;/strong&gt; : &lt;br&gt;
In this step algorithm creates trees according the input given by user by parameter &lt;code&gt;n_estimators&lt;/code&gt;. Each tree is built using a small random group of data from training dataset. This group of data is choose randomly and can be used many times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Growing The Trees&lt;/strong&gt; :&lt;br&gt;
Each tree is build using smaller groups of data which are created by using chosen features. We do this process until we can not divide the data further or we met certain condition. Our aim is to make sure that the values we want to predict are as similar as possible within each smaller group.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Combining Predictions&lt;/strong&gt; :&lt;br&gt;
After Building all the trees, we need to combine the predictions. We do this by adding predictions from all trees and taking average. This is called aggregation. Let, if we are trying to predict a number, like the price of house, we usually take the average of all the predicted prices from the trees. &lt;br&gt;
This combined prediction gives us more accurate result rather than relying on just one tree. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing Values&lt;/strong&gt; :&lt;br&gt;
Random forest regressor can handle missing values using averaging approach. During prediction process, each decision tree separately handles the missing values and makes prediction.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Hyper Parameter Tuning
&lt;/h2&gt;

&lt;p&gt;Hyper parameters controls the behavior of algorithm and these parameters should be set before learning or training process. Tuning these parameters can impact the performance of the model. We will discuss here two important hyper parameters and their tuning.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;n_estimators&lt;/code&gt; : This parameter decides the number of decision tress in random forest. By increasing the number of trees the performance of model can be improved but also computational complexity is also increased. We can choose different values and final the one values which gives best results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;max_depth&lt;/code&gt; : It represents the maximum depth of decision tree. A deeper tree can capture more complex relationships in the data but can also lead to overfitting. By tuning this parameter, we can find the right balance between model complexity and generalization.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To tune these parameters we can use Grid Search, Random Search, or Bayesian Optimization. &lt;/p&gt;

&lt;p&gt;Grid Search involves deciding the set of values for each hyper parameter and exhaustively evaluating all possible combinations.&lt;/p&gt;

&lt;p&gt;Random Search randomly samples combinations of hyperparameters and evaluate their performance.&lt;/p&gt;

&lt;p&gt;Bayesian Optimization uses a probabilistic model to search for promising hyperparameters.&lt;/p&gt;

&lt;p&gt;For brief explanation and more information on hyper parameter tuning you can refer this &lt;a href="https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74" rel="noopener noreferrer"&gt;Link&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation of Random Forest Regressor using Python
&lt;/h2&gt;

&lt;p&gt;To implement random forest regression we will use &lt;code&gt;sklearn&lt;/code&gt; library, which provides different set of tools for machine learning tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 : Importing Necessary Libraries
&lt;/h3&gt;

&lt;p&gt;First we will import necessary libraries for loading and manipulating the data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;matplotlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Step 2 : Loading the data
&lt;/h3&gt;

&lt;p&gt;We will use salary data for this examples.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;sal_d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Position_Salaries.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Our data will look like this -&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnp6epidvt39uzi7a4zxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnp6epidvt39uzi7a4zxx.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 : Creating x and y variables
&lt;/h3&gt;

&lt;p&gt;x will be the independent variable and y will be the dependent variable.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal_d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal_d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Step 4 : Hyper Parameter Tuning
&lt;/h3&gt;

&lt;p&gt;In this step we will use &lt;code&gt;GridSearchCV&lt;/code&gt; for hyper parameter tuning.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;## Importing required libraries.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestRegressor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GridSearchCV&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;

&lt;span class="c1"&gt;## Defining param grid 
&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;## Creating a random forest regressor object
&lt;/span&gt;&lt;span class="n"&gt;rf_regressor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Perform grid search for hyperparameter tuning
&lt;/span&gt;&lt;span class="n"&gt;grid_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rf_regressor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;neg_mean_squared_error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;grid_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Getting the best parameters and best score
&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;grid_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;
&lt;span class="n"&gt;best_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;grid_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score_&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best Parameters:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best Score (MSE):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Step 5 : Training the Model
&lt;/h3&gt;

&lt;p&gt;In this step we will use above tuned parameters and whole dataset to train our model. &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;rf_regressor_best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Above code will create a model using tuned parameter in previous step.&lt;br&gt;
Below is the code to fit the whole dataset.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;p&gt;&lt;span class="c1"&gt;# Fiting the model to the training data using the best parameters&lt;br&gt;
&lt;/span&gt;&lt;span class="n"&gt;rf_regressor_best&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
&lt;br&gt;
  &lt;br&gt;
  &lt;br&gt;
  Step 6 : Predictions.&lt;br&gt;
&lt;/h3&gt;

&lt;p&gt;In this step we will predict for unique input which will not be in dataset.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;p&gt;&lt;span class="n"&gt;rf_regressor_best&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mf"&gt;6.5&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
&lt;br&gt;
  &lt;br&gt;
  &lt;br&gt;
  Conclusion&lt;br&gt;
&lt;/h2&gt;

&lt;p&gt;By using all these steps anyone can implement random forest regressor using python. Also there are more parameters than 2, by tuning these parameters we can improve our model more.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All these steps are done by me in python also this theory information is from internet and some udemy course. which I wrote in my words. If there is mistake above then please comment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Decision Tree Regression : A Comprehensive Guide with Python Code Examples and Hyperparameter Tuning</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Mon, 05 Jun 2023 06:51:00 +0000</pubDate>
      <link>https://dev.to/newbie_coder/decision-tree-regression-a-comprehensive-guide-with-python-code-examples-and-hyperparameter-tuning-1f0f</link>
      <guid>https://dev.to/newbie_coder/decision-tree-regression-a-comprehensive-guide-with-python-code-examples-and-hyperparameter-tuning-1f0f</guid>
      <description>&lt;p&gt;Decision Tree regression is popular and powerful algorithm in regression. But to get full potential of this algorithm you have to Hyperparameter Tuning. Means you have to choose some parameters that can best fit the data and predict correctly. In this article we will focus on implementation mainly using python. Also we will learn some hyperparameter tuning techniques.&lt;/p&gt;

&lt;p&gt;For more information on Decision tree Regression you can refer to this blog by Ashwin Prasad - &lt;a href="https://medium.com/analytics-vidhya/regression-trees-decision-tree-for-regression-machine-learning-e4d7525d8047" rel="noopener noreferrer"&gt;Link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Tree Regression
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Decision Tree Regression&lt;/code&gt; builds a tree like structure by splitting the data based on the values of various features. Simply it creates different subsets of data. For prediction of new sample or data, average value of target variable from leaf node is used. It handles both categorical and continues variables, making it versatile algorithm for regression tasks.&lt;/p&gt;

&lt;p&gt;But in some libraries of python like &lt;code&gt;sklearn&lt;/code&gt; categorical variable can not be handled by decision tree regression. So we have to encode it using any encoder method, according to data or model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Using Python
&lt;/h2&gt;

&lt;p&gt;We will use &lt;code&gt;sklearn&lt;/code&gt; library from python for implementation.&lt;br&gt;
First step will import necessary libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next we will read the dataset. In this article we will be using simple salary dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Reading the data
&lt;/span&gt;&lt;span class="n"&gt;sal_d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Position_Salaries.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sal_d&lt;/span&gt; &lt;span class="c1"&gt;## this line will print the data in jupyter notebook
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our data will look like -&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvinucn08d4dpe4h2mcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvinucn08d4dpe4h2mcb.png" alt="Salary Data"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we will create dependent and independent variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal_d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal_d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In next step we will train our regression model using above variables. Our data is so small, so we are training our model on entire data. &lt;/p&gt;

&lt;p&gt;There are various steps in data preparing or preprocessing, you can refer those all steps in article - &lt;a href="https://dev.to/newbie_coder/7-essential-techniques-for-data-preprocessing-using-python-a-guide-for-data-scientists-3ijk"&gt;Link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below is the code for training our model -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;## Importing 
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeRegressor&lt;/span&gt;

&lt;span class="c1"&gt;## Creating model
&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## training our model
&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we will make a prediction using created model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mf"&gt;6.5&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of above -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="err"&gt;array([150000.])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we will visualize the prediction of our model. For higher resolution we will create x_grid. Which plot the line smooth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## For smooth line
&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## this will plot points on chart
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## this will plot the line connecting to points
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## This will give title to our plot
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;actual vs predict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## this will give label to x axis
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;### this will give label to y axis
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## This line will save our plot as image on our computer
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;decision_tree_regression.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;bbox_inches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## And this line is for showing the chart and ending.
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of above plot code is below - &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuuas2dr99gh9o3j5qghe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuuas2dr99gh9o3j5qghe.png" alt="Decision Tree Regression Plot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hyperparameter Tuning
&lt;/h2&gt;

&lt;p&gt;Hyperparamter Tuning means we have to select the best values for parameters of algorithm in machine learning. It includes searching and evaluating different combinations of parameters to maximize the performance of model.&lt;/p&gt;

&lt;p&gt;To enhance the performance of decision tree regression we can tune its parameters using methods in &lt;code&gt;library&lt;/code&gt; like &lt;code&gt;GridSearchCV&lt;/code&gt; and &lt;code&gt;RandomizedSearchCV&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grid Search
&lt;/h3&gt;

&lt;p&gt;Grid search is a method to find the best set of values for different options by trying out all possible combinations.&lt;/p&gt;

&lt;p&gt;Below is the code for implementing &lt;code&gt;GridSearchCV&lt;/code&gt; -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;## importing class from library
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GridSearchCV&lt;/span&gt;

&lt;span class="c1"&gt;## Setting optimum values for parameters.
&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min_samples_split&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min_samples_leaf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;## creating instance
&lt;/span&gt;&lt;span class="n"&gt;grid_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DecisionTreeRegressor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## fitting data 
# grid_search.fit(x, y)
&lt;/span&gt;
&lt;span class="n"&gt;grid_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## getting best parameters 
&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;grid_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best Hyperparameters:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using these parameters we can train our model and enhance the performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Randomized Search
&lt;/h3&gt;

&lt;p&gt;Randomized search is a way to find the best values for different parameters by randomly trying out a subset of possible combinations, which makes the search process faster.&lt;/p&gt;

&lt;p&gt;Below is the code for implementing &lt;code&gt;RandomizedSearchCV&lt;/code&gt; -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;## importing class from library
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomizedSearchCV&lt;/span&gt;

&lt;span class="c1"&gt;## Setting optimum values for parameters.
&lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min_samples_split&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min_samples_leaf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;## creating instance
&lt;/span&gt;&lt;span class="n"&gt;random_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomizedSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DecisionTreeRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## fitting data
&lt;/span&gt;&lt;span class="n"&gt;random_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## getting best parameters 
&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best Hyperparameters:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also using these parameters we can enhance performance of our model.&lt;/p&gt;

&lt;p&gt;From these two methods &lt;code&gt;RandomizedSearchCV&lt;/code&gt; is faster. Because &lt;code&gt;GridSearchCV&lt;/code&gt; apply parameters on all data and check one by one which is suitable.&lt;/p&gt;

&lt;p&gt;Also according to size of dataset it is decided which is faster.&lt;/p&gt;

&lt;p&gt;There are various methods for search best parameters to model. But these two I personally implemented, so explained it here as learned. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we learned how to implement decision tree regression using python. Also we learned some techniques for &lt;em&gt;hyperparameter tuning&lt;/em&gt; like &lt;code&gt;GridSearchCV&lt;/code&gt; and &lt;code&gt;RandomizedSearchCV&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;All code implementations done by me. So if anyone finds a mistake in it please comment it down.&lt;/em&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Thank You!
&lt;/h3&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Support Vector Regression (SVR) Using Python : A practical approach to Predictive Modeling</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Fri, 02 Jun 2023 07:57:59 +0000</pubDate>
      <link>https://dev.to/newbie_coder/support-vector-regression-svr-using-python-a-practical-approach-to-predictive-modeling-5e6b</link>
      <guid>https://dev.to/newbie_coder/support-vector-regression-svr-using-python-a-practical-approach-to-predictive-modeling-5e6b</guid>
      <description>&lt;h2&gt;
  
  
  Introduction :
&lt;/h2&gt;

&lt;p&gt;Support Vector Regression (SVR) is a powerful algorithm used to solve regression problems. It is a part of Support Vector Machines (SVM) which is used for nonlinear relationships between variables.&lt;/p&gt;

&lt;p&gt;In this article we will learn how to implement it using python language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding SVR :
&lt;/h2&gt;

&lt;p&gt;The goal of SVR is to find the hyperplane that best fits the data points, while allowing a margin of tolerance for errors. While traditional regression models focus on minimizing the errors, SVR focuses on data points within a specific margin. SVR Operates on the premise that only support vectors and the data point close to the margin, which significantly affects the model's performance.&lt;/p&gt;

&lt;p&gt;For more information on SVR you can refer this blog post &lt;a href="https://towardsdatascience.com/an-introduction-to-support-vector-regression-svr-a3ebc1672c2"&gt;LINK&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing SVR Using Python :
&lt;/h2&gt;

&lt;p&gt;We will implement SVR algorithm using &lt;code&gt;sklearn&lt;/code&gt; library from &lt;code&gt;pyhton&lt;/code&gt; language. &lt;/p&gt;

&lt;p&gt;Below are the steps of implementation -&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 : Importing Necessary Libraries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplolib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2 : Loading the Dataset in preparing it
&lt;/h3&gt;

&lt;p&gt;We have discussed all data preprocessing steps previous article. You can refer that for more tools to data preparation &lt;a href="https://dev.to/newbie_coder/7-essential-techniques-for-data-preprocessing-using-python-a-guide-for-data-scientists-3ijk"&gt;LINK&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;salary_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Position_Salaries.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## Below line will print first 10 rows from data.
&lt;/span&gt;&lt;span class="n"&gt;salary_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  creating Variables
&lt;/h4&gt;

&lt;p&gt;we will create dependent and independent variables in this step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Creating independent variable
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

&lt;span class="c1"&gt;## Creating dependent variable
&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3 : Feature Scaling
&lt;/h2&gt;

&lt;p&gt;It is an important step in machine learning that brings all features in similar scales. It ensures that no single feature dominates the learning process due to differences in their magnitude. By scaling features, algorithms can converge faster and perform better, leading to more accurate and reliable models.&lt;/p&gt;

&lt;p&gt;Below is the code example to do feature scaling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;
&lt;span class="n"&gt;x_sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;y_sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here y variable is a 1D array. But &lt;code&gt;StandardScaler&lt;/code&gt; takes 2D array. So we have to reshape it. Below is the code for reshaping&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;y_res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below code fits and transforms the x and y variable into similar scale.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;x_scld&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_scld&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4 : Training The SVR Model
&lt;/h3&gt;

&lt;p&gt;In this step we will use above scaled data to train our model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Importing SVR
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.svm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SVR&lt;/span&gt;

&lt;span class="c1"&gt;## Creating SVR model
&lt;/span&gt;&lt;span class="n"&gt;svr_rbf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SVR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'rbf'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;## here `rbf` stands for Radial Basis Function kernel
&lt;/span&gt;
&lt;span class="c1"&gt;## Training SVR model
&lt;/span&gt;&lt;span class="n"&gt;svr_rbf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_scld&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_sclf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here &lt;code&gt;rbf&lt;/code&gt; is one of the kernel from SVR model. For more learning you can refer this &lt;a href="https://www.geeksforgeeks.org/support-vector-regression-svr-using-linear-and-non-linear-kernels-in-scikit-learn/"&gt;Link&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5 : Predicting New Result
&lt;/h3&gt;

&lt;p&gt;In this step we predict a random value using random input.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inverse_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;svr_rbf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mf"&gt;6.5&lt;/span&gt;&lt;span class="p"&gt;]])).&lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here &lt;code&gt;inverse_transform()&lt;/code&gt; method will transform scaled data into original format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6 : Visualizing the result and data with Smooth Curve
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## This line will create a array using min and max value of data with difference of 0.1
&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## This line will reshape above 1D array to 2D array.
&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## This line will plot the scatter plot for x and y variable
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'red'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## This line will connect the points using curve line
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inverse_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;svr_rbf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_grid&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'blue'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## This line will give the title to our plot
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'actual vs true (SVR) smooth'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## this line label the x axis
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Level'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## this line will label the y axis
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Salary'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## This line will help to show the plot
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above all steps are implemented personally by me. I learned them online. So if anyone finds any mistake please comment down. I will be happy to edit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion :
&lt;/h2&gt;

&lt;p&gt;SVR combined with data preprocessing can provide accurate predictions for regression tasks. By following above all steps anyone can implement it using python language.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>productivity</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Polynomial Regression with Python: A Flexible Approach for Non-Linear Curve Fitting</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Thu, 01 Jun 2023 14:25:50 +0000</pubDate>
      <link>https://dev.to/newbie_coder/polynomial-regression-with-python-a-flexible-approach-for-non-linear-curve-fitting-4j7h</link>
      <guid>https://dev.to/newbie_coder/polynomial-regression-with-python-a-flexible-approach-for-non-linear-curve-fitting-4j7h</guid>
      <description>&lt;p&gt;Polynomial Regression is a powerful technique, which helps us to model the relationship between dependent and independent variable. Independent variables can be multiple (one or many). It extends the concept of Simple or multiple Linear Regression by allowing for more flexible curve fitting.&lt;/p&gt;

&lt;p&gt;In this article we will see some theory behind it and How to implement it using python.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Understanding the Polynomial Regression
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Introduction to Polynomial Regression :
&lt;/h3&gt;

&lt;p&gt;The goal or aim of polynomial regression is to fit a polynomial equation to a given set of data points.&lt;/p&gt;

&lt;p&gt;Below is the polynomial equation :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;
&lt;span class="nv"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; b0 + b1 &lt;span class="k"&gt;*&lt;/span&gt; x + b2 &lt;span class="k"&gt;*&lt;/span&gt; x^2 + ... + bn &lt;span class="k"&gt;*&lt;/span&gt; x^n

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;y&lt;/code&gt; is the dependent variable.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x&lt;/code&gt; is the independent variable.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;b1, b2, ..., bn&lt;/code&gt; are the coefficients of polynomial terms.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;n&lt;/code&gt; denotes the degree of polynomial&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advantages and Limitations :
&lt;/h3&gt;

&lt;p&gt;Polynomial Regression offers several advantages over simple Linear Regression -&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Flexibility&lt;/code&gt; : By using polynomial terms, we can capture non-linear relationships between variables that can not be represented by a straight line.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Higher Order Trends&lt;/code&gt; : Polynomial Regression can capture Higher order trends in data which can be used for more accurate predictions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Interpretability&lt;/code&gt; : The polynomial Terms provide insights into the relationships between variables.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But Polynomial Regression also have some limitations -&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Overfitting&lt;/code&gt; : Using high-degree polynomials can easily cause the training data to overfit. Which results in poor generalization to unseen data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Computational Complexity&lt;/code&gt; : When we increase the degree of polynomial then complexity of the regression model also increases with the degree. Which makes it more computationally expensive.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Implementing Polynomial Regression using Python :
&lt;/h2&gt;

&lt;p&gt;We will use &lt;code&gt;sklearn&lt;/code&gt; library from python language to implement Polynomial Regression. &lt;code&gt;sklearn&lt;/code&gt; provides a simple and efficient way to build machine learning models. &lt;/p&gt;

&lt;p&gt;Below are the steps to implement the Polynomial Regression -&lt;/p&gt;

&lt;h3&gt;
  
  
  step 1 : Data Preparation
&lt;/h3&gt;

&lt;p&gt;In this step, we will load the data and preprocess it for training our model. This step also ensure that data should in required format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;## Importing Libraries
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;## Loading the dataset
&lt;/span&gt;
&lt;span class="n"&gt;sal_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Position_Salaries.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;## creating dependent and independent variables
&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  step 2 : Feature Engineering
&lt;/h2&gt;

&lt;p&gt;In this step we will generate the polynomial terms by transforming our independent variables into polynomial terms.&lt;/p&gt;

&lt;p&gt;Below is the code for generating polynomial terms -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PolynomialFeatures&lt;/span&gt;
&lt;span class="n"&gt;pf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PolynomialFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;degree&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_poly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  step 3 : Model Training
&lt;/h3&gt;

&lt;p&gt;In this step we will train our model using above generated polynomial terms.&lt;/p&gt;

&lt;p&gt;Below is the code for training -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="n"&gt;lr2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;lr2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_poly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  step 4 : Evaluating the model
&lt;/h3&gt;

&lt;p&gt;We will use r_squared method which is in sklearn library to evaluate the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;r2_score&lt;/span&gt;
&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lr2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_poly&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r2_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"r_squared : "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  step 5 : Visualizing results from Polynomial Regression
&lt;/h3&gt;

&lt;p&gt;This step will visualize the predicted values and original values on the line plot/scatter plot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'red'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;lr2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_poly&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'blue'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'level'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Salary'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Polynomial'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These all steps I personally learned and implemented it using python. So if you find any mistake please comment down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion :
&lt;/h2&gt;

&lt;p&gt;Remember to carefully choose the degree of the polynomial to avoid overfitting and consider the computational complexity of higher degree polynomials.&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Mastering Multiple Linear Regression: A Step-by-Step Implementation Guide with Python Code Examples</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Wed, 31 May 2023 10:16:28 +0000</pubDate>
      <link>https://dev.to/newbie_coder/mastering-multiple-linear-regression-a-step-by-step-implementation-guide-with-python-code-examples-16in</link>
      <guid>https://dev.to/newbie_coder/mastering-multiple-linear-regression-a-step-by-step-implementation-guide-with-python-code-examples-16in</guid>
      <description>&lt;h2&gt;
  
  
  Introduction :
&lt;/h2&gt;

&lt;p&gt;Multiple Linear Regression is a statistical model used to find relationship between dependent variable and multiple independent variables. This model helps us to find how different variables contribute to outcome or predictions. In this article we will see how to implement it using python language from data preparation to model evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Understanding Multiple Linear Regression :
&lt;/h2&gt;

&lt;p&gt;In simple linear regression only one independent and dependent variables are there. So Multiple Linear Regression extends this capacity of simple linear regression. Means there can many number of independent variables in Multiple Linear Regression.&lt;/p&gt;

&lt;p&gt;General Equation for Multiple Linear Regression is as follow -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nv"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ɛ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;y&lt;/code&gt; is the dependent variable&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;β₀&lt;/code&gt; is the intercept (means value of y when value of x is zero)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;β₁, β₂, ..., βₚ&lt;/code&gt; are the coefficients.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x₁, x₂, ..., xₚ&lt;/code&gt; are the independent variables.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ɛ&lt;/code&gt; represents error terms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Data Preparation :
&lt;/h2&gt;

&lt;p&gt;It is the fundamental step in any Machine Learning Model. Because before feeding to model data should be clean, without any missing values, and all values should be in numeric. &lt;/p&gt;

&lt;p&gt;Below are some code examples -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Importing Libraries
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Loading Data
&lt;/span&gt;&lt;span class="n"&gt;m_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"50_Startups.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Checking for missing values
&lt;/span&gt;&lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Creating dependent and independent variables.
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Encoding Categorical Variables :
&lt;/h3&gt;

&lt;p&gt;It is necessary to encode categorical values in the form of numbers. Because model don't accepts categorical values like string, characters etc. In this article we will be using &lt;code&gt;one hot encoding&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Refer &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html"&gt;link&lt;/a&gt; for more information on &lt;code&gt;one hot encoder&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.compose&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ColumnTransformer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OneHotEncoder&lt;/span&gt;
&lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ColumnTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="s"&gt;'encoder'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OneHotEncoder&lt;/span&gt;&lt;span class="p"&gt;(),[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])],&lt;/span&gt;&lt;span class="n"&gt;remainder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'passthrough'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Splitting Dataset into training and test set :
&lt;/h3&gt;

&lt;p&gt;By splitting the dataset into training and test set we can train our model using training set and evaluate our model using test set. So this is also an important step if you don't have testing dataset seperately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selectiom&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more data preparation tools refer this &lt;a href="https://dev.to/newbie_coder/7-essential-techniques-for-data-preprocessing-using-python-a-guide-for-data-scientists-3ijk"&gt;Link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Model Training and Evaluation :
&lt;/h2&gt;

&lt;p&gt;Now in this step we will train our multiple linear regression model using training set and evaluate it using test set.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Training -
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="n"&gt;mlr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mlr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Predicting -
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Model Evaluation -
&lt;/h3&gt;

&lt;p&gt;we will use &lt;code&gt;mean squared error (MSE)&lt;/code&gt; for evaluating our model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;
&lt;span class="n"&gt;mse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To learn more on &lt;code&gt;Mean Squared Error (MSE)&lt;/code&gt; refer this &lt;a href="https://www.freecodecamp.org/news/machine-learning-mean-squared-error-regression-line-c7dde9a26b93/"&gt;Link&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Interpreting Coefficients :
&lt;/h2&gt;

&lt;p&gt;The coefficients in multiple linear regression equation determines the relationship between each independent and dependent variable. If value of coefficient is positive then it shows positive relationship between two variables, while negative value shows negative relationship.&lt;/p&gt;

&lt;p&gt;Below is the python code which will generate a data frame which will contains columns name and coefficient value of that column from data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;coefficients&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"Variable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"Coefficient"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;mlr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coef_&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coefficients&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Model Evaluation using plots :
&lt;/h2&gt;

&lt;p&gt;We can use various diagnostic plots to evaluate the performance of model or diagnose any issue.&lt;/p&gt;

&lt;p&gt;Below is the python code for plot -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pytplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="c1"&gt;# residuals vs Predicted values
&lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axhline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'r'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Predicted Values"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Residuals"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Residuals vs. Predicted Values"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are different diagnostic plans for models. You can refer below links for more information.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://robert-alvarez.github.io/2018-06-04-diagnostic_plots/"&gt;Creating Diagnostic Plots in Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kirenz.com/post/2021-11-14-linear-regression-diagnostics-in-python/linear-regression-diagnostics-in-python/"&gt;Linear regression diagnostics in Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion :
&lt;/h2&gt;

&lt;p&gt;We have learned implementation of &lt;code&gt;Multiple Linear Regression&lt;/code&gt; using python. Also with that we learned data preparation, model evaluation etc.&lt;/p&gt;

&lt;p&gt;So by following these steps and using code examples provided, you can easily implement &lt;code&gt;Multiple Linear Regression&lt;/code&gt; in your own projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  References :
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;sklearn: Linear Regression Documentation - &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html"&gt;Link&lt;/a&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sklearn: Mean Squared Error Documentation - &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html"&gt;Link&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Seaborn Documentation for Various Diagnostic plots - &lt;a href="https://seaborn.pydata.org/"&gt;Link&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matplotlib Documentation for Visualization - &lt;a href="https://matplotlib.org/"&gt;Link&lt;/a&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>productivity</category>
    </item>
    <item>
      <title>From Data to Prediction : Mastering Simple Linear Regression with python</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Tue, 30 May 2023 10:30:00 +0000</pubDate>
      <link>https://dev.to/newbie_coder/from-data-to-prediction-mastering-simple-linear-regression-with-python-146p</link>
      <guid>https://dev.to/newbie_coder/from-data-to-prediction-mastering-simple-linear-regression-with-python-146p</guid>
      <description>&lt;p&gt;Linear Regression is an essential statistical method used to analyze the correlation between two variables. In this article we will study the concept of &lt;em&gt;simple linear regression&lt;/em&gt; and implement it using python language.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Introduction to Simple Linear Regression :
&lt;/h3&gt;

&lt;p&gt;Simple Linear Regression is a statistical method which helps us to find the relation between two variables. It mainly focuses on the exploration of the connection between a dependent variable (which we are predicting) and a independent variable ( which is used to predict ).&lt;/p&gt;

&lt;p&gt;The equation for simple linear regression is follow :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y = b0 + b1 * x + e
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;y&lt;/code&gt; is the dependent variable.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x&lt;/code&gt; is the independent variable.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;b0&lt;/code&gt; is the intercept ( means value of &lt;code&gt;y&lt;/code&gt; when value of &lt;code&gt;x&lt;/code&gt; is zero ).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;b1&lt;/code&gt; is the slope.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;e&lt;/code&gt; is the error.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mail goal of simple linear regression is finding the values of &lt;code&gt;b0&lt;/code&gt; and &lt;code&gt;b1&lt;/code&gt; which best fits the data.&lt;/p&gt;

&lt;p&gt;For more theoretical knowledge refer &lt;a href="https://www.jmp.com/en_in/statistics-knowledge-portal/what-is-regression.html"&gt;this&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Data preprocessing :
&lt;/h3&gt;

&lt;p&gt;Data preprocessing is an essential step in any data analysis task. It includes data cleaning, transforming and preparing raw data for further analysis. &lt;/p&gt;

&lt;p&gt;In previous blog post we have discussed it in detail. Please refer :&lt;br&gt;
&lt;em&gt;&lt;a href="https://dev.to/newbie_coder/7-essential-techniques-for-data-preprocessing-using-python-a-guide-for-data-scientists-3ijk"&gt;7 Essential Techniques for Data Preprocessing Using Python: A Guide for Data Scientists&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Implementing Simple Linear Regression using python :
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Step 1 : Importing Necessary Libraries
&lt;/h4&gt;

&lt;p&gt;First step in any code always includes importing the necessary libraries for further analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2 : Loading and preprocessing the data
&lt;/h4&gt;

&lt;p&gt;For this model we will using dataset of students which includes their &lt;em&gt;study hours&lt;/em&gt; and their &lt;em&gt;exam scores&lt;/em&gt;. This data is stored in a csv file named &lt;code&gt;student_scores.csv&lt;/code&gt; .&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;

&lt;span class="c1"&gt;# Loading the data in code
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"student_scores.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Checking for missing values in data
&lt;/span&gt;&lt;span class="n"&gt;missing_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Handling missing values
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Handling Outliers
&lt;/span&gt;&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;boxplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Hours'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zscore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Hours'&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="c1"&gt;# Encoding the categorical Variables
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_dummies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Category'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 3 : Creating dependent and independent variables
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;  &lt;span class="c1"&gt;# independent variable
&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;   &lt;span class="c1"&gt;# dependent variable
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 4 : Splitting dataset in training and test set
&lt;/h4&gt;

&lt;p&gt;Splitting dataset can help in evaluating the model. Means we can test it using test set if it is trained well or not.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 5 : Fitting the Regression model on data
&lt;/h4&gt;

&lt;p&gt;There is class named &lt;code&gt;LinearRegression&lt;/code&gt; in &lt;code&gt;sklearn&lt;/code&gt; library which is used to fit the model on data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="n"&gt;regressor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;regressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 6 : Predicting
&lt;/h4&gt;

&lt;p&gt;After fitting the model we can use that to make predictions on new data. Let's we test the model by predicting the exam score for student who studies for 9.25 hours.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;hrs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;9.25&lt;/span&gt;
&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;regressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'predicted score : '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Predicted Score&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;92.90985477015731&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 7 : Model Evaluation
&lt;/h4&gt;

&lt;p&gt;For model evaluation two values are important first one is Mean Squared error and second is R-squared.&lt;/p&gt;

&lt;p&gt;Below is code to calculate above values -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r2_score&lt;/span&gt;

&lt;span class="c1"&gt;# making predictions on entire dataset
&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;regressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# calculating mean squared error
&lt;/span&gt;&lt;span class="n"&gt;mse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'mean_squared_error : '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;mse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# calculating r-squared value
&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r2_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"R-Squared value : "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusiong :
&lt;/h3&gt;

&lt;p&gt;In this article we have studied the concept of simple linear regression and implemented it using python language.&lt;/p&gt;

&lt;p&gt;Remember, Linear regression is just one of many techniques available in regression analysis. Keep exploring and expanding your knowledge to unlock the full potential of regression analysis.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>productivity</category>
    </item>
    <item>
      <title>7 Essential Techniques for Data Preprocessing Using Python: A Guide for Data Scientists</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Tue, 30 May 2023 03:45:00 +0000</pubDate>
      <link>https://dev.to/newbie_coder/7-essential-techniques-for-data-preprocessing-using-python-a-guide-for-data-scientists-3ijk</link>
      <guid>https://dev.to/newbie_coder/7-essential-techniques-for-data-preprocessing-using-python-a-guide-for-data-scientists-3ijk</guid>
      <description>&lt;p&gt;Data preprocessing is an important step in data science. Data preprocessing means cleaning, transforming and preparing raw data for analyzing. For this purpose Python programming language is used. Because python has inbuilt libraries and tools for data science and machine learning. In this blog we will see some steps in data preprocessing.&lt;/p&gt;

&lt;p&gt;Below are some steps anyone (even beginner can understand) can use in their practice or learning.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Importing Libraries :
&lt;/h3&gt;

&lt;p&gt;First step in any project is importing the necessary libraries, which will be used in entire code. Below are some common libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplot.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Loading Data :
&lt;/h3&gt;

&lt;p&gt;Next step is to load the data to process. In python pandas is a library which used for this purpose. Pandas is a very powerful library to load the data and process it. To load the data from &lt;em&gt;csv&lt;/em&gt; file there is function named &lt;code&gt;read_csv&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Handling Missing Values :
&lt;/h3&gt;

&lt;p&gt;When we work on real world data, there might be some missing values in files. When there are missing values in data then any algorithm can't work on that data. So this step is most important in data preprocessing. So we need to handle all missing values before performing any analysis. &lt;code&gt;pandas&lt;/code&gt; includes various function/methods to handle missing values. There is a function names &lt;code&gt;isnull&lt;/code&gt; which provides information about missing values from data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;missing_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For filling some value in the place of missing value &lt;code&gt;fillna&lt;/code&gt; function can be used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;here &lt;code&gt;inplace&lt;/code&gt; means values will be overwritten in dataset directly&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Handling outliers :
&lt;/h3&gt;

&lt;p&gt;There may be some data points in dataset which may be totally different from other data points. These datapoints are known as outliers. This process can be called as &lt;em&gt;finding the odd one out&lt;/em&gt;. There is function in &lt;code&gt;seaborn&lt;/code&gt; library named &lt;code&gt;Boxplot&lt;/code&gt; which can be used to visualize the distribution of data and identify the outliers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;boxplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'column_name'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;z_score&lt;/code&gt; is used to identify and remove the outliers. &lt;code&gt;z_score&lt;/code&gt; is the measurement of standard deviation of a data point from the mean. this function is inside the &lt;code&gt;scipy&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zscore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To more information on outliers &lt;a href="https://dev.to/newbie_coder/outliers-4b10"&gt;Click Here&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Encoding the Categorical variables :
&lt;/h3&gt;

&lt;p&gt;Categorical variables means that data which shows categories or some memberships. Mostly this data can be in the form of string or characters. But all machine learning models works on only numerical data, so that's why we have to encode these variables in numerical format. &lt;/p&gt;

&lt;p&gt;There are several methods for encoding categorical variables.&lt;br&gt;
we will discuss &lt;code&gt;get_dummies&lt;/code&gt; method from &lt;code&gt;pandas&lt;/code&gt; library.&lt;br&gt;
This method creates dummy variables for categorical data. Basically it converts categorical data into 0/1 on the basis of categories from that column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_dummies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'column_name'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Demo :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Rsy3Adqo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ty0a41vpqyfv3l7v8fwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Rsy3Adqo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ty0a41vpqyfv3l7v8fwq.png" alt="Encoding Categorical Variables" width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more &lt;a href="https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html"&gt;Click Here&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Feature Scaling :
&lt;/h3&gt;

&lt;p&gt;To work the model smoothly data should be in some range. Here comes the feature scaling. Using feature scaling data points can be brought in some scale, like 0-1. There is method named &lt;code&gt;MinMaxScaler&lt;/code&gt; in &lt;code&gt;sklearn&lt;/code&gt; library, which can scale the features between 0 to 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MinMaxScaler&lt;/span&gt;
&lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MinMaxScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s"&gt;'column_name'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s"&gt;'column_name'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above code scales the data points from given column between 0 to 1.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Feature Selection :
&lt;/h3&gt;

&lt;p&gt;Feature selection involves choosing the most relevant features from a dataset, aiming to enhance the accuracy and efficiency of machine learning models. &lt;code&gt;sklearn&lt;/code&gt; library provides the function named &lt;code&gt;SelectKBest&lt;/code&gt; which selects the top K features from the dataset using statistical tests. By opting out for most relevant features, we can optimize the performance of our models in terms of speed and accuracy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.feature_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SelectKBest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chi2&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'target_column'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'target_column'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;selection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SelectKBest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;selection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;here &lt;code&gt;chi2&lt;/code&gt; means chi-squared value which indicates which features is most important. Here if chi2 is higher then that feature will be selected. And k=3 means number of features to select.&lt;/p&gt;

&lt;p&gt;For more information of chi2 - &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html"&gt;Click Here&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion :
&lt;/h3&gt;

&lt;p&gt;In conclusion, we can say that data pre-processing plays important role in data science/Machine Learning. In this post we explored some fundamental techniques for data preprocessing using python. By applying these techniques, we can clean, transform and prepare raw data for further analysis and modeling.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Data Preprocessing with Python: Essential Techniques for Cleaning and Transforming Data</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Sat, 06 May 2023 16:49:13 +0000</pubDate>
      <link>https://dev.to/newbie_coder/data-preprocessing-with-python-essential-techniques-for-cleaning-and-transforming-data-pnl</link>
      <guid>https://dev.to/newbie_coder/data-preprocessing-with-python-essential-techniques-for-cleaning-and-transforming-data-pnl</guid>
      <description>&lt;p&gt;Data preprocessing is a crucial step in the data science pipeline. It involves cleaning, transforming, and preparing raw data for further analysis. Python is a popular programming language for data preprocessing because of its rich ecosystem of data science libraries. &lt;/p&gt;

&lt;p&gt;In this blog post, we will explore some essential techniques for data preprocessing using Python.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Importing Libraries:
&lt;/h4&gt;

&lt;p&gt;The first step in any data science project is importing the necessary libraries. For data preprocessing, we typically use the following libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Loading Data
&lt;/h4&gt;

&lt;p&gt;The next step is to load the data into Python. Pandas is a powerful library for loading and manipulating data. We can use the &lt;code&gt;read_csv&lt;/code&gt; function to load data from a CSV file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.read_csv('data.csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Handling Missing Values
&lt;/h4&gt;

&lt;p&gt;Missing values are a common problem in real-world data. We need to handle missing values before we can perform any analysis. Pandas provides several functions to handle missing values. The &lt;code&gt;isnull&lt;/code&gt; function returns a Boolean mask indicating which values are missing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;missing_values = df.isnull()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use the &lt;code&gt;fillna&lt;/code&gt; function to replace missing values with a specified value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df.fillna(0, inplace=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Handling Outliers
&lt;/h4&gt;

&lt;p&gt;Outliers are data points that are significantly different from the other data points in the dataset. Outliers can have a significant impact on statistical models, so it is essential to handle them. We can use the &lt;code&gt;boxplot&lt;/code&gt; function in Seaborn to visualize the distribution of data and identify outliers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sns.boxplot(x=df['column_name'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use the Z-score to identify and remove outliers. The Z-score measures how many standard deviations a data point is from the mean.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from scipy import stats
df = df[(np.abs(stats.zscore(df)) &amp;lt; 3).all(axis=1)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  5. Encoding Categorical Variables
&lt;/h4&gt;

&lt;p&gt;Categorical variables are variables that take on a limited number of possible values. Machine learning algorithms typically require numeric input, so we need to encode categorical variables. We can use the &lt;code&gt;get_dummies&lt;/code&gt; function in Pandas to convert categorical variables into a series of binary columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.get_dummies(df, columns=['column_name'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  6. Feature Scaling
&lt;/h4&gt;

&lt;p&gt;Feature scaling is the process of scaling the values of the features in the dataset. Scaling is essential for algorithms that use distance-based metrics, such as K-nearest neighbors and support vector machines. We can use the &lt;code&gt;MinMaxScaler&lt;/code&gt; function in Scikit-learn to scale the features between 0 and 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['column_name']] = scaler.fit_transform(df[['column_name']])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  7. Feature Selection
&lt;/h4&gt;

&lt;p&gt;Feature selection is the process of selecting the most relevant features from the dataset. Selecting the most relevant features can improve the accuracy and speed of the machine learning models. We can use the &lt;code&gt;SelectKBest&lt;/code&gt; function in Scikit-learn to select the top k features based on statistical tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.feature_selection import SelectKBest, chi2
X = df.drop('target_column', axis=1)
y = df['target_column']
selector = SelectKBest(chi2, k=3)
X_new = selector.fit_transform(X, y)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In conclusion, data preprocessing is a critical step in the data science pipeline. In this blog post, we explored some essential techniques for data preprocessing using Python. By following these techniques, we can clean, transform, and prepare raw data&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Outliers</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Wed, 23 Feb 2022 21:18:10 +0000</pubDate>
      <link>https://dev.to/newbie_coder/outliers-4b10</link>
      <guid>https://dev.to/newbie_coder/outliers-4b10</guid>
      <description>&lt;h2&gt;
  
  
  My Simple Definition of an outlier is :
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Outlier in a dataset means extremely high or extremely low value from other values in dataset.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrg44i8riy0efu2bp4uz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrg44i8riy0efu2bp4uz.png" alt="Outliers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In above graph we can see that value below 50 and greater than 150 are outliers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding Outliers from a Dataset
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;An outliers follows the any one condition from below.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;1. outlier &amp;lt; Q1 - 1.5*(IQR)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;2. outlier &amp;gt; Q3 + 1.5*(IQR)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;where&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;IQR = Interquartile range&lt;/li&gt;
&lt;li&gt;Q1 = Lower Quartile&lt;/li&gt;
&lt;li&gt;Q2 = Median or 2 Quartile&lt;/li&gt;
&lt;li&gt;Q3 = Upper Quartile&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;From above we can say that 1st rule means data point should be below of lower quartile and 2nd rule means that data point should be greater than upper quartile.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To Find the outlier we have to calculate the &lt;code&gt;Q1, Q2, Q3 and IQR&lt;/code&gt; first.&lt;/p&gt;
&lt;h2&gt;
  
  
  Finding the Upper, median, lower quartile and inter quartile range in an odd dataset:
&lt;/h2&gt;

&lt;p&gt;Let, we have -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3,5,1,4,2,6,7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first step is to sort the given data in ascending order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,2,3,4,5,6,7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, here lowest value is 1 i.e. MIN and highest value is 7 i.e. MAX&lt;/p&gt;

&lt;h3&gt;
  
  
  Calculating Q2 in an odd dataset:
&lt;/h3&gt;

&lt;p&gt;Now, Q2 means median or quartile 2, In this step we will calculate it.&lt;/p&gt;

&lt;p&gt;Our given data contains odd values i.e. 7.&lt;br&gt;
So, we have to divide it in equal to parts and there will be one middle value i.e. 4.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(1,2,3),4,(5,6,7)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;So here 4 is the median or Q2 value.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, To verify it OR an alternate way to calculate it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index of median = (total_no_of_values+1)/2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;(7+1)/2 = 4&lt;/code&gt; which means number in dataset or array at place 4.&lt;/p&gt;

&lt;p&gt;SO &lt;code&gt;Q2 = 4&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Calculating Q1 in an odd dataset:
&lt;/h3&gt;

&lt;p&gt;Initial Dataset :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(1,2,3),4,(5,6,7)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, to calculate the lower quartile or Q1 we have to take first half of the data. That is-&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,2,3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, here also we have to pick the middle value i.e. 2&lt;/p&gt;

&lt;p&gt;And Formula to calculate the Q1 is -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q1_place = (total_number_of_values_in_first_half+1)/2
Q1_place = (3+1)/2
Q1_place = 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means, Q1 is at place 2 in data.&lt;br&gt;
So, &lt;code&gt;Q1 = 2&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Calculating Q3 in an odd dataset:
&lt;/h3&gt;

&lt;p&gt;It is similar to calculating Q1 but instead of First half we have to take another half.&lt;br&gt;
&lt;code&gt;(1,2,3),4,(5,6,7)&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5,6,7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, the middle value is Q3 i.e. 6&lt;/p&gt;

&lt;p&gt;Formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q3_place = (total_no_of_values_in_last_half+1)/2
Q3_place = (3+1)/2
Q3_place = 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means, Q3 is at 2nd place in given half.&lt;br&gt;
So, &lt;code&gt;Q3 = 6&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Calculating IQR in an odd dataset:
&lt;/h3&gt;

&lt;p&gt;Formula for Calculating IQR is :-&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IQR = Q3 - Q1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To find the IQR of given data from above-&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IQR = 6-2
IQR = 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  To find an outlier in an odd dataset:
&lt;/h2&gt;

&lt;p&gt;Given Data is-&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,2,3,4,5,6,7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have calculated -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MIN = 1
Q1 = 2
MED = 4
Q3 = 6
MAX = 7
IQR = 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we can find if any outliers in data - &lt;br&gt;
&lt;em&gt;A data point to be an outlier it must follow any one rule of below.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;lt; Q1 - 1.5*(IQR)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OR&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;gt; Q3 + 1.5*(IQR)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So to find an outlier we have to calculate that minimum and maximum value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;lt; Q1-1.5*(IQR)
outlier &amp;lt; 2-1.5*(4)
outlier &amp;lt; 2-6.0
outlier &amp;lt; -4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;There are no minimum value outliers, because there is no value in dataset less than -4.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Next,-&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;gt; Q3 + 1.5*(IQR)
outlier &amp;gt; 6 + 1.5*4
outlier &amp;gt; 6 + 6
outlier &amp;gt; 12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;And There are no maximum value outliers in data.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding the upper, median, lower quartile and IQR in an Even Dataset:
&lt;/h2&gt;

&lt;p&gt;The process of finding quartiles and finding the outliers is bit different from odd dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Calculating Q2 in an even dataset:
&lt;/h3&gt;

&lt;p&gt;Let, we have -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4,8,12,16,20,24,28,32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, given data is already sorted.&lt;br&gt;
So, to find the median or Q2 we have get an average of middle two numbers. Like-&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(4,8,12),16,20,(24,28,32)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;So, here we have to take average of 20 and 25, and that will be our median or Q2.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q2 = (16+20)/2
Q2 = 36/2
Q2 = 18
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Calculating Q1 in an even dataset:
&lt;/h3&gt;

&lt;p&gt;To calculate Q1 we have to cut given dataset in half -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4,8,12,16 | 20,24,28,32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here to find the Q1, we have to take average of the middle 2 numbers of first half -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4,(8,12),16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is, average of 8 and 12&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q1 = (8+12)/2
Q1 = 20/2
Q1 = 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Calculating Q3 in an even dataset:
&lt;/h3&gt;

&lt;p&gt;Showing given data in two half's -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4,8,12,16 | 20,24,28,32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To calculate Q3, we have to take average of middle two numbers of last half. Like -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;20,(24,28),32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is, average of 24 and 28&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q3 = (24+28)/2
Q3 = 52/2
Q3 = 26
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Calculating IQR in an even dataset:
&lt;/h3&gt;

&lt;p&gt;Calculating IQR is same as from Odd dataset, That is -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IQR = Q3 - Q1
IQR = 26 - 10
IQR = 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Finding an outlier in an even dataset:
&lt;/h2&gt;

&lt;p&gt;Now, we have calculated terms required -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MIN = 4
Q1 = 10
MED = 18
Q3 = 26
MAX = 32
IQR = 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules for outliers -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;lt; Q1 - 1.5*(IQR)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OR&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;gt; Q3 + 1.5*(IQR)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finding minimum value outlier -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;lt; Q1 - 1.5*(IQR)
outlier &amp;lt; 10 - 1.5*(16)
outlier &amp;lt; 10 - 24
outlier &amp;lt; -14
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;SO, There is no minimum value outlier, Because there no value less than -14.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Finding Maximum value outlier -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outlier &amp;gt; Q3 + 1.5*(IQR)
outlier &amp;gt; 26 + 1.5*(16)
outlier &amp;gt; 26 + 24
outlier &amp;gt; 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Here, Also no outlier, because there is no value greater than 50.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion:
&lt;/h2&gt;

&lt;p&gt;In this article we learned about how to calculate quartiles, inter quartile range and outliers.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;em&gt;Thank You!&lt;/em&gt;
&lt;/h2&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Links For Basics Of Machine Learning</title>
      <dc:creator>Nitin Kendre</dc:creator>
      <pubDate>Mon, 24 Jan 2022 18:47:00 +0000</pubDate>
      <link>https://dev.to/newbie_coder/links-for-basics-of-machine-learning-cne</link>
      <guid>https://dev.to/newbie_coder/links-for-basics-of-machine-learning-cne</guid>
      <description>&lt;h2&gt;
  
  
  Basic Concepts:-
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Free 1 Hour Course for Understand the basics of machine &lt;br&gt;
learning&lt;/strong&gt; by &lt;strong&gt;Google&lt;/strong&gt;&lt;br&gt;
Link :- &lt;br&gt;
&lt;a href="https://learndigital.withgoogle.com/digitalgarage/course/machine-learning-basics"&gt;https://learndigital.withgoogle.com/digitalgarage/course/machine-learning-basics&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2021/07/basic-statistics-concepts-for-machine-learning-newbies/"&gt;Basic Statistics Concepts for Machine Learning Newbies! with practicals using Python&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dzone.com/articles/data-structures-related-to-machine-learning-algori"&gt;Data Structures Related to Machine Learning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/"&gt;Commonly used Machine Learning Algorithms (with Python and R Codes)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/pythoneers/statistics-in-ml-3d70b09d72b7"&gt;Statistics in ML.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Interview Questions:-
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://analyticsindiamag.com/40-interview-questions-on-statistics-for-data-scientists/"&gt;40 Interview Questions On Statistics For Data Scientists.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2021/04/25-probability-and-statistics-questions-to-ace-your-data-science-interviews/"&gt;25 Probability and Statistics Questions to Ace your Data Science Interviews.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.mygreatlearning.com/blog/statistics-interview-questions/"&gt;Top Statistics Interview Questions.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.kdnuggets.com/2021/02/10-statistical-concepts-data-science-interviews.html"&gt;10 Statistical Concepts You Should Know For Data Science Interviews.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.interviewbit.com/machine-learning-interview-questions/"&gt;Machine Learning Interview Questions.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Thank You For Reading.&lt;/p&gt;

&lt;p&gt;_ _&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Above All Resources are free to read and Learn. I am not promoting any comercials.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
