<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NADEEM SHAIK</title>
    <description>The latest articles on DEV Community by NADEEM SHAIK (@nadeemshaik).</description>
    <link>https://dev.to/nadeemshaik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F956570%2F73033d3b-3702-4168-a16d-4cea59e4cd31.png</url>
      <title>DEV Community: NADEEM SHAIK</title>
      <link>https://dev.to/nadeemshaik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nadeemshaik"/>
    <language>en</language>
    <item>
      <title>Fit vs Fit_transform</title>
      <dc:creator>NADEEM SHAIK</dc:creator>
      <pubDate>Tue, 07 Jan 2025 12:04:57 +0000</pubDate>
      <link>https://dev.to/nadeemshaik/fit-vs-fittransform-4ae5</link>
      <guid>https://dev.to/nadeemshaik/fit-vs-fittransform-4ae5</guid>
      <description>&lt;p&gt;Have you ever wondered what the difference is between fit() and fit_transform()? You have probably come across these two methods while preprocessing your data. Let's learn the difference between them using an example.&lt;/p&gt;

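&lt;p&gt;Whenever you scale your data, which is an essential preprocessing step, you first need to compute some parameters of the data: the mean and variance (or standard deviation) for standardization, or the min and max for min-max scaling. fit() computes these parameters from the data and stores them in the scaler, but it does not change the data. transform() applies the stored parameters to a dataset. fit_transform() does both in one call: it computes the parameters and returns the transformed data.&lt;/p&gt;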

&lt;p&gt;Let's use this small array as our example data:&lt;br&gt;
data = [[1,2,3],[4,5,6],[7,8,9]]&lt;/p&gt;
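&lt;p&gt;To make these parameters concrete, here is a minimal NumPy sketch of what standardization computes for this array: one mean and one standard deviation per column, then z = (x - mean) / std. It is an illustration, not scikit-learn's internal code. It assumes the population standard deviation (ddof=0), which is the estimator StandardScaler documents using, and the variable names are only for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# "fit" step: learn one mean and one standard deviation per column (feature)
mean = data.mean(axis=0)  # [4. 5. 6.]
std = data.std(axis=0)    # population std (ddof=0): sqrt(6), about 2.449, for every column

# "transform" step: subtract the column mean and divide by the column std
scaled = (data - mean) / std
print(scaled)
# first row is about -1.2247, middle row is 0, last row is about 1.2247
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;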

&lt;p&gt;When you apply StandardScaler using fit() and transform() separately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler

# step-1
Scaler = StandardScaler()

# step-2
scaled_data = Scaler.fit(data) # no scaling of data takes place here ,just the mean and std deviation are calculated. 

# step-3
scaled_data = Scaler.transform() # now the scaled data contains the data after performing standardization. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
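&lt;p&gt;To check that fit() on its own does not scale anything, you can look at what it returns and what it stores. mean_ and scale_ are attributes that StandardScaler sets during fit(); the variable name result is just illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

scaler = StandardScaler()
result = scaler.fit(data)

# fit() returns the fitted scaler itself, not scaled data
print(result is scaler)  # True

# the learned parameters are stored on the scaler
print(scaler.mean_)   # [4. 5. 6.]
print(scaler.scale_)  # about 2.449 for every column
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is also why assigning the return value of fit() to a variable like scaled_data is misleading: it holds the scaler, not the transformed array.&lt;/p&gt;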



&lt;p&gt;When you use fit_transform() instead of calling fit() and transform() separately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler

# step-1
Scaler = StandardScaler()

# step-2
scaled_data = Scaler.fit_transform(data) # scaled_data contains the data after performing standardization. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With fit_transform(), the fit and transform steps collapse into a single call. For StandardScaler, the result is the same as calling fit(data) followed by transform(data).&lt;/p&gt;
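&lt;p&gt;A quick way to confirm that both paths give the same numbers is to compare them with NumPy. This is a minimal check on the example array; the names two_step and one_step are only for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.preprocessing import StandardScaler

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# two separate calls on one scaler (fit() returns the scaler, so they can be chained)
two_step = StandardScaler().fit(data).transform(data)

# one combined call on a fresh scaler
one_step = StandardScaler().fit_transform(data)

# both paths produce the same standardized values
print(np.allclose(two_step, one_step))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;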

&lt;p&gt;Which one to use depends on your use case. If you want to learn the parameters once and then apply them to multiple datasets, such as a training set and a test set, fit the scaler on the training set only (with fit() or fit_transform()) and then call transform() on the test set. This keeps information from the test set out of your preprocessing. If you only need to transform a single dataset, fit_transform() keeps the preprocessing code concise.&lt;/p&gt;
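&lt;p&gt;Here is a minimal sketch of that train/test pattern. X_train and X_test are hypothetical arrays with made-up values, standing in for whatever split you actually use. The point is that the test set is scaled with the mean and std learned from the training set, not with its own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.preprocessing import StandardScaler

# hypothetical training and test sets (illustrative values only)
X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X_test = np.array([[2.0, 3.0], [6.0, 7.0]])

scaler = StandardScaler()

# learn the mean and std from the training set only, and scale it
X_train_scaled = scaler.fit_transform(X_train)

# reuse the training parameters on the test set (no refitting)
X_test_scaled = scaler.transform(X_test)

# the parameters come from the training set
print(scaler.mean_)  # [3. 4.]
print(X_test_scaled)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling fit_transform() on the test set instead would recompute the mean and std from the test data, so the two sets would no longer be on the same scale.&lt;/p&gt;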

</description>
      <category>machinelearning</category>
      <category>datapreprocessing</category>
      <category>python</category>
      <category>sklearn</category>
    </item>
  </channel>
</rss>
