<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vikram Bhagat</title>
    <description>The latest articles on DEV Community by Vikram Bhagat (@vikram_bhagat_10f70442739).</description>
    <link>https://dev.to/vikram_bhagat_10f70442739</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1890157%2F5e7e03c2-9f34-4496-b0c8-6be73c567e24.png</url>
      <title>DEV Community: Vikram Bhagat</title>
      <link>https://dev.to/vikram_bhagat_10f70442739</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vikram_bhagat_10f70442739"/>
    <language>en</language>
    <item>
      <title>How to improve ML Model Accuracy for Text Classification?</title>
      <dc:creator>Vikram Bhagat</dc:creator>
      <pubDate>Tue, 06 Aug 2024 09:00:25 +0000</pubDate>
      <link>https://dev.to/vikram_bhagat_10f70442739/how-to-improve-ml-model-accuracy-for-text-classification-4l28</link>
      <guid>https://dev.to/vikram_bhagat_10f70442739/how-to-improve-ml-model-accuracy-for-text-classification-4l28</guid>
      <description>&lt;p&gt;Hi Experts,&lt;/p&gt;

&lt;p&gt;We are working on a text classification problem. We have around 80K records across roughly 50 classes, and the data is highly imbalanced. The dataset has two columns: one holds the text description and the other holds the class label.&lt;br&gt;
So far we have tried the following models and techniques:&lt;/p&gt;

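&lt;ol&gt;
&lt;li&gt;Data preprocessing:
&lt;ol type="a"&gt;
&lt;li&gt;Lowercase conversion and removal of numbers and punctuation&lt;/li&gt;
&lt;li&gt;Removal of stop words and other unimportant words&lt;/li&gt;
&lt;li&gt;Lemmatization&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;li&gt;TF-IDF transformation&lt;/li&gt;
&lt;li&gt;scikit-learn models:
&lt;ol type="a"&gt;
&lt;li&gt;Linear SVC&lt;/li&gt;
&lt;li&gt;Linear Regression&lt;/li&gt;
&lt;li&gt;Logistic Regression&lt;/li&gt;
&lt;li&gt;Decision Trees&lt;/li&gt;
&lt;li&gt;Random Forest&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;li&gt;Hugging Face Transformers:
&lt;ol type="a"&gt;
&lt;li&gt;Google BERT&lt;/li&gt;
&lt;li&gt;DistilBERT&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;li&gt;SMOTE oversampling&lt;/li&gt;
&lt;/ol&gt;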
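&lt;p&gt;The preprocessing and TF-IDF steps (items 1 and 2 above) can be sketched roughly as follows. This is an illustrative pure-Python sketch, not the exact code used: the stop-word list is a tiny hypothetical placeholder, lemmatization is left out (it would normally come from a library such as NLTK or spaCy), and the weighting assumes the smoothed IDF that scikit-learn's &lt;code&gt;TfidfVectorizer&lt;/code&gt; applies by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each document.&lt;/p&gt;

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, drop digits and punctuation, then remove stop words.

    Lemmatization is omitted in this sketch; it would normally be done
    with a library such as NLTK or spaCy.
    """
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numeric tokens
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses raw term counts times the smoothed IDF
    idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    then scales each document vector to unit L2 norm.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Empty documents keep an empty vector instead of dividing by zero.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


if __name__ == "__main__":
    # Hypothetical example descriptions.
    corpus = [
        "The printer is out of toner 2 times!",
        "Reset the password, and the login works.",
    ]
    tokens = [preprocess(d) for d in corpus]
    print(tokens)
    print(tfidf(tokens))
```

Terms that appear in every document get the minimum IDF of 1, so rarer, more class-specific words dominate each vector.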
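&lt;p&gt;SMOTE (item 5 above) creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea on small dense vectors in pure Python; &lt;code&gt;smote_like&lt;/code&gt; and its parameters are hypothetical, and in practice one would use &lt;code&gt;SMOTE&lt;/code&gt; from the imbalanced-learn package. Resampling should only be applied to the training split, never to the evaluation data.&lt;/p&gt;

```python
import math
import random


def smote_like(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    For each synthetic point: pick a random minority sample x, pick one of
    its k nearest minority neighbours nn (Euclidean distance), and return
    x + lam * (nn - x) with lam drawn uniformly from [0, 1).
    Hypothetical helper illustrating the SMOTE idea, not imbalanced-learn's API.
    """
    if len(minority) in (0, 1):
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x and keep the k closest.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()
        # The new point lies on the segment between x and its chosen neighbour.
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_like(points, n_new=3, k=2))
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the region those samples span; with sparse, high-dimensional TF-IDF vectors such interpolated points may not correspond to any realistic document.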

&lt;p&gt;The best accuracy we have achieved is 70%, with both Random Forest and Google BERT.&lt;br&gt;
Is there scope to improve accuracy further?&lt;br&gt;
If so, what other techniques or models could we use?&lt;/p&gt;
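&lt;p&gt;Because the data is highly imbalanced, overall accuracy can look reasonable while minority classes are mostly misclassified, so per-class metrics are worth reporting alongside it. A minimal pure-Python sketch comparing accuracy with macro-averaged recall (the quantity scikit-learn's &lt;code&gt;balanced_accuracy_score&lt;/code&gt; reports) on made-up toy labels:&lt;/p&gt;

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / n for c, n in support.items()) / len(support)


if __name__ == "__main__":
    # Hypothetical toy labels: 8 "billing" samples and 2 "outage" samples.
    y_true = ["billing"] * 8 + ["outage"] * 2
    # A model that always predicts the majority class.
    y_pred = ["billing"] * 10
    print(accuracy(y_true, y_pred))      # 0.8
    print(macro_recall(y_true, y_pred))  # 0.5
```

The majority-class predictor scores 80% accuracy yet never recognizes the minority class, which macro recall exposes.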

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>scikitlearn</category>
    </item>
  </channel>
</rss>
