How to improve ML Model Accuracy for Text Classification?

Vikram Bhagat — Tue, 06 Aug 2024 09:00:25 +0000

Hi Experts,

We are working on a text classification problem with around 80K records spread across roughly 50 classes, and the data is highly imbalanced. The dataset has two columns: one holds the description text and the other holds the class label.
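To put a number on "highly imbalanced", you can count records per class and compare the largest class to the smallest. The sketch below assumes a CSV with a header row and hypothetical column names `description` and `class`. The inline sample rows are invented for illustration. For the real file, the `StringIO` object would be replaced by something like `open("data.csv", newline="")`, where `data.csv` is a placeholder name.

```python
import csv
import io
from collections import Counter


def class_distribution(rows, label_col="class"):
    """Count records per class and return (counts, largest/smallest ratio).

    `label_col` is a hypothetical column name; adjust it to the real header.
    """
    counts = Counter(row[label_col] for row in rows)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


# Invented sample in the two-column layout (description, class).
sample = io.StringIO(
    "description,class\n"
    "printer not working,hardware\n"
    "laptop screen cracked,hardware\n"
    "keyboard keys stuck,hardware\n"
    "cannot log in,access\n"
)
counts, ratio = class_distribution(csv.DictReader(sample))
print(counts.most_common())  # [('hardware', 3), ('access', 1)]
print(ratio)                 # 3.0
```

With 50 classes, printing the full `most_common()` list shows which classes are the rare ones.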
So far we have tried the following models and techniques:

1. Data preprocessing:
   a. Lowercase conversion; removal of numbers and punctuation
   b. Removal of unimportant words and stop words
   c. Lemmatization
2. TF-IDF transformation
3. scikit-learn models:
   a. Linear SVC
   b. Linear Regression
   c. Logistic Regression
   d. Decision Trees
   e. Random Forest
4. Hugging Face Transformers:
   a. Google BERT
   b. DistilBERT
5. SMOTE oversampling
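For readers less familiar with the first two steps, here is a minimal from-scratch sketch of the preprocessing and TF-IDF stages using only the Python standard library. It is an illustration, not the pipeline used above. The stop-word list is a tiny hypothetical stand-in, and lemmatization is left out because it normally comes from a library such as NLTK or spaCy. The IDF formula mirrors scikit-learn's `TfidfVectorizer` defaults (`smooth_idf=True`, `norm="l2"`), although the tokenization here is simpler than that class's.

```python
import math
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only; a real
# pipeline would use a fuller list plus any domain-specific "unimportant words".
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on", "for"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> list[str]:
    """Lowercase, strip digits and punctuation, and drop stop words.

    Lemmatization is intentionally omitted from this sketch.
    """
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # remove numeric text
    text = text.translate(PUNCT_TO_SPACE)  # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per tokenized document.

    idf(t) = ln((1 + n) / (1 + df(t))) + 1, and each vector is scaled to unit
    L2 norm, mirroring scikit-learn's TfidfVectorizer defaults.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {term: count * idf[term] for term, count in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # avoid /0 for empty docs
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


raw = ["Invoice 123 is overdue!", "Refund the overdue invoice.", "Password reset failed."]
docs = [preprocess(text) for text in raw]
print(docs[0])  # ['invoice', 'overdue']
vectors = tfidf(docs)
```

Terms that occur in fewer documents (here `refund`) receive a larger weight than terms shared across documents (`invoice`, `overdue`), which is the property that makes TF-IDF features useful for linear models such as Linear SVC and Logistic Regression.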

The highest accuracy we achieved was 70%, with both Random Forest and Google BERT.
Is there any scope to improve accuracy further?
If so, what other techniques or models could we use?
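With around 50 imbalanced classes, a single accuracy figure can hide weak performance on rare classes, so it may help to look at per-class scores alongside it. The standard-library sketch below computes accuracy and macro-averaged F1, the unweighted mean of per-class F1 scores (the same quantity scikit-learn reports from `f1_score(y_true, y_pred, average="macro")`). The labels are invented purely to show the effect.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """F1 for every label seen in y_true or y_pred, via F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = Counter(t for t, p in pairs if t == p)
    fp = Counter(p for t, p in pairs if t != p)  # predicted p, but it was wrong
    fn = Counter(t for t, p in pairs if t != p)  # true label t was missed
    labels = set(y_true) | set(y_pred)
    # Every label occurs in y_true or y_pred, so the denominator is never zero.
    return {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels}


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Invented labels: 8 records of class "a", 1 of "b", 1 of "c";
# the "model" always predicts the majority class.
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10
print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.296
```

In this toy case a model that always predicts the majority class scores 80% accuracy but a macro F1 of only about 0.30, so comparing both numbers shows how much of a headline accuracy comes from the large classes.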

DEV Community: Vikram Bhagat
