How to improve ML Model Accuracy for Text Classification?

#scikitlearn #machinelearning #datascience #python

Hi Experts,

We are working on a text classification problem. We have around 80K records across roughly 50 classes, and the data is highly imbalanced. The dataset has two columns: one for the description and one for the class.
So far we have tried the following models and techniques:

1. Data preprocessing:
   a. Lowercase conversion, removal of numeric text and punctuation
   b. Removal of unimportant words and stop words
   c. Lemmatization
2. TF-IDF transformation
3. scikit-learn models:
   a. Linear SVC
   b. Linear Regression
   c. Logistic Regression
   d. Decision Trees
   e. Random Forest
4. Hugging Face Transformers:
   a. Google BERT
   b. DistilBERT
5. SMOTE sampling
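
For reference, here is a minimal sketch of the TF-IDF and scikit-learn part of the setup listed above. It is not our exact code: the toy `descriptions` and `labels` are hypothetical stand-ins for the real 80K-row dataset, Logistic Regression stands in for any of the listed classifiers, and the hyperparameters are assumptions. Lemmatization and SMOTE are left out to keep it short. Macro F1 is printed next to accuracy only because it weighs every class equally, which matters when classes are imbalanced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real description/class columns.
descriptions = [
    "invoice payment overdue reminder", "payment received for invoice",
    "refund issued for duplicate payment", "billing statement for last month",
    "password reset link not working", "cannot log in to my account",
    "account locked after failed login", "two factor login code missing",
    "package delivery delayed by courier", "tracking number shows no update",
    "parcel arrived damaged in shipping", "wrong item shipped in package",
]
labels = ["billing"] * 4 + ["login"] * 4 + ["shipping"] * 4

# Stratified split so every class appears in both train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    descriptions, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF handles lowercasing and English stop-word removal here.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
macro_f1 = f1_score(y_test, predictions, average="macro", zero_division=0)

print(f"accuracy: {accuracy:.2f}")
print(f"macro F1: {macro_f1:.2f}")
```

Swapping `LogisticRegression` for `LinearSVC` or `RandomForestClassifier` in the `"clf"` step gives the other scikit-learn variants we tried.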

The best accuracy we have reached so far is 70% (with Random Forest and Google BERT).
Is there any scope to improve the accuracy?
If so, what other techniques or models can we use?
