Akanle Tolulope
Classifying Amazon Reviews with Python: From Raw Text to 88% Accuracy

Ever wondered how businesses know if customers are happy or not? In this project, I built a machine learning model that classifies Amazon product reviews as Positive or Negative using NLP techniques. Here's how I did it.

  1. The Dataset
    I used the Amazon Review Polarity Dataset — sampling 200,000 reviews for training and 50,000 for testing. The dataset was perfectly balanced between positive and negative reviews, which is ideal for classification.
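The loading code isn't shown in the post; as a rough sketch of balanced per-class sampling with pandas (using a tiny stand-in frame here, since the real dataset ships as CSVs with a label column where 1 = negative and 2 = positive):

```python
import pandas as pd

# Stand-in frame; the real train.csv has label/title/text columns
df = pd.DataFrame({
    "label": [1, 2, 1, 2] * 5,
    "text": ["bad", "good", "awful", "great"] * 5,
})

# Draw an equal number of rows per class (200,000 train / 50,000 test in the project)
sample = df.groupby("label").sample(n=5, random_state=42)
print(sample["label"].value_counts())
```

Sampling per group like this guarantees the split stays balanced even if the source file isn't.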

  2. Cleaning the Text
    Raw reviews are messy. I wrote a preprocessing function that lowercases text, strips punctuation and numbers, and removes stopwords using NLTK. Removing this noise helps the model focus on the words that actually carry sentiment.

```python
import re
from nltk.corpus import stopwords  # requires: nltk.download("stopwords")

stop_words = set(stopwords.words("english"))

def clean_text(text):
    text = str(text).lower()                     # normalise case
    text = re.sub(r"[^\w\s]", "", text)          # strip punctuation
    text = re.sub(r"\d+", "", text)              # strip numbers
    words = [word for word in text.split() if word not in stop_words]
    return " ".join(words)
```
  3. Converting Text to Numbers with TF-IDF
    Machine learning models need numbers, not words. TF-IDF weighs words by how distinctive they are to each review — common words like "the" get down-weighted, meaningful words like "terrible" get prioritised.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000, min_df=5, max_df=0.9)
X_train = vectorizer.fit_transform(train_df["clean_text"])
X_test = vectorizer.transform(test_df["clean_text"])
```
  4. Training & Comparing Models
    I trained and compared three models — Logistic Regression, Naive Bayes, and Linear SVM. Logistic Regression performed best and was used for the final evaluation.
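The comparison loop itself isn't shown in the post; a minimal sketch with scikit-learn defaults, run here on a tiny stand-in corpus so it executes end to end (the project fits these on the 200k-review TF-IDF matrix instead):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Tiny stand-in corpus so the snippet runs on its own
texts = ["great product love it", "terrible waste of money",
         "best purchase ever", "completely useless broke fast"] * 10
labels = [1, 0, 1, 0] * 10

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
}

results = {}
for name, model in models.items():
    model.fit(X, labels)
    results[name] = accuracy_score(labels, model.predict(X))
    print(f"{name}: {results[name]:.3f}")
```

All three share the same `fit`/`predict` interface, which is what makes a comparison loop like this so cheap to write.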

  5. Results
    Tested on 50,000 reviews:

    | Metric    | Negative | Positive |
    |-----------|----------|----------|
    | Precision | 0.89     | 0.88     |
    | Recall    | 0.88     | 0.89     |
    | F1-Score  | 0.88     | 0.89     |

    Overall Accuracy: 88% — balanced performance across both classes.
    Classification Report
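The per-class table above is the output of scikit-learn's classification report; a self-contained example of generating one (with illustrative labels, not the project's real predictions):

```python
from sklearn.metrics import classification_report

# Illustrative labels only; the numbers in the post come from the real 50k test set
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
report = classification_report(y_true, y_pred, target_names=["Negative", "Positive"])
print(report)
```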

  6. Real-Time Predictions

Model identifying positive and negative reviews

```python
def predict_sentiment(text):
    cleaned = clean_text(text)                    # same preprocessing as training
    vectorized = vectorizer.transform([cleaned])  # reuse the fitted vectorizer
    prediction = model.predict(vectorized)[0]
    return "Positive" if prediction == 1 else "Negative"
```

"This product is amazing!" -> Positive
"Completely useless, waste of money" -> Negative

  7. Visualizations
    Three charts helped tell the story:

Sentiment distribution — confirmed the dataset was balanced between the Positive and Negative classes

Word cloud — displayed the top positive words: great, love, best

Confusion matrix — symmetric errors across the diagonal (TN, TP) and off-diagonal (FP, FN) cells, showing no class bias
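For reference, scikit-learn's confusion matrix puts actual classes on rows and predicted classes on columns; a small sketch with illustrative labels (not the project's real predictions):

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels only
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
# Row = actual, column = predicted:
# cm[0, 0] = TN, cm[0, 1] = FP, cm[1, 0] = FN, cm[1, 1] = TP
print(cm)
```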

What I Learned
Working at this scale (250k reviews) taught me that clean data and a balanced dataset matter more than model complexity. Logistic Regression beat fancier approaches simply because the data was well prepared.
Next steps: hyperparameter tuning, cross-validation, and eventually a BERT-based model for higher accuracy.
Full code on my GitHub — feel free to clone and try it on your own dataset!

Found this helpful? Drop a like or leave a comment below!
