Ever wondered how businesses know if customers are happy or not? In this project, I built a machine learning model that classifies Amazon product reviews as Positive or Negative using NLP techniques. Here's how I did it.
The Dataset
I used the Amazon Review Polarity Dataset, sampling 200,000 reviews for training and 50,000 for testing. The dataset was perfectly balanced between positive and negative reviews, which is ideal for classification.
Cleaning the Text
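The loading and sampling step can be sketched like this. The column layout (`label`, `title`, `text`, with 1 = negative and 2 = positive) is an assumption about how the polarity dataset usually ships, and a tiny in-memory DataFrame stands in for the real CSVs:

```python
import pandas as pd

# Stand-in for the real CSV files -- the Amazon Review Polarity files
# typically have columns (label, title, text), label 1 = negative, 2 = positive
raw = pd.DataFrame({
    "label": [1, 2, 1, 2, 2, 1],
    "title": ["Bad", "Great", "Meh", "Love it", "Solid", "Broken"],
    "text": ["Broke fast", "Works well", "Not good", "Best buy", "Happy", "Junk"],
})

# Sample a fixed number of rows reproducibly (200k/50k in the real project)
sample = raw.sample(n=4, random_state=42)

# Map labels to 0/1 so the classifiers see a binary target
sample["label"] = (sample["label"] == 2).astype(int)
print(sample["label"].tolist())
```

With the real files you would swap the stand-in for `pd.read_csv(...)` and bump `n` up to the full sample sizes.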
Raw reviews are messy. I wrote a preprocessing function that lowercases the text, strips punctuation and numbers, and removes stopwords using NLTK. Stripping this noise helps the model focus on the words that actually carry sentiment.
```python
import re
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))

def clean_text(text):
    text = str(text).lower()
    text = re.sub(r"[^\w\s]", "", text)  # strip punctuation
    text = re.sub(r"\d+", "", text)      # strip numbers
    words = [word for word in text.split() if word not in stop_words]
    return " ".join(words)
```
Converting Text to Numbers with TF-IDF
Machine learning models need numbers, not words. TF-IDF weighs words by how distinctive they are to each review: ubiquitous words like "the" get downweighted, while meaningful words like "terrible" get prioritised.
```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000, min_df=5, max_df=0.9)
X_train = vectorizer.fit_transform(train_df["clean_text"])
X_test = vectorizer.transform(test_df["clean_text"])
```
Training & Comparing Models
I trained and compared three models: Logistic Regression, Naive Bayes, and Linear SVM. Logistic Regression performed best and was used for the final evaluation.
Results
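The comparison loop looked roughly like this. Toy data stands in for the vectorized reviews here; in the project, the features and labels come from the TF-IDF step above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Toy stand-ins for the cleaned, vectorized training reviews
texts = ["great product love it", "terrible waste of money",
         "best purchase ever", "completely useless broke fast"] * 50
labels = [1, 0, 1, 0] * 50

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
}
for name, model in models.items():
    model.fit(X, labels)
    print(f"{name}: {accuracy_score(labels, model.predict(X)):.2f}")
```

All three share the same `fit`/`predict` interface, which is what makes this kind of side-by-side comparison so cheap in scikit-learn.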
Tested on 50,000 reviews:
| Metric | Negative | Positive |
| --- | --- | --- |
| Precision | 0.89 | 0.88 |
| Recall | 0.88 | 0.89 |
| F1-Score | 0.88 | 0.89 |

Overall Accuracy: 88%, with balanced performance across both classes.
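Per-class precision, recall, and F1 like these come straight out of scikit-learn's `classification_report`; a minimal sketch with placeholder predictions:

```python
from sklearn.metrics import accuracy_score, classification_report

# Placeholder values -- in the project these are y_test and
# model.predict(X_test) over the 50,000 held-out reviews
y_test = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0]

print(classification_report(y_test, y_pred, target_names=["Negative", "Positive"]))
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```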
Real-Time Predictions
```python
def predict_sentiment(text):
    cleaned = clean_text(text)
    vectorized = vectorizer.transform([cleaned])
    prediction = model.predict(vectorized)[0]
    return "Positive" if prediction == 1 else "Negative"
```
"This product is amazing!" -> Positive
"Completely useless, waste of money" -> Negative
Visualizations
Three charts helped tell the story:
- Sentiment distribution: confirmed the dataset was balanced
- Word cloud: top positive words were great, love, best
- Confusion matrix: symmetric errors, no class bias
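The confusion matrix chart takes only a few lines with scikit-learn and matplotlib. This sketch uses placeholder labels and the headless Agg backend so it runs anywhere:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Placeholder values -- in the project these are y_test and the model's predictions
y_test = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm, display_labels=["Negative", "Positive"]).plot()
plt.savefig("confusion_matrix.png")
```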

What I Learned
Working at this scale (250k reviews) taught me that clean data and a balanced dataset matter more than model complexity. Logistic Regression beat fancier approaches simply because the data was well prepared.
Next steps: hyperparameter tuning, cross-validation, and eventually a BERT-based model for higher accuracy.
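A first pass at the tuning step could look like this: a `GridSearchCV` over logistic regression's regularization strength, with toy data standing in for the real TF-IDF features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy stand-ins for the vectorized reviews
texts = ["great product love it", "terrible waste of money",
         "best purchase ever", "completely useless broke fast"] * 50
labels = [1, 0, 1, 0] * 50
X = TfidfVectorizer().fit_transform(texts)

# 5-fold cross-validated search over regularization strength C
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 10]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, labels)
print(grid.best_params_, f"CV accuracy: {grid.best_score_:.2f}")
```

This folds the hyperparameter search and the cross-validation from the next-steps list into one object, and `grid.best_estimator_` drops straight into the existing `predict_sentiment` pipeline.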
Full code on my GitHub — feel free to clone and try it on your own dataset!
Found this helpful? Drop a like or leave a comment below!