Ever wondered how businesses know if customers are happy or not? In this project, I built a machine learning model that classifies Amazon product reviews as Positive or Negative using NLP techniques. Here's how I did it.
The Dataset
I used the Amazon Review Polarity Dataset, sampling 200,000 reviews for training and 50,000 for testing. The dataset was perfectly balanced between positive and negative reviews, which is ideal for classification.
Cleaning the Text
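The loading and sampling step can be sketched like this. The column layout (`label`, `title`, `text`, with 1 = negative and 2 = positive) is an assumption about how the polarity dataset usually ships, and a tiny in-memory DataFrame stands in for the real CSVs:

```python
import pandas as pd

# Stand-in for the real CSV files -- the Amazon Review Polarity files
# typically have columns (label, title, text), label 1 = negative, 2 = positive
raw = pd.DataFrame({
    "label": [1, 2, 1, 2, 2, 1],
    "title": ["Bad", "Great", "Meh", "Love it", "Solid", "Broken"],
    "text": ["Broke fast", "Works well", "Not good", "Best buy", "Happy", "Junk"],
})

# Sample a fixed number of rows reproducibly (200k/50k in the real project)
sample = raw.sample(n=4, random_state=42)

# Map labels to 0/1 so the classifiers see a binary target
sample["label"] = (sample["label"] == 2).astype(int)
print(sample["label"].tolist())
```

With the real files you would swap the stand-in for `pd.read_csv(...)` and bump `n` up to the full sample sizes.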
Raw reviews are messy. I wrote a preprocessing function that lowercases the text, strips punctuation and numbers, and removes stopwords using NLTK. Stripping this noise helps the model focus on the words that actually carry sentiment.
```python
import re
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))

def clean_text(text):
    text = str(text).lower()
    text = re.sub(r"[^\w\s]", "", text)  # strip punctuation
    text = re.sub(r"\d+", "", text)      # strip numbers
    words = [word for word in text.split() if word not in stop_words]
    return " ".join(words)
```
Converting Text to Numbers with TF-IDF
Machine learning models need numbers, not words. TF-IDF weighs words by how distinctive they are to each review: ubiquitous words like "the" get downweighted, while meaningful words like "terrible" get prioritised.
```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000, min_df=5, max_df=0.9)
X_train = vectorizer.fit_transform(train_df["clean_text"])
X_test = vectorizer.transform(test_df["clean_text"])
```
Training & Comparing Models
I trained and compared three models: Logistic Regression, Naive Bayes, and Linear SVM. Logistic Regression performed best and was used for the final evaluation.
Results
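The comparison loop looked roughly like this. Toy data stands in for the vectorized reviews here; in the project, the features and labels come from the TF-IDF step above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Toy stand-ins for the cleaned, vectorized training reviews
texts = ["great product love it", "terrible waste of money",
         "best purchase ever", "completely useless broke fast"] * 50
labels = [1, 0, 1, 0] * 50

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
}
for name, model in models.items():
    model.fit(X, labels)
    print(f"{name}: {accuracy_score(labels, model.predict(X)):.2f}")
```

All three share the same `fit`/`predict` interface, which is what makes this kind of side-by-side comparison so cheap in scikit-learn.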
Tested on 50,000 reviews:
| Metric | Negative | Positive |
| --- | --- | --- |
| Precision | 0.89 | 0.88 |
| Recall | 0.88 | 0.89 |
| F1-Score | 0.88 | 0.89 |

Overall Accuracy: 88%, with balanced performance across both classes.
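Per-class precision, recall, and F1 like these come straight out of scikit-learn's `classification_report`; a minimal sketch with placeholder predictions:

```python
from sklearn.metrics import accuracy_score, classification_report

# Placeholder values -- in the project these are y_test and
# model.predict(X_test) over the 50,000 held-out reviews
y_test = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0]

print(classification_report(y_test, y_pred, target_names=["Negative", "Positive"]))
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```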
Real-Time Predictions
```python
def predict_sentiment(text):
    cleaned = clean_text(text)
    vectorized = vectorizer.transform([cleaned])
    prediction = model.predict(vectorized)[0]
    return "Positive" if prediction == 1 else "Negative"
```
"This product is amazing!" -> Positive
"Completely useless, waste of money" -> Negative
Visualizations
Three charts helped tell the story:
- Sentiment distribution: confirmed the dataset was balanced
- Word cloud: top positive words were great, love, best
- Confusion matrix: symmetric errors, no class bias
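The confusion matrix chart takes only a few lines with scikit-learn and matplotlib. This sketch uses placeholder labels and the headless Agg backend so it runs anywhere:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Placeholder values -- in the project these are y_test and the model's predictions
y_test = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm, display_labels=["Negative", "Positive"]).plot()
plt.savefig("confusion_matrix.png")
```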

What I Learned
Working at this scale (250k reviews) taught me that clean data and a balanced dataset matter more than model complexity. Logistic Regression beat fancier approaches simply because the data was well prepared.
Next steps: hyperparameter tuning, cross-validation, and eventually a BERT-based model for higher accuracy.
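A first pass at the tuning step could look like this: a `GridSearchCV` over logistic regression's regularization strength, with toy data standing in for the real TF-IDF features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy stand-ins for the vectorized reviews
texts = ["great product love it", "terrible waste of money",
         "best purchase ever", "completely useless broke fast"] * 50
labels = [1, 0, 1, 0] * 50
X = TfidfVectorizer().fit_transform(texts)

# 5-fold cross-validated search over regularization strength C
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 10]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, labels)
print(grid.best_params_, f"CV accuracy: {grid.best_score_:.2f}")
```

This folds the hyperparameter search and the cross-validation from the next-steps list into one object, and `grid.best_estimator_` drops straight into the existing `predict_sentiment` pipeline.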
Full code on my GitHub — feel free to clone and try it on your own dataset!
Found this helpful? Drop a like or leave a comment below!