<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Youssef Bellouz</title>
    <description>The latest articles on DEV Community by Youssef Bellouz (@youssef_bellouz_ce1d9c1b4).</description>
    <link>https://dev.to/youssef_bellouz_ce1d9c1b4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3655381%2Fbd26e040-af61-423b-af9f-c1ef3905d6c3.jpg</url>
      <title>DEV Community: Youssef Bellouz</title>
      <link>https://dev.to/youssef_bellouz_ce1d9c1b4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/youssef_bellouz_ce1d9c1b4"/>
    <language>en</language>
    <item>
      <title>Data Science Meets Wine: Predicting Wine Quality &amp; Recommending Wines Using Machine Learning</title>
      <dc:creator>Youssef Bellouz</dc:creator>
      <pubDate>Tue, 23 Dec 2025 15:07:48 +0000</pubDate>
      <link>https://dev.to/youssef_bellouz_ce1d9c1b4/predicting-wine-quality-with-data-science-my-vivino-project-1m88</link>
      <guid>https://dev.to/youssef_bellouz_ce1d9c1b4/predicting-wine-quality-with-data-science-my-vivino-project-1m88</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;With the rapid growth of data-driven platforms like Vivino, wine quality assessment is no longer driven only by expert sommeliers or traditional rules-based systems. Millions of users now generate real-world data reflecting authentic consumer preferences.&lt;/p&gt;

&lt;p&gt;In this project, we aim to answer a key business question:&lt;br&gt;
&lt;strong&gt;Can we predict wine quality and recommend similar wines using only physicochemical properties?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To achieve this, we built a complete Data Science pipeline that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cleans and explores wine data&lt;/li&gt;
&lt;li&gt;Analyzes relationships between features&lt;/li&gt;
&lt;li&gt;Trains Machine Learning models&lt;/li&gt;
&lt;li&gt;Explains predictions using feature importance&lt;/li&gt;
&lt;li&gt;Recommends similar wines using cosine similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project follows a scientific approach, from hypothesis formulation to evaluation and interpretation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dataset Overview
&lt;/h2&gt;

&lt;p&gt;We use the Wine Quality Dataset from the UCI Machine Learning Repository, which consists of two separate datasets of Portuguese "Vinho Verde" wines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Red Wine (1,599 samples)&lt;/li&gt;
&lt;li&gt;White Wine (4,898 samples)&lt;/li&gt;
&lt;/ul&gt;

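&lt;p&gt;Each dataset includes 11 physicochemical properties (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol) and a quality score assigned by wine experts.&lt;/p&gt;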

&lt;p&gt;Target variable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;quality (integer score between 0 and 10)&lt;/li&gt;
&lt;/ul&gt;
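&lt;p&gt;To make the setup concrete, both files can be loaded into a single table with pandas. This is a minimal sketch and not the author's code. It assumes the semicolon-separated CSV files from the UCI archive, and the URLs are assumptions. The helper &lt;code&gt;load_wine_data&lt;/code&gt; and the &lt;code&gt;wine_type&lt;/code&gt; column are hypothetical names.&lt;/p&gt;

```python
import pandas as pd

# Assumed source: the CSV files on the UCI archive (semicolon-separated)
UCI_BASE = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"
RED_URL = UCI_BASE + "winequality-red.csv"
WHITE_URL = UCI_BASE + "winequality-white.csv"


def load_wine_data(red_path=RED_URL, white_path=WHITE_URL):
    """Load both wine datasets and tag each row with a hypothetical wine_type column."""
    # The UCI files use ";" as the column separator, not ","
    red = pd.read_csv(red_path, sep=";")
    white = pd.read_csv(white_path, sep=";")
    red["wine_type"] = "red"
    white["wine_type"] = "white"
    # Stack the two tables; ignore_index gives a fresh 0..n-1 index
    return pd.concat([red, white], ignore_index=True)
```

&lt;p&gt;A &lt;code&gt;wine_type&lt;/code&gt; column makes it easy to split the table back into red and white subsets when you train separate models.&lt;/p&gt;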

&lt;h2&gt;
  
  
  Data Collection &amp;amp; Cleaning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cleaning Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before modeling, data quality is critical. The following steps were applied:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removed duplicate rows&lt;/li&gt;
&lt;li&gt;Converted all values to numeric types&lt;/li&gt;
&lt;li&gt;Dropped missing values&lt;/li&gt;
&lt;li&gt;Removed physically implausible values (e.g., negative alcohol content, unrealistic pH)&lt;/li&gt;
&lt;li&gt;Removed outliers using the interquartile range (IQR) rule&lt;/li&gt;
&lt;li&gt;Applied a log1p transformation to residual sugar to reduce its right skew&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These steps help ensure the model learns from reliable, realistic data.&lt;/p&gt;
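&lt;p&gt;The cleaning steps above could be implemented roughly as follows. This is a hedged sketch, not the project's actual code. Several details are assumptions: the plausibility rules (non-negative measurements, pH between 0 and 14), the standard 1.5 * IQR fences, and the function name &lt;code&gt;clean_wine_data&lt;/code&gt;. The function expects one wine type's numeric columns at a time. Drop any text column, such as a wine-type label, first, because it would be coerced to missing values.&lt;/p&gt;

```python
import numpy as np
import pandas as pd


def clean_wine_data(df, target="quality", skewed_col="residual sugar"):
    """Apply the listed cleaning steps in order (hypothetical helper)."""
    df = df.drop_duplicates()

    # Coerce every column to numeric; unparseable strings become NaN
    df = df.apply(pd.to_numeric, errors="coerce")
    df = df.dropna()

    features = [c for c in df.columns if c != target]

    # Assumed plausibility rules: measurements are non-negative, pH between 0 and 14
    df = df[(df[features] >= 0).all(axis=1)]
    if "pH" in df.columns:
        df = df[df["pH"].between(0, 14)]

    # IQR rule: keep rows whose features all lie in [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1 = df[features].quantile(0.25)
    q3 = df[features].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = df[features].apply(lambda s: s.between(lower[s.name], upper[s.name]))
    df = df[inside.all(axis=1)].copy()

    # log1p(x) = log(1 + x): compresses the long right tail and is safe at x = 0
    if skewed_col in df.columns:
        df[skewed_col] = np.log1p(df[skewed_col])
    return df.reset_index(drop=True)
```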

&lt;h2&gt;
  
  
  Data Exploration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Feature Distributions&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmh6sptt1jy1hq098w9du.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmh6sptt1jy1hq098w9du.png" alt=" " width="477" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lc48we74tnxz3sxuxac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lc48we74tnxz3sxuxac.png" alt=" " width="527" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We explored distributions of all physicochemical features to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detect skewed variables&lt;/li&gt;
&lt;li&gt;Identify outliers&lt;/li&gt;
&lt;li&gt;Understand natural value ranges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This step guided later preprocessing decisions.&lt;/p&gt;
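&lt;p&gt;A small summary table covers the three goals listed above: skewed variables, outliers, and value ranges. It would sit alongside the histograms. The helper name &lt;code&gt;summarize_distributions&lt;/code&gt; is hypothetical. The rule of thumb that |skew| above about 1 signals strong asymmetry is an assumption for illustration.&lt;/p&gt;

```python
import pandas as pd


def summarize_distributions(df, target="quality"):
    """Per-feature range, skewness and IQR-outlier count (hypothetical helper)."""
    features = df.drop(columns=[target]).select_dtypes("number")
    q1, q3 = features.quantile(0.25), features.quantile(0.75)
    iqr = q3 - q1
    # A value is an outlier when it falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    inside = features.apply(lambda s: s.between(q1[s.name] - 1.5 * iqr[s.name],
                                                q3[s.name] + 1.5 * iqr[s.name]))
    summary = pd.DataFrame({
        "min": features.min(),
        "max": features.max(),
        "skew": features.skew(),  # rule of thumb: |skew| above ~1 is strongly asymmetric
        "n_outliers": (~inside).sum(),
    })
    # Most right-skewed features first
    return summary.sort_values("skew", ascending=False)
```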

&lt;h2&gt;
  
  
  Correlation Analysis
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tsp9xpzsenqqrxgj976.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tsp9xpzsenqqrxgj976.png" alt=" " width="431" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A correlation heatmap revealed important relationships:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alcohol has a positive correlation with quality&lt;/li&gt;
&lt;li&gt;Volatile acidity is negatively correlated with wine quality&lt;/li&gt;
&lt;li&gt;Several features show weak linear correlation with quality. Because Pearson correlation captures only linear relationships, this motivates trying non-linear models&lt;/li&gt;
&lt;/ul&gt;
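&lt;p&gt;The heatmap's quality column can also be read as a ranked table. This hypothetical helper, &lt;code&gt;correlations_with_quality&lt;/code&gt;, reports both Pearson and Spearman coefficients. A feature whose Spearman coefficient is clearly larger in magnitude than its Pearson one hints at a monotonic but non-linear relationship.&lt;/p&gt;

```python
import pandas as pd


def correlations_with_quality(df, target="quality"):
    """Rank features by correlation with the target (hypothetical helper).

    Pearson measures linear association only; Spearman (rank-based) also picks
    up monotonic non-linear trends, so comparing them is a quick check.
    """
    numeric = df.select_dtypes("number")
    pearson = numeric.corr(method="pearson")[target].drop(target)
    spearman = numeric.corr(method="spearman")[target].drop(target)
    out = pd.DataFrame({"pearson": pearson, "spearman": spearman})
    # Sort by absolute Pearson coefficient, strongest association first
    return out.reindex(out["pearson"].abs().sort_values(ascending=False).index)
```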

&lt;h2&gt;
  
  
  Modeling Strategy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Feature Preparation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The dataset is split into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;X → physicochemical features&lt;/li&gt;
&lt;li&gt;y → quality score&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Separate models are trained for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Red wine&lt;/li&gt;
&lt;li&gt;White wine&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The data is split into training (80%) and test (20%) sets&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
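&lt;p&gt;The per-type 80/20 split could look like this with scikit-learn's &lt;code&gt;train_test_split&lt;/code&gt;. The sketch assumes a combined table with a &lt;code&gt;wine_type&lt;/code&gt; column. The helper name &lt;code&gt;split_by_wine_type&lt;/code&gt; and the use of &lt;code&gt;random_state=42&lt;/code&gt; for the split are assumptions.&lt;/p&gt;

```python
from sklearn.model_selection import train_test_split


def split_by_wine_type(wines, target="quality", type_col="wine_type",
                       test_size=0.2, random_state=42):
    """Return {wine_type: [X_train, X_test, y_train, y_test]}, one split per type."""
    splits = {}
    for wine_type, group in wines.groupby(type_col):
        X = group.drop(columns=[target, type_col])  # physicochemical features
        y = group[target]                           # quality score
        splits[wine_type] = train_test_split(
            X, y, test_size=test_size, random_state=random_state
        )
    return splits
```

&lt;p&gt;Passing &lt;code&gt;stratify=y&lt;/code&gt; would keep class proportions similar in both sets. However, scikit-learn raises an error if any quality score occurs only once in a subset.&lt;/p&gt;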

&lt;h2&gt;
  
  
  Machine Learning Model
&lt;/h2&gt;

&lt;p&gt;We use a Random Forest Classifier, chosen because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It handles non-linear relationships well&lt;/li&gt;
&lt;li&gt;It is robust to noise&lt;/li&gt;
&lt;li&gt;It provides feature importance for explainability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;RandomForestClassifier(n_estimators=100, random_state=42)&lt;/code&gt;&lt;/p&gt;
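&lt;p&gt;Fitting one forest per wine type with this configuration might look like the sketch below. The layout of the &lt;code&gt;splits&lt;/code&gt; dictionary and the helper name &lt;code&gt;train_models&lt;/code&gt; are assumptions. The integer quality scores are used directly as class labels.&lt;/p&gt;

```python
from sklearn.ensemble import RandomForestClassifier


def train_models(splits):
    """Fit one Random Forest per wine type (hypothetical helper).

    `splits` maps a wine type to (X_train, X_test, y_train, y_test),
    e.g. the output of train_test_split for that type.
    """
    models = {}
    for wine_type, (X_train, _X_test, y_train, _y_test) in splits.items():
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X_train, y_train)  # integer quality scores act as class labels
        models[wine_type] = model
    return models
```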

&lt;h2&gt;
  
  
  Model Evaluation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Accuracy &amp;amp; Classification Report&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnna1ozwwt8lkbs8yy43.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnna1ozwwt8lkbs8yy43.png" alt=" " width="245" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

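&lt;p&gt;The models achieve strong overall accuracy on both datasets. Because most wines in both datasets score 5 or 6, per-class precision and recall in the report are a better guide to performance on the rarer low and high scores.&lt;/p&gt;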

&lt;p&gt;The confusion matrix shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High accuracy for medium-quality wines&lt;/li&gt;
&lt;li&gt;Most errors occur between neighboring quality scores (e.g., 5 vs. 6), which is expected because the scores come from subjective expert ratings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This suggests that the model has learned meaningful patterns from the data.&lt;/p&gt;
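&lt;p&gt;A minimal evaluation helper, assuming a fitted classifier and a held-out test split, could produce the report and confusion matrix. Besides the standard scikit-learn metrics, it computes a hypothetical "within-one" accuracy: the share of predictions no more than one point from the true score. This metric is not part of the original project. It measures how many errors are near misses between neighboring scores.&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate(model, X_test, y_test):
    """Print standard metrics; return (accuracy, within_one_accuracy)."""
    y_pred = model.predict(X_test)
    labels = sorted(set(y_test) | set(y_pred))  # every score seen in either array
    acc = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {acc:.3f}")
    # zero_division=0 silences warnings for rare scores the model never predicts
    print(classification_report(y_test, y_pred, labels=labels, zero_division=0))
    # Rows = true score, columns = predicted score, both ordered as `labels`
    print(confusion_matrix(y_test, y_pred, labels=labels))
    # Hypothetical extra metric: prediction at most one point from the true score
    diff = np.asarray(y_pred) - np.asarray(y_test)
    within_one = float(np.mean(1 >= np.abs(diff)))
    return acc, within_one
```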

&lt;p&gt;&lt;strong&gt;Feature Importance (Model Explainability)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fencfxbgfi2jkfr6dtfh9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fencfxbgfi2jkfr6dtfh9.png" alt=" " width="462" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the most valuable insights comes from feature importance analysis.&lt;/p&gt;

&lt;p&gt;Top predictors include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alcohol&lt;/li&gt;
&lt;li&gt;Sulphates&lt;/li&gt;
&lt;li&gt;Volatile acidity&lt;/li&gt;
&lt;li&gt;Density&lt;/li&gt;
&lt;/ul&gt;

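&lt;p&gt;This is consistent with known wine chemistry. For example, high volatile acidity (mostly acetic acid) gives wine an unpleasant, vinegar-like taste. This consistency adds credibility to the model.&lt;/p&gt;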
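&lt;p&gt;Impurity-based importances (&lt;code&gt;feature_importances_&lt;/code&gt;) are computed from the training data and can favor continuous, high-cardinality features. scikit-learn's permutation importance on the test set is a common cross-check. The sketch below uses a hypothetical helper, &lt;code&gt;feature_importance_table&lt;/code&gt;, to put the two measures side by side. It is not necessarily how the chart above was produced.&lt;/p&gt;

```python
import pandas as pd
from sklearn.inspection import permutation_importance


def feature_importance_table(model, X_test, y_test, random_state=42):
    """Compare impurity-based and permutation importances for a fitted forest."""
    impurity = pd.Series(model.feature_importances_, index=X_test.columns)
    # Permutation importance: mean drop in the model's score (accuracy for a
    # classifier) when one column is shuffled, repeated n_repeats times
    perm = permutation_importance(model, X_test, y_test, n_repeats=10,
                                  random_state=random_state)
    table = pd.DataFrame({
        "impurity": impurity,
        "permutation": pd.Series(perm.importances_mean, index=X_test.columns),
    })
    return table.sort_values("impurity", ascending=False)
```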

&lt;h2&gt;
  
  
  Wine Recommendation System
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Beyond prediction, we implemented a recommendation system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Predict the quality of a new wine sample&lt;/li&gt;
&lt;li&gt;Compute cosine similarity between the new wine and existing wines&lt;/li&gt;
&lt;li&gt;Recommend the top N most similar wines&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This transforms the project from a pure ML model into a business-ready feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvno78ysywri3o3ah7k50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvno78ysywri3o3ah7k50.png" alt=" " width="170" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Business Impact
&lt;/h2&gt;

&lt;p&gt;This system can be used to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recommend wines to users based on taste similarity&lt;/li&gt;
&lt;li&gt;Assist sellers in pricing and positioning wines&lt;/li&gt;
&lt;li&gt;Help customers discover high-quality wines beyond price bias&lt;/li&gt;
&lt;li&gt;Modernize rule-based recommendation systems&lt;/li&gt;
&lt;/ul&gt;

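&lt;p&gt;👉 Because the model scores wines from physicochemical properties alone, it can surface high-quality wines regardless of price. Showing that cheaper wines can match expensive ones would require price data, which the UCI dataset does not include.&lt;/p&gt;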

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this project, we successfully:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built a full Data Science pipeline&lt;/li&gt;
&lt;li&gt;Cleaned and explored real-world wine data&lt;/li&gt;
&lt;li&gt;Trained Machine Learning models that are explainable through feature importance&lt;/li&gt;
&lt;li&gt;Predicted wine quality with strong accuracy, especially for medium-quality wines&lt;/li&gt;
&lt;li&gt;Implemented a similarity-based recommendation system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This demonstrates how data science can enhance customer experience, improve decision-making, and drive business value in digital marketplaces like Vivino.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Convert quality into categorical classes (Low / Medium / High)&lt;/li&gt;
&lt;li&gt;Add hyperparameter tuning (e.g., scikit-learn's GridSearchCV)&lt;/li&gt;
&lt;li&gt;Deploy the model as an API&lt;/li&gt;
&lt;li&gt;Build a web interface for real-time recommendations&lt;/li&gt;
&lt;li&gt;Integrate user preference data&lt;/li&gt;
&lt;/ul&gt;
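&lt;p&gt;The first two improvements could start from something like this. Several choices are illustrative assumptions: the band cut-offs (4 and below is Low, 5 to 6 is Medium, 7 and above is High), the default parameter grid, and the &lt;code&gt;f1_macro&lt;/code&gt; scoring. The helper names are hypothetical.&lt;/p&gt;

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def to_quality_band(quality):
    """Map 0-10 scores to bands; the cut-offs (4 and 6) are an assumption."""
    # Right-closed bins: (-1, 4] -> Low, (4, 6] -> Medium, (6, 10] -> High
    return pd.cut(quality, bins=[-1, 4, 6, 10], labels=["Low", "Medium", "High"])


def tune_forest(X_train, y_train, param_grid=None, cv=5):
    """Grid-search Random Forest hyperparameters with cross-validation."""
    if param_grid is None:
        # Hypothetical grid, kept small so the search stays affordable
        param_grid = {
            "n_estimators": [100, 300],
            "max_depth": [None, 10, 20],
            "min_samples_leaf": [1, 2, 4],
        }
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid,
        cv=cv,
        scoring="f1_macro",  # each band counts equally, not just the common one
    )
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_
```

&lt;p&gt;Macro-averaged F1 gives every band the same weight. This matters because Medium wines dominate both datasets, so plain accuracy would mostly reflect that one band.&lt;/p&gt;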

&lt;h2&gt;
  
  
  Final Note
&lt;/h2&gt;

&lt;p&gt;This project reflects a production-ready mindset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean code&lt;/li&gt;
&lt;li&gt;Modular pipeline&lt;/li&gt;
&lt;li&gt;Explainable ML&lt;/li&gt;
&lt;li&gt;Business-oriented thinking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data science doesn’t replace taste — it enhances it.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
