DEV Community: Hanae

Scientific Experiment: Can Market Data Identify Wine Type?

Hanae — Thu, 12 Mar 2026 23:58:07 +0000

To address the Wine Classification challenge, we shift our objective from predicting a continuous score (Rating) to identifying the categorical identity of a wine (Red, Rose, or White) based on its market and temporal characteristics.

Abstract

Traditional wine classification relies on chemical analysis or label reading. In this experiment, we test the hypothesis that three market proxies, Price, Rating, and Vintage (Year), carry enough "latent DNA" to accurately classify a wine into its respective category: Red, Rose, or White.

The Hypothesis

$H_1$: Different wine categories exhibit unique clusters within the Price-Rating-Year 3D space. Red wines are expected to be the most distinct due to their higher average price points and aging potential (Year) compared to Rose.

Step 1: Data Integration & Categorical Labeling

We consolidated three distinct datasets (Red, Rose, White) into a master frame of 12,827 observations. A "WineType" label was preserved as the ground truth for our supervised learning model. During this phase, we standardized the "Year" column by removing "N.V." (Non-Vintage) entries, ensuring the temporal feature was strictly numeric for the classifier.
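A minimal sketch of this consolidation step, assuming pandas. The tiny inline frames stand in for the three source datasets, which this post does not name, and the column names other than "WineType" and "Year" are assumptions. The post also does not say whether "N.V." rows were dropped or imputed; this sketch drops them.

```python
import pandas as pd

# Hypothetical stand-ins for the three source datasets (values are made up).
red = pd.DataFrame({"Year": ["2015", "N.V."], "Price": [42.0, 18.5], "Rating": [4.1, 3.8]})
rose = pd.DataFrame({"Year": ["2019"], "Price": [12.0], "Rating": [3.7]})
white = pd.DataFrame({"Year": ["2018", "2020"], "Price": [15.0, 22.0], "Rating": [3.9, 4.0]})


def build_master_frame(frames):
    """Stack per-type frames and attach a WineType ground-truth label."""
    labeled = [df.assign(WineType=wine_type) for wine_type, df in frames.items()]
    master = pd.concat(labeled, ignore_index=True)
    # "N.V." (and any other non-numeric entry) becomes NaN and is then dropped.
    master["Year"] = pd.to_numeric(master["Year"], errors="coerce")
    master = master.dropna(subset=["Year"]).reset_index(drop=True)
    master["Year"] = master["Year"].astype(int)
    return master


master = build_master_frame({"Red": red, "Rose": rose, "White": white})
print(master)
```

Using errors="coerce" means any unexpected non-numeric value is handled the same way as "N.V.", so it is worth counting the dropped rows on real data.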

Step 2: Exploratory Statistical Clustering

Before training, we analyzed the overlap between categories. Our initial box plot analysis showed that while Red and White wines have overlapping rating distributions, their Price volatility differs significantly.

--- Classification Accuracy ---
Accuracy Score: 0.6738

--- Detailed Scientific Report ---
              precision    recall  f1-score   support

         Red       0.77      0.80      0.79      1734
        Rose       0.14      0.11      0.12        79
       White       0.47      0.44      0.45       753

    accuracy                           0.67      2566
   macro avg       0.46      0.45      0.45      2566
weighted avg       0.66      0.67      0.67      2566
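The two average rows in the report can be re-derived from the per-class rows, which helps when reading them: the macro average treats every class equally, while the weighted average is dominated by Red because of its support. The sketch below uses only the rounded numbers printed in the report; the function names are illustrative.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
report = {
    "Red": (0.77, 0.80, 0.79, 1734),
    "Rose": (0.14, 0.11, 0.12, 79),
    "White": (0.47, 0.44, 0.45, 753),
}

total = sum(row[3] for row in report.values())


def macro_avg(col):
    """Unweighted mean across classes: every class counts equally."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(col):
    """Mean weighted by support: large classes dominate."""
    return sum(row[col] * row[3] for row in report.values()) / total


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}: macro={macro_avg(col):.2f} weighted={weighted_avg(col):.2f}")

# Share of the test set held by the largest class, i.e. the accuracy a
# trivial "always predict Red" baseline would reach.
majority_share = max(row[3] for row in report.values()) / total
print(f"majority-class share: {majority_share:.4f}")
```

The majority-class share works out to 1734 / 2566, about 0.676, which is a useful reference point when interpreting the 0.6738 accuracy score.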

The correlation matrix showed a correlation of $-0.33$ between Year and Rating. This is a moderate negative relationship: older vintages tend to carry higher ratings, which suggests that age contributes to how these wines are perceived and priced in the market.
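For reference, a correlation matrix of this kind can be produced as in the sketch below, assuming pandas and Pearson correlation (the post does not state which method was used). The rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Price": [12.0, 18.5, 25.0, 42.0, 60.0, 95.0],
    "Rating": [3.6, 3.7, 3.9, 4.0, 4.2, 4.4],
    "Year": [2020, 2019, 2018, 2016, 2014, 2010],
})

# Pearson correlation between every pair of numeric columns.
corr = sample.corr(method="pearson")
print(corr.round(2))

# A single pair can be read straight out of the matrix.
year_vs_rating = corr.loc["Year", "Rating"]
```

For scale, squaring a coefficient of -0.33 gives roughly 0.11, so a linear relationship with Year would account for about 11% of the variance in Rating.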

Step 3: Model Architecture (Random Forest)

We deployed a Random Forest Classifier with 100 decision trees. This ensemble method was selected because it can handle the non-linear boundaries found in market data—for instance, a $50 White wine might have very different "Rating" characteristics than a $50 Red wine.
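A sketch of such a setup with scikit-learn follows. Only the 100 trees come from this post. The 80/20 split, the stratification, the random seeds, and the synthetic data generator are assumptions made so the example runs on its own; the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)


def fake_wines(n, price_mean, label):
    """Synthetic (Price, Rating, Year) rows; NOT the real wine data."""
    X = np.column_stack([
        rng.normal(price_mean, 8.0, n).clip(min=3.0),  # Price
        rng.normal(3.9, 0.3, n).clip(1.0, 5.0),        # Rating
        rng.integers(2005, 2021, n),                   # Year
    ])
    return X, np.full(n, label)


parts = [fake_wines(300, 40.0, "Red"), fake_wines(40, 15.0, "Rose"), fake_wines(160, 20.0, "White")]
X = np.vstack([p[0] for p in parts])
y = np.concatenate([p[1] for p in parts])

# Stratify so the small Rose class appears in both splits (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Accuracy Score: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred, zero_division=0))
```

With a class as small as Rose, an option such as class_weight="balanced" on the classifier may be worth trying, although this post does not indicate that it was used.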

Step 4: Results & Performance Evaluation

The model was most reliable on Red wines (F1-score 0.79) and noticeably weaker on White wines (0.45). Rose proved the most difficult to classify (0.12) due to its smaller sample size (397 observations) and its "middle-ground" price-rating profile.

Key Metrics Observed:

Accuracy: 67.4% of the 2,566-wine test set was classified correctly (accuracy score 0.6738).
Precision: Highest for Red wines (0.77), as they occupy a more exclusive high-price tier.
Recall: Lowest for Rose (0.11). Rose wines were often misclassified as light Reds or full-bodied Whites, consistent with their status as a hybrid market category.

Conclusion: The "Identity" of Price

Our experiment confirms that a wine's "Type" is not just a chemical property but a market one. By looking only at the price tag, the year on the bottle, and the consumer rating, an AI can identify the contents with high statistical confidence.

This paves the way for a Wine Suggestion Engine that doesn't just look for "similar wines," but understands which category a user is likely seeking based on their budget and quality expectations.
Written by: @ben_jaddi and @boustani_h

Data Science at My MobApp Studio

Hanae — Sun, 22 Feb 2026 15:28:40 +0000

Market Insights for Our New App

Welcome

Project Goals

The analysis aims to answer key questions:

What is the size of the mobile app market (downloads and revenue)?
How does this break down by category (percentages)?
For each category, what is the ratio of downloads per app?
What additional insights can guide our decision-making?

To achieve this, I built a Jupyter Notebook with functions for loading, cleaning, and analyzing the dataset. Alongside the notebook, this blog post summarizes the findings with clear visualizations.

Analysis & Visualizations

1. Most Popular Paid Apps in the Family Category

A bar chart highlights the top paid apps in the Family category, showing which titles dominate downloads and revenue.

2. Popular Genres by Installations (Paid Family)

A pie chart illustrates the distribution of installations across genres within paid family apps, helping us identify where user interest is strongest.

3. Installations per Category

We created an array showing the number of installations per category, giving a clear view of market size across app types.
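A sketch of how installations per category, their percentage share, and the downloads-per-app ratio could be computed with pandas. The column names ("App", "Category", "Installs") and the "1,000+" string format are assumptions about the dataset, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows; column names and the install-count format are assumptions.
apps = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E"],
    "Category": ["FAMILY", "FAMILY", "GAME", "GAME", "TOOLS"],
    "Installs": ["1,000+", "50,000+", "1,000,000+", "500,000+", "10,000+"],
})


def clean_installs(series):
    """Turn strings like '1,000+' into integers (lower bound of the bucket)."""
    return series.str.replace(r"[,+]", "", regex=True).astype(int)


apps["Installs"] = clean_installs(apps["Installs"])

per_category = apps.groupby("Category").agg(
    installs=("Installs", "sum"),
    apps=("App", "count"),
)
# Percentage share of all installations, and average installs per app.
per_category["share_pct"] = 100 * per_category["installs"] / per_category["installs"].sum()
per_category["installs_per_app"] = per_category["installs"] / per_category["apps"]
print(per_category.sort_values("installs", ascending=False))
```

Because bucketed counts such as "1,000+" are lower bounds, totals built this way understate the true number of installations.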

4. Installations Distribution by Category

A pie chart visualizes the percentage share of installations per category, making it easy to spot dominant segments.

5. Mean Price per Category

A bar chart compares the average price of apps across categories, highlighting where premium pricing strategies are most common.

6. Most Expensive Apps per Category

Finally, we identified the most expensive apps in each category, offering insight into pricing extremes and potential positioning.

Key Takeaways

The app market on Google Play is vast, with significant variation across categories.
Family apps remain a strong segment, but competition is high.
Pricing strategies differ widely by category, with some niches supporting premium apps.
Understanding installation ratios per app helps us gauge saturation and opportunity.

Conclusion

This project demonstrates how data science can guide strategic decisions in app development. By combining structured analysis with clear visualizations, we provide actionable insights for marketing, design, and product teams.

The next step is to refine these findings into recommendations that will shape the launch of our new app. With data as our foundation, My MobApp Studio is well-positioned to succeed in the digital world.

Wine classification - Vivino Qwasar

Hanae — Fri, 13 Feb 2026 15:48:34 +0000

In our previous analysis, we explored what makes a wine "good." Today, we address a more strategic question: How big is the market, and who owns the largest slice of the pie? By analyzing over 12,000 unique wines, we can move beyond the bottle and look at the industry's economic footprint.

1. The Hypothesis: Red Dominance

In the global wine trade, we hypothesize that Red Wines occupy the largest market share both in volume and total value, likely accounting for over 60% of the available products due to higher consumer demand and cellarability.

2. Methodology: Volume vs. Value

To analyze the "size" of the market, we look at two metrics:

Market Volume: The total count of unique wine labels produced. This represents the diversity of the market.
Market Value Proxy: The sum of all listed prices. This represents the total capital tied up in the current inventory.
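Both metrics reduce to a couple of pandas expressions. The sketch below assumes a frame with "WineType" and "Price" columns, the two names that appear in the printed results, and uses a made-up toy inventory rather than the real data.

```python
import pandas as pd

# Toy inventory for illustration; the values are made up.
wines = pd.DataFrame({
    "WineType": ["Red", "Red", "Red", "White", "White", "Rose"],
    "Price": [30.0, 120.0, 50.0, 20.0, 25.0, 5.0],
})

# Market volume: share of labels per type.
volume_pct = wines["WineType"].value_counts(normalize=True) * 100

# Market value proxy: share of summed list prices per type.
value_pct = wines.groupby("WineType")["Price"].sum() / wines["Price"].sum() * 100

print(volume_pct.round(2))
print(value_pct.round(2))
```

In this toy frame Red holds 50% of the labels but 80% of the summed prices, which is the kind of gap between volume and value that the two metrics are meant to expose.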

1. Market Segmentation
Our analysis shows a significant skew toward Red wines. Among the 12,827 Red, White, and Rose observations (Sparkling excluded):

Red Wines represent approximately 67% of the total volume.
White Wines follow at roughly 29%.
Rose Wines occupy a niche segment of roughly 3%.

--- Market Share by Volume (%) ---
Red          62.642764
White        27.208327
Sparkling     7.279167
Rose          2.869741
Name: WineType, dtype: float64

--- Market Share by Value (%) ---
WineType
Red          74.251814
Rose          1.088442
Sparkling     7.671192
White        16.988552
Name: Price, dtype: float64
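The printed volume table includes a Sparkling row, while the bullet figures cover only Red, White, and Rose. Rescaling the printed shares after removing Sparkling reproduces the bullet figures, as the short worked example below shows. It uses only the printed percentages, rounded to two decimals; the function name is illustrative.

```python
# Volume shares (%) copied from the printed output, rounded to two decimals.
volume_pct = {"Red": 62.64, "White": 27.21, "Sparkling": 7.28, "Rose": 2.87}


def renormalize(shares, exclude):
    """Drop the excluded types and rescale the rest to sum to 100."""
    kept = {k: v for k, v in shares.items() if k not in exclude}
    total = sum(kept.values())
    return {k: 100 * v / total for k, v in kept.items()}


still_wines = renormalize(volume_pct, exclude={"Sparkling"})
for wine_type, pct in still_wines.items():
    print(f"{wine_type}: {pct:.1f}%")
```

The rescaled shares come out at about 67.6% for Red, 29.3% for White, and 3.1% for Rose.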

2. Geographic Hubs
Where is the market physically located? Our geographic analysis identifies a handful of "super-producers." Countries like Italy, France, and Spain dominate the volume metrics. This concentration suggests that while the market is global, the size of the market is heavily influenced by European "Old World" production standards and heritage brands.

3. Price-Point Distribution
The size of the market isn't just about how many bottles exist, but at what price they sit. By viewing the price distribution on a logarithmic scale, we found that the "Premium" segment ($100+) is significantly smaller in volume but represents a disproportionately large share of the market's total value.
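The volume-versus-value contrast for the premium tier can be checked with a few lines of plain Python. The prices below are made up, and the power-of-ten buckets are just one simple way to group prices on a logarithmic scale; the post does not say which binning was used.

```python
import math

# Made-up list prices in dollars, for illustration only.
prices = [8.0, 12.0, 15.0, 22.0, 35.0, 60.0, 150.0, 400.0]

PREMIUM_THRESHOLD = 100.0  # the "$100+" cutoff named in the text

premium = [p for p in prices if p >= PREMIUM_THRESHOLD]
volume_share = 100 * len(premium) / len(prices)  # share of labels
value_share = 100 * sum(premium) / sum(prices)   # share of summed prices

# Power-of-ten buckets: 0 covers $1 to $9.99, 1 covers $10 to $99.99, and so on.
buckets = {}
for p in prices:
    decade = math.floor(math.log10(p))
    buckets[decade] = buckets.get(decade, 0) + 1

print(f"premium volume share: {volume_share:.1f}%")
print(f"premium value share:  {value_share:.1f}%")
print(buckets)
```

In this toy list the premium tier is a quarter of the labels but more than three quarters of the summed prices, the same pattern the paragraph describes.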

Conclusion: A Red-Driven Economy

The analysis confirms that the wine market is fundamentally driven by Red varieties. For businesses looking to enter this space, the "Volume" is in mid-tier Reds, while the "Value" is concentrated in rare vintages from established geographic hubs.

Understanding the market size allows us to optimize recommendation engines—ensuring we don't just recommend a "good" wine, but one that actually reflects the availability and economic reality of the current global inventory.