
Cody Pearce

Posted on • Originally published at codinhood.com

How I Built an Oreo Generator with 1.1 Sextillion Combinations

TLDR

I built a web app that generates 1.1 sextillion possible Oreo combinations from 406,718 flavors. Try the generator or watch the video where I explain the process. Everything from Chocolate Chip Oreo to Pokemon Sriracha Danny DeVito Oreo.


Project Summary

| Metric | Value |
| --- | --- |
| Total combinations | 1.1 sextillion |
| Flavors in database | 406,718 |
| Raw records processed | 31 million |
| Manual review time | 3-4 days |

The math is straightforward combinatorics. The real challenge was building a comprehensive flavor database.




The Mathematical Foundation

The formula for counting combinations is the binomial coefficient:

function combination(n: number, k: number): bigint {
  if (k > n) return 0n;
  if (k === 0 || k === n) return 1n;

  let result = 1n;
  for (let i = 1; i <= k; i++) {
    result = (result * BigInt(n - i + 1)) / BigInt(i);
  }
  return result;
}

This calculates C(n,k): the number of ways to choose k items from n total items without replacement, where order doesn't matter and no item appears twice.

For Oreos with 1-4 flavors from n total flavors:

C(n,1) + C(n,2) + C(n,3) + C(n,4)

The implementation uses BigInt because JavaScript numbers can't represent every integer above 2^53 - 1 (Number.MAX_SAFE_INTEGER). A standard number would silently round a value like 1.1 sextillion instead of storing every digit.
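A quick way to see that precision limit, as a minimal sketch (the variable names are only illustrative and are not from the generator's code):

```typescript
// Number stores integers exactly only up to 2^53 - 1.
const maxSafe: number = Number.MAX_SAFE_INTEGER; // 9007199254740991

// Past that limit, distinct integers collapse to the same Number value.
const numberCollides: boolean = 2 ** 53 + 1 === 2 ** 53; // true

// BigInt keeps every digit, so the same comparison behaves as expected.
const bigintCollides: boolean = 2n ** 53n + 1n === 2n ** 53n; // false

console.log(maxSafe, numberCollides, bigintCollides);
```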

First, I needed to find n: the total number of possible flavor components.


Defining Flavor Categories

Looking at real Oreo releases, I noticed they fall into three distinct categories:

1. Food Flavors

Traditional flavors that you can actually taste: Birthday Cake, Red Velvet, Peanut Butter, Mint.

Red Velvet Oreo - Traditional food flavor

2. Food Brand Collaborations

Branded food products that bring their own flavor profiles: Reese's Peanut Butter Cup, Chips Ahoy!, Sour Patch Kids, Swedish Fish.

Reese's Oreo - Food brand collaboration

3. Non-Food Brand Collaborations

These are where Oreo gets creative with pure branding: Supreme, Game of Thrones, Star Wars, Lady Gaga, Selena Gomez.

Selena Gomez Oreo - Non-food brand collaboration

When Oreo releases a Pokemon Oreo, they're not claiming it tastes like Pikachu (hopefully). It's about the brand association.


Building the Flavor Database

I built the database in three phases: raw data acquisition, automated processing and filtering, and manual review.


Phase 1: Data Acquisition

No single dataset contains every possible flavor. I searched for comprehensive food databases, but most had significant gaps or poor organization.

Search Results food datasets

Wikipedia's food lists turned out to be surprisingly useful: easy to scrape and full of unique, high-quality entries.

Wikipedia's list of foods - easy to scrape with quality entries

Eventually, I assembled data from food databases, brand repositories, entertainment franchises, and celebrity databases.

Food Flavors

I gathered raw source files containing 427,344 total records from multiple food databases:

| Source | Count | Description |
| --- | --- | --- |
| Wikipedia Food Lists | 261,547 | Raw file entries from comprehensive food lists |
| USDA Foundation Foods Database | 155,244 | Raw database rows (nutritional data, metadata, duplicates) |
| Food Identicon | 4,999 | Community-maintained ingredients |
| Wikidata | 3,222 | Dishes, variations, and food variants via SPARQL queries |
| CORGIS Ingredients Dataset | 2,332 | Academic dataset entries |
| Total Raw Food Records | 427,344 | |

Food Brands

| Source | Count | Description |
| --- | --- | --- |
| Open Food Facts | 4,000,000 | 1.1GB CSV database with products |
| Wikidata Food Brands | 1,101 | Structured food and beverage brands via SPARQL queries |
| Total | 4,001,101 | |

Non-Food Brands

I broke non-food brands into types to track totals for each category.

Entertainment Characters & Franchises

| Source | Count | Description |
| --- | --- | --- |
| Marvel Characters | 16,376 | Characters from FiveThirtyEight dataset |
| DC Characters | 6,896 | Characters from FiveThirtyEight dataset |
| Disney Characters | 9,820+ | From Disney API plus manual curation |
| Pokemon | 900+ | Pokemon with complete stats |
| Anime | 525+ | Dragon Ball (209), One Piece (316+) |
| Gaming | 238+ | Wikipedia gaming/Nintendo characters |
| Franchises | 563 | Wikipedia (425) + Wikidata (138) franchises |
| Total | 35,318 | |

Sports Teams

| Source | Count | Description |
| --- | --- | --- |
| Wikipedia Sports | 1,190 | Professional teams |
| Wikidata Sports | 1,500 | Sports entities with metadata |
| Total | 2,690 | |

Music & Celebrities

| Source | Count | Description |
| --- | --- | --- |
| MTV Artists | 300+ | Artists with social media and genre data |
| IMDb Names | 14,681,812 | Names for celebrity extraction |
| IMDb Titles | 11,877,454 | Titles |
| Manual Celebrities | 99 | Manually curated |
| Total | 26,559,665 | |

Non-Food Brands Summary

| Category | Count |
| --- | --- |
| Entertainment Characters & Franchises | 35,318 |
| Sports Teams | 2,690 |
| Music & Celebrities | 26,559,665 |
| Total Non-Food Brands | 26,597,673 |

Total Raw Sources

| Category | Count | Description |
| --- | --- | --- |
| Food Flavors | 427,344 | Raw source records |
| Food Brands | 4,001,101 | Raw product records |
| Non-Food Brands | 26,597,673 | Raw source records |
| Total Raw Records | 31,026,118 | |

Phase 2: Automated Processing & Filtering

The 31 million raw source records needed extraction and processing. First, I extracted actual terms from the raw files (pulling food names from nutritional databases, brand names from product catalogs, character names from entertainment datasets). Then I filtered out technical codes, duplicates, and non-food terms. Each category required different processing strategies.

Food Flavors

Starting from 427,344 raw source records, I extracted food-related terms, then filtered to remove technical codes, duplicates, and non-food terms:

Repeated TOMATOES, GRAPE entries - filtered out as duplicates

For example, USDA entries like Beef, eye of round roast, raw (ER37-R-23) were filtered out as too technical.

Beef, eye of round roast, raw (ER37-R-23) - filtered out as too technical

Filtering out technical codes was straightforward. But duplicates were harder: what even counts as a duplicate?

The Duplicate Problem

Simple flavors like potato and potatoes are obviously duplicates. But what about potato vs. baked potato vs. mashed potato vs. potato salad?

Without clear rules, I might accidentally filter out valid entries that should have been kept.

If Oreo released both Baked Potato Oreo and Mashed Potato Oreo, I think you'd expect them to taste completely different (well... maybe not completely, but they'd have a different vibe). So the way a food is prepared can determine whether it's a duplicate or a unique flavor.

Baked Potato Oreo

Mashed Potato Oreo

Preparation methods aren't the only thing that creates distinct flavors. Complete dishes combine multiple ingredients into something entirely new. Pizza Pringles exist as a product, which means pizza is its own distinct flavor, not just cheese or tomato or bread. A Pizza Oreo would taste different from a Cheese Bread Oreo.

Pizza Pringles

Three-Tier Classification System

The solution was a three-tier classification system:

| Tier | Description | Examples |
| --- | --- | --- |
| Base | Core ingredients | potato, apple, chocolate |
| Variation | Prepared forms | mashed potato, apple pie, dark chocolate |
| Dish | Complete dishes | potato salad, apple strudel, chocolate cake |

With these rules defined, I could now consistently decide what to keep and what to filter. Potato and potatoes are duplicates (keep one). But potato, baked potato, and potato salad are all unique entries across different tiers (keep all three).
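Here is a rough sketch of how that rule could work in code. The `normalizeKey` and `dedupe` helpers are hypothetical, and the plural handling is deliberately naive; the real pipeline likely needed far more cases.

```typescript
// Hypothetical normalization: lowercase, trim, and crudely singularize the last word.
function normalizeKey(term: string): string {
  const words = term.trim().toLowerCase().split(/\s+/);
  const last = words[words.length - 1] ?? "";
  // Naive plural handling for illustration only: "potatoes" -> "potato", "apples" -> "apple".
  if (last.endsWith("oes")) {
    words[words.length - 1] = last.slice(0, -2);
  } else if (last.endsWith("s") && !last.endsWith("ss")) {
    words[words.length - 1] = last.slice(0, -1);
  }
  return words.join(" ");
}

// Keep the first term seen for each normalized key.
function dedupe(terms: string[]): string[] {
  const seen = new Map<string, string>();
  for (const term of terms) {
    const key = normalizeKey(term);
    if (!seen.has(key)) seen.set(key, term);
  }
  return Array.from(seen.values());
}

const kept = dedupe(["potato", "Potatoes", "baked potato", "potato salad"]);
console.log(kept); // potato and Potatoes collapse; the other two survive
```

Because the key keeps every word, "baked potato" and "potato salad" never collide with plain "potato", which matches the keep-all-three rule.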

With the classification system in place, here's the complete filtering process:

| # | Step | Before | After | Description |
| --- | --- | --- | --- | --- |
| 1 | Extraction | 427,344 | 309,716 | Extracted food terms from raw source files (e.g., pulled food names from 155k USDA nutritional data rows) |
| 2 | Bad Entry Filtering | 309,716 | 229,577 | Removed technical codes, overly long/short terms, entries with numbers |
| 3 | Non-Food Filtering | 229,577 | 228,847 | Eliminated non-food terms using keyword patterns |
| 4 | Classification | 228,847 | 228,847 | Sorted into base/variation/dish categories |
| 5 | Deduplication | 228,847 | 66,745 | Removed duplicates and normalized variations |
| | Result | 427,344 | 66,745 | |
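The bad-entry step (step 2) could look something like the sketch below. The length thresholds and the `isBadEntry` name are assumptions for illustration; the post doesn't state the actual limits.

```typescript
// Hypothetical thresholds; the actual limits used in the pipeline are not stated.
const MIN_LENGTH = 3;
const MAX_LENGTH = 40;

function isBadEntry(term: string): boolean {
  const trimmed = term.trim();
  // Overly short or overly long terms are rejected.
  if (trimmed.length < MIN_LENGTH || trimmed.length > MAX_LENGTH) return true;
  // Any digit suggests a technical code, weight, or lot number.
  if (/\d/.test(trimmed)) return true;
  // Entries with no letters at all are not usable names.
  if (!/[a-z]/i.test(trimmed)) return true;
  return false;
}

console.log(isBadEntry("Beef, eye of round roast, raw (ER37-R-23)")); // true (contains digits)
console.log(isBadEntry("mashed potato")); // false
```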

Food Brands

The 4 million products in Open Food Facts don't represent 4 million unique brands. Many products share the same brand, just in different sizes or formats.

For example, Open Food Facts contains thousands of Coca-Cola products:

  • Coca-Cola Classic 330ml can
  • Coca-Cola Classic 2L bottle
  • Coca-Cola Classic 500ml bottle
  • Coca-Cola Zero 330ml can
  • Coca-Cola Zero 2L bottle

These are all different products, but they don't all represent different flavors. I needed to extract brand variants that actually matter for taste. Coca-Cola Classic and Coca-Cola Zero taste different, so I kept them as separate flavors. But the size (330ml vs 2L) doesn't change the flavor, so I only kept one. Extracting unique brands from 4M products gave me 307,674 brand variants.
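As a sketch of the idea, stripping a trailing size and package word collapses the five Coca-Cola products above into two variants. The `SIZE_SUFFIX` pattern and `brandVariant` helper are hypothetical and only cover the formats in this example.

```typescript
// Hypothetical pattern: a trailing size such as "330ml" or "2L",
// optionally followed by a packaging word.
const SIZE_SUFFIX =
  /\s+\d+(?:\.\d+)?\s?(?:ml|cl|l|g|kg|oz)\b(?:\s+(?:can|bottle|pack))?\s*$/i;

function brandVariant(productName: string): string {
  return productName.replace(SIZE_SUFFIX, "").trim();
}

const products = [
  "Coca-Cola Classic 330ml can",
  "Coca-Cola Classic 2L bottle",
  "Coca-Cola Classic 500ml bottle",
  "Coca-Cola Zero 330ml can",
  "Coca-Cola Zero 2L bottle",
];

// A Set removes the repeats left after the size is stripped.
const variants = Array.from(new Set(products.map(brandVariant)));
console.log(variants);
```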

Here's the complete filtering process:

| # | Step | Before | After | Description |
| --- | --- | --- | --- | --- |
| 1 | Extraction | 4,001,101 | 307,674 | Extracted unique brand names from product records, deduplicated across products |
| 2 | Filtering & Cleaning | 307,674 | 285,441 | Removed 22,233 entries: pure punctuation, technical codes, URLs, entries with no letters |
| 3 | Deduplication | 285,441 | 282,347 | Merged 3,094 spelling variants like "Dr. Pepper" → "Dr Pepper", "Ben and Jerry's" → "Ben & Jerry's" |
| 4 | Database Import | 282,347 | 281,996 | Final quality filters during import |
| | Result | 4,001,101 | 281,996 | |
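The spelling-variant merge in step 3 can be approximated with a canonical key. This is a sketch under assumptions: `brandKey` and `groupVariants` are made-up names, and the rules only cover the two examples from the table.

```typescript
// Hypothetical canonical key: case-insensitive, "and" treated as "&",
// periods dropped, whitespace collapsed.
function brandKey(name: string): string {
  return name
    .toLowerCase()
    .replace(/\band\b/g, "&")
    .replace(/\./g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Group raw names that share a key so one spelling can be kept per group.
function groupVariants(names: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const key = brandKey(name);
    const group = groups.get(key) ?? [];
    group.push(name);
    groups.set(key, group);
  }
  return groups;
}

const groups = groupVariants(["Dr. Pepper", "Dr Pepper", "Ben and Jerry's", "Ben & Jerry's"]);
console.log(groups.size); // 2
```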

Non-Food Brands

The real challenge with non-food brands: what even counts as a flavor? Is Microsoft Excel a flavor? Is Ikea a flavor?

Excel Oreo - too obscure

I focused on recognizability. We don't want entries so obscure that nobody gets the reference. A Stormtrooper who bonked his head in Star Wars A New Hope Oreo might be fun, but it's too specific.

Stormtrooper who bonked his head in Star Wars A New Hope Oreo - too obscure

A Lady Gaga Oreo or Pokémon Oreo makes sense because people actually know them.

This constraint meant aggressive filtering across 26 million raw entries:

| # | Step | Before | After | Description |
| --- | --- | --- | --- | --- |
| 1 | Extraction & deduplication | 26,597,673 | 150,404 | Extracted individual names and deduplicated to unique terms like "Batman", "The Avengers", "Robert Downey Jr." |
| 2 | Cleaning & classification | 150,404 | 149,884 | Removed invalid entries |
| 3 | Deduplication | 149,884 | 131,155 | Removed more duplicates |
| 4 | Normalization | 131,155 | 130,712 | Standardized formats |
| 5 | Database import | 130,712 | 130,989 | Final import to database |
| | Result | 26,597,673 | 130,989 | |

Total After Processing

| Category | Before | After | Reduction |
| --- | --- | --- | --- |
| Food Flavors | 427,344 | 66,745 | 84.4% |
| Food Brands | 4,001,101 | 281,996 | 93.0% |
| Non-Food Brands | 26,597,673 | 130,989 | 99.5% |
| Total | 31,026,118 | 479,730 | 98.5% |

Phase 3: Manual Review Interface

After all the automated processing, I still had 479,730 entries that needed manual curation. I built a comprehensive review interface directly into the generator to go through each entry one by one.

Screenshot of the review interface showing approval and flagging system
The manual review interface: approve, reject, or flag for cleanup

Over 3-4 days, I reviewed every entry. Bad entries like Broccoli, raw, OCEAN MIST, 4 bunches; Product of USA (NC1) - NFY0905IX got rejected as too technical. Legitimate flavors like Danny DeVito got approved. Entries with weird formatting got flagged for cleanup.

Manual review reduced the database from 479,730 entries to 404,539 flavors, before adding the dictionary terms.

Final Flavor Count

Manual review filtered each category:

| Category | Before | After | Rejected |
| --- | --- | --- | --- |
| Food Flavors | 66,745 | 33,856 | 32,889 |
| Food Brands | 281,996 | 251,179 | 30,817 |
| Non-Food Brands | 130,989 | 119,504 | 11,485 |
| Total | 479,730 | 404,539 | 75,191 |

The Dictionary Addition

After reviewing 66,745 food entries, I had 33,856 approved food flavors. But I couldn't shake the feeling I was missing potential flavors.

Somewhere out there was the platonic ideal of an Oreo flavor and I didn't want to miss it.

So I did the only rational thing: I read the dictionary. I processed every entry, filtered for food-related words, and extracted 2,179 additional food terms. Adding these to the existing 33,856 gave me 36,035 final food flavors.

| Category | Count | Description |
| --- | --- | --- |
| Food Flavors (After Manual Review) | 33,856 | |
| Dictionary Addition | +2,179 | Food-related words from dictionary processing |
| Final Food Flavors | 36,035 | |

Food Flavors breakdown by tier:

| Tier | Before | After | Rejected |
| --- | --- | --- | --- |
| Base | 39,114 | 24,275 | 14,839 |
| Variation | 9,803 | 7,449 | 2,354 |
| Dish | 17,828 | 4,311 | 13,517 |
| Total Food Flavors | 66,745 | 36,035 | 30,710 |

Final Total Across All Categories

| Category | Count |
| --- | --- |
| Food Flavors | 36,035 |
| Food Brands | 251,179 |
| Non-Food Brands | 119,504 |
| Total Final Flavors | 406,718 |

The Combination Logic

With 406,718 total flavor components, I needed to set a reasonable limit on combinations.

The problem: you can't just stack flavors endlessly. Mix too many paint colors together and you get brown. Mix too many flavors and you get the same result: an indistinguishable mess.

A Bacon Mango Doritos Soy Sauce Mint Tom Hanks Pickle Oreo isn't innovative, it's just noise.

Your taste buds can't process that many competing signals. So I capped it at 4 flavors per Oreo max.

The calculation counts all possible Oreos with 1, 2, 3, OR 4 flavors (because an Oreo needs at least one flavor):

| k Value | Combinations | Description |
| --- | --- | --- |
| k=1 | 406,718 | Just one flavor |
| k=2 | 82,709,562,403 | Two flavors combined |
| k=3 | 11,213,100,794,099,516 | Three flavors combined |
| k=4 | 1,140,134,072,368,046,162,485 | Four flavors combined |
| Total | 1,140,145,285,551,550,231,122 | |

Total possible Oreo combinations: 1,140,145,285,551,550,231,122
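These figures can be reproduced with a few lines of BigInt arithmetic. This is a minimal sketch: `choose` mirrors the `combination` function from earlier, while `totalCombinations` and `flavorCount` are names made up for this example.

```typescript
// Binomial coefficient C(n, k) using BigInt so no digits are lost.
function choose(n: number, k: number): bigint {
  if (k < 0 || k > n) return 0n;
  let result = 1n;
  for (let i = 1; i <= k; i++) {
    // After step i, result equals C(n, i), so the division is always exact.
    result = (result * BigInt(n - i + 1)) / BigInt(i);
  }
  return result;
}

// Sum C(n, k) for k = 1..maxFlavors.
function totalCombinations(n: number, maxFlavors: number): bigint {
  let total = 0n;
  for (let k = 1; k <= maxFlavors; k++) total += choose(n, k);
  return total;
}

const flavorCount = 406718;
for (let k = 1; k <= 4; k++) {
  console.log(`k=${k}: ${choose(flavorCount, k).toLocaleString("en-US")}`);
}
console.log(`Total: ${totalCombinations(flavorCount, 4).toLocaleString("en-US")}`);
```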

The 4-flavor combinations (k=4) make up 99.9990165% of all possibilities. Moving from k to k+1 flavors multiplies the count by (n - k) / (k + 1), so the step from k=3 to k=4 multiplies it by (406,718 - 3) / 4, or roughly 101,679. That's why the jump is so massive.

Chart showing exponential growth of combinations from k=1 to k=4
Exponential growth: k=4 combinations dwarf everything else


Building the Generator

Now that I've mapped 1.1 sextillion Oreo combinations, I need to build a generator that lets people actually explore them.

The obvious approach doesn't work: you can't pre-generate and store all combinations. My computer has 1 terabyte of storage. Even at just 1 byte per combination, the total storage needed is:

1,140,145 petabytes
Storage needed at 1 byte per combination

So... obviously we can't do that.
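The storage figure is simple division, sketched here with BigInt and decimal units (1 petabyte = 10^15 bytes). The constant names are illustrative.

```typescript
const allCombinations = 1140145285551550231122n; // total from the table above
const bytesPerCombination = 1n; // the (very optimistic) 1-byte assumption

const BYTES_PER_TERABYTE = 10n ** 12n;
const BYTES_PER_PETABYTE = 10n ** 15n;

const totalBytes = allCombinations * bytesPerCombination;

// BigInt division rounds down, which is fine for a rough estimate.
const petabytesNeeded = totalBytes / BYTES_PER_PETABYTE; // 1140145n
const oneTerabyteDrives = totalBytes / BYTES_PER_TERABYTE; // 1140145285n

console.log(`${petabytesNeeded.toLocaleString("en-US")} petabytes`);
console.log(`${oneTerabyteDrives.toLocaleString("en-US")} one-terabyte drives`);
```

In other words, it would take more than a billion 1 TB drives just to hold one byte per combination.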

Instead, the generator picks random flavors from the database on demand, building each combination in real time.
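Picking on demand only requires choosing a few distinct row positions, never the full list of combinations. The sketch below shows one way to do that in plain TypeScript; `pickDistinctIndices` is a hypothetical helper, and the actual generator queries its database instead.

```typescript
// Pick `count` distinct indices in [0, total) without enumerating combinations.
// `random` is injectable so the sketch can be tested; it defaults to Math.random.
function pickDistinctIndices(
  total: number,
  count: number,
  random: () => number = Math.random
): number[] {
  if (count > total) throw new RangeError("count cannot exceed total");
  const picked = new Set<number>();
  while (picked.size < count) {
    // The Set silently ignores repeats, so the loop runs until `count` are unique.
    picked.add(Math.floor(random() * total));
  }
  return Array.from(picked);
}

const FLAVOR_COUNT = 406718;
const indices = pickDistinctIndices(FLAVOR_COUNT, 4);
console.log(indices); // four distinct positions in the flavor table
```

With 406,718 flavors and only four picks, collisions are rare, so the retry loop almost always finishes in four iterations.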

The first version was straightforward: SELECT 4 random flavors, display them. Done.

Except it was boring. Without scores, every combination felt equally meaningless.


The Scoring System

I added a scoring system with three components: edibility, harmony, and novelty. Each combination gets rated 0-100% based on how the flavors work together.

Problem solved, right?

Wrong. Most combinations still clustered around mediocre. Strawberry Vanilla Cookie Cream Oreo scores 52%. Chocolate Banana Caramel Mint Oreo scores 48%. The mathematical average of random ingredients is... average.

Nobody wants to browse through endless 45-55% scores. The generator needed something more aggressive.

Flavor Tiers

The scoring system works by classifying every flavor in the database into one of four tiers:

| Tier | Description | Examples |
| --- | --- | --- |
| God-Tier | Classic dessert ingredients | vanilla, chocolate, cookie, peanut butter, caramel |
| Great | Popular fruits and spices | strawberry, mint, cinnamon, coffee |
| Weird | Savory foods that could work | bacon, pizza, cheese, popcorn |
| Terrible | Things that should never be Oreos | soap, garbage, gasoline |

Each tier affects how the three components get calculated:

Edibility

| Tier | Range | Notes |
| --- | --- | --- |
| God-tier | 85-100 | Classic dessert flavors |
| Great | 75-95 | Popular fruits and spices |
| Weird | 80-100 or 5-35 | Unpredictable (20% chance high, 80% chance low) |
| Terrible | 1-15 | Should never be Oreos |
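As a sketch, the edibility ranges above translate into a small lookup. The function names and the two-number interface (`roll` for the weird-tier coin flip, `position` for where the score lands inside the range) are assumptions made so the example is deterministic; the generator's own code isn't shown in this post.

```typescript
type Tier = "god" | "great" | "weird" | "terrible";

// Map a random number in [0, 1) onto the range [min, max).
function inRange(min: number, max: number, position: number): number {
  return min + position * (max - min);
}

// `roll` decides the weird-tier branch; `position` places the score in the range.
function edibilityFor(tier: Tier, roll: number, position: number): number {
  switch (tier) {
    case "god":
      return inRange(85, 100, position);
    case "great":
      return inRange(75, 95, position);
    case "weird":
      // 20% chance of a surprisingly good result, 80% chance of a bad one.
      return roll < 0.2 ? inRange(80, 100, position) : inRange(5, 35, position);
    case "terrible":
      return inRange(1, 15, position);
  }
}

console.log(edibilityFor("weird", 0.1, 0.5)); // 90 (lucky branch)
console.log(edibilityFor("weird", 0.9, 0.5)); // 20 (unlucky branch)
```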

Harmony

| Pairing Type | Range | Notes |
| --- | --- | --- |
| Food-food | 40-100 | Natural flavor combinations |
| Food-brand | 30-90 | Mixed compatibility |
| Brand-brand | 1-80 | Wildcards, unpredictable |

Novelty

| Category | Range | Notes |
| --- | --- | --- |
| Non-food brands | 50-100 | Highest novelty |
| Food brands | 30-90 | Moderate novelty |
| God-tier foods | 5-70 | Lowest novelty (too common) |

A combination of four god-tier flavors might score high on edibility and harmony but low on novelty. Mix in terrible flavors and you get the opposite: high novelty, terrible edibility.

But most combinations still land around 50%. The breakthrough was an anti-boring algorithm that pushes boring middle scores toward the extremes.

Example of a god-tier high score combination

Anti-Boring Algorithm

When combinations score 45-55%, there's a 40% chance the algorithm pushes them to extremes:

private avoidBoringScores(
  edibility: number,
  harmony: number,
  novelty: number,
  finalScore: number,
  randomSeed: number
) {
  if (finalScore >= 45 && finalScore <= 55) {
    // 60% chance: keep boring score
    // 40% chance: make it exciting
    if (this.seededRandom(randomSeed + 2000) < 0.6) {
      return { edibility, harmony, novelty, finalScore };
    }

    // Push to extreme: 60-90% if above 50, or 10-40% otherwise
    const targetScore = finalScore > 50
      ? 60 + this.seededRandom(randomSeed + 2001) * 30
      : 10 + this.seededRandom(randomSeed + 2002) * 30;

    // Shift every component by the full difference so their average lands on
    // the target (assumes finalScore is an average of the three components),
    // then clamp each one to the 0-100 range
    const adjustment = targetScore - finalScore;
    const clamp = (value: number) => Math.min(100, Math.max(0, value));

    return {
      edibility: clamp(edibility + adjustment),
      harmony: clamp(harmony + adjustment),
      novelty: clamp(novelty + adjustment),
      // Report the new target rather than re-averaging the old components
      finalScore: targetScore
    };
  }

  // Scores outside the 45-55 band pass through unchanged
  return { edibility, harmony, novelty, finalScore };
}

This creates more polarizing results: god-tier (80%+) or terrible (20%-), with fewer "meh" scores.
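The snippet calls `this.seededRandom`, which isn't shown in this post. One way such a helper could look, purely as an assumption, is a stateless variant of the well-known mulberry32 mixing function: the same seed always yields the same float in [0, 1), so a given combination always gets the same score.

```typescript
// Hypothetical stand-in for the generator's seededRandom helper.
// Mulberry32-style mixing: maps an integer seed to a float in [0, 1).
function seededRandom(seed: number): number {
  let t = (seed + 0x6d2b79f5) | 0; // offset the seed and force it to 32 bits
  t = Math.imul(t ^ (t >>> 15), t | 1);
  t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
  return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // unsigned 32-bit value / 2^32
}

const seed = 12345;
const keepBoringScore = seededRandom(seed + 2000) < 0.6;
console.log(keepBoringScore, seededRandom(seed + 2000) === seededRandom(seed + 2000));
```

Using different offsets (`+ 2000`, `+ 2001`, `+ 2002`) from a single seed gives independent-looking draws without storing any generator state.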

Example of a terrible low score combination


Using the Generator

The Interface

The UI gives you four empty slots. You drag flavor types (Food, Brand Food, or Non-Food Brand) into each slot. You can fill all four slots, or just one. Click "Generate" and the backend randomly picks flavors matching your constraints from the database, scores them using the tier-based system and anti-boring algorithm, then returns a combination with scores.

Each generated combination shows:

  • Flavor Names: The selected flavors and their types
  • Component Scores: Edibility, Harmony, and Novelty (0-100%)
  • Overall Score: Final rating from 0-100%
  • Analysis Label: "GOD-TIER", "CHAOTIC MESS", etc.

You can copy the result as an image or share it to Twitter/Bluesky.

Screenshot showing the drag-and-drop interface and generated results
Drag flavor types into slots, generate, and see the scored results

Achievements

The generator tracks your session and unlocks achievements:

Milestone:

  • First Bite: Your first generation
  • Flavor Explorer: Generate 10 combinations
  • Combination Master: Generate 100 combinations

Discovery:

  • Reality Check: Match an actual Oreo product
  • Unique Explorer: Discover 100 unique combinations

Special:

  • Perfect Harmony: Score 90% or higher
  • Brave Soul: Score less than 10%
  • Speed Demon: Generate 10+ per minute
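The unlock conditions listed above map naturally onto a table of rules. The sketch below assumes a `SessionStats` shape and field names that are invented for illustration; only the achievement names and thresholds come from the list.

```typescript
// Hypothetical session shape; field names are assumptions for this example.
interface SessionStats {
  generated: number; // combinations generated this session
  uniqueCombinations: number; // distinct combinations seen
  bestScore: number; // highest overall score, 0-100
  worstScore: number; // lowest overall score, 0-100
  perMinute: number; // current generation rate
  matchedRealProduct: boolean; // hit an actual Oreo product
}

function unlockedAchievements(s: SessionStats): string[] {
  const rules: Array<[string, boolean]> = [
    ["First Bite", s.generated >= 1],
    ["Flavor Explorer", s.generated >= 10],
    ["Combination Master", s.generated >= 100],
    ["Reality Check", s.matchedRealProduct],
    ["Unique Explorer", s.uniqueCombinations >= 100],
    ["Perfect Harmony", s.bestScore >= 90],
    ["Brave Soul", s.worstScore < 10],
    ["Speed Demon", s.perMinute >= 10],
  ];
  // Keep only the names whose condition is met.
  return rules.filter(([, unlocked]) => unlocked).map(([name]) => name);
}

const unlocked = unlockedAchievements({
  generated: 12,
  uniqueCombinations: 12,
  bestScore: 93,
  worstScore: 40,
  perMinute: 3,
  matchedRealProduct: false,
});
console.log(unlocked); // First Bite, Flavor Explorer, Perfect Harmony
```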

Session stats show how many you've generated, your best score, generation rate per minute, and how long you've been generating.

Tech Stack

Backend:

  • NestJS
  • TypeScript
  • SQLite
  • TypeORM for queries

Frontend:

  • Vite
  • React
  • TypeScript
  • Lucide React for icons
  • Canvas API for image generation

The Numbers

Final stats from the actual implementation:

  • 406,718 total flavors in database
  • 1,140,145,285,551,550,231,122 total possible combinations (all 1-4 flavor Oreos)

From 31 million raw records to 1.1 sextillion combinations, the Oreo Generator maps the entire flavor possibility space. Somewhere out there is the Platonic Ideal of an Oreo Flavor, and now you can find it.


Try It Yourself

Try the Oreo Generator yourself or watch the video walkthrough to see the complete story.
