I Built a Cosmetic Ingredients Dataset — Here’s What I Found

#api #discuss #architecture #showdev

I’ve been working on structuring cosmetic ingredient data into a JSON dataset.

What started as simple organization quickly turned into something more interesting: the data doesn’t just describe ingredients, it starts to describe systems.

Instead of isolated entries, each ingredient carries multiple layers:

chemical class
functional role
behavior inside a formulation
interactions with other ingredients

For example:

{
  "name": "Coconut Oil",
  "chemical_class": "Triglyceride",
  "functional_class": "Soap precursor",
  "system_behavior": {
    "lather": "Rapid formation",
    "cleansing": "Strong oil interaction"
  },
  "interactions": {
    "reacts_with": ["sodium-hydroxide"],
    "balanced_by": ["olive-oil"],
    "affected_by": ["hard-water"]
  }
}
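One way to read the "interactions" object is as a small edge list: each key is a relation, and each value names the things on the other end. Here is a minimal Python sketch of that reading. It works only on the sample record above; the helper names interaction_edges and reverse_index are made up for illustration, and the published dataset file may be laid out differently.

```python
import json
from collections import defaultdict

# Sample record copied from the example above.
# Assumption: the full dataset may wrap or structure records differently.
RECORD = json.loads("""
{
  "name": "Coconut Oil",
  "chemical_class": "Triglyceride",
  "functional_class": "Soap precursor",
  "system_behavior": {
    "lather": "Rapid formation",
    "cleansing": "Strong oil interaction"
  },
  "interactions": {
    "reacts_with": ["sodium-hydroxide"],
    "balanced_by": ["olive-oil"],
    "affected_by": ["hard-water"]
  }
}
""")


def interaction_edges(record):
    """Flatten one record's interactions into (source, relation, target) triples."""
    return [
        (record["name"], relation, target)
        for relation, targets in record.get("interactions", {}).items()
        for target in targets
    ]


def reverse_index(records):
    """Map each referenced slug to the (relation, ingredient name) pairs pointing at it."""
    index = defaultdict(list)
    for record in records:
        for source, relation, target in interaction_edges(record):
            index[target].append((relation, source))
    return dict(index)


edges = interaction_edges(RECORD)
index = reverse_index([RECORD])
print(edges)
print(index)
```

With many records loaded, the reverse index answers questions like "which ingredients are affected by hard water?" without scanning every entry.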

At some point, it stops looking like individual ingredients and starts looking like a repeatable system design.

When you start looking at data like this, a pattern emerges:

Many formulations are not built from scratch; they follow a repeating structural logic.

The same types of ingredients appear again and again, not because of branding, but because they fulfill specific roles within a system.

It starts to look less like a list of ingredients… and more like a designed architecture.
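To make the "repeating structure" idea concrete, here is a rough Python sketch that reduces each formulation to the set of roles it contains and counts how often the same role signature shows up. Everything in it is illustrative: the formulation names and ingredient lists are invented, and only the coconut oil role comes from the example record. The other roles are placeholders, not values from the dataset.

```python
from collections import Counter

# Hypothetical slug -> functional_class lookup.
# Only "coconut-oil" reflects the example record; the rest are placeholders.
ROLE_BY_SLUG = {
    "coconut-oil": "Soap precursor",
    "olive-oil": "Soap precursor",   # placeholder value
    "sodium-hydroxide": "Alkali",    # placeholder value
}

# Invented formulations, used only to show the counting step.
FORMULATIONS = {
    "bar-a": ["coconut-oil", "sodium-hydroxide"],
    "bar-b": ["olive-oil", "sodium-hydroxide"],
    "bar-c": ["coconut-oil", "olive-oil", "sodium-hydroxide"],
}


def role_signature(slugs):
    """Return the sorted tuple of distinct roles present in one formulation."""
    return tuple(sorted({ROLE_BY_SLUG[slug] for slug in slugs}))


# Count how many formulations share each role signature.
signature_counts = Counter(role_signature(slugs) for slugs in FORMULATIONS.values())

for signature, count in signature_counts.most_common():
    print(count, signature)
```

Three different ingredient lists collapse to one signature here, which is the pattern described above: the ingredients vary, the roles do not.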

To explore this further, I built a structured dataset that maps these relationships and behaviors:

https://cleanformulation.com/data/ingredients-dataset.json

Full repository:
https://github.com/surayapa/cleanformulation

What I’m trying to understand now is this:

If we already know the chemical class, functional role, and interaction behavior of ingredients, can we model or predict formulation performance before anything is physically created?

Or is real-world behavior still too complex for structured data to capture?
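As a very modest starting point, here is a Python sketch of a purely structural check. It predicts nothing about real performance; it only flags what the "interactions" fields already say about a proposed ingredient list. The function name structural_flags, the "coconut-oil" slug, and the idea of passing conditions such as "hard-water" as a separate argument are all assumptions made for illustration.

```python
# Records keyed by slug. Assumption: "coconut-oil" is a guessed slug for the
# example record; the empty records stand in for entries not shown in this post.
RECORDS = {
    "coconut-oil": {
        "name": "Coconut Oil",
        "interactions": {
            "reacts_with": ["sodium-hydroxide"],
            "balanced_by": ["olive-oil"],
            "affected_by": ["hard-water"],
        },
    },
    "sodium-hydroxide": {},
    "olive-oil": {},
}


def structural_flags(formulation, records_by_slug, conditions=()):
    """Return flags derived only from each ingredient's interactions fields."""
    present = set(formulation)
    flags = []
    for slug in formulation:
        interactions = records_by_slug.get(slug, {}).get("interactions", {})
        # Flag ingredients whose listed balancers are all missing.
        balancers = interactions.get("balanced_by", [])
        if balancers and not present.intersection(balancers):
            flags.append(
                f"{slug}: no balancing ingredient present ({', '.join(balancers)})"
            )
        # Flag ingredients sensitive to a condition that applies.
        for condition in interactions.get("affected_by", []):
            if condition in conditions:
                flags.append(f"{slug}: affected by {condition}")
    return flags


flags = structural_flags(
    ["coconut-oil", "sodium-hydroxide"], RECORDS, conditions={"hard-water"}
)
for flag in flags:
    print(flag)
```

A check like this can only surface relationships someone has already encoded. Whether quantities, process conditions, and everything else that shapes real-world behavior can be captured the same way is exactly the open question.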
