The Backstory
A few weeks ago, I launched a REST API for Korean cosmetic ingredients — 21,000+ ingredients from official Korean government sources, searchable by INCI name, CAS number, and Korean name.
The API worked. But I kept getting the same question from people in the cosmetics industry:
"Cool, but what about EU regulations? What about China?"
Fair point. If you're formulating cosmetics for global markets, knowing that an ingredient is restricted in Korea is only part of the puzzle. You need to know if it's also banned in the EU, restricted in China, or flagged in ASEAN.
So I went down the rabbit hole.
What I Found: A Hidden Goldmine
Korea's Ministry of Food and Drug Safety (MFDS) doesn't just track Korean regulations. Their database at nedrug.mfds.go.kr contains regulation data for 10 countries, all in one place:
| Country | Records |
|---|---|
| EU | 5,301 |
| ASEAN | 4,843 |
| China | 4,145 |
| South Korea | 4,046 |
| Brazil | 4,022 |
| Argentina | 4,022 |
| Taiwan | 2,137 |
| Canada | 1,947 |
| Japan | 386 |
| USA | 111 |
30,960 regulation records total. Each one tells you whether an ingredient is prohibited or restricted in that country, with detailed conditions — concentration limits, product type restrictions, and usage warnings.
The catch? The data is:
- Embedded in HTML pages as JavaScript JSON arrays
- Behind a Korean-language interface
- Spread across 7,257 individual pages
- No API, no download button
Sound familiar?
The Scraping Challenge
7,257 pages. One request at a time.
The MFDS detail pages have an interesting structure. Each page contains a JavaScript variable called arCountry — a JSON array with all the regulation data for that ingredient, across all countries. No AJAX calls needed. One page request = all countries.
But there's a catch within the catch: some ingredients have both restricted and prohibited data, stored in an if/else branch in the JavaScript. A naive regex extraction misses half the data. I had to write a bracket-depth counter to properly extract both arrays.
```python
def extract_json_array(text, start_pos):
    """Bracket counting instead of regex --
    correctly handles nested arrays inside the JSON."""
    bracket_start = text.index('[', start_pos)
    depth = 0
    for i in range(bracket_start, len(text)):
        if text[i] == '[':
            depth += 1
        elif text[i] == ']':
            depth -= 1
            if depth == 0:
                return text[bracket_start:i + 1]
    return None
```
Small bug, big difference: one ingredient went from 0 regulation records to 5 after fixing this.
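To show why the naive regex approach loses data, here's a sketch of how the extractor can be called at every occurrence of `arCountry`, so both the if-branch and else-branch arrays are captured. The page fragment and the `extract_all_arrays` helper are hypothetical illustrations (the bracket-depth function is repeated so the snippet runs standalone):

```python
import json

def extract_json_array(text, start_pos):
    """Same bracket-depth counter as above, repeated so this runs standalone."""
    bracket_start = text.index('[', start_pos)
    depth = 0
    for i in range(bracket_start, len(text)):
        if text[i] == '[':
            depth += 1
        elif text[i] == ']':
            depth -= 1
            if depth == 0:
                return text[bracket_start:i + 1]
    return None

def extract_all_arrays(text, marker="arCountry"):
    """Collect the JSON array following every occurrence of `marker`."""
    arrays, pos = [], 0
    while (hit := text.find(marker, pos)) != -1:
        raw = extract_json_array(text, hit)
        if raw:
            arrays.append(json.loads(raw))
        pos = hit + len(marker)
    return arrays

# Hypothetical fragment mimicking the if/else structure on MFDS detail pages:
page = """
if (hasLimit) {
    var arCountry = [{"country": "EU", "type": "restricted"}];
} else {
    var arCountry = [{"country": "EU", "type": "prohibited"}];
}
"""
print(len(extract_all_arrays(page)))  # 2 -- both branches captured
```

A single regex match against this fragment would return only the first array; walking every occurrence of the variable name is what recovers the second branch.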
Verifying the Data
Here's the thing about scraping government data from one country about other countries: how do you know it's accurate?
I cross-checked our EU data against the CosIng database — the EU's official cosmetic ingredient database. CosIng publishes their Annex II (prohibited) and Annex III (restricted) lists as downloadable CSVs.
Verification results:
| Metric | Result |
|---|---|
| Total EU records from MFDS | 5,248 |
| Matched against CosIng (by CAS number + name) | 4,693 (89.4%) |
| Regulation type accuracy | 99.2% |
| Type mismatches | 38 |
The 38 mismatches weren't errors — they were edge cases where an ingredient is prohibited when used as hair dye but restricted for other uses. Different classification logic, same underlying data.
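The cross-check itself can be sketched as a simple join on normalized CAS numbers. This is a simplified illustration: the field names (`cas_no`, `annex`, `regulate_type`) are assumptions, and CosIng actually publishes Annex II and Annex III as separate files rather than one table with an `annex` column.

```python
def normalize_cas(cas):
    """Strip whitespace so CAS numbers compare cleanly across sources."""
    return (cas or "").strip().replace(" ", "")

def cross_check(mfds_records, cosing_rows):
    """Match MFDS EU records against CosIng rows by CAS number.

    Returns (matched, type_mismatches). Field names are hypothetical.
    """
    cosing_by_cas = {normalize_cas(r["cas_no"]): r["annex"] for r in cosing_rows}
    matched, mismatches = [], []
    for rec in mfds_records:
        annex = cosing_by_cas.get(normalize_cas(rec["cas_no"]))
        if annex is None:
            continue  # no CosIng entry for this CAS number
        matched.append(rec)
        # Annex II = prohibited, Annex III = restricted
        expected = "prohibited" if annex == "II" else "restricted"
        if rec["regulate_type"] != expected:
            mismatches.append(rec)
    return matched, mismatches
```

Running this over the two datasets is what produced the match rate and mismatch counts in the table above; in practice the real script also matched on ingredient name where CAS numbers were missing.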
Good enough to ship.
The New API (v3.0.0)
New endpoint: /v1/ingredient/{code}/regulations
```python
import requests

response = requests.get(
    "https://k-beauty-cosmetic-ingredients.p.rapidapi.com/v1/ingredient/9/regulations",
    params={"country": "EU"},
    headers={
        "X-RapidAPI-Key": "YOUR_API_KEY",
        "X-RapidAPI-Host": "k-beauty-cosmetic-ingredients.p.rapidapi.com"
    }
)
print(response.json())
```
Response:
```json
{
  "success": true,
  "ingredient": {
    "code": 9,
    "kr_name": "리날룰",
    "inci_name": "Linalool"
  },
  "count": 1,
  "available_countries": ["한국", "EU"],
  "data": [
    {
      "country": "EU",
      "regulate_type": "제한",
      "notice_ingr_name": "1,6-Octadien-3-ol, 3,7-dimethyl-",
      "limit_condition": null,
      "source_type": "limit"
    }
  ]
}
```
One API call. One ingredient. Regulations across multiple countries. No PDF digging, no Google Translate, no guesswork. (In the response above, 제한 is Korean for "restricted" and 한국 for "Korea".)
Country Access by Tier
Not everyone needs all 10 countries. So I tiered it:
| Tier | Price | Countries | Monthly Requests |
|---|---|---|---|
| BASIC | Free | Ingredients only (no regulations) | 100 |
| PRO | $29 | South Korea, EU | 2,000 |
| ULTRA | $79 | + China, USA, Japan, ASEAN | 5,000 |
| MEGA | $199 | All 10 countries | 15,000 |
Interesting Findings From the Data
After collecting all 30,960 regulation records, some patterns jumped out:
1. The EU bans the most ingredients
EU leads with 5,301 regulation records. They're the strictest regulatory body for cosmetics — many other countries reference EU decisions when updating their own lists.
2. "Prohibited" doesn't always mean "dangerous"
Some ingredients are prohibited in cosmetics simply because they're classified as pharmaceuticals. Not because they're toxic — because they're too effective and fall under drug regulation instead.
3. The same ingredient, different rules everywhere
Take silver compounds: restricted in Canada (allowed in mouthwash up to 0.04%), but prohibited in the EU when in nano form. A global cosmetic brand needs to track these differences per-market.
4. Most MFDS-regulated substances aren't in the KCIA ingredient dictionary
Only 1,269 of the 7,257 MFDS-regulated substances matched KCIA ingredients by CAS number. The rest are chemicals banned from cosmetics outright; they were never cosmetic ingredients to begin with.
Technical Decisions
Why not add all MFDS substances to the main ingredients table?
KCIA tracks what can be used in cosmetics. MFDS tracks what can't be (or has conditions). Mixing them would pollute the ingredient search results with non-cosmetic chemicals. Instead, I kept them in a separate regulations table, linked by CAS number where possible.
Why SQLite, still?
I added 30,960 rows to the existing 21,796-ingredient database. SQLite handles it fine: the regulations table has indexes on ingredient_code, country, and regulate_type, and query time is still under 50ms.
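A minimal sketch of that schema, using Python's built-in sqlite3 module. The column set beyond `ingredient_code`, `country`, and `regulate_type` is an assumption based on the response fields shown earlier, and the real API uses a file-backed database rather than `:memory:`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real API uses a file-backed DB
conn.executescript("""
CREATE TABLE regulations (
    id              INTEGER PRIMARY KEY,
    ingredient_code INTEGER,           -- links to the KCIA ingredients table
    country         TEXT NOT NULL,
    regulate_type   TEXT NOT NULL,     -- e.g. 'prohibited' / 'restricted'
    cas_no          TEXT,
    limit_condition TEXT
);
CREATE INDEX idx_reg_code    ON regulations (ingredient_code);
CREATE INDEX idx_reg_country ON regulations (country);
CREATE INDEX idx_reg_type    ON regulations (regulate_type);
""")

conn.execute(
    "INSERT INTO regulations (ingredient_code, country, regulate_type) "
    "VALUES (?, ?, ?)",
    (9, "EU", "restricted"),
)
rows = conn.execute(
    "SELECT country, regulate_type FROM regulations WHERE ingredient_code = ?",
    (9,),
).fetchall()
print(rows)  # [('EU', 'restricted')]
```

Keeping regulations in their own table means the lookup by `ingredient_code` hits an index directly, and non-cosmetic chemicals never appear in ingredient search results.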
The rate limiting rabbit hole
I wanted per-tier rate limits (BASIC: 10/min, MEGA: 40/min). Turns out the Python slowapi library doesn't support dynamic rate limits based on request context. The decorator function gets called without access to the request object.
Solution: two-layer approach. slowapi handles the global ceiling (40/min), and a custom in-memory counter in the middleware enforces per-tier limits after the tier is detected from the subscription header.
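The second layer can be sketched as a fixed 60-second-window counter keyed by API key. This is a simplified stand-in for the actual middleware, not slowapi code; the PRO and ULTRA per-minute values are assumptions (only BASIC 10/min and MEGA 40/min are stated above), and a production version would also purge expired windows:

```python
import time
from collections import defaultdict

# Per-minute limits. BASIC and MEGA come from the text; PRO/ULTRA are assumed.
TIER_LIMITS = {"BASIC": 10, "PRO": 20, "ULTRA": 30, "MEGA": 40}

class TierRateLimiter:
    """Fixed 60-second-window counter, keyed by (api_key, window index)."""

    def __init__(self, limits=TIER_LIMITS):
        self.limits = limits
        self.counts = defaultdict(int)  # note: old windows are never purged here

    def allow(self, api_key, tier, now=None):
        now = time.time() if now is None else now
        key = (api_key, int(now // 60))  # same key for all calls in one minute
        if self.counts[key] >= self.limits.get(tier, 10):
            return False  # over this tier's per-minute budget
        self.counts[key] += 1
        return True
```

In the middleware, `allow()` runs after the tier is read from the subscription header; slowapi's global 40/min ceiling catches anything before tier detection.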
What's Next
- Automated weekly updates — KCIA change detection is already built, MFDS full re-scrape takes ~10 hours
- API name/description SEO optimization on RapidAPI
- More search filters for the regulations endpoint
Try It
The API is live on RapidAPI with a free tier.
🔗 K-Beauty Cosmetic Ingredients API
If you're building cosmetic tech, regulatory tools, or just curious about what's actually in your skincare products across different countries — give it a shot.
Questions? Drop a comment below.