Korean Fashion Data at Scale: Scraping Musinsa Rankings and Product Details with Python

1. Why Musinsa? The K-Fashion Intelligence Hub

The Platform in Numbers

Musinsa (무신사) isn't just Korea's largest fashion platform — it's the authority for Korean streetwear and contemporary fashion:

  • 22M+ MAU (monthly active users) as of 2025
  • 7,000+ brands listed, from global giants to independent Korean designers
  • Real-time ranking updated hourly across 100+ categories
  • #1 fashion destination for Korean consumers aged 15–35
  • $3B+ GMV annually, growing 30%+ year-over-year

Unlike luxury platforms, Musinsa's sweet spot is accessible fashion — the streetwear, sneakers, and Korean contemporary styles that define what young Koreans are actually wearing. This makes its data uniquely valuable for trend analysis.

Why Global Brands Are Watching

K-fashion has gone global. International buyers at trade shows in Paris and New York now track Musinsa rankings to identify emerging Korean brands before they go mainstream. Brands like Ader Error, Wooyoungmi, and Juun.J — now stocked in Dover Street Market and Selfridges — built their initial following on Musinsa.

For researchers, the platform represents:

  • A leading indicator of global streetwear trends (what's hot in Seoul today reaches Tokyo in 3 months, Los Angeles in 6)
  • Price benchmarking data across categories with consistent structure
  • Brand performance tracking — which Korean labels are gaining traction vs. declining
  • Demand signals for wholesale and investment decisions

2. What Data is Available on Musinsa

Musinsa exposes four main types of data, each accessible through its public web interface:

2.1 Ranking Data

The crown jewel of Musinsa for trend researchers. Available at:
https://www.musinsa.com/main/musinsa/ranking

Each ranking entry contains:

- Rank position (current + previous)
- Brand name (Korean + romanized)
- Product name
- Current price (KRW)
- Original price (before discount)
- Discount percentage
- Thumbnail image URL
- Product page URL
- Category tag
- Gender target (men/women/unisex)
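For downstream pipelines it helps to pin these fields down as a schema. A minimal sketch using a `TypedDict` (the field names are my own, mirroring the list above, not an official Musinsa schema):

```python
from typing import Optional, TypedDict

class RankingEntry(TypedDict):
    """One row of the Musinsa ranking list (illustrative field names)."""
    rank: int
    previousRank: Optional[int]
    brandName: str
    productName: str
    currentPrice: int      # KRW
    originalPrice: int     # KRW, before discount
    discountRate: int      # percent
    imageUrl: str
    productUrl: str
    category: str
    gender: str            # "men" | "women" | "unisex"

entry: RankingEntry = {
    "rank": 1, "previousRank": 3,
    "brandName": "Covernat", "productName": "Oversized Washed Denim Jacket",
    "currentPrice": 129000, "originalPrice": 179000, "discountRate": 27,
    "imageUrl": "", "productUrl": "", "category": "Outerwear", "gender": "unisex",
}
```

Typed rows like this make it much easier to validate scraped output before it reaches analysis code.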

Category depth: Rankings are available for 100+ sub-categories including:

  • Top, Pants, Outerwear, Shoes, Accessories
  • Brand-specific rankings (Top 100 by brand)
  • Age/gender filtered rankings

2.2 Product Detail Pages

Individual product pages at https://www.musinsa.com/app/goods/{product_id} contain:

- Full product description
- All price variants (size/color options)
- Stock availability per variant
- Accumulated sales count ("판매량")
- Review count + average rating
- Brand information + brand store link
- Related products
- Size guide
- Material composition
- Origin (Korean domestic vs. import)

2.3 Brand Information

Each brand has a dedicated store page at https://www.musinsa.com/store/{brand_id}:

- Brand name + logo
- Brand description + founding year
- Total product count
- Follower count (brand popularity signal)
- Brand ranking position
- Representative product images
- Price range (min-max)

2.4 Search Results

Musinsa's search endpoint supports rich filtering:

- Keyword search
- Category filter
- Price range filter
- Brand filter  
- Gender filter
- Sort by: popularity, newest, price (asc/desc), review count
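Those filters map naturally onto URL query parameters. A sketch of composing a search URL with `urllib.parse.urlencode` — note that both the path and the parameter names here are assumptions for illustration; inspect the network tab during a real search to confirm the actual ones:

```python
from urllib.parse import urlencode

def build_search_url(keyword, category=None, min_price=None, max_price=None,
                     sort="POPULAR"):
    """Compose a Musinsa search URL (path and param names are illustrative)."""
    params = {"q": keyword, "sort": sort}
    if category:
        params["category"] = category
    if min_price is not None:
        params["minPrice"] = min_price
    if max_price is not None:
        params["maxPrice"] = max_price
    # urlencode percent-encodes Korean keywords automatically
    return "https://www.musinsa.com/search/goods?" + urlencode(params)

url = build_search_url("데님 재킷", min_price=50000, max_price=200000)
```

Keeping URL construction in one helper makes it trivial to adjust when the site's real parameter names differ.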

3. Technical Approach: Why Musinsa is Scraper-Friendly

Unlike many e-commerce platforms that use heavy client-side JavaScript rendering and aggressive bot detection, Musinsa takes a different approach.

Server-Side Rendering (SSR)

Musinsa's product and ranking pages are primarily server-side rendered (SSR). This means:

  1. Product data is embedded in the initial HTML response — no need to execute JavaScript
  2. Standard HTTP requests with appropriate headers return complete page content
  3. Browser automation (Playwright or Selenium) is optional, not required, for most use cases
  4. Response times are fast (< 1 second per page)

Verification (tested 2026-03-04):

GET https://www.musinsa.com/main/musinsa/ranking
→ HTTP 200 OK (SSR HTML, ranking data in page source)
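You can reproduce this check yourself: fetch the page without a browser and look for product markers in the raw HTML, since no JavaScript has run at that point. A minimal sketch (the `/app/goods/` marker is based on the product URL pattern described earlier):

```python
import requests

def looks_server_rendered(html: str) -> bool:
    """Heuristic: SSR pages embed product links directly in the initial HTML."""
    return "/app/goods/" in html

def fetch_and_check(url="https://www.musinsa.com/main/musinsa/ranking"):
    """Fetch the ranking page with plain HTTP and test for embedded product data."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    return looks_server_rendered(resp.text)

# Usage (performs a live request):
# print(fetch_and_check())  # True on an SSR page
```

If this returns False, the page likely hydrates client-side and you would need Playwright after all.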

Low Bot Detection

Compared to platforms like Coupang (which uses Akamai Bot Manager and returns 403 for basic requests), Musinsa's bot detection is relatively permissive:

  • No CAPTCHA challenges for standard browsing patterns
  • Rate limiting is loose (reasonable request intervals are sufficient)
  • Standard requests library with realistic headers works for most pages
  • Proxy rotation recommended for large-scale extraction but not required for testing

Recommended Stack

# For ranking + product pages (SSR)
requests + BeautifulSoup (the Python analogue of Node's Cheerio)

# For search with AJAX pagination (if needed)
Playwright (optional fallback)

# For large scale
Apify SDK + Apify Proxy (automatic IP rotation)

4. Quick Start with the Apify Actor

The fastest way to get Musinsa data is the Musinsa Scraper Apify actor — no code required.

→ Try it now: https://apify.com/oxygenated_quagmire/musinsa-scraper

(Actor #13 — link will be active once development is complete)

What the Actor Does

The actor handles all the complexity for you:

  • Pagination through ranking pages (1,000+ items across all categories)
  • Product detail enrichment (click-through to individual pages)
  • Consistent JSON output structure
  • Automatic retry on transient failures
  • IP rotation via Apify Proxy

Input Schema

{
  "mode": "RANKING",
  "category": "all",
  "gender": "all",
  "maxItems": 100
}
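From Python, that input can be validated locally before handing it to the actor. A sketch — the `build_run_input` helper and its allowed modes mirror the schema above, while the commented run itself uses the official `apify-client` package:

```python
def build_run_input(mode="RANKING", category="all", gender="all", max_items=100):
    """Build and validate the actor input shown above."""
    allowed = {"RANKING", "PRODUCT", "BRAND", "SEARCH"}
    if mode not in allowed:
        raise ValueError(f"mode must be one of {sorted(allowed)}")
    return {"mode": mode, "category": category,
            "gender": gender, "maxItems": max_items}

run_input = build_run_input(max_items=250)

# Running the actor then looks roughly like this (requires apify-client):
#   from apify_client import ApifyClient
#   client = ApifyClient("<APIFY_TOKEN>")
#   run = client.actor("oxygenated_quagmire/musinsa-scraper").call(run_input=run_input)
#   items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Failing fast on a bad `mode` locally is cheaper than waiting for an actor run to error out.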

Mode options:

  • RANKING — Extract current ranking by category
  • PRODUCT — Scrape specific product pages
  • BRAND — Extract brand information + product list
  • SEARCH — Keyword search with filters

Example Output (Ranking Mode)

[
  {
    "rank": 1,
    "previousRank": 3,
    "rankChange": 2,
    "productId": "1234567",
    "productName": "오버핏 워싱 데님 재킷",
    "productNameEn": "Oversized Washed Denim Jacket",
    "brandName": "커버낫",
    "brandNameEn": "Covernat",
    "currentPrice": 129000,
    "originalPrice": 179000,
    "discountRate": 27,
    "currency": "KRW",
    "category": "아우터",
    "gender": "unisex",
    "imageUrl": "https://image.musinsa.com/...",
    "productUrl": "https://www.musinsa.com/app/goods/1234567",
    "scrapedAt": "2026-03-04T15:00:00+09:00"
  }
]

Pricing

  • $0.50 per 1,000 items (Pay-per-result)
  • Free tier: $5/month credit = 10,000 items free
  • Typical ranking run (TOP 100 all categories): ~$0.05

5. Full Python Pipeline

For developers who want full control, here's a complete pipeline that extracts Musinsa rankings and enriches them with product details.

Part 1: Ranking Extractor

import requests
from bs4 import BeautifulSoup
import json
import time
from datetime import datetime

class MusinsaRankingScraper:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept-Language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
            'Referer': 'https://www.musinsa.com/',
        })
        self.base_url = "https://www.musinsa.com/main/musinsa/ranking"

    def get_ranking(self, category="all", gender="all", page=1):
        """Extract ranking page for given category and gender."""
        params = {
            'storeCode': 'musinsa',
            'categoryCode': category,
            'gf': gender,
            'page': page
        }

        try:
            response = self.session.get(self.base_url, params=params, timeout=10)
            response.raise_for_status()
            return self._parse_ranking_page(response.text)
        except requests.RequestException as e:
            print(f"Request failed: {e}")
            return []

    def _parse_ranking_page(self, html):
        """Parse ranking HTML and extract product data."""
        soup = BeautifulSoup(html, 'html.parser')
        products = []

        # Find all ranking list items
        items = soup.select('li.list-item')

        for item in items:
            try:
                rank_elem = item.select_one('.rank-num')
                product_elem = item.select_one('.list-item__name')
                brand_elem = item.select_one('.list-item__brand')
                price_elem = item.select_one('.price')
                original_price_elem = item.select_one('.price-original')
                discount_elem = item.select_one('.discount-rate')
                link_elem = item.select_one('a[href*="/app/goods/"]')
                img_elem = item.select_one('img.lazy')

                if not all([rank_elem, product_elem, brand_elem]):
                    continue

                # Extract product ID from URL
                product_url = link_elem['href'] if link_elem else ''
                product_id = product_url.split('/')[-1].split('?')[0] if product_url else ''

                # Parse prices
                current_price = self._parse_price(price_elem.text if price_elem else '0')
                original_price = self._parse_price(original_price_elem.text if original_price_elem else '0')
                discount_rate = int(discount_elem.text.replace('%', '').strip()) if discount_elem else 0

                product = {
                    'rank': int(rank_elem.text.strip()),
                    'productId': product_id,
                    'productName': product_elem.text.strip(),
                    'brandName': brand_elem.text.strip(),
                    'currentPrice': current_price,
                    'originalPrice': original_price or current_price,
                    'discountRate': discount_rate,
                    'currency': 'KRW',
                    'productUrl': f"https://www.musinsa.com{product_url}" if product_url.startswith('/') else product_url,
                    'imageUrl': img_elem.get('data-original', '') if img_elem else '',
                    'scrapedAt': datetime.now().isoformat()
                }
                products.append(product)

            except (AttributeError, ValueError) as e:
                print(f"Parse error for item: {e}")
                continue

        return products

    def _parse_price(self, price_str):
        """Convert price string like '129,000원' to integer."""
        return int(''.join(filter(str.isdigit, price_str))) if price_str else 0

    def get_full_ranking(self, category="all", gender="all", max_pages=5):
        """Extract multiple pages of rankings."""
        all_products = []

        for page in range(1, max_pages + 1):
            products = self.get_ranking(category=category, gender=gender, page=page)

            if not products:
                break

            all_products.extend(products)
            print(f"  Page {page}: {len(products)} products extracted")
            time.sleep(0.5)  # Polite delay

        return all_products


# Usage
scraper = MusinsaRankingScraper()
ranking = scraper.get_full_ranking(category="all", gender="all", max_pages=3)
print(f"Total: {len(ranking)} products extracted")
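One detail worth unit-testing in isolation is the price parsing, since Musinsa renders prices as strings like "129,000원". A standalone version of the `_parse_price` helper used above:

```python
def parse_price(price_str):
    """Convert a rendered price such as '129,000원' to an integer KRW value.

    Caveat: this keeps every digit in the string, so it assumes the input
    contains only the price (a string like '10% 179,000원' would be mangled).
    """
    digits = "".join(ch for ch in (price_str or "") if ch.isdigit())
    return int(digits) if digits else 0
```

If discount percentages ever share a node with the price, switch to a regex anchored on the 원 suffix instead.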

Part 2: Product Detail Enricher

class MusinsaProductScraper:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'ko-KR,ko;q=0.9',
        })

    def get_product_details(self, product_id):
        """Extract detailed information for a specific product."""
        url = f"https://www.musinsa.com/app/goods/{product_id}"

        try:
            response = self.session.get(url, timeout=10)
            response.raise_for_status()
            return self._parse_product_page(response.text, product_id)
        except requests.RequestException as e:
            print(f"Failed to fetch product {product_id}: {e}")
            return None

    def _parse_product_page(self, html, product_id):
        """Parse product detail page."""
        soup = BeautifulSoup(html, 'html.parser')

        # Extract JSON-LD if available (structured data)
        json_ld = soup.find('script', type='application/ld+json')
        structured_data = {}
        if json_ld:
            try:
                structured_data = json.loads(json_ld.string)
            except json.JSONDecodeError:
                pass

        # Extract key product info
        title_elem = soup.select_one('.product_title')
        brand_elem = soup.select_one('.product_title .brand')
        review_count_elem = soup.select_one('.review-count')
        rating_elem = soup.select_one('.star-rating-text')
        sales_elem = soup.select_one('.sales-count')
        description_elem = soup.select_one('#product_info')

        return {
            'productId': product_id,
            'fullTitle': title_elem.text.strip() if title_elem else '',
            'brandName': brand_elem.text.strip() if brand_elem else '',
            'reviewCount': self._extract_number(review_count_elem.text if review_count_elem else '0'),
            'averageRating': float(rating_elem.text.strip()) if rating_elem else 0.0,
            'salesCount': self._extract_number(sales_elem.text if sales_elem else '0'),
            'description': description_elem.text.strip()[:500] if description_elem else '',
            'structuredData': structured_data,
            'detailScrapedAt': datetime.now().isoformat()
        }

    def _extract_number(self, text):
        """Extract integer from text like '1,234개'."""
        digits = ''.join(filter(str.isdigit, text))
        return int(digits) if digits else 0

    def enrich_rankings(self, ranking_products, max_products=50, delay=0.3):
        """Enrich ranking data with product details."""
        enriched = []

        for i, product in enumerate(ranking_products[:max_products]):
            product_id = product.get('productId')
            if not product_id:
                enriched.append(product)
                continue

            print(f"  [{i+1}/{min(len(ranking_products), max_products)}] Enriching product {product_id}...")
            details = self.get_product_details(product_id)

            if details:
                merged = {**product, **details}
                enriched.append(merged)
            else:
                enriched.append(product)

            time.sleep(delay)

        return enriched


# Full pipeline
ranking_scraper = MusinsaRankingScraper()
product_scraper = MusinsaProductScraper()

# Step 1: Get rankings
print("Extracting Musinsa TOP 100 rankings...")
ranking = ranking_scraper.get_full_ranking(max_pages=2)

# Step 2: Enrich top 20 with details
print(f"\nEnriching top 20 products with details...")
enriched = product_scraper.enrich_rankings(ranking, max_products=20)

# Step 3: Save to file
with open('musinsa_ranking_enriched.json', 'w', encoding='utf-8') as f:
    json.dump(enriched, f, ensure_ascii=False, indent=2)

print(f"\n✅ Complete! {len(enriched)} products saved to musinsa_ranking_enriched.json")

Part 3: Scheduled Price Monitor

import schedule
import time
import csv
from pathlib import Path

def monitor_price_changes(product_ids, output_file='musinsa_price_history.csv'):
    """Track price changes over time for specific products."""

    scraper = MusinsaProductScraper()
    fieldnames = ['timestamp', 'productId', 'productName', 'brandName', 
                  'currentPrice', 'originalPrice', 'discountRate']

    # Initialize CSV if not exists
    if not Path(output_file).exists():
        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()

    print(f"Checking prices for {len(product_ids)} products...")
    rows = []

    for product_id in product_ids:
        details = scraper.get_product_details(product_id)
        if details:
            # NOTE: the price fields below assume _parse_product_page has been
            # extended to return currentPrice/originalPrice/discountRate;
            # the version shown earlier does not yet parse them
            rows.append({
                'timestamp': datetime.now().isoformat(),
                'productId': product_id,
                'productName': details.get('fullTitle', ''),
                'brandName': details.get('brandName', ''),
                'currentPrice': details.get('currentPrice', 0),
                'originalPrice': details.get('originalPrice', 0),
                'discountRate': details.get('discountRate', 0),
            })
        time.sleep(0.3)

    with open(output_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writerows(rows)

    print(f"{len(rows)} records saved at {datetime.now().strftime('%Y-%m-%d %H:%M')}")


# Watch list — top Korean streetwear items
WATCH_LIST = ['1234567', '2345678', '3456789']  # Replace with actual product IDs

# Run every 6 hours
schedule.every(6).hours.do(monitor_price_changes, product_ids=WATCH_LIST)

# Run once immediately
monitor_price_changes(WATCH_LIST)

# Keep running
while True:
    schedule.run_pending()
    time.sleep(60)
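Once `musinsa_price_history.csv` has accumulated a few runs, spotting price drops is a short pandas exercise. A sketch that compares each product's latest recorded price against its previous one (column names follow the CSV schema above):

```python
import pandas as pd

def find_price_drops(history, min_drop_pct=10):
    """Return products whose latest price is at least min_drop_pct below
    their previously recorded price."""
    history = history.sort_values("timestamp")
    drops = []
    for pid, grp in history.groupby("productId"):
        if len(grp) < 2:
            continue  # need at least two observations to compare
        prev = grp["currentPrice"].iloc[-2]
        last = grp["currentPrice"].iloc[-1]
        if prev > 0 and (prev - last) / prev * 100 >= min_drop_pct:
            drops.append({"productId": pid, "from": prev, "to": last,
                          "dropPct": round((prev - last) / prev * 100, 1)})
    return drops

# Usage:
# df = pd.read_csv("musinsa_price_history.csv")
# print(find_price_drops(df, min_drop_pct=15))
```

This pairs naturally with the 6-hour schedule above: run the detector right after each monitoring pass.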

6. Real Use Cases

Use Case 1: Fashion Trend Intelligence for Brands

The Problem: A European fashion brand wants to understand which silhouettes, colors, and styles are trending in Korea before deciding on their next collection direction.

The Approach:

  1. Extract Musinsa TOP 200 rankings weekly across all categories
  2. Parse product names and descriptions to identify recurring style keywords
  3. Track which brands/products are entering vs. leaving the top 100
  4. Correlate ranking movements with seasonal events and Korean fashion weeks

Sample Analysis:

import pandas as pd
from collections import Counter

# Load weekly ranking snapshots
df = pd.read_json('musinsa_weekly_rankings.json')

# Keyword frequency in product names
all_words = ' '.join(df['productName'].tolist()).split()
word_freq = Counter(all_words)

# Trending brands (appeared in TOP 100 for first time this week)
new_entries = df[df['previousRank'] > 100]
print("New entrants to TOP 100:", new_entries['brandName'].value_counts().head(10))

# Average discount rate by category
print(df.groupby('category')['discountRate'].mean().sort_values(ascending=False))

Business Value: Replaces $15K+ trend research reports with an automated weekly pipeline that costs ~$2/week to run.


Use Case 2: Price Tracking and Discount Alert System

The Problem: A Korean fashion reseller wants to buy Musinsa items during sales and resell them internationally at a profit.

The Approach:

  1. Define a watchlist of 200 products across categories
  2. Monitor prices every 6 hours
  3. Alert via Telegram/Slack when discount rate exceeds threshold (e.g., > 40%)
  4. Track price history to identify seasonal patterns (biggest sales: 11.11, Chuseok, end-of-season)

Discount Alert Bot:

def check_discounts(watchlist, threshold=40):
    """Alert on deep discounts."""
    scraper = MusinsaProductScraper()
    alerts = []

    for product_id in watchlist:
        product = scraper.get_product_details(product_id)
        # NOTE: assumes discountRate and the price fields have been added
        # to _parse_product_page (the earlier version does not parse them)
        if product and product.get('discountRate', 0) >= threshold:
            alerts.append({
                'product': product['fullTitle'],
                'brand': product['brandName'],
                'original': product['originalPrice'],
                'sale': product['currentPrice'],
                'discount': product['discountRate']
            })

    if alerts:
        message = "🔥 Musinsa Deep Discount Alert!\n\n"
        for a in alerts:
            message += f"{a['brand']} {a['product']}\n"
            message += f"₩{a['original']:,} → ₩{a['sale']:,} ({a['discount']}% off)\n\n"
        # Send to Telegram, Slack, etc.
        print(message)

    return alerts


check_discounts(WATCH_LIST, threshold=40)
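To actually deliver the alert, the Telegram Bot API's `sendMessage` endpoint is enough. A sketch — the bot token and chat ID are placeholders you obtain from @BotFather, and `format_alert` assumes the alert dicts built above:

```python
import requests

def format_alert(alerts):
    """Render the alert dicts from check_discounts into one message."""
    lines = ["🔥 Musinsa Deep Discount Alert!", ""]
    for a in alerts:
        lines.append(f"{a['brand']} {a['product']}")
        lines.append(f"₩{a['original']:,} → ₩{a['sale']:,} ({a['discount']}% off)")
        lines.append("")
    return "\n".join(lines)

def send_telegram_alert(message, bot_token, chat_id):
    """Post a message via the Telegram Bot API; returns True on HTTP success."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    resp = requests.post(url, data={"chat_id": chat_id, "text": message},
                         timeout=10)
    return resp.ok

# Usage (performs a live request):
# send_telegram_alert(format_alert(alerts), bot_token="<TOKEN>", chat_id="<CHAT_ID>")
```

Slack works the same way with an incoming-webhook URL in place of the Telegram endpoint.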

Business Value: Automated deal detection replacing 2-3 hours of daily manual monitoring.


Use Case 3: Brand Benchmarking for K-Fashion Labels

The Problem: An emerging Korean fashion brand (50K Instagram followers) wants to understand how they rank against competitors and where their pricing sits in the market.

The Approach:

  1. Extract all products for 10 competitor brands via brand scraper
  2. Compare price distributions, discount frequencies, and product category mix
  3. Track ranking trajectories over 30, 60, 90 days
  4. Identify which product types drive the highest rankings for competitors

Competitive Dashboard Data:

# Brand comparison matrix
brands = ['covernat', 'apair', 'nnine', 'solew']  # Musinsa brand IDs
brand_data = {}

for brand_id in brands:
    # NOTE: get_brand_products is not defined above; it assumes a brand-store
    # scraper built on the same requests + BeautifulSoup pattern
    products = scraper.get_brand_products(brand_id)
    brand_data[brand_id] = {
        'product_count': len(products),
        'avg_price': sum(p['currentPrice'] for p in products) / len(products),
        'top_ranked': min(p['rank'] for p in products if p.get('rank')),
        'categories': Counter(p['category'] for p in products).most_common(3)
    }

df = pd.DataFrame(brand_data).T
print(df.to_markdown())

Business Value: Replaces a $5K/month brand intelligence subscription with an automated scraper.


Use Case 4: Inventory and Restocking Monitor

The Problem: A Musinsa reseller needs to know when out-of-stock items (especially limited edition drops) come back in stock.

The Approach:

  1. Check product pages for stock status every 15 minutes during anticipated restocking windows
  2. Monitor new arrivals for specific brands during Musinsa's "New Season" drops
  3. Automate purchase triggers (with care for ToS compliance)

Restocking Notifier:

def check_restock(product_id, target_sizes=('M', 'L', 'XL')):
    """Check if a sold-out product is back in stock."""
    scraper = MusinsaProductScraper()
    product = scraper.get_product_details(product_id)

    if not product:
        return False

    # NOTE: availableSizes assumes _parse_product_page has been extended to
    # read per-size stock from the product option selector
    available_sizes = product.get('availableSizes', [])
    restocked = [s for s in target_sizes if s in available_sizes]

    if restocked:
        print(f"🚨 RESTOCKED: {product['fullTitle']}")
        print(f"   Available in: {', '.join(restocked)}")
        return True
    return False


# Check every 10 minutes for limited items
schedule.every(10).minutes.do(check_restock, 
    product_id='LIMITED_EDITION_ID', 
    target_sizes=['M', 'L'])

Business Value: Automated monitoring replacing manual F5-refreshing during limited drops.


7. What's Next: The Korean E-Commerce Data Ecosystem

Musinsa is the fashion layer of a rich Korean e-commerce data ecosystem. Once you've built your fashion intelligence pipeline, the natural extensions are:

Connect to Price Comparison Platforms

Track the same products across Musinsa, 29CM, and W Concept to identify price differentials and arbitrage opportunities. Our naver-place-search scraper can help you find offline retail locations carrying the same brands.

Layer in Consumer Sentiment

Combine Musinsa ranking data with our Naver Blog Reviews scraper to understand why products rank highly — not just that they do. Korean fashion bloggers write extensively about Musinsa purchases.

Track Cultural Context with Naver News

Fashion trends don't happen in a vacuum. Use the Naver News scraper to correlate Musinsa ranking movements with fashion coverage, influencer mentions, and K-drama costume placements.

Build the K-Commerce Intelligence Stack

For full market coverage:

| Data Need | Apify Actor |
| --- | --- |
| Fashion rankings & products | Musinsa Scraper ← This guide |
| Fashion & lifestyle reviews | Naver Place Reviews |
| Consumer opinion & blog posts | Naver Blog Reviews |
| News & brand mentions | Naver News Scraper |
| K-pop merch demand | Melon Chart Scraper |
| C2C fashion resale market | Bunjang Market Scraper |
| Book/entertainment tie-ins | YES24 Book Scraper |
The full stack gives you a 360° view of Korean consumer behavior — from what they're listening to (Melon) to what they're reading about (Naver News) to what they're buying (Musinsa, Bunjang).


Wrapping Up

Musinsa's combination of SSR rendering, rich structured data, and relatively low bot detection makes it one of the most accessible major Korean e-commerce platforms for data extraction. With a ~$0.50/1K items price point via the Apify actor, you can run a full weekly ranking extraction for less than $0.10.

Whether you're a trend researcher, price intelligence analyst, or brand strategist, Musinsa data gives you a direct window into the pulse of Korean fashion — in real time.

Get started:


All code examples are for educational purposes. Always review a platform's Terms of Service and robots.txt before scraping. Use reasonable request delays and avoid overloading servers.

Part of the "Korean Web Data" series — scraping guides for Korea's unique digital ecosystem.
