Session zero

Posted on Mar 7

Korea's Largest Bookstore Online — How to Scrape YES24 Bestsellers with Python

#books #data #python #korea

Draft Date: 2026-03-03

Status: DRAFT (do not publish; save to file only)

Target Platform: Medium (The Startup or Towards Data Science)

Target Audience: publishing industry analysts, trend researchers, developers interested in K-literature, translation agents

Actor URL: https://apify.com/oxygenated_quagmire/yes24-book-scraper

Introduction

Every week, millions of South Koreans decide what to read next — and most of them check YES24 first.

YES24 is South Korea's largest online bookstore. It's not just a retailer; it's a cultural barometer. The books that climb its bestseller charts reflect what a society of 52 million people is thinking about, worrying about, and curious about. Self-help surges before exam season. Political memoirs spike after elections. English-language study guides never leave the top 20.

For publishers, literary agents, translators, and data analysts, this creates an extraordinary opportunity: Korea's reading habits are publicly visible, updated in real-time, and organized into 13 clean categories — from literature to science to cooking.

There's just one problem: YES24 has no public API.

The YES24 Book Scraper on Apify solves this. Built on YES24's SSR-rendered HTML and JSON-LD structured data, it extracts clean book data across three modes — bestseller charts, keyword search, and individual book details — without requiring a full browser automation stack.

Why YES24 Data is Valuable

Korea's Reading Economy in Numbers

YES24 commands roughly 40-45% of South Korea's online book retail market (2025 estimates). In a country where physical bookstores have declined sharply, YES24 is where the market reveals itself:

13 bestseller categories updated weekly (Literature, Economy/Business, Self-Help, Children's, Comics, Science, History, Arts, Religion, Foreign Language, Cooking, Travel, and more)
100,000+ new titles/year listed
Customer reviews with star ratings on most titles
Sales rank updates continuously for top-ranked books

Beyond commerce, YES24 data is a cultural intelligence layer:

Which genres are growing vs. declining in Korea?
What does Korea's business reading list say about economic sentiment?
Which Korean titles might be candidates for international licensing?
How does a new Korean author's book rank against established names in its genre?

The Translation and Publishing Opportunity

For international publishers and literary agents, YES24 bestseller data is particularly valuable:

Rights acquisition intelligence: Korean books that sustain Top 10 positions for 4+ weeks in Fiction or Non-Fiction are proven commodities. Many become candidates for translation rights deals — but identifying them early requires monitoring the charts consistently.

Genre trend arbitrage: Genres that dominate Korean bestseller lists can presage global trends. Self-help titles that topped Korean charts (think The Courage to Be Disliked, a Japanese title that was a long-running bestseller in Korea) reached a wide Western audience only years later. The data was always there.

K-content ecosystem alignment: As Korean films, dramas, and music achieve global reach, books by the same authors or on the same themes often lag 12-18 months in international attention. Early chart monitoring = early rights opportunities.

Why Standard Tools Fail

No public API — YES24 doesn't expose sales rank data via API
SSR rendering — Unlike SPAs, YES24 uses server-side rendering, making the HTML parseable — but the structure changes frequently enough that maintaining a custom scraper is brittle
JSON-LD structured data — YES24 embeds rich structured data for SEO purposes, which the actor exploits for clean extraction
Volume — Monitoring 13 categories weekly, with 50-100 books each, is too large for manual tracking
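To make the JSON-LD point concrete, here is a minimal sketch of pulling structured data out of server-rendered HTML using only the Python standard library. The sample markup and its field names are assumptions for illustration; real YES24 pages may structure their JSON-LD differently, and this is not the actor's actual implementation.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page


def extract_jsonld(html):
    """Return every JSON-LD object embedded in the given HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    return parser.blocks


# Hypothetical page fragment; real YES24 markup may differ.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Book", "name": "불편한 편의점", "isbn": "9791190932264"}
</script>
</head><body></body></html>
"""

books = [
    block for block in extract_jsonld(SAMPLE_HTML)
    if isinstance(block, dict) and block.get("@type") == "Book"
]
print(books[0]["name"], books[0]["isbn"])
```

Reading the embedded JSON-LD rather than CSS selectors tends to survive layout changes better, which is presumably why the actor leans on it.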

The Solution: YES24 Book Scraper on Apify

The YES24 Book Scraper is a cloud-hosted Apify actor with three operational modes designed for different research needs.

Three Operating Modes

1. Bestseller Mode — Pull the current bestseller chart for any of 13 categories

{
  "mode": "bestseller",
  "category": "literature",
  "limit": 50
}

2. Search Mode — Search the YES24 catalog by keyword

{
  "mode": "search",
  "keyword": "인공지능",
  "limit": 100
}

3. Detail Mode — Extract full metadata for a specific book by URL

{
  "mode": "detail",
  "bookUrl": "https://www.yes24.com/Product/Goods/12345678"
}

What You Get

For each book, the actor extracts:

{
  "rank": 1,
  "title": "불편한 편의점",
  "author": "김호연",
  "publisher": "나무옆의자",
  "publishedDate": "2021-04-20",
  "isbn": "9791190932264",
  "price": 14000,
  "discountedPrice": 12600,
  "rating": 4.8,
  "reviewCount": 12847,
  "description": "따뜻한 위로와 공감의 이야기...",
  "category": "소설",
  "coverImageUrl": "https://image.yes24.com/goods/...",
  "bookUrl": "https://www.yes24.com/Product/Goods/...",
  "tags": ["소설", "한국소설", "베스트셀러"],
  "scraped_at": "2026-03-03T06:00:00+09:00"
}
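If you prefer typed records over raw dictionaries, a sketch like the following may help. The `Book` class and its `from_item` helper are hypothetical names introduced here, and which fields can be missing is an assumption; only the JSON keys come from the sample output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    """A typed view over one scraped item (hypothetical helper, not part of the actor)."""
    title: str
    author: str
    price: int
    discounted_price: int
    rank: Optional[int] = None
    rating: Optional[float] = None
    review_count: int = 0

    @classmethod
    def from_item(cls, item):
        # Assumption: rank, rating, and reviewCount may be absent (e.g. in search mode).
        return cls(
            title=item["title"],
            author=item["author"],
            price=int(item["price"]),
            discounted_price=int(item.get("discountedPrice", item["price"])),
            rank=item.get("rank"),
            rating=item.get("rating"),
            review_count=int(item.get("reviewCount") or 0),
        )

    @property
    def discount_pct(self):
        """Discount off list price, as a percentage rounded to one decimal."""
        if self.price == 0:
            return 0.0
        return round((self.price - self.discounted_price) / self.price * 100, 1)


item = {
    "rank": 1,
    "title": "불편한 편의점",
    "author": "김호연",
    "price": 14000,
    "discountedPrice": 12600,
    "rating": 4.8,
    "reviewCount": 12847,
}
book = Book.from_item(item)
print(book.title, book.discount_pct)
```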

Available Categories (13)

Category ID	Korean Name	Description
`literature`	소설/시/희곡	Fiction, Poetry, Drama
`economy`	경제경영	Business, Economics
`self_help`	자기계발	Self-Help, Personal Development
`children`	어린이	Children's Books
`comics`	만화	Comics, Manga
`science`	과학	Science, Technology
`history`	역사	History, Archaeology
`arts`	예술	Arts, Music, Film
`religion`	종교	Religion, Philosophy
`foreign_lang`	외국어	Language Learning
`cooking`	가정/요리	Cooking, Lifestyle
`travel`	여행	Travel, Geography
`teens`	청소년	Young Adult

Step-by-Step: How to Use the YES24 Book Scraper

Step 1: Create an Apify Account

Go to apify.com and sign up (free tier available)
Free tier includes $5/month credit — sufficient for 10,000+ book records
No credit card required to start

Step 2: Open the Actor

Navigate to:

👉 https://apify.com/oxygenated_quagmire/yes24-book-scraper

Click "Try for free" to open the input console.

[Screenshot: Actor page with "YES24 Book Scraper" title, 3-mode selector, category dropdown]

Step 3: Configure Your Query

For a weekly bestseller snapshot across all business books:

{
  "mode": "bestseller",
  "category": "economy",
  "limit": 100
}

For tracking a specific topic across the catalog:

{
  "mode": "search",
  "keyword": "ChatGPT 활용",
  "limit": 200
}

Key configuration fields:

Field	Description	Default
`mode`	`bestseller`, `search`, or `detail`	Required
`category`	Category for bestseller mode	`literature`
`keyword`	Search term for search mode	Required in search mode
`limit`	Max number of books to return	100
`bookUrl`	Direct URL for detail mode	Required in detail mode
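When building inputs programmatically, it can help to validate them before starting a run. The helper below is a hypothetical sketch (the function name `build_run_input` is not part of the actor); it only encodes the field rules listed in the table above.

```python
VALID_MODES = {"bestseller", "search", "detail"}


def build_run_input(mode, category="literature", keyword=None, book_url=None, limit=100):
    """Build an input dict for the actor, enforcing the per-mode required fields."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if mode == "bestseller":
        return {"mode": mode, "category": category, "limit": limit}
    if mode == "search":
        if not keyword:
            raise ValueError("keyword is required in search mode")
        return {"mode": mode, "keyword": keyword, "limit": limit}
    # Remaining case: detail mode
    if not book_url:
        raise ValueError("bookUrl is required in detail mode")
    return {"mode": mode, "bookUrl": book_url}


print(build_run_input("bestseller", category="economy"))
print(build_run_input("search", keyword="인공지능", limit=50))
```

Failing fast on a missing `keyword` or `bookUrl` avoids paying for a run that would return nothing useful.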

[Screenshot: Input console showing mode selector set to "bestseller" and category set to "economy"]

Step 4: Run and Export

Click "Start" — runs on Apify cloud infrastructure in Seoul region
Monitor progress in the Live Log tab
Typical runtime: ~30 seconds for 100 books
Export as JSON, CSV, or XLSX from the Results tab

[Screenshot: Results table showing columns: rank, title, author, price, rating, reviewCount]

Step 5: Analyze with Python

import pandas as pd

# Load exported CSV
df = pd.read_csv('yes24_bestsellers.csv')

# Quick overview
print(f"Books scraped: {len(df)}")
print(f"\nTop 10 by rating:")
print(df.nlargest(10, 'rating')[['rank', 'title', 'author', 'rating', 'reviewCount']])

# Price analysis
print(f"\nAverage list price: ₩{df['price'].mean():,.0f}")
print(f"Average discount: {((df['price'] - df['discountedPrice']) / df['price'] * 100).mean():.1f}%")

# Most prolific publishers
print(f"\nTop publishers in bestseller chart:")
print(df['publisher'].value_counts().head(10))

Real-World Use Cases

Use Case 1: Weekly Publishing Intelligence Report

Scenario: A global literary agency wants to identify Korean fiction titles for international rights acquisition before they appear on mainstream radar.

Approach:

Schedule weekly runs of the scraper in Bestseller mode across Literature and Self-Help
Flag any book that:
- Has been in Top 20 for 3+ consecutive weeks
- Has 500+ reviews with a rating above 4.5
- Is not yet translated into English
Pull full detail records for flagged titles
Feed into a rights-inquiry workflow
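The "3+ consecutive weeks" rule above could be sketched as follows. This assumes you have stored one snapshot per week and added a `week_start` date (the Monday of the scrape week) to each record; that field is hypothetical and would be derived from `scraped_at`. The review, rating, and translation checks are left out for brevity.

```python
from collections import defaultdict
from datetime import date


def longest_streak(week_starts):
    """Longest run of consecutive weeks, given the Monday dates on which a book charted."""
    ordered = sorted(set(week_starts))
    best = current = 1 if ordered else 0
    for prev, nxt in zip(ordered, ordered[1:]):
        current = current + 1 if (nxt - prev).days == 7 else 1
        best = max(best, current)
    return best


def flag_sustained(records, top_n=20, min_weeks=3):
    """Return ISBNs that stayed within top_n for at least min_weeks consecutive weeks."""
    weeks_by_isbn = defaultdict(list)
    for rec in records:
        if rec["rank"] <= top_n:
            weeks_by_isbn[rec["isbn"]].append(rec["week_start"])
    return sorted(
        isbn for isbn, weeks in weeks_by_isbn.items()
        if longest_streak(weeks) >= min_weeks
    )


# Made-up snapshots: A charts three weeks running, B has a gap, C never enters the Top 20.
weeks = [date(2026, 3, 2), date(2026, 3, 9), date(2026, 3, 16)]
records = (
    [{"isbn": "A", "rank": 5, "week_start": w} for w in weeks]
    + [{"isbn": "B", "rank": 12, "week_start": w} for w in (weeks[0], weeks[2])]
    + [{"isbn": "C", "rank": 25, "week_start": w} for w in weeks]
)
print(flag_sustained(records))
```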

Result: Early identification of titles like 불편한 편의점 (published in English as The Second Chance Convenience Store), months before its English translation was announced.

Data cost: ~$0.20/week for 400 books across 4 categories, at the listed rate of $0.50 per 1,000 items

Use Case 2: Korean Market Sentiment Tracker

Scenario: A macroeconomic research firm wants to track Korean consumer sentiment via reading behavior — what people choose to read reflects economic anxiety, optimism, or social mood.

Approach:

Scrape Economy/Business and Self-Help bestseller charts weekly
Classify books into thematic buckets: survival/frugality, wealth-building, career anxiety, entrepreneurship
Track the share of "anxiety-driven" vs. "opportunity-driven" content over time
Cross-reference with economic indicators (KOSPI, unemployment data)
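A first pass at the bucketing step could be plain keyword matching, as in this sketch. The keyword lists and sample titles are made up for illustration; a real tracker would need a curated vocabulary or a proper classifier.

```python
from collections import Counter

# Hypothetical keyword lists; tune these against real chart data.
BUCKETS = {
    "anxiety": ["절약", "생존", "불안", "퇴사"],
    "opportunity": ["투자", "창업", "부자", "성장"],
}


def classify(text):
    """Return the first bucket whose keywords appear in the text, else 'other'."""
    for bucket, keywords in BUCKETS.items():
        if any(keyword in text for keyword in keywords):
            return bucket
    return "other"


def bucket_shares(books):
    """Share of books per bucket, based on title plus description."""
    counts = Counter(
        classify(book["title"] + " " + book.get("description", "")) for book in books
    )
    total = sum(counts.values())
    return {bucket: counts[bucket] / total for bucket in counts} if total else {}


# Made-up titles, not real chart entries.
sample = [
    {"title": "월급쟁이 절약 생존기"},
    {"title": "처음 시작하는 투자"},
    {"title": "창업의 기술"},
    {"title": "요리 일기"},
]
print(bucket_shares(sample))
```

Because the first matching bucket wins, a title containing keywords from both lists is counted as "anxiety" here; that ordering is a design choice worth revisiting.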

Key hypothesis: Periods of high economic uncertainty see rises in "frugality/survival" self-help books; bull markets see "wealth-building/entrepreneurship" books climb.

Data cost: roughly $0.40/month for two charts of 100 books scraped weekly, at the listed rate of $0.50 per 1,000 items

Use Case 3: K-Literature Trend Analysis for Translators

Scenario: A literary translator specializing in Korean-to-English wants to identify which Korean genres and themes have the highest potential for Western audiences.

Approach:

Collect 6 months of monthly bestseller snapshots across Literature, Self-Help, and Science
Extract keywords from book descriptions using NLP
Compare trending themes in Korea to current Western publishing trends (using Goodreads/Amazon data as baseline)
Identify gap opportunities: themes popular in Korea not yet prominent in Western publishing
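The gap-finding step might look like the sketch below. It uses the scraped `tags` field as a stand-in for NLP keyword extraction, and the Western baseline is a hand-entered, hypothetical set; both are simplifying assumptions, and the sample tags are shown in English only for readability.

```python
from collections import Counter


def top_themes(books, min_count=2):
    """Themes (tags) that appear on at least min_count distinct books."""
    counts = Counter(tag for book in books for tag in set(book.get("tags", [])))
    return {tag for tag, n in counts.items() if n >= min_count}


def theme_gaps(korean_books, western_themes, min_count=2):
    """Themes common on the Korean chart but absent from the Western baseline."""
    return sorted(top_themes(korean_books, min_count) - set(western_themes))


# Made-up data for illustration.
korean_books = [
    {"tags": ["slow living", "essay"]},
    {"tags": ["slow living", "philosophy"]},
    {"tags": ["essay", "finance"]},
    {"tags": ["finance"]},
]
western_themes = {"finance", "essay"}
print(theme_gaps(korean_books, western_themes))
```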

Example finding: Environmental philosophy and "slow living" themes appear consistently in Korean self-help charts 12-18 months before equivalent titles appear in Western bestseller lists.

Data cost: roughly $0.90 for six monthly snapshots of three categories at 100 books each, at the listed rate of $0.50 per 1,000 items

Use Case 4: Academic Research — Digital Bookstore as Cultural Mirror

Scenario: A cultural studies researcher studying how algorithmic curation shapes Korean reading culture.

Data collected:

12 months of weekly bestseller charts across all 13 categories
Full metadata including review counts, ratings, publisher, and publication date
~15,000 book-week data points

Research questions:

How long does a book stay on the chart? Is there a "decay curve"?
Do publisher size and pre-publication marketing correlate with initial chart position?
How does YES24's recommendation system affect new author discoverability?

Sample analysis:

import pandas as pd
import matplotlib.pyplot as plt

# Load 12 months of weekly data
df = pd.read_csv('yes24_12months.csv')
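# Key each snapshot by ISO year and week so week numbers from different years don't collide
iso = pd.to_datetime(df['scraped_at']).dt.isocalendar()
df['week'] = iso['year'].astype(str) + '-W' + iso['week'].astype(str).str.zfill(2)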

# Chart decay analysis: weeks a book spent in Top 20
longevity = df[df['rank'] <= 20].groupby('isbn')['week'].nunique().reset_index()
longevity.columns = ['isbn', 'weeks_in_top20']

# Plot distribution
longevity['weeks_in_top20'].hist(bins=20, figsize=(10, 6))
plt.title('Distribution: How Long Books Stay in YES24 Top 20')
plt.xlabel('Weeks in Top 20')
plt.ylabel('Number of Books')
plt.savefig('chart_longevity.png', dpi=150)

Data cost: roughly $7.50 for the ~15,000 book-week records above, at the listed rate of $0.50 per 1,000 items

Python Integration: Full Pipeline Example

from apify_client import ApifyClient
import pandas as pd
from datetime import datetime

client = ApifyClient("YOUR_APIFY_API_TOKEN")

# === STEP 1: Scrape weekly bestsellers across 4 key categories ===
CATEGORIES = ["literature", "economy", "self_help", "science"]
all_books = []

for category in CATEGORIES:
    print(f"Scraping: {category}...")
    run_input = {
        "mode": "bestseller",
        "category": category,
        "limit": 100
    }

    run = client.actor("oxygenated_quagmire/yes24-book-scraper").call(
        run_input=run_input
    )

    items = client.dataset(run["defaultDatasetId"]).list_items().items
    for item in items:
        item["scrape_category"] = category
    all_books.extend(items)
    print(f"  → {len(items)} books extracted")

df = pd.DataFrame(all_books)
print(f"\nTotal books: {len(df)}")

# === STEP 2: Flag Rights Acquisition Candidates ===
candidates = df[
    (df['rank'] <= 20) &
    (df['rating'] >= 4.5) &
    (df['reviewCount'] >= 500)
].copy()

print(f"\n=== Rights Acquisition Candidates ===")
print(f"Books meeting criteria: {len(candidates)}")
print(candidates[['rank', 'title', 'author', 'rating', 'reviewCount', 'scrape_category']].to_string())

# === STEP 3: Export ===
timestamp = datetime.now().strftime("%Y%m%d")
filename = f"yes24_bestsellers_{timestamp}.csv"
df.to_csv(filename, index=False, encoding='utf-8-sig')  # utf-8-sig for Excel compatibility
print(f"\nSaved to {filename}")

Scheduling Weekly Monitoring

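import json

# Schedule actions reference actors by ID, so resolve it from the actor name first
actor = client.actor("oxygenated_quagmire/yes24-book-scraper").get()

# Schedule weekly run every Monday at 9:00 AM KST (0:00 UTC)
schedule = client.schedules().create(
    name="yes24-weekly-bestsellers",
    cron_expression="0 0 * * 1",  # Monday 00:00 UTC = 09:00 KST
    is_enabled=True,
    is_exclusive=True,  # don't start a new run while the previous one is still going
    actions=[{
        "type": "RUN_ACTOR",
        "actorId": actor["id"],
        "runInput": {
            "body": json.dumps({"mode": "bestseller", "category": "literature", "limit": 100}),
            "contentType": "application/json; charset=utf-8",
        },
    }],
)
print(f"Weekly schedule created: {schedule['id']}")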

Understanding the Output

Key Fields for Analysis

rank: Current chart position. Changes weekly. Combining rank + scraped_at creates time-series rank trajectory data.
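As a rough sketch of what "rank + scraped_at" gives you, the following groups stored snapshots into per-book trajectories and computes week-over-week movement. The helper names are hypothetical, and sorting the timestamp strings directly assumes every `scraped_at` uses the same ISO format and UTC offset.

```python
from collections import defaultdict


def rank_trajectories(records):
    """Map each ISBN to its (scraped_at, rank) points in chronological order."""
    by_isbn = defaultdict(list)
    for rec in records:
        by_isbn[rec["isbn"]].append((rec["scraped_at"], rec["rank"]))
    return {isbn: sorted(points) for isbn, points in by_isbn.items()}


def rank_changes(points):
    """Movement between consecutive snapshots; positive means the book climbed."""
    ranks = [rank for _, rank in points]
    return [prev - cur for prev, cur in zip(ranks, ranks[1:])]


# Made-up snapshots, deliberately out of order.
records = [
    {"isbn": "X", "rank": 3, "scraped_at": "2026-03-10T06:00:00+09:00"},
    {"isbn": "X", "rank": 5, "scraped_at": "2026-03-03T06:00:00+09:00"},
    {"isbn": "X", "rank": 4, "scraped_at": "2026-03-17T06:00:00+09:00"},
]
trajectory = rank_trajectories(records)["X"]
print([rank for _, rank in trajectory], rank_changes(trajectory))
```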

reviewCount: Proxy for commercial success. Korean readers are prolific reviewers — 1,000+ reviews typically indicates a sustained bestseller.

rating: YES24 ratings skew slightly higher than Western equivalents (4.0+ is strong, 4.5+ is excellent). Low rating with high rank = divisive/controversial book — often culturally significant.

publishedDate: Combined with rank data, reveals how quickly a book rose to the charts. A book published 3 years ago still in Top 20 = a cultural touchstone, not just a recent release.
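The "published 3 years ago and still in the Top 20" heuristic can be sketched with the standard library alone. The function names and the thresholds as defaults are illustrative assumptions; the field names and sample values come from the output record shown earlier.

```python
from datetime import date, datetime


def years_on_market(published_date, scraped_at):
    """Approximate years between publication and the scrape timestamp."""
    published = date.fromisoformat(published_date)
    scraped = datetime.fromisoformat(scraped_at).date()
    return (scraped - published).days / 365.25


def is_touchstone(item, top_n=20, min_years=3.0):
    """True when an older title is still charting inside the top_n."""
    age = years_on_market(item["publishedDate"], item["scraped_at"])
    return item["rank"] <= top_n and age >= min_years


item = {
    "rank": 1,
    "publishedDate": "2021-04-20",
    "scraped_at": "2026-03-03T06:00:00+09:00",
}
print(round(years_on_market(item["publishedDate"], item["scraped_at"]), 2))
print(is_touchstone(item))
```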

description: Rich text source for NLP. Korean book descriptions tend to be more detailed than Western equivalents, often including chapter-level summaries.

Pricing

The YES24 Book Scraper charges $0.50 per 1,000 items.

Use Case	Estimated Items	Est. Cost
Single category snapshot	100 books	~$0.05
All 13 categories	1,300 books	~$0.65
Weekly monitoring (4 categories)	~400/week	~$0.20/week
12-month research dataset	~25,000 records	~$12.50

The Apify free tier ($5/month) covers about 10,000 items per month at this rate, more than the roughly 5,600 items a month needed to monitor all 13 categories weekly at 100 books each.
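For budgeting, a small estimator based on the per-item rate quoted in this section can be handy. It is a sketch that assumes the per-item charge is the only cost; any platform or compute fees beyond it are not modeled.

```python
PRICE_PER_1000_ITEMS = 0.50  # USD, the rate quoted in this section


def estimate_cost(categories, books_per_category, runs=1):
    """Return (total items, estimated cost in USD) for a monitoring plan."""
    items = categories * books_per_category * runs
    return items, items * PRICE_PER_1000_ITEMS / 1000


for label, plan in {
    "Single category snapshot": (1, 100, 1),
    "All 13 categories, one run": (13, 100, 1),
    "4 categories, weekly for a year": (4, 100, 52),
}.items():
    items, cost = estimate_cost(*plan)
    print(f"{label}: {items:,} items, ~${cost:.2f}")
```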

Conclusion

YES24's bestseller charts are one of the most accessible windows into Korean cultural consumption available anywhere. What Koreans choose to read — and which books sustain their chart positions — encodes signals about economic sentiment, social trends, and the early movements of ideas that often become global.

The YES24 Book Scraper makes this data accessible to anyone: publishers scouting for translation rights, researchers studying Korean cultural production, analysts tracking sentiment, or developers building publishing intelligence tools.

Whether you're:

Identifying Korean titles for international licensing before they appear on Western radar
Tracking economic sentiment through the lens of business and self-help reading patterns
Building a publishing analytics tool that includes Korea in its coverage
Researching K-content culture beyond film and music

...Korea's reading data is now a query away.

Get Started

👉 Try the YES24 Book Scraper: https://apify.com/oxygenated_quagmire/yes24-book-scraper

Free Apify account includes $5/month — extract your first bestseller chart within minutes.

Questions or feature requests? Leave a review on the actor page.

The author maintains a portfolio of Korean data infrastructure actors on Apify. All 12 actors available at: https://apify.com/oxygenated_quagmire

Tags: #Korea #WebScraping #YES24 #Korean #Books #Publishing #DataScience #Apify #Python #KCulture #LiteraryAgency

Suggested Publication: The Startup, Towards Data Science, Better Programming

DEV Community
