DEV Community

Session zero

Korea's Largest Bookstore Online — How to Scrape YES24 Bestsellers with Python

Draft Date: 2026-03-03

Status: DRAFT, do not publish (save to file only)

Target Platform: Medium (The Startup or Towards Data Science)

Target Audience: publishing industry analysts, trend researchers, developers interested in K-literature, translation agents

Actor URL: https://apify.com/oxygenated_quagmire/yes24-book-scraper


Introduction

Every week, millions of South Koreans decide what to read next — and most of them check YES24 first.

YES24 is South Korea's largest online bookstore. It's not just a retailer; it's a cultural barometer. The books that climb its bestseller charts reflect what a society of 52 million people is thinking about, worrying about, and curious about. Self-help surges before exam season. Political memoirs spike after elections. English-language study guides never leave the top 20.

For publishers, literary agents, translators, and data analysts, this creates an extraordinary opportunity: Korea's reading habits are publicly visible, updated in real-time, and organized into 13 clean categories — from literature to science to cooking.

There's just one problem: YES24 has no public API.

The YES24 Book Scraper on Apify solves this. Built on YES24's server-side-rendered HTML and JSON-LD structured data, it extracts clean book data in three modes (bestseller charts, keyword search, and individual book details) without requiring a full browser automation stack.


Why YES24 Data is Valuable

Korea's Reading Economy in Numbers

YES24 commands roughly 40-45% of South Korea's online book retail market (2025 estimates). In a country where physical bookstores have declined sharply, YES24 is where the market reveals itself:

  • 13 bestseller categories updated weekly (Literature, Economy/Business, Self-Help, Children's, Comics, Science, History, Arts, Religion, Foreign Language, Cooking, Travel, and more)
  • 100,000+ new titles/year listed
  • Customer reviews with star ratings on most titles
  • Sales rank updates continuously for top-ranked books

Beyond commerce, YES24 data is a cultural intelligence layer:

  • Which genres are growing vs. declining in Korea?
  • What does Korea's business reading list say about economic sentiment?
  • Which Korean titles might be candidates for international licensing?
  • How does a new Korean author's book rank against established names in its genre?

The Translation and Publishing Opportunity

For international publishers and literary agents, YES24 bestseller data is particularly valuable:

Rights acquisition intelligence: Korean books that sustain Top 10 positions for 4+ weeks in Fiction or Non-Fiction are proven commodities. Many become candidates for translation rights deals — but identifying them early requires monitoring the charts consistently.

Genre trend arbitrage: Genres that dominate Korean bestseller lists often presage global trends. Korean self-help and personal finance books began their global crossover well before Western publishers noticed. The data was always there.

K-content ecosystem alignment: As Korean films, dramas, and music achieve global reach, books by the same authors or on the same themes often lag 12-18 months in international attention. Early chart monitoring = early rights opportunities.

Why Standard Tools Fail

  1. No public API: YES24 doesn't expose sales rank data through an API
  2. Brittle markup: YES24 uses server-side rendering rather than an SPA, so the HTML is parseable, but the structure changes often enough that a custom scraper needs constant maintenance
  3. JSON-LD structured data: YES24 embeds rich structured data for SEO purposes, which general-purpose tools rarely use and which the actor relies on for clean extraction
  4. Volume: Monitoring 13 categories weekly, with 50-100 books each, is too much for manual tracking
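
The JSON-LD point above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the actor's actual implementation: the sample HTML fragment and its Book fields are invented for the demo, and real YES24 pages may lay out their JSON-LD differently.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return every JSON-LD block found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.blocks


# Hypothetical page fragment, for illustration only
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Book", "name": "불편한 편의점", "author": {"@type": "Person", "name": "김호연"}}
</script>
</head><body><h1>...</h1></body></html>
"""

books = [
    block for block in extract_jsonld(SAMPLE_HTML)
    if isinstance(block, dict) and block.get("@type") == "Book"
]
print(books[0]["name"], "/", books[0]["author"]["name"])
```

Reading structured data first and falling back to HTML selectors only when it is missing tends to survive layout changes better than selector-only parsing.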

The Solution: YES24 Book Scraper on Apify

The YES24 Book Scraper is a cloud-hosted Apify actor with three operational modes designed for different research needs.

Three Operating Modes

1. Bestseller Mode — Pull the current bestseller chart for any of 13 categories

{
  "mode": "bestseller",
  "category": "literature",
  "limit": 50
}

2. Search Mode — Search the YES24 catalog by keyword

{
  "mode": "search",
  "keyword": "인공지능",
  "limit": 100
}

3. Detail Mode — Extract full metadata for a specific book by URL

{
  "mode": "detail",
  "bookUrl": "https://www.yes24.com/Product/Goods/12345678"
}

What You Get

For each book, the actor extracts:

{
  "rank": 1,
  "title": "불편한 편의점",
  "author": "김호연",
  "publisher": "나무옆의자",
  "publishedDate": "2021-04-20",
  "isbn": "9791190932264",
  "price": 14000,
  "discountedPrice": 12600,
  "rating": 4.8,
  "reviewCount": 12847,
  "description": "따뜻한 위로와 공감의 이야기...",
  "category": "소설",
  "coverImageUrl": "https://image.yes24.com/goods/...",
  "bookUrl": "https://www.yes24.com/Product/Goods/...",
  "tags": ["소설", "한국소설", "베스트셀러"],
  "scraped_at": "2026-03-03T06:00:00+09:00"
}
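
As a quick sanity check on a record like the one above, the sketch below derives the discount rate from `price` and `discountedPrice`. The helper name `discount_pct` is ours for illustration and is not part of the actor.

```python
import json

# Trimmed copy of the sample record above
record = json.loads("""
{
  "rank": 1,
  "title": "불편한 편의점",
  "price": 14000,
  "discountedPrice": 12600,
  "rating": 4.8,
  "reviewCount": 12847
}
""")


def discount_pct(book):
    """Return the discount as a percentage of list price, or None if prices are missing."""
    price = book.get("price")
    discounted = book.get("discountedPrice")
    if not price or discounted is None:
        return None
    return round((price - discounted) / price * 100, 1)


print(f"{record['title']}: {discount_pct(record)}% off list price")
```

Returning `None` for incomplete records keeps a single missing price from breaking a batch calculation.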

Available Categories (13)

Category ID | Korean Name | Description
literature | 소설/시/희곡 | Fiction, Poetry, Drama
economy | 경제경영 | Business, Economics
self_help | 자기계발 | Self-Help, Personal Development
children | 어린이 | Children's Books
comics | 만화 | Comics, Manga
science | 과학 | Science, Technology
history | 역사 | History, Archaeology
arts | 예술 | Arts, Music, Film
religion | 종교 | Religion, Philosophy
foreign_lang | 외국어 | Language Learning
cooking | 가정/요리 | Cooking, Lifestyle
travel | 여행 | Travel, Geography
teens | 청소년 | Young Adult

Step-by-Step: How to Use the YES24 Book Scraper

Step 1: Create an Apify Account

  1. Go to apify.com and sign up (free tier available)
  2. Free tier includes $5/month credit — sufficient for 10,000+ book records
  3. No credit card required to start

Step 2: Open the Actor

Navigate to:

👉 https://apify.com/oxygenated_quagmire/yes24-book-scraper

Click "Try for free" to open the input console.

[Screenshot: Actor page with "YES24 Book Scraper" title, 3-mode selector, category dropdown]

Step 3: Configure Your Query

For a weekly bestseller snapshot across all business books:

{
  "mode": "bestseller",
  "category": "economy",
  "limit": 100
}

For tracking a specific topic across the catalog:

{
  "mode": "search",
  "keyword": "ChatGPT 활용",
  "limit": 200
}

Key configuration fields:

Field | Description | Default
mode | bestseller, search, or detail | Required
category | Category for bestseller mode | literature
keyword | Search term for search mode | Required in search mode
limit | Max number of books to return | 100
bookUrl | Direct URL for detail mode | Required in detail mode
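
When inputs are built in code rather than typed into the console, the per-mode requirements in the table are easy to get wrong. The following is a small sketch of a validating builder; `build_run_input` is a hypothetical helper of ours, and the field names and defaults are taken from the table above.

```python
VALID_MODES = {"bestseller", "search", "detail"}


def build_run_input(mode, category="literature", keyword=None, book_url=None, limit=100):
    """Build an actor input dict, enforcing the per-mode requirements from the table."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if mode == "bestseller":
        return {"mode": mode, "category": category, "limit": limit}
    if mode == "search":
        if not keyword:
            raise ValueError("keyword is required in search mode")
        return {"mode": mode, "keyword": keyword, "limit": limit}
    # Remaining case: detail mode
    if not book_url:
        raise ValueError("bookUrl is required in detail mode")
    return {"mode": mode, "bookUrl": book_url}


print(build_run_input("bestseller", category="economy"))
print(build_run_input("search", keyword="ChatGPT 활용", limit=200))
```

Failing fast on a missing `keyword` or `bookUrl` is cheaper than discovering the problem after a cloud run has started.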

[Screenshot: Input console showing mode selector set to "bestseller" and category set to "economy"]

Step 4: Run and Export

  1. Click "Start" — runs on Apify cloud infrastructure in Seoul region
  2. Monitor progress in the Live Log tab
  3. Typical runtime: ~30 seconds for 100 books
  4. Export as JSON, CSV, or XLSX from the Results tab

[Screenshot: Results table showing columns: rank, title, author, price, rating, reviewCount]

Step 5: Analyze with Python

import pandas as pd

# Load exported CSV
df = pd.read_csv('yes24_bestsellers.csv')

# Quick overview
print(f"Books scraped: {len(df)}")
print(f"\nTop 10 by rating:")
print(df.nlargest(10, 'rating')[['rank', 'title', 'author', 'rating', 'reviewCount']])

# Price analysis
print(f"\nAverage list price: ₩{df['price'].mean():,.0f}")
print(f"Average discount: {((df['price'] - df['discountedPrice']) / df['price'] * 100).mean():.1f}%")

# Most prolific publishers
print(f"\nTop publishers in bestseller chart:")
print(df['publisher'].value_counts().head(10))

Real-World Use Cases

Use Case 1: Weekly Publishing Intelligence Report

Scenario: A global literary agency wants to identify Korean fiction titles for international rights acquisition before they appear on mainstream radar.

Approach:

  1. Schedule weekly runs of the scraper in Bestseller mode across Literature and Self-Help
  2. Flag any book that:
    • Has been in Top 20 for 3+ consecutive weeks
    • Has 500+ reviews with a rating above 4.5
    • Is not yet translated into English
  3. Pull full detail records for flagged titles
  4. Feed into a rights-inquiry workflow
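
The flagging rules above can be sketched in plain Python. This is an illustration under assumptions: each snapshot row is a dict carrying the actor's `isbn`, `rank`, `rating`, and `reviewCount` fields plus a `week_date` that we add ourselves, and the ISBNs "A" and "B" are placeholders. The "not yet translated" check is not in the data and still needs a manual or external lookup.

```python
from datetime import date


def week_index(d):
    """Map a date to a running week number; weeks start on Monday."""
    return (d.toordinal() - 1) // 7


def longest_streak(dates):
    """Longest run of consecutive weeks covered by the given snapshot dates."""
    weeks = sorted({week_index(d) for d in dates})
    best = current = 0
    previous = None
    for w in weeks:
        current = current + 1 if previous is not None and w == previous + 1 else 1
        best = max(best, current)
        previous = w
    return best


def flag_candidates(snapshots, min_weeks=3, min_reviews=500, min_rating=4.5, top_n=20):
    """Return ISBNs with a Top-N streak of min_weeks+ and strong latest review stats."""
    appearances = {}  # isbn -> dates on which the book was inside the Top N
    latest = {}       # isbn -> most recent snapshot row
    for row in snapshots:
        isbn = row["isbn"]
        if row["rank"] <= top_n:
            appearances.setdefault(isbn, []).append(row["week_date"])
        if isbn not in latest or row["week_date"] > latest[isbn]["week_date"]:
            latest[isbn] = row
    return sorted(
        isbn for isbn, dates in appearances.items()
        if longest_streak(dates) >= min_weeks
        and latest[isbn]["reviewCount"] >= min_reviews
        and latest[isbn]["rating"] > min_rating
    )


def make_rows(isbn, rating, reviews, ranked_dates):
    """Build placeholder snapshot rows for the demo."""
    return [
        {"isbn": isbn, "rank": rank, "rating": rating, "reviewCount": reviews, "week_date": d}
        for rank, d in ranked_dates
    ]


# "A" charts three Mondays in a row; "B" skips a week, so its best streak is two
snapshots = (
    make_rows("A", 4.7, 900, [(5, date(2026, 2, 2)), (4, date(2026, 2, 9)), (6, date(2026, 2, 16))])
    + make_rows("B", 4.9, 2000, [(3, date(2026, 2, 2)), (2, date(2026, 2, 16)), (2, date(2026, 2, 23))])
)
flagged = flag_candidates(snapshots)
print(flagged)
```

Counting consecutive weeks rather than total weeks matters here: a book that drops out and returns is a different signal from one that holds its position.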

Result: Early identification of titles like 불편한 편의점 (published in English as The Second Chance Convenience Store), months before the English translation was announced.

Data cost: ~$0.20/week for 400 books across 4 categories (at $0.50 per 1,000 items)


Use Case 2: Korean Market Sentiment Tracker

Scenario: A macroeconomic research firm wants to track Korean consumer sentiment via reading behavior — what people choose to read reflects economic anxiety, optimism, or social mood.

Approach:

  1. Scrape Economy/Business and Self-Help bestseller charts weekly
  2. Classify books into thematic buckets: survival/frugality, wealth-building, career anxiety, entrepreneurship
  3. Track the share of "anxiety-driven" vs. "opportunity-driven" content over time
  4. Cross-reference with economic indicators (KOSPI, unemployment data)
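
Steps 2 and 3 can be prototyped with simple keyword matching before investing in a proper Korean NLP pipeline. In the sketch below, the keyword lists and the four book titles are invented for illustration; real bucket definitions would need tuning against actual chart data.

```python
from collections import Counter

# Hypothetical keyword lists; tune these against real titles and descriptions
BUCKETS = {
    "anxiety": ["절약", "생존", "불안", "퇴사"],      # frugality, survival, anxiety, quitting
    "opportunity": ["투자", "부자", "창업", "성장"],  # investing, wealth, startups, growth
}


def classify(book):
    """Assign a book to the first bucket whose keywords appear in its title or description."""
    text = f"{book.get('title', '')} {book.get('description', '')}"
    for bucket, keywords in BUCKETS.items():
        if any(keyword in text for keyword in keywords):
            return bucket
    return "other"


def bucket_shares(books):
    """Return each bucket's share of the chart as a fraction between 0 and 1."""
    counts = Counter(classify(book) for book in books)
    total = sum(counts.values())
    return {bucket: count / total for bucket, count in counts.items()} if total else {}


# Invented titles, for illustration only
chart = [
    {"title": "월급쟁이 절약의 기술"},
    {"title": "불안한 시대의 생존법"},
    {"title": "처음 시작하는 투자"},
    {"title": "세계사 산책"},
]
shares = bucket_shares(chart)
print(shares)
```

Storing one `shares` dict per week gives the time series that step 4 compares against KOSPI or unemployment data. Note that the first matching bucket wins, so a book mentioning both frugality and investing lands in "anxiety" under this ordering.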

Key hypothesis: Periods of high economic uncertainty see rises in "frugality/survival" self-help books; bull markets see "wealth-building/entrepreneurship" books climb.

Data cost: ~$0.40/month for continuous weekly monitoring (2 categories × 100 books × 4 weeks at $0.50 per 1,000 items)


Use Case 3: K-Literature Trend Analysis for Translators

Scenario: A literary translator specializing in Korean-to-English wants to identify which Korean genres and themes have the highest potential for Western audiences.

Approach:

  1. Collect 6 months of monthly bestseller snapshots across Literature, Self-Help, and Science
  2. Extract keywords from book descriptions using NLP
  3. Compare trending themes in Korea to current Western publishing trends (using Goodreads/Amazon data as baseline)
  4. Identify gap opportunities: themes popular in Korea not yet prominent in Western publishing

Example finding: Environmental philosophy and "slow living" themes appear consistently in Korean self-help charts 12-18 months before equivalent titles appear in Western bestseller lists.

Data cost: ~$0.90 for six monthly snapshots (3 categories × 100 books × 6 months at $0.50 per 1,000 items)


Use Case 4: Academic Research — Digital Bookstore as Cultural Mirror

Scenario: A cultural studies researcher studying how algorithmic curation shapes Korean reading culture.

Data collected:

  • 12 months of weekly bestseller charts across all 13 categories
  • Full metadata including review counts, ratings, publisher, and publication date
  • ~15,000 book-week data points

Research questions:

  • How long does a book stay on the chart? Is there a "decay curve"?
  • Do publisher size and pre-publication marketing correlate with initial chart position?
  • How does YES24's recommendation system affect new author discoverability?

Sample analysis:

import pandas as pd
import matplotlib.pyplot as plt

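# Load 12 months of weekly data
df = pd.read_csv('yes24_12months.csv')

# Key each snapshot by ISO year and week; the week number alone repeats
# once the data spans more than one calendar year
iso = pd.to_datetime(df['scraped_at']).dt.isocalendar()
df['week'] = iso['year'].astype(str) + '-W' + iso['week'].astype(str).str.zfill(2)

# Chart longevity: number of distinct weeks each book spent in the Top 20
longevity = df[df['rank'] <= 20].groupby('isbn')['week'].nunique().reset_index()
longevity.columns = ['isbn', 'weeks_in_top20']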

# Plot distribution
longevity['weeks_in_top20'].hist(bins=20, figsize=(10, 6))
plt.title('Distribution: How Long Books Stay in YES24 Top 20')
plt.xlabel('Weeks in Top 20')
plt.ylabel('Number of Books')
plt.savefig('chart_longevity.png', dpi=150)

Data cost: ~$7.50 for the ~15,000 book-week records above (at $0.50 per 1,000 items)


Python Integration: Full Pipeline Example

from apify_client import ApifyClient
import pandas as pd
from datetime import datetime

client = ApifyClient("YOUR_APIFY_API_TOKEN")

# === STEP 1: Scrape weekly bestsellers across 4 key categories ===
CATEGORIES = ["literature", "economy", "self_help", "science"]
all_books = []

for category in CATEGORIES:
    print(f"Scraping: {category}...")
    run_input = {
        "mode": "bestseller",
        "category": category,
        "limit": 100
    }

    run = client.actor("oxygenated_quagmire/yes24-book-scraper").call(
        run_input=run_input
    )

    items = client.dataset(run["defaultDatasetId"]).list_items().items
    for item in items:
        item["scrape_category"] = category
    all_books.extend(items)
    print(f"{len(items)} books extracted")

df = pd.DataFrame(all_books)
print(f"\nTotal books: {len(df)}")

# === STEP 2: Flag Rights Acquisition Candidates ===
candidates = df[
    (df['rank'] <= 20) &
    (df['rating'] >= 4.5) &
    (df['reviewCount'] >= 500)
].copy()

print(f"\n=== Rights Acquisition Candidates ===")
print(f"Books meeting criteria: {len(candidates)}")
print(candidates[['rank', 'title', 'author', 'rating', 'reviewCount', 'scrape_category']].to_string())

# === STEP 3: Export ===
timestamp = datetime.now().strftime("%Y%m%d")
filename = f"yes24_bestsellers_{timestamp}.csv"
df.to_csv(filename, index=False, encoding='utf-8-sig')  # utf-8-sig for Excel compatibility
print(f"\nSaved to {filename}")

Scheduling Weekly Monitoring

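import json

# Reuses `client` from the pipeline example above.
# Schedule actions reference the actor by ID, so resolve it from the actor name first.
actor = client.actor("oxygenated_quagmire/yes24-book-scraper").get()

# Schedule weekly run every Monday at 9:00 AM KST (00:00 UTC)
schedule = client.schedules().create(
    name="yes24-weekly-bestsellers",
    cron_expression="0 0 * * 1",  # Monday 00:00 UTC = 09:00 KST
    is_enabled=True,
    is_exclusive=True,
    actions=[{
        "type": "RUN_ACTOR",
        "actorId": actor["id"],
        "runInput": {
            "body": json.dumps({"mode": "bestseller", "category": "literature", "limit": 100}),
            "contentType": "application/json; charset=utf-8",
        },
    }],
)
print(f"Weekly schedule created: {schedule['id']}")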

Understanding the Output

Key Fields for Analysis

rank: Current chart position. Changes weekly. Combining rank + scraped_at creates time-series rank trajectory data.
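
A rank trajectory of this kind can be assembled from accumulated records in a few lines. The sketch below assumes records shaped like the actor's output (with `isbn`, `rank`, and `scraped_at`); the three sample rows and their ranks are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rank_trajectories(records):
    """Group records by ISBN into chronologically sorted (timestamp, rank) pairs."""
    series = defaultdict(list)
    for record in records:
        scraped = datetime.fromisoformat(record["scraped_at"])
        series[record["isbn"]].append((scraped, record["rank"]))
    return {isbn: sorted(points) for isbn, points in series.items()}


def rank_change(points):
    """Positive result = the book climbed (its rank number got smaller)."""
    return points[0][1] - points[-1][1]


# Invented weekly ranks for one ISBN, deliberately out of order
records = [
    {"isbn": "9791190932264", "rank": 3, "scraped_at": "2026-02-23T06:00:00+09:00"},
    {"isbn": "9791190932264", "rank": 8, "scraped_at": "2026-02-16T06:00:00+09:00"},
    {"isbn": "9791190932264", "rank": 1, "scraped_at": "2026-03-02T06:00:00+09:00"},
]
trajectories = rank_trajectories(records)
for isbn, points in trajectories.items():
    ranks = [rank for _, rank in points]
    print(isbn, ranks, "change:", rank_change(points))
```

Sorting by the parsed timestamp rather than by arrival order means runs can be merged from multiple export files without worrying about their sequence.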

reviewCount: Proxy for commercial success. Korean readers are prolific reviewers — 1,000+ reviews typically indicates a sustained bestseller.

rating: YES24 ratings skew slightly higher than Western equivalents (4.0+ is strong, 4.5+ is excellent). Low rating with high rank = divisive/controversial book — often culturally significant.

publishedDate: Combined with rank data, reveals how quickly a book rose to the charts. A book published 3 years ago still in Top 20 = a cultural touchstone, not just a recent release.

description: Rich text source for NLP. Korean book descriptions tend to be more detailed than Western equivalents, often including chapter-level summaries.


Pricing

The YES24 Book Scraper charges $0.50 per 1,000 items.

Use Case | Estimated Items | Est. Cost
Single category snapshot | 100 books | ~$0.05
All 13 categories | 1,300 books | ~$0.65
Weekly monitoring (4 categories) | ~400/week | ~$0.20/week
12-month research dataset | ~25,000 records | ~$12.50
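
To budget a different workload, the per-item rate can be applied directly. This small sketch uses only the $0.50 per 1,000 items rate quoted above and assumes 100 books per category; any separate platform usage fees are not modeled.

```python
PRICE_PER_1000_ITEMS = 0.50  # USD, the per-item rate quoted above


def estimate_cost(items):
    """Per-item charge only; any platform usage fees are not included."""
    return round(items / 1000 * PRICE_PER_1000_ITEMS, 2)


workloads = [
    ("Single category snapshot", 100),
    ("All 13 categories", 13 * 100),
    ("Weekly monitoring, 4 categories", 4 * 100),
    ("12-month research dataset", 25_000),
]
for label, items in workloads:
    print(f"{label}: {items:,} items -> ${estimate_cost(items):.2f}")
```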

The Apify free tier ($5/month) covers continuous weekly monitoring of all 13 categories: at 100 books per category that is roughly 5,600 items per month, or about $2.80 at the per-item rate.


Conclusion

YES24's bestseller charts are one of the most accessible windows into Korean cultural consumption available anywhere. What Koreans choose to read — and which books sustain their chart positions — encodes signals about economic sentiment, social trends, and the early movements of ideas that often become global.

The YES24 Book Scraper makes this data accessible to anyone: publishers scouting for translation rights, researchers studying Korean cultural production, analysts tracking sentiment, or developers building publishing intelligence tools.

Whether you're:

  • Identifying Korean titles for international licensing before they appear on Western radar
  • Tracking economic sentiment through the lens of business and self-help reading patterns
  • Building a publishing analytics tool that includes Korea in its coverage
  • Researching K-content culture beyond film and music

...Korea's reading data is now a query away.

Get Started

👉 Try the YES24 Book Scraper: https://apify.com/oxygenated_quagmire/yes24-book-scraper

Free Apify account includes $5/month — extract your first bestseller chart within minutes.

Questions or feature requests? Leave a review on the actor page.


The author maintains a portfolio of Korean data infrastructure actors on Apify. All 12 actors available at: https://apify.com/oxygenated_quagmire


Tags: #Korea #WebScraping #YES24 #Korean #Books #Publishing #DataScience #Apify #Python #KCulture #LiteraryAgency

Suggested Publication: The Startup, Towards Data Science, Better Programming
