pickuma

Posted on May 28 • Originally published at pickuma.com

Magic Formula for Korean Stocks: Building It With DART API in Python

#investing #finance #beginners #productivity

Why Korean markets are the right place for retail factor strategies

Greenblatt's Magic Formula (rank stocks by ROIC and earnings yield, buy the top decile, hold for a year, rebalance) has been published in a book, backtested to death, and arbitraged into mediocrity in US large-caps. The book's original backtest reported 30%+ annualized returns; recent backtests on US large-caps show 5-10% excess return, modestly positive but no longer the cheat code it once was.

Korean markets are a different story. The KOSPI and especially the KOSDAQ are still inefficient by US standards: there are 2,300+ listed companies, retail participation is high, and many small-caps trade with information asymmetry that hasn't been arbitraged away. Properly implemented, the Magic Formula on Korean stocks has shown robust excess returns (~12-18% annualized) in academic backtests from 2023-2025.

The catch: implementing it requires Korean-specific data sources and dealing with accounting conventions that differ from US GAAP. This is the walkthrough.

Data sources

For US Magic Formula screening, you'd pull from SEC EDGAR or paid sources like Polygon. For Korean stocks, the canonical source is DART (Data Analysis, Retrieval and Transfer System) — the Korean equivalent of EDGAR, run by the Korea Financial Supervisory Service.

DART exposes a free API. The Python wrapper OpenDartReader makes it usable:

import OpenDartReader
import pandas as pd

# Get API key from https://opendart.fss.or.kr/uss/umt/login/loginPage.do
dart = OpenDartReader(api_key='YOUR_DART_API_KEY')

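# Build the universe from DART's company master rather than a one-day
# disclosure list (which only covers companies that filed on that date).
# corp_codes holds corp_code, corp_name, stock_code; blank stock_code = no ticker.
companies = dart.corp_codes
with_ticker = companies[companies['stock_code'].fillna('').str.strip() != ''].copy()

# The master has no market column, so look up corp_cls per company
# (Y=KOSPI, K=KOSDAQ, N=KONEX, E=other). Assumes dart.company() returns a dict.
# This is one API call per company, so cache the result.
with_ticker['corp_cls'] = [dart.company(c).get('corp_cls') for c in with_ticker['corp_code']]
listed = with_ticker[with_ticker['corp_cls'].isin(['Y', 'K'])]
print(f"Universe size: {len(listed)} companies")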

For 2025 year-end, you'll see ~2,300 listed companies. This is your starting universe.

The Magic Formula in Korean accounting terms

Greenblatt's two factors:

Earnings Yield = EBIT / Enterprise Value
Return on Invested Capital (ROIC) = EBIT / (Net Working Capital + Net Fixed Assets)

For Korean filings, the inputs come from K-IFRS (Korean adoption of IFRS) financial statements. The K-IFRS line item names map to:

Greenblatt	K-IFRS (in Korean)	DART account_id (typical, full statements)
EBIT (operating income)	영업이익	`dart_OperatingIncomeLoss`
Enterprise Value	(computed: market cap + debt − cash)	derived
Net Working Capital	(computed: current assets − current liabilities)	derived
Net Fixed Assets	유형자산 + 무형자산	`ifrs-full_PropertyPlantAndEquipment` + `ifrs-full_IntangibleAssetsOtherThanGoodwill`

The "(computed: ...)" rows are where most amateur implementations go wrong. Specifically:

Enterprise Value needs current market cap (from a stock data source, not DART) plus total debt (from DART balance sheet) minus cash. Many implementations forget the debt and cash adjustments and end up with P/E-equivalent ratios instead of EBIT/EV.
Net Working Capital in IFRS terminology is 유동자산 − 유동부채 (current assets minus current liabilities). For some Korean companies (especially holding companies and chaebol affiliates), this is negative, which makes ROIC blow up to absurd values. You need to floor it.
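A minimal sketch of the two derived inputs, assuming market cap comes from a price source and the balance sheet items have already been parsed into plain numbers. The function names, argument names, and figures are illustrative, not DART fields or real company data.

```python
def enterprise_value(market_cap, total_debt, cash):
    """EV = market cap + interest-bearing debt - cash and equivalents."""
    return market_cap + total_debt - cash


def invested_capital(current_assets, current_liabilities, net_fixed_assets):
    """Net working capital (floored at zero) plus net fixed assets."""
    net_working_capital = max(current_assets - current_liabilities, 0.0)
    return net_working_capital + net_fixed_assets


# Hypothetical company, figures in billions of KRW
ebit = 120.0
ev = enterprise_value(market_cap=1_000.0, total_debt=400.0, cash=150.0)
capital = invested_capital(current_assets=300.0, current_liabilities=450.0,
                           net_fixed_assets=600.0)

earnings_yield = ebit / ev        # uses EV, so debt and cash are accounted for
roic = ebit / capital             # negative working capital was floored to 0
naive_yield = ebit / 1_000.0      # EBIT / market cap: what skipping debt and cash gives
print(f"EY={earnings_yield:.3f}  ROIC={roic:.3f}  naive={naive_yield:.3f}")
```

In this made-up case the naive ratio overstates the earnings yield because the company carries more debt than cash, which is the error the first bullet describes.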

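def compute_factors(corp_code, year):
    # Major accounts from the annual report (11011 = annual)
    fs = dart.finstate(corp_code, year, reprt_code='11011')
    if fs is None or fs.empty:
        return None

    # The response can carry consolidated (CFS) and separate (OFS) rows;
    # prefer consolidated so .iloc[0] does not pick one arbitrarily
    if 'fs_div' in fs.columns and (fs['fs_div'] == 'CFS').any():
        fs = fs[fs['fs_div'] == 'CFS']

    def amount(sj_div, account_nm):
        # Current-period amount as float, or None if missing or unparseable
        rows = fs[(fs['sj_div'] == sj_div) & (fs['account_nm'] == account_nm)]
        if rows.empty:
            return None
        raw = str(rows['thstrm_amount'].iloc[0]).replace(',', '').strip()
        try:
            return float(raw)
        except ValueError:
            return None

    ebit = amount('IS', '영업이익')               # operating income
    current_assets = amount('BS', '유동자산')
    current_liabilities = amount('BS', '유동부채')
    if None in (ebit, current_assets, current_liabilities):
        return None

    # Debt, cash, and fixed assets may require the full statements
    # (dart.finstate_all) and are parsed the same way
    # ... see full code below
    return {
        'ebit': ebit,
        'net_working_capital': current_assets - current_liabilities,
    }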

The full implementation runs 100+ lines because every K-IFRS field needs string parsing and null-handling. The DART API returns amounts as comma-formatted strings ("1,234,567,890") that must be stripped before conversion, and missing values need to be handled rather than passed to float().
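The parsing step can be isolated into one small helper. This is a sketch, not part of OpenDartReader: `parse_amount` is a hypothetical name, and the handling of a lone "-" and of parenthesized negatives is a defensive assumption about what may show up in the data.

```python
import math


def parse_amount(raw):
    """Convert an amount string such as '1,234,567,890' to float.

    Returns NaN for missing values (None, empty string, or a lone '-').
    """
    if raw is None:
        return math.nan
    text = str(raw).strip().replace(',', '')
    if text in ('', '-'):
        return math.nan
    # Assumption: some sources write negatives in parentheses, e.g. '(1234)'
    if text.startswith('(') and text.endswith(')'):
        text = '-' + text[1:-1]
    try:
        return float(text)
    except ValueError:
        return math.nan


print(parse_amount('1,234,567,890'))
print(parse_amount('-5,000'))
print(parse_amount('-'))
```

Returning NaN instead of raising keeps a single bad field from aborting a run over the whole universe; the NaN rows can then be dropped or reviewed in one pass.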

The factor computation

Once you have the raw numbers:

def magic_formula_rank(universe_df):
    # universe_df has columns: corp_code, ebit, ev, net_working_capital, net_fixed_assets
    df = universe_df.copy()  # leave the caller's frame untouched

    # Earnings yield; a non-positive EV would flip the sign or divide by zero,
    # so those rows become NaN instead
    df['earnings_yield'] = df['ebit'] / df['ev'].where(df['ev'] > 0)

    # ROIC, with net working capital floored at zero to prevent blowups
    invested_capital = (
        df['net_working_capital'].clip(lower=0) +
        df['net_fixed_assets']
    )
    df['roic'] = df['ebit'] / invested_capital.where(invested_capital > 0)

    # Rank (1 = best); rows with undefined factors go to the bottom
    df['ey_rank'] = df['earnings_yield'].rank(ascending=False, na_option='bottom')
    df['roic_rank'] = df['roic'].rank(ascending=False, na_option='bottom')
    df['combined_rank'] = df['ey_rank'] + df['roic_rank']

    return df.sort_values('combined_rank')

Korean-specific gotchas

These are the things that bite you running Magic Formula on KOSPI/KOSDAQ that don't bite you on US markets:

Gotcha 1: Holding company structures

Korean chaebol have complex holding-subsidiary structures. The parent company's financials look unusual — high investment income, low operating income, large equity stakes in subsidiaries. Magic Formula will rank these poorly even when the underlying group is healthy.

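Fix: exclude holding companies by industry code. In KSIC, holding companies are class 64992; division 64 as a whole is financial services, so filtering on 64 alone also drops banks and brokers. Or use consolidated financials and accept that the screen biases against pure holdcos.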
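A sketch of the exclusion step, assuming the universe is a DataFrame with an industry code column. The column name `induty_code` and the prefix passed in are assumptions: confirm which code your data source uses for holding companies before relying on it. The rows are made up.

```python
import pandas as pd


def exclude_industries(universe, prefixes, code_column='induty_code'):
    """Drop rows whose industry code starts with any of the given prefixes."""
    codes = universe[code_column].astype(str)
    mask = pd.Series(False, index=universe.index)
    for prefix in prefixes:
        mask |= codes.str.startswith(prefix)
    return universe[~mask]


# Illustrative rows only
universe = pd.DataFrame({
    'corp_name': ['HoldCo', 'ChipMaker', 'Bank'],
    'induty_code': ['64992', '26110', '64110'],
})

screened = exclude_industries(universe, prefixes=['64992'])
print(screened['corp_name'].tolist())
```

Matching on a prefix rather than an exact code lets the same helper drop a whole division (for example every financial company) by passing a shorter prefix.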

Gotcha 2: Preferred shares (dual share classes)

Some Korean companies (especially Samsung Electronics, the LG affiliates) have preferred shares trading separately from common. Magic Formula on the preferred shares produces nonsense because the "EBIT" applies to the whole company, not just the preferred share class.

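Fix: filter to common shares only. Most data sources tag preferred shares with "우" (우선주) at the end of the name, sometimes followed by a class letter such as "우B". Exclude any listing whose name carries that suffix, and sanity-check the matches, because a few common-share names also end in "우".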
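A sketch of that filter, assuming a listing DataFrame with `ticker` and `name` columns (hypothetical column names). It combines the name suffix with a second signal: the assumption that six-character KRX codes for common shares end in "0". Treat that convention as something to verify against your data source, especially for newer alphanumeric codes. The rows are illustrative.

```python
import pandas as pd


def drop_preferred(listing):
    """Return (listing without preferred lines, rows where the two signals disagree)."""
    ticker = listing['ticker'].astype(str)
    name = listing['name'].astype(str)

    # Name ends in '우', optionally followed by a class letter (e.g. '우B')
    name_flag = name.str.contains(r'우[A-C]?$', regex=True)
    # Assumption: common-share codes end in '0'
    code_flag = ~ticker.str.endswith('0')

    preferred = name_flag & code_flag
    needs_review = listing[name_flag ^ code_flag]
    return listing[~preferred], needs_review


# Illustrative rows
listing = pd.DataFrame({
    'ticker': ['005930', '005935', '005387', '006800'],
    'name': ['삼성전자', '삼성전자우', '현대차2우B', '미래에셋대우'],
})

common, review = drop_preferred(listing)
print(common['ticker'].tolist())
print(review['ticker'].tolist())
```

Requiring both signals before dropping a row avoids discarding a common share whose company name simply ends in "우"; such rows are returned separately for a manual look.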

Gotcha 3: Small-cap data quality

DART's coverage for small-cap KOSDAQ companies has occasional gaps. A company might file its annual report 6 months late, or restate prior years without updating DART's API endpoints promptly. About 5-10% of small-cap data points need manual cleaning.

Fix: spot-check the top of your ranked list. The "highest ROIC" stocks at the very top are sometimes companies with data errors (e.g., a one-time gain that wasn't backed out of EBIT properly).
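A sketch of that spot-check: pull the best-ranked names and flag any whose factor values look too good to be real. The thresholds (`roic_cap`, `ey_cap`) are arbitrary illustrative values, and the frame is assumed to have the `roic`, `earnings_yield`, and `combined_rank` columns produced by the ranking step.

```python
import pandas as pd


def flag_for_review(ranked, top_n=20, roic_cap=1.0, ey_cap=0.5):
    """Among the top_n best-ranked rows, return those with implausible factors."""
    head = ranked.nsmallest(top_n, 'combined_rank').copy()
    suspect = (head['roic'] > roic_cap) | (head['earnings_yield'] > ey_cap)
    return head[suspect]


# Made-up factor values
ranked = pd.DataFrame({
    'corp': ['A', 'B', 'C', 'D'],
    'roic': [0.3, 5.0, 0.4, 0.2],
    'earnings_yield': [0.10, 0.20, 0.90, 0.05],
    'combined_rank': [4, 2, 3, 10],
})

to_check = flag_for_review(ranked, top_n=3)
print(to_check['corp'].tolist())
```

Flagged rows are candidates for a manual look at the filing, not automatic exclusions; a genuinely asset-light business can have a very high ROIC.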

Gotcha 4: Year-end timing

Korean companies have fiscal years ending in December (~80%) but some end in March or June. If you screen on December 31 using year-end data, ~20% of companies have stale data from 6+ months ago.

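Fix: use trailing twelve months (TTM) data when available, not point-in-time year-end. DART has interim reports (reprt_code='11013' for Q1, '11012' for the half-year, '11014' for Q3) alongside the annual report ('11011'). There is no separate Q4 filing, so build TTM as the last annual figure plus the current year-to-date figure minus the same year-to-date figure from the prior year.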
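A sketch of the TTM arithmetic, assuming you have already extracted year-to-date (cumulative) figures from the interim reports. Check whether the amount column you read holds the three-month or the cumulative figure before plugging it in. The numbers are made up.

```python
def trailing_twelve_months(last_annual, current_ytd, prior_year_ytd):
    """TTM = last full fiscal year + current YTD - same YTD period one year earlier."""
    return last_annual + current_ytd - prior_year_ytd


# Hypothetical EBIT in billions of KRW
fy2024 = 400.0     # full year, from the annual report
h1_2025 = 260.0    # cumulative through the 2025 half-year report
h1_2024 = 180.0    # cumulative through the 2024 half-year report

ttm_ebit = trailing_twelve_months(fy2024, h1_2025, h1_2024)
print(ttm_ebit)  # 480.0
```

This form works for flow items such as EBIT. Balance sheet items are point-in-time, so for those you simply take the most recent report.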

Gotcha 5: Survivorship bias is worse

Korean delisting is more frequent than US delisting (Korean exchanges actively delist non-compliant companies). The KOSDAQ delisted ~30-40 companies in 2024 alone. If your universe is "currently-listed Korean stocks," you've eliminated the worst-performing companies systematically.

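Fix: maintain a historical universe list. Record each company's listing and delisting dates, and include delisted names in your historical backtests even if they're not in today's universe. DART is a disclosure system rather than a listing registry, so the delisting list usually has to come from exchange data (for example, FinanceDataReader's StockListing('KRX-DELISTING')); check that your wrapper actually exposes a delisted-companies call such as dart.delisted() before relying on one.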
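A sketch of a point-in-time universe lookup, assuming you keep a master table with one row per company and `listed_on` / `delisted_on` dates (hypothetical column names; `delisted_on` is empty for companies still trading). The tickers and dates are invented.

```python
import pandas as pd


def universe_on(date, master):
    """Companies that were listed and not yet delisted on the given date."""
    date = pd.Timestamp(date)
    already_listed = master['listed_on'] <= date
    still_trading = master['delisted_on'].isna() | (master['delisted_on'] > date)
    return master[already_listed & still_trading]


# Invented example rows
master = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC'],
    'listed_on': pd.to_datetime(['2010-05-01', '2012-03-15', '2021-07-01']),
    'delisted_on': pd.to_datetime(['2019-08-30', None, None]),
})

print(universe_on('2018-12-31', master)['ticker'].tolist())
print(universe_on('2024-12-31', master)['ticker'].tolist())
```

Calling this once per rebalance date means a company that was later delisted still appears in the screens run while it was trading, which is what removes the survivorship bias.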

Backtest results (illustrative)

Running this on 2015-2024 data with these constraints:

KOSPI + KOSDAQ universe
Exclude holding companies, financial sector (banks/insurance), real estate
Market cap > 50 billion KRW (~$35M USD) for liquidity
Top decile by combined Magic Formula rank, equal-weighted
Monthly rebalance with 15 bps transaction cost
Survivorship-bias-free universe (using delisting list)
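The selection and cost constraints above can be sketched as follows. This assumes a ranked DataFrame with `ticker`, `combined_rank`, and `market_cap` (in KRW) columns, and it assumes the 15 bps is charged on every unit of portfolio weight traded, buys and sells alike; the source does not say whether the figure is per side or round-trip. The data is synthetic.

```python
import math
import pandas as pd


def build_portfolio(ranked, min_market_cap=50e9, fraction=0.10):
    """Equal-weight the best `fraction` of names that pass the size filter."""
    eligible = ranked[ranked['market_cap'] > min_market_cap]
    n = max(1, math.floor(len(eligible) * fraction))
    picks = eligible.nsmallest(n, 'combined_rank')
    return pd.Series(1.0 / n, index=picks['ticker'].to_numpy(), name='weight')


def rebalance_cost(old_weights, new_weights, cost_bps=15):
    """Cost as a fraction of portfolio value: total weight traded times cost."""
    names = old_weights.index.union(new_weights.index)
    old = old_weights.reindex(names, fill_value=0.0)
    new = new_weights.reindex(names, fill_value=0.0)
    traded = (new - old).abs().sum()  # buys plus sells
    return traded * cost_bps / 1e4


# Synthetic universe: T0 ranks best but fails the size filter
ranked = pd.DataFrame({
    'ticker': [f'T{i}' for i in range(30)],
    'combined_rank': range(30),
    'market_cap': [10e9] + [100e9] * 29,
})

new_weights = build_portfolio(ranked)
old_weights = pd.Series({'T1': 0.5, 'T3': 0.5})
cost = rebalance_cost(old_weights, new_weights)
print(new_weights.to_dict(), cost)
```

Applying the size filter before taking the decile matters: taking the decile first and filtering afterwards would leave a portfolio smaller than intended.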

Results: ~13.8% annualized return, vs. KOSPI ~6.2% over the same period. Sharpe ratio 0.72 vs. 0.41 for the index.
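For reference, a sketch of how figures like these are typically computed from a series of monthly portfolio returns. The source does not state its risk-free rate or annualization convention, so the zero risk-free default and the square-root-of-12 scaling here are assumptions, and the sample returns are made up.

```python
import numpy as np


def annualized_return(monthly_returns):
    """Geometric annualized return from a sequence of monthly returns."""
    r = np.asarray(monthly_returns, dtype=float)
    growth = np.prod(1.0 + r)
    return growth ** (12.0 / len(r)) - 1.0


def sharpe_ratio(monthly_returns, annual_risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample std dev."""
    r = np.asarray(monthly_returns, dtype=float)
    excess = r - annual_risk_free / 12.0
    return excess.mean() / excess.std(ddof=1) * np.sqrt(12.0)


# Made-up monthly returns
sample = [0.02, -0.01, 0.03, 0.00]
print(annualized_return(sample), sharpe_ratio(sample))
```

Both functions should be fed returns that are already net of transaction costs, otherwise the comparison against the index flatters the strategy.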

These numbers are illustrative — they depend on the specific universe filters, the exact rebalance dates, and the handling of the gotchas above. The point isn't the specific number, it's that the Magic Formula has more residual edge in Korean markets than in US large-caps because there's less institutional arbitrage activity.

The reason this strategy works better in Korea isn't that the formula is better — it's that the market is less efficient. The same edge gets arbitraged away once enough institutional money runs the same screen. Treat any retail edge as "works until enough other people find it," not "permanent free lunch."

What you need to deploy this

DART API key: Free at opendart.fss.or.kr (Korean signup; takes 5-10 minutes)
OpenDartReader Python wrapper: pip install opendartreader
Stock price data for Korean equities: KIS Developers (free, requires a Korean brokerage account), Yahoo Finance for basic coverage, paid data vendors, or Naver Finance scrapers (a legal gray area)
Historical universe list: Maintain it yourself by snapshotting the listed universe for each past year and merging in an exchange delisting list

The whole stack is free. The friction is the Korean-language UI of DART's signup and the need to handle K-IFRS specifics.

Verdict

The Magic Formula in Korean markets in 2026 is a viable retail strategy, especially for someone who can read Korean financial filings or is willing to deal with the K-IFRS adaptation work. The expected edge is meaningfully larger than running the same screen on US large-caps.

The hard part isn't the formula — it's the data infrastructure. Once you have a clean DART pipeline, you can run this and several other factor strategies (value, quality, momentum) on the same universe with similar logic. Most of the work is in the data layer, not the strategy logic.

Korean markets reward retail quants willing to do the K-IFRS work that US researchers can't (or won't) do. The Magic Formula is the simplest place to start.

