Most Developers Don't Know This Exists
Wikipedia exposes both a REST API and the MediaWiki Action API, letting you programmatically access 60M+ articles across 300+ languages. No API key required, and no hard rate limits as long as you use it politely.
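"Polite use" mostly means identifying yourself: Wikimedia asks API clients to send a descriptive User-Agent header with a way to contact you. A minimal sketch — the tool name and contact details below are placeholders you'd replace with your own:

```python
import requests

# Wikimedia's User-Agent policy asks clients to identify themselves:
# tool name, version, and a contact URL or email.
session = requests.Session()
session.headers["User-Agent"] = (
    "MyWikiTool/1.0 (https://example.com/contact; me@example.com)"
)

# Reuse the session for every call so the header (and connection
# pooling) applies to all your requests, e.g.:
# r = session.get("https://en.wikipedia.org/api/rest_v1/page/summary/Earth")
```

Using a `Session` also keeps connections alive, which matters once you start making many small requests.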
Get Any Article's Summary
import requests

def get_summary(title, lang="en"):
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"
    r = requests.get(url)
    data = r.json()
    return {
        "title": data["title"],
        "extract": data["extract"],
        "thumbnail": data.get("thumbnail", {}).get("source"),
        "url": data["content_urls"]["desktop"]["page"],
    }

info = get_summary("Python_(programming_language)")
print(info["extract"][:200])
Search Articles
def search_wiki(query, limit=5):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    r = requests.get(url, params=params)
    return [{"title": s["title"], "snippet": s["snippet"]}
            for s in r.json()["query"]["search"]]

results = search_wiki("web scraping")
for r in results:
    print(r["title"])
Get Full Article Content (HTML or Wikitext)
def get_full_article(title):
    url = f"https://en.wikipedia.org/api/rest_v1/page/html/{title}"
    r = requests.get(url)
    return r.text  # Full HTML

# Or plain text via the MediaWiki API:
def get_plain_text(title):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "titles": title,
        "prop": "extracts",
        "explaintext": True,
        "format": "json",
    }
    r = requests.get(url, params=params)
    pages = r.json()["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("extract", "")
Real Use Cases
- Chatbot knowledge base — pull verified facts instead of hallucinating
- Content enrichment — auto-add context to any entity mention
- Language learning apps — get articles in any of 300+ languages
- Data pipelines — extract structured data from infoboxes
- Trivia/quiz apps — random article endpoint for endless questions
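On the infobox point: the APIs hand you wikitext, not pre-parsed infobox fields, so you still have to pull the `| key = value` pairs out yourself (or use a dedicated parser like mwparserfromhell). A naive sketch that handles flat, single-level infoboxes — nested templates will defeat this regex, so treat it as a starting point:

```python
import re

def get_wikitext(title):
    """Fetch raw wikitext via the MediaWiki action API."""
    import requests  # imported here so the parsing helper works without it
    r = requests.get("https://en.wikipedia.org/w/api.php", params={
        "action": "parse", "page": title,
        "prop": "wikitext", "format": "json",
    })
    return r.json()["parse"]["wikitext"]["*"]

def parse_infobox_fields(wikitext):
    """Naively extract top-level `| key = value` lines.
    Good enough for flat fields; nested templates need a real parser."""
    fields = {}
    for m in re.finditer(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, re.M):
        fields[m.group(1)] = m.group(2)
    return fields

sample = """{{Infobox programming language
| name = Python
| designer = Guido van Rossum
| released = 1991
}}"""
print(parse_infobox_fields(sample)["designer"])  # Guido van Rossum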
Random Article
def random_article():
    url = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
    r = requests.get(url)
    data = r.json()
    return f"{data['title']}: {data['extract'][:100]}..."

print(random_article())
Get Article in Any Language
# Same article in Japanese
info_ja = get_summary("Python_(プログラミング言語)", lang="ja")

# Or find all language versions of an article
def get_languages(title):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllimit": 500,
        "format": "json",
    }
    r = requests.get(url, params=params)
    pages = r.json()["query"]["pages"]
    page = next(iter(pages.values()))
    return [(l["lang"], l["*"]) for l in page.get("langlinks", [])]
Why Use Wikipedia API vs Scraping?
- Stable endpoints — won't break like HTML scraping
- Structured data — JSON responses, not HTML parsing
- Legal — explicitly allowed by Wikipedia's terms
- Fast — cached CDN responses, typically <100ms
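One more advantage over scraping: result sets page cleanly. The action API returns a `continue` object that you merge back into your params to fetch the next batch. A sketch of that loop — the injectable `fetch` parameter is my addition so the paging logic can be exercised without hitting the network:

```python
def iter_search(query, fetch=None):
    """Yield every search hit, following the API's `continue` tokens.
    `fetch` takes a params dict and returns the decoded JSON response;
    by default it makes a live call against en.wikipedia.org."""
    if fetch is None:
        import requests  # only needed for the live default
        fetch = lambda params: requests.get(
            "https://en.wikipedia.org/w/api.php", params=params).json()
    params = {"action": "query", "list": "search",
              "srsearch": query, "format": "json"}
    while True:
        data = fetch(params)
        yield from data["query"]["search"]
        if "continue" not in data:
            break
        # Merge the server's continuation tokens, e.g. {"sroffset": 10, ...}
        params.update(data["continue"])
```

Because the generator stops as soon as the response omits `continue`, you can safely `itertools.islice` it or just break early once you have enough hits.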
Building data tools? Check my GitHub for 300+ repos with API tutorials and automation scripts.
More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs