Most Developers Don't Know This Exists
Wikipedia exposes both a REST API and the MediaWiki Action API, letting you programmatically access 60M+ articles across 300+ languages. No API key required, and no hard rate limits as long as you use it politely.
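"Polite use" mostly means identifying yourself: Wikimedia asks API clients to send a descriptive User-Agent header with a way to contact you. A minimal sketch — the tool name and contact details below are placeholders you'd replace with your own:

```python
import requests

# Wikimedia's User-Agent policy asks clients to identify themselves:
# tool name, version, and a contact URL or email.
session = requests.Session()
session.headers["User-Agent"] = (
    "MyWikiTool/1.0 (https://example.com/contact; me@example.com)"
)

# Reuse the session for every call so the header (and connection
# pooling) applies to all your requests, e.g.:
# r = session.get("https://en.wikipedia.org/api/rest_v1/page/summary/Earth")
```

Using a `Session` also keeps connections alive, which matters once you start making many small requests.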
Get Any Article's Summary
import requests

def get_summary(title, lang="en"):
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"
    r = requests.get(url)
    data = r.json()
    return {
        "title": data["title"],
        "extract": data["extract"],
        "thumbnail": data.get("thumbnail", {}).get("source"),
        "url": data["content_urls"]["desktop"]["page"],
    }

info = get_summary("Python_(programming_language)")
print(info["extract"][:200])
Search Articles
def search_wiki(query, limit=5):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    r = requests.get(url, params=params)
    return [{"title": s["title"], "snippet": s["snippet"]}
            for s in r.json()["query"]["search"]]

results = search_wiki("web scraping")
for r in results:
    print(r["title"])
Get Full Article Content (HTML or Wikitext)
def get_full_article(title):
    url = f"https://en.wikipedia.org/api/rest_v1/page/html/{title}"
    r = requests.get(url)
    return r.text  # Full HTML

# Or plain text via the MediaWiki API:
def get_plain_text(title):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "titles": title,
        "prop": "extracts",
        "explaintext": True,
        "format": "json",
    }
    r = requests.get(url, params=params)
    pages = r.json()["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("extract", "")
Real Use Cases
- Chatbot knowledge base — pull verified facts instead of hallucinating
- Content enrichment — auto-add context to any entity mention
- Language learning apps — get articles in any of 300+ languages
- Data pipelines — extract structured data from infoboxes
- Trivia/quiz apps — random article endpoint for endless questions
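On the infobox point: the APIs hand you wikitext, not pre-parsed infobox fields, so you still have to pull the `| key = value` pairs out yourself (or use a dedicated parser like mwparserfromhell). A naive sketch that handles flat, single-level infoboxes — nested templates will defeat this regex, so treat it as a starting point:

```python
import re

def get_wikitext(title):
    """Fetch raw wikitext via the MediaWiki action API."""
    import requests  # imported here so the parsing helper works without it
    r = requests.get("https://en.wikipedia.org/w/api.php", params={
        "action": "parse", "page": title,
        "prop": "wikitext", "format": "json",
    })
    return r.json()["parse"]["wikitext"]["*"]

def parse_infobox_fields(wikitext):
    """Naively extract top-level `| key = value` lines.
    Good enough for flat fields; nested templates need a real parser."""
    fields = {}
    for m in re.finditer(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, re.M):
        fields[m.group(1)] = m.group(2)
    return fields

sample = """{{Infobox programming language
| name = Python
| designer = Guido van Rossum
| released = 1991
}}"""
print(parse_infobox_fields(sample)["designer"])  # Guido van Rossum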
Random Article
def random_article():
    url = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
    r = requests.get(url)
    data = r.json()
    return f"{data['title']}: {data['extract'][:100]}..."

print(random_article())
Get Article in Any Language
# Same article in Japanese
info_ja = get_summary("Python_(プログラミング言語)", lang="ja")

# Or find all language versions of an article
def get_languages(title):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllimit": 500,
        "format": "json",
    }
    r = requests.get(url, params=params)
    pages = r.json()["query"]["pages"]
    page = next(iter(pages.values()))
    return [(l["lang"], l["*"]) for l in page.get("langlinks", [])]
Why Use Wikipedia API vs Scraping?
- Stable endpoints — won't break like HTML scraping
- Structured data — JSON responses, not HTML parsing
- Legal — explicitly allowed by Wikipedia's terms
- Fast — cached CDN responses, typically <100ms
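One more advantage over scraping: result sets page cleanly. The action API returns a `continue` object that you merge back into your params to fetch the next batch. A sketch of that loop — the injectable `fetch` parameter is my addition so the paging logic can be exercised without hitting the network:

```python
def iter_search(query, fetch=None):
    """Yield every search hit, following the API's `continue` tokens.
    `fetch` takes a params dict and returns the decoded JSON response;
    by default it makes a live call against en.wikipedia.org."""
    if fetch is None:
        import requests  # only needed for the live default
        fetch = lambda params: requests.get(
            "https://en.wikipedia.org/w/api.php", params=params).json()
    params = {"action": "query", "list": "search",
              "srsearch": query, "format": "json"}
    while True:
        data = fetch(params)
        yield from data["query"]["search"]
        if "continue" not in data:
            break
        # Merge the server's continuation tokens, e.g. {"sroffset": 10, ...}
        params.update(data["continue"])
```

Because the generator stops as soon as the response omits `continue`, you can safely `itertools.islice` it or just break early once you have enough hits.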
Building data tools? Check my GitHub for 300+ repos with API tutorials and automation scripts.
More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs