Wikipedia contains millions of interconnected articles — a perfect foundation for building knowledge graphs. In this tutorial, we extract entities and relationships to construct a navigable knowledge graph.
Why Knowledge Graphs?
Knowledge graphs power Google's answer boxes, recommendation engines, and AI assistants. Building one from Wikipedia creates a structured dataset mapping how concepts relate — far more useful than raw text.
Setup
import requests
from bs4 import BeautifulSoup
import networkx as nx
import json, time
class WikiKnowledgeGraph:
    """Builds a directed knowledge graph from Wikipedia articles."""

    def __init__(self):
        # Directed graph: edges point from a source article to a linked entity.
        self.graph = nx.DiGraph()
        # Titles already crawled — prevents re-fetching and infinite cycles.
        self.visited = set()
def fetch_page(self, title):
    """Fetch parse data (HTML, links, categories) for one Wikipedia article.

    Returns the decoded JSON dict on HTTP 200, or None on a non-200
    status or any network failure.
    """
    try:
        resp = requests.get(
            "https://en.wikipedia.org/w/api.php",
            params={
                "action": "parse",
                "page": title,
                "format": "json",
                "prop": "text|links|categories",
            },
            # Wikimedia API etiquette: identify the client.
            headers={"User-Agent": "WikiKnowledgeGraph/1.0"},
            # Original had no timeout — a stalled connection would hang forever.
            timeout=10,
        )
    except requests.RequestException:
        # Treat connection errors/timeouts like a failed fetch, not a crash.
        return None
    return resp.json() if resp.status_code == 200 else None
Extracting Entities and Relationships
def extract_relationships(self, title, depth=2):
    """Recursively crawl *title*, adding nodes and edges to self.graph.

    depth is the number of remaining recursion levels; 0 stops the crawl.
    Pages already in self.visited are skipped.
    """
    if title in self.visited or depth == 0:
        return
    self.visited.add(title)
    data = self.fetch_page(title)
    if not data or "parse" not in data:
        return
    parse = data["parse"]
    # Node description: first non-empty paragraph, truncated to 200 chars.
    # .get chains guard against responses missing "text"/"links" (the
    # original indexed them directly and raised KeyError).
    html = parse.get("text", {}).get("*", "")
    soup = BeautifulSoup(html, "html.parser")
    first_para = soup.find("p", class_=lambda x: x != "mw-empty-elt")
    desc = first_para.get_text().strip()[:200] if first_para else ""
    self.graph.add_node(title, description=desc)
    links = parse.get("links", [])
    for link in links:
        # ns == 0 restricts to main-namespace articles; the API marks an
        # existing target page with an empty-string "exists" attribute.
        if link.get("ns") == 0 and link.get("exists") == "":
            self.graph.add_edge(title, link["*"], type="links_to")
    # First five categories, skipping maintenance ones ("Articles with ...").
    for cat in parse.get("categories", [])[:5]:
        name = cat["*"].replace("_", " ")
        if not name.startswith("Articles"):
            self.graph.add_node(name, type="category")
            self.graph.add_edge(title, name, type="belongs_to")
    # Recurse into the first three article links, rate-limited to be polite.
    for link in links[:3]:
        if link.get("ns") == 0:
            time.sleep(0.5)
            self.extract_relationships(link["*"], depth - 1)
Infobox Parser
def parse_infobox(self, soup, title="entity"):
    """Extract key/value pairs from the page's infobox table.

    Also adds a typed edge from *title* to every internal wiki link found
    in an infobox value, using the lowercased row label as the edge type.

    Bug fix: the original hard-coded the edge source as the literal string
    "entity", so every infobox relationship across all pages collapsed
    onto one bogus node. Pass the real page title to attach edges to the
    right node; the default preserves the old behavior for existing callers.
    """
    infobox = soup.find("table", class_="infobox")
    if not infobox:
        return {}
    data = {}
    for row in infobox.find_all("tr"):
        th, td = row.find("th"), row.find("td")
        if th and td:
            key = th.get_text().strip()  # hoisted: was recomputed per link
            data[key] = td.get_text().strip()
            for a in td.find_all("a", href=True):
                if a["href"].startswith("/wiki/"):
                    self.graph.add_edge(
                        title,
                        a["href"].replace("/wiki/", "").replace("_", " "),
                        type=key.lower(),
                    )
    return data
Querying the Graph
def find_path(self, source, target):
    """Shortest directed path from source to target, or None if unreachable.

    nx.shortest_path raises NodeNotFound — not NetworkXNoPath — when either
    endpoint is missing from the graph; the original only caught the latter,
    so querying an unknown entity crashed. Catch both so it returns None.
    """
    try:
        return nx.shortest_path(self.graph, source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None
def get_related(self, entity, depth=2):
    """Return every node within *depth* hops of *entity*.

    Unknown entities yield an empty list rather than raising.
    """
    if entity not in self.graph:
        return []
    neighborhood = nx.ego_graph(self.graph, entity, radius=depth)
    return list(neighborhood.nodes())
def export(self, filename="wiki_kg.json"):
    """Serialize the graph to *filename* in NetworkX node-link JSON format."""
    payload = nx.node_link_data(self.graph)
    with open(filename, "w") as out:
        json.dump(payload, out, indent=2)
    node_count = self.graph.number_of_nodes()
    edge_count = self.graph.number_of_edges()
    print(f"Exported {node_count} nodes, {edge_count} edges")
def main():
    """Crawl three seed articles and export the resulting knowledge graph."""
    kg = WikiKnowledgeGraph()
    for seed in ["Machine learning", "Neural network", "Python (programming language)"]:
        kg.extract_relationships(seed, depth=2)
        time.sleep(1)  # be polite between seed crawls
    kg.export()


# Guard the entry point so importing this module doesn't kick off a
# multi-minute network crawl as a side effect.
if __name__ == "__main__":
    main()
Scaling
For large-scale extraction, ScraperAPI handles IP rotation and automatic retries, ThorData provides residential proxies, and ScrapeOps monitors pipeline health.
What You Can Build
- AI chatbot knowledge base grounded in structured facts
- Research tools mapping connections between concepts
- Content recommendation via graph proximity
- Fact verification against structured Wikipedia data
Top comments (0)