DEV Community

agenthustler


How to Build a Wikipedia Knowledge Graph with Python

Wikipedia contains millions of interconnected articles — a perfect foundation for building knowledge graphs. In this tutorial, we extract entities and relationships to construct a navigable knowledge graph.

Why Knowledge Graphs?

Knowledge graphs power Google answer boxes, recommendation engines, and AI assistants. Building one from Wikipedia creates a structured dataset mapping how concepts relate — far more useful than raw text.

Setup

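You'll need networkx for the graph and requests for the MediaWiki API. Here's a minimal skeleton to build on (a sketch: the `WikiKGTutorial` User-Agent string and contact address are placeholders; Wikipedia asks API clients to send something descriptive of their own):

```python
# pip install networkx requests

import json
import time

import networkx as nx
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"
# Wikipedia asks API clients to identify themselves with a descriptive User-Agent.
HEADERS = {"User-Agent": "WikiKGTutorial/0.1 (you@example.com)"}


class WikiKnowledgeGraph:
    def __init__(self):
        # Directed graph: an edge A -> B means article A links to article B.
        self.graph = nx.DiGraph()
        self.session = requests.Session()
        self.session.headers.update(HEADERS)
```

The `json` and `time` imports aren't used yet; they come into play when exporting the graph and rate-limiting the crawl.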

Extracting Entities and Relationships

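Here's one way the extraction layer can look, using the MediaWiki `prop=links` API (a sketch: `fetch_links` and `extract_link_titles` are illustrative names, and `extract_relationships` is written as a standalone function; inside the `WikiKnowledgeGraph` class you'd drop the `graph` parameter and write to `self.graph` instead):

```python
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"


def extract_link_titles(payload):
    """Pull article titles out of a MediaWiki query/links JSON response."""
    titles = []
    for page in payload.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


def fetch_links(title, limit=50):
    """Fetch up to `limit` outgoing article links for one Wikipedia page."""
    resp = requests.get(WIKI_API, params={
        "action": "query",
        "titles": title,
        "prop": "links",
        "plnamespace": 0,   # article namespace only: skip Talk:, File:, etc.
        "pllimit": limit,
        "format": "json",
    }, timeout=10)
    resp.raise_for_status()
    return extract_link_titles(resp.json())


def extract_relationships(graph, title, depth=1):
    """Add links_to edges from `title`, recursing `depth` levels outward."""
    if depth <= 0:
        return
    for linked in fetch_links(title):
        graph.add_edge(title, linked, relation="links_to")
        extract_relationships(graph, linked, depth - 1)
```

A production crawl would also track already-visited pages (this naive recursion revisits them) and pause between requests to stay within Wikipedia's rate limits.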

Infobox Parser

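Infoboxes live in a page's raw wikitext as `{{Infobox ...}}` templates, so one approach is to fetch the wikitext and scan it line by line for `| key = value` fields. This is a deliberately naive sketch (real infoboxes nest templates and span multi-line values, so a dedicated parser like mwparserfromhell is sturdier; `fetch_wikitext` and `parse_infobox` are illustrative names):

```python
import re

import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"


def fetch_wikitext(title):
    """Fetch the raw wikitext of an article's latest revision."""
    resp = requests.get(WIKI_API, params={
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "formatversion": 2,
        "format": "json",
    }, timeout=10)
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    return pages[0]["revisions"][0]["slots"]["main"]["content"]


def parse_infobox(wikitext):
    """Extract flat `| key = value` fields from the first infobox template."""
    match = re.search(r"\{\{Infobox", wikitext)
    if not match:
        return {}
    fields = {}
    for line in wikitext[match.start():].splitlines():
        m = re.match(r"\|\s*([\w ]+?)\s*=\s*(.+)", line)
        if m:
            # Values keep their wiki markup ([[links]], {{templates}}, ...).
            fields[m.group(1).strip()] = m.group(2).strip()
        if line.strip() == "}}":
            break
    return fields
```

Each parsed field is a candidate edge: for a person's infobox, `birth_place` can become a `born_in` relationship between the article and the place it names.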

Querying the Graph

    def find_path(self, source, target):
        """Shortest path from source to target, or None if no path exists."""
        try:
            return nx.shortest_path(self.graph, source, target)
        except nx.NetworkXNoPath:
            return None

    def get_related(self, entity, depth=2):
        """Every entity within `depth` hops of the given entity."""
        if entity not in self.graph:
            return []
        return list(nx.ego_graph(self.graph, entity, radius=depth).nodes())

    def export(self, filename="wiki_kg.json"):
        """Write the graph to disk in node-link JSON format."""
        with open(filename, "w") as f:
            json.dump(nx.node_link_data(self.graph), f, indent=2)
        print(f"Exported {self.graph.number_of_nodes()} nodes, "
              f"{self.graph.number_of_edges()} edges")

kg = WikiKnowledgeGraph()
for seed in ["Machine learning", "Neural network", "Python (programming language)"]:
    kg.extract_relationships(seed, depth=2)
    time.sleep(1)  # pause between seeds to stay polite to Wikipedia's API
kg.export()
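You can see what the query helpers do without hitting the network at all, since they're thin wrappers over networkx. Here the hand-built toy graph stands in for a crawled slice of Wikipedia:

```python
import networkx as nx

# Toy graph standing in for a crawled slice of Wikipedia.
g = nx.DiGraph()
g.add_edge("Machine learning", "Neural network")
g.add_edge("Neural network", "Deep learning")
g.add_edge("Machine learning", "Statistics")

# find_path is nx.shortest_path under the hood
print(nx.shortest_path(g, "Machine learning", "Deep learning"))
# -> ['Machine learning', 'Neural network', 'Deep learning']

# get_related is an ego graph: everything within `radius` hops
print(sorted(nx.ego_graph(g, "Machine learning", radius=2).nodes()))
# -> ['Deep learning', 'Machine learning', 'Neural network', 'Statistics']
```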

Scaling

For large-scale extraction, ScraperAPI handles rotation and retries, ThorData provides residential proxies, and ScrapeOps monitors pipeline health.
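If you'd rather roll your own before reaching for those services, the core pattern behind "rotation and retries" is retry with exponential backoff and jitter. A sketch (`fetch_with_retries` is an illustrative helper; `fetch` is any callable that raises on failure):

```python
import random
import time


def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0):
    """Call fetch(url), retrying failures with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # base_delay, 2x, 4x, ... plus jitter so parallel workers
            # don't all retry at the same instant
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

The same loop is where you'd swap in a fresh proxy from a rotation pool before each retry.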

What You Can Build

  • AI chatbot knowledge base grounded in structured facts
  • Research tools mapping connections between concepts
  • Content recommendation via graph proximity
  • Fact verification against structured Wikipedia data
