Wikipedia contains millions of interconnected articles — a perfect foundation for building knowledge graphs. In this tutorial, we extract entities and relationships to construct a navigable knowledge graph.
Why Knowledge Graphs?
Knowledge graphs power Google answer boxes, recommendation engines, and AI assistants. Building one from Wikipedia creates a structured dataset mapping how concepts relate — far more useful than raw text.
Setup
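The original setup code is not shown here, so the following is a minimal sketch of what the later snippets assume: requests for calling the MediaWiki API, networkx for the graph itself, and a WikiKnowledgeGraph class whose methods appear in the sections below. The class name matches the driver code at the end of the tutorial; the API constant, User-Agent string, and visited set are placeholders of my own.

# Minimal setup sketch (reconstructed): the later snippets assume these
# imports, a MediaWiki API endpoint, and a WikiKnowledgeGraph class.
# pip install requests networkx

import json
import time

import networkx as nx
import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # MediaWiki Action API

class WikiKnowledgeGraph:
    def __init__(self):
        # Directed graph: nodes are article titles, edges are typed links.
        self.graph = nx.DiGraph()
        self.visited = set()  # pages already expanded
        self.session = requests.Session()
        # Wikipedia asks API clients to identify themselves.
        self.session.headers["User-Agent"] = "wiki-kg-tutorial/0.1 (example)"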
Extracting Entities and Relationships
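The extraction code is likewise omitted, so here is one plausible shape for it, sketched to match the extract_relationships(seed, depth=2) call in the driver code below: fetch a page's outgoing article links from the MediaWiki links endpoint and record each one as a links_to edge, recursing until the depth budget runs out. The get_links helper and the 50-link cap are my assumptions, not the author's code.

# Hypothetical sketch of the extraction step. These methods belong inside
# the WikiKnowledgeGraph class from Setup (indented as class members).

    def get_links(self, title, limit=50):
        # One API call: outgoing article links (namespace 0) for a page.
        params = {
            "action": "query", "format": "json", "titles": title,
            "prop": "links", "plnamespace": 0, "pllimit": limit,
        }
        data = self.session.get(API_URL, params=params).json()
        links = []
        for page in data.get("query", {}).get("pages", {}).values():
            links.extend(l["title"] for l in page.get("links", []))
        return links

    def extract_relationships(self, title, depth=2):
        # Depth-limited expansion: each outgoing link becomes a links_to edge.
        if depth <= 0 or title in self.visited:
            return
        self.visited.add(title)
        for target in self.get_links(title):
            self.graph.add_edge(title, target, relation="links_to")
            if depth > 1:
                time.sleep(0.1)  # stay polite to the API
                self.extract_relationships(target, depth - 1)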
Infobox Parser
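The original parser is also missing. A reasonable stand-in, and purely an assumption about the approach: pull the page's raw wikitext through the API and read key/value pairs out of the first Infobox template with mwparserfromhell (pip install mwparserfromhell).

    def parse_infobox(self, title):
        # Sketch: fetch raw wikitext, then return the first
        # {{Infobox ...}} template's parameters as a dict of facts.
        import mwparserfromhell  # pip install mwparserfromhell
        params = {
            "action": "query", "format": "json", "formatversion": 2,
            "titles": title, "prop": "revisions",
            "rvprop": "content", "rvslots": "main",
        }
        data = self.session.get(API_URL, params=params).json()
        page = data["query"]["pages"][0]
        if "revisions" not in page:
            return {}
        wikitext = page["revisions"][0]["slots"]["main"]["content"]
        for tpl in mwparserfromhell.parse(wikitext).filter_templates():
            if tpl.name.strip_code().strip().lower().startswith("infobox"):
                return {str(p.name).strip(): p.value.strip_code().strip()
                        for p in tpl.params}
        return {}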
Querying the Graph
    # Query helpers on WikiKnowledgeGraph (indented as class members).

    def find_path(self, source, target):
        # Shortest chain of links between two articles, or None if
        # no path exists.
        try:
            return nx.shortest_path(self.graph, source, target)
        except nx.NetworkXNoPath:
            return None

    def get_related(self, entity, depth=2):
        # Every node within `depth` hops of the entity (its ego graph).
        if entity not in self.graph:
            return []
        return list(nx.ego_graph(self.graph, entity, radius=depth).nodes())

    def export(self, filename="wiki_kg.json"):
        # Serialize to node-link JSON so the graph can be reloaded or
        # visualized elsewhere.
        with open(filename, "w") as f:
            json.dump(nx.node_link_data(self.graph), f, indent=2)
        print(f"Exported {self.graph.number_of_nodes()} nodes, "
              f"{self.graph.number_of_edges()} edges")

# Build a small graph from three seed articles, then save it.
kg = WikiKnowledgeGraph()
for seed in ["Machine learning", "Neural network", "Python (programming language)"]:
    kg.extract_relationships(seed, depth=2)
    time.sleep(1)  # throttle between seeds
kg.export()
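With the graph built, the query helpers can be exercised directly. A couple of illustrative calls (output depends on what the crawl actually picked up):

# Example queries against the freshly built graph.
print(kg.find_path("Machine learning", "Python (programming language)"))
print(kg.get_related("Neural network", depth=1)[:10])  # first 10 nearby nodes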
Scaling
For large-scale extraction, ScraperAPI handles IP rotation and retries, ThorData provides residential proxies, and ScrapeOps monitors pipeline health.
What You Can Build
- AI chatbot knowledge base grounded in structured facts
- Research tools mapping connections between concepts
- Content recommendation via graph proximity (see the sketch after this list)
- Fact verification against structured Wikipedia data
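To make the graph-proximity idea concrete, here is a small, entirely hypothetical recommender built on the get_related helper above: it ranks articles within two hops by how many neighbors they share with the source article.

def recommend(kg, entity, k=5):
    # Hypothetical sketch: score candidates within two hops by their
    # shared-neighbor count with the source article.
    if entity not in kg.graph:
        return []
    def neighbors(node):
        return set(kg.graph.successors(node)) | set(kg.graph.predecessors(node))
    source_nbrs = neighbors(entity)
    scores = {}
    for candidate in kg.get_related(entity, depth=2):
        if candidate == entity or candidate in source_nbrs:
            continue
        scores[candidate] = len(source_nbrs & neighbors(candidate))
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(kg, "Machine learning"))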