Lewis Won

Posted on Dec 5, 2024

Navigating the world of Harry Potter with Knowledge Graphs

#langchain #knowledgegraph #python #rag

Aim

Are you a Harry Potter fan who wants to have everything about the Harry Potter universe at your fingertips? Or do you simply want to impress your friends with a cool chart of how the different characters in Harry Potter come together? Look no further than knowledge graphs.

This guide will show you how to get a knowledge graph up in Neo4J with just your laptop and your favourite book.

What is a knowledge graph

According to Wikipedia:

A knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data.
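
To make the definition a little more concrete, here is a minimal sketch in plain Python of the idea behind a knowledge graph: facts stored as (subject, relationship, object) triples and indexed by node. The example triples and the helper names are illustrative assumptions, not part of Neo4J or LangChain.

```python
from collections import defaultdict

# Illustrative facts as (subject, relationship, object) triples.
triples = [
    ("Harry", "FRIEND", "Ron"),
    ("Harry", "FRIEND", "Hermione"),
    ("Dumbledore", "WORKED_IN", "Hogwarts"),
]


def build_adjacency(facts):
    """Index outgoing edges by their source node."""
    adjacency = defaultdict(list)
    for subject, relationship, obj in facts:
        adjacency[subject].append((relationship, obj))
    return adjacency


def neighbours(adjacency, node, relationship=None):
    """Return nodes reachable from `node`, optionally filtered by edge type."""
    return [
        target
        for rel, target in adjacency.get(node, [])
        if relationship is None or rel == relationship
    ]


graph_index = build_adjacency(triples)
print(neighbours(graph_index, "Harry", "FRIEND"))
```

A graph database such as Neo4J stores the same kind of structure, but adds labels, properties, persistence and a query language on top.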

What do you need

In terms of hardware, all you need is a computer, preferably one with an Nvidia graphics card. To be fully self-sufficient, I will go with a local LLM setup, but one could just as easily use the OpenAI API for the same purpose.

Steps in setting up

You will need the following:

Ollama, and your favourite LLM model
A Python environment
Neo4J

Ollama

As I am coding on Ubuntu 24.04 in WSL2, I run Ollama as a Docker container so that GPU workloads pass through easily. This is as simple as first installing the Nvidia Container Toolkit, and then running the following:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

If you do not have an Nvidia GPU, you can run a CPU-only Ollama using the following command in the CLI:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

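Once you are done, you can pull your favourite LLM model into Ollama. The list of models available on Ollama is here. For example, to pull qwen2.5, run the following command in the CLI. Note that ollama run downloads the model if it is not already present and then opens an interactive chat session, which you can leave by typing /bye: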

docker exec -it ollama ollama run qwen2.5

And you are done with Ollama!
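
Before moving on, it can help to confirm that the model is actually available. The sketch below is one way to do that from Python. It assumes Ollama is reachable at http://localhost:11434 (the port published by the docker command above) and uses Ollama's /api/tags endpoint, which lists the models stored locally. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # assumption: adjust to your own setup


def model_names(payload):
    """Extract model names from the JSON returned by GET /api/tags."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL):
    """Ask the Ollama server which models it has downloaded."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return model_names(json.load(response))


if __name__ == "__main__":
    try:
        print(list_local_models())
    except OSError as error:
        # URLError and timeouts are both subclasses of OSError.
        print(f"Could not reach Ollama at {OLLAMA_URL}: {error}")
```

If the model you pulled does not appear in the printed list, the pull most likely did not finish.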

Python environment

You will first want to create a Python virtual environment, so that any packages you install and any configuration changes you make are restricted to that environment instead of being applied globally. The following command creates a virtual environment called harry-potter-rag:

python -m venv harry-potter-rag

You can then activate the virtual environment using the following command:

source harry-potter-rag/bin/activate

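Next, use pip to install the relevant packages, mainly from LangChain. The leading % runs pip from inside a Jupyter notebook cell; drop it if you are installing from the shell. The langchain-ollama package is included because it provides the OllamaLLM class used later:

%pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-experimental langchain-ollama neo4j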

Setting up Neo4J

We will set up Neo4J as a docker container. For ease of setting up with specific configurations, we use docker compose. You may simply copy the following into a file called docker-compose.yaml, and then run docker-compose up -d in the same directory to set up Neo4J.

This setup also ensures that data, logs, imports and plugins are persisted in local folders under ./neo4j_db.

services:
  neo4j:
    container_name: neo4j
    image: neo4j:5.20
    ports:
      - 7474:7474
      - 7687:7687
    environment:
      - NEO4J_AUTH=none
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_apoc_import_file_enabled=true
      - NEO4J_apoc_import_file_use__neo4j__config=true
      - NEO4J_dbms_security_procedures_unrestricted=custom.pregel.proc.*,pregel.*,apoc.*,gds.*
      - NEO4J_dbms_security_procedures_allowlist=custom.pregel.proc.*,pregel.*,apoc.*,gds.*
      - NEO4J_PLUGINS=["apoc", "graph-data-science"]
    volumes:
      - ./neo4j_db/data:/data
      - ./neo4j_db/logs:/logs
      - ./neo4j_db/import:/var/lib/neo4j/import
      - ./neo4j_db/plugins:/plugins

Building the Knowledge Graph

We can now start building the Knowledge Graph in Jupyter Notebook! We first set up an Ollama LLM instance using the following:

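from langchain_ollama import OllamaLLM  # assumed import; provided by the langchain-ollama package

llm = OllamaLLM(
    base_url="http://host.docker.internal:11434",  # Ollama endpoint; could be read from an env var
    model="qwen2.5:latest",  # the model pulled earlier; could be read from an env var
    system="you are an expert AI agent",
    verbose=True,
)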

Next, we connect our LLM to Neo4J:

import os

from langchain_community.graphs import Neo4jGraph

os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"

graph = Neo4jGraph()

Now it is time to grab your favourite Harry Potter text, or any other favourite book, and use LangChain to split the text into chunks. Chunking breaks a long text into parts, so that each part can be sent to the LLM, converted into nodes and edges, and inserted into Neo4J. As a quick primer, nodes are the circles you see on a graph, and each edge joins two nodes together.

The code also prints the first chunk as a quick preview of what the chunks look like.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load example document
with open("harry_potter.txt") as f:
    book = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=500,
    chunk_overlap=150,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.create_documents([book])
print(texts[0])
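
To see what chunk_size and chunk_overlap mean, here is a deliberately simplified sketch that cuts a string into fixed-size windows. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraph and line breaks); it only illustrates how consecutive chunks share some text. The function name is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


sample = "The quick brown fox jumps over the lazy dog"
for chunk in sliding_chunks(sample, chunk_size=20, chunk_overlap=5):
    print(repr(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of sending some text to the LLM twice.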

Now it is time to let our GPU do the heavy lifting and convert our text into a Knowledge Graph! Before we dive into the entire book, let us experiment with prompts to better guide the LLM in returning a graph in the way we want.

Prompts are essentially examples of what we expect, or instructions of what we want to appear in the response. In the context of knowledge graphs, we can instruct the LLM to only extract persons and organisations as nodes, and to only accept certain types of relationships given the entities. For example, we can allow the relationship of spouse to only happen between a person and another person, and not between a person and an organisation.
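
The idea of restricting relationships by entity type can be sketched in plain Python. The snippet below illustrates the constraint itself, not LangChain's implementation: the Triple type, the example schema and the filter_triples helper are all assumptions made for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    source: str
    source_type: str
    relation: str
    target: str
    target_type: str


# Allowed (source type, relationship, target type) combinations.
ALLOWED = {
    ("Person", "SPOUSE", "Person"),
    ("Person", "WORKED_IN", "Organization"),
}


def filter_triples(triples, allowed=ALLOWED):
    """Keep only triples whose type signature appears in the schema."""
    return [
        t for t in triples
        if (t.source_type, t.relation, t.target_type) in allowed
    ]


candidates = [
    Triple("Arthur", "Person", "SPOUSE", "Molly", "Person"),
    # Rejected: a person cannot be the spouse of an organisation.
    Triple("Arthur", "Person", "SPOUSE", "Ministry of Magic", "Organization"),
    Triple("Arthur", "Person", "WORKED_IN", "Ministry of Magic", "Organization"),
]
for triple in filter_triples(candidates):
    print(triple.source, triple.relation, triple.target)
```

Only the first and third candidates survive, because the second pairs SPOUSE with an Organization target.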

We can now employ the LLMGraphTransformer on the first chunk of text to see how the graph could turn out. This is a good chance for us to tweak the prompt until the result is to our liking.

The following example expects nodes that are either a Person or an Organization, and allowed_relationships specifies which types of relationships are allowed between them. To let the LLM capture the variety of the original text, I also set strict_mode to False, so that other relationships or entities not defined below can still be captured. If you instead set strict_mode to True, entities and relationships that do not comply with what is allowed could be either dropped, or forced into what is allowed (which may be inaccurate).

from langchain_experimental.graph_transformers import LLMGraphTransformer

llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Organization"],
    # Each tuple is (source node type, relationship type, target node type).
    allowed_relationships=[
        ("Person", "WORKED_WITH", "Person"),
        ("Person", "WORKED_IN", "Organization"),
        ("Person", "FRIEND", "Person"),
        ("Person", "SPOUSE", "Person"),
        ("Person", "OWNED", "Organization"),
    ],
    strict_mode=False,
)

# Convert only the first chunk so the result is quick to inspect.
documents = [texts[0]]
graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(documents)
print(f"document:{documents}")
print(f"Nodes:{graph_documents_filtered[0].nodes}")
print(f"Relationships:{graph_documents_filtered[0].relationships}")

Once you are satisfied with your prompt, it is time to ingest the whole text into the Knowledge Graph. Note that the try-except block explicitly handles any response that cannot be properly inserted into Neo4J: the error is logged, but it does not stop the loop from converting subsequent chunks into graph documents.

from langchain_experimental.graph_transformers import LLMGraphTransformer

# Build the transformer once and reuse it for every chunk.
llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Organization"],
    allowed_relationships=[
        ("Person", "WORKED_WITH", "Person"),
        ("Person", "WORKED_IN", "Organization"),
        ("Person", "FRIEND", "Person"),
        ("Person", "SPOUSE", "Person"),
        ("Person", "OWNED", "Organization"),
    ],
    strict_mode=False,
)

for i, chunk in enumerate(texts):
    try:
        print(f"Converting chunk {i + 1} of {len(texts)}")
        # Ask the LLM to extract nodes and relationships from this chunk.
        graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents([chunk])
        # Store the result in Neo4J; include_source=True also stores the source chunk.
        graph.add_graph_documents(graph_documents_filtered, include_source=True)
        print(f"Nodes:{graph_documents_filtered[0].nodes}")
        print(f"Relationships:{graph_documents_filtered[0].relationships}")
    except Exception as e:
        # Log the failure and move on to the next chunk.
        print("EXCEPTION!")
        print(type(e), e)
        print(f"document:{[chunk]}")

The loop above took me about 46 minutes to ingest Harry Potter and the Philosopher's Stone, Harry Potter and the Chamber of Secrets, and Harry Potter and the Prisoner of Azkaban. I ended up with 4868 unique nodes! A quick preview is available below. You can see that the graph is really crowded, and it is hard to distinguish who is related to whom, and in what way.

We can now use Cypher queries to look at, say, Dumbledore!

MATCH (person:Person WHERE person.id = 'Dumbledore')
RETURN person

OK, now we get just Dumbledore himself. Let's see how he is related to Harry Potter.

MATCH (person:Person WHERE person.id = 'Dumbledore')--(character:Person)
MATCH (character WHERE character.id = 'Harry')
RETURN person, character

OK, now we are interested in what Harry and Dumbledore have said to each other.

MATCH (person:Person WHERE person.id = 'Dumbledore')-[rel:SPOKE_TO]-(character:Person)
MATCH (character WHERE character.id = 'Harry')
MATCH (person)-[:MENTIONS]-(doc:Document)
RETURN person, character, doc
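
The same kind of query can also be run from Python. The sketch below assumes the graph object created earlier with Neo4jGraph, whose query method takes a Cypher string and a params dictionary; passing the names as parameters avoids editing the query text for each pair of characters. The find_connections helper is hypothetical, and a small stand-in class with made-up rows is used here so that the snippet runs without a database.

```python
CONNECTION_QUERY = """
MATCH (a:Person {id: $first})-[rel]-(b:Person {id: $second})
RETURN a.id AS first, type(rel) AS relationship, b.id AS second
"""


def find_connections(graph, first, second):
    """Return the distinct relationship types linking two Person nodes."""
    rows = graph.query(CONNECTION_QUERY, params={"first": first, "second": second})
    return sorted({row["relationship"] for row in rows})


class FakeGraph:
    """Stand-in exposing the same query(query, params) shape as Neo4jGraph."""

    def query(self, query, params=None):
        # Made-up rows for illustration only; a real graph returns its own data.
        row = {
            "first": params["first"],
            "relationship": "SPOKE_TO",
            "second": params["second"],
        }
        return [row, dict(row)]


print(find_connections(FakeGraph(), "Dumbledore", "Harry"))
```

With a live database, pass the real graph object instead of FakeGraph().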

We can see that the graph is still really confusing, with many documents to go through before finding what we are looking for. Modelling documents as nodes is not ideal, and further work could be done on the LLMGraphTransformer to make the graph more intuitive to use.

Conclusion

You can see how easy it is to set up a Knowledge Graph on your own local computer, without even needing to connect to the internet.

The github repo, which also contains the entire Knowledge Graph of the Harry Potter universe, is available here.

Postscript

To import the harry_potter.graphml file into Neo4J, copy it into Neo4J's import folder (./neo4j_db/import in the docker compose setup above), and run the following in the Neo4J browser:

CALL apoc.import.graphml("harry_potter.graphml", {readLabels: true})
