
Aarushi Kansal

Personal movie recommendation agent with GPT4 + Neo4J

Large Language Model (LLM) powered applications become more powerful and intriguing when you start leveraging the model's reasoning abilities, rather than pure generation abilities.

Combine that reasoning with external tools like databases and APIs, and you have yourself an application that can reason and take actions.

Over the past few months I've been deep in the world of using LLMs for both personalized recommendations and reasoning, and combining the two concepts in more of an assistant-style format.

When you think of recommendation engines, maybe you see them as a huuuge topic to tackle: teams of data scientists, ML engineers, GPUs?

While that holds true, you can still get started with some basic DS algorithms, an LLM and some UI libraries.

I wanted to see what I could do with a knowledge graph since they can be a good basis for recommendation engines. And luckily, LangChain already has a chain that can be used with Neo4j, so that's what we'll use.

Getting started

Data

First we need a dataset, and we'll use the default movies one provided by Neo4j. You can also find others on Kaggle, or create your own (probably more for when you're ready for a production system, because trust me, cleaning data is a tiresome task!)

You can find the various existing sets + set up your sandbox database here.

LangChain

If there's something you want to use with an LLM, your best bet is to first check what's going on in the LangChain world.

LLMs are very good at creating Cypher queries, and I wanted to use an LLM to give a user a conversational way to get their personalized recommendations. This essentially means we need something that takes a user's natural language and creates a Cypher query out of it that can be run against the DB.

And LangChain's Graph DB QA chain does just that.

Setup is fairly simple, as shown below:

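import os

# Import paths for the LangChain releases of the time; newer versions may differ
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI
from langchain.graphs import Neo4jGraph

# Connect to the Neo4j sandbox (swap in your own sandbox URL and credentials)
graph = Neo4jGraph(
    url="bolt://18.212.1.173:7687",
    username="neo4j",
    password="finishes-executions-arcs"
)

os.environ['OPENAI_API_KEY'] = "sk-key"

# Builds a chain that turns questions into Cypher, runs it, and answers from the results
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(model_name="gpt-4", temperature=0.0), graph=graph, verbose=True,
)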

The main thing here is setting the temperature to 0 so the LLM doesn't try and get 'creative' with queries. We want the output to be as close to deterministic as possible.

With just this set up alone, you can start doing some basic Q+A as well as getting it to run basic queries.

Q&A + Basic queries

Let's try out this chain, like so:

chain.run(
    """
        Set Cynthia Freeman's rating for Inception to 4.0.
    """
)

The key here is that while this LLM has no awareness of your actual data or schema by default, the chain runs a few queries to get the entire schema and passes it into the prompt as context. So all of a sudden, it's like your LLM has a solid understanding of your exact DB! Simple but elegant solution really.

And that's why it's great at figuring out queries like the above.
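To make that flow concrete, here is a minimal sketch of the idea in plain Python. It is not LangChain's actual implementation: `fake_llm` and `fake_run_query` are hypothetical stand-ins for the model and the database, and the schema string and canned answers are made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical schema text; the real chain reads the schema from the database.
SCHEMA = "(:User)-[:RATED {rating}]->(:Movie)-[:IN_GENRE]->(:Genre)"


def build_cypher_prompt(schema: str, question: str) -> str:
    """Put the schema in front of the question so the model can see it."""
    return (
        "Generate a Cypher statement to query a graph database.\n"
        f"Schema:\n{schema}\n"
        f"The question is:\n{question}"
    )


def graph_qa(
    question: str,
    schema: str,
    llm: Callable[[str], str],
    run_query: Callable[[str], List[Dict]],
) -> str:
    """Step 1: question -> Cypher. Step 2: run it. Step 3: rows -> answer."""
    cypher = llm(build_cypher_prompt(schema, question))
    rows = run_query(cypher)
    return llm(f"Question: {question}\nQuery results: {rows}\nAnswer:")


def fake_llm(prompt: str) -> str:
    """Stand-in for the model: returns canned text depending on the prompt."""
    if prompt.startswith("Generate a Cypher statement"):
        return "MATCH (m:Movie {title: 'Inception'}) RETURN m.imdbRating AS rating"
    return "Inception has an IMDb rating of 8.8."


def fake_run_query(cypher: str) -> List[Dict]:
    """Stand-in for the database: ignores the query and returns one row."""
    return [{"rating": 8.8}]


print(graph_qa("What is Inception's rating?", SCHEMA, fake_llm, fake_run_query))
```

The point of the sketch is that the model is called twice: once with the schema to write the query, and once with the query results to phrase the answer.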

Let's try out some other questions. We're working on recommendations, right? So let's see if our LLM can find us movies in similar genres.

find me movies most similar to 'Inception'

At this point, most likely it'll try and base it on genre or IMDb ratings. So far pretty good. The LLM is doing enough reasoning to understand what 'similar' could mean (i.e. genre or ratings).

You can tell it to use genre, so that it always bases similarity on genre. Go further and tell it to use actors, and it'll handle that correctly too, by figuring out the movie-to-actor relationship and counting how many actors a movie shares with Inception.

Now, at this point, the aspect I'm more interested in is the different similarity functions and algorithms we can apply to start getting recommendations for movies.

In particular, I want to use Neo4j's data science library. You can go ahead and read about the different ways of calculating similarities, if you aren't familiar.

And I want to do some collaborative filtering, based on kNN. Essentially, I want recommendations based on users with similar tastes to mine: their top-rated movies that I haven't rated (no rating implies it hasn't been watched).

So you can try any form of that question now:

Who are the 5 users with tastes in movies most similar to Aarushi Kansal? What movies have they rated highly that Aarushi Kansal hasn't rated yet?

This seems pretty specific, right? Unfortunately, it doesn't give me quite what I want. The query it comes out with is:

MATCH (u1:User {name: "Aarushi Kansal"})-[:RATED]->(m1:Movie)<-[:IN_GENRE]-(g:Genre)-[:IN_GENRE]->(m2:Movie)<-[:RATED]-(u2:User)
WHERE NOT (u1)-[:RATED]->(m2)
WITH u2, count(*) AS similarity, m2.title AS recommended_movie, m2.imdbRating as rating
ORDER BY similarity DESC, rating DESC
RETURN u2.name AS user, recommended_movie
LIMIT 5

Okay so what if we go more specific?

Who are the 5 users with tastes in movies most similar to Aarushi Kansal? What movies have they rated highly that Aarushi Kansal hasn't rated yet? Use kNN and Pearson similarity

At this point, it tried hard, but the query just doesn't work at all:

MATCH (u1:User {name: "Aarushi Kansal"})-[:RATED]->(m1:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, tofloat(count(m1)) as numCommonMovies
MATCH (u1)-[r1:RATED]->(m1:Movie)<-[r2:RATED]-(u2)
WITH u1, u2, numCommonMovies, m1,
     (r1.rating - u1.avgRating) * (r2.rating - u2.avgRating) as simNumer,
     (r1.rating - u1.avgRating) * (r1.rating - u1.avgRating) as simDenom1,
     (r2.rating - u2.avgRating) * (r2.rating - u2.avgRating) as simDenom2
WITH u1, u2, numCommonMovies, sum(simNumer) as simNumer, sum(simDenom1) as simDenom1, sum(simDenom2) as simDenom2
WITH u1, u2, simNumer, sqrt(simDenom1 * simDenom2) as simDenom
WHERE simDenom > 0
WITH u1, u2, numCommonMovies, simNumer / simDenom as pearson
ORDER BY pearson DESC, numCommonMovies DESC, u2.name ASC
LIMIT 10
MATCH (u2)-[r:RATED]->(m:Movie)
WHERE NOT (u1)-[:RATED]->(m) AND r.rating >= 4
RETURN u2.name as UserName, m.title as MovieTitle, r.rating as UserRating
ORDER BY r.rating DESC, m.title ASC, u2.name ASC;

At this point, the best solution is to give it an example of the query you actually want:

MATCH (u1:User {name:"Aarushi Kansal"})-[r:RATED]->(m:Movie)
WITH u1, avg(r.rating) AS u1_mean

MATCH (u1)-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10

MATCH (u2)-[r:RATED]->(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings

UNWIND ratings AS r

WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,
     sqrt( sum( (r.r1.rating - u1_mean)^2) * sum( (r.r2.rating - u2_mean) ^2)) AS denom,
     u1, u2 WHERE denom <> 0

WITH u1, u2, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 10

MATCH (u2)-[r:RATED]->(m:Movie) WHERE NOT EXISTS( (u1)-[:RATED]->(m) )

RETURN m.title, SUM( pearson * r.rating) AS score
ORDER BY score DESC LIMIT 25

Annd, it works: I get movies like The Silence of the Lambs, Forrest Gump, Pulp Fiction etc.

Up till now you can see we've gotten to a pretty good place: we can query a knowledge graph, and even without much context or prompt engineering the LLM is able to determine which relationships to search through (e.g. movies -> genres). As you add more guidance it gets better and better.
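If the Cypher is hard to follow, the same logic can be sketched in plain Python. This is only an illustration under a few assumptions: the ratings live in a dictionary rather than a graph, the user names and ratings are toy data, and the function names `pearson` and `recommend` are mine rather than anything from Neo4j or LangChain.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

Ratings = Dict[str, Dict[str, float]]  # user name -> {movie title: rating}


def pearson(ratings: Ratings, u1: str, u2: str) -> Optional[float]:
    """Similarity over shared movies, centred on each user's overall mean rating."""
    shared = ratings[u1].keys() & ratings[u2].keys()
    mean1 = sum(ratings[u1].values()) / len(ratings[u1])
    mean2 = sum(ratings[u2].values()) / len(ratings[u2])
    nom = sum((ratings[u1][m] - mean1) * (ratings[u2][m] - mean2) for m in shared)
    denom = sqrt(
        sum((ratings[u1][m] - mean1) ** 2 for m in shared)
        * sum((ratings[u2][m] - mean2) ** 2 for m in shared)
    )
    return nom / denom if denom != 0 else None  # mirrors WHERE denom <> 0


def recommend(
    ratings: Ratings, user: str, k: int = 10, min_shared: int = 10, limit: int = 25
) -> List[Tuple[str, float]]:
    """Score unseen movies by summing (neighbour similarity * neighbour rating)."""
    neighbours = []
    for other in ratings:
        if other == user:
            continue
        if len(ratings[user].keys() & ratings[other].keys()) <= min_shared:
            continue  # mirrors WHERE size(ratings) > 10
        sim = pearson(ratings, user, other)
        if sim is not None:
            neighbours.append((sim, other))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)

    scores: Dict[str, float] = {}
    for sim, other in neighbours[:k]:  # the k nearest neighbours
        for movie, rating in ratings[other].items():
            if movie not in ratings[user]:  # only movies the user hasn't rated
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Toy data, made up for illustration.
toy_ratings: Ratings = {
    "me": {"A": 5, "B": 1, "C": 4},
    "ann": {"A": 5, "B": 1, "C": 5, "D": 5},  # similar taste to "me"
    "bob": {"A": 1, "B": 5, "C": 2, "E": 5},  # opposite taste to "me"
}
print(recommend(toy_ratings, "me", min_shared=1))
```

With the toy data, movie D (liked by the similar user) ranks above movie E (liked by the dissimilar user), which is the behaviour the weighted `SUM(pearson * r.rating)` in the query is after.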

But having to give it an example of every similarity function or algorithm makes it a pretty poor assistant and bad user experience. Users would be better off just querying the DB or having some kind of button that runs the query etc.

And that's where we now combine what we have so far, with an Agent and Chainlit for that assistant style user experience.

Agent

First let's briefly summarize what we're aiming for.

A human-esque experience for a user looking for movie recommendations. They should be able to ask for movies based on their interests and update ratings, and the agent should have some understanding of the user.

1) Chat interface
2) Natural language understanding
3) Access to a data source (I'm also going to give it access to the internet, because I think an agent should be able to get fun facts or summaries, or even recent news about actors or directors)
4) Reasoning abilities to choose which tools to use and which actions to take

The code

For the interface, I'm using Chainlit, a new UI library for building LLM apps that integrates with LangChain.

I'm using the out of the box chat UI


For natural language understanding I'm using GPT-4, but you can sub in your favorite LLM here.

    llm1 = OpenAI(temperature=0, streaming=True)
    # Needs SERPAPI_API_KEY set; used by the "Google search" tool below
    search = SerpAPIWrapper()
    memory = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True)

The agent gets memory so it can remember the context of our conversation (an agent that forgets messages feels like bad UX!)

I'm using chat-zero-shot-react-description, which is an MRKL implementation for chat models. If you're interested in MRKL agents and an intro into tools, you can check out another blog of mine. But in a nutshell, this is what allows the model to choose which tool (DB or Google search) to use when answering a user's requests.
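Conceptually, a buffer memory just stores every turn and replays the whole history in front of the next message. The class below is a toy sketch of that idea, not LangChain's `ConversationBufferMemory`; the class and method names are hypothetical.

```python
from typing import Dict, List


class BufferMemory:
    """Keeps every message and replays the full history into each new prompt."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def save(self, user_input: str, ai_output: str) -> None:
        """Record one completed turn of the conversation."""
        self.messages.append({"role": "human", "content": user_input})
        self.messages.append({"role": "ai", "content": ai_output})

    def as_prompt(self, new_input: str) -> str:
        """Prefix the new message with everything said so far."""
        lines = [f"{m['role']}: {m['content']}" for m in self.messages]
        lines.append(f"human: {new_input}")
        return "\n".join(lines)


memory_sketch = BufferMemory()
memory_sketch.save("Suggest movies similar to Inception", "Try The Matrix.")
print(memory_sketch.as_prompt("Who directed the first one?"))
```

Because the earlier turn is included in the prompt, the model has what it needs to resolve a follow-up like "the first one".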

    tools = [
        Tool(
            name="Cypher search",
            func=cypher_tool.run,
            description="""
            Utilize this tool to search within a movie database, 
            specifically designed to find movie recommendations for users.
            This specialized tool offers streamlined search capabilities
            to help you find the movie information you need with ease.
            """,
        ),
        Tool(
            name="Google search",
            func=search.run,
            description="""
    Utilize this tool to search the internet when you're missing information. In particular if you want recent events or news.
    """,
        )
    ]
    return initialize_agent(
        tools, llm1, agent="chat-zero-shot-react-description", verbose=True, memory=memory
    )
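The tool descriptions matter because the model reads them and replies with the name of the tool it wants, and the framework then looks that tool up and calls it. Here is a simplified sketch of a single dispatch step. It assumes the model replies with a JSON blob containing `action` and `action_input`; the stub tools and the `dispatch` function are hypothetical and stand in for the real agent loop.

```python
import json
from typing import Callable, Dict


def dispatch(llm_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Parse the model's chosen action and call the matching tool."""
    action = json.loads(llm_output)
    name = action["action"]
    if name == "Final Answer":
        return action["action_input"]  # nothing to call, the agent is done
    if name not in tools:
        return f"Unknown tool: {name}"
    return tools[name](action["action_input"])


# Stub tools that only echo their input; the real ones hit Neo4j and the web.
toy_tools: Dict[str, Callable[[str], str]] = {
    "Cypher search": lambda q: f"[graph results for: {q}]",
    "Google search": lambda q: f"[web results for: {q}]",
}

llm_output = '{"action": "Cypher search", "action_input": "movies similar to Inception"}'
print(dispatch(llm_output, toy_tools))
```

In a full agent the tool's return value is fed back to the model as an observation, and the loop repeats until the model produces a final answer.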

Annnd, finally the Cypher/Neo4j/knowledge graph parts! So, remember earlier on, we found that the more context you give the LLM (details on which parts of the schema to use, example queries, etc.), the better it performs at finding the right movies for you? But the whole point here is to give the user an easy way to get recommendations without having to know or understand the inner workings of our DB or even Cypher.

Essentially, we want to bake the logic into the backend so the user never has to know.

So, what we're going to do is determine exactly which use cases we want this agent to handle (or feel like an expert in). I.e. we're going to tell it exactly how to handle certain requests.

For the purpose of this post, I'm giving it the specifics of finding movie recommendations based on similar users, and movies similar to another movie based on content (i.e. genres) using the Jaccard index.

CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Make recommendations for a given user only.
Update ratings for a given user only.
Schema:
{schema}
Username:
{username}

Examples:
# When a user asks for movie recommendations:
(example Cypher query goes here)

# When asked for movies similar to a movie, use the weighted content algorithm, like this:
(example Cypher query goes here)

The question is:
{question}"""

CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question", "username"], template=CYPHER_GENERATION_TEMPLATE
)


Note: this is the template built into LangChain, which I've repurposed for my needs. If you remove this piece of code, it will still work, just with the defaults we used earlier.

At this point you might be thinking we've made this agent kind of 'dumb' by only allowing it to do certain things. Wouldn't an if statement have sufficed? Well, don't worry: giving it examples like this only expands its so-called knowledge. It's still able to do all the other things, like find movies based on actors and update your ratings.
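One practical detail when you paste example queries into a template like this: as far as I know, `PromptTemplate` uses Python `str.format`-style placeholders by default, so literal braces in Cypher (such as a property map) have to be doubled or they get treated as variables. A small sketch using plain `str.format`; the template text and example query here are made up for illustration and are not the ones used in this project.

```python
# Hypothetical mini-template; the Cypher example inside it is illustrative only.
TEMPLATE = """Schema:
{schema}
Username:
{username}

Example:
MATCH (u:User {{name: "{username}"}})-[r:RATED]->(m:Movie) RETURN m.title

The question is:
{question}"""

# {{ and }} become literal braces; single-brace names are filled in.
prompt = TEMPLATE.format(
    schema="(:User)-[:RATED]->(:Movie)",
    username="Aarushi Kansal",
    question="What movie should I watch?",
)
print(prompt)
```

Forgetting to double the braces typically shows up as a `KeyError` or a complaint about missing input variables when the prompt is formatted.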

You can play around with different/more algorithms based on your needs.
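For instance, the Jaccard index mentioned earlier for content-based similarity is simple enough to sketch in a few lines of Python: it is the size of the overlap between two genre sets divided by the size of their union. The genre assignments below are toy data, and `most_similar` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(
    title: str, genres: Dict[str, Set[str]], limit: int = 5
) -> List[Tuple[str, float]]:
    """Rank every other movie by genre overlap with the given title."""
    target = genres[title]
    scored = [(other, jaccard(target, g)) for other, g in genres.items() if other != title]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Toy genre sets, made up for illustration.
toy_genres: Dict[str, Set[str]] = {
    "Inception": {"Action", "Sci-Fi", "Thriller"},
    "The Matrix": {"Action", "Sci-Fi"},
    "Interstellar": {"Sci-Fi", "Drama"},
    "Toy Story": {"Animation", "Comedy"},
}
print(most_similar("Inception", toy_genres))
```

Swapping `jaccard` for another set or vector similarity is a small change, which makes this a handy place to experiment before writing the equivalent Cypher example.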

Let's see it in action so far by asking it "what movie should I watch?":

Image description

So you can see it starts using the query for collaborative filtering based on neighbourhood - exactly what I wanted.

Let's try another one, "Suggest movies similar to Inception"

Image description

Again, now you can see it using the desired query.

If you want, now you can also start asking it about actors, news about them and so on, which it'll handle via our Google search tool.

Annd there you have it, an end-to-end personalised movie agent for you!
You can check out the full code here.

Links

https://docs.chainlit.io/examples/mrkl
https://neo4j.com/docs/graph-data-science/current/
https://neo4j.com/docs/graph-data-science/current/algorithms/kmeans/
https://python.langchain.com/docs/modules/chains/additional/graph_cypher_qa
https://neo4j.com/developer-blog/exploring-practical-recommendation-systems-in-neo4j/
