Deepak Raj

AI-Powered Graph Exploration with LangChain's NLP Capabilities, Question Answer Using Langchain

Have you ever struggled to write complex SQL or graph database queries? What if you could just describe what you want in plain English and get the results directly? Thanks to advancements in natural language processing, tools like LangChain make this not only possible but incredibly intuitive.

In this article, I will demonstrate how to use Python, LangChain, and Neo4j to seamlessly query a graph database using natural language. LangChain will handle the conversion of natural language queries into Cypher queries, providing a streamlined and time-saving experience.

What is LangChain?

LangChain is an open-source framework designed to simplify the creation of applications that utilize large language models (LLMs). Whether you're building chatbots, question-answering systems, text summarizers, or tools for generating database queries, LangChain provides a robust foundation.

By leveraging LangChain, developers can quickly prototype and deploy applications that bridge the gap between natural language and machine intelligence.

Prerequisites

Before we dive in, ensure that you have Python and Neo4j installed on your system. If not, you can install Python from the official python.org downloads page and Neo4j from the Neo4j Download Center.

Alternatively, you can run Neo4j in Docker. Here’s a script that sets it up:

Run Neo4j in Docker

#!/bin/bash

# Define environment variables
NEO4J_VERSION="4.4.12"            # Neo4j version to use
CONTAINER_NAME="neo4j_container"
NEO4J_DATA_DIR="./neo4j_data"     # Local directory for Neo4j data
NEO4J_IMPORT_DIR="./neo4j_import" # Local directory for import files
NEO4J_PASSWORD="password"         # Password for the Neo4j admin user

# Create local directories if they do not exist
mkdir -p "$NEO4J_DATA_DIR" "$NEO4J_IMPORT_DIR"

# Pull the Neo4j Docker image
echo "Pulling Neo4j version $NEO4J_VERSION..."
docker pull "neo4j:$NEO4J_VERSION"

# Run the Neo4j container
echo "Starting Neo4j container..."
docker run \
    --name "$CONTAINER_NAME" \
    -d \
    -p 7474:7474 -p 7687:7687 \
    -v "$NEO4J_DATA_DIR:/data" \
    -v "$NEO4J_IMPORT_DIR:/import" \
    -e NEO4J_AUTH="neo4j/$NEO4J_PASSWORD" \
    -e NEO4JLABS_PLUGINS='["apoc"]' \
    -e NEO4J_dbms_security_procedures_unrestricted="apoc.*" \
    --restart unless-stopped \
    "neo4j:$NEO4J_VERSION"

echo "Neo4j is now running."
echo "Access it at http://localhost:7474 with username 'neo4j' and your specified password."
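Once the container is running, it’s worth sanity-checking the connection before loading any data. Here’s a minimal sketch using the official neo4j Python driver, assuming the default Bolt port and the password set in the script above:

from neo4j import GraphDatabase

# Connection details match the Docker script above
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

try:
    with driver.session() as session:
        # A trivial query that succeeds only if the server is reachable
        ok = session.run("RETURN 1 AS ok").single()["ok"]
        print("Neo4j is reachable:", ok == 1)
finally:
    driver.close()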

Setting Up the Environment

Install Python Dependencies

Install the necessary Python libraries by running the following command:

pip install --upgrade --quiet langchain langchain-neo4j langchain-openai langgraph

Download the Dataset

For this tutorial, we’ll use the Goodreads Book Datasets With User Rating 2M, which is available on Kaggle.
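Before loading anything, it helps to confirm that the CSV exposes the columns the loader below expects (Name, Authors, Publisher, Language, Rating). A quick pandas check, with the file path adjusted to wherever you saved the download:

import pandas as pd

df = pd.read_csv("good_reads_book_data.csv")  # adjust to your download location
print(df.columns.tolist())  # should include: Name, Authors, Publisher, Language, Rating
print(df.head())            # preview the first few rows
print(len(df), "rows")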

Load the Dataset into Neo4j

To populate the graph database with our dataset, use the following script:

import pandas as pd
from neo4j import GraphDatabase

# Connect to the Neo4j database
uri = "bolt://localhost:7687"
username = "neo4j"
password = "password"
driver = GraphDatabase.driver(uri, auth=(username, password))

def create_book_nodes(tx, book_title, publisher, author_name, language, rating):
    # Create the Book and Author nodes if they do not exist, then link them
    tx.run("""
        MERGE (b:Book {title: $book_title, publisher: $publisher, language: $language, rating: $rating})
        MERGE (a:Author {name: $author_name})
        MERGE (a)-[:WROTE]->(b)
    """, book_title=book_title, publisher=publisher, language=language, rating=rating, author_name=author_name)

def load_data_to_neo4j(csv_file_path):
    # Load the Goodreads Books dataset into a pandas DataFrame
    df = pd.read_csv(csv_file_path)
    # Replace NaN values with "Not Available"
    df.fillna("Not Available", inplace=True)
    # Start a session with Neo4j
    with driver.session() as session:
        # Iterate over each row in the dataset and load it into Neo4j
        for index, row in df.iterrows():
            # Extract book details
            book_title = row['Name']
            author_name = row['Authors']
            publisher = row['Publisher']
            language = row['Language']
            rating = row['Rating']
            # Create nodes and relationships in Neo4j
            session.write_transaction(create_book_nodes, book_title, publisher, author_name, language, rating)
    print(f"Data from {csv_file_path} has been loaded into Neo4j.")

# Path to the Goodreads dataset CSV file
csv_file_path = "/home/deepak/Downloads/good_reads_book_data.csv"  # Replace with your actual path

# Load data into Neo4j
load_data_to_neo4j(csv_file_path)

# Close the connection
driver.close()
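Once the script finishes, a few count queries confirm that the nodes and relationships actually landed in the graph. A minimal check, reusing the same connection details:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Count the nodes and relationships created by the loader
    books = session.run("MATCH (b:Book) RETURN count(b) AS n").single()["n"]
    authors = session.run("MATCH (a:Author) RETURN count(a) AS n").single()["n"]
    wrote = session.run("MATCH (:Author)-[r:WROTE]->(:Book) RETURN count(r) AS n").single()["n"]
    print(f"{books} books, {authors} authors, {wrote} WROTE relationships")

driver.close()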

Querying the Graph Database Using LangChain

With everything set up, we’ll now use LangChain to query the graph database using natural language. LangChain will process your input, convert it into a Cypher query, and return the results. For this demonstration, we’ll leverage the GPT-4o-mini model and the following tools:

import os

from langchain.prompts import PromptTemplate
from langchain_neo4j import Neo4jGraph, GraphCypherQAChain
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "sk-*"  # Replace with your OpenAI API key
os.environ["NEO4J_URI"] = os.getenv("NEO4J_URI", "bolt://localhost:7687")
os.environ["NEO4J_USERNAME"] = os.getenv("NEO4J_USERNAME", "neo4j")
os.environ["NEO4J_PASSWORD"] = os.getenv("NEO4J_PASSWORD", "password")

graph = Neo4jGraph()
schema = graph.schema

# Sanity-check the graph with a direct Cypher query
sample_query = """
MATCH (a:Author)-[:WROTE]->(b:Book)
RETURN b.title AS book_title, b.publisher AS publisher, b.language AS language, b.rating AS rating, a.name AS author_name
"""
result = graph.query(sample_query)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)  # Low temperature for more consistent queries

chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    verbose=True,
    validate_cypher=True,
    allow_dangerous_requests=True,   # Use cautiously in production
    return_intermediate_steps=True,  # Useful for debugging
)

def get_response(query_input):
    response = chain.invoke({"query": query_input})
    return response

prompt_template = """You are an expert in querying Neo4j graph databases. Your task is to answer questions about a graph database containing data about Goodreads books and authors. The nodes in the graph represent Author and Book, and the edges between them represent the relationship WROTE.
For the given query input, follow these steps:
- Use the CONTAINS operator to match entities in the query and format the response.
- Use the context output, the Cypher query, and the intermediate steps to answer the query.
1. Identify the nodes:
   - Break down the input query to identify the relevant Author and Book entities (nodes) mentioned.
   - Look for keywords or entities that could correspond to nodes in the graph.
2. Match the identified nodes in the graph:
   - Using the identified nodes, construct a search that matches them within the graph database.
   - Ensure that the relationship (`WROTE`) is properly accounted for when linking authors and books.
3. Create and execute the Cypher query:
   - Based on the extracted entities, formulate an appropriate Cypher query to retrieve the data from Neo4j.
   - Execute the query to fetch the relevant results from the graph database.
4. Format the response:
   - Return the result in a human-readable format (not the raw Cypher query), summarizing the data from the query.
Query: {query_input}
Schema: {schema}
Please follow the steps and return a human-readable answer based on the graph data, not the Cypher query.
Cypher Query:
"""

template = PromptTemplate(input_variables=["query_input", "schema"], template=prompt_template)

while True:
    query_input = input("Enter the query: ")
    if query_input == "exit":
        break
    formatted_prompt = template.format(query_input=query_input, schema=schema)
    response = get_response(formatted_prompt)
    result = response["result"]
    print("Query: ", query_input)
    print("Result: ", result)

# Example questions to try:
# Share all books published by Penguin Books.
# What is the author name of The Complete Verse and Other Nonsense?
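Because the chain is built with return_intermediate_steps=True, the response dictionary also exposes the generated Cypher and the raw graph context, which is handy when an answer looks off. A small sketch of how you might inspect them (the step structure below follows GraphCypherQAChain's output, typically one dict holding the generated "query" and one holding the "context"):

question = "Who is the author of 'The Power of One'?"
response = chain.invoke({"query": question})
print("Answer:", response["result"])

# Inspect the generated Cypher and the rows it returned
for step in response["intermediate_steps"]:
    print(step)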

Example Queries

Here are some sample queries and their results:

Query 1: Find all the books written by "J.K. Rowling" and published by "Bloomsbury Publishing".

Result:

  • Harry Potter and the Sorcerer’s Stone: Rating: 4.8, Language: English
  • Harry Potter and the Chamber of Secrets: Rating: 4.7, Language: English

Query 2: Who is the author of "The Lord of the Rings"?

Result: The author of "The Lord of the Rings" is J.R.R. Tolkien.

Query 3: Who is the author of "The Power of One"?

Result: The author of "The Power of One" is Bryce Courtenay.

Query 4: List books published by Penguin Books.

Result:
The following books are published by Penguin Books:

  1. Untouchable - Rating: 3.72, Language: English
  2. The Complete Verse and Other Nonsense - Rating: 4.18, Language: Not Available
  3. The Beloved: Reflections on the Path of the Heart - Rating: 4.19, Language: English
  4. Americana - Rating: 3.43, Language: English
  5. Great Jones Street - Rating: 3.48, Language: English
  6. Gravity’s Rainbow - Rating: 4.0, Language: English
  7. City of Glass (The New York Trilogy, #1) - Rating: 3.79, Language: English
  8. Ghosts (The New York Trilogy, #2) - Rating: 3.64, Language: English
  9. Moon Palace - Rating: 3.94, Language: English
  10. The Invention of Solitude: A Memoir - Rating: 3.78, Language: Not Available

Why Use Natural Language Queries?

Natural language querying offers numerous advantages:

  1. Ease of Use: No need to memorize complex query languages like SQL or Cypher.
  2. Efficiency: Quickly retrieve results without debugging intricate query syntax.
  3. Accessibility: Enables non-technical users to interact with databases effortlessly.

Conclusion

LangChain combined with Neo4j demonstrates how powerful natural language processing can be in simplifying database interactions. This approach opens up possibilities for creating user-friendly tools like chatbots, question-answering systems, and even analytics platforms.

If you found this guide helpful or have any questions, feel free to share them in the comments below. Let’s continue exploring the limitless possibilities of natural language and AI-driven technologies!
