Learn how to automate SQL query generation and execution using LangChain, Google Gemini AI, and MySQL. This guide walks you through building a pipeline that transforms natural language questions into SQL queries, executes them against your database, and turns the results into readable answers.
Environment Setup
Creating a Virtual Environment
Isolate your project dependencies by creating a virtual environment:
py -m venv .venv
# Activate the virtual environment (Windows)
.\.venv\Scripts\activate
On macOS/Linux, activate with source .venv/bin/activate instead.
Installing Required Packages
Install the necessary Python packages:
pip install mysql-connector-python langchain langchain-community langchain-google-genai python-dotenv
Configuring Environment Variables
Create a .env file with your configuration:
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT="company-sql"
GOOGLE_API_KEY=your_google_api_key
SQL_HOST=your_db_host
SQL_USER=your_db_username
SQL_PASSWORD=your_db_password
SQL_DB_NAME=your_database_name
Building the SQL Automation Pipeline
Importing Required Modules
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langchain import hub
from dotenv import load_dotenv
import os
- ChatGoogleGenerativeAI: to use Google's Gemini model.
- SQLDatabase: manages database connections using a SQL URI.
- QuerySQLDatabaseTool: executes queries and retrieves results.
- hub: accesses pre-defined prompts from LangChain Hub.
- load_dotenv: manages environment variables.
- os: accesses operating system functionality.
Loading Configuration
load_dotenv(override=True)
SQL_HOST = os.getenv("SQL_HOST")
SQL_USER = os.getenv("SQL_USER")
SQL_PASSWORD = os.getenv("SQL_PASSWORD")
SQL_DB_NAME = os.getenv("SQL_DB_NAME")
This code loads sensitive configuration from your .env file, keeping credentials secure and out of your source code.
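Since os.getenv returns None for any missing key, a silent misconfiguration only surfaces later as a confusing connection error. One way to fail fast is a small validation helper; the require_env function below is my own addition, not part of the tutorial's code:

```python
import os

def require_env(*names: str) -> dict:
    """Return the requested environment variables, raising if any are missing."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values

# Values set in-process for illustration; normally load_dotenv() populates them
os.environ.setdefault("SQL_HOST", "localhost")
os.environ.setdefault("SQL_USER", "root")
config = require_env("SQL_HOST", "SQL_USER")
print(config["SQL_HOST"])
```

Calling require_env("SQL_HOST", "SQL_USER", "SQL_PASSWORD", "SQL_DB_NAME") right after load_dotenv() turns a missing variable into an immediate, descriptive error.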
Initializing Gemini AI Model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
We're using Google's gemini-1.5-flash model, which offers excellent performance for SQL generation tasks while being cost-effective.
Establishing Database Connection
connection_uri = f"mysql+mysqlconnector://{SQL_USER}:{SQL_PASSWORD}@{SQL_HOST}/{SQL_DB_NAME}"
db = SQLDatabase.from_uri(connection_uri)
This creates a connection string and establishes a connection to your MySQL database using the credentials from your environment variables.
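One caveat worth knowing: if the password (or username) contains URI-reserved characters such as @, /, or !, the interpolated string will parse incorrectly. URL-encoding the credentials with the standard library avoids this; the sample credentials below are made up for illustration:

```python
from urllib.parse import quote_plus

user = "report_user"        # hypothetical username
password = "p@ss/word!"     # hypothetical password with URI-unsafe characters
host = "localhost"
db_name = "company"

# quote_plus escapes characters that would otherwise break the URI
connection_uri = (
    f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}@{host}/{db_name}"
)
print(connection_uri)
```

The encoded URI can then be passed to SQLDatabase.from_uri exactly as before.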
Core Functionality Implementation
SQL Query Generation
query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")
def write_query(question: str):
    """Generate a SQL query from the user's question."""
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": question,
        }
    )
    response = llm.invoke(prompt.to_string())
    extraction_prompt = """
Please extract the SQL query from the following text and return only the SQL query without any additional characters or formatting:
{response}
SQL Query:
"""
    # Format the extraction prompt with the model's text output
    extraction = extraction_prompt.format(response=response.content)
    # Ask the model to strip any markdown fences or commentary
    parsed_query = llm.invoke(extraction)
    return parsed_query.content
- Pulls a specialized SQL generation prompt from LangChain Hub
- Formats the prompt with database schema information
- Sends the prompt to Gemini AI for query generation
- Uses a second prompt to extract just the SQL from the response
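The second model call works, but in practice the extra text around the SQL is usually just a markdown code fence, which a small regex can strip without another round trip. A possible local fallback (this helper is my own sketch, not part of the original pipeline):

```python
import re

FENCE = "`" * 3  # markdown code-fence marker

def strip_sql_fences(text: str) -> str:
    """Extract SQL from a model reply, removing a markdown code fence if present."""
    pattern = FENCE + r"(?:sql)?\s*(.*?)" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
    sql = match.group(1) if match else text
    return sql.strip()

# Typical model reply: the query wrapped in a fenced block
reply = FENCE + "sql\nSELECT name FROM employees WHERE project = 'Gamma';\n" + FENCE
print(strip_sql_fences(reply))
```

Trying this first and only falling back to the extraction LLM call when the regex finds nothing would halve the model calls per question in the common case.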
Query Execution
def execute_query(query: str):
    """Execute the SQL query."""
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return execute_query_tool.invoke(query)
This function creates a query execution tool and runs the generated SQL against your database, returning the raw results.
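Because the generated SQL is executed unmodified, it is prudent to reject anything that isn't a read before it reaches the database. This guard is an assumption on my part (the tutorial runs queries as-is), but it is cheap insurance:

```python
def assert_read_only(query: str) -> str:
    """Allow only SELECT/WITH statements through to the database."""
    stripped = query.strip().rstrip(";").strip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    if first_word not in {"SELECT", "WITH"}:
        raise ValueError(f"Refusing to run non-read statement: {first_word or '<empty>'}")
    return query

assert_read_only("SELECT * FROM employees;")  # passes through unchanged
```

Calling execute_query(assert_read_only(query)) blocks DROP, DELETE, or UPDATE statements the model might produce; for real deployments a read-only database user is the stronger control.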
Natural Language Answer Generation
def generate_answer(question: str, query: str, result: str):
    """Generate an answer using the query results."""
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f"Question: {question}\n"
        f"SQL Query: {query}\n"
        f"SQL Result: {result}"
    )
    response = llm.invoke(prompt)
    return response.content
This function takes the original question, generated SQL, and query results, then asks Gemini to formulate a human-friendly answer.
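Factoring the prompt assembly into its own function makes the grounding text easy to inspect and unit-test without calling the model. A sketch, with an illustrative query and result (not real database output):

```python
def build_answer_prompt(question: str, query: str, result: str) -> str:
    """Assemble the grounding prompt sent to the model."""
    return (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f"Question: {question}\n"
        f"SQL Query: {query}\n"
        f"SQL Result: {result}"
    )

prompt = build_answer_prompt(
    "Which employee is leading Project Gamma",
    "SELECT name, role FROM employees WHERE project = 'Gamma';",  # illustrative
    "[('Charlie Brown', 'Marketing Lead')]",                      # illustrative
)
print(prompt)
```

With this in place, generate_answer reduces to llm.invoke(build_answer_prompt(question, query, result)).content.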
Putting It All Together
question = "Which employee is leading Project Gamma"
query = write_query(question)
result = execute_query(query)
answer = generate_answer(question, query, result)
print(answer)
Output
Charlie Brown is the Marketing Lead for Project Gamma.
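The three steps compose naturally into a single helper. Here is a sketch of that chain, shown with stand-in functions so it runs without a live database; the real write_query, execute_query, and generate_answer from above slot straight in:

```python
def ask(question: str, write_query, execute_query, generate_answer) -> str:
    """Run the full question -> SQL -> result -> answer pipeline."""
    query = write_query(question)
    result = execute_query(query)
    return generate_answer(question, query, result)

# Stand-ins that mimic the real functions for a dry run
fake_write = lambda q: "SELECT name FROM employees WHERE project = 'Gamma';"
fake_execute = lambda sql: "[('Charlie Brown',)]"
fake_answer = lambda q, sql, res: f"Answer derived from {res}"

print(ask("Which employee is leading Project Gamma",
          fake_write, fake_execute, fake_answer))
```

Injecting the step functions as parameters also makes the pipeline trivial to test with mocks before pointing it at production data.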
Conclusion
This implementation demonstrates how to create a powerful natural language interface for your SQL databases. By combining LangChain's orchestration capabilities with Gemini's advanced language understanding, you can build systems that make data accessible to non-technical users while maintaining the precision of SQL queries.
Complete Code
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langchain import hub
from dotenv import load_dotenv
import os
# Load environment variables
load_dotenv(override=True)
SQL_HOST = os.getenv("SQL_HOST")
SQL_USER = os.getenv("SQL_USER")
SQL_PASSWORD = os.getenv("SQL_PASSWORD")
SQL_DB_NAME = os.getenv("SQL_DB_NAME")
# Initialize the Gemini model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
# Connect to the MySQL database
connection_uri = f"mysql+mysqlconnector://{SQL_USER}:{SQL_PASSWORD}@{SQL_HOST}/{SQL_DB_NAME}"
db = SQLDatabase.from_uri(connection_uri)
# Pull the SQL query prompt from LangChain Hub
query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")
def write_query(question: str):
    """Generate a SQL query from the user's question."""
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": question,
        }
    )
    response = llm.invoke(prompt.to_string())
    extraction_prompt = """
Please extract the SQL query from the following text and return only the SQL query without any additional characters or formatting:
{response}
SQL Query:
"""
    # Format the extraction prompt with the model's text output
    extraction = extraction_prompt.format(response=response.content)
    # Ask the model to strip any markdown fences or commentary
    parsed_query = llm.invoke(extraction)
    return parsed_query.content
def execute_query(query: str):
    """Execute the SQL query."""
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return execute_query_tool.invoke(query)
def generate_answer(question: str, query: str, result: str):
    """Generate an answer using the query results."""
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f"Question: {question}\n"
        f"SQL Query: {query}\n"
        f"SQL Result: {result}"
    )
    response = llm.invoke(prompt)
    return response.content
question = "Which employee is leading Project Gamma"
query = write_query(question)
result = execute_query(query)
answer = generate_answer(question, query, result)
print(answer)