
Exson Joseph


Automating SQL Queries with LangChain and Gemini: A Step-by-Step Guide

GitHub Repository

Learn how to automate SQL query generation and execution using LangChain, Google Gemini AI, and MySQL. This guide walks you through building a pipeline that turns natural language questions into SQL queries, runs them against your database, and generates insightful, human-readable answers.

Environment Setup

Creating a Virtual Environment

Isolate your project dependencies by creating a virtual environment:

py -m venv .venv

Activate the virtual environment:

.\.venv\Scripts\activate
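
The commands above assume Windows (the py launcher and the Scripts activation path). On macOS or Linux, assuming python3 is on your PATH, the equivalents are:

python3 -m venv .venv
source .venv/bin/activate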

Installing Required Packages

Install the necessary Python packages:

pip install mysql-connector-python langchain langchain-community langchain-google-genai python-dotenv

Configuring Environment Variables

Create a .env file with your configuration. The LANGSMITH_* variables are optional and only needed if you want LangSmith tracing:

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT="company-sql"
GOOGLE_API_KEY=your_google_api_key
SQL_HOST=your_db_host
SQL_USER=your_db_username
SQL_PASSWORD=your_db_password
SQL_DB_NAME=your_database_name

Building the SQL Automation Pipeline

Importing Required Modules

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langchain import hub
from dotenv import load_dotenv
import os
  • ChatGoogleGenerativeAI: provides access to Google's Gemini models from LangChain.
  • SQLDatabase: manages the database connection via a SQLAlchemy-style URI.
  • QuerySQLDatabaseTool: executes queries and retrieves results.
  • hub: pulls pre-defined prompts from LangChain Hub.
  • load_dotenv: loads environment variables from the .env file.
  • os: reads those variables from the environment.

Loading Configuration

load_dotenv(override=True)

SQL_HOST = os.getenv("SQL_HOST")
SQL_USER = os.getenv("SQL_USER")
SQL_PASSWORD = os.getenv("SQL_PASSWORD")
SQL_DB_NAME = os.getenv("SQL_DB_NAME")

This code loads sensitive configuration from your .env file, keeping credentials secure and out of your source code.
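
If any of these variables are missing, the connection URI built later will quietly contain None. A small sanity check, sketched here and not part of the original pipeline, lets the script fail fast instead:

# Fail fast if a required variable is missing from the .env file
for name in ("SQL_HOST", "SQL_USER", "SQL_PASSWORD", "SQL_DB_NAME", "GOOGLE_API_KEY"):
    if not os.getenv(name):
        raise RuntimeError(f"Missing required environment variable: {name}")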

Initializing Gemini AI Model

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

We're using Google's gemini-1.5-flash model, which offers excellent performance for SQL generation tasks while being cost-effective.
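
If you want more deterministic SQL, the model also accepts standard sampling parameters. Setting the temperature to 0, as in this optional variant, is a tweak rather than something the rest of the pipeline requires:

# Optional: temperature=0 makes the generated SQL more deterministic
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)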

Establishing Database Connection

connection_uri = f"mysql+mysqlconnector://{SQL_USER}:{SQL_PASSWORD}@{SQL_HOST}/{SQL_DB_NAME}"
db = SQLDatabase.from_uri(connection_uri)

This creates a connection string and establishes a connection to your MySQL database using the credentials from your environment variables.
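
Before moving on, it's worth confirming that LangChain can actually see your schema. A quick sanity check looks like this:

# Print the SQL dialect and the tables LangChain will expose to the model
print(db.dialect)                   # e.g. "mysql"
print(db.get_usable_table_names())  # table names in the connected database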

Core Functionality Implementation

SQL Query Generation

query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")

def write_query(question: str):
    """Generate SQL query from the user's question."""
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": question,
        }
    )
    response = llm.invoke(prompt.to_string())

    extraction_prompt = """
    Please extract the SQL query from the following text and return only the SQL query without any additional characters or formatting:

    {response}

    SQL Query:
    """
    # Format the extraction prompt with the model's text output (response.content),
    # not the full message object
    prompt = extraction_prompt.format(response=response.content)
    # Ask the model to return only the bare SQL statement
    parsed_query = llm.invoke(prompt)
    return parsed_query.content
  1. Pulls a specialized SQL generation prompt from LangChain Hub
  2. Formats the prompt with database schema information
  3. Sends the prompt to Gemini AI for query generation
  4. Uses a second prompt to extract just the SQL from the response
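
Calling the function on its own might look like this; the question and table name below are hypothetical and assume your schema contains an employees table:

# Hypothetical example -- assumes an `employees` table exists in your schema
sql = write_query("How many employees are there?")
print(sql)  # e.g. SELECT COUNT(*) FROM employees;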

Query Execution

def execute_query(query: str):
    """Execute the SQL query."""
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return execute_query_tool.invoke(query)

This function creates a query execution tool and runs the generated SQL against your database, returning the raw results.
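
A standalone call, again with a hypothetical query, shows the shape of the output; note that QuerySQLDatabaseTool returns the result set as a string rather than as Python objects:

# Hypothetical example -- the table and column names are assumptions
rows = execute_query("SELECT name FROM employees LIMIT 5")
print(rows)  # a string such as "[('Alice',), ('Bob',)]"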

Natural Language Answer Generation

def generate_answer(question: str, query: str, result: str):
    """Generate an answer using the query results."""
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f'Question: {question}\n'
        f'SQL Query: {query}\n'
        f'SQL Result: {result}'
    )
    response = llm.invoke(prompt)
    return response.content

This function takes the original question, generated SQL, and query results, then asks Gemini to formulate a human-friendly answer.

Putting It All Together

question = "Which employee is leading Project Gamma"

query = write_query(question)
result = execute_query(query)
answer = generate_answer(question, query, result)
print(answer)

Output

Charlie Brown is the Marketing Lead for Project Gamma.

Conclusion

This implementation demonstrates how to create a powerful natural language interface for your SQL databases. By combining LangChain's orchestration capabilities with Gemini's advanced language understanding, you can build systems that make data accessible to non-technical users while maintaining the precision of SQL queries.

Complete Code

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langchain import hub
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv(override=True)

SQL_HOST = os.getenv("SQL_HOST")
SQL_USER = os.getenv("SQL_USER")
SQL_PASSWORD = os.getenv("SQL_PASSWORD")
SQL_DB_NAME = os.getenv("SQL_DB_NAME")

# Initialize the Gemini model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# Connect to the MySQL database
connection_uri = f"mysql+mysqlconnector://{SQL_USER}:{SQL_PASSWORD}@{SQL_HOST}/{SQL_DB_NAME}"
db = SQLDatabase.from_uri(connection_uri)

# Pull the SQL query prompt from LangChain Hub
query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")


def write_query(question: str):
    """Generate SQL query from the user's question."""
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": question,
        }
    )
    response = llm.invoke(prompt.to_string())

    extraction_prompt = """
    Please extract the SQL query from the following text and return only the SQL query without any additional characters or formatting:

    {response}

    SQL Query:
    """
    # Format the extraction prompt with the model's text output (response.content),
    # not the full message object
    prompt = extraction_prompt.format(response=response.content)
    # Ask the model to return only the bare SQL statement
    parsed_query = llm.invoke(prompt)
    return parsed_query.content

def execute_query(query: str):
    """Execute the SQL query."""
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return execute_query_tool.invoke(query)

def generate_answer(question: str, query: str, result: str):
    """Generate an answer using the query results."""
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f'Question: {question}\n'
        f'SQL Query: {query}\n'
        f'SQL Result: {result}'
    )
    response = llm.invoke(prompt)
    return response.content

question = "Which employee is leading Project Gamma"
query = write_query(question)
result = execute_query(query)
answer = generate_answer(question, query, result)
print(answer)

