DEV Community

Akmal Chaudhri for SingleStore

Posted on • Updated on

Quick tip: Using SingleStoreDB with LlamaIndex

Abstract

In a previous article, we saw an example of the integration of SingleStoreDB with LangChain. Another emerging framework is LlamaIndex. In this short article, we'll take an example from the LlamaIndex documentation and apply it to SingleStoreDB.

Introduction

In this short article, we'll test the ability of SingleStoreDB to work with LlamaIndex using an example from the documentation. This should be very straightforward. In later articles, we'll see if we can achieve tighter integration.

As described in a previous article, we'll follow the instructions to create a SingleStoreDB Cloud account, Workspace Group, Workspace, and Notebook.

Create the Database

From the left nav in our SingleStoreDB Cloud account, we'll select the SQL Editor and run the following commands:

CREATE DATABASE IF NOT EXISTS demo_db;

USE demo_db;

DROP TABLE IF EXISTS city_stats;
Enter fullscreen mode Exit fullscreen mode

Fill out the Notebook

We'll follow the example described in the documentation and modify it, where required, to work with SingleStoreDB.

First, we'll suppress some warnings which result from SQLAlchemy as we'll use the version installed by LlamaIndex rather than the version installed by SingleStore:

import warnings

warnings.filterwarnings("ignore")
Enter fullscreen mode Exit fullscreen mode

Next, we'll install a recent version of LlamaIndex:

!pip uninstall sqlalchemy-singlestoredb -y --quiet
!pip install llama-index==0.6.35 --quiet
Enter fullscreen mode Exit fullscreen mode

We'll now enter and save our OpenAI API Key:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
Enter fullscreen mode Exit fullscreen mode

We'll keep the logging-level commented out:

#import logging
#import sys

#logging.basicConfig(stream=sys.stdout, level=logging.INFO)
#logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
Enter fullscreen mode Exit fullscreen mode

And directly import from IPython.display:

from IPython.display import Markdown, display
Enter fullscreen mode Exit fullscreen mode

Create Database Schema

We'll get the appropriate imports from SQLAlchemy:

from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    select,
    column,
)
Enter fullscreen mode Exit fullscreen mode

Next, we'll create the connection:

engine = create_engine("mysql+pymysql://admin:<password>@<host>:3306/demo_db")
metadata_obj = MetaData()
Enter fullscreen mode Exit fullscreen mode

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.

Now we'll create a table:

# create city SQL table
table_name = "city_stats"
city_stats_table = Table(
    table_name,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
metadata_obj.create_all(engine)
Enter fullscreen mode Exit fullscreen mode

Define SQL Database

We'll continue with the LlamaIndex documentation:

from llama_index import SQLDatabase, ServiceContext
from langchain import OpenAI
from llama_index import LLMPredictor
Enter fullscreen mode Exit fullscreen mode
llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm)
Enter fullscreen mode Exit fullscreen mode
sql_database = SQLDatabase(engine, include_tables=["city_stats"])
Enter fullscreen mode Exit fullscreen mode
sql_database.table_info
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

'\nCREATE TABLE city_stats (\n\tcity_name VARCHAR(16) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, \n\tpopulation INTEGER(11), \n\tcountry VARCHAR(16) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL\n)\n\n/*\n3 rows from city_stats table:\ncity_name\tpopulation\tcountry\n\n*/'
Enter fullscreen mode Exit fullscreen mode

Next, we'll continue with the code:

sql_database = SQLDatabase(engine, include_tables=["city_stats"])
from sqlalchemy import insert

rows = [
    {"city_name": "Toronto", "population": 2930000, "country": "Canada"},
    {"city_name": "Tokyo", "population": 13960000, "country": "Japan"},
    {"city_name": "Chicago", "population": 2679000, "country": "United States"},
    {"city_name": "Seoul", "population": 9776000, "country": "South Korea"},
]
for row in rows:
    stmt = insert(city_stats_table).values(**row)
    with engine.connect() as connection:
        cursor = connection.execute(stmt)
        connection.commit()
Enter fullscreen mode Exit fullscreen mode
# view current table
stmt = select(city_stats_table.c["city_name", "population", "country"]).select_from(
    city_stats_table
)

with engine.connect() as connection:
    results = connection.execute(stmt).fetchall()
    print(results)
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

[('Seoul', 9776000, 'South Korea'), ('Toronto', 2930000, 'Canada'), ('Tokyo', 13960000, 'Japan'), ('Chicago', 2679000, 'United States')]
Enter fullscreen mode Exit fullscreen mode

Query Index

We'll now run the following query:

from sqlalchemy import text

with engine.connect() as con:
    rows = con.execute(text("SELECT city_name from city_stats"))
    for row in rows:
        print(row)
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

('Toronto',)
('Tokyo',)
('Chicago',)
('Seoul',)
Enter fullscreen mode Exit fullscreen mode

Natural language SQL

We can also prepare a natural language query:

from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
query_str = (
    "Which city has the highest population?"
)
response = query_engine.query(query_str)
Enter fullscreen mode Exit fullscreen mode

Building our Table Index

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
Enter fullscreen mode Exit fullscreen mode
from llama_index.indices.struct_store.sql_query import SQLTableRetrieverQueryEngine
from llama_index.objects import SQLTableNodeMapping, ObjectIndex, SQLTableSchema
from llama_index import VectorStoreIndex

# set Logging to DEBUG for more detailed outputs
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [(SQLTableSchema(table_name="city_stats"))] # add a SQLTableSchema for each table

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)
query_engine = SQLTableRetrieverQueryEngine(
    sql_database, obj_index.as_retriever(similarity_top_k=1)
)
Enter fullscreen mode Exit fullscreen mode

Now we can use the natural language query:

response = query_engine.query("Which city has the highest population?")
display(Markdown(f"<b>{response}</b>"))
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

Tokyo has the highest population with 13,960,000 people.
Enter fullscreen mode Exit fullscreen mode

The raw output can also be obtained using the following:

# you can also fetch the raw result from SQLAlchemy! 
response.metadata["result"]
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

[('Tokyo', 13960000)]
Enter fullscreen mode Exit fullscreen mode

Additional context info can be added as follows:

# manually set context text
city_stats_text = (
    "This table gives information regarding the population and country of a given city.\n"
    "The user will query with codewords, where 'foo' corresponds to population and 'bar'"
    "corresponds to city."
)

table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [(SQLTableSchema(table_name="city_stats", context_str=city_stats_text))]
Enter fullscreen mode Exit fullscreen mode

Using LangChain for Querying

LangChain can also be used:

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain
Enter fullscreen mode Exit fullscreen mode
llm = OpenAI(temperature=0)
Enter fullscreen mode Exit fullscreen mode
# set Logging to DEBUG for more detailed outputs
db_chain = SQLDatabaseChain(llm=llm, database=sql_database)
Enter fullscreen mode Exit fullscreen mode
db_chain.run("Which city has the highest population?")
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

'Tokyo has the highest population with 13960000 people.'
Enter fullscreen mode Exit fullscreen mode

Summary

In this quick example, we have seen how to use LlamaIndex with SingleStoreDB. Only a few minor code modifications were required to the example from the LlamaIndex documentation.

License

The MIT License

Copyright © Jerry Liu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Top comments (0)