🤖📚 Take Your First Steps into RAG: Building a LlamaIndex Retrieval Application using OpenAI's gpt-3.5-turbo

🔗 Retrieval-Augmented Generation (RAG) combines the knowledge lookup and retrieval strengths of search engines with the fluent language generation of large language models.

🦙 LlamaIndex is an open-source data framework that enables developers to ingest, structure, and query data for use in large language model (LLM) applications. It facilitates a Retrieval Augmented Generation (RAG) approach, where relevant knowledge is retrieved from data sources before being fed to language models to generate high-quality responses.

LlamaIndex handles the complexity of connecting to data, building indexes and retrieval pipelines, and integrating LLMs. Whether your data lives in APIs, databases, or documents, LlamaIndex makes it seamless to leverage for AI applications. From intuitive APIs for querying knowledge and conversing with chatbots to customizable search functionality, LlamaIndex lets you focus on creating performant, tailored LLM experiences.

In this post, I'll provide a step-by-step tutorial for building your first RAG application with LlamaIndex and OpenAI's gpt-3.5-turbo.

If you already have a Python environment configured, you can skip the next section and start building your LlamaIndex application directly.

Alternatively, if you have Docker installed, you can leverage a VS Code development container for a ready-made environment without any additional setup.

Otherwise, the following section will guide you through installing Python and setting up a virtual environment to run LlamaIndex smoothly. The choice depends on your existing tools and preference.

(Optional) Set up a development environment

  • Open WSL (or your preferred Linux shell)
  • Install build tools: sudo apt-get install build-essential
  • Install the remaining build dependencies: sudo apt-get install libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git
  • Install pyenv: curl https://pyenv.run | bash
  • Install Python: pyenv install 3
  • Install pipx: sudo apt install pipx, then pipx ensurepath
  • Install Poetry: pipx install poetry
  • Keep virtual environments inside the project: poetry config virtualenvs.in-project true
  • Pin the project's Python version: pyenv local 3
  • Create the project and enter it: poetry new rag, then cd rag
  • Activate the virtual environment with poetry shell (run deactivate to leave it)

Setup

LlamaIndex utilizes OpenAI's gpt-3.5-turbo model for text generation and text-embedding-ada-002 for retrieval operations by default. To leverage these models, you need an OpenAI API key configured as the OPENAI_API_KEY environment variable.
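
These defaults can also be spelled out explicitly. The sketch below assumes the pre-0.10 llama_index import paths used throughout this post; it is optional, since the stock defaults already point at the same models:

from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

# Make the default model choices explicit
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=OpenAIEmbedding(model="text-embedding-ada-002"),
)
# Pass service_context to VectorStoreIndex.from_documents(...) later if you want to use it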

To obtain an API key:

  1. Log into your OpenAI account
  2. Create a new API key

This unique key authorizes LlamaIndex to call OpenAI models on your account's behalf.

Install libraries

Install the two libraries the application code below imports: llama-index and python-dotenv. With the Poetry setup from the previous section, for example:

poetry add llama-index python-dotenv

Next, create a .env file in the project root and add your OpenAI API key:

OPENAI_API_KEY=ADD_YOUR_KEY_HERE

To provide custom data for LlamaIndex to ingest, first create a text file named my-file.txt within the rag/data directory. Add whatever content you would like LlamaIndex to have access to - this can be any freeform text.

For example:

hey my name is Tim
the secret number is 12


Now LlamaIndex can ingest this data file and allow querying over the content using natural language.

Application

To build your LlamaIndex application, first create a Python file called app.py in the rag/ directory.

This app.py will hold the code powering your application. Next we'll start adding Python logic to initialize LlamaIndex, load data, define queries, and ultimately enable asking questions over your custom knowledge.

from dotenv import load_dotenv
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
)

load_dotenv()

documents = SimpleDirectoryReader("./rag/data/").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What is the secret number?")
print(response)


Run the app: python rag/app.py

By calling load_dotenv(), we populate the environment variables for the current Python process from the .env file. This allows us to store configuration, credentials, and other sensitive information in .env rather than hard-coding them in our application code. The .env file is gitignored by default so it won't be committed into source control.
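
Before going further, it can help to confirm the key is actually being picked up. A minimal check, assuming the variable name from the .env file above:

import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast with a clear message if the key was not loaded from .env
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")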

documents = SimpleDirectoryReader("./rag/data/").load_data()

The SimpleDirectoryReader provides a straightforward method to ingest local files into LlamaIndex. While more robust Readers from LlamaHub may be better suited for production systems, the SimpleDirectoryReader offers a simple on-ramp to start loading data and experimenting with LlamaIndex.
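
If your data directory holds more than plain text, SimpleDirectoryReader accepts a few filtering options. A small sketch, assuming the reader's required_exts and recursive parameters behave as in recent llama_index releases:

from llama_index import SimpleDirectoryReader

# Only pick up .txt and .md files, and descend into subdirectories
documents = SimpleDirectoryReader(
    "./rag/data/",
    required_exts=[".txt", ".md"],
    recursive=True,
).load_data()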

A Document encapsulates a data source such as a PDF, API response, or database query result. Within LlamaIndex, data is divided into discrete Node objects representing atomic semantic units. For example, a node could contain a paragraph of text or table from a document.

Nodes maintain metadata linking them to their parent Document and any related Nodes. This connectivity between nodes and back to source documents creates a rich knowledge graph for targeted information retrieval.
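
To see the Document-to-Node split in action, you can build a Document by hand and run it through a node parser. A quick sketch, assuming the SimpleNodeParser exported by the llama_index version used here:

from llama_index import Document
from llama_index.node_parser import SimpleNodeParser

# Wrap raw text in a Document, then split it into Node objects
doc = Document(text="hey my name is Tim\nthe secret number is 12")
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents([doc])

for node in nodes:
    # Each node keeps a reference back to its parent document
    print(node.ref_doc_id, node.text)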

index = VectorStoreIndex.from_documents(documents)

Vector stores play a vital role in retrieval-augmented generation by efficiently indexing vector embeddings. You'll leverage vector stores, whether directly or behind the scenes, in most LlamaIndex applications.

A vector store ingests Node objects, analyzing the data to construct an optimized search index.

The most straightforward approach for indexing data is the from_documents method: simply pass in your documents and the index is built for you.

Indexes and Embeddings

After loading data, LlamaIndex facilitates indexing to optimize retrieval. Indexing transforms the raw content into vector embeddings - numeric representations of semantic meaning. These embeddings get stored in a vector database engine specialized for efficient similarity searches.

The index may also track extra metadata like relationships between nodes. This supplementary information bolsters the relevance of fetched content.

To locate relevant context for a query, LlamaIndex first converts the search terms into an embedding vector. It then identifies stored nodes with the closest matching embeddings to the query vector. This vector similarity search allows retrieving the most contextually related data points for any natural language query.
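
To make that concrete, here is a rough sketch of the comparison a vector store performs under the hood; the OpenAIEmbedding import path is an assumption and may differ between llama_index versions:

from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding()

# Embed a stored sentence and a query into vectors of the same dimension
doc_vec = embed_model.get_text_embedding("the secret number is 12")
query_vec = embed_model.get_query_embedding("What is the secret number?")

# Cosine similarity: higher means more semantically related
dot = sum(a * b for a, b in zip(doc_vec, query_vec))
norm = (sum(a * a for a in doc_vec) ** 0.5) * (sum(b * b for b in query_vec) ** 0.5)
print(dot / norm)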

query_engine = index.as_query_engine()

A query engine enables querying a knowledge base through natural language. It takes a question expressed in plain text, retrieves the most relevant supporting content from the indexed data, supplies both the question and contextual information to a language model, and returns the model's response. This end-to-end pipeline allows users to extract information from data by simply asking questions in everyday language.
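
The query engine can also be tuned and inspected. As a sketch, assuming the standard similarity_top_k parameter and the source_nodes attribute on the response, you can ask for more chunks per query and see which nodes were actually used:

# Retrieve the three most similar nodes instead of the default
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is the secret number?")

print(response)
# Inspect which chunks of your data were handed to the model
for source in response.source_nodes:
    print(source.score, source.node.text)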

Simple storage for embeddings

The vector embeddings representing your indexed data reside in memory by default. Persisting them to local storage means they do not have to be recomputed (and paid for) every time the application runs. Add this line to save the index:

index.storage_context.persist()


The data will persist to the "storage" directory by default. To customize this, pass the desired location to the persist_dir parameter.

To leverage a persisted index, check whether one exists and load it; if none is found, generate a new index and persist it:

import os.path
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# check if storage already exists
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)


Inspecting Activity with Logging

To understand everything occurring within your LlamaIndex application, configure logging to output internal events and queries. At the start of app.py, add:

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

If you hit rate limits when sending requests to OpenAI, you may be using a free-tier API key, which has much stricter limits than a paid plan. Verify that LlamaIndex is configured with a valid OpenAI API key associated with a paid subscription: https://platform.openai.com/account/billing/overview

Final application

from dotenv import load_dotenv
import os.path
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

load_dotenv()

# check if storage already exists
PERSIST_DIR = "./rag/storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("./rag/data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()
response = query_engine.query("What is the secret number?")
print(response)

Application tree

├── README.md
├── poetry.lock
├── pyproject.toml
├── rag
│   ├── __init__.py
│   ├── app.py
│   ├── data
│   │   └── my-file.txt
│   └── storage
│       ├── default__vector_store.json
│       ├── docstore.json
│       ├── graph_store.json
│       ├── image__vector_store.json
│       └── index_store.json
└── tests
    └── __init__.py

The full source code for this LlamaIndex example is located at:

https://github.com/blackpr/llamaindex-rag-first-steps

This repository contains a complete application showcasing core LlamaIndex concepts including:

  • Loading custom documents
  • Indexing via vector embeddings
  • Defining a query engine
  • Enabling querying
