Local Langflow - A vector RAG application running locally

Retrieval-Augmented Generation (RAG) is a foundational application of AI agents. Documents are stored in a vector database, and a retriever searches that database with the user's query to gather the information needed to answer it.
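To make the retrieval step concrete, here is a minimal sketch in Python. It is only an illustration: a toy bag-of-words vector stands in for a real embedding model, a plain list stands in for the vector database, and the function names (toy_embed, cosine, retrieve) are made up for this example and are not part of Langflow.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]


# Assumed sample documents, standing in for the contents of a vector store.
docs = [
    "Langflow is a visual builder for AI flows.",
    "Ollama runs large language models on your own machine.",
    "Chroma is a vector store that can persist to local disk.",
]
top = retrieve("Which vector store works on local disk?", docs, k=1)
```

In a real flow, the retrieved text is passed to the LLM as context alongside the user's question.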

In the cloud version of Langflow, this is a simple setup. In fact, Langflow even includes Vector RAG as a sample flow. You only need to add your API keys.

There are two parts to the flow: Loading Data and Vector Retrieval QA:

Loading Data
Vector Retrieval QA

The vector database (AstraDB serverless vector), the embedding model (OpenAI), and the LLM (OpenAI) all require internet access and API keys. But what if you want to run everything locally without accessing the internet?

This is also possible with Langflow.

Start by following the Install Langflow instructions to download and run Langflow.

Once you run Langflow locally, you can access the UI on your local machine:

Langflow Running Locally

At this point, you can start a flow using the same Vector RAG template that you used in the cloud version of Langflow. Simply add your API keys, and it works the same.

But if you want to run completely locally, you need to follow a few more steps.

Follow the instructions on the Ollama website to download Ollama and pull the models that you want to use. (Don't forget an embedding model.)
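Before wiring the models into a flow, it can help to confirm that they were actually pulled. The sketch below assumes Ollama is listening at its default local address and uses its /api/tags endpoint, which lists local models. The helper names and the example model names (llama3, nomic-embed-text) are illustrative choices, not requirements.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed)


def list_local_models(base_url=OLLAMA_URL):
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def missing_models(required, available):
    """Return the required models that do not appear in the available list.

    A requirement without a tag (e.g. "llama3") matches any tag of that
    model; a requirement with a tag (e.g. "llama3:8b") must match exactly.
    """
    missing = []
    for name in required:
        if ":" in name:
            found = name in available
        else:
            found = any(a.split(":")[0] == name for a in available)
        if not found:
            missing.append(name)
    return missing
```

With Ollama running, missing_models(["llama3", "nomic-embed-text"], list_local_models()) returns the names that still need to be pulled; an empty list means both the chat model and the embedding model are available.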

Change the OpenAI embedding model to an Ollama model, and change the OpenAI LLM to an Ollama model.
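Langflow's Ollama components handle the model calls for you, so no code is needed inside the flow. For intuition only, here is a rough sketch of what talking to a local Ollama server looks like from Python. It assumes the default address and Ollama's /api/embed and /api/generate endpoints; the function names and the default model names are hypothetical, and this is not Langflow's own implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address (assumed)


def build_request(path, body, base_url=OLLAMA_URL):
    """Build a JSON POST request for the local Ollama server (nothing is sent yet)."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)


def ollama_embed(text, model="nomic-embed-text"):
    """Return the embedding vector for one piece of text."""
    req = build_request("/api/embed", {"model": model, "input": text})
    return post_json(req)["embeddings"][0]


def ollama_generate(prompt, model="llama3"):
    """Return the model's full (non-streamed) completion for a prompt."""
    req = build_request(
        "/api/generate", {"model": model, "prompt": prompt, "stream": False}
    )
    return post_json(req)["response"]
```

Both calls go to localhost, so once the models are pulled, nothing leaves your machine.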

Finally, change the AstraDB vector store component to a local vector store. ChromaDB is an easy one to start with.

Your Loading Data and Vector Retrieval QA flows now look something like this:

Local Loading Data
Local Vector Retrieval QA

You can now upload your documents using the Local Loading Data flow, then chat with them using the Local Vector Retrieval QA flow.

DEV Community
