Intro
Retrieval-augmented generation (RAG) works like a knowledge base for your LLM: it injects your context-specific information into the prompt so the model can use it in its answer. In this article, we will go through how to set up an endpoint that provides up-to-date answers about the Genkit documentation thanks to RAG.
Prerequisites
- Firebase project with billing enabled (you won't be charged if you follow along, but you need to have billing enabled to access Firestore and Firebase Cloud Functions)
- If you have never used Firebase, check out how to get started on the Firebase Blog
- Genkit installed
- Genkit is an open-source library that abstracts away much of the boring stuff involved in building AI-powered apps and adds great observability and dev tools on top.
- Node.js/local dev environment with gcloud CLI
Getting the data & Chunking
Genkit uses Flows for organization; you can check out my other article to better understand the concept. Let's start by downloading the Genkit documentation.
The Firebase team provides the Genkit documentation via the llms.txt standard, which means the whole documentation is segmented into Markdown documents.
For this demo, we are only interested in the JavaScript part of the documentation, so we can use the JavaScript-specific URL to get the data. Since the data is provided as text, we don't need any complicated APIs or processing; the LLM will understand Markdown just fine. But to provide specific context, we need to split the large Markdown document into smaller parts: chunks.
There are many different chunking strategies. We will use a fairly simple setup: split by sentences, with a minimum chunk length of 1,000 characters and a maximum of 2,000. We will also use a 100-character overlap to keep context between the chunks.
Here are two config objects that we will reuse throughout this demo:
const chunkingConfig = {
  // Minimum chunk size in characters
  minLength: 1000,
  // Maximum chunk size in characters
  maxLength: 2000,
  // Split on sentence boundaries
  splitter: "sentence",
  // Characters shared between neighboring chunks to keep context
  overlap: 100,
  // Keep the splitter's default delimiters
  delimiters: "",
} as any;
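The shape of this config matches the chunk() helper from the llm-chunk package (my assumption; the exact imports live in the full code sample, and the as any cast sidesteps the library's stricter typings). On its own, chunking looks like this:

import { chunk } from "llm-chunk"; // assumed source of the chunk() helper

// Download the Markdown docs and split them into overlapping
// ~1,000-2,000 character chunks according to chunkingConfig
const markdownText = await (await fetch("https://genkit.dev/llms-js.txt")).text();
const chunks: string[] = chunk(markdownText, chunkingConfig);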
const indexConfig = {
  // Firestore collection name
  collection: "genkitDocs",
  // Firestore field name for the content
  contentField: "text",
  // Firestore field name for the vector
  vectorField: "embedding",
  // Embedder to use for the vectorization
  embedder: googleAI.embedder("gemini-embedding-001", {
    // Firestore's vector type supports at most 2048 dimensions, so we cap
    // the embedder's output when using Firestore as the vector DB
    outputDimensionality: 2048,
  }),
};
Embedding
LLMs understand text differently than we do, which is why embeddings can make content easier for them to work with.
In practice, this means converting the text into vectors that can be used to find relevant pieces of text more efficiently.
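To make "more efficiently" concrete: similarity between two embedding vectors is typically measured with a distance metric such as cosine similarity. Here is an illustrative sketch; we won't compute this ourselves, since Firestore does it for us once we query with distanceMeasure: "COSINE" later in the article:

// Illustration only: cosine similarity between two embedding vectors.
// Values close to 1 mean the underlying texts are semantically similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}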
Once again, Genkit makes it easy for us to use Google's gemini-embedding-001 model to turn the chunked text into embeddings.
async function indexToFirestore(textChunks: string[]) {
  for (const text of textChunks) {
    // Embed one chunk at a time (simple, but slow for large datasets)
    const embedding = (
      await ai.embed({
        embedder: indexConfig.embedder,
        content: text,
        options: {
          outputDimensionality: 2048,
        },
      })
    )[0].embedding;
    // Store the vector together with the original chunk text
    await firestore.collection(indexConfig.collection).add({
      [indexConfig.vectorField]: FieldValue.vector(embedding),
      [indexConfig.contentField]: text,
    });
  }
}
This function takes the created chunks, embeds them into vectors, and saves the results to the Firestore database. It's important to also save the original text, so we can show it back to the user if necessary or reference it in other ways. Think of it this way: if you were embedding an FAQ document, you could point the user to the specific question/answer in the document alongside the LLM-generated answer.
Indexing flow
Here is the full flow (minus the required imports, which can be found at the end of the article alongside the full code sample) that we can use to fetch, chunk, embed, and save the Genkit JS documentation, or any other Markdown file URL for that matter.
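For orientation, the imports and setup the flow relies on look roughly like this (library names are inferred from the code below; check the full sample for the exact setup):

import { genkit, z } from "genkit";
import { Document } from "genkit/retriever";
import { googleAI } from "@genkit-ai/googleai";
import { defineFirestoreRetriever } from "@genkit-ai/firebase/firestore";
import { chunk } from "llm-chunk";
import { initializeApp } from "firebase-admin/app";
import { getFirestore, FieldValue } from "firebase-admin/firestore";

const app = initializeApp();
const firestore = getFirestore(app);
const ai = genkit({ plugins: [googleAI()] });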
export const indexGenkit = ai.defineFlow(
  {
    name: "indexGenkit",
    inputSchema: z.object({
      urlLlmsTxt: z
        .string()
        .describe("URL for the .llms docs")
        .default("https://genkit.dev/llms-js.txt"),
    }),
    outputSchema: z.object({
      success: z.boolean(),
      error: z.string().optional(),
      documentsIndexed: z.number().optional(),
    }),
  },
  async ({ urlLlmsTxt }) => {
    try {
      // Download the Genkit documentation as Markdown text
      const docTxt = await ai.run("extract-text", async () => {
        const response = await fetch(urlLlmsTxt);
        if (!response.ok) {
          throw new Error(`Failed to fetch ${urlLlmsTxt}: ${response.status}`);
        }
        return await response.text();
      });
      // Chunk the documentation into smaller pieces with the defined chunking config
      const chunks = await ai.run("chunk-it", async () =>
        chunk(docTxt, chunkingConfig)
      );
      // Delete all existing documents in the collection to avoid duplicates when reindexing
      await ai.run("delete-existing-documents", async () => {
        await deleteAllDocumentsInCollection(indexConfig.collection);
      });
      // Create Genkit Document objects from the chunks (used below for the count)
      const documents = chunks.map((text) => {
        return Document.fromText(text, { urlLlmsTxt });
      });
      // Embed the chunks and save them to Firestore
      await ai.run("index-to-firestore", async () => {
        await indexToFirestore(chunks);
      });
      return {
        success: true,
        documentsIndexed: documents.length,
      };
    } catch (err) {
      return {
        success: false,
        documentsIndexed: 0,
        error: err instanceof Error ? err.message : String(err),
      };
    }
  }
);
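The flow also references a deleteAllDocumentsInCollection helper that isn't shown above. A minimal sketch could look like this (the full code sample has the actual implementation):

// Deletes every document in the given collection, in batches of 500
// (the Firestore batched-write limit).
async function deleteAllDocumentsInCollection(collectionName: string) {
  const collectionRef = firestore.collection(collectionName);
  let snapshot = await collectionRef.limit(500).get();
  while (!snapshot.empty) {
    const batch = firestore.batch();
    snapshot.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
    snapshot = await collectionRef.limit(500).get();
  }
}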
We can use the gcloud CLI to create the index Firestore needs for vector search; just make sure to use your own project ID and change any field values if you are not using the config provided above.
gcloud firestore indexes composite create \
  --project={{FIREBASE_PROJECT_SLUG_ID}} \
  --collection-group=genkitDocs \
  --query-scope=COLLECTION \
  --field-config=vector-config='{"dimension":"2048","flat": "{}"}',field-path=embedding
Retrieving data
In RAG, we create retrievers to fetch the embedded (or other) data. Genkit provides a ready-made Firestore retriever, but you can learn how to create your own retriever in the Genkit documentation, or finish this demo and ask your helpful RAG-powered assistant :)
const firestoreRetriever = defineFirestoreRetriever(ai, {
  name: "firestoreRetriever",
  // Firestore instance
  firestore,
  // Firestore collection name
  collection: indexConfig.collection,
  // Firestore field name for the content
  contentField: indexConfig.contentField,
  // Firestore field name for the vector
  vectorField: indexConfig.vectorField,
  // Embedder to use for the vectorization
  embedder: indexConfig.embedder,
  // Distance measure to use for the similarity search
  distanceMeasure: "COSINE",
});
The finished Flow
Putting it all together, we have genkitQAFlow, which takes in the question, uses the Firestore retriever to get the relevant documents, and sends them to the LLM (gemini-2.5-flash in this case) to generate the answer.
export const genkitQAFlow = ai.defineFlow(
  {
    name: "genkitQA",
    inputSchema: z.object({
      query: z.string().max(500).default("What is Genkit?"),
    }),
    outputSchema: z.object({ answer: z.string() }),
  },
  async ({ query }) => {
    // Retrieve relevant documents
    const docs = await ai.retrieve({
      // Retriever to use for the similarity search using Firestore as the vector DB
      retriever: firestoreRetriever,
      query: query,
      options: {
        k: 3, // Number of top documents to retrieve
      },
    });
    // Generate a response grounded in the retrieved documents
    const { text } = await ai.generate({
      model: googleAI.model("gemini-2.5-flash"),
      prompt: `
        You are acting as a helpful AI assistant that can answer
        questions about the Genkit documentation.
        Use only the context provided to answer the question.
        Question: ${query}`,
      docs,
    });
    return { answer: text };
  }
);
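Flows defined with ai.defineFlow are directly callable, so besides the Dev UI you can also invoke the flow from code (the question below is just an example):

const { answer } = await genkitQAFlow({
  query: "How do I define a flow in Genkit?",
});
console.log(answer);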
Trying it out
You can find the full code sample on GitHub. The sample contains the Firebase setup, env files, and other parts that are not covered in the article but are required to run the code.
If you skipped the gcloud command above, the first run of the retrieval (genkitQA) flow will fail with an error asking you to create the relevant Firestore index; just follow the URL provided in the error to create it.
Now you can run the following command to start the Genkit Dev UI.
genkit start -- tsx --watch src/genkit-sample.ts
Then visit http://localhost:4000 to see the Genkit Dev UI in your browser. Select Flows and then the indexGenkit flow.
After pressing Run, it will take about two minutes to fetch, chunk, embed, and save the required documents. For larger datasets, it would be better to optimize this process.
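One possible optimization (my sketch, not from the article) is to embed and write the chunks in parallel batches instead of one at a time:

// Same logic as indexToFirestore, but processes chunks in parallel batches
async function indexToFirestoreBatched(textChunks: string[], batchSize = 10) {
  for (let i = 0; i < textChunks.length; i += batchSize) {
    const batch = textChunks.slice(i, i + batchSize);
    await Promise.all(
      batch.map(async (text) => {
        const embedding = (
          await ai.embed({
            embedder: indexConfig.embedder,
            content: text,
            options: { outputDimensionality: 2048 },
          })
        )[0].embedding;
        await firestore.collection(indexConfig.collection).add({
          [indexConfig.vectorField]: FieldValue.vector(embedding),
          [indexConfig.contentField]: text,
        });
      })
    );
  }
}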
After a successful run, you can see the spans along with their details in the output panel on the right.
After the indexing, we can switch to the genkitQA flow.
We can ask a question in the input field and wait for the answer. On the right side, we can once again see what is actually happening in the flow: the retriever, the embedding, and finally the call to the Gemini API that puts it all together.
Conclusion
We have explored the basics of RAG and Genkit; you should now be able to take any long-form text and create custom embeddings to answer your domain-specific questions. Along the way, we used Firestore as a quick and cheap (free if you stay within the limits) vector database.
Thanks to Dominik Šimoník, co-author of this article, which is based on our talk at DevFest.cz 2025.