Index
- Introduction
- Azure Resources
- Code
- Do not forget to Clean the Cloud
- Conclusion
Introduction
In this lab, we will build a full RAG pipeline using Azure. RAG is a technique where, instead of relying solely on a language model's training data, we first retrieve relevant documents from an external knowledge base and then pass them to the model to generate a more accurate and grounded answer.
To do this, we will use two Azure services: Azure AI Search as the vector database to store and retrieve document embeddings, and Azure AI Foundry to deploy the embedding model and the generation model.
By the end of this lab, you will have a working RAG pipeline running on Azure.
Azure Resources
Azure AI Search
Azure AI Search is a cloud search service that supports full-text search, filters, and vector search. In this lab, we are using it as a vector database. We store document embeddings in it and query them using cosine similarity to find the most relevant documents for a given input.
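Cosine similarity scores two vectors by the angle between them, ignoring magnitude. A minimal sketch of the scoring function (illustrative only, not the service's internals):

```python
from math import sqrt

def cosine_similarity(a, b):
    # cosine(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score ~1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # -> ~1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # -> 0.0
```

During retrieval, the query embedding is compared against every stored document embedding this way, and the top-scoring documents are returned.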
To set it up, we need to provision the resource and get two values: VECTOR_SEARCH_ENDPOINT and VECTOR_SEARCH_KEY, which will be used as environment variables.
- Go to the Azure Portal and open `AI Search`.
- Click `Create` to create a new search service.
- Create a new resource group or use an existing one. (Suggestion: create a new one so it's easy to delete the resources later.)
- Give a unique name for `Service name`.
- Make sure the `Pricing tier` is Free, unless you want to try a paid tier.
- Finally, click the `Review + Create` button at the bottom, then click `Create` to create the resource.
- Next, go to the resource dashboard and copy the `Url` from `Essentials`. This `Url` will be used as `VECTOR_SEARCH_ENDPOINT`.
- To get the `VECTOR_SEARCH_KEY`, go to the `Keys` tab under the `Settings` section in the left navbar and copy the `Primary admin key`.
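With those two values in hand, a retrieval query is a single REST call to the search service. A minimal sketch, assuming an index named `rag-index` with a vector field called `embedding` (both names are illustrative, not something the portal created for you):

```python
import json
import urllib.request

VECTOR_SEARCH_ENDPOINT = "https://<your-service>.search.windows.net"  # the Url you copied
VECTOR_SEARCH_KEY = "<your-primary-admin-key>"

def build_vector_query(query_vector, k=3):
    # Request body for Azure AI Search's vector query: return the k
    # documents whose `embedding` field is nearest to query_vector.
    return {
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,
                "fields": "embedding",
                "k": k,
            }
        ],
        "select": "content",
    }

def search(query_vector, index_name="rag-index"):
    # POST the vector query to the index's search endpoint.
    url = (f"{VECTOR_SEARCH_ENDPOINT}/indexes/{index_name}"
           f"/docs/search?api-version=2023-11-01")
    req = urllib.request.Request(
        url,
        data=json.dumps(build_vector_query(query_vector)).encode(),
        headers={"Content-Type": "application/json",
                 "api-key": VECTOR_SEARCH_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["value"]
```

The notebook uses the Azure SDK rather than raw HTTP, but the request shape is the same either way.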
Azure AI Foundry
Azure AI Foundry is a platform for deploying and managing AI models on Azure. It lets you deploy base models (like OpenAI models) as endpoints that you can call from your own code. In this lab, we are using it to deploy two models: text-embedding-3-small to generate embeddings, and a generation model to produce the final answer.
To set it up, we need to provision the resource and get two values: AZURE_OPEN_API_KEY and AZURE_OPEN_API_ENDPOINT, which will be used as environment variables.
- Go to the Azure AI Foundry portal.
- Click the `Create new` button to create a new project.
- For the resource type, keep the recommended option and click `Next`.
- Give it a good name, keep everything else at the defaults, and click `Create`.
- Finally, from the project overview page, copy the `API Key` to use as `AZURE_OPEN_API_KEY` and the `Microsoft Foundry project endpoint` to use as `AZURE_OPEN_API_ENDPOINT`.
- Now, from the left navbar, click `Models + endpoints` under the `My assets` section.
- Click `Deploy base model`. We will deploy an embedding model and a generation model.
- Search for `text-embedding-3-small` and click `Confirm`.
- Change the `Deployment type` to `Standard` and click `Deploy` to deploy the model.
- Repeat the last two steps to deploy a generation model of your choice.
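Once deployed, each model is reachable at a REST endpoint keyed by its deployment name. A minimal sketch of calling the embedding deployment (the endpoint format and `api-version` are assumptions based on the standard Azure OpenAI REST shape; adjust them to match your project):

```python
import json
import urllib.request

AZURE_OPEN_API_ENDPOINT = "https://<your-resource>.openai.azure.com"  # illustrative
AZURE_OPEN_API_KEY = "<your-api-key>"

def embedding_payload(text):
    # Body for the embeddings endpoint: just the text to embed.
    return {"input": text}

def embed(text, deployment="text-embedding-3-small"):
    # Call the deployed embedding model and return its vector.
    url = (f"{AZURE_OPEN_API_ENDPOINT}/openai/deployments/{deployment}"
           f"/embeddings?api-version=2024-02-01")
    req = urllib.request.Request(
        url,
        data=json.dumps(embedding_payload(text)).encode(),
        headers={"Content-Type": "application/json",
                 "api-key": AZURE_OPEN_API_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]
```

The generation deployment is called the same way, just against its `chat/completions` path instead of `embeddings`.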
Code
- Open this notebook in Google Colab.
- Add the following environment variables by clicking the key (Secrets) button in the left sidebar, and grant them notebook access:
  - `AZURE_OPEN_API_ENDPOINT`
  - `AZURE_OPEN_API_KEY`
  - `VECTOR_SEARCH_ENDPOINT`
  - `VECTOR_SEARCH_KEY`
- Finally, run the cells in the notebook.
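The notebook's RAG loop boils down to three steps: embed the question, retrieve the nearest documents, and ask the generation model with those documents pasted in as context. A minimal sketch of the prompt-assembly step (the function name and prompt wording are illustrative, not the notebook's exact code):

```python
def build_rag_prompt(question, documents):
    # Ground the model by placing the retrieved documents in the prompt
    # ahead of the user's question.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What is RAG?",
    ["RAG retrieves documents before generating.",
     "It grounds answers in sources."],
)
print(prompt)
```

This prompt then goes to the generation deployment, which answers from the retrieved context instead of relying only on its training data.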
Do Not Forget to Clean the Cloud
- On the Azure Portal, go to `All resources` and delete all the resources we created for this lab, so you are not billed for resources you no longer use.
Conclusion
In this lab, we set up a full RAG pipeline on Azure using Azure AI Search as the vector database and Azure AI Foundry to deploy the embedding and generation models. Thanks for reading! If you want to understand the math behind how the retrieval step works, check out my other blog on the Math behind Embeddings and Cosine Similarity.