Muhammad Ahsan Ayaz

AI Assistant for Company-Wide Software Best Practices with Gemini, LlamaIndex & RAG

In modern software development, ensuring that best practices are followed across teams and projects is essential to maintain code quality, efficiency, and scalability. However, keeping everyone updated on the latest standards and best practices can be a challenge, especially when different teams work on various parts of the codebase.

An AI-powered assistant that provides instant answers to questions about your company’s coding standards and software best practices can help alleviate this issue. Using Retrieval-Augmented Generation (RAG), you can combine the power of large language models (LLMs) with the ability to search your company’s documentation in real time, giving employees quick access to guidelines, code examples, and answers to frequently asked questions.

In this article, we'll walk through how to build a RAG-based assistant that developers can use to query software best practices, guidelines, or standards specific to your organization.

Why RAG for Software Best Practices?

Retrieval-Augmented Generation (RAG) allows a model to fetch relevant information from external documents in real-time while generating answers. This is especially useful when dealing with dynamic and context-specific content, like company coding standards or documentation that may evolve over time.

Unlike pre-trained models that rely solely on their internal knowledge, a RAG-based assistant pulls up-to-date information from your company's repositories or documentation files, ensuring accurate, real-time responses tailored to your exact guidelines.

Prerequisites

This article uses Python and Google Colab for demonstrating RAG with LlamaIndex and Google Gemini.

Prerequisite 1:

Obtain a Google API key. Since we're using Google Gemini as the generative AI model, we need one to authenticate. You can create a key in Google AI Studio.

Prerequisite 2:

Add the GOOGLE_API_KEY to the Colab secrets as shown in the image below.
Google Colab secret

Prerequisite 3:

Install the dependencies and authenticate with Google so the notebook can read the Gemini API key from the Colab secrets.

!pip install llama_index
!pip install huggingface-hub
!pip install llama-index-embeddings-gemini
!pip install llama-index-llms-gemini
!pip install google-generativeai
# authenticate with Google and load the API key from the Colab secrets
from google.colab import auth
import os
from google.colab import userdata

auth.authenticate_user()
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

Step 1: Gather and Store Your Best Practices Documentation

Before setting up your AI assistant, collect and organize the various documents containing your company's best practices. These could include documents on coding standards, design principles, software architecture guidelines, and more. Ideally, these should be stored in a centralized repository like GitHub, Google Drive, or your company’s knowledge base.

For demonstration purposes, let’s assume you’ve stored a file in a GitHub repository that contains your company’s best practices, covering topics such as DRY (Don't Repeat Yourself), SOLID principles, and clean code practices.

Step 2: Fetching Documentation from the Repository

To enable the assistant to fetch your documents, we’ll download them from your repository using Python’s requests library. This way, your assistant will always have access to the latest version of the documentation.

# import necessary packages
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.core import Document
from llama_index.llms.gemini import Gemini
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.core import Settings

from llama_index.core.node_parser import SentenceSplitter
import requests
# retrieve document
tsPracticesDoc = "https://raw.githubusercontent.com/AhsanAyaz/gemini-rag-llamaindex-ts/refs/heads/main/data-sources/typescript_best_practices.txt"
response = requests.get(tsPracticesDoc)
if response.status_code == 200:
    content = response.text
else:
    raise Exception(f"Failed to download the file from {url}")

documents = [Document(text=content)]

Step 3: Setting Up the Language Model and Embeddings

Now, let’s configure a large language model (LLM) to handle text generation, and use Gemini embeddings to process and represent your documents in a vector space, allowing for fast and accurate retrieval. We will also configure LlamaIndex's global settings to use our models.

# models and text chunk splitter
# note: generation settings (temperature, top_p, top_k) apply to the LLM, not the embedding model
llm = Gemini(
    model_name="models/gemini-1.5-flash-latest",
    generation_config={"temperature": 0.7, "top_p": 0.8, "top_k": 40},
)
embed_model = GeminiEmbedding(model_name="models/embedding-001")

text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

# global LlamaIndex settings
Settings.llm = llm
Settings.embed_model = embed_model
Settings.text_splitter = text_splitter

Step 4: Generating the Vector Index and Query Engine

For optimal performance, the documentation is broken into smaller chunks, which lets the assistant search for and retrieve specific sections of the text more effectively. We can now build the vector index with LlamaIndex and create our query engine as follows:

# build the vector index (the document is chunked, embedded, and stored for retrieval)
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# create a query engine on top of the index
query_engine = index.as_query_engine()
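
If you're curious how the document is chunked, here's a small optional sketch that runs the SentenceSplitter from Step 3 by hand and inspects the resulting nodes. The index above already applies this splitter automatically through the global Settings:

# inspect the chunks the splitter produces from our document
nodes = text_splitter.get_nodes_from_documents(documents)
print(f"Document was split into {len(nodes)} chunks")
print(nodes[0].get_content()[:300])  # preview the first chunk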

Step 5: Querying the AI Assistant

Once the document is indexed, your RAG-based assistant is ready to answer questions. The assistant will retrieve relevant sections from the document and generate a response based on the retrieved information.

response = query_engine.query("What are the benefits of DRY in TypeScript? Can you give some examples and give some code samples?")
print(response)

How the AI Assistant Works

  • Document Retrieval: When a query is made, the assistant searches the document index using the embeddings to retrieve the most relevant chunks of text (you can inspect this step yourself with the sketch after this list).
  • Text Generation: The LLM then uses the retrieved chunks to generate a comprehensive and context-specific answer, often including code snippets and examples.
  • Continuous Learning: As your documentation evolves, updating the repository and re-running the indexing step keeps the assistant working with the latest version of your guidelines.
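
To see the retrieval step for yourself, here's a minimal sketch that reuses the index and query engine built above; the similarity_top_k value of 3 is just an illustrative choice:

# pull the top-matching chunks directly with a retriever
retriever = index.as_retriever(similarity_top_k=3)
for node_with_score in retriever.retrieve("What are the benefits of DRY in TypeScript?"):
    print(node_with_score.score, node_with_score.node.get_content()[:200])

# the query engine response also carries the chunks it was grounded on
rag_response = query_engine.query("What are the benefits of DRY in TypeScript?")
for source in rag_response.source_nodes:
    print(source.score, source.node.get_content()[:120])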

For instance, if an engineer asks, “What are the benefits of DRY in TypeScript? Can you give some examples and give some code samples?” the assistant might respond with:

"DRY (Don't Repeat Yourself) in TypeScript helps to avoid repetition by centralizing logic, making code more maintainable and easier to understand.

For example, instead of repeating the same calculation multiple times, you can create a function that encapsulates the logic. This makes the code more concise and easier to modify if the calculation needs to be changed.

Here's an example:

// Bad Example: Repeating logic
let area1 = 10 * 20;
let area2 = 15 * 30; 

// Good Example: Centralized logic
function calculateArea(width: number, height: number): number {
  return width * height;
}

let area1 = calculateArea(10, 20);
let area2 = calculateArea(15, 30);

Another benefit of DRY is that it reduces the risk of introducing bugs. When you have the same logic in multiple places, it's easy to make a mistake and update only some of the instances. By centralizing the logic, you ensure that any changes are made consistently."

Cool, right? This response not only explains the principle but also provides clear examples of how to implement it.

Step 6: Customizing the Assistant for Your Company

The RAG-based AI assistant can be customized to fit any company's needs. For example:

  • Multiple Documents: You can index multiple documents, such as API design guidelines, security protocols, or team-specific coding standards (see the sketch after this list).
  • Different Models: Depending on the complexity of your queries, you can switch between different LLMs or fine-tune a model specifically for your use case.
  • User-Specific Queries: Customize the assistant to provide different levels of detail based on the user’s role (e.g., junior developers vs. senior architects).
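
For the first item, here's a minimal sketch of indexing several documents at once with the SimpleDirectoryReader we imported earlier; the ./best-practices folder and the example question are assumptions for illustration:

# load every file in a local folder (hypothetical path) and index them together
docs = SimpleDirectoryReader("./best-practices").load_data()
multi_index = VectorStoreIndex.from_documents(docs, show_progress=True)
multi_query_engine = multi_index.as_query_engine()

print(multi_query_engine.query("What do our API design guidelines say about versioning?"))

Each file is loaded as its own Document (with file metadata attached), so answers can be traced back to the guideline they came from.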

Conclusion

By building a RAG-based AI assistant for company-wide software best practices, you can provide your development teams with instant access to critical information, helping them follow coding standards and guidelines more effectively. This AI assistant reduces the need for manual searches, ensures consistency across teams, and scales with your company’s evolving documentation.

The assistant can also serve as a foundation for broader use cases, such as onboarding new developers, handling code reviews, or even integrating with continuous integration (CI) systems to flag violations of coding standards during development.

This is just the beginning—by leveraging RAG, you can build powerful AI tools that make knowledge accessible to everyone in your organization.

Code

The code for the tutorial can be found here:
https://github.com/AhsanAyaz/gemini-rag-llamaindex-example

Feel free to react to this post, and give the GitHub repo a star if you found it useful :)
