RAG Quality in Side Projects: Is Perfection Always Necessary?

#rag #ai #sideprojects #codequality

Introduction: The Perfection Trap and Side Project Realities

This morning I was reviewing the logs of the RAG (Retrieval-Augmented Generation) system I built last month for my personal financial analysis tool. The tool analyzes a user's financial data and provides insights by comparing it with general market trends. The core purpose of RAG was to enrich a large language model's (LLM's) general knowledge with my own specific, up-to-date financial data. About 3 weeks ago, I noticed a minor inconsistency in one of the system's responses. While evaluating the past performance of a stock, the LLM overlooked the downward trend of the last 2 weeks in the dataset I had provided and instead offered a more general interpretation of "stable growth." This prompted me to reflect on "RAG quality" and "perfection."

Side projects are our small worlds, often developed with passion in our spare time outside of our main work. In these projects, we experiment with technologies and bring new ideas to life. My financial analysis tool is exactly such a project. The biggest question here is: Does the quality of RAG systems I use in my side projects have to be as "perfect" as in my main projects? Or does a "good enough" level offer a more sustainable roadmap for both the developer and the project? In this post, I will share my experiences to explain the importance of RAG quality, why it may not always need to be at the highest level, and how I strike that balance.

The Basic Functionality of RAG and Factors Affecting Quality

Retrieval-Augmented Generation (RAG) is an architecture used to make LLM responses more accurate, up-to-date, and contextual. It consists of two main components: Retrieval and Generation. In the retrieval phase, the documents or data snippets most relevant to the user's query are fetched, typically from a vector index or database (e.g., the FAISS library, or databases such as Pinecone or ChromaDB). In the generation phase, this retrieved context is sent to the LLM together with the original query, so the model can base its response on that context.
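To make the two phases concrete, here is a minimal, dependency-free sketch. It is a toy under loud assumptions: the "embedding" is a bag-of-words count vector (a real system would call an embedding model), the three documents are made up, and the actual LLM call is left out, so the pipeline stops at the prompt that would be sent to the model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': term counts. A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Retrieval phase: score every document against the query and keep the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Generation phase input: retrieved context plus the original query."""
    joined = "\n".join(context)
    return f"User query: {query}\n\nContext:\n{joined}\n\nAnswer using only the context above."


# Made-up documents, for illustration only.
docs = [
    "XYZ stock showed a 45% increase in the last year.",
    "The weather in the capital was sunny all week.",
    "XYZ stock fell 15% over the last 10 trading days.",
]
query = "How did XYZ stock perform?"
top = retrieve(query, docs, k=2)
prompt = build_prompt(query, [doc for _, doc in top])
```

Both XYZ documents score the same here and the unrelated weather document is dropped; in a real pipeline, `prompt` is what would go to the LLM.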

Several factors determine the quality of this process. The most important ones in the retrieval phase are:

Embedding Model Selection: The quality of the embedding model that converts data snippets and queries into vectors directly impacts the relevance of the retrieved documents. Different models (e.g., all-MiniLM-L6-v2, text-embedding-ada-002, or domain-specific models) perform differently depending on the domain and language of the data.
Chunking Strategy: How documents are divided into smaller pieces (chunks) determines how granular the retrieval will be. Chunks that are too small might lose context, while chunks that are too large might overwhelm the LLM with unnecessary information.
Vector Database and Indexing: The performance of the vector database used and its indexing method (e.g., HNSW, IVF) affect query speed and retrieval accuracy.
Re-ranking Mechanisms: A re-ranking step (e.g., Cohere Rerank or a cross-encoder model) re-scores the initial retrieval results so that the most relevant ones are the ones passed to the LLM.
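The chunk-size trade-off from the list above is easiest to see with the simplest possible strategy. The sketch below is a fixed-size, word-based chunker with overlap; `chunk_words` and its parameters are illustrative names, not from any library, and real splitters usually count tokens or characters rather than whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words, so a sentence cut at one
    boundary still appears intact in the neighbouring chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Smaller `chunk_size` values give finer-grained retrieval but less context per chunk; larger ones do the opposite.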

In the generation phase, factors affecting quality include:

LLM Selection: The capabilities of the LLM used (context understanding, reasoning, creativity) and its training data determine the quality of the response it will generate.
Prompt Engineering: The clarity of the prompt sent to the LLM and how it's instructed to use the retrieved context directly impact the quality of the response.
Temperature and Top-p Parameters: These parameters control the randomness and diversity of the LLM's generated response. A lower temperature produces more deterministic and focused answers, while a higher temperature can lead to more creative but potentially less accurate ones. Top-p (nucleus sampling) restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p.
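A small numerical sketch shows how the two parameters interact. The four logits are arbitrary example values for four candidate tokens; the code applies the standard temperature-scaled softmax and then nucleus (top-p) filtering. It does not reflect any particular LLM provider's implementation.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by temperature before softmax: T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], p: float) -> list[int]:
    """Indices of the smallest set of tokens, taken in descending probability
    order, whose cumulative probability reaches p (nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


logits = [2.0, 1.0, 0.5, 0.1]  # arbitrary example scores for 4 candidate tokens
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 1.5)
nucleus_cold = top_p_filter(cold, 0.9)  # only the 2 most likely tokens survive
nucleus_hot = top_p_filter(hot, 0.9)    # all 4 tokens remain candidates
```

With the same `p`, the low-temperature distribution leaves far fewer candidate tokens, which is why the two settings together make answers more focused.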

Together, these factors form a complex equation that determines the overall quality of a RAG system. Making every variable in this equation "perfect" for a side project typically requires a significant investment of time and resources.

What Does "Good Enough" Quality Mean in Side Projects?

When I say "good enough" quality, I mean that the RAG system can perform its basic function, i.e., generate reasonably relevant and understandable answers to user queries. This doesn't always mean getting the fastest, most accurate, or most detailed answer. In the case of my financial analysis tool, the general market trend information the system provided was still a valuable starting point for making investment decisions, even if it overlooked the downward trend of the last two weeks. In fact, this minor inconsistency even contributed to the project's development by prompting me to ask, "How can I integrate this data better?"

When determining the "good enough" level for side projects, I consider the following criteria:

Basic Functionality: Is the RAG working on the fundamental task of retrieving and understanding information? Are the answers completely nonsensical or irrelevant?
User Experience: Is the system easy to use? Are the answers understandable? Can I, as the developer, interact with the output reasonably?
Development Resources: How much time and effort do I need to invest to achieve this quality level? Would spending additional time for higher quality cause me to neglect other aspects of the project?
Project Goal: What is the main objective of the side project? Is it to learn a technology, find a solution to a specific problem, or just to have fun? The project's goal plays a key role in determining the quality standard.

If a side project is undertaken to experiment with a specific technology stack, prove a concept, or simply satisfy personal curiosity, a "reasonable" system is usually sufficient instead of a "perfect" RAG system. This allows me to iterate faster, try and fail with more ideas, and ultimately learn more.

ℹ️ Developer Tip

When setting quality standards for side projects, focus on "effectiveness" rather than "perfection." If the system can perform the basic functions you've defined and move you to the next step, it has achieved sufficient quality.

Real-World Scenario: My Financial Analysis Tool and RAG Inconsistency

Let's get to the specific problem my financial analysis tool encountered. About 3 weeks ago, while reviewing the tool's analysis of a stock, I noticed that it completely ignored the 15% downward trend over the last 10 trading days. The RAG system had mostly retrieved reports describing the stock's "stable growth potential" and passed this information to the LLM. However, the last 2 weeks of data, which were in my dataset and vectorized in ChromaDB, contradicted this general trend.

Here's a snippet from the logs at that moment (anonymized and simplified):

```json
{
  "query": "Analysis of XYZ stock's current status and future",
  "retrieved_docs": [
    {"id": "doc_101", "score": 0.92, "content": "XYZ stock has entered a stable growth trend, showing a 45% increase in the last year..."},
    {"id": "doc_105", "score": 0.88, "content": "Market analysts predict XYZ could grow by another 20% in the next 2 years with its new product launch..."},
    {"id": "doc_112", "score": 0.75, "content": "According to fundamental analyses, XYZ is outperforming the sector average..."}
  ],
  "llm_prompt": "User query: Analysis of XYZ stock's current status and future.\n\nContext:\nXYZ stock has entered a stable growth trend, showing a 45% increase in the last year...\nMarket analysts predict XYZ could grow by another 20% in the next 2 years with its new product launch...\nAccording to fundamental analyses, XYZ is outperforming the sector average...\n\nPlease provide an analysis to the user based on this information.",
  "llm_response": "XYZ stock exhibits stable growth potential, in line with general market trends. Recent performance data supports the company's strong fundamentals and promising future projects. Analysts predict the company will maintain its growth momentum in the coming years."
}
```

Note: the data from the last 2 weeks, and the chunks containing it, did not score high enough during retrieval to make it into retrieved_docs.

This was not the output I expected. By ignoring the 15% downward trend, the analysis lost much of its value for me. At this point, I had two options:

Pursue Perfection: Change the embedding model, optimize the chunking strategy, perhaps add a re-ranking layer, or try more sophisticated indexing methods in the vector database. This would be a debugging and optimization process that could take hours, even days.
Adopt the "Good Enough" Approach: Accept that the system generally works well and treat this specific inconsistency as an acceptable limitation. In this case, I would present the response to the user knowing that it reflects the "general trend" but may miss the most recent "micro" data.

I chose the second path. Why? Because this was a side project, and my main goal was to quickly implement a concept and learn how to leverage LLMs with my own data for financial analysis. This minor inconsistency did not undermine the project's fundamental purpose; it only slightly diminished the nuance of the analysis.
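One lightweight way to make that acceptable limitation visible, without touching retrieval at all, is to attach a caveat whenever none of the retrieved chunks is recent. This is a hypothetical sketch, not what my tool does: it assumes each retrieved chunk carries a `date` field in its metadata, and the function name, the 14-day window, and the dates below are all made up for illustration.

```python
from datetime import date, timedelta


def add_recency_caveat(answer: str, retrieved: list[dict], today: date, window_days: int = 14) -> str:
    """Append a warning if no retrieved chunk is dated within the last `window_days` days.

    Assumes every chunk dict has a 'date' key holding a datetime.date (hypothetical metadata).
    """
    cutoff = today - timedelta(days=window_days)
    if any(doc["date"] >= cutoff for doc in retrieved):
        return answer  # at least one recent source was retrieved; leave the answer as-is
    return (
        answer
        + f"\n\nNote: this analysis is based on sources older than {window_days} days "
        "and may miss the most recent price movements."
    )


# Hypothetical chunk dates, for illustration only.
retrieved = [
    {"id": "doc_101", "date": date(2024, 1, 15)},
    {"id": "doc_105", "date": date(2024, 3, 2)},
]
answer = add_recency_caveat("XYZ stock exhibits stable growth potential.", retrieved, today=date(2024, 6, 1))
```

The check costs almost nothing and keeps the "good enough" output honest about its blind spot.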

Trade-offs: Time, Resources, and the Cost of Perfection

The costs of aiming for "perfect" RAG quality in side projects are quite high. I generally categorize these costs into three main areas:

Time Cost: The most obvious cost is time. Experimenting with embedding models, playing with chunking parameters, optimizing vector database settings, and refining LLM prompts can take considerable time. Spending such a long period on a side project can cause the project to deviate from its main goal or never be completed. For example, in my financial analysis tool, changing the embedding model and re-processing the entire dataset with new embeddings could have taken me several hours. I could have spent this time developing the tool's user interface or integrating more financial data sources.
Resource Cost: High-quality embedding models generally require more computational power. Some hosted embedding models (e.g., more advanced models from Google or OpenAI) are billed per API call or per token, which adds up quickly. Running and managing vector databases also requires server resources, so if I set up dedicated infrastructure for a side project, I also need to account for its cost. While running ChromaDB on my own VPS, I observed that indexing a 500 GB dataset took about 2 hours at around 80% CPU usage. That is significant resource consumption for a process that has to be repeated frequently.
The Perfection Paradox: Sometimes, pursuing perfection leads us to overlook what is already "good." While trying to build a flawless system in a side project, we may lose sight of the fundamental benefits the project already offers. If my tool performs financial analysis with 95% accuracy and that is sufficient for me, spending weeks to reach 99% accuracy may not be worthwhile. This can lead to "analysis paralysis."
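The resource numbers above are easier to weigh with a bit of arithmetic. The sketch below turns the 500 GB / 2 hour indexing run into an average throughput (decimal units assumed) and estimates hosted-embedding cost from a token count. The per-million-token price is a placeholder parameter, not a quote from any provider, so check current pricing before relying on the result.

```python
def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average indexing throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (hours * 3600)


def embedding_cost(num_tokens: int, price_per_million_tokens: float) -> float:
    """Hosted-embedding cost for num_tokens; the price is an input, not a real quote."""
    return num_tokens / 1_000_000 * price_per_million_tokens


indexing_rate = throughput_mb_per_s(500, 2)   # about 69.4 MB/s for the VPS run above
monthly_reindex_hours = 2 * 30                # if that 2-hour run were repeated daily
sample_cost = embedding_cost(2_000_000, 0.10)  # hypothetical: 2M tokens at 0.10 per 1M tokens
```

Seen this way, a daily re-index would occupy the VPS for about 60 hours a month, which is exactly the kind of cost a side project rarely justifies.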

Considering these trade-offs, in my side projects, I aim for an "effective" and "educational" quality level rather than "perfect." This allows the project to progress quickly and maximizes the learning process.

⚠️ Developer Warning

Ignoring resource and time constraints in side projects can prevent the project from being completed. Always consider your main goals and the minimum effort required to achieve them.

Practical Steps to Improve RAG Quality (For Side Projects)

While perfection isn't always mandatory, maintaining RAG systems at a certain quality level is important. Here are some practical and relatively less costly steps I apply to improve quality in side projects:

Choosing the Right Embedding Model: Opt for popular, well-performing open-source embedding models (e.g., models in the sentence-transformers library) that are suitable for your domain. Models like all-MiniLM-L6-v2 are a good starting point for most general-purpose tasks and can be run locally. I used this model initially in my financial analysis tool.
Smart Chunking Strategies: Divide your documents into pieces that preserve their logical integrity. Chunking methods that respect the document's structure can yield better results; for example, LangChain's RecursiveCharacterTextSplitter tries to split on paragraph and line breaks before falling back to smaller separators. In financial reports, treating each section (introduction, analysis, conclusion) as a separate chunk would better preserve context.
Simple Yet Effective Prompt Engineering: Give clear instructions to the LLM. Specify how it should use the retrieved documents, and the tone and format of the response. For example: "Using the context below, answer the user's question. Base your answer solely on the information provided in the context. If information is not present in the context, state 'Information not available on this topic.'" This simple prompt can prevent the LLM from making unnecessary speculations.
Query Transformations: Rewriting the user's query into a form better suited to retrieval can improve retrieval quality. One example is Hypothetical Document Embeddings (HyDE): the LLM is asked to generate a "hypothetical" answer to the query, and the embedding of that answer, rather than of the raw query, is used to find similar documents. Such techniques can bring noticeable improvements for the cost of one extra LLM call per query.
Limited but Effective Re-ranking: If the initial retrieval results are not satisfactory, consider adding a simple re-ranking layer. For example, you can take the top 10 results, re-sort them with a service such as Cohere's Rerank API or a locally run cross-encoder model, and send only the 3-5 most relevant results to the LLM. This increases computational cost but can significantly improve the quality of the response.
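Three of the steps above (the grounded prompt, HyDE, and retrieve-wide/re-rank-narrow) can be sketched as plain functions. This is an illustrative sketch, not a specific library's API: `search`, `rerank_score`, and `generate` are hypothetical callables standing in for your vector store query, a cross-encoder or hosted re-ranker, and an LLM call. The stubs at the bottom use simple word overlap only so the example runs without any external service.

```python
from typing import Callable

FALLBACK = "Information not available on this topic."


def grounded_prompt(question: str, context: list[str]) -> str:
    """Prompt that restricts the LLM to the retrieved context (wording from the prompt tip)."""
    ctx = "\n---\n".join(context)
    return (
        "Using the context below, answer the user's question. "
        "Base your answer solely on the information provided in the context. "
        f"If information is not present in the context, state '{FALLBACK}'\n\n"
        f"Context:\n{ctx}\n\nQuestion: {question}"
    )


def hyde_query(question: str, generate: Callable[[str], str]) -> str:
    """HyDE: generate a hypothetical answer and use that text as the search query."""
    return generate(f"Write a short passage that answers: {question}")


def retrieve_then_rerank(
    query: str,
    search: Callable[[str, int], list[str]],
    rerank_score: Callable[[str, str], float],
    wide_k: int = 10,
    final_k: int = 3,
) -> list[str]:
    """Fetch a wide candidate set, re-score each (query, doc) pair, keep the best final_k."""
    candidates = search(query, wide_k)
    ranked = sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)
    return ranked[:final_k]


# Stubs so the sketch runs offline; replace with real retrieval and re-ranking.
corpus = ["XYZ fell 15% in 10 trading days", "XYZ grew 45% last year", "ABC launched a product"]


def fake_search(query: str, k: int) -> list[str]:
    return corpus[:k]


def overlap_score(query: str, doc: str) -> float:
    return len(set(query.lower().split()) & set(doc.lower().split()))


best = retrieve_then_rerank("why xyz fell", fake_search, overlap_score, wide_k=3, final_k=1)
prompt = grounded_prompt("Why did XYZ fall?", best)
```

Keeping the expensive scorer behind a cheap first-pass search is what keeps re-ranking affordable: it only ever sees `wide_k` candidates, not the whole collection.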

These steps focus on creating a "working" and "useful" system rather than building a "perfect" RAG system. Striking this balance in side projects is crucial for both learning the technology and bringing the project to a tangible outcome.

Conclusion: Perfection is a Journey, Not a Destination

In conclusion, I've found that I don't always have to be "perfect" regarding the quality of RAG systems in my side projects. The minor inconsistency in my financial analysis tool didn't undermine the project's main goal; instead, it offered me more opportunities to think and learn. Such side projects are excellent arenas for experimenting with technology, taking risks, and learning from mistakes.

If you are using a RAG system in a side project, first clarify your project's objective. If your goal is to learn a new technology, experimenting with the most complex and "perfect" solutions might be logical. However, if your goal is to find a practical solution to a specific problem, a "good enough" quality level will usually suffice. This saves you time, allows you to use your resources more efficiently, and increases the likelihood of your project being completed.

Remember, perfection in the tech world is often not a destination but a continuous journey. Side projects are the laboratories that guide us on this journey. Sometimes, we learn the most valuable lessons from imperfect outputs.

As I mentioned in my previous post on [related: AI model evolution and cost analysis], understanding the potential of the technology being used is as important as knowing for what purpose and with how much effort you will use that technology. The pursuit of perfection can sometimes lead us away from this fundamental balance point.