<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Devpratap Tomar</title>
    <description>The latest articles on DEV Community by Devpratap Tomar (@devpratap_tomar2005).</description>
    <link>https://dev.to/devpratap_tomar2005</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3719848%2Fcb80cd0a-d1db-4d76-bc89-1a6ee813da84.png</url>
      <title>DEV Community: Devpratap Tomar</title>
      <link>https://dev.to/devpratap_tomar2005</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devpratap_tomar2005"/>
    <language>en</language>
    <item>
      <title>Introduction to RAG (Retrieval-Augmented Generation)</title>
      <dc:creator>Devpratap Tomar</dc:creator>
      <pubDate>Mon, 30 Mar 2026 09:56:05 +0000</pubDate>
      <link>https://dev.to/devpratap_tomar2005/introduction-to-rag-retrieval-augmented-generation-mk8</link>
      <guid>https://dev.to/devpratap_tomar2005/introduction-to-rag-retrieval-augmented-generation-mk8</guid>
      <description>&lt;p&gt;I’ve been diving into &lt;strong&gt;Generative AI&lt;/strong&gt; lately, and one thing is clear: if you’ve spent any time with LLMs, you’ve probably run into their limitations. You ask an LLM a highly specific question about a new software library, your company's internal documents, or breaking news, and it either politely declines to answer or confidently makes up a complete lie.&lt;/p&gt;

&lt;p&gt;LLMs don't know about recent information because their knowledge is frozen at the time they were trained. They also don't know about your company's internal documents because they were never trained on them, and training a shared model on them would be unsafe: anyone with access to that model could extract your internal information.&lt;/p&gt;

&lt;p&gt;So, how do you get an LLM to answer questions based on data you provide, without making that data public? There are a couple of common approaches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fine-tuning:&lt;/strong&gt; With fine-tuning you further train an existing model on your private data, and users can then ask questions based on that data. It is computationally heavy, time-consuming, and very costly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Large Context Window Prompting:&lt;/strong&gt; You provide all the information in your prompt and tell the LLM to answer based only on the data you provided.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Although these methods can work, they have their own limitations. Fine-tuning is very &lt;strong&gt;costly&lt;/strong&gt; and requires significant &lt;strong&gt;compute&lt;/strong&gt;, and whenever your information changes you have to fine-tune again, which makes it even more expensive. LLMs also have a limited context window, and when you query an LLM you are not sending just your query to the model: the system prompt and chat history go along with it. So if you stuff a large amount of information into the prompt, you can easily hit the &lt;strong&gt;context window&lt;/strong&gt; limit.&lt;/p&gt;

&lt;p&gt;Now, we need a solution which is cost and computation efficient and you can easily update your information without hitting the context window limit. This is where &lt;strong&gt;RAG&lt;/strong&gt; comes into the picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RAG?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; is an AI framework that improves LLM &lt;strong&gt;accuracy&lt;/strong&gt; by fetching relevant, up-to-date information from external, trusted sources (like databases or company documents) before generating a response. It reduces misinformation ("hallucinations") by grounding the answer in verified data rather than relying solely on training data.&lt;/p&gt;

&lt;p&gt;RAG is implemented in two phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Indexing Phase:&lt;/strong&gt; Gathering the information or documents, splitting them into chunks, generating embeddings, and storing them in a vector store or vector DB.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval Phase:&lt;/strong&gt; Retrieving the relevant information from the vector store and providing it to the LLM so it can generate a response to the user's query.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
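&lt;p&gt;Here is a minimal sketch of the two phases in plain Python. It uses a toy bag-of-words "embedding" and an in-memory list as the vector store, just to make the flow concrete; a real pipeline would call an embedding model and a vector database instead, and the function names (&lt;code&gt;index_documents&lt;/code&gt;, &lt;code&gt;retrieve&lt;/code&gt;) are illustrative:&lt;/p&gt;

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: a bag-of-words frequency vector.
    # A real pipeline would call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing phase: embed each chunk and store it in the "vector store".
vector_store = []

def index_documents(chunks):
    for chunk in chunks:
        vector_store.append((embed(chunk), chunk))

# Retrieval phase: embed the query, rank stored chunks by similarity,
# and return the top-k chunks to hand to the LLM as context.
def retrieve(query, k=2):
    q = embed(query)
    scored = sorted(vector_store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in scored[:k]]

index_documents([
    "RAG retrieves relevant chunks before generation.",
    "Fine-tuning retrains the model on private data.",
    "Vector stores enable fast similarity search.",
])
print(retrieve("How does RAG retrieve chunks?", k=1))
```

&lt;p&gt;Notice that the two phases are independent: you can re-run indexing whenever your documents change, without touching the LLM at all.&lt;/p&gt;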

&lt;p&gt;Imagine you are hosting a major award show and you have a massive script that contains every single detail. This script includes the names of all the winners, the order of every performance, and every word you need to say. The problem is that the script is far too long for you to memorize entirely, and carrying the whole book on stage is too clumsy when you need to find a specific name quickly.&lt;/p&gt;

&lt;p&gt;To solve this, you go through the script and pick out the most important facts to write on small &lt;strong&gt;cue cards&lt;/strong&gt;. Each card contains just one specific piece of information, like the winner of a single category. You keep these cards organized in a small box so you can grab the right one at the right moment. When it is time to announce the Best Actor award, you don't flip through the whole script. You simply pull out the specific cue card for that award, read the name, and announce it to the crowd.&lt;/p&gt;

&lt;p&gt;This process is exactly how RAG works. The long script represents your massive collection of data or documents. The difficulty of memorizing that script is the same as the limit on how much information an AI can "remember" or process at once. Breaking the script down into small cue cards is the same as turning your documents into small chunks. Storing those cards in a box is like putting those chunks into a vector store. Finally, finding the right card and reading the winner's name is the same as the AI finding the most relevant piece of information and using it to give you a perfect answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Components of RAG
&lt;/h2&gt;

&lt;p&gt;A RAG pipeline consists of several components that make the indexing and retrieval phases possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document Ingestion &amp;amp; Indexing:&lt;/strong&gt; Responsible for gathering documents from various sources and loading them into the pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text Splitters:&lt;/strong&gt; A crucial part of a RAG pipeline, responsible for dividing large documents into smaller, manageable chunks. Without chunking, retrieval returns whole documents that are too coarse to match a specific query well.&lt;/p&gt;
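&lt;p&gt;A simple splitting strategy, sketched below, is fixed-size chunks with an overlap, so a sentence cut at one chunk boundary still appears whole in the neighbouring chunk. This is only one strategy; libraries also offer sentence-aware or recursive splitters, and the function name &lt;code&gt;split_text&lt;/code&gt; here is illustrative:&lt;/p&gt;

```python
def split_text(text, chunk_size=200, overlap=50):
    # Fixed-size chunking with overlap: neighbouring chunks share
    # `overlap` characters, so content cut at one boundary
    # survives intact in the next chunk.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Tiny demo with a 4-character chunk and 2-character overlap:
print(split_text("abcdefghij", chunk_size=4, overlap=2))
```

&lt;p&gt;Tuning &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;overlap&lt;/code&gt; matters in practice: chunks that are too small lose context, while chunks that are too large dilute the similarity search.&lt;/p&gt;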

&lt;p&gt;&lt;strong&gt;Vector Embeddings:&lt;/strong&gt; Converts the small chunks into vector embeddings using an embedding model. This captures the semantic meaning of the text, so similar ideas end up close together in the vector space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector Stores &amp;amp; VectorDB:&lt;/strong&gt; Stores embeddings and enables similarity search for fast information retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retriever:&lt;/strong&gt; Finds and returns the most relevant chunks from the database based on query similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Augmentation Layer:&lt;/strong&gt; Combines retrieved chunks with the user’s query to provide context to the LLM.&lt;/p&gt;
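&lt;p&gt;The augmentation step itself is just string assembly. A minimal sketch, assuming the retriever has already returned the relevant chunks (the instruction wording and the &lt;code&gt;build_prompt&lt;/code&gt; name are illustrative, not a fixed API):&lt;/p&gt;

```python
def build_prompt(query, retrieved_chunks):
    # Combine the retrieved chunks with the user's query so the LLM
    # answers from the provided context rather than its training data alone.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + query
    )

prompt = build_prompt(
    "What does the retriever do?",
    ["The retriever returns the most relevant chunks."],
)
print(prompt)
```

&lt;p&gt;The explicit "say you don't know" instruction is what discourages the model from falling back on its training data when the retrieved context doesn't contain the answer.&lt;/p&gt;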

&lt;p&gt;&lt;strong&gt;The Output:&lt;/strong&gt; The LLM reads the context and produces an answer grounded in the retrieved data, which greatly reduces hallucinations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzk3ytt8rjpkgu308ahl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzk3ytt8rjpkgu308ahl5.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Disadvantages of RAG
&lt;/h2&gt;

&lt;p&gt;The advantages of RAG are clear from the discussion above, but everything has its limitations, and so does RAG. Here are some key disadvantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval Dependency and Quality:&lt;/strong&gt; RAG depends entirely on the retriever component. If the retriever fetches irrelevant, outdated, or incomplete data, the LLM will generate poor-quality, inaccurate, or "hallucinated" answers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance and Latency Bottlenecks&lt;/strong&gt;: The extra step of searching a database (vector retrieval + generation) increases latency, making it unsuitable for some real-time applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System Complexity and Maintenance:&lt;/strong&gt; Implementing and maintaining a RAG system involves managing databases, embedding models, and retrieval techniques. Updating knowledge bases requires complex re-indexing and re-embedding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Security and Privacy Risks:&lt;/strong&gt; RAG systems may expose sensitive or proprietary internal data to unauthorized users if access controls are not robustly implemented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Contextual Understanding Failures:&lt;/strong&gt; RAG systems can struggle with complex, interdisciplinary queries or connecting disparate pieces of retrieved information, leading to incoherent outputs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Running RAG systems can be expensive, requiring both vector storage infrastructure and increased compute for the retrieval and generation processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chunking Errors:&lt;/strong&gt; Improperly splitting documents can cause vital information to be missing or disjointed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Debugging Difficulties:&lt;/strong&gt; Due to the multiple moving parts (retriever + LLM), identifying the root cause of a poor answer is complex.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In short, RAG is a practical way to make AI smarter and more useful for your specific needs. Instead of spending a lot of time and money trying to "teach" an AI everything from scratch, RAG simply gives the AI the ability to look up facts from your own private documents before it answers a question, much like taking an open-book test.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>generativeai</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Evolution of Humanoid Robots: From Ancient Mythology to Futuristic Companions</title>
      <dc:creator>Devpratap Tomar</dc:creator>
      <pubDate>Tue, 20 Jan 2026 19:10:57 +0000</pubDate>
      <link>https://dev.to/devpratap_tomar2005/the-evolution-of-humanoid-robots-from-ancient-mythology-to-futuristic-companions-n14</link>
      <guid>https://dev.to/devpratap_tomar2005/the-evolution-of-humanoid-robots-from-ancient-mythology-to-futuristic-companions-n14</guid>
      <description>&lt;p&gt;&lt;strong&gt;Ancient History and Mythical Origins:&lt;/strong&gt;&lt;br&gt;
The story of humanoid robots starts not in labs or factories, but in the distant corners of human imagination. Around the 4th century BCE, Greek mythology introduced the idea of mechanical beings designed in human shape. &lt;strong&gt;Hephaestus&lt;/strong&gt; had golden handmaidens that helped him in his forge, and Talos was a bronze automaton that protected Crete by throwing boulders at invaders. These stories showed humanity’s early interest in making life-like machines to serve and protect.&lt;/p&gt;

&lt;p&gt;Meanwhile, in the 3rd century BCE in China, a Taoist text talked about an inventive engineer named &lt;strong&gt;Yan Shi&lt;/strong&gt;. He presented a realistic mechanical figure to &lt;strong&gt;King Mu of Zhou&lt;/strong&gt;. This automaton could walk, sing, and even flirt but was taken apart when it became too convincing. These mythological stories sparked the belief that humans could someday create companions that resemble us in form and function.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Automaton: A self-operating machine or mechanism, often resembling a human or animal, that performs tasks automatically"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Renaissance era bridged the gap, turning distant fantasies into tangible designs. In 1495, the visionary &lt;strong&gt;Leonardo da Vinci&lt;/strong&gt; sketched what is considered one of the first verifiable blueprints for a humanoid automaton: a mechanical knight wearing armor, operated by a complex system of pulleys, cables, and gears.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra7jl7lwa1iey2125w9o.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra7jl7lwa1iey2125w9o.webp" alt=" " width="414" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This device was designed to sit up, move its arms, and turn its head, showing da Vinci’s mix of anatomy and engineering. Although it was never built during his lifetime, it marked an important shift from myth to mechanics. It inspired future inventors to create automated, human-like figures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Birth of the Modern Concept (Early 20th Century):&lt;/strong&gt;&lt;br&gt;
The beginning of the 20th century turned these ideas into cultural and scientific discussions. In 1921, Czech writer &lt;strong&gt;Karel Čapek&lt;/strong&gt; introduced the term “robot” in his play &lt;em&gt;R.U.R. (Rossum’s Universal Robots)&lt;/em&gt;. The term comes from the Czech word “robota,” which means forced labor.&lt;/p&gt;

&lt;p&gt;The play showed artificial workers rising against their creators. It sparked global conversations about the ethics and risks of synthetic beings. Building on this, in 1942, science fiction author &lt;strong&gt;Isaac Asimov&lt;/strong&gt; came up with the “Three Laws of Robotics” in his short story &lt;em&gt;“Runaround.”&lt;/em&gt; These laws stated that robots must protect humans, obey orders, and keep themselves safe. These principles would shape robotics ethics in the real world for many years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-War Foundations and Early Autonomy (1940s–1960s):&lt;/strong&gt;&lt;br&gt;
After World War II, advancements moved the field from fiction to prototypes. In 1949, British scientist &lt;strong&gt;William Grey Walter&lt;/strong&gt; created simple, tortoise-shaped robots. They had sensors that allowed them to navigate their environments on their own and avoid obstacles with basic feedback loops. This showed that self-directed machines were possible.&lt;/p&gt;

&lt;p&gt;Then, in the 1960s, practical industrial automation began to develop. &lt;strong&gt;George Devol&lt;/strong&gt; patented the first programmable robotic arm in 1954. &lt;strong&gt;Joseph Engelberger&lt;/strong&gt; commercialized it as &lt;strong&gt;Unimate&lt;/strong&gt;, which was installed in a &lt;strong&gt;General Motors&lt;/strong&gt; factory in 1961 to handle hot die-cast metal parts. Although not humanoid, Unimate laid the foundation for robotic manipulation, proving that machines could perform repetitive and dangerous tasks reliably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ad7h9vebnx7dphjmxkx.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ad7h9vebnx7dphjmxkx.webp" alt=" " width="607" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Dawn of True Humanoid Robots (1970s–1980s):&lt;/strong&gt;&lt;br&gt;
The 1970s marked the emergence of real humanoid robots. In 1972, researchers at &lt;strong&gt;Waseda University&lt;/strong&gt; in Japan introduced &lt;strong&gt;WABOT-1&lt;/strong&gt;, the world’s first full-scale anthropomorphic robot. It had limbs and a head, enabling it to walk on two legs, grasp objects, and hold simple conversations using voice recognition and synthesis. This breakthrough demonstrated that it was possible to combine movement, handling objects, and interaction. Building on this success, Waseda launched &lt;strong&gt;WABOT-2&lt;/strong&gt; in 1984. This enhanced version could read sheet music and play a keyboard organ, blending robotics with art and showcasing improvements in vision and dexterity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjim71nydhmf8iadhwii5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjim71nydhmf8iadhwii5.webp" alt=" " width="447" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobility and Human-Like Movement (1990s–2000s):&lt;/strong&gt;&lt;br&gt;
The 1990s and early 2000s experienced fast growth in mobility and human interaction. In 2000, Honda launched &lt;strong&gt;ASIMO&lt;/strong&gt;, a compact humanoid robot that could walk, climb stairs, recognize faces and gestures, and even run at speeds of up to 3.7 miles per hour. ASIMO became a global symbol by appearing in everyday situations, like serving drinks or conducting orchestras. Soon after, in 2003, Sony introduced &lt;strong&gt;QRIO&lt;/strong&gt;, a smaller and more agile humanoid. QRIO could dance, run, and recover from falls, demonstrating its balance and flexibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy5fdfzgqdcp3tm0yyg9.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy5fdfzgqdcp3tm0yyg9.webp" alt=" " width="618" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Improvements And Real-World Applications:&lt;/strong&gt;&lt;br&gt;
As the decade went on, companies like Boston Dynamics made significant improvements in physical capabilities. Their &lt;strong&gt;Atlas&lt;/strong&gt; robot, developed throughout the 2010s, could perform acrobatic moves like backflips and parkour, thanks to better hydraulics and AI. In 2021, Tesla joined the effort by announcing &lt;strong&gt;Optimus&lt;/strong&gt;, a general-purpose humanoid designed for household tasks. This indicated a shift toward consumer applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commercialization and Mass Deployment:&lt;/strong&gt;&lt;br&gt;
This progress, from basic autonomy in the mid-20th century to more complex AI integration, has been fueled by lower hardware costs, increased computing power, and improvements in machine learning. By the 2020s, humanoid robots moved from prototypes to commercial tools, helping to fill labor shortages in aging populations and dangerous industries.&lt;/p&gt;

&lt;p&gt;Today, in early 2026, humanoid robots are being used in real-world settings. At CES 2026, more than 35 models were displayed, indicating a quick move to commercialization. For instance, UBTECH delivered over 1,000 Walker S2 units to Chinese factories in 2025 for tasks like object manipulation and navigation.&lt;/p&gt;

&lt;p&gt;Looking forward, humanoid robots are expected to play a significant role in society. By the late 2020s and into the 2030s, experts predict that 10–20 million units will be in use worldwide. This number could increase to 1–3 billion by 2050 as prices drop to $10,000-$25,000 per unit. Markets are expected to reach $38 billion by 2035 and possibly $5 trillion by 2050, with applications in manufacturing (30% share), logistics (25%), healthcare (20%), and households (15%).&lt;/p&gt;

&lt;p&gt;Advancements in AI are bringing us closer to artificial general intelligence, which would enable adaptive learning. Lighter materials and biomechanics will offer near-human dexterity for tasks like eldercare, precise surgery, and disaster response. By 2030, these technologies could automate 30% of work hours in the U.S., impacting economies and city layouts. However, we need to tackle regulatory, ethical, and social challenges, such as job loss and safety standards. China is poised to lead in scale due to government support. Still, global collaboration could lead to a time when humanoids transition from being assistants to vital partners, fulfilling the age-old visions that started it all.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>automation</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
