In just over a decade, we’ve gone from talking about AI as a science-fiction concept—something only Sarah Connor took seriously—to having it almost by default, even in our electric toothbrushes. And if you don’t see it that way, I hope you’ve got a self-sufficient farm running somewhere, because this is the direction the software industry is heading, and denying it won’t make it disappear.
Dog years
One year in the human world is roughly equivalent to five years when it comes to AI-related technologies. If you started using an AI-based tool in a project a couple of years ago, chances are that the technology you relied on is now completely obsolete. New tools and paradigms emerge every day, leaving behind concepts that were considered revolutionary and innovative just a few months ago.
Staying on top of things is a must in this industry. It’s no secret that you need one eye on the latest updates of the stack you’re working with, and another on the new technologies gaining traction, if you don’t want to fall out of the game. But this pace is dizzying even for those of us who are just trying to make a living—and pay our streaming subscriptions—in the software industry. That’s why I’ve tried to put together a short list of terms that might be helpful if, like me, you have the feeling that this train is leaving without you (the paradox being that this list will very likely be outdated by the time you finish reading it).
1. Agentic AI
This is the top one, no discussion here. These days, unless you’ve been living under a rock, you’ve probably noticed that almost everyone is building something that claims to involve AI agents. This is a good moment to pause for a second and clarify terms: what is an AI agent, really?
Let’s start with what it is not. An AI agent is not a chatbot. A chatbot responds to a prompt and stops there. An agent, by contrast, operates in a more autonomous way: it can receive information from its environment (inputs, state, APIs), reason about it, decide which action to take, and execute it. It then observes the outcome and, if necessary, repeats the cycle.
This perceive–reason–act–observe loop, together with some degree of memory and tool usage, is what differentiates an agent from a simple conversation with a language model. This approach can be applied to almost any programmatically definable context—from travel agency workflows to data analysis or DevOps engineering tasks.
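The perceive–reason–act–observe loop can be sketched in a few lines of plain Python. Everything here is a toy stand-in (the counter "environment," the hard-coded policy); a real agent would call an LLM and external tools at the "reason" and "act" steps. The point is the control flow, not the intelligence.

```python
# Minimal agent loop: perceive -> reason -> act -> observe, repeated
# until the goal is met. The "environment" is just a counter here.

def run_agent(goal: int, max_steps: int = 10) -> int:
    state = 0                      # the environment the agent perceives
    for _ in range(max_steps):
        observation = state        # perceive the current state
        if observation >= goal:    # reason: has the goal been reached?
            break
        action = 1                 # decide: toy policy, always "increment"
        state += action            # act; the result is observed next turn
    return state

print(run_agent(goal=3))
```

The interesting property is that the agent keeps cycling until its own check says the goal is met, rather than producing one answer and stopping, which is exactly what separates it from a plain chatbot call.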
2. Large Reasoning Model
Many agents rely on language models optimized for reasoning tasks, often informally referred to as Large Reasoning Models. Before going deeper, it’s worth clarifying what they are—and what they are not.
In most commercial chatbots, one of the main priorities is to deliver fast responses. This works well for simple tasks, but as problems become more complex, multiple iterations are often required to reach a satisfactory result.
In some cases, the system takes longer to generate a response, which users usually perceive as the familiar “thinking…” message. This doesn’t necessarily mean a different model is being used; often the same model is simply producing a longer intermediate output or following a more elaborate internal reasoning process.
When we talk about Large Reasoning Models, we generally mean models that are better tuned to break problems down into intermediate steps, maintain coherence throughout the reasoning process, and handle tasks that require planning. This kind of behavior—whether emerging from the model itself or supported by external logic—is exactly what AI agents need to operate effectively.
3. Vector Database
If you’ve ever spent even a moment thinking about AI from a technical perspective, you’ve probably asked yourself this question: where is all this information stored? And it’s not a trivial question.
Even without a deep understanding of how LLMs work internally, it’s easy to see that the real challenge isn’t storing data—traditional databases are still perfectly fine for that—but rather being able to search and relate information by meaning, not just by exact matches.
Before diving into vector databases, we need to introduce two more key concepts: embedding models and vectors.
4. Embedding Model
An embedding model is responsible for transforming data such as text, images, or audio into numerical vectors. These vectors are mathematical representations that capture similarity relationships between different pieces of data.
5. What is a vector?
A vector is essentially a list of numbers that represents an object within a high-dimensional mathematical space. While individual dimensions are usually not directly interpretable, the distance between vectors allows us to measure how semantically similar two elements are.
This is where vector databases come into play: they make it possible to perform mathematical operations to find nearby vectors, which translates into searching and working with semantically similar content.
For example, in a traditional database, an image is typically stored as a BLOB (Binary Large Object), which allows us to store and retrieve it but tells us nothing about its content. In a vector database, that same image is processed using an embedding model to generate a numerical vector.
As a simplification, we can imagine some dimensions representing concepts like “landscape,” “mountains,” or “people.” In practice, these dimensions are neither explicit nor human-readable, but the result is the same: images with similar content end up close to each other in the vector space, making search and comparison much easier.
That, essentially, is how vector databases work.
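A toy version of the whole idea fits in a few lines. The three-number "embeddings" below are invented by hand, loosely following the landscape/mountains/people simplification above (a real embedding model would produce hundreds of opaque dimensions), and the "database" is just a dictionary searched by cosine similarity:

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made vectors; imagine dimensions (landscape, mountains, people).
store = {
    "alpine_hike.jpg":  [0.9, 0.8, 0.1],
    "beach_sunset.jpg": [0.8, 0.0, 0.2],
    "team_photo.jpg":   [0.1, 0.0, 0.9],
}

def search(query_vec, k=1):
    # Rank every stored image by similarity to the query vector.
    ranked = sorted(store, key=lambda name: cosine(store[name], query_vec),
                    reverse=True)
    return ranked[:k]

print(search([0.85, 0.9, 0.0]))  # → ['alpine_hike.jpg']
```

Real vector databases replace the brute-force `sorted` with approximate nearest-neighbor indexes so the search stays fast across millions of vectors, but the contract is the same: vector in, closest vectors out.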
6. RAG (Retrieval-Augmented Generation)
This is where things get interesting, because RAG brings together many of the concepts we’ve covered so far. At its core, RAG is a technique that enriches an LLM’s generation with relevant external information retrieved dynamically at runtime.
The process typically works as follows: when a user submits a prompt, it is passed to a component called the retriever. The retriever converts the query into a vector using an embedding model and performs a similarity search against a vector database. Rather than returning a single result, it usually retrieves multiple relevant chunks of information.
These chunks—now in plain text, not vectors—are then injected into the prompt sent to the LLM. This allows the model to generate responses grounded in specific, up-to-date, or private context, without having been trained on that information beforehand.
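A minimal sketch of that flow, with the retriever faked by word overlap instead of a real embedding model and vector database (the chunks and query are made up for the example):

```python
# Toy RAG flow: retrieve the most relevant chunks, then inject them
# into the prompt. Similarity is faked with word overlap; a real
# retriever would embed the query and search a vector database.
import re

chunks = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Shipping is free for orders over 50 euros.",
]

def words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, k: int = 2) -> list:
    # Rank chunks by shared words with the query; keep the top k.
    ranked = sorted(chunks, key=lambda c: len(words(query) & words(c)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the refund policy?"))
```

The string returned by `build_prompt` is what actually gets sent to the LLM: the model never "learns" the refund policy, it just reads it alongside the question.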
7. MCP (Model Context Protocol)
For an LLM to be truly useful, it needs to interact with external resources. It’s not enough for it to generate text in isolation; it must connect to databases, services, and other tools. MCP is an open protocol, introduced by Anthropic, that standardizes how applications expose context and tools to an LLM.

An MCP server acts as an intermediary between the model and external services, such as databases or email systems, so developers don’t have to reinvent the integration every time they want the LLM to access a new resource.
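As a rough illustration of the shape of the idea (not the real protocol, which runs over JSON-RPC with its own message schema), an MCP-style server exposes a catalogue of tools and a single uniform way to invoke any of them. The tool implementations below are invented placeholders:

```python
# Conceptual sketch of an MCP-style intermediary: one server, many
# tools, and a uniform list/call interface. This is NOT the real MCP
# wire format, just the pattern it standardizes.

TOOLS = {
    "query_database": lambda args: f"rows matching {args['sql']!r}",
    "send_email": lambda args: f"email sent to {args['to']}",
}

def handle(request: dict) -> dict:
    if request["method"] == "tools/list":    # discover available tools
        return {"tools": sorted(TOOLS)}
    if request["method"] == "tools/call":    # invoke one by name
        tool = TOOLS[request["name"]]
        return {"result": tool(request["arguments"])}
    return {"error": "unknown method"}

print(handle({"method": "tools/list"}))
print(handle({"method": "tools/call", "name": "send_email",
              "arguments": {"to": "ana@example.com"}}))
```

The value is in the uniformity: the model-facing side only ever sees "list tools" and "call tool," no matter what sits behind the server.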
8. MoE (Mixture of Experts)
MoE splits parts of an LLM into multiple specialized subnetworks, called “experts.” A routing mechanism decides which of these experts to activate for a given input, so that only the necessary ones are used. Each active expert produces an output, and these outputs are combined, usually through a weighted sum determined by the routing mechanism.

This architecture allows scaling to models with hundreds of billions of parameters while activating only a fraction of them for each input, significantly improving efficiency relative to running the full network every step.
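The routing-plus-weighted-sum idea can be shown with a toy example. Here the "experts" are simple functions and the gate scores are hard-coded; in a real MoE both the experts and the gate are learned networks, but the top-k selection and softmax-weighted combination work the same way:

```python
import math

# Toy Mixture of Experts: a gate scores each expert, we keep the top-k,
# run only those, and combine their outputs with softmax weights.

experts = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

def moe(x: float, gate_scores: dict, k: int = 2) -> float:
    # Select the k highest-scoring experts; the rest never execute.
    top = sorted(gate_scores, key=gate_scores.get, reverse=True)[:k]
    total = sum(math.exp(gate_scores[name]) for name in top)
    weights = {name: math.exp(gate_scores[name]) / total for name in top}
    return sum(weights[name] * experts[name](x) for name in top)

# Gate strongly prefers "double", mildly "square"; "negate" never runs.
print(moe(3.0, {"double": 2.0, "square": 1.0, "negate": -5.0}))
```

With `k=1` this degenerates to picking a single expert; larger `k` trades compute for a smoother blend, which is exactly the dial real MoE models tune.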
9. Agentic RAG
If the concept of RAG is already somewhat abstract, Agentic RAG takes it a step further. While traditional RAG has simple retrieval, limited adaptability, and relies on static knowledge, Agentic RAG incorporates AI agents that can decide which tools to use, formulate retrieval strategies, and refine queries for more accurate and flexible responses.
At a high level, the workflow is as follows:
1- The user query is directed to an AI agent for processing.
2- The agent uses short-term and long-term memory to track query context, defines a retrieval strategy, and selects the most appropriate tools.
3- The data fetching process can use tools such as vector search, multiple agents, or MCP servers to gather relevant information from the knowledge base.
4- The agent combines the retrieved data with the query and system prompt and passes it to the LLM.
5- The LLM processes this optimized input to generate a response to the user’s query.
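The workflow above can be sketched as a loop in which the agent picks a tool, inspects the result, and retries with a refined query when retrieval fails. Every component here is a hypothetical placeholder: the knowledge base is a dictionary, and the "refinement" is a trivial string rewrite standing in for an LLM call.

```python
# Agentic RAG sketch: retrieve, reflect on the result, refine and retry.

KNOWLEDGE = {"refund policy": "Returns are accepted within 30 days."}

def vector_search(query: str):
    return KNOWLEDGE.get(query)  # stand-in for a real vector DB lookup

def agentic_rag(query: str, max_rounds: int = 3) -> str:
    memory = [query]                        # short-term memory of attempts
    for _ in range(max_rounds):
        result = vector_search(memory[-1])  # tool use
        if result is not None:              # reflect: is the context usable?
            return f"Context: {result}\nQuestion: {query}"
        # Refine the query (a real agent would ask an LLM to rewrite it).
        memory.append(memory[-1].replace("what is the ", ""))
    return f"Question: {query}"             # give up: no context found

print(agentic_rag("what is the refund policy"))
```

The difference from plain RAG is the loop: retrieval is no longer a single fixed step but something the agent can evaluate and repeat.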
10. ASI (Artificial Superintelligence)
And this is where one might be tempted to throw the computer out the window and consider a career as a farmer. The concept of ASI, as far as I know, is more theoretical than practical, but it exists and is important to keep in mind, especially given the speed at which technology evolves.
All the tools and concepts described so far, which we’re somewhat familiar with, slowly approach the paradigm of AGI (Artificial General Intelligence). Simplifying, this means systems capable of performing any task at the level of a human expert. ASI goes a step further, implying a system with a much broader scope, capable of self-improvement, solving problems “better” than a human expert, and even posing problems we cannot yet imagine.
And so, between autonomous agents, databases that “understand” your photos, architectures that decide which expert to activate, and the ever-elusive promise of ASI, it’s hard not to feel a bit overwhelmed… and utterly fascinated at the same time. The good news is, you don’t need to be a guru to hop on this train: all it takes is curiosity, patience, and a strong cup of coffee. Who knows, maybe in a few years my toaster will have its own RAG agent and recommend the perfect breakfast recipe while I’m still trying to figure out what my LLM is doing.