Maxim
From 70s Vectors to Modern AI Agents

The Puzzle is Complete 🧩

I started by scrutinizing how things actually work under the hood. When you dig into the pipeline, you realize that a modern AI agent is essentially a marriage between a regular chatbot and a pre-trained model.

Wrapped together and rebranded, these pieces became the "AI Agent":

  • Vector DBs: Marketed as a “new” type of database for the masses (Pinecone, etc.), yet the core concepts of vector representations and spatial indexing date back to the 1970s.
  • API and Cloud services: The delivery mechanism.
  • RAG pipelines and LLMs: Serving as the interface for the everyday user.

The ideas of similarity and finding nearest neighbors are far from new. These concepts have been used for decades in search, computer vision, pattern recognition, and research systems—long before the public AI hype.

The only "novelty" is that we can now access all of this through an API!


What Happens Under the Hood of an AI Agent? (The Pipeline!)

When we send a prompt, these steps occur:

1. The Tokenizer

The text is broken down into tokens: a word or symbol becomes an ID. Essentially, this involves a simple data structure (a Map) and a "cutting" algorithm that decides where to split.

Note: In open-source projects, these are just vocab.json or merges.txt files located next to the model.

const userQuestion = "How does Vector DB work?"

// Tokenizer / vocabulary in process:
// input  -> ["How", " does", " Vector", " DB", " work", "?"]
// output -> [2437, 857, 12944, 6212, 990, 30]

Well, this is basic—it’s essentially a lookup table.
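
To make it concrete, here is a minimal sketch of such a lookup table, assuming a toy six-entry vocabulary invented for this example (a real vocab.json holds tens of thousands of entries):

const vocab = new Map([
  ["How", 2437], [" does", 857], [" Vector", 12944],
  [" DB", 6212], [" work", 990], ["?", 30],
])

function tokenize(text) {
  const ids = []
  while (text.length > 0) {
    // Greedily match the longest vocabulary piece at the current position
    let piece = [...vocab.keys()]
      .filter((k) => text.startsWith(k))
      .sort((a, b) => b.length - a.length)[0]
    if (!piece) piece = text[0] // unknown character: just skip it in this toy version
    if (vocab.has(piece)) ids.push(vocab.get(piece))
    text = text.slice(piece.length)
  }
  return ids
}

console.log(tokenize("How does Vector DB work?"))
// -> [2437, 857, 12944, 6212, 990, 30]

Real tokenizers use subword pieces (the BPE merges from merges.txt) instead of whole words, but the principle is the same map-and-cut.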

2. The Embedding Layer

Tokens enter the model and are converted into vectors (high-dimensional coordinates).

2437 -> [0.12, -0.03, 0.88, ...]
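
In other words, the embedding layer is just an array lookup: token ID in, row of floats out. A sketch with made-up numbers and a tiny 4-dimensional space (real models use hundreds or thousands of dimensions):

// Hypothetical embedding table: one row of floats per token ID
const embeddings = {
  2437: [0.12, -0.03, 0.88, 0.41], // "How"
  857:  [0.05,  0.22, -0.10, 0.73], // " does"
}

// Embedding is just indexing: no math yet, only a lookup
function embed(tokenIds) {
  return tokenIds.map((id) => embeddings[id])
}

console.log(embed([2437, 857]))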

3. Attention

This mechanism calculates how the tokens in the context relate to each other. In the question "How does Vector DB work?", the model connects "work" to "Vector DB" to understand the meaning. It is almost like finding the keywords that define the intent of the question.
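
A rough sketch of the core idea, scaled dot-product attention: compare every token's vector against the others and turn the similarity scores into weights with a softmax. This deliberately skips the separate query/key/value projections that real transformers use, and the vectors are made up:

// Dot product: how "similar" two token vectors are
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0)

// Softmax: turn raw scores into weights that sum to 1
function softmax(scores) {
  const exps = scores.map((s) => Math.exp(s))
  const total = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / total)
}

// Attention weights for one token against the whole context
function attentionWeights(queryVec, contextVecs) {
  const scale = Math.sqrt(queryVec.length)
  return softmax(contextVecs.map((v) => dot(queryVec, v) / scale))
}

// Toy 2D vectors for ["How", " does", " Vector", " DB", " work", "?"]
const vecs = [
  [0.1, 0.9], [0.2, 0.1], [0.9, 0.3],
  [0.8, 0.4], [0.7, 0.5], [0.1, 0.1],
]

// How strongly " work" attends to each token, including " Vector" and " DB"
console.log(attentionWeights(vecs[4], vecs))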

"At this stage, it hit me—this was all covered in Andrew Ng's Machine Learning videos from Stanford University on Coursera, which I watched a couple of years ago."

4. Matrix Multiplication

This is the core math. Vectors are multiplied by the weights of the pre-trained model (the kind of model behind ChatGPT or Gemini).

Weights as a Weighted Average: Instead of a simple average where all data points are equal, a weighted average is used. Data points with greater weight have a stronger influence on the final result. The model simply runs the data through a function it already knows:

Vectors × Weights ⮕ New Vectors / Scores (Probability Calculation)
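
Here is that weighted average next to a plain one, with made-up numbers; one output row of a matrix multiplication is exactly this kind of weighted sum, repeated thousands of times:

const values  = [10, 20, 30]
const weights = [0.7, 0.2, 0.1] // learned "importance" of each input

// Plain average: every value counts equally
const mean = values.reduce((a, b) => a + b, 0) / values.length // 20

// Weighted average: heavier weights pull the result toward their value
const weighted = values.reduce((sum, v, i) => sum + v * weights[i], 0) // 14

// One output row of a matrix multiplication is this same weighted sum
function matVec(matrix, vector) {
  return matrix.map((row) =>
    row.reduce((sum, w, i) => sum + w * vector[i], 0)
  )
}

console.log(mean, weighted)
console.log(matVec([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], values)) // [14, 27]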

5. Next Token Prediction

The output gives us the probability for the next token. We choose one, add it to the context, and repeat the entire cycle (The Loop).
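
The whole cycle as a loop, with the model stubbed out. forwardPass stands in for steps 1-4 and returns fake probabilities; pickToken uses greedy decoding, the simplest of the sampling strategies real systems offer:

// Stand-in for steps 1-4: returns a probability per candidate next token
function forwardPass(tokens) {
  // Fake numbers; a real model scores the entire vocabulary
  const script = [" It", " stores", " vectors", "<eos>"]
  const next = script[tokens.length - 6] ?? "<eos>"
  return { [next]: 0.9, " the": 0.05, "<pad>": 0.05 }
}

// Greedy decoding: always pick the most probable token
function pickToken(probs) {
  return Object.entries(probs).sort((a, b) => b[1] - a[1])[0][0]
}

let tokens = ["How", " does", " Vector", " DB", " work", "?"]
while (true) {
  const next = pickToken(forwardPass(tokens))
  if (next === "<eos>") break // the stop token ends The Loop
  tokens.push(next) // append to context, then run the whole pipeline again
}

console.log(tokens.join("")) // "How does Vector DB work? It stores vectors"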


The Reality Check

Back then, we ran parameters through a function, calculated the error/cost, and moved toward the minimum:
Input ⮕ Trained Function ⮕ Prediction.

This is all covered in a Coursera course that came out over 12 years ago! If these concepts were being taught openly so long ago, it makes you wonder: how many closed corporate, research, or military systems have been leveraging these techniques for decades under the radar?

  • Air and Missile Defense: Real-time trajectory and decision systems.
  • Satellite Intelligence: Pattern recognition and data filtering.
  • Medical Systems: Expert systems like MYCIN, which date back to the 70s and 80s.

The "Context Window" Problem and RAG 🧠

Too much data for the context window! The application simply cuts off the oldest data following the FIFO (First In, First Out) rule. The model doesn't "forget" anything; it just isn't sent part of the conversation history.

// countTokens and contextLimit are provided by the app, not the model
while (countTokens(messages) > contextLimit) {
  messages.shift() // Old context is simply dropped
}

Want to make it better? Try to "feed" the model with knowledge using RAG (Retrieval-Augmented Generation).

What is it?
RAG is a technique where the model first searches for relevant information in external sources or in material you provide (documents, links), and then generates an answer based on that data.

How it works:
User question ⮕ Document search ⮕ Found fragments + Question ⮕ Language model ⮕ Answer.

Why is it needed?

  • Reduces "hallucinations": the model answers based on retrieved text instead of relying only on its own "memory."
  • Relevance: RAG allows the model to use fresh data without retraining it (it's cheaper that way).
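
A bare-bones version of that flow, with the retrieval and the LLM call stubbed out. In a real setup searchDocs would embed the question and query a Vector DB, and callLLM would go over the API; both names here are hypothetical:

// Stub "vector search": keyword matching over an in-memory corpus
const corpus = [
  { text: "A Vector DB stores embeddings and finds nearest neighbors." },
  { text: "RAG retrieves documents before the model generates an answer." },
  { text: "FIFO context trimming drops the oldest messages first." },
]

async function searchDocs(question, { topK }) {
  const words = question.toLowerCase().split(/\W+/)
  return corpus
    .map((doc) => ({
      ...doc,
      score: words.filter((w) => doc.text.toLowerCase().includes(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}

// Stub LLM call; in reality this is the expensive API request
async function callLLM(prompt) {
  return `(model answer grounded in)\n${prompt}`
}

async function answerWithRAG(question) {
  const fragments = await searchDocs(question, { topK: 2 }) // 1. Retrieve
  const prompt = [                                          // 2. Augment
    "Answer using ONLY the context below.",
    ...fragments.map((f, i) => `[${i + 1}] ${f.text}`),
    `Question: ${question}`,
  ].join("\n")
  return callLLM(prompt)                                    // 3. Generate
}

answerWithRAG("How does Vector DB work?").then(console.log)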

Orchestration: Who Runs the Process? 🎮

In standard ChatGPT, the model itself doesn't decide which chat history to use or which tools to call. This is managed by the orchestration layer—a tool that works around the model, not inside it.

When a simple chat isn't enough, libraries like LangGraph come into play, offering an imperative approach to the process: Do A first, then continue to B (deterministic flow control).

IF the LLM is a probabilistic engine (calculating probabilities and minimizing a cost/loss function), exactly what Andrew Ng discussed in his Stanford University course, THEN orchestration is a State Machine.

Example of what orchestration does:

"First, go to SQL for facts. If that's not enough, check the Vector DB. If you're tired, go drink coffee (just kidding). If the result is bad, go for a repeat cycle."

Process Schema:

  1. User question (prompt) ⮕ runs internal orchestration layer.
  2. Check state / memory.
  3. Decide route.
  4. Maybe search Vector DB / retrieved context.
  5. Maybe call tool.
  6. Build final context ⮕ Call LLM.
  7. Validate / post-process answer.
  8. Finish or loop again.
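
That schema as a minimal state machine, with every step stubbed out. The point is that plain imperative code, not the model, decides the route:

// Stubbed steps; a real agent would call SQL, the Vector DB, the LLM, etc.
const steps = {
  route: (s) => ({
    ...s,
    next: s.question.includes("Vector") ? "retrieve" : "generate",
  }),
  retrieve: (s) => ({
    ...s,
    context: "fragments from the Vector DB",
    next: "generate",
  }),
  generate: (s) => ({
    ...s,
    answer: `LLM answer using: ${s.context ?? "no extra context"}`,
    next: "validate",
  }),
  validate: (s) => ({ ...s, next: s.answer ? "done" : "route" }), // bad result? loop again
}

function runAgent(question) {
  let state = { question, next: "route" }
  while (state.next !== "done") {
    state = steps[state.next](state) // the orchestrator, not the LLM, picks the next step
  }
  return state.answer
}

console.log(runAgent("How does Vector DB work?"))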

Conclusion: The Agent is the Architecture 🏗️

Strip away the marketing, and an "AI Agent" is just an imperative state machine wrapped around a probabilistic engine. The LLM handles the "vibes," but the architecture handles the execution.

We aren't building minds; we're just running complex and expensive cloud pipelines. Are we living in the Matrix? Maybe just in a statistical Matrix.

Hosted AI agents operate according to rules we cannot fully inspect: the routing logic, censorship filters, and internal protocols are not transparent. However, you can run all of this on your local machine and build your own model for your specific tasks.
