DEV Community

Matheus de Camargo Marques

Ask vs Act: RAG, Tool Use and AI agents


Series: Part 3 of 5 — CQRS and architecture for AI agents.

Reading time: ~6 min.


In the previous article we covered CQRS fundamentals and practice in Elixir with Queries and Commands. Here we apply that split to AI agent orchestration: Ask (RAG, read) vs Act (Tool Use, commands).


The "Ask vs Act" paradigm: CQRS in AI agent automation

Applying the CQRS pattern proves not only beneficial but mandatory in orchestrating workflows driven by AI agents. When dealing with autonomous Artificial Intelligence, the distinction between observing the environment (Perception) and interacting with it (Action) mirrors the Query vs Command split.

Treating knowledge retrieval the same way as corporate action execution is a serious mistake. An agent’s architecture must expose two distinct conduits, shaped by CQRS.

The query side: RAG, reasoning and vector stores

When an AI agent needs information to plan its next steps or answer a question, it operates on the read path (Query Path). The dominant technique in this setting is Retrieval-Augmented Generation (RAG).

In RAG, the agent turns a request into a high-dimensional mathematical vector (an embedding) and queries a vector database to find the most semantically similar information segments (via Approximate Nearest Neighbor — ANN algorithms such as HNSW or LSH). Those text or structured data segments are appended to the original prompt, contextualizing the foundation model’s reasoning and strongly curbing hallucinations. Flow in short: user question → embedding → vector search → returned documents → context injected into prompt → LLM → answer.
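That flow can be sketched in a few lines of Elixir. This is a toy, not a real pipeline: the hand-made three-dimensional vectors stand in for real embedding-model output, and `ToyRag` and the sample documents are illustrative names, not from any library.

```elixir
# Toy sketch of the RAG read path: embed -> vector search -> prompt injection.
defmodule ToyRag do
  # Cosine similarity between two vectors of equal length.
  def cosine(a, b) do
    dot = Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    norm = fn v -> :math.sqrt(Enum.sum(Enum.map(v, &(&1 * &1)))) end
    dot / (norm.(a) * norm.(b))
  end

  # Return the k documents most semantically similar to the query embedding.
  def retrieve(query_vec, docs, k) do
    docs
    |> Enum.sort_by(fn {_text, vec} -> -cosine(query_vec, vec) end)
    |> Enum.take(k)
    |> Enum.map(fn {text, _vec} -> text end)
  end

  # Inject the retrieved segments as context for the foundation model.
  def build_prompt(question, context_docs) do
    "Context:\n" <> Enum.join(context_docs, "\n") <> "\n\nQuestion: " <> question
  end
end

docs = [
  {"Refund policy: limit is $500 per ticket.", [0.9, 0.1, 0.0]},
  {"Office hours are 9am to 5pm.",             [0.1, 0.9, 0.1]},
  {"Deploys are frozen on Fridays.",           [0.0, 0.2, 0.9]}
]

# Pretend embedding of "What is the refund limit?"
query_vec = [0.85, 0.15, 0.05]
context = ToyRag.retrieve(query_vec, docs, 1)
IO.puts(ToyRag.build_prompt("What is the refund limit?", context))
```

In production, the embedding comes from a model API and `retrieve/3` is an ANN query against a vector index; the shape of the flow stays the same.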

From an architectural standpoint, the RAG flow is a CQRS projection par excellence. Because read operations are passive and side-effect free, the retrieval system can exploit great flexibility. The agent can query an analytical Data Warehouse (e.g. Snowflake) for historical financial metrics and a distributed vector index (e.g. Pinecone, Weaviate or Milvus) for unstructured corporate policies at the same time.
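Because those reads are side-effect free, they can be fanned out concurrently on the BEAM. A minimal sketch, where the two anonymous functions are stubs standing in for real Snowflake and vector-index clients:

```elixir
# The read path is passive, so the agent can issue both queries in parallel.
fetch_metrics  = fn -> Process.sleep(50); {:metrics, [revenue: 1_200_000]} end
fetch_policies = fn -> Process.sleep(50); {:policies, ["Refund limit is $500"]} end

# Fire both reads concurrently; total wall time is roughly the slowest query.
results =
  [fetch_metrics, fetch_policies]
  |> Enum.map(&Task.async/1)
  |> Task.await_many()

IO.inspect(results)
```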

The eventual consistency inherent in CQRS is perfectly acceptable at this stage: LLM inference does not suffer catastrophically if the vector knowledge base lags real-time system state by a few seconds. Read-side scalability supports AI’s massively concurrent access pattern: where a single user used to click through a dashboard, a multi-agent architecture can now fire hundreds of simultaneous searches to validate hypotheses in parallel.

Advanced optimization of the vector read model

Data engineering offers several ways to materialize this vector read model:

  • Multi-model databases (e.g. pgvector on PostgreSQL): extend robust relational repositories with vector search. The pgvector extension allows queries that combine semantic similarity with strict structured filters in traditional SQL. They require no extra operational cost for separate infrastructure and perform very well on small and medium datasets.

  • Managed vector services (e.g. Pinecone, Qdrant): built for massive scale, supporting horizontal partitioning and management of hundreds of millions (or billions) of high-dimensional vectors with sub-10ms latency at the 95th percentile (p95).

Agent accuracy can be improved by replacing naive RAG strategies with advanced methods such as Contextual Retrieval combined with hybrid search. Hybrid search mixes semantic search based on cosine similarity with traditional lexical retrieval based on keywords (using ranking functions like BM25), so that exact-match granular information is not lost.
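The effect is easy to demonstrate with a toy scorer. Here the lexical score is a crude keyword-overlap stand-in for BM25, and the 0.7/0.3 blend weights are illustrative, not a recommendation:

```elixir
# Toy hybrid-search scorer: blends semantic (cosine) and lexical signals.
defmodule HybridSearch do
  def cosine(a, b) do
    dot = Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    norm = fn v -> :math.sqrt(Enum.sum(Enum.map(v, &(&1 * &1)))) end
    dot / (norm.(a) * norm.(b))
  end

  # Fraction of query terms that appear verbatim in the document (BM25 stand-in).
  def lexical(query, text) do
    terms = query |> String.downcase() |> String.split()
    words = text |> String.downcase() |> String.split() |> MapSet.new()
    Enum.count(terms, &MapSet.member?(words, &1)) / length(terms)
  end

  def score(query, query_vec, {text, vec}, alpha \\ 0.7) do
    alpha * cosine(query_vec, vec) + (1 - alpha) * lexical(query, text)
  end
end

query = "status of sku-4821"
query_vec = [0.5, 0.5]
exact = {"status update for sku-4821 shipping", [0.2, 0.8]}
vague = {"shipping times overview", [0.5, 0.5]}

# Pure semantic search ranks the vague document higher; the exact token
# "sku-4821" rescues the right document under the hybrid score.
IO.inspect(HybridSearch.score(query, query_vec, exact))
IO.inspect(HybridSearch.score(query, query_vec, vague))
```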

The command side: tool execution and transactional integrity

At the opposite pole, when the agent’s reasoning leads to a decision to act on the environment (e.g. process a refund via the Stripe API, approve an HR request, or run a deploy utility), it moves to Tool Execution, entering the Command domain.

The write path does not tolerate the probabilistic nature of large language models. Commands issued by the AI must be processed against the unified source of truth (the Source of Authority). CQRS requires that the agent’s request not change data directly but pass through a defensive orchestration layer.

At this stage deterministic business validations take over. If an agent tries to refund an account above corporate policy, it is the Command Handler’s domain model that blocks the operation — not a fragile prompt asking the LLM to "please don’t exceed the limit".
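A minimal sketch of such a defensive Command Handler in Elixir. The module name, command shape and the 500-dollar ceiling are illustrative, not from a real codebase; the point is that the limit lives in deterministic pattern matching, not in the prompt:

```elixir
defmodule RefundHandler do
  @policy_limit 500_00  # corporate refund ceiling, in cents (illustrative)

  # Within policy: the side effect (e.g. the Stripe API call) would go here.
  def handle(%{type: :refund, amount_cents: amount}) when amount <= @policy_limit do
    {:ok, :refund_executed}
  end

  # Above policy: the domain model blocks the operation deterministically,
  # no matter what the LLM "decided".
  def handle(%{type: :refund, amount_cents: amount}) do
    {:error, {:policy_violation, "refund of #{amount} exceeds limit #{@policy_limit}"}}
  end
end

IO.inspect(RefundHandler.handle(%{type: :refund, amount_cents: 120_00}))
IO.inspect(RefundHandler.handle(%{type: :refund, amount_cents: 900_00}))
```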

Critically, encapsulating actions as granular commands allows natural use of Human-in-the-Loop (HITL) for destructive or high financial-risk operations. The agent states the intention (the Command), the CQRS system holds the command in a moderation queue and marks it processed only when a human operator explicitly authorizes the state transition. This separation protects business entities from "operational hallucinations" and ensures forensic traceability of which model caused which change in the environment.
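The moderation-queue idea can be sketched with an Agent process holding pending commands. A real system would persist the queue and record the operator’s identity for the audit trail; `ModerationQueue` and the command ids here are illustrative:

```elixir
defmodule ModerationQueue do
  use Agent

  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # The AI agent states its intention; the command is held, not executed.
  def submit(id, command) do
    Agent.update(__MODULE__, &Map.put(&1, id, {:pending, command}))
    {:held, id}
  end

  # Only an explicit human approval releases the state transition.
  def approve(id) do
    Agent.get_and_update(__MODULE__, fn queue ->
      case Map.get(queue, id) do
        {:pending, command} -> {{:ok, command}, Map.put(queue, id, {:approved, command})}
        _ -> {{:error, :not_pending}, queue}
      end
    end)
  end
end

{:ok, _pid} = ModerationQueue.start_link()
{:held, "cmd-1"} = ModerationQueue.submit("cmd-1", %{type: :refund, amount_cents: 900_00})
# The command sits in the queue until a human operator calls approve/1:
{:ok, command} = ModerationQueue.approve("cmd-1")
IO.inspect(command)
```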

| Operation dimension | Read Path (RAG / context) | Write Path (Tool Use / commands) |
| --- | --- | --- |
| Nature of operation | Observational, semantic, exploratory, side-effect free. | Mutating, deterministic, structured, imperative. |
| Data source | Vector DBs, Data Lakes, caches, denormalized replicas. | Central transactional model, main API, Source of Authority. |
| Consistency | Moderate tolerance for eventual consistency (slightly stale data is acceptable). | Strict strong consistency and local transactional isolation (ACID). |
| Scale and concurrency | Massive parallelization; distributed reads optimized for latency. | Sequential logical processing with optimistic concurrency to prevent state violations. |
| AI governance | Mitigating textual hallucinations via factual supplementation. | Mitigating damage to corporate APIs via interceptors and Human-in-the-Loop. |


Previous: CQRS: fundamentals and practice in Elixir

Next: Event Sourcing and materialization in Elixir
