suissAI

Posted on Mar 8

Vectorless RAG: How to Do RAG Without a Vector Database

#rag #ai #llm #database

For the past two years, the Retrieval-Augmented Generation (RAG) industry has orbited a single idea: convert text into vector embeddings and search by similarity.

This works… up to a point.

When documents are long, structured, or dense with meaning (code, laws, RFCs, papers), chunk-level similarity search starts to break context apart and lose the logical relationships between sections.

A new paradigm emerges:

Vectorless RAG.

Instead of searching by geometric similarity in a vector space, the system navigates the document's explicit semantic structure.

The LLM acts as a search planner, exploring a hierarchical index.

Vectorless RAG: Retrieval by Semantic Navigation

The classic RAG model:

query
 ↓
embedding
 ↓
vector database
 ↓
similarity search
 ↓
top-k chunks
 ↓
LLM

The vectorless model:

query
 ↓
reasoning planner
 ↓
document index traversal
 ↓
evidence extraction
 ↓
LLM synthesis

What changes:

Aspect	Vector RAG	Vectorless
Context preservation	weak	strong
Explainability	low	high
Long documents	poor	good
Hierarchical knowledge	poor	excellent

The Structural Problem with Vector RAG

Embedding-based retrieval requires splitting documents into chunks.

Chunking destroys structure.

Example:

Original document:

Section 3 – Authentication

3.1 OAuth Flow
3.2 Token Refresh
3.3 Revocation

After chunking:

chunk_17:
OAuth Flow uses a refresh token...

chunk_18:
Revocation endpoints invalidate tokens...

Neither chunk records that it belongs to Section 3 – Authentication. The model loses:

hierarchy
relationships between sections
logical scope

Vectorless RAG preserves this structure.
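To make the loss concrete, here is a small sketch (the `TreeNode` type and `withPaths` helper are illustrative, not from any library) that walks a section tree and prefixes each section's text with its full heading path. A flat chunk like `chunk_18` carries no such path; a tree node can always recover it from its ancestors.

```typescript
// Illustrative stand-in for a document tree node.
type TreeNode = {
  title: string
  content: string
  children: TreeNode[]
}

// Depth-first walk: returns each non-empty node's content prefixed with its
// heading path, e.g. "Section 3 – Authentication > 3.3 Revocation: ...".
function withPaths(node: TreeNode, trail: string[] = []): string[] {
  const path = [...trail, node.title]
  const own = node.content ? [`${path.join(" > ")}: ${node.content}`] : []
  return own.concat(...node.children.map(c => withPaths(c, path)))
}

const doc: TreeNode = {
  title: "Section 3 – Authentication",
  content: "",
  children: [
    { title: "3.1 OAuth Flow", content: "OAuth Flow uses a refresh token...", children: [] },
    { title: "3.3 Revocation", content: "Revocation endpoints invalidate tokens...", children: [] },
  ],
}

console.log(withPaths(doc))
```

The same revocation text now arrives with its scope attached, which is exactly what the flat chunk lost.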

Core Structure: Document Tree Index

The document is converted into a navigable semantic tree.

In TypeScript, we use branded types (a way to emulate nominal typing) so that semantically different strings, such as a title and a body, cannot be mixed up by accident.

type Brand<T, B extends string> = T & { readonly __brand: B }

type NodeTitle = Brand<string, "NodeTitle">
type NodeContent = Brand<string, "NodeContent">
type NodeId = Brand<string, "NodeId">

type Node = {
  id: NodeId
  title: NodeTitle
  content: NodeContent
  children: Node[]
}
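Brands exist only at compile time, so plain strings usually enter the branded world through a small constructor function. A minimal sketch, assuming we want to reject empty titles (the `toNodeTitle` name and its validation rule are illustrative choices):

```typescript
type Brand<T, B extends string> = T & { readonly __brand: B }

type NodeTitle = Brand<string, "NodeTitle">
type NodeContent = Brand<string, "NodeContent">

// Smart constructor: the single place where a plain string becomes a NodeTitle.
// The "non-empty after trimming" rule is an illustrative assumption.
function toNodeTitle(raw: string): NodeTitle {
  const trimmed = raw.trim()
  if (trimmed.length === 0) throw new Error("empty title")
  return trimmed as NodeTitle
}

function toNodeContent(raw: string): NodeContent {
  return raw as NodeContent
}

function sectionLabel(t: NodeTitle): string {
  return `section: ${t}`
}

const title = toNodeTitle("  OAuth Flow ")
const body = toNodeContent("OAuth Flow uses a refresh token...")

sectionLabel(title) // compiles
// @ts-expect-error NodeContent is not assignable to NodeTitle
sectionLabel(body)
```

At runtime both values are ordinary strings; the protection is purely in the type checker, which is why the second call is flagged at compile time rather than failing at runtime.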

Indexing

During indexing, a parser splits the document into sections and subsections, and each one becomes a node in the tree.

type Section = {
  title: string
  text: string
  subsections: Section[]
}

declare function parseSections(doc: string): Section[]

// Converts a section and, recursively, all of its subsections into tree nodes,
// so documents nested deeper than two levels keep their full hierarchy.
function toNode(section: Section): Node {
  return {
    id: crypto.randomUUID() as NodeId,
    title: section.title as NodeTitle,
    content: section.text as NodeContent,
    children: section.subsections.map(toNode)
  }
}

function buildIndex(document: string): Node {

  const root: Node = {
    id: "root" as NodeId,
    title: "Document" as NodeTitle,
    content: "" as NodeContent,
    children: parseSections(document).map(toNode)
  }

  return root
}
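`parseSections` is only declared above. One possible implementation, assuming the input uses Markdown-style `#` and `##` headings (a simplification; real documents such as RFCs or statutes need a format-specific parser):

```typescript
type Section = {
  title: string
  text: string
  subsections: Section[]
}

// Hypothetical parser: "# " starts a section, "## " starts a subsection of the
// latest section, and every other non-empty line is appended to the latest heading.
function parseSections(doc: string): Section[] {
  const sections: Section[] = []
  let current: Section | undefined

  for (const line of doc.split("\n")) {
    const h2 = /^##\s+(.*)$/.exec(line)
    const h1 = h2 ? null : /^#\s+(.*)$/.exec(line)

    if (h1) {
      current = { title: h1[1].trim(), text: "", subsections: [] }
      sections.push(current)
    } else if (h2) {
      const parent = sections[sections.length - 1]
      if (!parent) continue // a subsection before any section is ignored in this sketch
      current = { title: h2[1].trim(), text: "", subsections: [] }
      parent.subsections.push(current)
    } else if (current && line.trim() !== "") {
      current.text += (current.text ? "\n" : "") + line.trim()
    }
  }

  return sections
}
```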

Result:

Document
 ├─ Introduction
 ├─ Authentication
 │   ├─ OAuth Flow
 │   ├─ Token Refresh
 │   └─ Revocation
 └─ Security

Retrieval by Reasoning

Instead of a vector lookup, the LLM decides which branch to explore: at each node it is shown the titles of the children and picks the relevant ones.

type Query = Brand<string, "UserQuery">

declare function llm(prompt: string): Promise<string>
declare function parseSelection(text: string): NodeTitle[]

async function selectNodes(
  query: Query,
  node: Node
): Promise<NodeTitle[]> {

  const titles = node.children.map(c => c.title)

  const prompt = `
User query: ${query}

Available sections:
${titles.join("\n")}

Which sections are relevant? Reply with their exact titles, one per line.
`

  const response = await llm(prompt)

  return parseSelection(response)

}
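`parseSelection` is also left abstract. A defensive sketch, assuming the model answers with one title per line, possibly with bullets or numbering: it keeps only lines that match a title that was actually offered, so a hallucinated section name never reaches the traversal. Unlike the one-argument declaration above, this hypothetical version takes the offered titles as a second argument and works on plain strings.

```typescript
// Hypothetical helper: match the model's free-text answer against the titles
// that were offered in the prompt (case-insensitive, bullets/numbering stripped).
function parseSelection(text: string, offered: string[]): string[] {
  const wanted = new Set(
    text
      .split("\n")
      // strip bullets ("- ", "* ") and numbering ("1. ", "2) ")
      .map(line => line.replace(/^\s*(?:[-*]|\d+[.)])\s*/, "").trim())
      .filter(line => line.length > 0)
      .map(line => line.toLowerCase())
  )
  // Keep the offered titles' original order and casing.
  return offered.filter(t => wanted.has(t.toLowerCase()))
}

const picked = parseSelection(
  "1. oauth flow\n- Revocation\n- Billing",
  ["OAuth Flow", "Token Refresh", "Revocation"]
)
```

Here `picked` is `["OAuth Flow", "Revocation"]`; "Billing" is dropped because it was never offered.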

Tree Exploration

async function traverse(
  query: Query,
  node: Node
): Promise<NodeContent[]> {

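  // Leaves have nothing to select from, so skip the LLM call.
  if (node.children.length === 0) return []

  const selected = await selectNodes(query, node)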

  const evidence: NodeContent[] = []

  for (const child of node.children) {

    if (selected.includes(child.title)) {

      evidence.push(child.content)

      const deeper = await traverse(query, child)

      evidence.push(...deeper)

    }

  }

  return evidence
}
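Because `selectNodes` hides the LLM call, the traversal is hard to test. One option (an assumption, not part of the original design) is to pass the selector in as a function, so a deterministic keyword matcher can stand in for the model during tests:

```typescript
type TreeNode = { title: string; content: string; children: TreeNode[] }

// Given the query and the titles of a node's children, returns the titles to explore.
type Selector = (query: string, titles: string[]) => Promise<string[]>

async function traverseWith(query: string, node: TreeNode, select: Selector): Promise<string[]> {
  if (node.children.length === 0) return [] // leaves need no selection call
  const selected = new Set(await select(query, node.children.map(c => c.title)))
  const evidence: string[] = []
  for (const child of node.children) {
    if (!selected.has(child.title)) continue
    evidence.push(child.content)
    evidence.push(...(await traverseWith(query, child, select)))
  }
  return evidence
}

// Test double: selects any title that shares a word with the query (case-insensitive).
const keywordSelector: Selector = async (query, titles) => {
  const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean))
  return titles.filter(t => t.toLowerCase().split(/\W+/).some(w => words.has(w)))
}
```

A keyword matcher only descends when the query literally names a branch, which is precisely the weakness an LLM planner is meant to overcome; it is useful here only to make traversal order testable.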

Evidence Gathering

After navigation, the system collects only the relevant content.

type Context = Brand<string, "ContextBlock">

async function gatherContext(
  query: Query,
  index: Node
): Promise<Context> {

  const evidence = await traverse(query, index)

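  // Join first, then cap the text at 4000 characters.
  // (Slicing the array itself would keep up to 4000 blocks, not 4000 characters.)
  const combined =
    evidence
      .join("\n")
      .slice(0, 4000)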

  return combined as Context
}
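A fixed character cut can end the context in the middle of a block. An alternative sketch (the `packEvidence` name and the character budget are illustrative; a production version would more likely count tokens) keeps whole blocks in traversal order until the next one would exceed the budget:

```typescript
// Greedily keeps whole evidence blocks, in order, while the joined text fits in maxChars.
// Stops at the first block that does not fit, so earlier (higher-level) evidence wins.
function packEvidence(evidence: string[], maxChars: number, separator = "\n"): string {
  const kept: string[] = []
  let used = 0
  for (const block of evidence) {
    const cost = block.length + (kept.length > 0 ? separator.length : 0)
    if (used + cost > maxChars) break
    kept.push(block)
    used += cost
  }
  return kept.join(separator)
}
```

With a budget of 10, `packEvidence(["aaaa", "bbbb", "cccc"], 10)` keeps the first two blocks (9 characters including the newline) and drops the third instead of cutting it.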

Final Generation

type Answer = Brand<string, "LLMAnswer">

async function answer(
  query: Query,
  index: Node
): Promise<Answer> {

  const context = await gatherContext(query, index)

  const prompt = `
Context:
${context}

Question:
${query}

Answer using only the context.
`

  const response = await llm(prompt)

  return response as Answer
}
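The comparison table credits vectorless retrieval with higher explainability, which only holds if the answer can point back to the nodes it used. A sketch, assuming evidence is kept as `{ title, content }` pairs instead of bare strings (a change from `traverse` above), that labels each block so the model can cite sections:

```typescript
type Evidence = { title: string; content: string }

// Builds the final prompt with each evidence block tagged [n] plus its source
// section title, and asks the model to cite those tags.
function buildAnswerPrompt(query: string, evidence: Evidence[]): string {
  const context = evidence
    .map((e, i) => `[${i + 1}] ${e.title}\n${e.content}`)
    .join("\n\n")
  return [
    "Context:",
    context,
    "",
    "Question:",
    query,
    "",
    "Answer using only the context. Cite sources as [n].",
  ].join("\n")
}
```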

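Computational Difference

Vector search (with an approximate nearest-neighbor index such as HNSW):

roughly O(log n) per query

Tree traversal + reasoning:

O(depth × reasoning_steps), where every step is an LLM call

Vectorless retrieval is therefore slower and more expensive per query; the gain comes from the quality of the retrieved context, not from speed.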
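To put a number on the traversal cost, it helps to count LLM calls directly. Assume the tree has depth d (leaves at level d), every visited non-leaf node makes exactly one selection call (leaves make none), and each call selects b children. The calls then form a geometric sum, and a beam that keeps at most k nodes per level caps the cost linearly:

```latex
C_{\text{full}} = \sum_{\ell=0}^{d-1} b^{\ell} = \frac{b^{d}-1}{b-1} \quad (b > 1),
\qquad
C_{\text{beam}} \le 1 + k\,(d-1)
```

For example, with d = 3 and b = 2 the full traversal makes 1 + 2 + 4 = 7 calls, and a beam of k = 2 makes at most 1 + 2 + 2 = 5, compared with one query embedding plus one index lookup for vector search.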

Automatic Document Indexing

A parser can generate the tree automatically.

Example for Markdown in TypeScript, using the marked and jsdom packages:

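import { marked } from "marked"
import { JSDOM } from "jsdom"

function markdownTree(mdText: string): Node {

  // marked.parse is typed string | Promise<string>; with async: false it returns a string
  const html = marked.parse(mdText, { async: false }) as string

  const dom = new JSDOM(html)

  const root: Node = {
    id: "doc" as NodeId,
    title: "doc" as NodeTitle,
    content: "" as NodeContent,
    children: []
  }

  let section: Node = root   // most recent H1, parent of H2 nodes
  let current: Node = root   // node that receives paragraph text

  const tags = dom.window.document.querySelectorAll("h1, h2, p")

  tags.forEach(tag => {

    const text = tag.textContent ?? ""

    if (tag.tagName === "H1" || tag.tagName === "H2") {

      const node: Node = {
        id: crypto.randomUUID() as NodeId,
        title: text as NodeTitle,
        content: "" as NodeContent,
        children: []
      }

      // H1 starts a top-level section; H2 becomes a child of the current H1
      if (tag.tagName === "H1") {
        root.children.push(node)
        section = node
      } else {
        section.children.push(node)
      }

      current = node

    } else {

      // Paragraphs go to the most recent heading (lists, code, and tables are ignored here)
      current.content = (current.content + text + "\n") as NodeContent

    }

  })

  return root
}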

Important Optimizations

Summary Nodes

Each node can store an LLM-generated summary, so the planner can choose branches from summaries instead of titles alone.

type Summary = Brand<string, "NodeSummary">

async function summarizeNode(
  node: Node
): Promise<Summary> {

  const summary = await llm(
    `Summarize: ${node.content}`
  )

  return summary as Summary
}
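Summaries are most useful when a parent's summary reflects its whole subtree. A sketch, assuming the summarizer is passed in (so a real `llm` call or a test stub can be used) and adding a hypothetical `summary` field to the node: children are summarized first, and their summaries feed the parent's input.

```typescript
type TreeNode = {
  title: string
  content: string
  children: TreeNode[]
  summary?: string // filled in by summarizeTree
}

type Summarizer = (text: string) => Promise<string>

// Post-order traversal: children are summarized before their parent, so the
// parent's input contains its own content plus "title: summary" for each child.
async function summarizeTree(node: TreeNode, summarize: Summarizer): Promise<string> {
  const childSummaries: string[] = []
  for (const child of node.children) {
    childSummaries.push(`${child.title}: ${await summarizeTree(child, summarize)}`)
  }
  const input = [node.content, ...childSummaries].filter(s => s.length > 0).join("\n")
  const summary = await summarize(input)
  node.summary = summary
  return summary
}
```

Calls run sequentially to keep the sketch simple; since sibling subtrees are independent, they could be summarized in parallel.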

Pruning

Limit the depth of exploration.

const MAX_DEPTH = 5

// Returns true when a branch is deeper than MAX_DEPTH and should not be explored.
function prune(depth: number): boolean {

  return depth > MAX_DEPTH

}
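In practice the depth check lives inside the traversal itself. A sketch that threads the current depth through the recursion (the synchronous `pick` callback stands in for the LLM selector; it is an assumption made to keep the example self-contained):

```typescript
type TreeNode = { title: string; content: string; children: TreeNode[] }

const MAX_DEPTH = 5

// Collects content from selected children without expanding nodes at or beyond maxDepth.
// The root is depth 0 and its children are depth 1, so content down to depth maxDepth is returned.
function traverseLimited(
  node: TreeNode,
  pick: (titles: string[]) => string[],
  depth = 0,
  maxDepth = MAX_DEPTH
): string[] {
  if (depth >= maxDepth || node.children.length === 0) return []
  const chosen = new Set(pick(node.children.map(c => c.title)))
  const out: string[] = []
  for (const child of node.children) {
    if (!chosen.has(child.title)) continue
    out.push(child.content)
    out.push(...traverseLimited(child, pick, depth + 1, maxDepth))
  }
  return out
}
```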

Beam Search

Explore several branches of the tree at once, keeping only the k highest-scoring children.

type Score = Brand<number, "TraversalScore">

type Candidate = {
  node: Node
  score: Score
}

function beamSearch(
  children: Node[],
  score: (n: Node) => Score,
  k: number
): Node[] {

  const candidates: Candidate[] =
    children.map(n => ({
      node: n,
      score: score(n)
    }))

  candidates.sort(
    (a, b) =>
      (b.score as number) - (a.score as number)
  )

  return candidates
    .slice(0, k)
    .map(c => c.node)
}
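`beamSearch` above ranks one set of siblings. Applied level by level, it becomes a beam search over the whole tree: keep the k best nodes at each depth, expand only those, and repeat. A sketch with a synchronous scoring function (in practice the score might come from an LLM relevance judgment or a node summary; that part is assumed here):

```typescript
type TreeNode = { title: string; content: string; children: TreeNode[] }

// Level-by-level beam search: at each depth, score all children of the current
// frontier, keep the k best as the new frontier, and return every kept node.
function treeBeamSearch(root: TreeNode, score: (n: TreeNode) => number, k: number): TreeNode[] {
  const kept: TreeNode[] = []
  let frontier: TreeNode[] = [root]
  while (frontier.length > 0) {
    const candidates = ([] as TreeNode[]).concat(...frontier.map(n => n.children))
    frontier = candidates
      .map(node => ({ node, s: score(node) }))
      .sort((a, b) => b.s - a.s)
      .slice(0, k)
      .map(c => c.node)
    kept.push(...frontier)
  }
  return kept
}
```

The loop ends when the frontier has no children left, so the number of scored nodes per level never exceeds k times the branching factor.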

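Application to Code RAG

Vector search over code tends to fail because:

code does not carry meaning the way natural language does, so identifiers and syntax embed poorly
dependencies between functions matter, and chunking severs them

Vectorless RAG can use the TypeScript AST instead.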

import ts from "typescript"
import fs from "fs"

function buildCodeIndex(path: string): Node {

  const code = fs.readFileSync(path, "utf8")

  const source = ts.createSourceFile(
    path,
    code,
    ts.ScriptTarget.Latest,
    true
  )

  const node: Node = {
    id: path as NodeId,
    title: path as NodeTitle,
    content: "" as NodeContent,
    children: []
  }

  function visit(n: ts.Node) {

    if (ts.isFunctionDeclaration(n) && n.name) {

      const name = n.name.text

      const content =
        code.slice(n.pos, n.end)

      node.children.push({
        id: crypto.randomUUID() as NodeId,
        title: name as NodeTitle,
        content: content as NodeContent,
        children: []
      })

    }

    ts.forEachChild(n, visit)

  }

  visit(source)

  return node
}
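The index above lists functions but not how they relate. Since dependencies matter, one simple extension (a hypothetical design, not part of the original) records which indexed functions each one calls and pulls the callees into the context whenever a function is selected. The `calls` lists could be filled from the AST by collecting call expressions (for instance with `ts.isCallExpression`) inside each function body; here they are given directly.

```typescript
type FunctionEntry = {
  name: string
  source: string
  calls: string[] // names of other functions this one calls
}

// Starting from the selected functions, adds every indexed function reachable
// through `calls` edges (breadth-first, each function at most once).
function withDependencies(selected: string[], index: Map<string, FunctionEntry>): FunctionEntry[] {
  const seen = new Set<string>()
  const queue = [...selected]
  const result: FunctionEntry[] = []
  while (queue.length > 0) {
    const fnName = queue.shift()!
    if (seen.has(fnName)) continue
    seen.add(fnName)
    const entry = index.get(fnName)
    if (!entry) continue // e.g. a library call with no indexed source
    result.push(entry)
    queue.push(...entry.calls)
  }
  return result
}
```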

Application to Agent Systems

Vectorless RAG works well as episodic memory for agents.

AgentMemory
 ├─ Project A
 │   ├─ Task 1
 │   └─ Task 2
 └─ Project B

Query:

what decisions were made for project A?

The planner explores only the relevant branch.
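A minimal sketch of such a memory (the `remember` and `recall` names are hypothetical): entries are filed under a path of titles, and recall returns everything in one subtree, which is what the planner effectively does once it has chosen the "Project A" branch.

```typescript
type MemoryNode = { title: string; entries: string[]; children: MemoryNode[] }

// Files an entry under the given path, creating intermediate nodes as needed.
function remember(root: MemoryNode, path: string[], entry: string): void {
  let node = root
  for (const title of path) {
    let next = node.children.find(c => c.title === title)
    if (!next) {
      next = { title, entries: [], children: [] }
      node.children.push(next)
    }
    node = next
  }
  node.entries.push(entry)
}

// Returns all entries in the subtree at `path` (depth-first), or [] if the path does not exist.
function recall(root: MemoryNode, path: string[]): string[] {
  let node = root
  for (const title of path) {
    const next = node.children.find(c => c.title === title)
    if (!next) return []
    node = next
  }
  const collect = (n: MemoryNode): string[] => n.entries.concat(...n.children.map(collect))
  return collect(node)
}
```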

Empirical Comparison

On long documents (>100 pages), reported benchmarks suggest roughly:

Method	Factual accuracy
Vector RAG	~60–70%
Vectorless	~80–90%

Reasons:

more context preserved
less semantic fragmentation

When NOT to Use Vectorless

Vectorless RAG is not a good fit for:

huge corpora (millions of documents), where one LLM call per visited node does not scale
fully open-ended semantic search
very vague queries, where the planner has no clear branch to choose

In these scenarios, embeddings still work better.

Recommended Hybrid Architecture

The future is probably Hybrid RAG: vectors find the right documents, and traversal finds the right sections inside them.

query
 ↓
vector retrieval (doc level)
 ↓
vectorless traversal (inside doc)
 ↓
LLM
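A sketch of that two-stage flow, assuming each document already has a document-level embedding and the query was embedded with the same model (both vectors are plain inputs here; no embedding API is called). Stage one ranks documents by cosine similarity; stage two runs the tree traversal, passed in as a `traverse` callback, only inside the winning documents:

```typescript
type TreeNode = { title: string; content: string; children: TreeNode[] }
type IndexedDoc = { tree: TreeNode; embedding: number[] }

// Cosine similarity; assumes both vectors have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb)
}

// Stage 1: vector retrieval keeps the topDocs most similar documents.
// Stage 2: vectorless traversal runs only inside those documents.
async function hybridRetrieve(
  queryEmbedding: number[],
  docs: IndexedDoc[],
  topDocs: number,
  traverse: (tree: TreeNode) => Promise<string[]>
): Promise<string[]> {
  const ranked = docs
    .map(d => ({ d, s: cosine(queryEmbedding, d.embedding) }))
    .sort((x, y) => y.s - x.s)
    .slice(0, topDocs)
  const evidence: string[] = []
  for (const { d } of ranked) evidence.push(...(await traverse(d.tree)))
  return evidence
}
```

The vector index only has to discriminate between whole documents, while the expensive LLM traversal runs on a handful of trees instead of the full corpus.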

Conclusion

Vectorless RAG changes the retrieval paradigm.

From:

semantic similarity

to

semantic navigation

The LLM stops being just a text generator and acts as an exploration engine over knowledge structures.

This shift brings AI systems much closer to how humans actually research information.

DEV Community