You've probably built a RAG pipeline. You've chunked documents, embedded them, stored them in a vector DB, retrieved context for your LLM calls.
That's a good start. But it's missing something fundamental.
RAG is an engineering technique. What I want to describe is an epistemic architecture — a way of structuring a thinker's corpus so that it becomes queryable, navigable, and genuinely representative of a specific intellectual perspective.
I call it a Denkraum (German: "thinking space"). Here's what it is, how it's built, and why it matters.
The problem RAG doesn't solve
Standard RAG gives you document retrieval. You ask a question, get relevant chunks, feed them to the model.
The model still does the heavy lifting. It synthesizes, infers, reasons — from general training knowledge, with your chunks as context.
This means:
- Responses are grounded in the retrieved documents plus everything the model learned from the internet
- The perspective is the model's, shaped by your chunks
- There's no persistent structure — every query starts from scratch
- The model can't distinguish between what the author explicitly argued and what it's inferring
What you actually want, if you're building a knowledge system for a specific thinker, is something different: a system that responds from a corpus, not about it.
What a Denkraum is
A Denkraum is a published semantic space built from a thinker's corpus. The key properties:
- Dynamic: grows as new texts are added
- Relational: units exist in a network of explicit argumentative relations, not just proximity
- Traceable: every response traces back to source chunks
- Voiced: responses reflect the thinker's epistemic stance, not the model's default
The architecture has eight layers:
Archive
└── State Registry
    └── Chunk Store
        ├── Vector Index
        └── Graph Index
            └── Hybrid Retrieval
                └── Stylesheet
                    └── Interface
Let me walk through each.
Layer 1: The Archive
All original texts — essays, notes, lectures, fragments, drafts — stored as plain text files, versioned, never deleted.
This is your source of truth. Everything downstream is derived from it and can be recomputed. The principle is the same as raw data in data engineering: preserve the original, derive everything else.
/archive
  /2019
    essay_on_markets.txt
    lecture_notes_03.txt
  /2023
    paper_draft_v2.txt
    seminar_transcript.txt
Nothing is deleted. Revision is an intellectual event — it stays visible.
Layer 2: State Registry
A lightweight table tracking which documents have been processed and which are new or modified.
CREATE TABLE document_state (
    doc_id        TEXT PRIMARY KEY,
    path          TEXT,
    hash          TEXT,
    processed_at  TIMESTAMP,
    chunk_count   INTEGER,
    status        TEXT  -- 'pending' | 'processed' | 'modified'
);
This is your incremental processing layer. You don't reprocess the entire corpus every time a new document is added.
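The incremental check itself is simple: hash each file, compare against the registry, and only reprocess what changed. A minimal sketch, assuming an in-memory registry dict in place of the SQL table above (the function name is illustrative):

```python
import hashlib

def file_status(path: str, content: bytes, registry: dict) -> str:
    """Compare a document's content hash against the registry entry."""
    digest = hashlib.sha256(content).hexdigest()
    entry = registry.get(path)
    if entry is None:
        return "pending"    # never seen: needs processing
    if entry["hash"] != digest:
        return "modified"   # content changed: needs reprocessing
    return "processed"      # unchanged: skip
```

Hashing content rather than checking timestamps means a touched-but-unchanged file is correctly skipped.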
Layer 3: Chunk Store
The canonical data structure of the Denkraum.
A language model segments each document into minimal, self-contained semantic units — chunks. This is not mechanical splitting by token count. It's semantic segmentation: the model identifies where a thought begins and ends, what role it plays in the argument, how it relates to neighboring units.
chunk_schema = {
    "chunk_id": str,     # uuid
    "doc_id": str,       # source document
    "content": str,      # the chunk text
    "position": int,     # position in document
    "role": str,         # 'thesis' | 'argument' | 'example' | 'qualification'
    "created_at": datetime
}
All downstream layers are indices over this Chunk Store.
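The segmentation itself is an LLM task; the storage side is ordinary. A minimal in-memory sketch of the Chunk Store under that schema (class and method names are illustrative):

```python
import uuid
from datetime import datetime, timezone

class ChunkStore:
    """Canonical store: chunks keyed by id, retrievable per document in order."""

    def __init__(self):
        self.chunks = {}

    def add(self, doc_id: str, content: str, position: int, role: str) -> str:
        chunk_id = str(uuid.uuid4())
        self.chunks[chunk_id] = {
            "chunk_id": chunk_id,
            "doc_id": doc_id,
            "content": content,
            "position": position,
            "role": role,
            "created_at": datetime.now(timezone.utc),
        }
        return chunk_id

    def for_document(self, doc_id: str) -> list[dict]:
        """Return a document's chunks in their original order."""
        return sorted(
            (c for c in self.chunks.values() if c["doc_id"] == doc_id),
            key=lambda c: c["position"],
        )
```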
Layer 4: Vector Index
Each chunk is embedded and stored in a vector database. Standard stuff — but the semantic segmentation in Layer 3 matters here. Better chunks produce better retrieval.
# Embed and store
embedding = embed(chunk["content"])  # e.g. text-embedding-3-large
vector_store.upsert(
    id=chunk["chunk_id"],
    vector=embedding,
    metadata={
        "doc_id": chunk["doc_id"],
        "role": chunk["role"],
        "position": chunk["position"]
    }
)
Similar thoughts lie close together in this space. The Vector Index makes semantic proximity searchable.
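In production this is a real vector database and a real embedding model. As an illustration of what the index actually does, here is a brute-force cosine-similarity sketch (a toy, not a production index):

```python
import math

class VectorIndex:
    """Toy vector index: brute-force cosine similarity over stored embeddings."""

    def __init__(self):
        self.vectors = {}  # chunk_id -> (embedding, metadata)

    def upsert(self, id: str, vector: list[float], metadata: dict):
        self.vectors[id] = (vector, metadata)

    def search(self, query: list[float], top_k: int = 5) -> list[str]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.vectors.items(),
                        key=lambda kv: cosine(query, kv[1][0]),
                        reverse=True)
        return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

Real stores replace the linear scan with an approximate-nearest-neighbor index; the interface stays the same.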
Layer 5: Graph Index
This is where the Denkraum diverges from standard RAG. And it's the most important layer.
The Graph Index models explicit argumentative relations between chunks:
relation_types = [
    "supports",     # chunk A provides evidence for chunk B
    "refutes",      # chunk A contradicts chunk B
    "refines",      # chunk A qualifies or sharpens chunk B
    "synthesizes",  # chunk A integrates chunk B and chunk C
    "precedes",     # chunk A is an earlier formulation of chunk B
]
edge_schema = {
    "edge_id": str,
    "source_chunk": str,
    "target_chunk": str,
    "relation": str,       # one of relation_types
    "confidence": float,
    "established": str     # 'local' (within doc) | 'global' (cross-doc)
}
These relations are established in two passes:
- Local pass: within each document, the model identifies the argument structure
- Global pass: across documents, the model identifies how ideas developed, were revised, or synthesized over time
The Graph Index is not an index of texts. It is an index of thinking itself.
A thesis from 2019 can be made visible as the precursor of a more refined thesis from 2023. A contradiction between two texts from different years is not an error — it's an intellectual event.
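A minimal in-memory sketch of such a graph, including the neighbor traversal that retrieval relies on (illustrative names, not a specific graph database API):

```python
from collections import defaultdict, deque

class GraphIndex:
    """Toy graph index: typed, directed edges between chunks, with BFS traversal."""

    def __init__(self):
        self.edges = defaultdict(list)  # source_chunk -> [(relation, target_chunk)]

    def add_edge(self, source: str, relation: str, target: str):
        self.edges[source].append((relation, target))

    def get_neighbors(self, chunk_id: str, relation_types: list[str],
                      depth: int = 1) -> set[str]:
        """Chunks reachable within `depth` hops over the given relation types."""
        seen, frontier = set(), deque([(chunk_id, 0)])
        while frontier:
            node, d = frontier.popleft()
            if d == depth:
                continue
            for relation, target in self.edges[node]:
                if relation in relation_types and target not in seen:
                    seen.add(target)
                    frontier.append((target, d + 1))
        return seen
```

With this, a 2019 thesis linked by `precedes` to its 2023 refinement is one hop away, and the refinement's supporting arguments are one hop further.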
Layer 6: Hybrid Retrieval
Query processing combines both indices:
def retrieve(query: str, top_k: int = 10) -> list[Chunk]:
    # Step 1: semantic expansion (LLM-generated query variants)
    variants = expand_query(query)

    # Step 2: vector search, one pass per variant, merged
    candidates = []
    for variant in variants:
        candidates.extend(vector_store.search(variant, top_k=top_k * 2))

    # Step 3: graph traversal around each candidate
    enriched = []
    for chunk in candidates:
        neighbors = graph.get_neighbors(
            chunk_id=chunk.id,
            relation_types=["supports", "refines", "synthesizes"],
            depth=2
        )
        enriched.extend(neighbors)

    # Step 4: deduplicate and rank
    return rank_and_deduplicate(candidates + enriched)[:top_k]
The result is not a flat stack of similar passages. It's a structured context — key theses, supporting arguments, qualifications, syntheses.
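The `rank_and_deduplicate` step is deliberately left open above. One simple interpretation, shown over chunk ids for brevity: rank by how often the two indices surfaced a chunk, since a chunk found by both vector search and graph traversal is a stronger candidate (a sketch, not the only reasonable choice; a real system might use a cross-encoder reranker instead):

```python
from collections import Counter

def rank_and_deduplicate(chunk_ids: list[str]) -> list[str]:
    """Rank chunks by how often retrieval surfaced them; drop duplicates.
    Ties are broken by first appearance in the candidate list."""
    counts = Counter(chunk_ids)
    first_seen = {cid: i for i, cid in reversed(list(enumerate(chunk_ids)))}
    return sorted(counts, key=lambda cid: (-counts[cid], first_seen[cid]))
```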
Layer 7: Stylesheet
Not a data layer. An epistemic layer.
The Stylesheet describes the thinker's voice: how they pose questions, structure arguments, handle uncertainty, introduce concepts. It's injected as a system prompt with every response generation.
STYLESHEET: Alexander Markowetz
Epistemic stance:
- Frames arguments as structural claims, not value judgements
- Distinguishes between medium-induced and subject-matter-justified order
- Treats digitalization as civilizational rupture, not incremental change
Argumentative logic:
- Opens with the structural problem, then proposes the inversion
- Uses precise analogies (HTML/CSS, CPU/hard drive) rather than metaphors
- Names what's being given up, not just what's being gained
Voice:
- Dense but not jargon-heavy
- Declarative sentences for theses, longer sentences for qualifications
- No hedging on core claims
The semantic space relates to the Stylesheet as HTML relates to CSS. Without it, the Denkraum has content but no voice.
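The injection itself is plain prompt assembly: Stylesheet first, then the structured context with each chunk's argumentative role made explicit. A minimal sketch (the instruction wording is illustrative, not the article's exact prompt):

```python
def build_system_prompt(stylesheet: str, context_chunks: list[dict]) -> str:
    """Assemble the system prompt: stylesheet, then role-tagged context."""
    context = "\n".join(
        f"[{c['role'].upper()}] {c['content']}" for c in context_chunks
    )
    return (
        f"{stylesheet}\n\n"
        "Respond strictly from the context below, in the voice described above.\n"
        "If the context does not cover the question, say so.\n\n"
        f"CONTEXT:\n{context}"
    )
```

Tagging each chunk with its role is what lets the model distinguish a thesis from a qualification instead of treating the context as an undifferentiated blob.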
Layer 8: Interface
The chatbot is the most immediate interface. But it's not the only one:
- Chat: dialogue in which the user asks questions and the Denkraum responds
- API: machine queries for programmatic access
- Book generator: produces derivative texts from the corpus
- Comparison interface: measures semantic distance between two Denkräume
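For the comparison interface, one crude way to measure semantic distance between two Denkräume is the cosine distance between their embedding centroids. This metric is my assumption, not something the architecture prescribes; a real comparison would likely work per-topic or per-thesis:

```python
import math

def denkraum_distance(embeddings_a: list[list[float]],
                      embeddings_b: list[list[float]]) -> float:
    """1 - cosine similarity between the centroid embeddings of two corpora."""
    def centroid(vectors):
        n = len(vectors)
        return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

    a, b = centroid(embeddings_a), centroid(embeddings_b)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - (dot / (na * nb) if na and nb else 0.0)
```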
Why this is different from a chatbot over your docs
Standard approach:
User query → vector search → top-k chunks → LLM → response
The model synthesizes from its training knowledge, using your chunks as context.
Denkraum approach:
User query → semantic expansion → hybrid retrieval (vector + graph)
→ structured context (theses + arguments + relations)
→ Stylesheet injection → LLM → response
The model responds from the corpus, in the thinker's voice, with explicit argumentative structure as context.
The difference:
- Standard RAG: plausible response grounded in your docs
- Denkraum: anchored response derived from a specific intellectual perspective
Language models simulate knowledge. The Denkraum represents it.
The economics: compute vs. structure
There's a fundamental trade-off in computing: reduce computation by investing in precomputed structure, or reduce storage by recomputing on demand.
LLMs sit at the extreme end of computation. Every response is generated fresh. Every query costs tokens.
The Denkraum takes the opposite approach:
| | LLM (standard) | Denkraum |
|---|---|---|
| Cost structure | Low upfront, high recurring | High upfront, low recurring |
| Intelligence type | Just-in-time | Ahead-of-time |
| Perspective | Aggregated | Situated |
| Ownership | Platform | User |
Once built, the Denkraum can be queried indefinitely at low computational cost. The marginal cost of an additional query approaches zero.
The ownership problem
Here's the part that doesn't get discussed enough.
When you use a language model, you accumulate nothing. Each interaction is processed and forgotten on your side. The platform accumulates usage patterns, query structures, implicit knowledge about what you don't know.
The platform grows. You don't.
In classical computing, we take a separation for granted: CPU computes, hard drive stores. No one thinks the CPU manufacturer should own everything computed on the machine.
In the current AI paradigm, this separation doesn't exist. The Denkraum restores it: let the model compute, but store the knowledge yourself.
Classical computing: [CPU] ←→ [Storage] — separated, independently owned
Current AI: [LLM + implicit storage] — coupled, platform-owned
Denkraum model: [LLM] ←→ [Denkraum] — separated, user-owned
A user without a Denkraum is epistemically stateless. The Denkraum is the hard drive for your thinking.
What's next
The Denkraum is not a product. It's an architecture pattern. The components exist — vector stores, graph databases, LLM APIs, embedding models. What's missing is the framing: building these not as retrieval systems but as epistemic infrastructure.
If you're building something in this space — or thinking about it — I'd like to know.
The full paper (with architecture details and epistemological grounding) is available on request.
Alexander Markowetz is an informatician and honorary professor at Philipps-Universität Marburg, working at the intersection of information systems, digital market architecture, and societal transformation.