OrKa is not a chatbot kit. It is a modular cognitive architecture that executes graphs of purpose-built agents defined in YAML. The value lives in determinism, memory with decay, and traces you can replay. That idea shaped the last six months of my life, for better and worse.
The nerve that started twitching
I kept seeing the same anti-pattern in AI. One mega prompt trying to do everything. No real memory. No routing logic beyond a couple of if statements. Zero traceability. When a run went wrong, there was nothing to replay besides a vague console log.
So I designed OrKa as modular cognition. YAML graphs that define agents and service nodes. Fork paths when parallel thinking helps. Join them with rules. Route based on confidence or policies. Write to memory with explicit TTL and decay. Log every step with timestamps and payloads so a human can replay the path exactly.
That conviction stayed stable. The growth came from the work of turning it into a spine others can stand on.
What OrKa actually is
Short and precise.
- Orchestrator that executes YAML defined cognition graphs. Sequential, parallel, conditional.
- Agents that do reasoning work. Types like binary, classification, builder, router.
- Service nodes that mutate state. Memory writer, RAG fetcher, embedding fetcher.
- Memory model with layers and decay. Entries carry TTL, importance, and scope.
- Logging with structured events. Every fork, join, route, and write is traceable and replayable.
- Backends that speak Redis today, with Kafka in the private core.
- A UI that mirrors the engine state and lets you inspect runs.
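The execution model is simple enough to sketch. Below is a minimal, illustrative Python sketch of a deterministic graph executor that emits a structured trace for every fork, join, and agent step. The shapes and names here are my assumptions for the example, not OrKa's actual SDK; the real engine parses YAML and runs registered agents.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    events: list = field(default_factory=list)

    def log(self, kind, node, payload):
        # every fork, join, and agent step becomes a structured event
        self.events.append({"kind": kind, "node": node, "payload": payload})

def run_graph(graph, agents, question, trace):
    """Execute a dict-based graph deterministically, logging each step."""
    state = {"input": question}
    for step in graph["steps"]:
        if step["type"] == "fork":
            # run each branch and log the fork explicitly
            trace.log("fork", step["id"], step["branches"])
            state[step["id"]] = {b: agents[b](state) for b in step["branches"]}
        elif step["type"] == "join":
            # reconverge the branch outputs through a synthesis agent
            trace.log("join", step["id"], sorted(state[step["source"]]))
            state[step["id"]] = agents[step["agent"]](state[step["source"]])
        else:  # plain sequential agent step
            out = agents[step["agent"]](state)
            trace.log("agent", step["id"], out)
            state[step["id"]] = out
    return state, trace
```

With deterministic agents, the same input produces the same state and the same trace, which is the property the replay guarantee rests on.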
If I cannot replay a cognition path with exact inputs, outputs, and routing decisions, it is not OrKa. It is hand waving.
Six months in snapshots
I am not going to dress this up as a neat arc. Cognition is a graph. The work matched that.
April. Friction beats fantasy
I shipped the first end-to-end link between the SDK, the API, and the UI. That flipped OrKa from concept to running stack. Then I made a predictable mistake. My public notes sounded nicer than the code. Some documentation read like a pitch. People called it out. They were correct.
I deleted text. I replaced buzzwords with YAML and traces. I elevated the primitive set and removed fluffy claims. Fork. Join. Router. Memory writer. RAG node. In the UI I showed live state, not edited outcomes. Ego down, bar up.
May. The first real spine
I locked fork, join, and router logic as canonical. The YAML flattening was corrected so branches do not leak beyond the nearest join. Agent attributes like prompt, options, and queue stayed intact during generation. Determinism became a zero tolerance rule. Either the graph is reproducible or it does not ship.
Observability grew up. Traces now show agent outputs plus the context of fork, join, and routing rationale. Router paths record the numeric reason. Confidence without provenance is superstition.
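"Confidence without provenance is superstition" reduces to a small rule: the number that drove the decision travels with the decision. A minimal sketch, assuming a hypothetical trace list and node names of my own invention:

```python
def route(confidence, threshold, trace):
    """Pick the next node from a numeric confidence and record why.

    Illustrative sketch only: node names and the trace shape are
    assumptions, not OrKa's actual router API.
    """
    chosen = "deep_path" if confidence < threshold else "fast_path"
    trace.append({
        "event": "route",
        "chosen": chosen,
        "confidence": confidence,  # the numeric reason, logged verbatim
        "threshold": threshold,
    })
    return chosen
```

Replaying a trace now answers not only "which path ran" but "by what margin".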
Service nodes got scaffolding. MemoryNode, MemoryWriterNode, RAGNode. Rough but testable. The UI learned to render them. The backend executed them consistently.
June. The cheap laptop marathon
I ran a 1000-run benchmark on a very normal machine: an Acer laptop with an i7-1355U, 32 GB RAM, no GPU, running DeepSeek via Ollama. Two agents per run. The first asked a simple question. The second evaluated the first. The point was not intelligence. The point was stability, latency, and cost traceability.
- 1000 orchestrations completed.
- 2014 agent calls, zero failures or drift.
- Average latency per agent call a bit above seven seconds.
- Max latency around twenty-three seconds.
- CPU stable at high load, RAM under six gigabytes, no swap.
- Simulated cost under one dollar.
The evaluator formatting exposed my own sloppiness. I had allowed a fuzzy return format because it was convenient. That lesson stuck. Write strict schemas. Enforce them.
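"Write strict schemas, enforce them" looks roughly like this in code. A sketch using only the standard library; the field names and ranges are hypothetical, not OrKa's actual evaluator contract:

```python
import json

# Hypothetical evaluator contract for illustration: exactly these fields,
# exactly these types, score confined to [0, 1]. Loose text fails loudly.
REQUIRED = {"verdict": str, "score": float, "rationale": str}

def parse_evaluation(raw: str) -> dict:
    """Parse an evaluator reply, rejecting anything that is not strict JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"evaluator returned loose text, not JSON: {exc}")
    for key, typ in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], typ):
            raise ValueError(f"field {key!r} must be {typ.__name__}")
    if not 0.0 <= data["score"] <= 1.0:
        raise ValueError("score out of range [0, 1]")
    return data
```

The fuzzy format I had allowed would sail through a substring check; it dies at the first line here, which is the point.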
I made graphs and a write up. I did not push it hard. Building felt more valuable than posting.
July. Memory with teeth
The memory model is layered and scoped. Short term and long term are not labels for a README. Entries carry ttl_seconds and a clear expires_at timestamp. The backend can run vector search with decay. That matters because memory that does not forget becomes sludge.
I added category labels, importance scores, and per node write policies. A MemoryWriterNode must justify long term writes. The rule is blunt. If an agent cannot explain why a memory needs to persist, the memory dies. Better to store less with meaning than more with mush.
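The mechanics of memory that forgets can be sketched in a few lines. The field names (ttl_seconds, expires_at, importance) mirror the ones above; the exponential decay curve and its half-life are my assumptions for illustration, not OrKa's actual formula:

```python
import math
import time

def make_entry(content, importance, ttl_seconds, now=None):
    """Create a memory entry with an explicit TTL and expiry timestamp."""
    now = time.time() if now is None else now
    return {
        "content": content,
        "importance": importance,       # 0..1, justified at write time
        "created_at": now,
        "ttl_seconds": ttl_seconds,
        "expires_at": now + ttl_seconds,
    }

def effective_score(entry, now):
    """Relevance decays with age; expired entries score zero and can die."""
    if now >= entry["expires_at"]:
        return 0.0
    age = now - entry["created_at"]
    half_life = entry["ttl_seconds"] / 4  # assumed decay rate for the sketch
    return entry["importance"] * math.exp(-age * math.log(2) / half_life)
```

Ranking vector search hits by this score instead of raw similarity is what keeps old sludge from crowding out fresh, important entries.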
August. Society inside a single flow
I built a small society loop to force structured disagreement. Four perspective agents debate: progressive, conservative, realist, purist. A cross examination node generates targeted challenges. A join node synthesizes an agreement score and a rationale. If the score crosses a threshold, the loop stops. This is not politics. It is redundancy against blind spots.
Here is what matters. The trace shows the fork that launched the four voices, the challenges created for each, the join that computed the agreement score, and the memory write that logged the rationale with TTL and expiry. Not a slide. An execution.
The uncomfortable part was seeing my own bias. My questions nudged toward early synthesis. I fixed that by moving the stop condition from a feeling to a numeric threshold and logging it. Now the system can disagree with my impatience.
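Moving the stop condition from a feeling to a number is a small amount of code. A sketch of the loop shape, where the debate and scoring functions are stand-ins for the perspective agents and the join node:

```python
def agreement_loop(debate_round, score_fn, threshold, max_rounds, trace):
    """Debate until a numeric agreement score crosses the threshold.

    Illustrative sketch: debate_round(i) returns the round's positions,
    score_fn collapses them to one agreement number, and every round is
    logged so the stop decision is auditable.
    """
    positions, score = None, 0.0
    for i in range(max_rounds):
        positions = debate_round(i)
        score = score_fn(positions)
        trace.append({"round": i, "agreement": score, "threshold": threshold})
        if score >= threshold:
            return positions, score  # converged by policy, not by feel
    return positions, score  # hit the round cap without convergence
```

Because the threshold and every score are in the trace, my impatience can no longer end a debate early without leaving evidence.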
September. Documentation, ego detox
A user said the docs still sounded like marketing. The sting was deserved. I rewrote. The YAML surface became the single source of truth. I pulled out adjectives and left agents, nodes, and runnable examples. The README had installation errors. I fixed them. I stopped auto-generating markdown and wrote what the engine does and what it cannot do.
This revealed hidden magic inside the code. Anywhere a node made a decision without recording why, I added logging. Anywhere the UI hid complexity, I forced it to show the branch state and join composition. Alpha is not an excuse for opacity.
The human part, edited for signal
Goal stays the same. Build a system that stands on its own merits, independent of noise or upside. Progress is measured by backbone and reproducibility, not volume. Less attention to optics, more attention to the work itself.
Cadence needed correction. Trading sleep for velocity degrades judgment and stability. I am shifting to a sustainable development rhythm with protected recovery windows and deliberate focus blocks.
What is strong now
- Determinism. Runs are replayable with complete context. Fork and join are visible in traces. The same input and memory produce the same path.
- Memory with rules. Layers with TTL, index naming, vector search, and decay flags. Memory is policy, not mystery.
- Confidence driven routing. Routers make choices based on numbers, not vibes. Agreement loops stop by threshold and record the score.
- Observability. Meta reports show total calls, tokens, latency, and cost per agent. Planning and debugging are based on facts.
- UI that matches the engine. The visualizer shows the graph as it runs. Forks spin up. Joins reconverge. Memory is scrollable. Routing rationale is readable.
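The meta report is, at bottom, a fold over trace events into per-agent totals. A sketch with assumed event field names:

```python
from collections import defaultdict

def meta_report(events):
    """Aggregate per-call events into totals per agent.

    Illustrative sketch: the event fields (agent, tokens, latency_s,
    cost_usd) are assumptions, not OrKa's actual trace schema.
    """
    per_agent = defaultdict(lambda: {"calls": 0, "tokens": 0,
                                     "latency_s": 0.0, "cost_usd": 0.0})
    for e in events:
        row = per_agent[e["agent"]]
        row["calls"] += 1
        row["tokens"] += e["tokens"]
        row["latency_s"] += e["latency_s"]
        row["cost_usd"] += e["cost_usd"]
    return dict(per_agent)
```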
What is still weak
- Evaluation discipline. Evaluator prompts and schemas need to be strict. A JSON schema with a validator that rejects loose text is required for all evaluation agents.
- Dataset creation from traces. The traces are rich. The tooling to auto build evaluation datasets from them is still in the lab. It needs to ship with de identification and mapping from agent inputs to labels.
- Memory write policies. MemoryWriterNode is better but still too permissive. Each node needs a narrow schema and an auditable reason string for long term writes.
- Multi-tenant runtime. Isolation of memory spaces and indices is only halfway there. The control plane needs stronger guardrails to contain blast radius.
- Public docs. The rewrite helped, but references and tutorials must catch up. Every section should include downloadable traces and exact reproduction steps.
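The dataset-from-traces tooling is still in the lab, but the skeleton is small. A sketch that turns agent events into evaluation rows with crude de-identification; the trace shape is an assumption, and the email scrub is a deliberately naive placeholder for the real de-identification pass:

```python
import re

def redact(text):
    # naive email scrub; real de-identification needs far more than this
    return re.sub(r"\S+@\S+", "[EMAIL]", text)

def trace_to_examples(trace):
    """Map agent inputs/outputs in a trace to labeled evaluation rows.

    Illustrative sketch: event fields are assumptions, not OrKa's
    actual trace schema.
    """
    rows = []
    for event in trace["events"]:
        if event["kind"] != "agent":
            continue  # forks, joins, and writes are context, not examples
        rows.append({
            "agent": event["agent"],
            "input": redact(event["input"]),
            "output": redact(event["output"]),
        })
    return rows
```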
What changed in my head
I stopped treating OrKa like a thesis and started treating it like an unglamorous piece of infrastructure. That cut anxiety. I do not need to sell philosophy in every post. I need to show that fork and join work, that memory has TTL, that routers expose their math, that traces are replayable. People can debate ideas while they build on stable primitives.
I also changed how I handle feedback. If someone says the README confuses them, I assume the doc is wrong, not the reader. I fix the doc. Less defense, more examples.
Scope control improved. The goal is a foundation for modular cognition that is testable and explainable. That does not require fifty features. It requires five that do not bend. Fork, join, router, memory node, rag node. Composition does the rest.
Wins and losses
Wins
- The orchestrator runs structured graphs with replayable traces.
- Memory is layered with TTL and decay and it is visible in execution.
- Agreement based flows converge by numeric policy and record rationale as memory.
- Meta reports expose tokens, latency, and cost per agent. Enough to plan and to keep honest.
- The marketing tone was cut from docs and replaced by YAML and traces.
Losses
- I let evaluator formatting drift because it was convenient. Fix in progress.
- I underestimated how quickly memory becomes soup without write policies. Now enforced.
- I leaned on auto-generated docs for too long. Manual rewrite was overdue.
- I paid for speed with sleep. Cadence is now structured to avoid that failure mode.
Why OrKa still matters
Many AI apps are wrappers around API calls. Useful, but flat. Cognition needs composition, memory, routing, and observability that a human can read. It needs to run locally for privacy and cost control, and in the cloud for scale. That is what OrKa is trying to be. Not a brand. Not a trend. A substrate.
The plan is stubborn. Build the thing so it stands on its own. Invite critique at the trace level. Fix what is wrong without drama.
Concrete next steps
- Lock the YAML schema for service nodes and publish runnable examples with traces. Each example must include a meta report, memory index flags, TTL, and every routing decision recorded.
- Ship a dataset builder that turns a folder of traces into clean evaluation sets. Include per agent latency histograms, token distributions, and failure mode summaries.
- Harden multi tenant runtime with strict memory namespace isolation and audited keys. Log and block cross tenant access attempts by default.
- Release an agreement router that takes a distribution over eligible next nodes and routes probabilistically with a fixed seed for reproducibility. Record the seed plus the distribution.
- Keep orkacore.com as the single destination for trying OrKa. Message stays precise. YAML in. Traces out. Build explainable agentic flows. No magic. Just wiring that works.
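The planned agreement router from the list above can be sketched directly: draw from a distribution over eligible next nodes, but with a fixed seed so the draw replays exactly, and record the seed plus the distribution. The function and trace shape are my assumptions for illustration:

```python
import random

def seeded_route(distribution, seed, trace):
    """Route probabilistically over eligible nodes, reproducibly.

    Illustrative sketch: a fixed seed plus a stable node ordering makes
    the random draw replayable, and the seed and distribution are logged
    alongside the choice.
    """
    nodes = sorted(distribution)              # stable ordering matters
    weights = [distribution[n] for n in nodes]
    rng = random.Random(seed)                 # fixed seed => same draw
    chosen = rng.choices(nodes, weights=weights, k=1)[0]
    trace.append({"event": "route", "seed": seed,
                  "distribution": distribution, "chosen": chosen})
    return chosen
```

Rerunning with the recorded seed reproduces the same path, so even stochastic routing stays inside the replay guarantee.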
No neat conclusion
OrKa is not done. It is finally honest. The next six months will be judged by whether the primitives get boringly reliable and whether people can build on them without talking to me. That is the right goal.