Mak Sò

Posted on Oct 14

🧠Introducing OrKa Cloud API

#ai #opensource #machinelearning #mlops

When One AI Agent Isn't Enough

Imagine you're building a research assistant. You ask your AI to: analyze a complex topic, remember key insights, search for related concepts, synthesize findings, and provide a comprehensive answer. You send one massive prompt to GPT-4 and... it works, sort of. But the response is unfocused, it forgets context halfway through, and there's no way to reuse the insights it discovered.

This is the challenge with monolithic AI interactions: asking one agent to do everything often means it does nothing particularly well.

Today, I'm excited to announce OrKa Cloud API – a live, production-ready service that lets you orchestrate multiple specialized AI agents into sophisticated workflows. No infrastructure required, just bring your OpenAI API key.

But more importantly, I want to show you something cool: an AI workflow that actually learns within a session, building knowledge progressively like a human researcher would. And you can test it right now.

The Orchestra Analogy

Think about a symphony orchestra. You don't have one musician playing all the instruments simultaneously. Instead, you have specialized musicians – each focused on their instrument – coordinated by a conductor who understands how they work together.

Monolithic AI is like asking one person to play violin, drums, trumpet, and piano at the same time. They might manage a simple tune, but anything complex becomes chaos.

Agent orchestration is like a proper orchestra: each agent (musician) has one clear job, and the orchestrator (conductor) coordinates their outputs into a cohesive result.

The Problem with Single-Agent Workflows

Let's be specific about what breaks when you try to do too much in one AI call:

Context Dilution

Your prompt needs to explain the task, provide context, give examples, specify output format, and handle edge cases. By token 500, the AI is drowning in instructions.

Mixed Responsibilities

"Analyze this data, identify patterns, remember the insights, search for related information, and synthesize a final report" – that's five different cognitive tasks. The AI will do all of them mediocrely instead of any of them excellently.

No Persistence

Each API call is stateless. If your AI discovers something interesting in step 1, it can't reference it in step 5 unless you paste everything back into the context (burning tokens and hitting limits).

Debugging Nightmare

When something goes wrong, you can't tell which part failed. Was it the analysis? The synthesis? The output formatting? You're debugging a black box.

Zero Reusability

That perfect prompt you crafted for analysis? You can't reuse it in another workflow without copy-pasting and adapting the entire monolith.

OrKa's Solution: Composable AI Workflows

OrKa takes a different approach: break complex tasks into specialized agents, then orchestrate them.
API endpoint: https://orka-demo-647096874165.europe-west1.run.app/api/run

Core Concepts

1. Agent Specialization
Each agent has one clear responsibility:

analyzer → Extract key concepts
memory_writer → Store insights
memory_reader → Search knowledge base
synthesizer → Combine findings

2. Sequential Orchestration
Agents execute in order, each building on previous outputs:

Agent 1 → Output A → Agent 2 → Output B → Agent 3 → Final Result

3. Template Variables
Agents access each other's outputs dynamically:

prompt: |
  Question: {{ get_input() }}
  Previous analysis: {{ get_agent_response('analyzer') }}
  Memory context: {{ previous_outputs.memory_search }}

4. Persistent Memory
Agents can write to and read from RedisStack, creating genuine learning:

Agent discovers insight → Write to memory → Later agent reads → Builds upon it

5. OpenAI-Only Architecture
You bring your own OpenAI API key. No infrastructure, no model management, no GPU costs. Just pure orchestration.

The Architecture Flow

User Request
    ↓
[OrKa Orchestrator]
    ↓
Agent 1: Initial Analysis ──→ Store insights → [Redis Memory]
    ↓                                                ↓
Agent 2: Search Memory ←─────────────────────────┘
    ↓
Agent 3: Deep Analysis (using memory context)
    ↓
Agent 4: Final Synthesis
    ↓
Comprehensive Result + Cost Analysis + Trace JSON

Each agent is a focused expert. The orchestrator ensures they work together seamlessly.

Live Demo: An AI That Learns

Let me show you something that demonstrates why this matters. I've built a workflow called Iterative Learning that mimics how humans actually research topics:

First pass: Get basic understanding
Store insights: Remember what you learned
Search memory: Find related concepts
Second pass: Go deeper with memory context
Store learnings: Save the deeper insights
Synthesize: Show how understanding evolved

This isn't just running multiple prompts – it's creating a learning progression where each stage builds on the last, and memory provides continuity.

The Workflow Structure

Here's the complete YAML configuration (yes, it's this simple):

# Iterative Learning Demo
# Shows AI learning within a session using memory

orchestrator:
  id: iterative-learning
  strategy: sequential
  agents:
    - initial_analyzer
    - insight_storer
    - knowledge_retriever
    - deep_analyzer
    - learning_recorder
    - final_synthesizer

agents:
  # Stage 1: Initial Understanding
  - id: initial_analyzer
    type: openai-answer
    model: gpt-4o-mini
    temperature: 0.7
    prompt: |
      Analyze this topic: {{ get_input() }}

      Provide:
      1. Core concepts (3-5 key points)
      2. Connections to related topics
      3. Areas needing deeper exploration

      Format as structured insights.

  # Stage 2: Store Insights in Memory
  - id: insight_storer
    type: memory
    operation: write
    prompt: |
      Initial analysis of: {{ get_input() }}

      Key insights:
      {{ get_agent_response('initial_analyzer') }}

  # Stage 3: Search for Related Knowledge
  - id: knowledge_retriever
    type: memory
    operation: read
    prompt: |
      Search for concepts related to:
      {{ get_agent_response('initial_analyzer') }}

  # Stage 4: Deeper Analysis with Context
  - id: deep_analyzer
    type: openai-answer
    model: gpt-4o
    temperature: 0.6
    prompt: |
      Original question: {{ get_input() }}

      Initial analysis:
      {{ get_agent_response('initial_analyzer') }}

      Related knowledge from memory:
      {{ previous_outputs.knowledge_retriever }}

      Now provide a DEEPER analysis that:
      1. Builds on the initial insights
      2. Connects to related concepts from memory
      3. Addresses the areas flagged for deeper exploration
      4. Adds new perspectives not covered initially

      Show how the analysis has evolved.

  # Stage 5: Record Advanced Insights
  - id: learning_recorder
    type: memory
    operation: write
    prompt: |
      Deep analysis of: {{ get_input() }}

      Advanced insights:
      {{ get_agent_response('deep_analyzer') }}

      Evolution from initial analysis:
      - Built upon: {{ get_agent_response('initial_analyzer') | truncate(200) }}
      - Connected with: {{ previous_outputs.knowledge_retriever | truncate(200) }}

  # Stage 6: Final Synthesis
  - id: final_synthesizer
    type: openai-answer
    model: gpt-4o-mini
    temperature: 0.4
    prompt: |
      Create a comprehensive final answer for: {{ get_input() }}

      Synthesize these learning stages:

      **Stage 1 - Initial Understanding:**
      {{ get_agent_response('initial_analyzer') }}

      **Stage 2 - Memory-Enhanced Analysis:**
      {{ get_agent_response('deep_analyzer') }}

      **Your Task:**
      1. Show how understanding evolved through the stages
      2. Present the final, most complete answer
      3. Highlight what was learned through iteration
      4. Demonstrate the value of this multi-pass approach

      Structure:
      - Evolution Summary (how thinking progressed)
      - Comprehensive Answer (synthesized knowledge)
      - Learning Insights (what the iteration revealed)

What Makes This Powerful

Six Specialized Agents vs. One Monolith

initial_analyzer: Focused only on first-pass understanding
insight_storer: Dedicated to memory persistence
knowledge_retriever: Expert at semantic search
deep_analyzer: Deep dive with full context
learning_recorder: Records the learning progression
final_synthesizer: Brings it all together

Template Variables Chain Everything

{{ get_input() }}                              # User's question
{{ get_agent_response('initial_analyzer') }}   # Another agent's output
{{ previous_outputs.knowledge_retriever }}     # Memory search results

Memory Creates Genuine Learning
The second analysis isn't just a different prompt – it's informed by:

What the first agent discovered
Related concepts from memory
Explicit awareness of the learning progression

Transparent Evolution
The final synthesis doesn't just give an answer – it shows HOW the understanding evolved through the stages.

Try It Yourself (Copy & Paste Ready!)

The OrKa Cloud API is live right now. Here's how to test the Iterative Learning workflow:

Step 1: Get Your OpenAI API Key

Go to https://platform.openai.com/api-keys
Create a new key (starts with sk-)
Copy it

Step 2: Test in Postman

Create a new POST request:

Method: POST
URL: https://orka-demo-647096874165.europe-west1.run.app/api/run
Headers:
  Content-Type: application/json

Body (raw JSON) - COPY THIS:

{
  "input": "Explain how neural networks learn from data",
  "openai_api_key": "sk-YOUR_OPENAI_KEY_HERE",
  "yaml_config": "orchestrator:\n  id: iterative-learning\n  strategy: sequential\n  agents:\n    - initial_analyzer\n    - insight_storer\n    - knowledge_retriever\n    - deep_analyzer\n    - learning_recorder\n    - final_synthesizer\n\nagents:\n  - id: initial_analyzer\n    type: openai-answer\n    model: gpt-4o-mini\n    temperature: 0.7\n    prompt: |\n      Analyze this topic: {{ get_input() }}\n      \n      Provide:\n      1. Core concepts (3-5 key points)\n      2. Connections to related topics\n      3. Areas needing deeper exploration\n      \n      Format as structured insights.\n\n  - id: insight_storer\n    type: memory\n    operation: write\n    prompt: |\n      Initial analysis of: {{ get_input() }}\n      \n      Key insights:\n      {{ get_agent_response('initial_analyzer') }}\n\n  - id: knowledge_retriever\n    type: memory\n    operation: read\n    prompt: |\n      Search for concepts related to:\n      {{ get_agent_response('initial_analyzer') }}\n\n  - id: deep_analyzer\n    type: openai-answer\n    model: gpt-4o\n    temperature: 0.6\n    prompt: |\n      Original question: {{ get_input() }}\n      \n      Initial analysis:\n      {{ get_agent_response('initial_analyzer') }}\n      \n      Related knowledge from memory:\n      {{ previous_outputs.knowledge_retriever }}\n      \n      Now provide a DEEPER analysis that:\n      1. Builds on the initial insights\n      2. Connects to related concepts from memory\n      3. Addresses the areas flagged for deeper exploration\n      4. Adds new perspectives not covered initially\n      \n      Show how the analysis has evolved.\n\n  - id: learning_recorder\n    type: memory\n    operation: write\n    prompt: |\n      Deep analysis of: {{ get_input() }}\n      \n      Advanced insights:\n      {{ get_agent_response('deep_analyzer') }}\n      \n      Evolution from initial analysis:\n      - Built upon: {{ get_agent_response('initial_analyzer') | truncate(200) }}\n      - Connected with: {{ previous_outputs.knowledge_retriever | truncate(200) }}\n\n  - id: final_synthesizer\n    type: openai-answer\n    model: gpt-4o-mini\n    temperature: 0.4\n    prompt: |\n      Create a comprehensive final answer for: {{ get_input() }}\n      \n      Synthesize these learning stages:\n      \n      **Stage 1 - Initial Understanding:**\n      {{ get_agent_response('initial_analyzer') }}\n      \n      **Stage 2 - Memory-Enhanced Analysis:**\n      {{ get_agent_response('deep_analyzer') }}\n      \n      **Your Task:**\n      1. Show how understanding evolved through the stages\n      2. Present the final, most complete answer\n      3. Highlight what was learned through iteration\n      4. Demonstrate the value of this multi-pass approach\n      \n      Structure:\n      - Evolution Summary (how thinking progressed)\n      - Comprehensive Answer (synthesized knowledge)\n      - Learning Insights (what the iteration revealed)"
}

Important: Replace sk-YOUR_OPENAI_KEY_HERE with your actual OpenAI API key!

Step 3: Send and Explore

Hit Send and watch the magic happen. You'll get back:

{
  "run_id": "abc-123-def-456",
  "input": "Explain how neural networks learn from data",
  "execution_log": {
    "events": [...],  // Each agent's execution
    "blob_store": {...},  // Actual responses
    "cost_analysis": {
      "summary": {
        "total_tokens": 2847,
        "total_cost_usd": 0.0043,
        "models_used": ["gpt-4o-mini", "gpt-4o"]
      }
    }
  },
  "log_file_url": "/api/logs/abc-123-def-456",
  "timestamp": "2025-10-13T22:30:00"
}

Step 4: Download the Full Trace

GET: https://orka-demo-647096874165.europe-west1.run.app/api/logs/abc-123-def-456

This gives you the complete execution trace as JSON, showing exactly how each agent contributed.

Try Different Questions

Scientific Concepts:

"Explain quantum entanglement"
"How does CRISPR gene editing work?"
"What causes ocean currents?"

Technical Topics:

"Explain how distributed consensus works in blockchain"
"What are transformer models in AI?"
"How does public key cryptography secure communications?"

Complex Ideas:

"What is emergence in complex systems?"
"How do market bubbles form and burst?"
"What causes social movements to succeed or fail?"

Watch how the workflow:

Gets initial understanding
Stores insights in memory
Searches for related concepts
Goes deeper with memory context
Shows the learning progression

Key Benefits Demonstrated

1. Task Segmentation Creates Clarity

Instead of one 1000-token mega-prompt trying to do everything, you have six focused prompts:

initial_analyzer: 150 tokens → clear first-pass analysis
deep_analyzer: 200 tokens → focused deep dive with context
final_synthesizer: 180 tokens → synthesis with clear objectives

Each agent does one thing excellently instead of many things adequately.

2. Orchestration Enables Emergence

The final output isn't just the sum of six AI calls – it's genuinely better because:

Each agent builds on previous insights
Memory provides cross-stage continuity
The synthesis explicitly shows the learning evolution

You get emergent intelligence from specialized components working together.

3. Memory Makes It Real

This isn't simulated learning by passing text around. The workflow:

Writes actual insights to RedisStack
Performs vector similarity search
Retrieves semantically related concepts
Uses that context to inform deeper analysis

If you run it twice on related topics, the second run benefits from the first's insights.

4. Debugging and Iteration

When something doesn't work quite right, you can:

Check each agent's output individually
Adjust one prompt without touching others
See exactly where the flow breaks
Reuse working agents in other workflows

Compare this to debugging a monolithic prompt where you have no idea which part is failing.

5. Cost Efficiency at Scale

Infrastructure costs: ~$42/month at 50% uptime

4GB RAM, 2 vCPU
Scales to zero when idle
No GPU needed
No model management

API costs: ~$0.01-0.03 per workflow execution

gpt-4o-mini: $0.15 per 1M input tokens
gpt-4o: $2.50 per 1M input tokens
You control which models to use

Total: Pay only for what you use, when you use it.

Real-World Applications

This pattern works for any complex cognitive task:

Research & Analysis

Question → Search papers → Extract findings → Cross-reference → 
Synthesize → Store insights → Final report

Content Generation

Topic → Outline → Research → Draft → Critique → 
Revise → Polish → Final content

Decision Support

Problem → Classify → Gather data → Analyze options → 
Evaluate risks → Recommend → Justify decision

Learning & Education

Concept → Explain → Test understanding → Identify gaps → 
Re-explain → Practice → Assess mastery

Code Review

Code → Parse → Analyze logic → Check patterns → 
Security review → Suggest improvements → Final assessment

Each becomes a sequence of specialized agents rather than one overwhelming prompt.

Under the Hood

The Technology Stack

OrKa Core: Python-based orchestration engine

Jinja2 templating for dynamic prompts
Async execution for performance
Comprehensive error handling
Cost tracking per agent

Memory Backend: RedisStack

Hash storage for structured data
Vector search for semantic retrieval
Fast in-memory operations
Automatic cleanup

Cloud Infrastructure: Google Cloud Run

Serverless containers
Auto-scaling (0-20 instances)
<30 second cold starts
Built-in load balancing

AI Provider: OpenAI

User-provided API keys
Support for all GPT models
Per-request key handling
Automatic cleanup

Request Flow

API receives request (input + YAML + API key)
Parse YAML into agent definitions
Set environment variable for OpenAI key (per-request)
Execute orchestrator sequentially
Each agent:
- Renders prompt template with context
- Calls OpenAI or memory backend
- Returns structured result
- Updates execution trace
Cleanup API key from environment
Return complete execution log + cost analysis
Store trace for download (24h retention)

Security & Rate Limiting

API keys: Never stored, used only for that request
Rate limiting: 5 requests/minute per IP (configurable)
Input validation: YAML size limit (100KB)
Execution timeout: 5 minutes maximum
Memory isolation: Each run gets clean Redis namespace

Comparison: Monolithic vs. Orchestrated

Let me show you the difference with a real example.

Monolithic Approach

prompt = """
You are an AI research assistant. Analyze the topic of quantum entanglement.

First, provide a basic explanation of the key concepts.
Then, search your knowledge for related physics concepts.
Next, provide a deeper analysis that connects these concepts.
Finally, synthesize everything into a comprehensive answer showing how your understanding evolved.

Format your response with clear sections for each stage.
"""

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

Problems:

✗ No actual memory search (just pretends)
✗ Context dilution from cramming everything into one prompt
✗ Can't debug which "stage" failed
✗ Can't reuse parts in other workflows
✗ Output quality suffers from trying to do too much

Orchestrated Approach

orchestrator:
  strategy: sequential
  agents: [analyze, store, search, deep_analyze, synthesize]

# Each agent with focused responsibility
# Memory operations are real (RedisStack)
# Template variables chain outputs
# Can debug/improve each stage independently
# Agents reusable across workflows

Benefits:

✓ Real memory operations with persistence
✓ Each agent focused and excellent
✓ Clear execution trace shows what happened
✓ Agents composable and reusable
✓ Higher quality from specialized components

The orchestrated approach isn't just more elegant – it produces measurably better results.

Getting Started: Three Levels

Level 1: Use the Cloud API (5 minutes)

Get OpenAI API key
Copy the Postman request above
Replace the API key
Send request
Explore the results

No setup, no installation, just test it.

Level 2: Create Custom Workflows (30 minutes)

Study the iterative_learning_demo.yml structure
Identify your task's stages
Design specialized agents
Connect them with template variables
Test via Cloud API

Example starter workflow:

orchestrator:
  id: my-workflow
  agents: [step1, step2, step3]

agents:
  - id: step1
    type: openai-answer
    model: gpt-4o-mini
    prompt: "Your task: {{ get_input() }}"

  - id: step2
    type: openai-answer
    model: gpt-4o-mini
    prompt: "Build on: {{ get_agent_response('step1') }}"

  - id: step3
    type: openai-answer
    model: gpt-4o-mini
    prompt: "Synthesize: {{ get_agent_response('step2') }}"

Level 3: Deploy Your Own Instance (1 hour)

Clone the repo: github.com/marcosomma/orka-reasoning
Follow PRIVATE/OPENAI_DEPLOYMENT.md
Deploy to Google Cloud Run
Custom domain, scaling, monitoring

Full control, your infrastructure, your costs.

The Future: Agent Networks

This is just the beginning. Imagine:

Parallel Execution

Question → [Agent A, Agent B, Agent C] → Synthesizer → Answer

Conditional Routing

Classify → if scientific: Science Agents
         → if technical: Tech Agents
         → if business: Business Agents

Recursive Workflows

Analyze → Score → if score < 0.8: Analyze again with feedback
                → if score >= 0.8: Continue

Multi-Model Ensembles

Same question → [GPT-4, Claude, Gemini] → Compare → Best answer

Human-in-the-Loop

Draft → Human review → Incorporate feedback → Refine → Iterate

All of these patterns are possible with orchestration. The composability is endless.

Why This Matters

We're moving from monolithic AI interactions to composable agent systems. This shift mirrors software engineering's evolution:

1970s: Monolithic programs
1990s: Object-oriented design
2000s: Microservices
2020s: AI agent orchestration

Just as microservices made software more maintainable, scalable, and reliable – agent orchestration does the same for AI workflows.

OrKa makes this accessible today. No ML expertise needed, no infrastructure required, just:

Define your agents in YAML
Orchestrate them
Let them work together

Try It Now

Cloud API: https://orka-demo-647096874165.europe-west1.run.app

GitHub: github.com/marcosomma/orka-reasoning

Examples: Browse /examples directory for more workflows

Postman Collection: Copy the request above and start experimenting

Questions & Community

Cost concerns? The demo workflow costs $0.01-0.03 per run. Infrastructure is ~$42/month at 50% uptime, scales to zero when idle.

Production ready? Yes. The Cloud API is live and stable. Rate-limited for public demo, but you can deploy your own instance for production use.

Other AI providers? Currently OpenAI-only for the cloud API. Local deployment supports any OpenAI-compatible API (including local models via Ollama).

Memory persistence? Session-scoped by default. Insights persist across agents within a workflow run. Cross-run persistence configurable.

Need help? Open an issue on GitHub or check the documentation.

Conclusion

AI orchestration isn't about making things more complicated – it's about making complex things manageable.

One specialized agent can't do everything well. But six specialized agents, properly orchestrated, can create genuine emergent intelligence.

The Iterative Learning demo shows this concretely: an AI that doesn't just answer questions, but builds understanding progressively, stores insights, retrieves context, and shows you how its thinking evolved.

This is possible today. The API is live. The code is open source. The examples are ready to run.

Break your monolithic prompts into composable agents. Orchestrate them. Watch the magic happen.

OrKa (Orchestration Kit for AI) is an open-source project focused on making sophisticated AI workflows accessible to everyone. No ML expertise required, no infrastructure needed – just bring your curiosity and an OpenAI API key.

Repository: github.com/marcosomma/orka-reasoning

*License: Apache 2.0
Questions? Open an issue or contribute to the project.