DEV Community

Mak Sò
Mak Sò

Posted on

🧠Introducing OrKa Cloud API

When One AI Agent Isn't Enough

Imagine you're building a research assistant. You ask your AI to: analyze a complex topic, remember key insights, search for related concepts, synthesize findings, and provide a comprehensive answer. You send one massive prompt to GPT-4 and... it works, sort of. But the response is unfocused, it forgets context halfway through, and there's no way to reuse the insights it discovered.

This is the challenge with monolithic AI interactions: asking one agent to do everything often means it does nothing particularly well.

Today, I'm excited to announce OrKa Cloud API – a live, production-ready service that lets you orchestrate multiple specialized AI agents into sophisticated workflows. No infrastructure required, just bring your OpenAI API key.

But more importantly, I want to show you something cool: an AI workflow that actually learns within a session, building knowledge progressively like a human researcher would. And you can test it right now.

The Orchestra Analogy

Think about a symphony orchestra. You don't have one musician playing all the instruments simultaneously. Instead, you have specialized musicians – each focused on their instrument – coordinated by a conductor who understands how they work together.

Monolithic AI is like asking one person to play violin, drums, trumpet, and piano at the same time. They might manage a simple tune, but anything complex becomes chaos.

Agent orchestration is like a proper orchestra: each agent (musician) has one clear job, and the orchestrator (conductor) coordinates their outputs into a cohesive result.

The Problem with Single-Agent Workflows

Let's be specific about what breaks when you try to do too much in one AI call:

Context Dilution

Your prompt needs to explain the task, provide context, give examples, specify output format, and handle edge cases. By token 500, the AI is drowning in instructions.

Mixed Responsibilities

"Analyze this data, identify patterns, remember the insights, search for related information, and synthesize a final report" – that's five different cognitive tasks. The AI will do all of them mediocrely instead of any of them excellently.

No Persistence

Each API call is stateless. If your AI discovers something interesting in step 1, it can't reference it in step 5 unless you paste everything back into the context (burning tokens and hitting limits).

Debugging Nightmare

When something goes wrong, you can't tell which part failed. Was it the analysis? The synthesis? The output formatting? You're debugging a black box.

Zero Reusability

That perfect prompt you crafted for analysis? You can't reuse it in another workflow without copy-pasting and adapting the entire monolith.

OrKa's Solution: Composable AI Workflows

OrKa takes a different approach: break complex tasks into specialized agents, then orchestrate them.
API endpoint: https://orka-demo-647096874165.europe-west1.run.app/api/run

Core Concepts

1. Agent Specialization
Each agent has one clear responsibility:

  • analyzer → Extract key concepts
  • memory_writer → Store insights
  • memory_reader → Search knowledge base
  • synthesizer → Combine findings

2. Sequential Orchestration
Agents execute in order, each building on previous outputs:

Agent 1 → Output A → Agent 2 → Output B → Agent 3 → Final Result
Enter fullscreen mode Exit fullscreen mode

3. Template Variables
Agents access each other's outputs dynamically:

prompt: |
  Question: {{ get_input() }}
  Previous analysis: {{ get_agent_response('analyzer') }}
  Memory context: {{ previous_outputs.memory_search }}
Enter fullscreen mode Exit fullscreen mode

4. Persistent Memory
Agents can write to and read from RedisStack, creating genuine learning:

Agent discovers insight → Write to memory → Later agent reads → Builds upon it
Enter fullscreen mode Exit fullscreen mode

5. OpenAI-Only Architecture
You bring your own OpenAI API key. No infrastructure, no model management, no GPU costs. Just pure orchestration.

The Architecture Flow

User Request
    ↓
[OrKa Orchestrator]
    ↓
Agent 1: Initial Analysis ──→ Store insights → [Redis Memory]
    ↓                                                ↓
Agent 2: Search Memory ←─────────────────────────┘
    ↓
Agent 3: Deep Analysis (using memory context)
    ↓
Agent 4: Final Synthesis
    ↓
Comprehensive Result + Cost Analysis + Trace JSON
Enter fullscreen mode Exit fullscreen mode

Each agent is a focused expert. The orchestrator ensures they work together seamlessly.

Live Demo: An AI That Learns

Let me show you something that demonstrates why this matters. I've built a workflow called Iterative Learning that mimics how humans actually research topics:

  1. First pass: Get basic understanding
  2. Store insights: Remember what you learned
  3. Search memory: Find related concepts
  4. Second pass: Go deeper with memory context
  5. Store learnings: Save the deeper insights
  6. Synthesize: Show how understanding evolved

This isn't just running multiple prompts – it's creating a learning progression where each stage builds on the last, and memory provides continuity.

The Workflow Structure

Here's the complete YAML configuration (yes, it's this simple):

# Iterative Learning Demo
# Shows AI learning within a session using memory

orchestrator:
  id: iterative-learning
  strategy: sequential
  agents:
    - initial_analyzer
    - insight_storer
    - knowledge_retriever
    - deep_analyzer
    - learning_recorder
    - final_synthesizer

agents:
  # Stage 1: Initial Understanding
  - id: initial_analyzer
    type: openai-answer
    model: gpt-4o-mini
    temperature: 0.7
    prompt: |
      Analyze this topic: {{ get_input() }}

      Provide:
      1. Core concepts (3-5 key points)
      2. Connections to related topics
      3. Areas needing deeper exploration

      Format as structured insights.

  # Stage 2: Store Insights in Memory
  - id: insight_storer
    type: memory
    operation: write
    prompt: |
      Initial analysis of: {{ get_input() }}

      Key insights:
      {{ get_agent_response('initial_analyzer') }}

  # Stage 3: Search for Related Knowledge
  - id: knowledge_retriever
    type: memory
    operation: read
    prompt: |
      Search for concepts related to:
      {{ get_agent_response('initial_analyzer') }}

  # Stage 4: Deeper Analysis with Context
  - id: deep_analyzer
    type: openai-answer
    model: gpt-4o
    temperature: 0.6
    prompt: |
      Original question: {{ get_input() }}

      Initial analysis:
      {{ get_agent_response('initial_analyzer') }}

      Related knowledge from memory:
      {{ previous_outputs.knowledge_retriever }}

      Now provide a DEEPER analysis that:
      1. Builds on the initial insights
      2. Connects to related concepts from memory
      3. Addresses the areas flagged for deeper exploration
      4. Adds new perspectives not covered initially

      Show how the analysis has evolved.

  # Stage 5: Record Advanced Insights
  - id: learning_recorder
    type: memory
    operation: write
    prompt: |
      Deep analysis of: {{ get_input() }}

      Advanced insights:
      {{ get_agent_response('deep_analyzer') }}

      Evolution from initial analysis:
      - Built upon: {{ get_agent_response('initial_analyzer') | truncate(200) }}
      - Connected with: {{ previous_outputs.knowledge_retriever | truncate(200) }}

  # Stage 6: Final Synthesis
  - id: final_synthesizer
    type: openai-answer
    model: gpt-4o-mini
    temperature: 0.4
    prompt: |
      Create a comprehensive final answer for: {{ get_input() }}

      Synthesize these learning stages:

      **Stage 1 - Initial Understanding:**
      {{ get_agent_response('initial_analyzer') }}

      **Stage 2 - Memory-Enhanced Analysis:**
      {{ get_agent_response('deep_analyzer') }}

      **Your Task:**
      1. Show how understanding evolved through the stages
      2. Present the final, most complete answer
      3. Highlight what was learned through iteration
      4. Demonstrate the value of this multi-pass approach

      Structure:
      - Evolution Summary (how thinking progressed)
      - Comprehensive Answer (synthesized knowledge)
      - Learning Insights (what the iteration revealed)
Enter fullscreen mode Exit fullscreen mode

What Makes This Powerful

Six Specialized Agents vs. One Monolith

  • initial_analyzer: Focused only on first-pass understanding
  • insight_storer: Dedicated to memory persistence
  • knowledge_retriever: Expert at semantic search
  • deep_analyzer: Deep dive with full context
  • learning_recorder: Records the learning progression
  • final_synthesizer: Brings it all together

Template Variables Chain Everything

{{ get_input() }}                              # User's question
{{ get_agent_response('initial_analyzer') }}   # Another agent's output
{{ previous_outputs.knowledge_retriever }}     # Memory search results
Enter fullscreen mode Exit fullscreen mode

Memory Creates Genuine Learning
The second analysis isn't just a different prompt – it's informed by:

  1. What the first agent discovered
  2. Related concepts from memory
  3. Explicit awareness of the learning progression

Transparent Evolution
The final synthesis doesn't just give an answer – it shows HOW the understanding evolved through the stages.

Try It Yourself (Copy & Paste Ready!)

The OrKa Cloud API is live right now. Here's how to test the Iterative Learning workflow:

Step 1: Get Your OpenAI API Key

  1. Go to https://platform.openai.com/api-keys
  2. Create a new key (starts with sk-)
  3. Copy it

Step 2: Test in Postman

Create a new POST request:

Method: POST
URL: https://orka-demo-647096874165.europe-west1.run.app/api/run
Headers:
  Content-Type: application/json
Enter fullscreen mode Exit fullscreen mode

Body (raw JSON) - COPY THIS:

{
  "input": "Explain how neural networks learn from data",
  "openai_api_key": "sk-YOUR_OPENAI_KEY_HERE",
  "yaml_config": "orchestrator:\n  id: iterative-learning\n  strategy: sequential\n  agents:\n    - initial_analyzer\n    - insight_storer\n    - knowledge_retriever\n    - deep_analyzer\n    - learning_recorder\n    - final_synthesizer\n\nagents:\n  - id: initial_analyzer\n    type: openai-answer\n    model: gpt-4o-mini\n    temperature: 0.7\n    prompt: |\n      Analyze this topic: {{ get_input() }}\n      \n      Provide:\n      1. Core concepts (3-5 key points)\n      2. Connections to related topics\n      3. Areas needing deeper exploration\n      \n      Format as structured insights.\n\n  - id: insight_storer\n    type: memory\n    operation: write\n    prompt: |\n      Initial analysis of: {{ get_input() }}\n      \n      Key insights:\n      {{ get_agent_response('initial_analyzer') }}\n\n  - id: knowledge_retriever\n    type: memory\n    operation: read\n    prompt: |\n      Search for concepts related to:\n      {{ get_agent_response('initial_analyzer') }}\n\n  - id: deep_analyzer\n    type: openai-answer\n    model: gpt-4o\n    temperature: 0.6\n    prompt: |\n      Original question: {{ get_input() }}\n      \n      Initial analysis:\n      {{ get_agent_response('initial_analyzer') }}\n      \n      Related knowledge from memory:\n      {{ previous_outputs.knowledge_retriever }}\n      \n      Now provide a DEEPER analysis that:\n      1. Builds on the initial insights\n      2. Connects to related concepts from memory\n      3. Addresses the areas flagged for deeper exploration\n      4. Adds new perspectives not covered initially\n      \n      Show how the analysis has evolved.\n\n  - id: learning_recorder\n    type: memory\n    operation: write\n    prompt: |\n      Deep analysis of: {{ get_input() }}\n      \n      Advanced insights:\n      {{ get_agent_response('deep_analyzer') }}\n      \n      Evolution from initial analysis:\n      - Built upon: {{ get_agent_response('initial_analyzer') | truncate(200) }}\n      - Connected with: {{ previous_outputs.knowledge_retriever | truncate(200) }}\n\n  - id: final_synthesizer\n    type: openai-answer\n    model: gpt-4o-mini\n    temperature: 0.4\n    prompt: |\n      Create a comprehensive final answer for: {{ get_input() }}\n      \n      Synthesize these learning stages:\n      \n      **Stage 1 - Initial Understanding:**\n      {{ get_agent_response('initial_analyzer') }}\n      \n      **Stage 2 - Memory-Enhanced Analysis:**\n      {{ get_agent_response('deep_analyzer') }}\n      \n      **Your Task:**\n      1. Show how understanding evolved through the stages\n      2. Present the final, most complete answer\n      3. Highlight what was learned through iteration\n      4. Demonstrate the value of this multi-pass approach\n      \n      Structure:\n      - Evolution Summary (how thinking progressed)\n      - Comprehensive Answer (synthesized knowledge)\n      - Learning Insights (what the iteration revealed)"
}
Enter fullscreen mode Exit fullscreen mode

Important: Replace sk-YOUR_OPENAI_KEY_HERE with your actual OpenAI API key!

Step 3: Send and Explore

Hit Send and watch the magic happen. You'll get back:

{
  "run_id": "abc-123-def-456",
  "input": "Explain how neural networks learn from data",
  "execution_log": {
    "events": [...],  // Each agent's execution
    "blob_store": {...},  // Actual responses
    "cost_analysis": {
      "summary": {
        "total_tokens": 2847,
        "total_cost_usd": 0.0043,
        "models_used": ["gpt-4o-mini", "gpt-4o"]
      }
    }
  },
  "log_file_url": "/api/logs/abc-123-def-456",
  "timestamp": "2025-10-13T22:30:00"
}
Enter fullscreen mode Exit fullscreen mode

Step 4: Download the Full Trace

GET: https://orka-demo-647096874165.europe-west1.run.app/api/logs/abc-123-def-456
Enter fullscreen mode Exit fullscreen mode

This gives you the complete execution trace as JSON, showing exactly how each agent contributed.

Try Different Questions

Scientific Concepts:

"Explain quantum entanglement"
"How does CRISPR gene editing work?"
"What causes ocean currents?"
Enter fullscreen mode Exit fullscreen mode

Technical Topics:

"Explain how distributed consensus works in blockchain"
"What are transformer models in AI?"
"How does public key cryptography secure communications?"
Enter fullscreen mode Exit fullscreen mode

Complex Ideas:

"What is emergence in complex systems?"
"How do market bubbles form and burst?"
"What causes social movements to succeed or fail?"
Enter fullscreen mode Exit fullscreen mode

Watch how the workflow:

  1. Gets initial understanding
  2. Stores insights in memory
  3. Searches for related concepts
  4. Goes deeper with memory context
  5. Shows the learning progression

Key Benefits Demonstrated

1. Task Segmentation Creates Clarity

Instead of one 1000-token mega-prompt trying to do everything, you have six focused prompts:

  • initial_analyzer: 150 tokens → clear first-pass analysis
  • deep_analyzer: 200 tokens → focused deep dive with context
  • final_synthesizer: 180 tokens → synthesis with clear objectives

Each agent does one thing excellently instead of many things adequately.

2. Orchestration Enables Emergence

The final output isn't just the sum of six AI calls – it's genuinely better because:

  • Each agent builds on previous insights
  • Memory provides cross-stage continuity
  • The synthesis explicitly shows the learning evolution

You get emergent intelligence from specialized components working together.

3. Memory Makes It Real

This isn't simulated learning by passing text around. The workflow:

  • Writes actual insights to RedisStack
  • Performs vector similarity search
  • Retrieves semantically related concepts
  • Uses that context to inform deeper analysis

If you run it twice on related topics, the second run benefits from the first's insights.

4. Debugging and Iteration

When something doesn't work quite right, you can:

  • Check each agent's output individually
  • Adjust one prompt without touching others
  • See exactly where the flow breaks
  • Reuse working agents in other workflows

Compare this to debugging a monolithic prompt where you have no idea which part is failing.

5. Cost Efficiency at Scale

Infrastructure costs: ~$42/month at 50% uptime

  • 4GB RAM, 2 vCPU
  • Scales to zero when idle
  • No GPU needed
  • No model management

API costs: ~$0.01-0.03 per workflow execution

  • gpt-4o-mini: $0.15 per 1M input tokens
  • gpt-4o: $2.50 per 1M input tokens
  • You control which models to use

Total: Pay only for what you use, when you use it.

Real-World Applications

This pattern works for any complex cognitive task:

Research & Analysis

Question → Search papers → Extract findings → Cross-reference → 
Synthesize → Store insights → Final report
Enter fullscreen mode Exit fullscreen mode

Content Generation

Topic → Outline → Research → Draft → Critique → 
Revise → Polish → Final content
Enter fullscreen mode Exit fullscreen mode

Decision Support

Problem → Classify → Gather data → Analyze options → 
Evaluate risks → Recommend → Justify decision
Enter fullscreen mode Exit fullscreen mode

Learning & Education

Concept → Explain → Test understanding → Identify gaps → 
Re-explain → Practice → Assess mastery
Enter fullscreen mode Exit fullscreen mode

Code Review

Code → Parse → Analyze logic → Check patterns → 
Security review → Suggest improvements → Final assessment
Enter fullscreen mode Exit fullscreen mode

Each becomes a sequence of specialized agents rather than one overwhelming prompt.

Under the Hood

The Technology Stack

OrKa Core: Python-based orchestration engine

  • Jinja2 templating for dynamic prompts
  • Async execution for performance
  • Comprehensive error handling
  • Cost tracking per agent

Memory Backend: RedisStack

  • Hash storage for structured data
  • Vector search for semantic retrieval
  • Fast in-memory operations
  • Automatic cleanup

Cloud Infrastructure: Google Cloud Run

  • Serverless containers
  • Auto-scaling (0-20 instances)
  • <30 second cold starts
  • Built-in load balancing

AI Provider: OpenAI

  • User-provided API keys
  • Support for all GPT models
  • Per-request key handling
  • Automatic cleanup

Request Flow

  1. API receives request (input + YAML + API key)
  2. Parse YAML into agent definitions
  3. Set environment variable for OpenAI key (per-request)
  4. Execute orchestrator sequentially
  5. Each agent:
    • Renders prompt template with context
    • Calls OpenAI or memory backend
    • Returns structured result
    • Updates execution trace
  6. Cleanup API key from environment
  7. Return complete execution log + cost analysis
  8. Store trace for download (24h retention)

Security & Rate Limiting

  • API keys: Never stored, used only for that request
  • Rate limiting: 5 requests/minute per IP (configurable)
  • Input validation: YAML size limit (100KB)
  • Execution timeout: 5 minutes maximum
  • Memory isolation: Each run gets clean Redis namespace

Comparison: Monolithic vs. Orchestrated

Let me show you the difference with a real example.

Monolithic Approach

prompt = """
You are an AI research assistant. Analyze the topic of quantum entanglement.

First, provide a basic explanation of the key concepts.
Then, search your knowledge for related physics concepts.
Next, provide a deeper analysis that connects these concepts.
Finally, synthesize everything into a comprehensive answer showing how your understanding evolved.

Format your response with clear sections for each stage.
"""

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
Enter fullscreen mode Exit fullscreen mode

Problems:

  • ✗ No actual memory search (just pretends)
  • ✗ Context dilution from cramming everything into one prompt
  • ✗ Can't debug which "stage" failed
  • ✗ Can't reuse parts in other workflows
  • ✗ Output quality suffers from trying to do too much

Orchestrated Approach

orchestrator:
  strategy: sequential
  agents: [analyze, store, search, deep_analyze, synthesize]

# Each agent with focused responsibility
# Memory operations are real (RedisStack)
# Template variables chain outputs
# Can debug/improve each stage independently
# Agents reusable across workflows
Enter fullscreen mode Exit fullscreen mode

Benefits:

  • ✓ Real memory operations with persistence
  • ✓ Each agent focused and excellent
  • ✓ Clear execution trace shows what happened
  • ✓ Agents composable and reusable
  • ✓ Higher quality from specialized components

The orchestrated approach isn't just more elegant – it produces measurably better results.

Getting Started: Three Levels

Level 1: Use the Cloud API (5 minutes)

  1. Get OpenAI API key
  2. Copy the Postman request above
  3. Replace the API key
  4. Send request
  5. Explore the results

No setup, no installation, just test it.

Level 2: Create Custom Workflows (30 minutes)

  1. Study the iterative_learning_demo.yml structure
  2. Identify your task's stages
  3. Design specialized agents
  4. Connect them with template variables
  5. Test via Cloud API

Example starter workflow:

orchestrator:
  id: my-workflow
  agents: [step1, step2, step3]

agents:
  - id: step1
    type: openai-answer
    model: gpt-4o-mini
    prompt: "Your task: {{ get_input() }}"

  - id: step2
    type: openai-answer
    model: gpt-4o-mini
    prompt: "Build on: {{ get_agent_response('step1') }}"

  - id: step3
    type: openai-answer
    model: gpt-4o-mini
    prompt: "Synthesize: {{ get_agent_response('step2') }}"
Enter fullscreen mode Exit fullscreen mode

Level 3: Deploy Your Own Instance (1 hour)

  1. Clone the repo: github.com/marcosomma/orka-reasoning
  2. Follow PRIVATE/OPENAI_DEPLOYMENT.md
  3. Deploy to Google Cloud Run
  4. Custom domain, scaling, monitoring

Full control, your infrastructure, your costs.

The Future: Agent Networks

This is just the beginning. Imagine:

Parallel Execution

Question → [Agent A, Agent B, Agent C] → Synthesizer → Answer
Enter fullscreen mode Exit fullscreen mode

Conditional Routing

Classify → if scientific: Science Agents
         → if technical: Tech Agents
         → if business: Business Agents
Enter fullscreen mode Exit fullscreen mode

Recursive Workflows

Analyze → Score → if score < 0.8: Analyze again with feedback
                → if score >= 0.8: Continue
Enter fullscreen mode Exit fullscreen mode

Multi-Model Ensembles

Same question → [GPT-4, Claude, Gemini] → Compare → Best answer
Enter fullscreen mode Exit fullscreen mode

Human-in-the-Loop

Draft → Human review → Incorporate feedback → Refine → Iterate
Enter fullscreen mode Exit fullscreen mode

All of these patterns are possible with orchestration. The composability is endless.

Why This Matters

We're moving from monolithic AI interactions to composable agent systems. This shift mirrors software engineering's evolution:

  • 1970s: Monolithic programs
  • 1990s: Object-oriented design
  • 2000s: Microservices
  • 2020s: AI agent orchestration

Just as microservices made software more maintainable, scalable, and reliable – agent orchestration does the same for AI workflows.

OrKa makes this accessible today. No ML expertise needed, no infrastructure required, just:

  1. Define your agents in YAML
  2. Orchestrate them
  3. Let them work together

Try It Now

Cloud API: https://orka-demo-647096874165.europe-west1.run.app

GitHub: github.com/marcosomma/orka-reasoning

Examples: Browse /examples directory for more workflows

Postman Collection: Copy the request above and start experimenting

Questions & Community

Cost concerns? The demo workflow costs $0.01-0.03 per run. Infrastructure is ~$42/month at 50% uptime, scales to zero when idle.

Production ready? Yes. The Cloud API is live and stable. Rate-limited for public demo, but you can deploy your own instance for production use.

Other AI providers? Currently OpenAI-only for the cloud API. Local deployment supports any OpenAI-compatible API (including local models via Ollama).

Memory persistence? Session-scoped by default. Insights persist across agents within a workflow run. Cross-run persistence configurable.

Need help? Open an issue on GitHub or check the documentation.


Conclusion

AI orchestration isn't about making things more complicated – it's about making complex things manageable.

One specialized agent can't do everything well. But six specialized agents, properly orchestrated, can create genuine emergent intelligence.

The Iterative Learning demo shows this concretely: an AI that doesn't just answer questions, but builds understanding progressively, stores insights, retrieves context, and shows you how its thinking evolved.

This is possible today. The API is live. The code is open source. The examples are ready to run.

Break your monolithic prompts into composable agents. Orchestrate them. Watch the magic happen.


OrKa (Orchestration Kit for AI) is an open-source project focused on making sophisticated AI workflows accessible to everyone. No ML expertise required, no infrastructure needed – just bring your curiosity and an OpenAI API key.

Repository: github.com/marcosomma/orka-reasoning

*License: Apache 2.0

Questions? Open an issue or contribute to the project.

Top comments (0)