I could not believe Bob was able to do this!!!
TL;DR — What is CUGA
Imagine you’ve built an AI agent that performs beautifully in sandbox demos. But once it hits production, things unravel — it misuses tools, skips critical steps, and fails silently when faced with real-world complexity. Debugging becomes a nightmare, and scaling across domains feels like reinventing the wheel every time.
This is the reality of many agentic systems today. They’re either too brittle to handle enterprise-grade workflows or too generic to meet policy, safety, and integration requirements.
That’s why we built CUGA — the ConfigUrable Generalist Agent — a powerful, adaptable agent framework designed to meet the complex demands of enterprise automation. And importantly, CUGA is open source.
Here’s what sets it apart today:
Complex task execution: State-of-the-art results across web and APIs.
Multi-tool mastery: CUGA works across REST APIs via OpenAPI specs, MCP servers, and custom connectors.
Composable agent architecture: CUGA itself can be exposed as a tool to other agents, enabling nested reasoning and multi-agent collaboration.
Configurable reasoning modes: Choose between fast heuristics or deep planning depending on your task’s complexity and latency needs.
Excerpt from IBM Research page.
Introduction
It’s been a few days since I first got my hands on Bob, and honestly? Things have escalated. I’ve been experimenting, writing blog posts, playing it cool… but yesterday, Bob went rogue in the best way possible.
I dared Bob to build agents using CUGA just by dropping a GitHub link. Bob looked me dead in the digital eyes, asked for ‘authorization’ to run a curl command like some kind of secret agent, and then just... started coding. My jaw didn't just drop; it’s currently on the floor, and I’m legally obligated to tell you that Bob might be smarter than me.
Bob’s CUGA Project 🤷‍♂️😂
In essence, based on my prompts, this demo application shows how to use CUGA by creating and orchestrating (hypothetical) multi-step workflows, such as running CRM operations, performing deep data analysis, or navigating hybrid web-API tasks. Under the hood, the design leans on a few familiar patterns: the Strategy pattern to toggle between speed and accuracy, the Factory pattern for dynamic agent creation, and Adapters for tool integration via OpenAPI or LangChain.
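To make the Strategy and Factory ideas concrete, here is a minimal sketch in plain Python. It does not use CUGA's API, and it is not the code Bob generated: every name in it (ReasoningStrategy, FastStrategy, AccurateStrategy, DemoAgent, create_agent) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class ReasoningStrategy(Protocol):
    """Strategy interface: each mode decides how a task is planned."""

    name: str

    def plan(self, task: str) -> List[str]: ...


@dataclass(frozen=True)
class FastStrategy:
    """Speed-oriented mode: a single step, no verification."""

    name: str = "fast"

    def plan(self, task: str) -> List[str]:
        return [f"execute: {task}"]


@dataclass(frozen=True)
class AccurateStrategy:
    """Accuracy-oriented mode: decompose, execute, then verify."""

    name: str = "accurate"

    def plan(self, task: str) -> List[str]:
        return [f"decompose: {task}", f"execute: {task}", f"verify: {task}"]


@dataclass
class DemoAgent:
    """Hypothetical agent that delegates planning to its strategy."""

    kind: str
    strategy: ReasoningStrategy

    def run(self, task: str) -> dict:
        steps = self.strategy.plan(task)
        return {"agent": self.kind, "mode": self.strategy.name, "steps": steps}


# Registry used by the factory; adding a mode means adding one entry here.
_STRATEGIES: Dict[str, Callable[[], ReasoningStrategy]] = {
    "fast": FastStrategy,
    "accurate": AccurateStrategy,
}


def create_agent(kind: str, mode: str = "fast") -> DemoAgent:
    """Factory: build an agent of the given kind with the requested mode."""
    try:
        strategy = _STRATEGIES[mode]()
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return DemoAgent(kind=kind, strategy=strategy)


if __name__ == "__main__":
    agent = create_agent("api", mode="accurate")
    print(agent.run("List all accounts in the CRM system"))
```

The point of the split is that callers only ever name a mode; swapping "fast" for "accurate" changes the plan without touching the agent class. An adapter layer for tools would sit alongside this and is left out to keep the sketch short.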
Bob delivered a complete, turn-key foundation for the project, covering everything from logic to deployment:
- Production-Ready Codebase: A modular source code implementation featuring specialized agents and custom tools.
- Comprehensive Documentation: A detailed architecture overview.
- Deep-Dive Analysis: A thorough explanation of the application’s purpose, including its ability to handle multi-step workflows and complex task decomposition.
- End-to-End Setup & Deployment: A step-by-step guide for environment configuration, including local integration with Ollama for private, offline execution.
- Operational Toolkit: Precise execution commands for interactive modes and a robust suite of sample test inputs to verify CRM, data analysis, and web operations.
cuga-demo-app/
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment variables template
├── .gitignore                     # Git ignore rules
├── output/                        # Generated markdown logs (auto-created)
├── config/
│   ├── settings.toml              # Main CUGA configuration
│   ├── modes/
│   │   ├── fast.toml              # Fast reasoning mode
│   │   ├── balanced.toml          # Balanced mode (default)
│   │   └── accurate.toml          # Accurate mode
│   └── tools/
│       └── mcp_servers.yaml       # Custom tools configuration
├── src/
│   ├── __init__.py
│   ├── main.py                    # Main application entry point
│   ├── utils/
│   │   ├── __init__.py
│   │   └── output_logger.py       # Output logging utility
│   ├── agents/
│   │   ├── __init__.py
│   │   └── api_agent.py           # API-focused agent
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── custom_tools.py        # Custom LangChain tools
│   │   └── openapi_specs/
│   │       └── crm_api.yaml       # CRM API specification
│   └── examples/
│       ├── __init__.py
│       └── basic_usage.py         # Basic CUGA usage examples
└── docs/
    ├── setup.md                   # Setup instructions
    ├── BUILD_AND_TEST_GUIDE.md    # Build and testing guide
    ├── OLLAMA_SETUP.md            # Ollama configuration guide
    └── ARCHITECTURE_FLOWCHART.md  # Application architecture diagrams
The basic usage example (src/examples/basic_usage.py in the tree above) is reproduced below. Note that its results are simulated rather than returned by a live CUGA agent.

#!/usr/bin/env python3
"""
CUGA Basic Usage Examples

This module demonstrates basic CUGA agent usage patterns including:
- Simple task execution
- Configuration management
- Error handling
- Result processing
"""

import os
import sys
from pathlib import Path
from typing import Dict, Any

# Add project root to path
project_root = Path(__file__).parent.parent.parent
sys.path.insert(0, str(project_root))

# Import output logger
from src.utils.output_logger import create_logger


class BasicCUGAExample:
    """Basic CUGA usage examples."""

    def __init__(self):
        """Initialize basic example."""
        self.results = []
    def example_1_simple_task(self) -> Dict[str, Any]:
        """
        Example 1: Execute a simple task

        Demonstrates:
        - Basic task execution
        - Result retrieval
        """
        print("\n" + "="*60)
        print("Example 1: Simple Task Execution")
        print("="*60)

        task = "List all accounts in the CRM system"
        print(f"\nTask: {task}")
        print("\nIn a real implementation, this would:")
        print("1. Initialize CUGA agent with configuration")
        print("2. Load necessary tools (CRM API)")
        print("3. Execute the task through the agent")
        print("4. Return structured results")

        # Simulated result
        result = {
            "task": task,
            "status": "success",
            "data": [
                {"id": 1, "name": "Acme Corp", "revenue": 1000000},
                {"id": 2, "name": "TechStart Inc", "revenue": 750000},
                {"id": 3, "name": "Global Solutions", "revenue": 2000000}
            ],
            "execution_time": "2.3s",
            "mode": "balanced"
        }

        print(f"\nResult: {result['status']}")
        print(f"Accounts found: {len(result['data'])}")

        self.results.append(result)
        return result
    def example_2_with_parameters(self) -> Dict[str, Any]:
        """
        Example 2: Task with parameters

        Demonstrates:
        - Parameterized task execution
        - Filtering and sorting
        """
        print("\n" + "="*60)
        print("Example 2: Task with Parameters")
        print("="*60)

        task = "Get top 3 accounts by revenue"
        print(f"\nTask: {task}")
        print("\nThis demonstrates:")
        print("- Parameter extraction from natural language")
        print("- API call with filters and sorting")
        print("- Result limiting")

        # Simulated result
        result = {
            "task": task,
            "status": "success",
            "data": [
                {"id": 3, "name": "Global Solutions", "revenue": 2000000},
                {"id": 1, "name": "Acme Corp", "revenue": 1000000},
                {"id": 2, "name": "TechStart Inc", "revenue": 750000}
            ],
            "parameters": {
                "limit": 3,
                "sort_by": "revenue",
                "order": "desc"
            },
            "execution_time": "1.8s",
            "mode": "balanced"
        }

        print(f"\nResult: {result['status']}")
        print(f"Top accounts: {len(result['data'])}")
        for i, account in enumerate(result['data'], 1):
            print(f"  {i}. {account['name']}: ${account['revenue']:,}")

        self.results.append(result)
        return result
    def example_3_error_handling(self) -> Dict[str, Any]:
        """
        Example 3: Error handling

        Demonstrates:
        - Graceful error handling
        - Retry mechanisms
        - Error reporting
        """
        print("\n" + "="*60)
        print("Example 3: Error Handling")
        print("="*60)

        task = "Get account with invalid ID: -999"
        print(f"\nTask: {task}")
        print("\nThis demonstrates:")
        print("- Input validation")
        print("- Error detection and handling")
        print("- Informative error messages")

        # Simulated error result
        result = {
            "task": task,
            "status": "error",
            "error": {
                "type": "ValidationError",
                "message": "Invalid account ID: -999",
                "details": "Account ID must be a positive integer"
            },
            "execution_time": "0.5s",
            "mode": "balanced"
        }

        print(f"\nResult: {result['status']}")
        print(f"Error: {result['error']['message']}")
        print(f"Details: {result['error']['details']}")

        self.results.append(result)
        return result
    def example_4_multi_step_task(self) -> Dict[str, Any]:
        """
        Example 4: Multi-step task execution

        Demonstrates:
        - Task decomposition
        - Sequential execution
        - Intermediate results
        """
        print("\n" + "="*60)
        print("Example 4: Multi-Step Task")
        print("="*60)

        task = "Find the highest revenue account and get its contact details"
        print(f"\nTask: {task}")
        print("\nThis demonstrates:")
        print("- Automatic task decomposition")
        print("- Sequential step execution")
        print("- Data passing between steps")

        # Simulated result with steps
        result = {
            "task": task,
            "status": "success",
            "steps": [
                {
                    "step": 1,
                    "action": "Get all accounts sorted by revenue",
                    "status": "completed",
                    "duration": "1.2s"
                },
                {
                    "step": 2,
                    "action": "Select top account",
                    "status": "completed",
                    "duration": "0.1s"
                },
                {
                    "step": 3,
                    "action": "Fetch contact details for account ID 3",
                    "status": "completed",
                    "duration": "0.8s"
                }
            ],
            "data": {
                "account": {
                    "id": 3,
                    "name": "Global Solutions",
                    "revenue": 2000000
                },
                "contacts": [
                    {"name": "John Doe", "email": "john@global.com", "role": "CEO"},
                    {"name": "Jane Smith", "email": "jane@global.com", "role": "CFO"}
                ]
            },
            "execution_time": "2.1s",
            "mode": "balanced"
        }

        print(f"\nResult: {result['status']}")
        print(f"Steps executed: {len(result['steps'])}")
        print(f"\nAccount: {result['data']['account']['name']}")
        print(f"Revenue: ${result['data']['account']['revenue']:,}")
        print(f"Contacts: {len(result['data']['contacts'])}")

        self.results.append(result)
        return result
    def example_5_mode_comparison(self):
        """
        Example 5: Compare different reasoning modes

        Demonstrates:
        - Fast mode (speed optimized)
        - Balanced mode (default)
        - Accurate mode (precision optimized)
        """
        print("\n" + "="*60)
        print("Example 5: Reasoning Mode Comparison")
        print("="*60)

        task = "Analyze sales trends for Q4 2024"
        modes = ["fast", "balanced", "accurate"]

        print(f"\nTask: {task}")
        print("\nComparing execution across different modes:\n")

        for mode in modes:
            # Simulated results for different modes
            if mode == "fast":
                result = {
                    "mode": mode,
                    "execution_time": "1.2s",
                    "steps": 3,
                    "accuracy": "good",
                    "description": "Quick analysis with basic insights"
                }
            elif mode == "balanced":
                result = {
                    "mode": mode,
                    "execution_time": "2.5s",
                    "steps": 5,
                    "accuracy": "very good",
                    "description": "Thorough analysis with detailed insights"
                }
            else:  # accurate
                result = {
                    "mode": mode,
                    "execution_time": "4.8s",
                    "steps": 8,
                    "accuracy": "excellent",
                    "description": "Deep analysis with comprehensive insights"
                }

            print(f"{mode.upper()} Mode:")
            print(f"  Time: {result['execution_time']}")
            print(f"  Steps: {result['steps']}")
            print(f"  Accuracy: {result['accuracy']}")
            print(f"  Description: {result['description']}\n")
    def run_all_examples(self, logger):
        """Run all basic examples."""
        logger.log_section("CUGA Basic Usage Examples")
        print("\n" + "="*60)
        print("CUGA Basic Usage Examples")
        print("="*60)

        logger.log_section("Example 1: Simple Task")
        result1 = self.example_1_simple_task()
        logger.log_result(result1)

        logger.log_section("Example 2: Task with Parameters")
        result2 = self.example_2_with_parameters()
        logger.log_result(result2)

        logger.log_section("Example 3: Error Handling")
        result3 = self.example_3_error_handling()
        logger.log_result(result3)

        logger.log_section("Example 4: Multi-Step Task")
        result4 = self.example_4_multi_step_task()
        logger.log_result(result4)

        logger.log_section("Example 5: Mode Comparison")
        self.example_5_mode_comparison()

        print("\n" + "="*60)
        print("Summary")
        print("="*60)
        print("\nTotal examples executed: 5")
        print(f"Successful tasks: {sum(1 for r in self.results if r['status'] == 'success')}")
        print(f"Failed tasks: {sum(1 for r in self.results if r['status'] == 'error')}")

        logger.log_section("Summary")
        logger.log_text("Total examples executed: 5")
        logger.log_text(f"Successful tasks: {sum(1 for r in self.results if r['status'] == 'success')}")
        logger.log_text(f"Failed tasks: {sum(1 for r in self.results if r['status'] == 'error')}")

        print("\n✓ All basic examples completed!")
        print("\nNext steps:")
        print("- Check out advanced_usage.py for more complex scenarios")
        print("- Explore tool_integration.py for custom tool examples")
        print("- See the agents/ directory for specialized agent implementations")


def main():
    """Main entry point."""
    logger = create_logger("basic_usage")
    examples = BasicCUGAExample()
    examples.run_all_examples(logger)
    logger.finalize()
    print(f"\n[Output saved to: {logger.get_filepath()}]")


if __name__ == "__main__":
    main()

# Made with Bob
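The listing above imports create_logger from src/utils/output_logger.py, which is not shown in this post. To try the example in isolation, a stand-in along these lines should be enough. It is only a sketch: the method names (log_section, log_result, log_text, finalize, get_filepath) are taken from the calls above, while the class name OutputLogger, the output_dir parameter, the file naming, and the Markdown layout are my assumptions.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List


class OutputLogger:
    """Collects log entries in memory and writes them out as a Markdown file."""

    def __init__(self, name: str, output_dir: Path) -> None:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self._path = Path(output_dir) / f"{name}_{stamp}.md"
        self._lines: List[str] = [f"# {name}", ""]

    def log_section(self, title: str) -> None:
        """Start a new second-level section."""
        self._lines += [f"## {title}", ""]

    def log_text(self, text: str) -> None:
        """Add a plain paragraph."""
        self._lines += [text, ""]

    def log_result(self, result: Dict[str, Any]) -> None:
        """Add a result dict as an indented (Markdown code) JSON block."""
        dumped = json.dumps(result, indent=2)
        self._lines += ["    " + line for line in dumped.splitlines()]
        self._lines.append("")

    def finalize(self) -> None:
        """Create the output directory if needed and write the file."""
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text("\n".join(self._lines), encoding="utf-8")

    def get_filepath(self) -> str:
        return str(self._path)


def create_logger(name: str, output_dir: str = "output") -> OutputLogger:
    """Build a logger that writes to <output_dir>/<name>_<timestamp>.md."""
    return OutputLogger(name, Path(output_dir))
```

Buffering everything until finalize() keeps the class simple; a logger meant for long runs would more likely append to the file as it goes, so that a crash does not lose the log.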
Conclusion
In summary, Bob has proven to be a very capable AI software development partner, turning a single GitHub link into a complete, well-documented demo project. From generating modular code and comprehensive documentation to laying out complex, multi-step CUGA workflows (simulated in this demo), Bob bridges the gap between high-level intent and technical execution.
The journey doesn’t end here — stay tuned, as a deep dive into using CUGA directly from Hugging Face Spaces is coming next...
Links
- Code Repository of this post: https://github.com/aairom/BeeAI-Bob/tree/main/cuga-demo-app
- CUGA Project: https://research.ibm.com/blog/cuga-agent-framework
- CUGA Repository: https://github.com/cuga-project/cuga-agent
- CUGA on Hugging Face (Democratizing Configurable AI Agents): https://huggingface.co/blog/ibm-research/cuga-on-hugging-face
- Experience CUGA Agent: https://huggingface.co/spaces/ibm-research/cuga-agent




