Hady Walied

Posted on Dec 25

Enterprise AI Agents: Traceability, Atomicity, and Memory Persistence with AgentHelm

#ai #agents #orchestration #traceability

When deploying AI agents in enterprise environments, three requirements typically surface: every action must be traceable, multi-step operations must be atomic, and context must persist across sessions. AgentHelm v0.3.0 addresses all three.

This walkthrough demonstrates building a contract processing system with two specialized agents and full observability.

The Enterprise Requirements

Before diving into code, let's establish what enterprise deployments demand. Alongside the three requirements above, cost visibility usually joins the list:

Traceability: Every tool call must be logged with inputs, outputs, timing, and cost
Atomicity: If step 3 of 5 fails, steps 1-2 must be rolled back
Memory Persistence: Agent context survives restarts and can be audited
Cost Visibility: Know exactly what each operation costs before the invoice arrives
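To make the traceability requirement concrete, here is a minimal standard-library sketch of what "logged with inputs, outputs, timing, and cost" could look like. The `TraceEvent` shape, the `traced` decorator, and the flat per-call cost are assumptions for illustration only; they are not AgentHelm's schema or API.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TraceEvent:
    """One logged tool call (hypothetical shape, not AgentHelm's schema)."""
    tool_name: str
    inputs: dict
    output: Any = None
    error: Optional[str] = None
    duration_s: float = 0.0
    cost_usd: float = 0.0


EVENTS: list = []  # in-memory sink; a real system would persist this


def traced(cost_usd: float = 0.0) -> Callable:
    """Wrap a function so every call appends a TraceEvent to EVENTS."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            event = TraceEvent(fn.__name__, dict(kwargs), cost_usd=cost_usd)
            start = time.perf_counter()
            try:
                event.output = fn(**kwargs)
                return event.output
            except Exception as exc:
                event.error = repr(exc)  # failures are logged too, then re-raised
                raise
            finally:
                event.duration_s = time.perf_counter() - start
                EVENTS.append(event)
        return wrapper
    return decorator


@traced(cost_usd=0.002)  # assumed flat cost per call, for illustration
def lookup_contract_value(contract_id: str) -> int:
    return 150000


lookup_contract_value(contract_id="C-1")
print(EVENTS[0])
```

The `finally` clause is the important design choice: an event is recorded whether the call succeeds or raises, so the audit trail has no gaps on failure.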

Architecture Overview

┌────────────────────────────────────────────────────────────┐
│                      PlannerAgent                          │
│              (Generates execution blueprint)               │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
              ┌───────────────┴───────────────┐
              │         Orchestrator          │
              │   (Saga pattern execution)    │
              └───────────────┬───────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────┐         ┌─────────────────────┐
│   AnalysisAgent     │         │   DocumentAgent     │
│ (Extract & analyze) │         │  (Generate & save)  │
└─────────────────────┘         └─────────────────────┘
              │                               │
              └───────────────┬───────────────┘
                              ▼
              ┌───────────────────────────────┐
              │          MemoryHub            │
              │  (SQLite short-term + Qdrant) │
              └───────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────────┐
              │       ExecutionTracer         │
              │  (SQLite + OpenTelemetry)     │
              └───────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────────┐
              │           Jaeger              │
              │   (Trace visualization)       │
              └───────────────────────────────┘

Setup

pip install agenthelm

import dspy
from agenthelm import (
    ToolAgent, PlannerAgent, Orchestrator, AgentRegistry,
    tool, MemoryHub, ExecutionTracer
)
from agenthelm.core.storage import SqliteStorage
from agenthelm.tracing import init_tracing

lm = dspy.LM("mistral/mistral-large-latest")

Configure Observability First

In enterprise deployments, observability is not an afterthought.

# Initialize OpenTelemetry with Jaeger
init_tracing(
    service_name="contract-processor",
    otlp_endpoint="http://jaeger:4317",
    enabled=True
)

# Create execution tracer with persistent storage
tracer = ExecutionTracer(
    storage=SqliteStorage("/var/log/agenthelm/traces.db"),
    session_id="contract-batch-2025-12-26"
)

Every tool execution is now:

Logged to SQLite with full inputs/outputs
Exported to Jaeger for distributed tracing
Tagged with session ID for batch correlation
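The value of tagging by session ID is easiest to see with a query. The sketch below uses only the standard `sqlite3` module; the `tool_events` table and its columns are assumptions made for this example and do not describe the schema that SqliteStorage actually writes.

```python
import json
import sqlite3


def open_trace_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) events table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tool_events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               tool_name TEXT NOT NULL,
               inputs_json TEXT NOT NULL,
               output_json TEXT,
               status TEXT NOT NULL
           )"""
    )
    return conn


def log_event(conn, session_id, tool_name, inputs, output, status="success"):
    """Store one tool call with its full inputs and outputs as JSON."""
    conn.execute(
        "INSERT INTO tool_events "
        "(session_id, tool_name, inputs_json, output_json, status) "
        "VALUES (?, ?, ?, ?, ?)",
        (session_id, tool_name, json.dumps(inputs), json.dumps(output), status),
    )
    conn.commit()


def events_for_session(conn, session_id):
    """Return (tool_name, status) pairs for one batch, in insertion order."""
    rows = conn.execute(
        "SELECT tool_name, status FROM tool_events "
        "WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


conn = open_trace_db()
log_event(conn, "contract-batch-2025-12-26", "extract_contract_data",
          {"document_path": "contract.pdf"}, {"value": 150000})
log_event(conn, "other-batch", "create_record",
          {"record_type": "contract"}, "REC-0001")

batch_events = events_for_session(conn, "contract-batch-2025-12-26")
print(batch_events)
```

Only the first event comes back, because the second belongs to a different session.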

Configure Memory Persistence

# Production memory configuration
memory = MemoryHub(
    data_dir="/var/data/agenthelm",  # Local persistence
    # Or for network mode:
    # redis_url="redis://cache.internal:6379",
    # qdrant_url="http://qdrant.internal:6333"
)

MemoryHub provides:

Short-term memory: Session state with TTL (SQLite locally, Redis for distributed)
Semantic memory: Vector search for context retrieval (Qdrant with FastEmbed)
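For readers unfamiliar with TTL-based session state, the idea can be sketched in a few lines. `TTLStore` below is a hypothetical in-memory stand-in, not MemoryHub's implementation; the injectable clock is there only so expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable


class TTLStore:
    """Key-value store whose entries expire after a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._items[key] = (self._clock() + ttl_s, value)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries on read
            return default
        return value


# A fake clock makes the expiry deterministic
now = [0.0]
store = TTLStore(clock=lambda: now[0])

store.set("batch_status", {"processed": 1}, ttl_s=60)
fresh = store.get("batch_status")    # still within the 60 s window

now[0] = 61.0
expired = store.get("batch_status")  # past the TTL, so the default is returned

print(fresh, expired)
```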

Define Tools with Compensation

Atomicity requires compensating actions for rollback.

@tool()
def extract_contract_data(document_path: str) -> dict:
    """Extract structured data from a contract document."""
    # Simulated extraction
    return {
        "parties": ["Acme Corp", "Widget Inc"],
        "value": 150000,
        "terms": "12 months",
        "effective_date": "2025-01-01"
    }

@tool()
def validate_compliance(contract_data: dict) -> dict:
    """Validate contract against compliance rules."""
    issues = []
    if contract_data.get("value", 0) > 100000:
        issues.append("Requires senior approval for contracts > $100k")
    return {"valid": len(issues) == 0, "issues": issues}

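@tool(compensating_tool="delete_record")
def create_record(contract_data: dict, record_type: str) -> str:
    """Create a record in the system."""
    import hashlib
    # Built-in hash() of a str is salted per process, so use a stable digest for the ID
    digest = hashlib.sha256(str(contract_data).encode("utf-8")).hexdigest()
    record_id = f"REC-{int(digest, 16) % 10000:04d}"
    # In production: database insert
    return record_id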

@tool()
def delete_record(record_id: str) -> str:
    """Delete a record (compensation action)."""
    # In production: database delete
    return f"Deleted {record_id}"

@tool(compensating_tool="archive_document")
def generate_summary(contract_data: dict, output_path: str) -> str:
    """Generate and save a contract summary document."""
    content = f"Contract Summary: {contract_data}"
    with open(output_path, "w") as f:
        f.write(content)
    return output_path

@tool()
def archive_document(output_path: str) -> str:
    """Archive a document (compensation action)."""
    import os
    if os.path.exists(output_path):
        os.rename(output_path, f"{output_path}.archived")
    return f"Archived {output_path}"

Note the compensating_tool parameter: it names the tool that undoes a step. When the Orchestrator detects a failure, it automatically calls the compensating tools of the steps that already completed, in reverse order.
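One way such a name-based link could work is a registry that maps each tool to the name of its undo action and resolves it lazily. The sketch below is an assumption about the general technique, not AgentHelm's internals; `sketch_tool`, `TOOLS`, `COMPENSATIONS`, and `compensation_for` are hypothetical names.

```python
from typing import Callable, Optional

TOOLS: dict = {}          # tool name -> function
COMPENSATIONS: dict = {}  # tool name -> name of its compensating tool


def sketch_tool(compensating_tool: Optional[str] = None):
    """Register a function and, optionally, the name of its undo action."""
    def decorator(fn):
        TOOLS[fn.__name__] = fn
        if compensating_tool is not None:
            COMPENSATIONS[fn.__name__] = compensating_tool
        return fn
    return decorator


def compensation_for(tool_name: str) -> Optional[Callable]:
    """Resolve the undo function at call time, so it may be defined later."""
    undo_name = COMPENSATIONS.get(tool_name)
    return TOOLS.get(undo_name) if undo_name else None


@sketch_tool(compensating_tool="delete_record")
def create_record(record_type: str) -> str:
    return f"REC-{record_type}"


@sketch_tool()
def delete_record(record_id: str) -> str:
    return f"Deleted {record_id}"


undo = compensation_for("create_record")
print(undo(create_record("contract")))
```

Referring to the compensating tool by name rather than by object is what allows create_record to be declared before delete_record exists.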

Create Specialized Agents

# Agent 1: Analysis specialist
analysis_agent = ToolAgent(
    name="analyst",
    lm=lm,
    tools=[extract_contract_data, validate_compliance],
    tracer=tracer,
    memory=memory,
    role="You are a contract analysis specialist. Extract data accurately."
)

# Agent 2: Document specialist  
document_agent = ToolAgent(
    name="documenter",
    lm=lm,
    tools=[create_record, generate_summary],
    tracer=tracer,
    memory=memory,
    role="You are a document management specialist. Create records precisely."
)

# Register agents
registry = AgentRegistry()
registry.register(analysis_agent)
registry.register(document_agent)

Both agents share:

The same tracer (unified trace storage)
The same memory hub (shared context)

Generate the Execution Plan

# Planner agent has visibility into all tools
planner = PlannerAgent(
    name="planner",
    lm=lm,
    tools=[
        extract_contract_data, validate_compliance,
        create_record, generate_summary
    ]
)

plan = planner.plan(
    "Process contract.pdf: extract data, validate compliance, "
    "create a system record, and generate a summary document"
)

print(plan.to_yaml())

Generated plan:

goal: Process contract and create records
reasoning: |
  Sequential process: extract first, then validate, then create
  record, finally generate summary. Each step depends on previous.
steps:
  - id: extract
    agent: analyst
    tool: extract_contract_data
    description: Extract structured data from contract
    args:
      document_path: "contract.pdf"

  - id: validate
    agent: analyst
    tool: validate_compliance
    description: Check against compliance rules
    depends_on: [extract]
    args:
      contract_data: "${extract.result}"

  - id: record
    agent: documenter
    tool: create_record
    description: Create system record
    depends_on: [validate]
    args:
      contract_data: "${extract.result}"
      record_type: "contract"

  - id: summary
    agent: documenter
    tool: generate_summary
    description: Generate summary document
    depends_on: [record]
    args:
      contract_data: "${extract.result}"
      output_path: "/output/contract_summary.md"
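The plan relies on two mechanisms: depends_on determines ordering, and "${extract.result}" placeholders pass one step's output to another. The sketch below shows one plausible way to implement both with the standard library (`graphlib` needs Python 3.9+). The placeholder grammar and the helper names are assumptions inferred from the YAML above, not the Orchestrator's actual resolver.

```python
import re
from graphlib import TopologicalSorter

# A trimmed copy of the plan above, as plain dictionaries
steps = {
    "extract": {"depends_on": [],
                "args": {"document_path": "contract.pdf"}},
    "validate": {"depends_on": ["extract"],
                 "args": {"contract_data": "${extract.result}"}},
    "record": {"depends_on": ["validate"],
               "args": {"contract_data": "${extract.result}",
                        "record_type": "contract"}},
}

# Matches a whole-string placeholder such as ${extract.result}
PLACEHOLDER = re.compile(r"^\$\{(\w+)\.result\}$")


def execution_order(steps: dict) -> list:
    """Order step IDs so every step runs after its dependencies."""
    graph = {step_id: set(step["depends_on"]) for step_id, step in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def resolve_args(args: dict, results: dict) -> dict:
    """Replace ${step.result} placeholders with earlier step results."""
    resolved = {}
    for name, value in args.items():
        match = PLACEHOLDER.match(value) if isinstance(value, str) else None
        resolved[name] = results[match.group(1)] if match else value
    return resolved


order = execution_order(steps)
results = {"extract": {"value": 150000}}  # pretend the first step already ran
record_args = resolve_args(steps["record"]["args"], results)

print(order)
print(record_args)
```

`TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, which is a useful validation to run on a plan that a human reviewer has edited.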

Review and Edit the Plan

Before execution, the plan can be reviewed and modified.

# Save plan for review
plan_path = "/reviews/contract_plan.yaml"
with open(plan_path, "w") as f:
    f.write(plan.to_yaml())

# Manual review happens here...
# Reviewer can edit the YAML, add steps, modify args

# Load reviewed plan (the context manager closes the file handle)
from agenthelm import Plan
with open(plan_path) as f:
    reviewed_plan = Plan.from_yaml(f.read())

# Approve for execution
reviewed_plan.approved = True

In production, this review step integrates with your approval workflow: Slack notifications, PR-based reviews, or manual sign-off.
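An approval workflow usually also needs to guarantee that the plan being executed is the plan that was approved. A simple way to sketch that is to record a digest of the reviewed YAML at sign-off and check it again before execution. Everything here (`plan_digest`, `require_approval`, `ApprovalError`) is a hypothetical illustration layered on top of the workflow, not an AgentHelm feature.

```python
import hashlib
from typing import Optional


class ApprovalError(RuntimeError):
    """Raised when a plan is unapproved or was edited after approval."""


def plan_digest(plan_yaml: str) -> str:
    """Fingerprint the exact YAML text the reviewer signed off on."""
    return hashlib.sha256(plan_yaml.encode("utf-8")).hexdigest()


def require_approval(plan_yaml: str, approved_digest: Optional[str]) -> None:
    """Refuse to proceed unless the text matches the approved fingerprint."""
    if approved_digest is None:
        raise ApprovalError("plan has not been approved")
    if plan_digest(plan_yaml) != approved_digest:
        raise ApprovalError("plan changed after approval")


reviewed = "goal: Process contract and create records\n"
approved = plan_digest(reviewed)   # stored at sign-off time
require_approval(reviewed, approved)  # passes silently

tampered = reviewed + "steps: []\n"
try:
    require_approval(tampered, approved)
    rejected = False
except ApprovalError:
    rejected = True

print(rejected)
```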

Execute with Saga Rollback

orchestrator = Orchestrator(
    registry=registry,
    enable_rollback=True  # Saga pattern enabled
)

# await needs an async context: an async function or a notebook cell
result = await orchestrator.execute(reviewed_plan)

If generate_summary fails after create_record succeeds:

generate_summary marked as FAILED
Orchestrator triggers rollback
delete_record called automatically (compensating action for create_record)
System returns to consistent state
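The rollback sequence above can be sketched as a small saga runner. This is a simplified, synchronous illustration of the pattern under stated assumptions; `run_saga` and the in-memory `records` dictionary are hypothetical and stand in for the Orchestrator and a real database.

```python
from typing import Any, Callable, Optional

# (step name, action, optional compensation that receives the action's result)
Step = tuple[str, Callable[[], Any], Optional[Callable[[Any], Any]]]


def run_saga(steps: list) -> dict:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            result = action()
        except Exception as exc:
            log.append(f"{name} FAILED: {exc}")
            for done_name, done_result, undo in reversed(completed):
                if undo is not None:
                    undo(done_result)
                    log.append(f"compensated {done_name}")
            return {"success": False, "log": log}
        completed.append((name, result, compensate))
        log.append(f"{name} ok")
    return {"success": True, "log": log}


records = {}  # stand-in for the system of record


def create_record() -> str:
    records["REC-0001"] = {"type": "contract"}
    return "REC-0001"


def delete_record(record_id: str) -> None:
    records.pop(record_id, None)


def generate_summary() -> str:
    raise OSError("disk full")  # simulated failure in the last step


outcome = run_saga([
    ("create_record", create_record, delete_record),
    ("generate_summary", generate_summary, None),
])
print(outcome["log"])
print(records)
```

The sketch does not handle a compensation that itself fails. A production saga typically retries or escalates such cases, which is why compensating actions are best written to be idempotent.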

Inspect Traces

After execution, every tool call is available for inspection:

# Programmatic access
for event in result.events:
    print(f"{event.tool_name}: {event.execution_time:.3f}s, ${event.estimated_cost_usd:.4f}")

# Summary
print(f"Total cost: ${result.total_cost_usd:.4f}")
print(f"Total tokens: {result.token_usage.total_tokens}")
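Per-event costs become more useful once they are grouped, for example by tool. The sketch below derives a cost from token counts and aggregates it. The `Event` class and the per-token prices are invented for illustration; they are not the fields of AgentHelm's result object or real provider prices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical trace event carrying token counts."""
    tool_name: str
    input_tokens: int
    output_tokens: int


# Assumed prices in USD per million tokens (illustrative only)
PRICE_PER_MTOK = {"input": 2.00, "output": 6.00}


def event_cost(event: Event) -> float:
    """Cost of one call: tokens times price, scaled from per-million."""
    return (
        event.input_tokens * PRICE_PER_MTOK["input"]
        + event.output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000


def cost_by_tool(events: list) -> dict:
    """Sum the cost of all events, grouped by tool name."""
    totals = defaultdict(float)
    for event in events:
        totals[event.tool_name] += event_cost(event)
    return dict(totals)


events = [
    Event("extract_contract_data", input_tokens=1000, output_tokens=500),
    Event("validate_compliance", input_tokens=2000, output_tokens=0),
    Event("extract_contract_data", input_tokens=1000, output_tokens=500),
]

totals = cost_by_tool(events)
for tool_name, usd in totals.items():
    print(f"{tool_name}: ${usd:.4f}")
```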

Via CLI:

# List recent traces
agenthelm traces list -s /var/log/agenthelm/traces.db

# Filter by tool
agenthelm traces filter --tool create_record --status success

# Export for audit
agenthelm traces export -o audit_report.json -f json
agenthelm traces export -o audit_report.csv -f csv

Memory for Context Continuity

Store and retrieve context across sessions:

from agenthelm.memory import MemoryContext

async with MemoryContext(memory, session_id="contract-batch-2025-12-26") as ctx:
    # Store processing context
    await ctx.set("last_processed_contract", "contract.pdf")
    await ctx.set("batch_status", {"processed": 1, "failed": 0})

    # Store semantic memory for future retrieval
    await ctx.store_memory(
        "Contract with Acme Corp processed successfully. Value: $150k, 12 months.",
        metadata={"contract_id": "contract.pdf", "status": "complete"}
    )

# Later, in another session
async with MemoryContext(memory, session_id="new-session") as ctx:
    # Recall relevant past contracts
    results = await ctx.recall("Acme Corp contracts", top_k=5)
    for r in results:
        print(f"Score: {r.score:.2f} - {r.text}")
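Conceptually, recall ranks stored memories by similarity to the query and returns the top matches with a score. The toy below substitutes word-count vectors and cosine similarity for real embeddings, so it only matches exact words. It is a stand-in to show the ranking logic and does not reflect how Qdrant or FastEmbed compute scores.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recall(query: str, memories: list, top_k: int = 5) -> list:
    """Return up to top_k (score, text) pairs, best match first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), text) for text in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


memories = [
    "Contract with Acme Corp processed successfully. Value: $150k, 12 months.",
    "Quarterly invoice batch exported to finance.",
]

top = recall("Acme Corp contracts", memories, top_k=1)
for score, text in top:
    print(f"Score: {score:.2f} - {text}")
```

Note the limitation: "contracts" in the query does not match "Contract" in the memory, because the toy compares exact tokens. Dense embeddings exist precisely to close that gap.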

Jaeger Integration

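With OpenTelemetry configured, view traces in Jaeger. If the workflow runs on the host rather than inside the same Docker network as Jaeger, point otlp_endpoint at http://localhost:4317 instead of http://jaeger:4317.

# Start Jaeger (UI on port 16686, OTLP gRPC on port 4317)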
docker run -d -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one

# Run your agent workflow
python process_contracts.py

# Open Jaeger UI (macOS; use xdg-open on Linux)
open http://localhost:16686

Each agent execution appears as a span with:

Tool name and arguments
Execution duration
Success/failure status
Cost attribution

Key Enterprise Benefits

Requirement	AgentHelm Feature
Audit trail	ExecutionTracer + SQLite storage
Distributed tracing	OpenTelemetry + Jaeger
Atomic operations	Saga pattern with compensating tools
Session persistence	MemoryHub with Redis/SQLite
Context search	Semantic memory with Qdrant
Cost control	Built-in pricing for 20+ LLM providers
Human review	Plan YAML export/import workflow