Reliable Multi-Agent Orchestration with Durable Swarm 💪🐝

#tutorial #ai #python #opensource

🚀 Want to build reliable AI agents? We're excited to release Durable Swarm 💪🐝, a drop-in replacement for OpenAI's new Swarm framework.

GitHub repo: https://github.com/dbos-inc/durable-swarm
(Please give us a star if you like it!)

Overview

We enhance Swarm with durable execution to help you build reliable, scalable multi-agent systems. Durable Swarm makes your agentic workflows resilient to failures, so that if they are interrupted or restarted, they automatically resume from their last completed steps.

We believe that as multi-agent workflows become more common, longer-running, and more interactive, it's important to make them reliable. If an agent spends hours waiting for user inputs or processing complex workflows, it must be resilient to transient failures, such as a server restart. However, reliable multi-agent orchestration isn't easy: it typically requires complex rearchitecting, such as routing agent communication through SQS or Kafka.

Durable execution helps you write reliable agents while preserving the ease of use of a framework like Swarm. The idea is to automatically persist the execution state of your Swarm workflow in a Postgres database. That way, if your program is interrupted, it can automatically resume your agentic workflows from the last completed step.

Under the hood, we implemented Durable Swarm using DBOS and its lightweight durable execution. The entire implementation of Durable Swarm is fewer than 20 lines of code: it declares the main loop of Swarm to be a durable workflow and each chat completion or tool call to be a step in that workflow.
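
To make the workflow-and-step idea concrete, here is a minimal sketch of checkpoint-and-replay. It is a toy illustration, not how DBOS is implemented: SQLite stands in for Postgres, and the names ToyWorkflow, step_outputs, chat_completion, tool_call, and agent_loop are all hypothetical. Each step's output is saved under a workflow ID and step number, so a rerun with the same ID replays completed steps instead of executing them again.

```python
import json
import sqlite3

# Assumption: SQLite stands in for Postgres; this table layout is made up for the sketch.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE step_outputs ("
    "workflow_id TEXT, step_no INTEGER, output TEXT, "
    "PRIMARY KEY (workflow_id, step_no))"
)


class ToyWorkflow:
    """Runs steps in order, checkpointing each output under (workflow_id, step_no)."""

    def __init__(self, workflow_id):
        self.workflow_id = workflow_id
        self.step_no = 0

    def step(self, func, *args):
        row = db.execute(
            "SELECT output FROM step_outputs WHERE workflow_id = ? AND step_no = ?",
            (self.workflow_id, self.step_no),
        ).fetchone()
        if row is not None:
            result = json.loads(row[0])  # completed earlier: replay the saved output
        else:
            result = func(*args)  # first execution: run the step, then checkpoint it
            db.execute(
                "INSERT INTO step_outputs VALUES (?, ?, ?)",
                (self.workflow_id, self.step_no, json.dumps(result)),
            )
            db.commit()
        self.step_no += 1
        return result


calls = []  # records which steps actually executed


def chat_completion(prompt):
    calls.append("chat")
    return f"reply to {prompt}"


def tool_call(reply, fail):
    calls.append("tool")
    if fail:
        raise RuntimeError("simulated crash")
    return f"tool result for {reply}"


def agent_loop(workflow_id, fail=False):
    # Hypothetical two-step loop: one chat completion, then one tool call.
    wf = ToyWorkflow(workflow_id)
    reply = wf.step(chat_completion, "hello")
    return wf.step(tool_call, reply, fail)


try:
    agent_loop("wf-1", fail=True)  # crashes during the second step
except RuntimeError:
    pass

result = agent_loop("wf-1")  # same workflow ID: step one is replayed, not re-run
print(calls)   # ['chat', 'tool', 'tool']
print(result)  # tool result for reply to hello
```

The chat step runs only once even though the loop ran twice, which is the property that matters when a step is a paid API call or a tool with side effects.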

Making Swarm Durable

To add Durable Swarm to your project, simply create a durable_swarm.py file containing the following code:

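from swarm import Swarm
from dbos import DBOS, DBOSConfiguredInstance

# Create the DBOS instance before declaring any workflows or steps.
DBOS()

@DBOS.dbos_class()
class DurableSwarm(Swarm, DBOSConfiguredInstance):
    def __init__(self, client=None):
        Swarm.__init__(self, client)
        # Register this instance with DBOS under a fixed name.
        DBOSConfiguredInstance.__init__(self, "openai_client")

    # Each chat completion is a step in the workflow.
    @DBOS.step()
    def get_chat_completion(self, *args, **kwargs):
        return super().get_chat_completion(*args, **kwargs)

    # Each round of tool calls is also a step.
    @DBOS.step()
    def handle_tool_calls(self, *args, **kwargs):
        return super().handle_tool_calls(*args, **kwargs)

    # The main Swarm loop is the durable workflow.
    @DBOS.workflow()
    def run(self, *args, **kwargs):
        return super().run(*args, **kwargs)

# Start DBOS once the workflow and steps are declared.
DBOS.launch()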

Then use DurableSwarm instead of Swarm in your applications—it's a drop-in replacement.

Getting Started

To get started, install Swarm and DBOS and initialize DBOS. Swarm requires Python >=3.10.

pip install dbos git+https://github.com/openai/swarm.git
dbos init --config

You also need an OpenAI API key, which you can create from your OpenAI API account. Set it as an environment variable:

export OPENAI_API_KEY=<your-key>
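
If you'd rather fail fast than hit an authentication error partway through a run, a small check like the one below can go at the top of main.py. This is only a sketch: require_openai_key is a hypothetical helper, not part of Swarm or DBOS.

```python
import os


def require_openai_key(env=None):
    """Return the API key, or raise a clear error if it is missing or blank."""
    # Hypothetical helper; `env` is injectable so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running main.py"
        )
    return key
```

Calling require_openai_key() before creating the client turns a missing key into an immediate, readable error.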

To try Durable Swarm out, create durable_swarm.py as shown above, then create a main.py file in the same directory containing this simple program:

from swarm import Agent
from durable_swarm import DurableSwarm

client = DurableSwarm()

# Returning an Agent from a function hands the conversation off to that agent.
def transfer_to_agent_b():
    return agent_b


agent_a = Agent(
    name="Agent A",
    instructions="You are a helpful agent.",
    functions=[transfer_to_agent_b],
)

agent_b = Agent(
    name="Agent B",
    instructions="Only speak in Haikus.",
)

response = client.run(
    agent=agent_a,
    messages=[{"role": "user", "content": "I want to talk to agent B."}],
)

print(response.messages[-1]["content"])

DBOS requires Postgres. If you already have a Postgres server, modify dbos-config.yaml to configure its connection information.
Otherwise, we provide a script in the GitHub repo to start Postgres using Docker:

export PGPASSWORD=swarm
python3 start_postgres_docker.py
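
Before running your agents, it can help to confirm that something is actually listening on the Postgres port. The sketch below uses only the standard library. The function name is hypothetical, and the localhost:5432 defaults are an assumption, so use whatever host and port your dbos-config.yaml specifies. It checks TCP connectivity only, not credentials.

```python
import socket


def postgres_is_reachable(host="localhost", port=5432, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    # Assumption: default Postgres port on localhost; adjust to your config.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Postgres reachable:", postgres_is_reachable())
```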

Finally, run your agents:

> python3 main.py

Agent B is here,
Ready to help you today,
What do you need, friend?

Converting Existing Apps to DurableSwarm

You can convert any existing Swarm app to DurableSwarm in three simple steps:

1. Install dbos and initialize it with dbos init --config.
2. Add durable_swarm.py to your project.
3. Use DurableSwarm in place of Swarm in your application.

Note: DurableSwarm currently doesn't support streaming.
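
If your existing app passes stream=True somewhere, one way to surface this limitation early is to reject the flag with an explicit error. The sketch below shows the pattern with a stand-in class rather than the real Swarm. BaseRunner and NoStreamingRunner are hypothetical names, and the guard assumes stream is passed as a keyword argument.

```python
class BaseRunner:
    """Stand-in for Swarm in this sketch; only the `stream` keyword matters here."""

    def run(self, *args, stream=False, **kwargs):
        return "streamed" if stream else "complete response"


class NoStreamingRunner(BaseRunner):
    """Rejects stream=True up front instead of failing later in the run."""

    def run(self, *args, **kwargs):
        # Assumption: callers pass `stream` by keyword, not positionally.
        if kwargs.get("stream"):
            raise NotImplementedError(
                "streaming is not supported; call run() without stream=True"
            )
        return super().run(*args, **kwargs)


print(NoStreamingRunner().run("agent", "messages"))  # complete response
```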

Give it a try and let us know what you think!

Next Steps

Check out how DBOS can make your applications more scalable and resilient:

Use durable execution to write crashproof workflows.
Use queues to gracefully manage API rate limits.
Use scheduled workflows to run your functions at recurring intervals.
Want to learn what you can build with DBOS? Explore other example applications.

Top comments (5)

Tanishq Sakhare • Oct 17 '24

Wow, this is an impressive enhancement to Swarm!

The concept of durable execution adds a crucial layer of reliability, especially for long-running multi-agent systems.

I really like how you've integrated durability without overcomplicating the architecture—it seems like it could save a lot of time and effort in terms of rearchitecting solutions for fault tolerance.

Persisting the workflow state in Postgres is a smart move. Could you share more details on how you optimized for minimal overhead during persistence, or how DBOS complements Swarm's architecture in terms of performance and scalability?

I'm excited to see how this could streamline building more robust agentic workflows.

Qian Li • Oct 18 '24

Thank you @tanishq_s09 for your kind words! DBOS implements several optimizations such as batching and async writes to minimize the overhead of persistence. The code is open sourced here: github.com/dbos-inc/dbos-transact-py

Here is an example reliable agent I built with DBOS + Swarm: github.com/dbos-inc/durable-swarm/...

Would love to hear your feedback!