<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Valentina</title>
    <description>The latest articles on DEV Community by Valentina (@vhalasi).</description>
    <link>https://dev.to/vhalasi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3761871%2F6fbd46a2-62a6-4261-90f7-79ed56cadf05.png</url>
      <title>DEV Community: Valentina</title>
      <link>https://dev.to/vhalasi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vhalasi"/>
    <language>en</language>
    <item>
      <title>Runs vs. Threads: When to Use Which</title>
      <dc:creator>Valentina</dc:creator>
      <pubDate>Thu, 05 Mar 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/vhalasi/runs-vs-threads-when-to-use-which-3e58</link>
      <guid>https://dev.to/vhalasi/runs-vs-threads-when-to-use-which-3e58</guid>
      <description>&lt;p&gt;Crewship has two ways to execute a deployed crew: the &lt;strong&gt;Run API&lt;/strong&gt; and the &lt;strong&gt;Thread API&lt;/strong&gt;. If you've used the platform at all, you've already used runs. Threads are newer and less obvious, and the question we keep hearing is: when should I use which?&lt;/p&gt;

&lt;p&gt;Runs are for one-shot tasks. Threads are for conversations. That's the short version. The rest of this post is the long version.&lt;/p&gt;

&lt;h2&gt;Runs: the default&lt;/h2&gt;

&lt;p&gt;A run is a single execution of your crew. You send input, it does its thing, you get output. Each run gets its own container, shares nothing with other runs, and the environment gets torn down when it finishes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship invoke &lt;span class="nt"&gt;--input&lt;/span&gt; &lt;span class="s1"&gt;'{"topic": "AI agents in logistics"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crewship.dev/v1/runs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"deployment_id": "dep_abc123", "input": {"topic": "AI agents in logistics"}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get back a run ID. The run moves through &lt;code&gt;pending&lt;/code&gt;, &lt;code&gt;running&lt;/code&gt;, then lands on &lt;code&gt;succeeded&lt;/code&gt;, &lt;code&gt;failed&lt;/code&gt;, or &lt;code&gt;canceled&lt;/code&gt;. You can stream events in real time to watch your agents work, or just poll for the result.&lt;/p&gt;

&lt;p&gt;No setup, no cleanup, no state to manage.&lt;/p&gt;
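&lt;p&gt;The lifecycle is small enough to model directly. Here's a toy sketch of the legal transitions — the state names come from the Run API above, but the exact rules (for example, whether a pending run can be canceled) are my assumption:&lt;/p&gt;

```python
# Toy model of the run lifecycle: pending -> running -> terminal state.
# State names match the Run API; the transition table itself is an
# illustrative assumption, not Crewship's implementation.
TERMINAL = {"succeeded", "failed", "canceled"}

ALLOWED = {
    "pending": {"running", "canceled"},  # assume cancel-before-start is allowed
    "running": TERMINAL,
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a run may legally move from current to target."""
    return target in ALLOWED.get(current, set())
```

&lt;p&gt;Terminal states have no outgoing transitions, which is exactly why a finished run can't be "resumed" — that's what threads are for.&lt;/p&gt;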

&lt;h3&gt;When runs make sense&lt;/h3&gt;

&lt;p&gt;Runs are the right choice when your crew's job starts and ends in a single execution. "Write a blog post about X" — input in, content out, done. "Analyze this dataset and generate a report" — same deal. The same goes for batch operations where you process a list of items independently, background jobs triggered by a webhook or a cron schedule, and pipeline steps where your crew is one stage in a larger workflow. The work is self-contained every time.&lt;/p&gt;

&lt;p&gt;The common pattern: the crew doesn't need to ask clarifying questions, doesn't need to remember what happened last time. It takes input and produces output, and that's the whole interaction.&lt;/p&gt;
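&lt;p&gt;That isolation is also what makes batch workloads trivial to parallelize: runs share nothing, so you can fan them out freely. A sketch with the API call stubbed out — &lt;code&gt;start_run&lt;/code&gt; here is a placeholder, not a Crewship SDK function:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def start_run(topic: str) -> str:
    """Placeholder for a POST to /v1/runs with this topic as input.
    In real code this would be an HTTP call; stubbed for illustration."""
    return f"run started for {topic!r}"

topics = ["logistics", "healthcare", "retail"]

# Each run is isolated, so firing them concurrently is safe.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(start_run, topics))
```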

&lt;h3&gt;What runs don't do&lt;/h3&gt;

&lt;p&gt;Runs are stateless. When a run finishes, it's gone. If you kick off another run with the same deployment, it has zero context about the previous one. It doesn't know you ran it five minutes ago with slightly different input. It doesn't know the output last time was almost right but needed one small tweak.&lt;/p&gt;

&lt;p&gt;For a lot of workloads, that's exactly what you want. But for some, it's a problem.&lt;/p&gt;

&lt;h2&gt;Threads: when you need memory&lt;/h2&gt;

&lt;p&gt;A thread is a persistent conversation context scoped to a deployment. You create it once, then run your crew inside it as many times as you need. Each run receives the thread's current state, and when the run finishes, it can update that state. The next run picks up where the last one left off.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a thread&lt;/span&gt;
crewship thread create dep_abc123

&lt;span class="c"&gt;# Run inside it&lt;/span&gt;
crewship invoke dep_abc123 &lt;span class="nt"&gt;--thread&lt;/span&gt; thr_xyz789 &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'{"message": "Research AI agents in healthcare"}'&lt;/span&gt;

&lt;span class="c"&gt;# Follow up — the crew remembers the first message&lt;/span&gt;
crewship invoke dep_abc123 &lt;span class="nt"&gt;--thread&lt;/span&gt; thr_xyz789 &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'{"message": "Now focus on diagnostic applications"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Via the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the thread&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crewship.dev/v1/threads &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"deployment_id": "dep_abc123"}'&lt;/span&gt;

&lt;span class="c"&gt;# Run inside it&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crewship.dev/v1/threads/thr_xyz789/runs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"input": {"message": "Research AI agents in healthcare"}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The thread tracks state through a &lt;code&gt;values&lt;/code&gt; field — a JSON object that your crew can read and write. After each run, Crewship saves a checkpoint, so you have a full history of how the state changed over time.&lt;/p&gt;
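&lt;p&gt;To make the mechanics concrete, here's a toy model of that behavior — illustrative only, since the real checkpoint format isn't specified here:&lt;/p&gt;

```python
import copy

class ToyThread:
    """Minimal model of a thread's `values` field plus per-run checkpoints."""

    def __init__(self):
        self.values = {}       # current state the crew reads and writes
        self.checkpoints = []  # snapshot appended after every run

    def finish_run(self, updates: dict) -> None:
        """Apply a run's state updates, then checkpoint the result."""
        self.values.update(updates)
        self.checkpoints.append(copy.deepcopy(self.values))
```

&lt;p&gt;Two runs produce two checkpoints, so you can always see what the state looked like after any given turn.&lt;/p&gt;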

&lt;h3&gt;Thread lifecycle&lt;/h3&gt;

&lt;p&gt;Threads have their own status:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;idle&lt;/code&gt; — ready for a new run&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;busy&lt;/code&gt; — a run is executing; new run requests get a 409 until the current one finishes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;interrupted&lt;/code&gt; — the run was interrupted&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;error&lt;/code&gt; — the last run failed, but the thread still accepts new runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only one run can execute in a thread at a time. That's by design — it keeps state consistent. No risk of two concurrent runs stepping on each other's updates.&lt;/p&gt;
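&lt;p&gt;The serialization rule is easy to picture as a gate on the thread's status — a sketch of the behavior described above, not Crewship's actual implementation:&lt;/p&gt;

```python
class ThreadGate:
    """Model of the one-run-at-a-time rule: a busy thread rejects new runs."""

    def __init__(self):
        self.status = "idle"

    def start_run(self) -> None:
        if self.status == "busy":
            # The API surfaces this rejection as an HTTP 409 Conflict.
            raise RuntimeError("409: a run is already executing in this thread")
        self.status = "busy"

    def finish_run(self) -> None:
        self.status = "idle"
```

&lt;p&gt;Once the current run finishes, the thread goes back to &lt;code&gt;idle&lt;/code&gt; and accepts the next one.&lt;/p&gt;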

&lt;h3&gt;When threads make sense&lt;/h3&gt;

&lt;p&gt;Threads are useful anywhere that context carries over between interactions. The most obvious case is a conversational agent — a chatbot or support assistant where the user sends a message, the crew responds, the user follows up, and so on. The thread holds the full conversation history so each response accounts for everything that came before.&lt;/p&gt;

&lt;p&gt;But it goes beyond chat. Iterative refinement is another good fit: "Generate a marketing plan." Then: "Make the budget section more detailed." Then: "Add a timeline." Each run builds on the previous output instead of starting from scratch. You could try to stuff the entire prior result into the next run's input, but that gets unwieldy fast.&lt;/p&gt;

&lt;p&gt;Multi-step workflows with human approval also work well with threads. The crew does research, presents findings, and waits. The user reviews, gives direction, and kicks off the next run. The thread holds the intermediate state between steps without you having to manage it yourself.&lt;/p&gt;

&lt;h3&gt;Checkpoints&lt;/h3&gt;

&lt;p&gt;Every time a run inside a thread finishes, Crewship saves a checkpoint — a snapshot of the thread's state at that moment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.crewship.dev/v1/threads/thr_xyz789/history &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you an audit trail. It's also useful for debugging: if the crew's response went sideways on turn 5, you can look at the checkpoint from turn 4 to see what state it was working with.&lt;/p&gt;

&lt;h3&gt;Thread metadata&lt;/h3&gt;

&lt;p&gt;Threads support a &lt;code&gt;metadata&lt;/code&gt; field that's separate from the conversation state. Use it for things like user IDs, channels, tags — anything you want to filter or search by later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crewship.dev/v1/threads &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "deployment_id": "dep_abc123",
    "metadata": {"user_id": "user_42", "channel": "web", "priority": "high"}
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters once you have hundreds of threads across different users and use cases.&lt;/p&gt;
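&lt;p&gt;Client-side, filtering a pile of thread records by metadata is a few lines — an illustrative helper over the JSON shape shown above, not part of any Crewship SDK:&lt;/p&gt;

```python
def filter_threads(threads: list, **criteria) -> list:
    """Keep threads whose metadata matches every given key/value pair."""
    return [
        t for t in threads
        if all(t.get("metadata", {}).get(k) == v for k, v in criteria.items())
    ]
```

&lt;p&gt;So &lt;code&gt;filter_threads(threads, user_id="user_42")&lt;/code&gt; pulls out one user's conversations, and adding &lt;code&gt;channel="web"&lt;/code&gt; narrows further.&lt;/p&gt;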

&lt;h2&gt;Side by side&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Runs&lt;/th&gt;
&lt;th&gt;Threads&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;None — each run is isolated&lt;/td&gt;
&lt;td&gt;Persistent across runs via &lt;code&gt;values&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lifecycle&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pending&lt;/code&gt; → &lt;code&gt;running&lt;/code&gt; → terminal state&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;idle&lt;/code&gt; → &lt;code&gt;busy&lt;/code&gt; → &lt;code&gt;idle&lt;/code&gt; (repeats)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;Unlimited parallel runs&lt;/td&gt;
&lt;td&gt;One run at a time per thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;History&lt;/td&gt;
&lt;td&gt;Individual run records&lt;/td&gt;
&lt;td&gt;Checkpoints after each run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;None — just create a run&lt;/td&gt;
&lt;td&gt;Create thread first, then run inside it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cleanup&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;Manual — delete thread when done&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost profile&lt;/td&gt;
&lt;td&gt;Predictable per run&lt;/td&gt;
&lt;td&gt;Grows with conversation length&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;How other platforms handle this&lt;/h2&gt;

&lt;p&gt;If you've used other agent platforms, you'll recognize the split. OpenAI's Assistants API had Threads and Runs — literally the same names. They've since replaced it with the Responses API and Conversations, but it's the same idea: a stateless execution primitive and an optional persistence layer.&lt;/p&gt;

&lt;p&gt;LangGraph does the same thing. Call &lt;code&gt;graph.invoke()&lt;/code&gt; for a one-shot execution. Pass a &lt;code&gt;thread_id&lt;/code&gt; and you get persistence, checkpoints, and the ability to resume from any point. CrewAI has &lt;code&gt;kickoff()&lt;/code&gt; for one-shot execution and a separate conversational mode for multi-turn interactions.&lt;/p&gt;

&lt;p&gt;The pattern across all of these: runs and threads are independent concerns. Runs are always the execution primitive. Threads optionally chain runs together with shared state. You don't need threads until you do.&lt;/p&gt;

&lt;h2&gt;So which one?&lt;/h2&gt;

&lt;p&gt;One question gets you most of the way there: &lt;strong&gt;does the crew need to remember anything from previous executions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If no, use a run. Adding a thread just adds complexity you don't need.&lt;/p&gt;

&lt;p&gt;If yes, use a thread. Trying to fake statefulness by jamming prior context into run inputs gets ugly fast, and you lose checkpoints, history, and the concurrency guarantees that threads give you.&lt;/p&gt;

&lt;p&gt;A few more signals that point toward threads: your users will interact with the crew multiple times per session, the crew's output depends on conversation history rather than just the current input, or you need an audit trail of how state evolved over time.&lt;/p&gt;

&lt;p&gt;And toward runs: the crew can do its job with a single set of inputs, you want to fire off many executions in parallel, or the workload is triggered by an automated system rather than a person sending messages.&lt;/p&gt;

&lt;h2&gt;Getting started with threads&lt;/h2&gt;

&lt;p&gt;If you've been using runs and want to try threads, there's nothing to change about your deployment. Threads work with any deployed crew — CrewAI, LangGraph Python, or LangGraph JS.&lt;/p&gt;

&lt;p&gt;Create a thread, run inside it, and your crew receives the thread state. Update the state from within your crew, and the next run picks up where you left off.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a thread&lt;/span&gt;
crewship thread create dep_abc123 &lt;span class="nt"&gt;--metadata&lt;/span&gt; &lt;span class="s1"&gt;'{"user_id": "demo"}'&lt;/span&gt;

&lt;span class="c"&gt;# First turn&lt;/span&gt;
crewship invoke dep_abc123 &lt;span class="nt"&gt;--thread&lt;/span&gt; thr_xyz789 &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'{"message": "What can you help me with?"}'&lt;/span&gt;

&lt;span class="c"&gt;# Second turn — crew receives context from the first&lt;/span&gt;
crewship invoke dep_abc123 &lt;span class="nt"&gt;--thread&lt;/span&gt; thr_xyz789 &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'{"message": "Tell me more about option 2"}'&lt;/span&gt;

&lt;span class="c"&gt;# Check the history&lt;/span&gt;
crewship thread &lt;span class="nb"&gt;history &lt;/span&gt;thr_xyz789
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full API reference is in the &lt;a href="https://docs.crewship.dev/guides/threads" rel="noopener noreferrer"&gt;threads documentation&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions about runs, threads, or anything else? Check the &lt;a href="https://docs.crewship.dev" rel="noopener noreferrer"&gt;docs&lt;/a&gt; or reach out at &lt;a href="mailto:mail@crewship.dev"&gt;mail@crewship.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>guide</category>
      <category>api</category>
      <category>runs</category>
      <category>threads</category>
    </item>
    <item>
      <title>Multiple Deployments, One Config File</title>
      <dc:creator>Valentina</dc:creator>
      <pubDate>Thu, 05 Mar 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/vhalasi/multiple-deployments-one-config-file-2757</link>
      <guid>https://dev.to/vhalasi/multiple-deployments-one-config-file-2757</guid>
      <description>&lt;p&gt;If you're building with AI agents, you probably don't have just one. Say you're building a lead aggregation pipeline. You've got one agent that scrapes company websites, another that pulls leads from LinkedIn, and a third that mines Reddit and community forums. They all share the same data models and scoring logic, they all run on a schedule, and they all live in the same repo. But each one deploys independently, so each one needs its own &lt;code&gt;crewship.toml&lt;/code&gt; and its own deploy commands, which adds up fast.&lt;/p&gt;

&lt;p&gt;It works, but it's clunky. You end up duplicating build settings, keeping exclude lists in sync, and jumping between directories every time you deploy.&lt;/p&gt;

&lt;p&gt;We kept hearing this from teams building multi-agent systems, and honestly ran into it ourselves. So we fixed it.&lt;/p&gt;

&lt;h2&gt;One file, multiple deployments&lt;/h2&gt;

&lt;p&gt;You can now define multiple deployments in a single &lt;code&gt;crewship.toml&lt;/code&gt;. Instead of one &lt;code&gt;[deployment]&lt;/code&gt; section, use named &lt;code&gt;[deployments.&amp;lt;name&amp;gt;]&lt;/code&gt; sections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[build]&lt;/span&gt;
&lt;span class="py"&gt;exclude&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"tests"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"notebooks"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[deployments.web-scraper]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.web_scraper.crew:WebScraperCrew"&lt;/span&gt;
&lt;span class="py"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"browser"&lt;/span&gt;
&lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"3.11"&lt;/span&gt;

&lt;span class="nn"&gt;[deployments.linkedin-miner]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.linkedin.crew:LinkedInCrew"&lt;/span&gt;

&lt;span class="nn"&gt;[deployments.reddit-miner]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.reddit.crew:RedditCrew"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each named section becomes its own deployment on Crewship, with the name as the project name. The &lt;code&gt;[build]&lt;/code&gt; config is shared across all of them, so you only declare your exclude list once.&lt;/p&gt;

&lt;p&gt;That's it. No wrapper scripts, no monorepo tooling, no separate directories. Three lead miners, one file.&lt;/p&gt;

&lt;h2&gt;Deploying and targeting&lt;/h2&gt;

&lt;p&gt;Every CLI command now takes a &lt;code&gt;--name&lt;/code&gt; (or &lt;code&gt;-n&lt;/code&gt;) flag to target a specific deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship deploy &lt;span class="nt"&gt;--name&lt;/span&gt; web-scraper
crewship deploy &lt;span class="nt"&gt;--name&lt;/span&gt; linkedin-miner
crewship deploy &lt;span class="nt"&gt;--name&lt;/span&gt; reddit-miner
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same for env vars, invocations, and schedules. For a lead pipeline where every source runs on its own schedule, that looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship &lt;span class="nb"&gt;env set&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; linkedin-miner &lt;span class="nv"&gt;LINKEDIN_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
crewship &lt;span class="nb"&gt;env set&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; reddit-miner &lt;span class="nv"&gt;REDDIT_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;... &lt;span class="nv"&gt;REDDIT_CLIENT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...

crewship schedule create &lt;span class="nt"&gt;--name&lt;/span&gt; web-scraper &lt;span class="s2"&gt;"Scrape targets"&lt;/span&gt; &lt;span class="nt"&gt;--cron&lt;/span&gt; &lt;span class="s2"&gt;"0 */6 * * *"&lt;/span&gt;
crewship schedule create &lt;span class="nt"&gt;--name&lt;/span&gt; linkedin-miner &lt;span class="s2"&gt;"LinkedIn sync"&lt;/span&gt; &lt;span class="nt"&gt;--cron&lt;/span&gt; &lt;span class="s2"&gt;"0 8 * * 1-5"&lt;/span&gt;
crewship schedule create &lt;span class="nt"&gt;--name&lt;/span&gt; reddit-miner &lt;span class="s2"&gt;"Reddit sweep"&lt;/span&gt; &lt;span class="nt"&gt;--cron&lt;/span&gt; &lt;span class="s2"&gt;"0 9 * * *"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you skip &lt;code&gt;--name&lt;/code&gt; and there's only one deployment in the file, it gets picked automatically. If there are multiple, the CLI prompts you to choose. In CI where there's no TTY, it'll error and tell you to pass &lt;code&gt;--name&lt;/code&gt; explicitly, so you don't accidentally deploy the wrong thing.&lt;/p&gt;
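&lt;p&gt;Those resolution rules are worth internalizing before wiring up CI. Roughly, as a sketch of the behavior just described (the real CLI's internals will differ, and &lt;code&gt;prompt_choice&lt;/code&gt; is a stand-in for the interactive picker):&lt;/p&gt;

```python
def prompt_choice(names: list) -> str:
    """Stand-in for the interactive picker; the real CLI prompts on the TTY."""
    return names[0]

def resolve_deployment(names: list, flag: str = None, interactive: bool = True) -> str:
    """Pick which deployment a command targets, per the rules above."""
    if flag is not None:
        if flag not in names:
            raise ValueError(f"unknown deployment {flag!r}")
        return flag
    if len(names) == 1:
        return names[0]  # only one candidate: pick it automatically
    if not interactive:
        # CI has no TTY, so fail loudly instead of guessing.
        raise SystemExit("multiple deployments found; pass --name explicitly")
    return prompt_choice(names)
```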

&lt;h2&gt;Deployment IDs are tracked per deployment&lt;/h2&gt;

&lt;p&gt;After the first deploy, Crewship saves the &lt;code&gt;deployment_id&lt;/code&gt; back into the config for each deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[deployments.web-scraper]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.web_scraper.crew:WebScraperCrew"&lt;/span&gt;
&lt;span class="py"&gt;deployment_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"dep_abc123"&lt;/span&gt;   &lt;span class="c"&gt;# auto-populated after first deploy&lt;/span&gt;

&lt;span class="nn"&gt;[deployments.linkedin-miner]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.linkedin.crew:LinkedInCrew"&lt;/span&gt;
&lt;span class="py"&gt;deployment_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"dep_def456"&lt;/span&gt;   &lt;span class="c"&gt;# auto-populated after first deploy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means subsequent deploys know exactly which deployment to update without you having to track IDs manually. Commit the file to version control and your whole team stays in sync.&lt;/p&gt;

&lt;h2&gt;Nothing breaks&lt;/h2&gt;

&lt;p&gt;If you've got an existing &lt;code&gt;crewship.toml&lt;/code&gt; with a single &lt;code&gt;[deployment]&lt;/code&gt; section, nothing changes. That format works exactly as before. The new multi-deployment format is opt-in, and &lt;code&gt;crewship init&lt;/code&gt; still generates the single-deployment config by default.&lt;/p&gt;

&lt;p&gt;The two formats are mutually exclusive. If you accidentally mix &lt;code&gt;[deployment]&lt;/code&gt; and &lt;code&gt;[deployments.*]&lt;/code&gt; in the same file, the CLI catches it and tells you what to do.&lt;/p&gt;

&lt;h2&gt;When this matters&lt;/h2&gt;

&lt;p&gt;The lead aggregator setup is a good example, but it applies anywhere you have agents that share code but deploy separately. A few patterns that fall out of this naturally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monorepo without the mess&lt;/strong&gt; — your lead miners share scoring logic, data models, and utility code. With multi-deployment, they stay in one repo and one config file instead of being split across separate projects that drift out of sync.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent schedules&lt;/strong&gt; — each source runs on its own cadence. The web scraper every 6 hours, LinkedIn on weekday mornings, Reddit once a day. Set them up with &lt;code&gt;crewship schedule create --name&lt;/code&gt; and they run independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradual rollout&lt;/strong&gt; — deploy one miner at a time, verify it works, then deploy the next. Each deployment has its own version history and rollback.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Getting started&lt;/h2&gt;

&lt;p&gt;If you're starting from scratch, &lt;code&gt;crewship init&lt;/code&gt; sets up a single-deployment config. When you're ready to add more agents, edit the file to use the named format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Before&lt;/span&gt;
&lt;span class="nn"&gt;[deployment]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.web_scraper.crew:WebScraperCrew"&lt;/span&gt;

&lt;span class="c"&gt;# After&lt;/span&gt;
&lt;span class="nn"&gt;[deployments.web-scraper]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.web_scraper.crew:WebScraperCrew"&lt;/span&gt;

&lt;span class="nn"&gt;[deployments.linkedin-miner]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"leads.linkedin.crew:LinkedInCrew"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy them, set their env vars, invoke them. Everything else works the same.&lt;/p&gt;




&lt;p&gt;Full details are in the &lt;a href="https://docs.crewship.dev/configuration/crewship-toml" rel="noopener noreferrer"&gt;configuration docs&lt;/a&gt;. If you run into anything or have feedback, reach out; we'd like to hear how you're using it.&lt;/p&gt;

</description>
      <category>feature</category>
      <category>configuration</category>
      <category>multiagent</category>
      <category>monorepo</category>
    </item>
    <item>
      <title>How to Deploy LangGraph to Production</title>
      <dc:creator>Valentina</dc:creator>
      <pubDate>Mon, 09 Feb 2026 15:00:00 +0000</pubDate>
      <link>https://dev.to/vhalasi/how-to-deploy-langgraph-to-production-11cg</link>
      <guid>https://dev.to/vhalasi/how-to-deploy-langgraph-to-production-11cg</guid>
      <description>&lt;p&gt;You built a LangGraph agent. It runs locally. You've got nodes, edges, conditional routing, state that flows through the graph. Maybe it's a research assistant that searches the web and writes reports. Maybe it's a multi-step tool-calling agent with loops. Whatever it does, it works on your machine.&lt;/p&gt;

&lt;p&gt;Now you want to put it somewhere other people can use it. Or somewhere your backend can call it. And this is where LangGraph gets interesting, because it's not a simple stateless function you can throw behind a Lambda.&lt;/p&gt;

&lt;p&gt;LangGraph adds real complexity that plain LangChain chains don't have: state management, cycles, conditional edges, long-running executions with multiple tool calls. That complexity matters when you try to run it in production.&lt;/p&gt;

&lt;h2&gt;The state problem&lt;/h2&gt;

&lt;p&gt;LangGraph graphs are stateful by design. Your &lt;code&gt;StateGraph&lt;/code&gt; defines a typed state object, and every node reads from and writes to that state as execution flows through the graph. Nodes accumulate results, branch based on previous outputs, loop back when conditions aren't met.&lt;/p&gt;

&lt;p&gt;Here's a basic example of what that looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;
    &lt;span class="n"&gt;next_step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyze_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Locally, this all lives in memory. State gets created, passed around, updated. When the graph finishes, you read the final state and move on. Simple.&lt;/p&gt;

&lt;p&gt;In production, that in-memory model breaks. You need state to survive process restarts. You need it isolated between concurrent runs so one user's execution doesn't bleed into another's. If you're running workers across multiple machines, state can't just sit in a local variable. And you need to be able to inspect the state of a running or failed graph to understand what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  You still need an API
&lt;/h2&gt;

&lt;p&gt;Same problem as any agent framework: &lt;code&gt;graph.invoke()&lt;/code&gt; works fine in a script, but nobody can call a Python script running on your laptop. You need an HTTP API in front of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BackgroundTasks&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid4&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BackgroundTasks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;running&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execute_graph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
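&lt;p&gt;The handler above leans on two names it doesn't define: &lt;code&gt;store&lt;/code&gt; and &lt;code&gt;execute_graph&lt;/code&gt;. A minimal sketch of both, with the actual graph call stubbed out; in production the store would be Redis or a database rather than a process-local dict:&lt;/p&gt;

```python
store: dict = {}  # run_id -> status record; process-local for the sketch

def execute_graph(run_id: str, inputs: dict) -> None:
    try:
        # final_state = compiled_graph.invoke(inputs)  # your real graph here
        final_state = {"messages": inputs.get("messages", []), "next_step": "done"}
        store[run_id] = {"status": "completed", "output": final_state}
    except Exception as exc:
        # Record failures too, so clients polling the run ID see why it died
        store[run_id] = {"status": "failed", "error": str(exc)}
```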



&lt;p&gt;The catch with LangGraph is that runs tend to be long. A graph with conditional loops, multiple tool calls, and multi-step reasoning can easily run for several minutes. That's well past the default timeout on most reverse proxies and serverless platforms. So you can't just wait for the result inside the request handler. You need async execution, a way to store results, and a way for clients to check back or get notified when it's done.&lt;/p&gt;

&lt;p&gt;You also want streaming. One of LangGraph's strengths is that you can stream events as nodes execute: which node just ran, what it produced, what the state looks like at each step. Losing that in production means losing one of the main benefits of using LangGraph in the first place. So now you need SSE or WebSocket support in your API too.&lt;/p&gt;
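&lt;p&gt;The SSE half is less work than it sounds. Each item yielded by &lt;code&gt;graph.stream(..., stream_mode="updates")&lt;/code&gt; is one node's state update, and SSE only needs each one wrapped in a &lt;code&gt;data:&lt;/code&gt; line. A framework-agnostic sketch of that wrapping (the resulting generator would be handed to your framework's streaming response):&lt;/p&gt;

```python
import json
from typing import Iterable, Iterator

def sse_format(events: Iterable[dict]) -> Iterator[str]:
    # One SSE message per graph event; default=str handles non-JSON types
    # like message objects that often appear in LangGraph state.
    for event in events:
        yield f"data: {json.dumps(event, default=str)}\n\n"
```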

&lt;h2&gt;
  
  
  Containerization
&lt;/h2&gt;

&lt;p&gt;LangGraph pulls in &lt;code&gt;langgraph&lt;/code&gt;, &lt;code&gt;langchain-core&lt;/code&gt;, and depending on what tools and models you're using, potentially &lt;code&gt;langchain-community&lt;/code&gt;, &lt;code&gt;langchain-openai&lt;/code&gt;, &lt;code&gt;langchain-anthropic&lt;/code&gt;, and a handful of other packages. If your agents use custom tools, add those dependencies to the pile.&lt;/p&gt;

&lt;p&gt;Dependency resolution in the LangChain ecosystem can be painful. Version conflicts between &lt;code&gt;langchain-core&lt;/code&gt; and community packages are common. Pinning versions in a &lt;code&gt;requirements.txt&lt;/code&gt; or &lt;code&gt;pyproject.toml&lt;/code&gt; helps, but you'll still spend time debugging import errors that only show up in the container and not on your machine.&lt;/p&gt;

&lt;p&gt;Docker solves the environment consistency problem, but now you're maintaining Dockerfiles, dealing with image builds, and pushing to a registry. If your graph uses any tools that need system-level dependencies (browsers, ffmpeg, etc.), the Dockerfile gets more complex.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tracing graph execution
&lt;/h2&gt;

&lt;p&gt;This is where LangGraph really differs from other frameworks. When a CrewAI crew fails, you have a sequence of agent actions to trace through. When a LangGraph graph fails, you have a directed graph where node A called node B which conditionally routed back to node A, which then called node C.&lt;/p&gt;

&lt;p&gt;Without proper tracing, debugging a failed graph run is rough. Which node threw the error? What was the state when it happened? Did a conditional edge route to the wrong node? Was the state corrupted by a previous node? Did a cycle run more times than expected?&lt;/p&gt;

&lt;p&gt;You could pipe everything to stdout and read logs, but that gets unreadable fast with complex graphs. LangSmith exists for this, but it's a separate hosted service with its own pricing and setup. And even with LangSmith, you still need to wire up the integration and make sure traces are actually being captured in your production environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The same infrastructure problems
&lt;/h2&gt;

&lt;p&gt;Everything else from the &lt;a href="https://dev.to/blog/deploy-crewai-to-production"&gt;CrewAI deployment guide&lt;/a&gt; applies here too. Scaling workers, job queues, versioning deployments, authentication, rate limiting, secret management. These are the same problems regardless of whether you're running CrewAI or LangGraph.&lt;/p&gt;

&lt;p&gt;I won't rehash all of that here. The short version: you end up building a container orchestration system, a job queue, a versioning pipeline, an auth layer, and an observability stack. None of it is your actual product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy with Crewship
&lt;/h2&gt;

&lt;p&gt;Crewship now supports LangGraph natively. If your project has a &lt;code&gt;langgraph.json&lt;/code&gt; file, Crewship auto-detects it and handles everything from there.&lt;/p&gt;

&lt;p&gt;Here's the full deployment flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install the CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://www.crewship.dev/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Log in
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set up your project
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crewship detects LangGraph from your &lt;code&gt;langgraph.json&lt;/code&gt; and generates a &lt;code&gt;crewship.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[deployment]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"langgraph"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src.my_graph.graph:graph"&lt;/span&gt;
&lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"3.11"&lt;/span&gt;
&lt;span class="py"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"slim"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;entrypoint&lt;/code&gt; points to your compiled &lt;code&gt;StateGraph&lt;/code&gt; object. Crewship uses this to invoke your graph without you needing to write any API code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your code gets packaged, built into a container, and deployed. You get a deployment URL and a link to the Crewship console where you can manage it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add your secrets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship &lt;span class="nb"&gt;env set &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-... &lt;span class="nv"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or import from your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship &lt;span class="nb"&gt;env &lt;/span&gt;import &lt;span class="nt"&gt;-f&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Secrets are encrypted and injected at runtime. Nothing gets baked into the container image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run it
&lt;/h3&gt;

&lt;p&gt;From the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship invoke &lt;span class="nt"&gt;--input&lt;/span&gt; &lt;span class="s1"&gt;'{"messages": [{"role": "user", "content": "Research the latest AI papers"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI streams events as nodes execute, so you can watch your graph work through its steps in real time.&lt;/p&gt;

&lt;p&gt;From the REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crewship.dev/v1/runs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"deployment": "my-graph", "input": {"messages": [{"role": "user", "content": "Research the latest AI papers"}]}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execution traces in Crewship map to your graph nodes. You can see which node ran, what state it received, what it produced, how long each step took. When a node fails or a conditional edge routes somewhere unexpected, you'll see where and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangGraph.js support
&lt;/h2&gt;

&lt;p&gt;If you're building with LangGraph.js (the TypeScript/JavaScript version), Crewship supports that too. The deployment experience is the same: same CLI workflow, same API, same execution traces. Check out the &lt;a href="https://dev.to/langgraphjs"&gt;LangGraph.js page&lt;/a&gt; for details.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get out of the box
&lt;/h2&gt;

&lt;p&gt;All of the infrastructure above that you'd otherwise build yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated execution&lt;/strong&gt; — every run gets its own environment, no interference between runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-scaling&lt;/strong&gt; — scales up when there's work, scales to zero when there isn't&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment versioning&lt;/strong&gt; — each &lt;code&gt;crewship deploy&lt;/code&gt; creates a new version, roll back to any previous one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph-aware execution traces&lt;/strong&gt; — see which nodes ran, state at each step, timing, token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhooks&lt;/strong&gt; — trigger runs from CI/CD, cron jobs, or Zapier; get notified on completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token auth&lt;/strong&gt; — API key authentication, generate and rotate keys from the console&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time SSE streaming&lt;/strong&gt; — watch graph execution live, or poll for the result&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;You can deploy your first LangGraph graph on Crewship in a few commands. No credit card required for the free tier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://console.crewship.dev" rel="noopener noreferrer"&gt;Get started&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.crewship.dev" rel="noopener noreferrer"&gt;Read the docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/langgraph"&gt;LangGraph framework page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're already using Crewship for CrewAI, the same account and CLI work for LangGraph. Just &lt;code&gt;crewship init&lt;/code&gt; in your LangGraph project and deploy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions about deploying LangGraph? Check the &lt;a href="https://docs.crewship.dev" rel="noopener noreferrer"&gt;docs&lt;/a&gt; or reach out at &lt;a href="mailto:mail@crewship.dev"&gt;mail@crewship.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>ai</category>
      <category>agents</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Deploy CrewAI to Production</title>
      <dc:creator>Valentina</dc:creator>
      <pubDate>Mon, 09 Feb 2026 12:10:16 +0000</pubDate>
      <link>https://dev.to/vhalasi/how-to-deploy-crewai-to-production-445f</link>
      <guid>https://dev.to/vhalasi/how-to-deploy-crewai-to-production-445f</guid>
      <description>&lt;p&gt;So you've built a CrewAI crew. Maybe it researches topics and writes reports, or processes customer data and spits out insights. It works on your machine, the output looks good, and now you want other systems—or other people—to be able to use it.&lt;/p&gt;

&lt;p&gt;That's where things get interesting. There's a surprising amount of stuff between "it works on my laptop" and "it runs in production," and most of it has nothing to do with AI. This guide walks through the whole journey, step by step, in roughly the order you'd run into each problem yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: You Need an API
&lt;/h2&gt;

&lt;p&gt;Right now your crew runs as a Python script. You call &lt;code&gt;crew.kickoff()&lt;/code&gt;, wait, and get a result. That's fine for development, but no other service can call a Python script sitting on your machine.&lt;/p&gt;

&lt;p&gt;First order of business: stick an HTTP API in front of it. FastAPI is the go-to choice here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;your_project.crew&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;YourCrew&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;YourCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks easy enough. But there's a catch you'll hit almost immediately—crew runs are slow. We're talking anywhere from one to ten minutes depending on how many agents you have, what they're doing, and how many LLM calls they need to make. Meanwhile, most HTTP clients and reverse proxies have timeouts way shorter than that. Nginx defaults to 60 seconds. Serverless platforms are often worse.&lt;/p&gt;

&lt;p&gt;Your crew that happily runs for 8 minutes on your laptop? In production, the request just dies with a timeout error.&lt;/p&gt;

&lt;p&gt;The fix is to not wait for the crew inside the request handler at all. Kick off the run in the background, hand back a run ID, and let the client check back later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BackgroundTasks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;running&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execute_crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/run/{run_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great, now you have an API. Next you need somewhere to actually run it.&lt;/p&gt;
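&lt;p&gt;For completeness, the client side of this pattern is just a loop against the status endpoint. A sketch, with &lt;code&gt;fetch&lt;/code&gt; standing in for a real HTTP GET to &lt;code&gt;/run/{run_id}&lt;/code&gt; so the shape stays framework-agnostic:&lt;/p&gt;

```python
import time
from typing import Callable

def poll_run(fetch: Callable[[str], dict], run_id: str, interval: float = 5.0) -> dict:
    # Keep asking until the run leaves the "running" state, then hand back
    # whatever the API returned (completed result or failure record).
    while True:
        data = fetch(run_id)
        if data.get("status") != "running":
            return data
        time.sleep(interval)
```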

&lt;h2&gt;
  
  
  Step 2: Containerize It
&lt;/h2&gt;

&lt;p&gt;Your API needs a consistent environment every time it starts up. CrewAI pulls in &lt;code&gt;langchain&lt;/code&gt;, &lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;pydantic&lt;/code&gt;, and potentially dozens of other packages depending on what tools your agents use. All of those need to be installed, at the right versions, reliably.&lt;/p&gt;

&lt;p&gt;Docker is the standard answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.11-slim&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any of your agents use browser-based tools—web scraping with Playwright, for example—your Dockerfile gets a lot uglier. You'll need Chromium and all its system-level dependencies, which bloats the image size and makes builds slower.&lt;/p&gt;

&lt;p&gt;Then there's the question of where to host the container. Railway, Fly.io, AWS ECS, Google Cloud Run—there's no shortage of options, and each one comes with its own config format, networking quirks, and pricing model. Pick one, get it running, and move on. Because there's more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: One Machine Isn't Enough
&lt;/h2&gt;

&lt;p&gt;Here's something you'll figure out pretty fast: crew runs eat resources. Each one loads a full agent context, fires off dozens of LLM calls, and might run tools that chew through a lot of memory. If you try to handle several concurrent runs on a single server, you're going to run into memory pressure, CPU contention, and eventually OOM kills.&lt;/p&gt;

&lt;p&gt;The obvious reaction is to just get a bigger server. That helps for a while, but you're also paying for all that capacity even when nothing is running.&lt;/p&gt;

&lt;p&gt;What you really want is to spin up an isolated environment for each run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs can't step on each other&lt;/li&gt;
&lt;li&gt;When there's no work, you scale to zero and stop paying&lt;/li&gt;
&lt;li&gt;When a burst of requests comes in, you spin up more instances to match&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where it stops being "deploy an app" and starts being "build a container orchestration system." Kubernetes is the usual answer, but running Kubernetes well—even managed Kubernetes on EKS or GKE—is basically its own job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Add a Job Queue
&lt;/h2&gt;

&lt;p&gt;Now that runs happen in their own containers, something needs to coordinate the work. You can't just spin up a container inline when a request comes in—you need a proper queue.&lt;/p&gt;

&lt;p&gt;The flow ends up looking like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your API gets a request and drops a job onto the queue&lt;/li&gt;
&lt;li&gt;A worker picks up the job&lt;/li&gt;
&lt;li&gt;The worker spins up a fresh environment, runs the crew, stores the result&lt;/li&gt;
&lt;li&gt;The client polls your API for the result (or you send a webhook)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For the queue itself, you'll need a message broker—Redis, RabbitMQ, SQS, something like that. Plus a task framework like Celery to actually run the jobs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@celery_app.task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_crew_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;crew_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crew_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now you've got a whole new set of things to worry about: dead letter queues, retry policies, concurrency limits, monitoring. What happens when a worker crashes in the middle of a run? What if the queue starts backing up—how do you prioritize? All solvable, but none of it solves itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Versioning Gets Tricky
&lt;/h2&gt;

&lt;p&gt;This one sneaks up on you. Picture this: you have 10 jobs sitting in the queue, waiting to be processed. You deploy a new version of your crew—maybe you tweaked an agent's prompt or swapped out a tool. What happens to those 10 jobs?&lt;/p&gt;

&lt;p&gt;If your workers always pull the latest code, those queued jobs run on the new version. Sometimes that's fine. But if you changed the input format or removed a tool the old config depended on, those jobs are going to break.&lt;/p&gt;

&lt;p&gt;For production you need version awareness. Every job should be pinned to the version of the crew it was submitted against, and your workers need to be able to run older versions. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tagging your container images properly (not just pushing to &lt;code&gt;latest&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Recording the version alongside every job in the queue&lt;/li&gt;
&lt;li&gt;Keeping older versions around so in-flight jobs can finish&lt;/li&gt;
&lt;li&gt;Having a way to roll back when a new version causes problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point your "deployment" has turned into a proper system—container registry, version metadata, rollback procedures, maybe separate staging and production environments each with their own version history.&lt;/p&gt;
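&lt;p&gt;The core of the fix is small even if the plumbing isn't: record the version at submit time, and have workers resolve their image from the job rather than a floating tag. A sketch (names and the registry URL are illustrative):&lt;/p&gt;

```python
import uuid

def enqueue_run(queue: list, inputs: dict, deployed_version: str) -> str:
    job = {
        "id": str(uuid.uuid4()),
        "inputs": inputs,
        # Pinned when the job is submitted, so a later deploy doesn't
        # retroactively change what this job runs against.
        "version": deployed_version,
    }
    queue.append(job)
    return job["id"]

def image_for(job: dict, repo: str = "registry.example.com/crew") -> str:
    # Workers resolve the container image from the job's pin, not `latest`.
    return f"{repo}:{job['version']}"
```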

&lt;h2&gt;
  
  
  Step 6: Authentication and Security
&lt;/h2&gt;

&lt;p&gt;Your API is on the internet now. If someone finds the URL, they can start kicking off crew runs—and those cost real money because every run makes LLM API calls that show up on your bill.&lt;/p&gt;

&lt;p&gt;At bare minimum you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API keys&lt;/strong&gt; so only authorized clients can trigger runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret management&lt;/strong&gt; for your LLM provider keys (OpenAI, Anthropic, etc.)—you don't want those hardcoded or scattered across worker configs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input validation&lt;/strong&gt; to catch garbage inputs and prompt injection attempts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt; so a buggy client can't accidentally blow through your API budget in an afternoon&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If multiple people or teams are using your crew, add per-user keys, usage tracking, and audit logs to the list. None of this is AI-specific—it's standard web security stuff. But it all needs to get built.&lt;/p&gt;
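&lt;p&gt;The API-key piece is the least glamorous and the most important, since it's what stands between the open internet and your LLM bill. A minimal sketch of the check itself (the key is illustrative, and a real deployment would load hashed keys from a secret store):&lt;/p&gt;

```python
import hmac

VALID_KEYS = {"sk-example-key"}  # illustrative; load from a secret store in practice

def is_authorized(auth_header) -> bool:
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # compare_digest keeps the comparison constant-time, so response timing
    # can't leak how much of a guessed key matched.
    return any(hmac.compare_digest(presented, key) for key in VALID_KEYS)
```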

&lt;h2&gt;
  
  
  Step 7: Observability
&lt;/h2&gt;

&lt;p&gt;Alright. Your crew is deployed, containerized, queued, versioned, and locked down. It's running in production. And then one morning a run fails.&lt;/p&gt;

&lt;p&gt;Why? Which agent hit the problem? Which task? What did the LLM actually respond with—was it a rate limit, a timeout, a weird tool output, or just a hallucination?&lt;/p&gt;

&lt;p&gt;Without proper observability, you're basically guessing. What you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Detailed logs&lt;/strong&gt; from each run—not just "started" and "finished," but the actual trace of agent decisions, tool calls, and LLM responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt; on run duration, token usage, cost per run, success rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alerts&lt;/strong&gt; for when error rates spike or costs go past a threshold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run history&lt;/strong&gt; so you can compare outputs across different inputs and versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means integrating with whatever logging and monitoring stack you use (Datadog, CloudWatch, Grafana, etc.), building custom dashboards, and adding instrumentation throughout your code. It's work.&lt;/p&gt;
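
&lt;p&gt;The metrics half of that list is mostly bookkeeping around each run. As a rough sketch (the price table and the shape of the run result are assumptions for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

# Illustrative per-1K-token prices; real prices vary by model and provider.
PRICES = {"gpt-4o": {"prompt": 0.0025, "completion": 0.01}}

def run_with_metrics(run_fn, model, **inputs):
    """Execute a crew run and return (result, metrics)."""
    start = time.monotonic()
    result = run_fn(**inputs)   # assumed to return a dict with token counts
    duration = time.monotonic() - start
    price = PRICES[model]
    cost = (result["prompt_tokens"] * price["prompt"]
            + result["completion_tokens"] * price["completion"]) / 1000
    metrics = {
        "duration_s": round(duration, 3),
        "prompt_tokens": result["prompt_tokens"],
        "completion_tokens": result["completion_tokens"],
        "cost_usd": round(cost, 6),
    }
    return result, metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Ship the metrics dict to whatever backend you use; the alerts and dashboards build on top of it.&lt;/p&gt;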

&lt;h2&gt;
  
  
  Take a Step Back
&lt;/h2&gt;

&lt;p&gt;Look at everything you've put together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An HTTP API in front of your crew&lt;/li&gt;
&lt;li&gt;A Docker container to package it&lt;/li&gt;
&lt;li&gt;Container orchestration for isolated execution&lt;/li&gt;
&lt;li&gt;A message queue to coordinate jobs&lt;/li&gt;
&lt;li&gt;A versioning system wired into your deploy pipeline&lt;/li&gt;
&lt;li&gt;Auth, rate limiting, and secret management&lt;/li&gt;
&lt;li&gt;Logging, metrics, and alerting&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's a lot of infrastructure. Every piece makes sense on its own, but stacked together it's a real system that needs real maintenance. And here's the thing—none of it is your actual product. All of it exists purely to let your CrewAI agents run somewhere other than your laptop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Or: Deploy With Crewship
&lt;/h2&gt;

&lt;p&gt;We built Crewship because we got tired of rebuilding this stack every time. All the infrastructure above—the containers, the queues, the versioning, the auth—it's handled for you.&lt;/p&gt;

&lt;p&gt;Here's what the same deployment looks like with Crewship, starting from a standard CrewAI project:&lt;/p&gt;

&lt;h3&gt;
  
  
  Install the CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://www.crewship.dev/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Log in
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens your browser for a one-time login. After that, API keys are managed through the Crewship console.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up your project
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks at your project, finds your CrewAI entrypoint, and creates a &lt;code&gt;crewship.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[deployment]&lt;/span&gt;
&lt;span class="py"&gt;framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crewai"&lt;/span&gt;
&lt;span class="py"&gt;entrypoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"your_project.crew:YourCrew"&lt;/span&gt;
&lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"3.11"&lt;/span&gt;
&lt;span class="py"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"slim"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your agents need a browser, just set &lt;code&gt;profile = "browser"&lt;/code&gt; and Crewship takes care of the Chromium stuff.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done. Your code gets packaged, built, and deployed. You get a deployment URL and a link to the console.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add your secrets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship &lt;span class="nb"&gt;env set &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-... &lt;span class="nv"&gt;SERPER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or just import your &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship &lt;span class="nb"&gt;env &lt;/span&gt;import &lt;span class="nt"&gt;-f&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is encrypted and injected at runtime. Nothing gets baked into the image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship invoke &lt;span class="nt"&gt;--input&lt;/span&gt; &lt;span class="s1"&gt;'{"topic": "AI agents in healthcare"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI streams events as they happen—you can watch your agents work through their tasks in real time. For programmatic access, there's a REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crewship.dev/v1/runs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"deployment": "your-crew", "input": {"topic": "AI agents"}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
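
&lt;p&gt;The same call from Python is only a few lines. In this sketch the status-polling callable and the &lt;code&gt;status&lt;/code&gt;/&lt;code&gt;output&lt;/code&gt; field names are assumptions for illustration, not the documented response shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

API_URL = "https://api.crewship.dev/v1/runs"

def build_run_request(api_key, deployment, run_input):
    """Build the headers and JSON body for the curl call above."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"deployment": deployment, "input": run_input})
    return headers, body

def wait_for_result(fetch_status, pause=lambda: None):
    """Poll until the run leaves a pending state.

    fetch_status is any callable returning the run as a dict;
    pass something like lambda: time.sleep(2) as pause to throttle polling.
    """
    while True:
        run = fetch_status()
        if run["status"] not in ("queued", "running"):
            return run
        pause()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;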



&lt;h3&gt;
  
  
  Everything from steps 1–7, included
&lt;/h3&gt;

&lt;p&gt;All of that infrastructure you'd otherwise build yourself comes out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated execution&lt;/strong&gt; — every run gets its own environment, no interference between runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-scaling&lt;/strong&gt; — scales up when there's work, scales to zero when there isn't&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment versioning&lt;/strong&gt; — each &lt;code&gt;crewship deploy&lt;/code&gt; creates a new version, roll back to any previous one with a click, keep staging and production separate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution traces&lt;/strong&gt; — full visibility into agent actions, LLM calls, tool usage, token counts, and cost per run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhooks&lt;/strong&gt; — trigger runs from CI/CD, cron jobs, or Zapier with incoming webhooks; get notified on completion with outgoing webhooks to your backend, Slack, wherever; all signed with HMAC-SHA256&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth&lt;/strong&gt; — token-based API authentication, generate and rotate keys from the console&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time streaming&lt;/strong&gt; — watch runs happen live over Server-Sent Events, or just poll for the result&lt;/li&gt;
&lt;/ul&gt;
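
&lt;p&gt;The streaming side uses standard SSE framing: each event is a block of &lt;code&gt;event:&lt;/code&gt; and &lt;code&gt;data:&lt;/code&gt; lines terminated by a blank line. A minimal consumer can decode that with no extra dependencies (the event names below are made up for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def parse_sse(lines):
    """Yield decoded events from an SSE stream, one per blank-line-terminated block."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":
            if data:
                payload = json.loads("\n".join(data))
                yield {"event": event or "message", "data": payload}
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;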

&lt;h3&gt;
  
  
  Push a new version
&lt;/h3&gt;

&lt;p&gt;Changed your agents? Just deploy again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewship deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New version goes live. In-flight jobs keep running on the version they started with. If something's off, roll back from the console. No downtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hook it into your systems
&lt;/h3&gt;

&lt;p&gt;Create an incoming webhook in the console and trigger runs from anywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.crewship.dev/webhooks/runs/YOUR_WEBHOOK_TOKEN &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"topic": "AI agents", "year": "2025"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up an outgoing webhook so your backend gets notified when a run finishes—no polling needed.&lt;/p&gt;
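
&lt;p&gt;On the receiving end, checking the HMAC-SHA256 signature takes a few lines of stdlib Python. This sketch assumes the signature is the hex digest of the raw request body; check the docs for the exact header name and encoding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac

def verify_signature(secret, payload, signature_header):
    """Return True only if the signature matches the raw request body."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing leaks
    return hmac.compare_digest(expected, signature_header)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reject anything that fails the check before you parse the body; otherwise anyone who finds your endpoint can spoof completion events.&lt;/p&gt;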




&lt;p&gt;Nobody's saying you can't build all of this yourself. Plenty of teams do. But it's a lot of engineering that doesn't move your actual product forward. Crewship handles the infrastructure so you can spend your time on the part that matters—making your agents better.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://console.crewship.dev" rel="noopener noreferrer"&gt;Get started for free&lt;/a&gt;—no credit card required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have questions about deploying your crew? Check out the &lt;a href="https://docs.crewship.dev" rel="noopener noreferrer"&gt;docs&lt;/a&gt; or reach out—we're happy to help.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>crewai</category>
      <category>agents</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
