Building an agentic Jira automation platform with MCP and Temporal

#mcp #ai #python #opensource

Most "AI automation" demos fall apart the moment a workflow needs to run longer than a single request. An agent makes a few tool calls, the process crashes or times out, and you lose all state. I wanted something that could drive real, multi-step work inside Atlassian (Jira and Confluence) and survive restarts, retries, and failures. So I built an open-source platform around two ideas: MCP for tool access and Temporal for durable execution.

Repo: https://github.com/ahmet-ozel/atlassian-ai-workflow-platform

The problem with one-shot agents

A typical agent loop looks like this: read a ticket, decide on an action, call a tool, repeat. That is fine for short tasks. It breaks down when a workflow spans minutes or hours, depends on external systems that fail intermittently, or needs to be resumed after a deploy. If your orchestration lives in a single Python process, any crash means starting over. For business workflows that touch real Jira issues, that is not acceptable.

Why MCP for tools

The Model Context Protocol (MCP) standardizes how an agent discovers and calls tools. Instead of hard-coding Jira API calls into the agent, I expose Jira and Confluence as MCP tools. The agent sees a clean, typed tool surface (create issue, transition status, search, comment, fetch a Confluence page), and the protocol handles the wiring: the client asks the server which tools exist, each described by a name and a JSON Schema for its inputs, and invokes them over JSON-RPC.

The practical benefit is decoupling. I can add or change tools without touching the agent logic, and the same tools work with any MCP-compatible client. It also keeps the agent prompt focused on intent rather than API mechanics.
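
For a concrete picture of that typed tool surface, here is a stdlib-only sketch of the shape an MCP tool takes: a name, a description, and a JSON Schema for its input, advertised through a `tools/list` request and invoked through `tools/call`, both JSON-RPC 2.0 methods. It is not the official MCP SDK and not the repo's code. The `ToolRegistry` class, the `create_issue` tool, and the in-memory `FAKE_JIRA` store are hypothetical, and the transport (stdio or HTTP) is left out.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict  # JSON Schema describing the tool's arguments
    handler: Callable[..., str]  # server-side implementation, never shown to the agent


class ToolRegistry:
    """Hypothetical stand-in for an MCP server's tool table (transport omitted)."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def handle(self, request: dict[str, Any]) -> dict[str, Any]:
        """Answer a JSON-RPC 2.0 request for tools/list or tools/call."""
        method = request["method"]
        if method == "tools/list":
            # Advertise only the metadata the agent needs to choose a tool.
            result = {"tools": [
                {"name": t.name, "description": t.description, "inputSchema": t.input_schema}
                for t in self._tools.values()
            ]}
        elif method == "tools/call":
            params = request["params"]
            tool = self._tools[params["name"]]
            text = tool.handler(**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": text}]}
        else:
            return {"jsonrpc": "2.0", "id": request["id"],
                    "error": {"code": -32601, "message": f"Method not found: {method}"}}
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}


# Hypothetical in-memory Jira so the sketch runs without network access.
FAKE_JIRA: dict[str, dict[str, str]] = {}


def create_issue(project: str, summary: str) -> str:
    key = f"{project}-{len(FAKE_JIRA) + 1}"
    FAKE_JIRA[key] = {"summary": summary, "status": "To Do"}
    return json.dumps({"key": key})


registry = ToolRegistry()
registry.register(Tool(
    name="create_issue",
    description="Create a Jira issue in the given project.",
    input_schema={
        "type": "object",
        "properties": {"project": {"type": "string"}, "summary": {"type": "string"}},
        "required": ["project", "summary"],
    },
    handler=create_issue,
))

listing = registry.handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = registry.handle({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "create_issue",
               "arguments": {"project": "OPS", "summary": "Rotate API keys"}},
})
print(call["result"]["content"][0]["text"])  # {"key": "OPS-1"}
```

Because the agent only ever sees the `tools/list` output, swapping the fake `create_issue` for a real Jira client changes nothing on the agent side.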

Why Temporal for orchestration

Temporal gives you durable workflows. The workflow code looks like ordinary Python, but Temporal records the result of every step in an event history. If a worker dies, another worker replays the workflow code against that history, reusing the recorded results, and continues from the last completed step. Retries, timeouts, and backoff are declarative.
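
Declarative retries mean the policy is data rather than a hand-written loop. The sketch below is stdlib-only and models what such a policy computes. Its field names mirror Temporal's `RetryPolicy` (`initial_interval`, `backoff_coefficient`, `maximum_interval`, `maximum_attempts`), but it is a simplified stand-in, not the SDK. `run_with_retries`, `RateLimited`, and `flaky_llm_call` are hypothetical, and `sleep` is injected so the example runs instantly.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    # Field names mirror Temporal's RetryPolicy; the behavior here is simplified.
    initial_interval: timedelta = timedelta(seconds=1)
    backoff_coefficient: float = 2.0
    maximum_interval: timedelta = timedelta(seconds=100)
    maximum_attempts: int = 0  # 0 means "no limit", as in Temporal

    def delay_before_retry(self, attempt: int) -> timedelta:
        """Delay after failed attempt number `attempt` (1-based), capped at maximum_interval."""
        delay = self.initial_interval * (self.backoff_coefficient ** (attempt - 1))
        return min(delay, self.maximum_interval)


def run_with_retries(fn: Callable[[], T], policy: RetryPolicy,
                     sleep: Callable[[float], None]) -> T:
    """Call fn until it succeeds or the policy's attempt budget is exhausted."""
    attempt = 1
    while True:
        try:
            return fn()
        except Exception:
            if policy.maximum_attempts and attempt >= policy.maximum_attempts:
                raise  # out of attempts: surface the last error
            sleep(policy.delay_before_retry(attempt).total_seconds())
            attempt += 1


class RateLimited(Exception):
    """Hypothetical error standing in for an LLM provider's HTTP 429."""


calls = {"n": 0}


def flaky_llm_call() -> str:
    # Fails twice, then succeeds, like a briefly rate-limited provider.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429 Too Many Requests")
    return "ok"


slept: list[float] = []
result = run_with_retries(flaky_llm_call, RetryPolicy(maximum_attempts=5), sleep=slept.append)
print(result, slept)  # ok [1.0, 2.0]
```

With these defaults the waits grow 1 s, 2 s, 4 s, and so on, until they reach `maximum_interval`.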

This maps naturally onto agent workflows. Each LLM call and each tool call becomes a Temporal activity. If an LLM provider rate-limits you or a Jira call fails, Temporal retries that single activity instead of replaying the whole reasoning chain. Long-running approvals (waiting for a human to review before a ticket is transitioned) become a normal step in the workflow instead of a hack.
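
Here is a toy model of why a crash does not replay the whole reasoning chain. It is a stdlib sketch of the idea behind an event history, not Temporal itself: every activity result is appended to a history list, and when the workflow runs again, completed steps return their recorded results instead of executing again. The activity functions, the `DurableContext` class, and the simulated crash are all hypothetical.

```python
from typing import Any, Callable


class Crash(Exception):
    """Simulates a worker process dying partway through a workflow."""


class DurableContext:
    """Toy stand-in for a workflow context backed by an event history."""

    def __init__(self, history: list) -> None:
        self.history = history    # (activity name, result) pairs; survives "restarts" here
        self.position = 0         # next history entry this run should replay
        self.executed: list[str] = []  # activities actually executed during this run

    def activity(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        if self.position < len(self.history):
            recorded_name, result = self.history[self.position]
            if recorded_name != name:
                raise RuntimeError(f"history has {recorded_name!r}, workflow asked for {name!r}")
            self.position += 1
            return result  # already completed: reuse the recorded result
        result = fn(*args)  # new step: run the side effect, then record it
        self.history.append((name, result))
        self.position += 1
        self.executed.append(name)
        return result


crash_once = {"armed": True}


def llm_summarize(ticket: str) -> str:
    return f"Summary of {ticket}"  # stand-in for an LLM call


def jira_add_comment(ticket: str, text: str) -> str:
    return f"{ticket}: commented {text!r}"  # stand-in for a Jira API call


def jira_transition(ticket: str, status: str) -> str:
    if crash_once["armed"]:  # the first attempt dies mid-call
        crash_once["armed"] = False
        raise Crash("worker died")
    return f"{ticket} -> {status}"


def triage_workflow(ctx: DurableContext, ticket: str) -> str:
    # Each LLM call and each tool call goes through ctx.activity, so each is recorded.
    summary = ctx.activity("llm_summarize", llm_summarize, ticket)
    ctx.activity("jira_add_comment", jira_add_comment, ticket, summary)
    return ctx.activity("jira_transition", jira_transition, ticket, "In Review")


history: list = []
try:
    triage_workflow(DurableContext(history), "OPS-7")
except Crash:
    pass  # first worker dies during the third step; two results are already recorded

resumed = DurableContext(history)  # a "new worker" picks up the same history
outcome = triage_workflow(resumed, "OPS-7")
print(resumed.executed)  # ['jira_transition']
print(outcome)           # OPS-7 -> In Review
```

In Temporal the history is stored by the Temporal service rather than in a Python list, and a failed activity is also retried under its retry policy; the sketch only models the replay half of that behavior.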

The tradeoff is added infrastructure. Temporal is one more service to run, and you have to think in terms of deterministic workflow code versus side-effecting activities. For short, stateless tasks it is overkill. For anything that has to be reliable, it pays for itself quickly.

Architecture

The stack ties together a few pieces:

An MCP integration layer that exposes Atlassian tools to the agent
Temporal workers that run the durable workflows and activities
A webhook gateway that turns Jira events into workflow triggers
An admin dashboard plus a Streamlit UI for running and inspecting workflows
Multi-provider LLM support (OpenAI, Anthropic, Gemini, and self-hosted vLLM)
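
The webhook gateway is mostly a translation step from a Jira event to a workflow start. A minimal sketch, assuming the payload shape Jira webhooks use (a `webhookEvent` name such as `jira:issue_created` plus an `issue` object with a `key` and `fields`). The `ROUTES` table, the workflow type names, and `WorkflowStart` are hypothetical, not the repo's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowStart:
    workflow: str     # hypothetical workflow type name
    workflow_id: str  # derived from the issue, so duplicate deliveries share an ID
    args: dict


# Hypothetical routing table: Jira webhook event name -> workflow type.
ROUTES = {
    "jira:issue_created": "TriageNewIssue",
    "jira:issue_updated": "SyncIssueUpdate",
}


def to_workflow_start(payload: dict) -> Optional[WorkflowStart]:
    """Translate a Jira webhook payload into a workflow start request, or None to ignore it."""
    workflow = ROUTES.get(payload.get("webhookEvent", ""))
    issue = payload.get("issue") or {}
    key = issue.get("key")
    if workflow is None or key is None:
        return None  # unrouted event or no issue attached
    fields = issue.get("fields") or {}
    return WorkflowStart(
        workflow=workflow,
        workflow_id=f"{workflow}-{key}",
        args={"issue_key": key, "summary": fields.get("summary", "")},
    )


start = to_workflow_start({
    "webhookEvent": "jira:issue_created",
    "issue": {"key": "OPS-42", "fields": {"summary": "Checkout page returns 500"}},
})
print(start.workflow_id)  # TriageNewIssue-OPS-42
```

Deriving `workflow_id` from the issue key is a design choice: Temporal refuses to start a second workflow with an ID that is already running, so a redelivered webhook does not spawn a duplicate, at the cost of allowing only one running workflow of each type per issue.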

Everything runs in a single Docker Compose stack, so you can bring the whole system up locally and see the moving parts together. Provider choice is config-driven, which makes it easy to swap a hosted model for a local one during development.
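
Config-driven provider choice can be as small as a lookup table. A minimal sketch, assuming hypothetical environment variable names (`LLM_PROVIDER`, `LLM_MODEL`, `LLM_BASE_URL`) and a placeholder model name rather than the repo's actual settings. One useful fact for the self-hosted case: vLLM ships an OpenAI-compatible HTTP server, so a local model can often reuse the OpenAI client code path with a different base URL.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    provider: str
    base_url: str
    model: str


# Public API hosts for the hosted providers; vLLM's OpenAI-compatible server
# listens on localhost:8000 with a /v1 prefix by default.
DEFAULT_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com",
    "gemini": "https://generativelanguage.googleapis.com",
    "vllm": "http://localhost:8000/v1",
}


def load_provider_config(env: Mapping = os.environ) -> ProviderConfig:
    """Build the LLM config from (hypothetical) LLM_* variables."""
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in DEFAULT_BASE_URLS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    model = env.get("LLM_MODEL")
    if not model:
        raise ValueError("LLM_MODEL must be set")
    # An explicit URL wins, e.g. a vLLM server running on another host.
    base_url = env.get("LLM_BASE_URL", DEFAULT_BASE_URLS[provider])
    return ProviderConfig(provider=provider, base_url=base_url, model=model)


cfg = load_provider_config({"LLM_PROVIDER": "vllm", "LLM_MODEL": "my-local-model"})
print(cfg.base_url)  # http://localhost:8000/v1
```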

What I learned

Separating "what to do" from "how to survive doing it" was the key insight. The agent reasons about intent and picks tools. Temporal owns reliability. MCP owns the tool boundary. Keeping those three responsibilities apart made each one much simpler to reason about and test.

The other lesson: deterministic workflow code is a discipline. Anything non-deterministic (network calls, timestamps, random values) has to live in an activity, or, for time and randomness, come from the SDK's deterministic helpers such as workflow.now() and workflow.random(), never from the workflow body directly. Once that clicked, debugging got much easier, because the workflow history is a precise, replayable log of what happened.
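
Why non-determinism in the workflow body breaks things: replay re-runs the workflow code and expects it to request the same steps, in the same order, as the recorded history. The toy below (stdlib only, not Temporal's implementation, which compares commands rather than activity names) records activity names and results on a first run and checks them on replay. A workflow that branches on a random value in its body can take a different path on replay and trip the check; the same decision made inside an activity is recorded once and replayed verbatim. `ReplayContext`, `bad_workflow`, `good_workflow`, and the `coin` parameter are hypothetical.

```python
from typing import Any, Callable, Optional


class NondeterminismError(Exception):
    """Raised when replayed workflow code asks for a step the history does not contain."""


class ReplayContext:
    """Records activity results on a first run; on replay, returns them instead of re-running."""

    def __init__(self, history: Optional[list] = None) -> None:
        self.replaying = history is not None
        self.history = list(history) if history is not None else []
        self.position = 0

    def activity(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.replaying:
            if self.position >= len(self.history) or self.history[self.position][0] != name:
                raise NondeterminismError(f"step {self.position}: workflow asked for {name!r}")
            result = self.history[self.position][1]  # fn is not called on replay
        else:
            result = fn()
            self.history.append((name, result))
        self.position += 1
        return result


def bad_workflow(ctx: ReplayContext, coin: Callable[[], float]) -> str:
    # The coin flip happens in the workflow body, so its value is never recorded.
    if coin() < 0.5:
        return ctx.activity("add_comment", lambda: "commented")
    return ctx.activity("transition", lambda: "transitioned")


def good_workflow(ctx: ReplayContext, coin: Callable[[], float]) -> str:
    # The coin flip is an activity, so replay reuses the recorded value.
    roll = ctx.activity("roll", coin)
    if roll < 0.5:
        return ctx.activity("add_comment", lambda: "commented")
    return ctx.activity("transition", lambda: "transitioned")


# `coin` stands in for random.random() or a clock read: it yields a different
# value on replay than it did on the first run.
bad_first = ReplayContext()
bad_workflow(bad_first, coin=lambda: 0.1)
try:
    bad_workflow(ReplayContext(bad_first.history), coin=lambda: 0.9)
    bad_replay_ok = True
except NondeterminismError:
    bad_replay_ok = False

good_first = ReplayContext()
good_workflow(good_first, coin=lambda: 0.1)
good_replay = good_workflow(ReplayContext(good_first.history), coin=lambda: 0.9)
print(bad_replay_ok, good_replay)  # False commented
```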

It currently targets Atlassian, but the tool layer is designed to extend to other platforms.

Feedback welcome

I would like to hear how others handle long-running agent workflows. Are you using Temporal, a queue plus your own state machine, or a custom orchestration loop? And for MCP users: how are you structuring tools when one agent needs access to several systems at once?

Repo and setup instructions: https://github.com/ahmet-ozel/atlassian-ai-workflow-platform
