
Arun Rao
Strands Temporal Agent: Building an AI-Powered Docker Monitor with Temporal, AWS Bedrock & Ollama

### Introduction
What if you could monitor your Docker containers just by typing plain English commands like “show nginx logs” or “is redis healthy?” — and have the system figure out the rest automatically?

That’s exactly what I built with Strands Temporal Agent — an AI-powered Docker container health monitoring system that combines local LLMs, cloud AI, and fault-tolerant workflows to make container management as simple as having a conversation.

In this post, I’ll walk you through what I built, the tech stack I used, the challenges I hit, and what I learned along the way.

### What is Strands Temporal Agent?
Strands Temporal Agent is a Docker monitoring agent that:

- Accepts natural language commands from the user
- Uses AI to parse and route those commands to the right Docker operation
- Executes operations like status checks, health monitoring, log retrieval, and container restarts
- Handles failures automatically with retry policies and exponential backoff powered by Temporal

Instead of memorizing Docker CLI commands, you just type what you want — and the system handles the rest.

### The Tech Stack
Here’s everything I used to build this project:

| Tool | Purpose |
| --- | --- |
| Python | Core application logic |
| Temporal | Fault-tolerant workflow orchestration |
| Docker | Container management |
| Ollama + LLaMA 3 | Local LLM for AI orchestration |
| Amazon Bedrock | Cloud AI capabilities via AWS |
| AWS CLI + IAM | Secure AWS authentication |

### System Architecture
The system works in three layers:

```
User Input (natural language)
            ↓
AI Orchestrator Activity
(parses intent → generates operation plan)
            ↓
Temporal Workflow
(executes operations with retry policies)
            ↓
Docker Activities
(status / health / logs / restart)
```
When you type “show nginx logs”, here’s what happens behind the scenes:

1. The input goes to `ai_orchestrator_activity`
2. The AI parses it and returns a plan: `logs:nginx:100`
3. Temporal’s workflow engine picks up the plan
4. `get_container_logs_activity` executes with retry logic
5. The result is returned to you
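Before any dispatching can happen, the plan string has to be broken into individual operations. A minimal sketch, assuming the `op:arg1:arg2` format shown above (this helper is illustrative, not the project's exact code):

```python
def parse_plan(plan: str) -> list:
    """Split an AI-generated plan like 'health:nginx,logs:nginx:100'
    into (operation, args) pairs ready for dispatch."""
    operations = []
    for spec in plan.split(","):
        op, *args = spec.split(":")  # first field is the operation name
        operations.append((op, args))
    return operations
```

For example, `parse_plan("health:nginx,logs:nginx:100")` yields `[("health", ["nginx"]), ("logs", ["nginx", "100"])]`.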
### Step 1 — Setting Up the Environment
The first step was getting all the tools installed and configured on my local Windows machine.

**Ollama + LLaMA 3**

```bash
# Install Ollama from https://ollama.com, then:
ollama pull llama3
ollama run llama3
```

**Docker**

Downloaded Docker Desktop from docker.com and verified the installation:

```bash
docker --version
docker run hello-world
```

**Python Environment**

```bash
python -m venv venv
venv\Scripts\activate
pip install temporalio docker boto3
```

**AWS CLI + IAM Setup**

```bash
# Install the AWS CLI, then configure with IAM credentials
aws configure
# Enter: Access Key ID, Secret Access Key, Region
```
### Step 2 — Building the Temporal Activities
Temporal is the backbone of this project. It ensures that even if something fails midway, the workflow retries automatically without losing progress.

I built 5 core activities:

**1. AI Orchestrator Activity** — parses natural language and returns an operation plan:

```python
@activity.defn
async def ai_orchestrator_activity(task: str) -> str:
    """Parses natural language into a Docker operation plan."""
    # e.g. "show nginx logs"   → "logs:nginx:100"
    # e.g. "restart postgres"  → "restart:postgres"
    # e.g. "is redis healthy?" → "health:redis"
```

**2. Get Container Status Activity**

```python
@activity.defn
async def get_container_status_activity(filter_by: str = None) -> str:
    """Returns status of all or filtered containers."""
```

**3. Check Container Health Activity**

```python
@activity.defn
async def check_container_health_activity(container_name: str = None) -> str:
    """Checks health of specific or all running containers."""
```

**4. Get Container Logs Activity**

```python
@activity.defn
async def get_container_logs_activity(container_name: str, lines: int = 100) -> str:
    """Retrieves last N lines of container logs."""
```

**5. Restart Container Activity**

```python
@activity.defn
async def restart_container_activity(container_name: str) -> str:
    """Restarts a specified container."""
```
### Step 3 — The Temporal Workflow
The workflow ties everything together. It calls the AI orchestrator first, then executes each operation in the plan:

```python
@workflow.defn
class DockerMonitorWorkflow:
    @workflow.run
    async def run(self, task: str) -> str:
        # Step 1: Get AI-generated operation plan
        plan = await workflow.execute_activity(
            ai_orchestrator_activity,
            task,
            start_to_close_timeout=timedelta(seconds=15),
            retry_policy=RetryPolicy(maximum_attempts=2),
        )
        # Step 2: Execute each operation
        results = []
        for operation_spec in plan.split(","):
            result = await self._execute_operation(operation_spec)
            results.append(result)
        return "\n\n".join(results)
```

Each activity has its own **retry policy** tuned to the operation type. For example, restarts get 5 retry attempts with 30-second timeouts, while status checks only need 3 attempts with 10-second timeouts.
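One way to sketch that per-operation tuning is a single lookup table the workflow consults before each activity call. The restart and status numbers come from above; the health and logs entries, and the table structure itself, are my illustrative assumptions:

```python
from datetime import timedelta

# Per-operation retry tuning. Restart/status values match the post;
# health/logs values are assumed for illustration.
RETRY_CONFIG = {
    "restart": {"maximum_attempts": 5, "timeout": timedelta(seconds=30)},
    "status":  {"maximum_attempts": 3, "timeout": timedelta(seconds=10)},
    "health":  {"maximum_attempts": 3, "timeout": timedelta(seconds=10)},
    "logs":    {"maximum_attempts": 3, "timeout": timedelta(seconds=15)},
}

def policy_for(operation: str) -> dict:
    """Return retry settings for an operation, falling back to the
    conservative 'status' settings for anything unrecognized."""
    return RETRY_CONFIG.get(operation, RETRY_CONFIG["status"])
```

In the real workflow, these values would feed `RetryPolicy(maximum_attempts=...)` and `start_to_close_timeout=...` in each `workflow.execute_activity` call.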
---
### Step 4 — The AI Parser
This was the most interesting part to build. The AI orchestrator needed to understand varied natural language inputs and map them to structured operation strings.
For example:
```
"show nginx logs"                  → "logs:nginx:100"
"analyze nginx logs"               → "logs:nginx:100"
"is redis healthy?"                → "health:redis"
"check nginx health and show logs" → "health:nginx,logs:nginx:100"
"restart my postgres container"    → "restart:postgres"
```
The key challenge was building a stop-words filter that ignores filler words and correctly extracts the container name:

```python
STOP_WORDS = {
    "restart", "container", "the", "please", "a", "an",
    "can", "you", "my", "show", "logs", "log", "check",
    "health", "healthy", "analyze", "fetch", "get", ...
}

def find_container(words):
    for w in words:
        if w and w not in STOP_WORDS and not w.isdigit():
            return w
    return None
```
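Putting the pieces together, here is a self-contained sketch of how such a rule-based parser can work. My actual stop-word set is longer; the trimmed set and the exact branching below are a hypothetical illustration, not the project's code:

```python
import re

# Hypothetical, trimmed stop-word set for illustration
STOP_WORDS = {
    "restart", "container", "the", "please", "a", "an", "can", "you",
    "my", "show", "logs", "log", "check", "health", "healthy", "is",
    "analyze", "fetch", "get", "and", "of", "for", "me",
}

def find_container(words):
    """First word that is neither a stop word nor a number."""
    for w in words:
        if w and w not in STOP_WORDS and not w.isdigit():
            return w
    return None

def parse_task(task: str) -> str:
    """Map a natural-language task to an operation plan string."""
    words = re.findall(r"[a-z0-9_-]+", task.lower())
    name = find_container(words)
    ops = []
    if "restart" in words:
        ops.append(f"restart:{name}" if name else "status")
    if "health" in words or "healthy" in words:
        ops.append(f"health:{name}" if name else "health:all")
    if "logs" in words or "log" in words:
        # Fall back to "status" rather than an error string (see below)
        ops.append(f"logs:{name}:100" if name else "status")
    return ",".join(ops) if ops else "status"
```

With this sketch, `parse_task("check nginx health and show logs")` returns `"health:nginx,logs:nginx:100"`, and a task with no recognizable container name falls back to `"status"` instead of an error string.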
### Challenges I Faced

**1. SyntaxError — Unterminated Triple-Quoted String**

The most frustrating bug was a SyntaxError caused by leftover code fragments from an earlier version of the file. The old LLM-based orchestrator code wasn't fully removed, leaving orphaned text that Python couldn't parse. Lesson learned: always verify syntax with:

```bash
python -c "import ast; ast.parse(open('file.py').read()); print('Syntax OK')"
```

**2. Dead Code — Unreachable Combined Block**
My original parser had a COMBINED section for handling queries like *"check nginx health and show logs"* — but it was placed after standalone health and log blocks that always returned first. The combined block was completely unreachable. I fixed this by merging all three into a single unified block.
**3. Stop-Words Too Narrow**
Words like `"analyze"` and `"fetch"` weren't in my stop-words list, so they were being grabbed as container names. *"Analyze nginx logs"* was routing to `logs:analyze:100` instead of `logs:nginx:100`. The fix was expanding the stop-words set with common action verbs.
**4. Error Strings Being Executed as Operations**
When no container name was found, my parser returned `"Error: logs requires a container name"`. The workflow then tried to execute `"error"` as an operation name, returning `Unknown operation: error`. The fix was to fall back to `"status"` instead of returning an error string.
---
### The Result
After all fixes, the system works smoothly:
```
Enter task: show nginx logs
→ logs:nginx:100 executed ✅

Enter task: is redis healthy?
→ health:redis executed ✅

Enter task: restart postgres
→ restart:postgres executed ✅

Enter task: check nginx health and show logs
→ health:nginx,logs:nginx:100 executed ✅
```
The Temporal UI at localhost:8233 shows every workflow execution with full event history, activity timelines, inputs, outputs, and retry attempts — making debugging incredibly easy.

### Key Takeaways
Temporal is a game-changer for reliability. Writing retry logic manually is error-prone and tedious. Temporal handles it declaratively — you just define the policy and it takes care of the rest.

Local LLMs are surprisingly capable. Running LLaMA 3 locally via Ollama gave me a fully offline AI layer with no API costs and no latency from cloud round-trips.

NLP parsing is harder than it looks. Even simple rule-based parsing has edge cases. Real user input is messy — abbreviations, typos, extra words, unexpected word order. Always test with varied inputs.

Always verify syntax programmatically. A one-liner AST check would have saved me an hour of debugging.

### What’s Next
- **Scheduled health checks** using Temporal’s cron scheduling
- **Alerting** when a container goes unhealthy
- **Web dashboard** to visualize container status in real time
- **Kubernetes support** — extend beyond Docker to K8s pods
### Source Code
The full source code is available on GitHub: https://github.com/Arun12415/strands-temporal-agents.git

### Final Thoughts
Strands Temporal Agent started as a learning project and turned into something genuinely useful. If you’re exploring Temporal, local LLMs, or Docker automation — I hope this post gives you a solid starting point.

Have questions or suggestions? Drop them in the comments — I’d love to hear your thoughts!

Tags: #Python #Docker #AWS #Temporal #LLM #Ollama #AmazonBedrock #DevOps #AI #LLaMA
