If your LLM endpoint times out, dashboards alone rarely help. What you need is a fast path from symptom to cause.
This post shows a small .NET lab where you can force a controlled 504 and debug it with a repeatable metrics -> traces -> logs workflow. The stack is ASP.NET Core, Blazor, .NET Aspire, Ollama, and OpenTelemetry, and the goal is practical: reduce time-to-diagnosis before you ship.
Here’s the core idea: observability is not dashboards. It is time-to-diagnosis.
I built this because I have already lost too much time staring at logs without a reliable way to correlate logs, traces, and metrics. For this post, an “LLM workload” means an endpoint where tail latency and failures often come from a model call plus prompt or tool changes, not just your HTTP handler.
This post is repo-first and uses the companion repository directly:
- Repo: [minimal-llm-observability](https://github.com/ovnecron/minimal-llm-observability)
- It includes a Blazor UI to trigger healthy, delay, timeout, and real model-call scenarios.
## The Stack in One Minute
- ASP.NET Core API — a small request surface that I can instrument end-to-end without noise.
- Blazor Web UI — one-click healthy, delay, timeout, and real model-call scenarios.
- .NET Aspire AppHost — local orchestration plus the Aspire Dashboard for fast pivoting.
- Ollama (`ollama/ollama:0.16.3`) — real local model-call behavior without cloud token cost.
- OpenTelemetry — logs tell me what, traces tell me where, metrics tell me how often.
The point is simple: one local environment where I can trigger failure and observe it end-to-end without guessing.
## Why LLM Timeouts Feel Different
- Prompt changes are deployments: the code may stay the same, but latency and failure modes can change.
- Model and runtime changes can shift tail latency.
- Tool or dependency calls amplify variance — one slow call can become a timeout.
## Minimum Correlation Fields
To keep triage fast, I want a few fields to exist everywhere:
- `run_id` to follow one request lifecycle
- `trace_id` to follow execution across spans and services
- `prompt_version` to tie behavior to prompt changes
- `tool_version` to tie failures to integration changes
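As a sketch of how those four fields can travel together, the snippet below tags a span and builds a log line from the same identifiers. This is illustrative, not the repo's actual code: the `ActivitySource` name, the method, and the message format are assumptions.

```csharp
using System;
using System.Diagnostics;

// Illustrative sketch (not the repo's actual code): each run gets a fresh
// run_id, the span is tagged with all four correlation fields, and the same
// ids are reused in the log line so logs and traces can be joined directly.
public static class RunCorrelationDemo
{
    private static readonly ActivitySource Source = new("LLMObservabilityLab");

    public static string FormatLogLine(string message, string promptVersion, string toolVersion)
    {
        string runId = Guid.NewGuid().ToString("N");

        using Activity? activity = Source.StartActivity("llm.run");
        activity?.SetTag("run_id", runId);
        activity?.SetTag("prompt_version", promptVersion);
        activity?.SetTag("tool_version", toolVersion);

        // Without a listener or exporter, StartActivity can return null,
        // so fall back gracefully in this standalone demo.
        string traceId = activity?.TraceId.ToString() ?? "none";
        return $"{message} run_id={runId} trace_id={traceId} " +
               $"prompt_version={promptVersion} tool_version={toolVersion}";
    }
}
```

In a real service the same fields would also go into an `ILogger` scope so every log line inside the request carries them automatically.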
## How Correlation Should Look
```
POST /ask -> trace_id in the trace span -> run_id + trace_id in logs -> timeout metric increases
```
Naming convention I use:
- snake_case in logs and JSON: `run_id`, `trace_id`, `prompt_version`, `tool_version`
- camelCase in C# variables: `runId`, `traceId`, `promptVersion`, `toolVersion`
Example log line:
```
timeout during /ask run_id=9f0f2f3a6fdd4f5f9e9a1f4d8f6c6f3e trace_id=4c4f3b2e86d4d6a6b1f69a0d9d0d9f0a prompt_version=v1 tool_version=local-llm-v1
```
If one link in that chain is missing, triage slows down immediately.
## What the Debugging Flow Looks Like
In practice, the drill looks like this:
- Click **Simulated Timeout (504)** in the Web UI.
- Open Aspire Metrics and confirm `llm_timeouts_total` increased.
- Jump to Traces and open the failing `llm.run`.
- Copy the `trace_id`, then pivot to logs and filter by `trace_id` or `run_id`.
- Check whether the failure lines up with a specific `prompt_version` or `tool_version`.
That is the whole point of the lab: move from a timeout symptom to a likely cause in a few deliberate steps instead of guessing.
## Prerequisites
- Docker Desktop or Docker Engine installed and running
- The .NET SDK version pinned in the repo’s `global.json`
- The Aspire workload, if required by your setup: `dotnet workload install aspire`
- Local ports available (or adjust launch settings): `18888`, `18889`, `11434`
- If you use the stable API port appendix, you also need `17100` free
## Step 1 — Clone and Run the Repository
```shell
git clone https://github.com/ovnecron/minimal-llm-observability.git
cd minimal-llm-observability
dotnet run --project LLMObservabilityLab.AppHost/LLMObservabilityLab.AppHost.csproj
```
Open the Aspire Dashboard URL printed in the terminal. If you see an auth prompt, use the one-time URL from the terminal.
This repo uses fixed local HTTP launch settings:
- Aspire Dashboard: `http://localhost:18888`
- OTLP endpoint (Aspire Dashboard): `http://localhost:18889`
- Web UI (`LLMObservabilityLab.Web`): open it from the Aspire Dashboard resource list
- Unsecured local transport is already enabled in the AppHost launch profile with `ASPIRE_ALLOW_UNSECURED_TRANSPORT=true`
If you already run Ollama locally on port `11434`, stop it or change the container port mapping in the AppHost.
If **Real Ollama Call** returns “model not found”, pull the default model in the running container:
```shell
docker exec -it "$(docker ps --filter "name=local-llm" --format "{{.Names}}" | head -n 1)" \
  ollama pull llama3.2:1b
```
## Step 2 — Trigger Scenarios in the Web UI
Open Aspire Dashboard -> Resources -> click the `web-ui` endpoint.
The root page in LLMObservabilityLab.Web gives you one-click actions:
- Healthy Run
- Simulate Delay
- Real Ollama Call
- Simulated Timeout (504)
Each run shows:
- `run_id`
- `trace_id`
- status
- elapsed time
The Web UI also includes `/drill` with the fixed 15-minute triage checklist.
## Step 3 — Generate a Healthy Baseline (Optional)
Click Healthy Run around 20 times in the Web UI.
This gives you a quick baseline in `llm_runs_total`, `llm_success_total`, and `llm_latency_ms` before you force a timeout.
## Step 4 — Force a Timeout and Triage It
Use the Simulated Timeout (504) button in the Web UI, then move directly to the Aspire Dashboard.
That action returns a controlled 504 so you can exercise the observability pipeline on demand.
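To make the "controlled 504" concrete, here is a minimal ASP.NET Core sketch of such an endpoint. The route name and delay are assumptions for illustration, not necessarily what the repo implements:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Illustrative sketch: a deterministic 504 lets you exercise the
// metrics -> traces -> logs pipeline on demand, instead of waiting
// for a real upstream model call to time out.
app.MapPost("/simulate/timeout", async () =>
{
    await Task.Delay(TimeSpan.FromSeconds(2)); // mimic a slow upstream model call
    return Results.StatusCode(StatusCodes.Status504GatewayTimeout);
});

app.Run();
```

Because the failure is deterministic, every triage run starts from the same symptom, which is what makes the loop below repeatable.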
My triage loop (target: about 15 minutes in this lab):
- Spot: check `llm_timeouts_total` in Metrics
- Drill: open the failing `llm.run` trace
- Pivot: filter logs by `trace_id` and `run_id`
- Inspect: compare `prompt_version` and `tool_version`
- Mitigate: apply the smallest safe fix first
- Verify: rerun the timeout scenario and confirm recovery
A simple flow to follow:
- Metrics -> check `llm_latency_ms` for the spike
- Traces -> filter `scenario=simulate_timeout`
- Open the failing `llm.run`
## Minimal Signals I Use to Make Fast Decisions
Directly emitted by this repo:
- `llm_runs_total`
- `llm_success_total`
- `llm_timeouts_total`
- `llm_errors_total`
- `llm_latency_ms`
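If you want to emit the same counter shapes in your own service, a `System.Diagnostics.Metrics` sketch could look like this. The meter name is an assumption; the instrument names mirror the signals above:

```csharp
using System.Diagnostics.Metrics;

// Illustrative sketch: the five signals as .NET metric instruments.
// The counters track how often; the histogram tracks how slow.
public static class LlmMetrics
{
    private static readonly Meter Meter = new("LLMObservabilityLab");

    public static readonly Counter<long> Runs = Meter.CreateCounter<long>("llm_runs_total");
    public static readonly Counter<long> Successes = Meter.CreateCounter<long>("llm_success_total");
    public static readonly Counter<long> Timeouts = Meter.CreateCounter<long>("llm_timeouts_total");
    public static readonly Counter<long> Errors = Meter.CreateCounter<long>("llm_errors_total");
    public static readonly Histogram<double> LatencyMs = Meter.CreateHistogram<double>("llm_latency_ms");
}
```

Increment `Runs` on every request, then exactly one of `Successes`, `Timeouts`, or `Errors`, and record the elapsed milliseconds into `LatencyMs`; tagging each measurement with `prompt_version` and `tool_version` is what makes per-version slicing possible later.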
A derived metric:
```
task_success_rate = llm_success_total / llm_runs_total * 100
```
Starter alert heuristics (these are seeds — tune them to your baseline):
- `task_success_rate` drops by more than 5 percentage points in 30 minutes
- latency percentile degradation (derived from `llm_latency_ms`) rises more than 30% over baseline
- tool-version-scoped success (derived from runs tagged with `tool_version`) falls below 90%
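The first seed can be sketched as a plain success-rate comparison. The helper names and default threshold here are illustrative; the 5-point drop is the starter value above, not a universal constant:

```csharp
// Illustrative sketch: compare a current window against a baseline and
// flag a drop of more than 5 percentage points in task_success_rate.
public static class AlertSeeds
{
    // task_success_rate = llm_success_total / llm_runs_total * 100
    public static double TaskSuccessRate(long successTotal, long runsTotal) =>
        runsTotal == 0 ? 100.0 : (double)successTotal / runsTotal * 100.0;

    public static bool SuccessRateDropped(
        long baselineSuccess, long baselineRuns,
        long windowSuccess, long windowRuns,
        double maxDropPoints = 5.0)
    {
        double baseline = TaskSuccessRate(baselineSuccess, baselineRuns);
        double current = TaskSuccessRate(windowSuccess, windowRuns);
        return baseline - current > maxDropPoints;
    }
}
```

For example, a baseline of 98/100 runs (98%) against a window of 45/50 runs (90%) is an 8-point drop, so the seed fires.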
## Troubleshooting
- Port `11434` already in use: stop local Ollama or change the AppHost port mapping
- No traces or metrics: verify the Aspire Dashboard is running and the OTLP endpoint is reachable
- Model not found: run the `ollama pull ...` command inside the container
- CLI or API calls fail: copy the exact API endpoint from the Aspire Dashboard (`llm-api` -> Endpoints)
## Verified vs Opinion
This section matters because observability advice often mixes hard facts with personal workflow.
Verified (reproducible in this repo):
- the scenarios (healthy, delay, timeout, real call) are triggered from the Web UI
- the correlation chain exists: metric counters -> `llm.run` traces -> logs with `run_id` and `trace_id`
Opinion (works well for me, but tune as needed):
- the “15-minute” target loop
- the alert thresholds above (they are starter seeds, not universal truth)
- the exact four correlation fields (add more if your system needs them)
## Final Thoughts
The goal is not perfect dashboards. It is shrinking time-to-diagnosis.
If you cannot pivot from a timeout to the exact trace and log lines, you are still guessing.
I used this lab to find a workflow that works for me, and I hope it helps you build an observability pipeline that works for you.
If you run into an issue, open a GitHub issue and I will be happy to help.