LLMs behave like black boxes. You send them a request, hope the prompt is right, hope your agent didn't mutate it, hope the framework packaged it correctly — and then hope the response makes sense.
In simple one-shot queries this usually works fine. But when you're building agents, tools, multi-step workflows, or RAG pipelines, it becomes very hard to see what the model is actually receiving. A single unexpected message, parameter, or system prompt change can shift the entire run.
Today we're introducing breakpoint debugging for LLM requests in vLLora, a feature that makes all of this visible and editable.
Here's what debugging looks like in practice.
Breakpoint Debugging for LLM Requests
vLLora now supports interactive, breakpoint-style debugging for LLM requests. When debugging is enabled, every request pauses before it reaches the model.
You can:
- Inspect the exact request
- Edit anything
- Continue execution normally
This brings a familiar software-engineering workflow ("pause -> inspect -> edit -> continue") to LLM development.
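In practice, little about your code has to change beyond where it sends requests. Here is a minimal sketch, assuming vLLora runs as a local OpenAI-compatible proxy; the base URL, port, and model name below are placeholders, not documented defaults, so check your own setup for the real values:

```python
# A minimal sketch: route an OpenAI-compatible client through a local
# vLLora instance so every request can pause at the breakpoint.
# The base URL, port, and model name are placeholders, not documented
# defaults; check your vLLora configuration for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9090/v1",  # hypothetical local vLLora address
    api_key="not-used-by-the-proxy",      # placeholder credential
)

# With debugging enabled, this call pauses in vLLora before reaching the model.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize today's error logs."}],
)
print(response.choices[0].message.content)
```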
Why We Built This
If you've built anything beyond a simple chat interface, you've likely hit one of these:
- Silent tool-call failures (wrong name / bad params / malformed JSON)
- Overloaded or corrupted context / RAG input leading to hallucination or truncation
- Error accumulation and state drift in long or multi-step workflows
- Lack of visibility: standard logs rarely show the actual request sent to the model
It is difficult to fix these issues without proper observability. Breakpoint debugging changes that.
What Happens When a Request Pauses
When vLLora intercepts a request right before it's sent, you get a real-time snapshot of:
- The selected model
- Full message array (system, user, assistant)
- Parameters like temperature or max tokens
- Any tool definitions
- Any extra fields and headers your framework injected
This is the full request payload your application is about to send — not what you assume it's sending.
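For a concrete sense of what that payload contains, here's an illustrative OpenAI-style request. Every field value below is invented for the example; real payloads depend on your app and framework:

```python
# Illustrative only: the kind of payload the debugger surfaces when a
# request pauses. All values here are made up for the example.
paused_request = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a support triage agent."},
        {"role": "user", "content": "Customer reports login failures since 9am."},
    ],
    "temperature": 0.2,
    "max_tokens": 512,
    "tools": [{
        "type": "function",
        "function": {
            "name": "search_tickets",
            "description": "Search recent support tickets by keyword.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }],
    # Extra fields a framework might inject, often without you noticing:
    "metadata": {"run_id": "demo-123"},
}
```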
Edit Anything
Click Edit and the payload becomes modifiable. You can adjust:
- Message content
- System prompts
- Model name
- Parameters
- Tool definitions
- Metadata
This affects only the current request. Your application code stays untouched.
It's a fast way to validate fixes, test ideas, and confirm what the agent should have sent.
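For instance, here are the kinds of one-off changes you might make in the Edit view, written as equivalent payload mutations. All names and values are illustrative:

```python
# Sketch of typical one-off edits you might make in the Edit view, shown
# as payload mutations. Values are illustrative; only this request changes.
paused_request = {
    "model": "gpt-4o-mini",
    "temperature": 0.9,
    "messages": [{"role": "system", "content": "You are a support triage agent."}],
}

paused_request["temperature"] = 0.0   # make this one step deterministic
paused_request["model"] = "gpt-4o"    # try a stronger model for this call only
paused_request["messages"][0]["content"] += " Always cite the ticket ID you used."
```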
Continue the Workflow
When you click Continue, vLLora:
- Sends your edited request to the model
- Receives the real response
- Passes it back to your application
- Resumes the workflow as if nothing unusual happened
The workflow then proceeds using the response to your edited request, and the agent treats it exactly like any other model response.
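Conceptually, the whole cycle behaves like the sketch below. This is not vLLora's actual implementation, just the shape of the pause, inspect, edit, continue contract:

```python
# Conceptual model of the breakpoint flow; not vLLora's actual internals.
from typing import Callable

def breakpoint_flow(
    request: dict,
    edit: Callable[[dict], dict],           # what you do in the debugger UI
    send_upstream: Callable[[dict], dict],  # the real model call
) -> dict:
    edited = edit(dict(request))  # request is held, inspected, maybe modified
    return send_upstream(edited)  # the edited request goes to the model;
                                  # the app receives the response as usual

# Example: force temperature to 0 for just this request.
response = breakpoint_flow(
    {"model": "gpt-4o-mini", "temperature": 0.9, "messages": []},
    edit=lambda req: {**req, "temperature": 0.0},
    send_upstream=lambda req: {"echoed": req},  # stub standing in for the model
)
print(response)
```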
Why This Matters for Agents
Agents are long-running chains of decisions. Each step can depend on the previous one, and each step can affect the next. Once you're 15 steps deep, you might not know whether:
- The prompt changed
- A system message was overwritten
- A parameter was set differently than expected
- The context blew up
- A tool schema got mutated
With breakpoint debugging:
- You catch drift early
- You see exactly what the model receives
- You fix issues in seconds
- You avoid rerunning long multi-step workflows
- You test prompt or parameter changes instantly
For deep, multi-step agents, this turns debugging from rerunning and guessing into direct inspection.
Closing Thoughts
Debugging LLM systems has mostly been tedious guesswork. Breakpoint mode gives you a clear view into what's happening and a way to correct issues as they occur.
If you need to understand or fix what an agent is sending, this is the most direct way to do it.
Read the docs: Debugging LLM Requests
Try it locally: Quickstart
Join the community: https://join.slack.com/t/vllora/shared_invite/zt-2haf5kj6a-d7NX6TFJUPX45w~Ag4dzlg