LLMs behave like black boxes. You send them a request, hope the prompt is right, hope your agent didn't mutate it, hope the framework packaged it correctly — and then hope the response makes sense.
In simple one-shot queries this usually works fine. But when you're building agents, tools, multi-step workflows, or RAG pipelines, it becomes very hard to see what the model is actually receiving. A single unexpected message, parameter, or system prompt change can shift the entire run.
Today we're introducing breakpoint debugging for LLM requests in vLLora, a feature that makes all of this visible and editable.
Here's what debugging looks like in practice.
Breakpoint Debugging for LLM Requests
vLLora now supports interactive, breakpoint-style debugging for LLM requests. When debugging is enabled, every request pauses before it reaches the model.
You can:
- Inspect the exact request
- Edit anything
- Continue execution normally
This brings a familiar software-engineering workflow ("pause -> inspect -> edit -> continue") to LLM development.
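In practice, little about your code has to change beyond where it sends requests. Here is a minimal sketch, assuming vLLora runs as a local OpenAI-compatible proxy; the base URL, port, and model name below are placeholders, not documented defaults, so check your own setup for the real values:

```python
# A minimal sketch: route an OpenAI-compatible client through a local
# vLLora instance so every request can pause at the breakpoint.
# The base URL, port, and model name are placeholders, not documented
# defaults; check your vLLora configuration for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9090/v1",  # hypothetical local vLLora address
    api_key="not-used-by-the-proxy",      # placeholder credential
)

# With debugging enabled, this call pauses in vLLora before reaching the model.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize today's error logs."}],
)
print(response.choices[0].message.content)
```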
Why We Built This
If you've built anything beyond a simple chat interface, you've likely hit one of these:
- Silent tool-call failures (wrong name / bad params / malformed JSON)
- Overloaded or corrupted context / RAG input leading to hallucination or truncation
- Error accumulation and state drift in long or multi-step workflows
- Lack of visibility: standard logs rarely show the actual request sent to the model
It is difficult to fix these issues without proper observability. Breakpoint debugging changes that.
What Happens When a Request Pauses
When vLLora intercepts a request right before it's sent, you get a real-time snapshot of:
- The selected model
- Full message array (system, user, assistant)
- Parameters like temperature or max tokens
- Any tool definitions
- Any extra fields and headers your framework injected
This is the full request payload your application is about to send — not what you assume it's sending.
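For a concrete sense of what that payload contains, here's an illustrative OpenAI-style request. Every field value below is invented for the example; real payloads depend on your app and framework:

```python
# Illustrative only: the kind of payload the debugger surfaces when a
# request pauses. All values here are made up for the example.
paused_request = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a support triage agent."},
        {"role": "user", "content": "Customer reports login failures since 9am."},
    ],
    "temperature": 0.2,
    "max_tokens": 512,
    "tools": [{
        "type": "function",
        "function": {
            "name": "search_tickets",
            "description": "Search recent support tickets by keyword.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }],
    # Extra fields a framework might inject, often without you noticing:
    "metadata": {"run_id": "demo-123"},
}
```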
Edit Anything
Click Edit and the payload becomes modifiable. You can adjust:
- Message content
- System prompts
- Model name
- Parameters
- Tool definitions
- Metadata
This affects only the current request. Your application code stays untouched.
It's a fast way to validate fixes, test ideas, and confirm what the agent should have sent.
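For instance, here are the kinds of one-off changes you might make in the Edit view, written as equivalent payload mutations. All names and values are illustrative:

```python
# Sketch of typical one-off edits you might make in the Edit view, shown
# as payload mutations. Values are illustrative; only this request changes.
paused_request = {
    "model": "gpt-4o-mini",
    "temperature": 0.9,
    "messages": [{"role": "system", "content": "You are a support triage agent."}],
}

paused_request["temperature"] = 0.0   # make this one step deterministic
paused_request["model"] = "gpt-4o"    # try a stronger model for this call only
paused_request["messages"][0]["content"] += " Always cite the ticket ID you used."
```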
Continue the Workflow
When you click Continue, vLLora:
- Sends your edited request to the model
- Receives the real response
- Passes it back to your application
- Resumes the workflow as if nothing unusual happened
The workflow then proceeds using the response to your edited request, and the agent treats it exactly like any other model response.
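Conceptually, the whole cycle behaves like the sketch below. This is not vLLora's actual implementation, just the shape of the pause, inspect, edit, continue contract:

```python
# Conceptual model of the breakpoint flow; not vLLora's actual internals.
from typing import Callable

def breakpoint_flow(
    request: dict,
    edit: Callable[[dict], dict],           # what you do in the debugger UI
    send_upstream: Callable[[dict], dict],  # the real model call
) -> dict:
    edited = edit(dict(request))  # request is held, inspected, maybe modified
    return send_upstream(edited)  # the edited request goes to the model;
                                  # the app receives the response as usual

# Example: force temperature to 0 for just this request.
response = breakpoint_flow(
    {"model": "gpt-4o-mini", "temperature": 0.9, "messages": []},
    edit=lambda req: {**req, "temperature": 0.0},
    send_upstream=lambda req: {"echoed": req},  # stub standing in for the model
)
print(response)
```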
Why This Matters for Agents
Agents are long-running chains of decisions. Each step can depend on the previous one, and each step can affect the next. Once you're 15 steps deep, you might not know whether:
- The prompt changed
- A system message was overwritten
- A parameter was set differently than expected
- The context blew up
- A tool schema got mutated
With breakpoint debugging:
- You catch drift early
- You see exactly what the model receives
- You fix issues in seconds
- You avoid rerunning long multi-step workflows
- You test prompt or parameter changes instantly
For deep, multi-step agents, this turns debugging from rerunning and guessing into direct inspection.
Closing Thoughts
Debugging LLM systems has mostly been tedious guesswork. Breakpoint mode gives you a clear view into what's happening and a way to correct issues as they occur.
If you need to understand or fix what an agent is sending, this is the most direct way to do it.
Read the docs: Debugging LLM Requests
Try it locally: Quickstart
Join the community: https://join.slack.com/t/vllora/shared_invite/zt-2haf5kj6a-d7NX6TFJUPX45w~Ag4dzlg