I needed an LLM that provides deterministic output, to make it easier to test and develop against an OpenAI-style API endpoint. The result was the MockLLM project.
Deterministic output from an LLM API is useful in scenarios where consistency is key, such as unit and regression testing, and debugging.
So let's look at a few examples.
Variable handling
What is 5 + 7?
A desired deterministic output could be:
The sum of 5 and 7 is 12.
However, what if the model occasionally responds with 12 and other times with Twelve? That could well break a typed language expecting an integer; or rather, I might want to test that such a regression does not occur.
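Here is a minimal pytest-style sketch of that regression test. It assumes a MockLLM server is already running on http://localhost:8000 (setup is shown further down) and that responses.yml maps "what is 5 + 7?" to "The sum of 5 and 7 is 12."; the test name and prompt are illustrative.

# Minimal pytest-style sketch (hypothetical): assumes a MockLLM server on
# localhost:8000 with a responses.yml entry mapping "what is 5 + 7?"
# to "The sum of 5 and 7 is 12."
import re
import requests

def test_sum_comes_back_as_a_digit():
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "mock-llm",
            "messages": [{"role": "user", "content": "what is 5 + 7?"}],
        },
        timeout=10,
    )
    content = resp.json()["choices"][0]["message"]["content"]
    # The answer should end with the digits 12, never the word "Twelve"
    numbers = [int(n) for n in re.findall(r"\d+", content)]
    assert numbers and numbers[-1] == 12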
AI Code Generation (e.g., GitHub Copilot, Cursor, etc.)
We could test for a FIM-style (fill-in-the-middle) completion:
def factorial(n):
A desired deterministic output could then be:
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
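With a deterministic completion, the generated code itself can be exercised as a regression check. A small sketch, where generated_code simply stands in for the content field the mock would return:

# Sketch: exercise the deterministic completion as if it were freshly
# generated code; generated_code stands in for the returned content field.
generated_code = (
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    return n * factorial(n - 1)"
)
namespace = {}
exec(compile(generated_code, "<generated>", "exec"), namespace)  # must be valid Python
assert namespace["factorial"](5) == 120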
Lastly, anyone who has tried to get an LLM to stick to a specific output format such as JSON or YAML, without the unrequested "Certainly!..." style persona creeping in, will know this one well.
JSON serialization conformance
How can I reset my password?
Deterministic JSON Output:
{
  "question": "How can I reset my password?",
  "response": "To reset your password, click on 'Forgot Password' on the login page, enter your email, and follow the instructions in the email you receive.",
  "timestamp": "2025-02-19T12:00:00Z",
  "source": "AI Customer Support"
}
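In a test, that blob can then be validated strictly. A minimal sketch; assert_support_payload is a hypothetical helper, not part of MockLLM:

# Hypothetical schema check for the support payload above.
import json

def assert_support_payload(raw: str) -> dict:
    payload = json.loads(raw)  # raises if a "Certainly!..." preamble sneaks in
    expected_keys = {"question", "response", "timestamp", "source"}
    assert expected_keys <= payload.keys(), f"missing keys: {expected_keys - payload.keys()}"
    return payload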
Let's have a look at how MockLLM could help us here.
Install and use MockLLM
Create a virtualenv and install mockllm:
python3 -m venv venv && source venv/bin/activate
pip install mockllm
Create a responses.yml
responses:
  "write a python function to calculate factorial": |
    {
      "prompt": "Write a Python function to calculate factorial.",
      "response": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)",
      "language": "Python",
      "version": "1.0",
      "generated_at": "2025-02-19T12:05:00Z"
    }

defaults:
  unknown_response: "I don't know the answer to that."
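Before starting the server, it can be worth sanity-checking the file. A small sketch using PyYAML (an extra dependency, not something MockLLM itself requires) that confirms any JSON-looking response actually parses:

# Hypothetical lint step: load responses.yml and confirm that any response
# which looks like JSON actually parses, catching quoting mistakes in the
# block scalars before the server ever serves them.
import json
import yaml  # PyYAML

with open("responses.yml") as fh:
    config = yaml.safe_load(fh)

for prompt, response in config["responses"].items():
    text = response.strip()
    if text.startswith("{"):
        json.loads(text)  # raises on malformed JSON
        print(f"OK: {prompt!r} returns valid JSON")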
We can then call the API as we would with any other provider:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "write a python function to calculate factorial"}
    ]
  }' | jq
Note: jq is a simple tool for displaying JSON in a nicer way. Get it here.
We then get back a valid payload, complete with token usage, id and timestamp:
{
  "id": "mock-9b4ea717-9e1e-4111-984e-204c0c1d1705",
  "object": "chat.completion",
  "created": 1739961216,
  "model": "mock-llm",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 14,
    "total_tokens": 22
  }
}
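Because the endpoint mirrors the OpenAI schema, the official openai Python client can also be pointed straight at it. A sketch; the API key is a placeholder, on the assumption that MockLLM does not check it:

# Sketch: the openai client talking to the mock instead of api.openai.com.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "write a python function to calculate factorial"}],
)
print(completion.choices[0].message.content)
print(completion.usage.total_tokens)  # token counts are deterministic too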
As for other providers, there is also support for Anthropic:
curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'
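The same endpoint can be exercised from Python with the anthropic SDK by pointing its base_url at the mock. A sketch, assuming MockLLM accepts the request shape shown in the curl above and does not validate the key:

# Sketch: the anthropic client pointed at the mock server.
from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:8000", api_key="not-needed")

message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=256,  # required by the SDK, even against a mock
    messages=[{"role": "user", "content": "what colour is the sky?"}],
)
print(message.content[0].text)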
To get a streamed response, add "stream": true to the request:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
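From Python, the same stream can be consumed with the openai client; again a sketch, with base_url and a placeholder key as the only non-default settings:

# Sketch: consuming the mock's streamed chunks with the openai client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "what colour is the sky?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()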
Last of all, if you want some realism, use the Network Lag Simulator.
Add the following to your responses.yml:
settings:
  lag_enabled: true
  lag_factor: 5  # Higher values = faster responses (10 = fast, 1 = slow)
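A quick way to see the lag in effect is to time a request; a throwaway sketch:

# Hypothetical timing check: with lag_enabled, the round trip should slow
# down as lag_factor decreases.
import time
import requests

start = time.perf_counter()
requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mock-llm",
        "messages": [{"role": "user", "content": "what colour is the sky?"}],
    },
    timeout=30,
)
print(f"round trip took {time.perf_counter() - start:.2f}s")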
If you want to learn more and do a lot of LLM-based coding, check out CodeGate for secure AI code generation, model routing, and prompt management.