I needed an LLM that provides deterministic output, to make it easier to test and develop against an OpenAI-style API endpoint. The result was the MockLLM project.
Deterministic output from an LLM API is useful in scenarios where consistency is key, such as unit and regression testing, and debugging.
So let's look at an example.
Variable handling
What is 5 + 7?
A desired deterministic output could be:
The sum of 5 and 7 is 12.
However, what if the model occasionally responds with 12 and other times with Twelve? That could well break a typed language expecting an integer; or rather, I might want to test that such a regression does not occur.
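As a rough sketch, a regression test for exactly that could look like the snippet below. It assumes an OpenAI-compatible mock endpoint on localhost:8000 with a canned answer for this prompt; the base URL, the mock-llm model name and the placeholder API key simply mirror the MockLLM examples further down, and MockLLM is not expected to validate the key.

import re

from openai import OpenAI

# Point the standard OpenAI client at the mock endpoint instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")


def test_sum_is_a_numeral_not_a_word():
    reply = client.chat.completions.create(
        model="mock-llm",
        messages=[{"role": "user", "content": "What is 5 + 7?"}],
    )
    content = reply.choices[0].message.content
    assert re.search(r"\b12\b", content)    # the numeral is present
    assert "twelve" not in content.lower()  # the word form never sneaks in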
AI Code Generation (e.g. GitHub Copilot, Cursor, etc.)
We could test a FIM-style (fill-in-the-middle) completion:
def factorial(n):
A desired deterministic output could then be:
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
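A hedged sketch of how that could be verified: with the completion pinned, a test can both compile the returned code and assert the recursive call is present. Again, the endpoint, model name and canned response are assumptions; the mock would need a responses entry keyed on this prompt.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")


def test_factorial_completion_is_valid_python():
    reply = client.chat.completions.create(
        model="mock-llm",
        messages=[{"role": "user", "content": "def factorial(n):"}],
    )
    content = reply.choices[0].message.content
    compile(content, "<completion>", "exec")  # raises SyntaxError if the code is malformed
    assert "factorial(n - 1)" in content      # the recursion is there, exactly as configured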
Lastly, anyone who has tried to get an LLM to conform to a specific output format such as JSON or YAML, without the unrequested "Certainly!..." style LLM persona, will know this one well.
JSON Serialized conformance
How can I reset my password?
Deterministic JSON Output:
{
  "question": "How can I reset my password?",
  "response": "To reset your password, click on 'Forgot Password' on the login page, enter your email, and follow the instructions in the email you receive.",
  "timestamp": "2025-02-19T12:00:00Z",
  "source": "AI Customer Support"
}
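A sketch of a conformance check for that reply: parse the content as JSON and assert the expected keys, so any chatty preamble fails the test immediately (same assumptions as before about the mock endpoint and its configured response).

import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")


def test_reply_is_valid_json_with_expected_keys():
    reply = client.chat.completions.create(
        model="mock-llm",
        messages=[{"role": "user", "content": "How can I reset my password?"}],
    )
    payload = json.loads(reply.choices[0].message.content)  # fails on any "Certainly!..." preamble
    assert set(payload) >= {"question", "response", "timestamp", "source"}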
Let's have a look at how MockLLM can help us here.
Install and use MockLLM
Create a virtualenv and install MockLLM:
python3 -m venv venv && source venv/bin/activate
pip install mockllm
Create a responses.yml:
responses:
  "write a python function to calculate factorial": |
    {
      "prompt": "Write a Python function to calculate factorial.",
      "response": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)",
      "language": "Python",
      "version": "1.0",
      "generated_at": "2025-02-19T12:05:00Z"
    }
defaults:
  unknown_response: "I don't know the answer to that."
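As an optional aside, a few lines of Python can sanity-check the file before pointing the server at it. This sketch assumes PyYAML (pip install pyyaml) and is not part of MockLLM itself.

import yaml

# Load the mock configuration and check the two top-level sections used above.
with open("responses.yml") as f:
    config = yaml.safe_load(f)

assert "responses" in config, "missing top-level 'responses' mapping"
assert "unknown_response" in config.get("defaults", {}), "missing defaults.unknown_response"
for prompt in config["responses"]:
    print(f"configured prompt: {prompt!r}")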
With the MockLLM server running locally, we can then call the API as we would any other provider:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "write a python function to calculate factorial"}
    ]
  }' | jq
Note: jq is a simple tool that displays JSON in a nicer way. Get it here.
We then get back a valid payload, complete with token usage, id, and timestamp:
{
  "id": "mock-9b4ea717-9e1e-4111-984e-204c0c1d1705",
  "object": "chat.completion",
  "created": 1739961216,
  "model": "mock-llm",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 14,
    "total_tokens": 22
  }
}
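The same request can also be made from Python with the official OpenAI SDK by overriding the base URL. This is a sketch; the placeholder API key assumes MockLLM does not validate it.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "write a python function to calculate factorial"}],
)
print(reply.choices[0].message.content)
print(reply.usage)  # prompt/completion/total token counts, as in the curl output above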
As for other providers, there is also support for Anthropic's Messages API:
curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'
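For completeness, here is a hedged sketch of the same call through the Anthropic Python SDK (pip install anthropic), again with a placeholder key and the base URL pointed at the mock; how closely MockLLM mirrors the full Messages API response shape is worth verifying yourself.

from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:8000", api_key="not-needed")

message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=256,  # required by the SDK, even against a mock backend
    messages=[{"role": "user", "content": "what colour is the sky?"}],
)
print(message.content)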
To get a streamed response, add "stream": true to the request:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
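Consuming the stream from Python looks like the sketch below, assuming MockLLM emits standard chat-completion chunks as the curl example suggests.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "what colour is the sky?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta; guard against empty chunks just in case.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()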
Last of all, if you want some realism, use the Network Lag Simulator.
Add the following to your responses.yml:
settings:
  lag_enabled: true
  lag_factor: 5  # Higher values = faster responses (10 = fast, 1 = slow)
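A quick way to see the lag simulator in action is to time a request with lag_enabled on and off; the sketch below uses the requests library, and the absolute numbers will of course vary.

import time

import requests

payload = {
    "model": "mock-llm",
    "messages": [{"role": "user", "content": "write a python function to calculate factorial"}],
}

start = time.perf_counter()
requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=30)
print(f"round trip took {time.perf_counter() - start:.2f}s")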
Want to learn more, and do a lot of LLM-based coding? Check out CodeGate for secure AI code generation, model routing, and prompt management.