MockLLM, a simulated Large Language Model API for development and testing


I needed an LLM that provides deterministic output, to make it easier to test and develop against an OpenAI-style API endpoint. The result is the project MockLLM.

Deterministic output from an LLM API is useful in scenarios where consistency is key, such as unit and regression testing and debugging.

So let's look at some examples.

Variable handling

What is 5 + 7?

A desired deterministic output could be:

The sum of 5 and 7 is 12.

However, what if the model occasionally responds with 12 and other times with Twelve? That could well break a type-based consumer expecting an integer; or rather, I might want to test that such a regression does not occur.
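
That kind of check is easy to pin down with a test. Here is a minimal sketch using requests, assuming the mock server (set up below) is running on localhost:8000 with a canned answer for this prompt in its responses.yml:

import re
import requests

def test_sum_is_numeric():
    # Query the mock endpoint exactly as a real client would.
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "mock-llm",
            "messages": [{"role": "user", "content": "What is 5 + 7?"}],
        },
        timeout=10,
    )
    content = resp.json()["choices"][0]["message"]["content"]
    # Fail if the answer ever drifts to "Twelve" or another non-digit form.
    numbers = re.findall(r"\d+", content)
    assert numbers, f"no numeric answer in: {content!r}"
    # The stated result is the last number in the sentence.
    assert int(numbers[-1]) == 12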

AI Code Generation (e.g. GitHub Copilot, Cursor)

We could test for FIM-style (fill-in-the-middle) completion:

def factorial(n):

A desired deterministic output could then be:

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
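
With output this stable, the completion can be verified end to end. Here is a minimal sketch that executes the returned source and asserts its behaviour (the completion string is inlined for illustration):

def check_factorial_completion(source: str) -> None:
    # Run the returned code in an isolated namespace.
    namespace = {}
    exec(source, namespace)
    factorial = namespace["factorial"]
    # With a deterministic mock, these assertions never flake.
    assert factorial(0) == 1
    assert factorial(5) == 120

check_factorial_completion(
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    return n * factorial(n - 1)"
)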

Lastly, anyone who has tried to get an LLM to conform to a specific output format such as JSON or YAML, without the unrequested "Certainly!..." style LLM persona, will know this one well.

JSON serialization conformance

How can I reset my password?

Deterministic JSON Output:

{
  "question": "How can I reset my password?",
  "response": "To reset your password, click on 'Forgot Password' on the login page, enter your email, and follow the instructions in the email you receive.",
  "timestamp": "2025-02-19T12:00:00Z",
  "source": "AI Customer Support"
}
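
A mock makes that conformance testable. A minimal sketch that asserts the content parses as strict JSON and carries the expected fields (the field names mirror the payload above):

import json

def assert_strict_json(content: str) -> None:
    # json.loads raises ValueError on any "Certainly!..." preamble
    # or trailing chatter around the payload.
    payload = json.loads(content)
    for field in ("question", "response", "timestamp", "source"):
        assert field in payload, f"missing field: {field}"

assert_strict_json(
    '{"question": "How can I reset my password?", '
    '"response": "Click on Forgot Password on the login page.", '
    '"timestamp": "2025-02-19T12:00:00Z", "source": "AI Customer Support"}'
)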

Let's have a look at how MockLLM can help us here.

Install and use MockLLM

Create a virtualenv and install MockLLM:

python3 -m venv venv && source venv/bin/activate
pip install mockllm

Create a responses.yml:

responses:
  "write a python function to calculate factorial": |
    {
      "prompt": "Write a Python function to calculate factorial.",
      "response": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)",
      "language": "Python",
      "version": "1.0",
      "generated_at": "2025-02-19T12:05:00Z"
    }


defaults:
  unknown_response: "I don't know the answer to that."

With the MockLLM server running locally (see the project README for how to start it), we can then call the API as we would any other provider:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "write a python function to calculate factorial"}
    ]
  }' | jq

Note: jq is a simple command-line tool for displaying JSON in a nicer way.

We then get back a valid payload, complete with token usage, an id, and a created timestamp:

{
  "id": "mock-9b4ea717-9e1e-4111-984e-204c0c1d1705",
  "object": "chat.completion",
  "created": 1739961216,
  "model": "mock-llm",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 14,
    "total_tokens": 22
  }
}
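
Because the endpoint is OpenAI-compatible, the official Python SDK can be pointed at it as well. A minimal sketch, assuming no authentication is enforced so a placeholder key will do:

from openai import OpenAI

# base_url redirects the SDK to the local mock server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="mock-key")

completion = client.chat.completions.create(
    model="mock-llm",
    messages=[
        {"role": "user", "content": "write a python function to calculate factorial"}
    ],
)
print(completion.choices[0].message.content)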

As for other providers, there is also support for Anthropic's Messages API:

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'
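
The Anthropic Python SDK can be redirected in the same way. A minimal sketch, again assuming a placeholder key is accepted and the mock returns an Anthropic-shaped payload:

from anthropic import Anthropic

# Point the SDK at the mock server instead of api.anthropic.com.
client = Anthropic(base_url="http://localhost:8000", api_key="mock-key")

message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=256,  # required by the Messages API schema
    messages=[{"role": "user", "content": "what colour is the sky?"}],
)
# The reply text lives in the first content block.
print(message.content[0].text)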

To get a streamed response, add "stream": true to the request body:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
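
On the client side, the same flag works through the OpenAI SDK, which then yields the response as incremental chunks. A minimal sketch:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="mock-key")

stream = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "what colour is the sky?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the response text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()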

Last of all, if you want some realism, use the Network Lag Simulator.

Add the following to your responses.yml:

settings:
  lag_enabled: true
  lag_factor: 5  # Higher values = faster responses (10 = fast, 1 = slow)
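
A quick way to see the lag in effect is to time a round trip. A minimal sketch using requests:

import time
import requests

start = time.monotonic()
requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mock-llm",
        "messages": [{"role": "user", "content": "what colour is the sky?"}],
    },
    timeout=30,
)
# With lag_enabled: true, the elapsed time grows as lag_factor shrinks.
print(f"round trip took {time.monotonic() - start:.2f}s")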

If you want to learn more and do a lot of LLM-based coding, check out CodeGate for secure AI code generation, model routing, and prompt management.
