MockLLM, a simulated Large Language Model API for development and testing


I needed an LLM that provides deterministic output, to make it easier to test and develop against an OpenAI-style API endpoint. The result is the project MockLLM.

Deterministic output from an LLM API is useful in scenarios where consistency is key, such as unit and regression testing and debugging.

So let's look at some examples.

Variable handling

What is 5 + 7?

A desired deterministic output could be:

The sum of 5 and 7 is 12.

However, what if the model occasionally responds with 12 and other times with Twelve? That could well break a type-based consumer expecting an integer; or rather, I might want to test that such a regression does not occur.
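
That kind of check is easy to pin down with a test. Here is a minimal sketch using requests, assuming the mock server (set up below) is running on localhost:8000 with a canned answer for this prompt in its responses.yml:

import re
import requests

def test_sum_is_numeric():
    # Query the mock endpoint exactly as a real client would.
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "mock-llm",
            "messages": [{"role": "user", "content": "What is 5 + 7?"}],
        },
        timeout=10,
    )
    content = resp.json()["choices"][0]["message"]["content"]
    # Fail if the answer ever drifts to "Twelve" or another non-digit form.
    numbers = re.findall(r"\d+", content)
    assert numbers, f"no numeric answer in: {content!r}"
    # The stated result is the last number in the sentence.
    assert int(numbers[-1]) == 12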

AI Code Generation (e.g. GitHub Copilot, Cursor)

We could test for FIM-style (fill-in-the-middle) completion:

def factorial(n):

A desired deterministic output could then be:

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
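
With output this stable, the completion can be verified end to end. Here is a minimal sketch that executes the returned source and asserts its behaviour (the completion string is inlined for illustration):

def check_factorial_completion(source: str) -> None:
    # Run the returned code in an isolated namespace.
    namespace = {}
    exec(source, namespace)
    factorial = namespace["factorial"]
    # With a deterministic mock, these assertions never flake.
    assert factorial(0) == 1
    assert factorial(5) == 120

check_factorial_completion(
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    return n * factorial(n - 1)"
)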

Lastly, anyone who has tried to get an LLM to conform to a specific output format such as JSON or YAML, without the unrequested "Certainly!..." style LLM persona, will know this one well.

JSON serialization conformance

How can I reset my password?

Deterministic JSON Output:

{
  "question": "How can I reset my password?",
  "response": "To reset your password, click on 'Forgot Password' on the login page, enter your email, and follow the instructions in the email you receive.",
  "timestamp": "2025-02-19T12:00:00Z",
  "source": "AI Customer Support"
}
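
A mock makes that conformance testable. A minimal sketch that asserts the content parses as strict JSON and carries the expected fields (the field names mirror the payload above):

import json

def assert_strict_json(content: str) -> None:
    # json.loads raises ValueError on any "Certainly!..." preamble
    # or trailing chatter around the payload.
    payload = json.loads(content)
    for field in ("question", "response", "timestamp", "source"):
        assert field in payload, f"missing field: {field}"

assert_strict_json(
    '{"question": "How can I reset my password?", '
    '"response": "Click on Forgot Password on the login page.", '
    '"timestamp": "2025-02-19T12:00:00Z", "source": "AI Customer Support"}'
)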

Let's have a look at how MockLLM can help us here.

Install and use MockLLM

Create a virtualenv and install MockLLM:

python3 -m venv venv && source venv/bin/activate
pip install mockllm

Create a responses.yml:

responses:
  "write a python function to calculate factorial": |
    {
      "prompt": "Write a Python function to calculate factorial.",
      "response": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)",
      "language": "Python",
      "version": "1.0",
      "generated_at": "2025-02-19T12:05:00Z"
    }


defaults:
  unknown_response: "I don't know the answer to that."

With the MockLLM server running locally (see the project README for how to start it), we can then call the API as we would any other provider:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "write a python function to calculate factorial"}
    ]
  }' | jq

Note: jq is a simple command-line tool for displaying JSON in a nicer way.

We then get back a valid payload, complete with token usage, an id, and a created timestamp:

{
  "id": "mock-9b4ea717-9e1e-4111-984e-204c0c1d1705",
  "object": "chat.completion",
  "created": 1739961216,
  "model": "mock-llm",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 14,
    "total_tokens": 22
  }
}
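
Because the endpoint is OpenAI-compatible, the official Python SDK can be pointed at it as well. A minimal sketch, assuming no authentication is enforced so a placeholder key will do:

from openai import OpenAI

# base_url redirects the SDK to the local mock server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="mock-key")

completion = client.chat.completions.create(
    model="mock-llm",
    messages=[
        {"role": "user", "content": "write a python function to calculate factorial"}
    ],
)
print(completion.choices[0].message.content)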

As for other providers, there is also support for Anthropic's Messages API:

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'
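
The Anthropic Python SDK can be redirected in the same way. A minimal sketch, again assuming a placeholder key is accepted and the mock returns an Anthropic-shaped payload:

from anthropic import Anthropic

# Point the SDK at the mock server instead of api.anthropic.com.
client = Anthropic(base_url="http://localhost:8000", api_key="mock-key")

message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=256,  # required by the Messages API schema
    messages=[{"role": "user", "content": "what colour is the sky?"}],
)
# The reply text lives in the first content block.
print(message.content[0].text)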

To get a streamed response, add "stream": true to the request body:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
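
On the client side, the same flag works through the OpenAI SDK, which then yields the response as incremental chunks. A minimal sketch:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="mock-key")

stream = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "what colour is the sky?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the response text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()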

Last of all, if you want some realism, use the Network Lag Simulator.

Add the following to your responses.yml:

settings:
  lag_enabled: true
  lag_factor: 5  # Higher values = faster responses (10 = fast, 1 = slow)
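
A quick way to see the lag in effect is to time a round trip. A minimal sketch using requests:

import time
import requests

start = time.monotonic()
requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mock-llm",
        "messages": [{"role": "user", "content": "what colour is the sky?"}],
    },
    timeout=30,
)
# With lag_enabled: true, the elapsed time grows as lag_factor shrinks.
print(f"round trip took {time.monotonic() - start:.2f}s")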

If you want to learn more and do a lot of LLM-based coding, check out CodeGate for secure AI code generation, model routing, and prompt management.
