GenAI App with Prompt Templates and Role Switching (Part 4)

In Part 3, we built a FastAPI backend that could talk to a local LLM through Docker Model Runner. In Part 4, we move forward by adding:

  • Prompt Templates
  • Role Switching (system/user roles)
  • A simple HTML UI, all running on your machine

Why This Matters

Prompt engineering isn't just about text; it's about structure. This app lets you:

  • Try different system roles like "You are a data science tutor" or "You are a sarcastic assistant"
  • Apply templates around your prompt
  • Experiment with how prompt structure changes output
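To make that concrete, here is a minimal sketch of how the backend could wire a template and a switchable system role into a single chat request. Everything named here (`ROLE_PRESETS`, `PROMPT_TEMPLATE`, the `/generate` route, the Model Runner URL) is an illustrative assumption, not necessarily the exact part4 code:

```python
# main.py (sketch): prompt templates + role switching on top of Docker
# Model Runner's OpenAI-compatible API. ROLE_PRESETS, PROMPT_TEMPLATE, and
# the /generate route are illustrative; the URL assumes the default TCP
# port 12434 on the host (from inside a container you would typically use
# http://model-runner.docker.internal/engines/v1/... instead).
from fastapi import FastAPI
from pydantic import BaseModel
import httpx

app = FastAPI()

MODEL_RUNNER_URL = "http://localhost:12434/engines/v1/chat/completions"

ROLE_PRESETS = {
    "tutor": "You are a data science tutor.",
    "sarcastic": "You are a sarcastic assistant.",
}

PROMPT_TEMPLATE = "Answer the following clearly and concisely:\n\n{prompt}"


class GenerateRequest(BaseModel):
    role: str = "tutor"
    prompt: str


@app.post("/generate")
async def generate(req: GenerateRequest):
    messages = [
        # The system message carries the selected role...
        {"role": "system", "content": ROLE_PRESETS.get(req.role, ROLE_PRESETS["tutor"])},
        # ...and the user message carries the templated prompt.
        {"role": "user", "content": PROMPT_TEMPLATE.format(prompt=req.prompt)},
    ]
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(
            MODEL_RUNNER_URL,
            json={"model": "ai/mistral", "messages": messages},
        )
    resp.raise_for_status()
    return {"response": resp.json()["choices"][0]["message"]["content"]}
```

The key point is that switching roles only ever changes the system message; the user's text always passes through the same template, which is what makes it easy to compare outputs across roles.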

Project Layout

```
docker-llm-fastapi-app/
├── app/
│   ├── main.py
│   └── templates/
│       └── index.html
├── Dockerfile
└── docker-compose.yml
```

How to Run

1. Pull & Run the Model (if you haven't already)

```
docker model pull ai/mistral
docker model run ai/mistral
```

Ensure TCP host access is enabled in Docker Desktop under Beta features > Enable Docker Model Runner.
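With that enabled, a quick sanity check from the host (assuming the default TCP port 12434 and the OpenAI-compatible `/engines/v1` path; adjust if you changed the port):

```python
# Check that Docker Model Runner's OpenAI-compatible API answers over TCP.
# Port 12434 and the /engines/v1/models path are the defaults assumed here.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:12434/engines/v1/models") as resp:
    models = json.load(resp)

# ai/mistral should appear in the list if the pull above succeeded
for model in models.get("data", []):
    print(model["id"])
```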

2. Start the Backend API

```
docker compose up --build
```

Open http://localhost:8000 in your browser.
The full code is available here: part4-code


UI Preview

The UI has:

  • A dropdown for system roles
  • A textbox for prompts
  • A Generate button to submit the prompt and fetch a response

All results appear below your prompt, on the same page.

Here is the page:
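Serving that page from FastAPI takes only a few lines with the standard Jinja2 integration. A minimal sketch, assuming the `app/templates` layout shown earlier:

```python
# Serve templates/index.html at the root URL. A minimal sketch that assumes
# the app/templates layout from the project tree above.
from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="app/templates")

@app.get("/")
async def index(request: Request):
    # index.html holds the role dropdown, the prompt textbox, and the
    # Generate button; its script posts to the backend and renders the reply.
    return templates.TemplateResponse("index.html", {"request": request})
```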


Why Is It So Slow?

If your prompts are consistently taking more than a minute, it’s likely due to:

  • Model Size: Even relatively small LLMs like Mistral can take time to load into memory and start processing.

  • Cold Starts: Each prompt might be triggering a cold start if the model isn't staying warm between requests.

  • System Resources: Docker Model Runner currently doesn’t optimize for resource constraints, so if your machine lacks sufficient CPU/RAM or an NVIDIA GPU, performance may suffer.

Suggestions to improve speed:

  • Enable host-side TCP support.
  • Keep the model warm by sending periodic “ping” prompts every few minutes (see the sketch after this list).
  • Try a smaller or quantized model (e.g., ai/smollm2 instead of ai/mistral).
  • Upgrade to a system with more memory or GPU support for Docker Desktop.
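As referenced above, a keep-warm loop can be as small as this sketch; the URL, model name, and five-minute interval are assumptions to tune for your setup:

```python
# keep_warm.py (sketch): send a tiny prompt on an interval so the model
# stays loaded between real requests. URL, model, and interval are assumptions.
import time
import httpx

PING_URL = "http://localhost:12434/engines/v1/chat/completions"

while True:
    try:
        httpx.post(
            PING_URL,
            json={
                "model": "ai/mistral",
                "messages": [{"role": "user", "content": "ping"}],
                "max_tokens": 1,  # keep each warm-up call as cheap as possible
            },
            timeout=120,
        )
    except httpx.HTTPError:
        pass  # best-effort: the next ping will retry
    time.sleep(300)  # ping every 5 minutes
```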

What’s Next

In Part 5, I’ll wrap up this mini-series with a broader, practical guide: "How to Set Up Your Company for Success with Docker"

This won’t just be about local demos; it will cover topics like:

  • Organizing Your Teams with Docker Organizations
  • Enforcing Sign-In and Enabling SSO
  • Standardizing Docker Desktop Configurations

After that, I will be exploring independent Docker topics in more depth. Stay tuned!


Top comments (1)

Salma Aga Shaik

Great post. I liked the simple UI and easy way to try different prompts and roles.