GenAI App with Prompt Templates and Role Switching (Part 4)

In Part 3, we built a FastAPI backend that could talk to a local LLM through Docker Model Runner. In Part 4, we move forward by adding:

  • Prompt Templates
  • Role Switching (system/user roles)
  • A simple HTML UI, all running on your machine

Why This Matters

Prompt engineering isn't just about text; it's about structure. This app lets you:

  • Try different system roles like "You are a data science tutor" or "You are a sarcastic assistant"
  • Apply templates around your prompt
  • Experiment with how prompt structure changes output
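To make that concrete, here is a minimal sketch of how the backend could wire a template and a switchable system role into a single chat request. Everything named here (`ROLE_PRESETS`, `PROMPT_TEMPLATE`, the `/generate` route, the Model Runner URL) is an illustrative assumption, not necessarily the exact part4 code:

```python
# main.py (sketch): prompt templates + role switching on top of Docker
# Model Runner's OpenAI-compatible API. ROLE_PRESETS, PROMPT_TEMPLATE, and
# the /generate route are illustrative; the URL assumes the default TCP
# port 12434 on the host (from inside a container you would typically use
# http://model-runner.docker.internal/engines/v1/... instead).
from fastapi import FastAPI
from pydantic import BaseModel
import httpx

app = FastAPI()

MODEL_RUNNER_URL = "http://localhost:12434/engines/v1/chat/completions"

ROLE_PRESETS = {
    "tutor": "You are a data science tutor.",
    "sarcastic": "You are a sarcastic assistant.",
}

PROMPT_TEMPLATE = "Answer the following clearly and concisely:\n\n{prompt}"


class GenerateRequest(BaseModel):
    role: str = "tutor"
    prompt: str


@app.post("/generate")
async def generate(req: GenerateRequest):
    messages = [
        # The system message carries the selected role...
        {"role": "system", "content": ROLE_PRESETS.get(req.role, ROLE_PRESETS["tutor"])},
        # ...and the user message carries the templated prompt.
        {"role": "user", "content": PROMPT_TEMPLATE.format(prompt=req.prompt)},
    ]
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(
            MODEL_RUNNER_URL,
            json={"model": "ai/mistral", "messages": messages},
        )
    resp.raise_for_status()
    return {"response": resp.json()["choices"][0]["message"]["content"]}
```

The key point is that switching roles only ever changes the system message; the user's text always passes through the same template, which is what makes it easy to compare outputs across roles.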

Project Layout

```
docker-llm-fastapi-app/
├── app/
│   ├── main.py
│   └── templates/
│       └── index.html
├── Dockerfile
└── docker-compose.yml
```

How to Run

1. Pull & Run the Model (if you haven't already)

```
docker model pull ai/mistral
docker model run ai/mistral
```

Ensure TCP host access is enabled in Docker Desktop under Beta features > Enable Docker Model Runner.
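With that enabled, a quick sanity check from the host (assuming the default TCP port 12434 and the OpenAI-compatible `/engines/v1` path; adjust if you changed the port):

```python
# Check that Docker Model Runner's OpenAI-compatible API answers over TCP.
# Port 12434 and the /engines/v1/models path are the defaults assumed here.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:12434/engines/v1/models") as resp:
    models = json.load(resp)

# ai/mistral should appear in the list if the pull above succeeded
for model in models.get("data", []):
    print(model["id"])
```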

2. Start the Backend API

```
docker compose up --build
```

Open http://localhost:8000 in your browser.
The full code is available here: part4-code


UI Preview

The UI has:

  • A dropdown for system roles
  • A textbox for prompts
  • A Generate button to submit the prompt and fetch a response

All results appear below your prompt, on the same page.

Here is the page:
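Serving that page from FastAPI takes only a few lines with the standard Jinja2 integration. A minimal sketch, assuming the `app/templates` layout shown earlier:

```python
# Serve templates/index.html at the root URL. A minimal sketch that assumes
# the app/templates layout from the project tree above.
from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="app/templates")

@app.get("/")
async def index(request: Request):
    # index.html holds the role dropdown, the prompt textbox, and the
    # Generate button; its script posts to the backend and renders the reply.
    return templates.TemplateResponse("index.html", {"request": request})
```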


Why Is It So Slow?

If your prompts are consistently taking more than a minute, it’s likely due to:

  • Model Size: Even relatively small LLMs like Mistral can take time to load into memory and start processing.

  • Cold Starts: Each prompt might be triggering a cold start if the model isn't staying warm between requests.

  • System Resources: Docker Model Runner currently doesn’t optimize for resource constraints, so if your machine lacks sufficient CPU/RAM or an NVIDIA GPU, performance may suffer.

Suggestions to improve speed:

  • Enable host-side TCP support.
  • Keep the model warm by sending periodic “ping” prompts every few minutes (see the sketch after this list).
  • Try a smaller or quantized model (e.g., ai/smollm2 instead of ai/mistral).
  • Upgrade to a system with more memory or GPU support for Docker Desktop.
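As referenced above, a keep-warm loop can be as small as this sketch; the URL, model name, and five-minute interval are assumptions to tune for your setup:

```python
# keep_warm.py (sketch): send a tiny prompt on an interval so the model
# stays loaded between real requests. URL, model, and interval are assumptions.
import time
import httpx

PING_URL = "http://localhost:12434/engines/v1/chat/completions"

while True:
    try:
        httpx.post(
            PING_URL,
            json={
                "model": "ai/mistral",
                "messages": [{"role": "user", "content": "ping"}],
                "max_tokens": 1,  # keep each warm-up call as cheap as possible
            },
            timeout=120,
        )
    except httpx.HTTPError:
        pass  # best-effort: the next ping will retry
    time.sleep(300)  # ping every 5 minutes
```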

What’s Next

In Part 5, I’ll wrap up this mini-series with a broader, practical guide: "How to Set Up Your Company for Success with Docker"

This won’t just be about local demos; it will cover topics like:

  • Organizing Your Teams with Docker Organizations
  • Enforcing Sign-In and Enabling SSO
  • Standardizing Docker Desktop Configurations

After that, I will be exploring independent Docker topics in more depth. Stay tuned!


Top comments (1)

Salma Aga Shaik

Great post. I liked the simple UI and easy way to try different prompts and roles.