Link: devforth.io/lab/chat
No signup needed. Every model available there can be run on your own hardware with vLLM or a similar tool.
This is our playground chat UI where you can test popular open-source models for quality, response latency, decoding speed, RAG summarization, and tool calling.
We built it primarily so our clients can evaluate open-source models on their own tasks, but we're sharing it with the community as well.
You can also set different levels of reasoning_effort.
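Since vLLM exposes an OpenAI-compatible HTTP API, the reasoning effort setting maps onto a field in an ordinary chat-completions request body. A minimal sketch, using only the standard library; the model name, endpoint, and accepted `reasoning_effort` values are assumptions, so check what your vLLM version actually supports:

```python
import json

# Build a chat-completions request body for a self-hosted,
# OpenAI-compatible server such as vLLM. The model name and the
# reasoning_effort value below are illustrative assumptions.
payload = {
    "model": "Qwen/Qwen3-8B",  # hypothetical self-hosted model
    "messages": [
        {"role": "user", "content": "Summarize this document in two sentences."}
    ],
    "reasoning_effort": "high",  # commonly "low", "medium", or "high"
}

body = json.dumps(payload)
print(body)
```

With a local server (e.g. started with `vllm serve <model>`), this body would typically be POSTed to `http://localhost:8000/v1/chat/completions`.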
Please leave a comment if you'd like us to add more models or features.
Top comments (2)
Nice playground. One thing I notice when testing models like these: the prompt matters as much as the model, sometimes more. Same Qwen or DeepSeek with a structured prompt vs a vague one — completely different outputs.
Been building around this idea with flompt (flompt.dev, github.com/Nyrok/flompt) — visual prompt builder that forces you to think through role, constraints, examples, output format as separate pieces. Makes model comparisons a lot more meaningful when the prompt baseline is solid.
No signup, multiple models, tools + RAG testing: this is a gift to the dev community. Finally a way to compare Qwen vs DeepSeek vs others without spinning up infrastructure. Would love to see latency benchmarks across models next. Thanks for sharing!