Juan Denis

I Built My Own DeepSite Alternative That Works with Kimi K2.5, Gemini, or Any Model

So I got tired of DeepSite.

Don't get me wrong, it's a fun tool. Type a prompt, get a website. But every time I used it I kept running into the same problems. You can't pick the model. You can't self-host it. And there's no quality check on what it generates. You just get whatever comes out and deal with it.

I wanted something where I could plug in whatever model I felt like using that day. Kimi K2.5 just dropped? Let me try it. Gemini got an update? Switch it. Claude writing better code this week? Cool, use that instead.

So I built AgentSite. Open source, self-hosted, and instead of one model doing everything, it uses four AI agents working together like an actual team.

Let me show you what I mean. I just built a travel agency website using Kimi K2.5 and the whole thing took one prompt.

Four Agents, Not One

Here's what happens when you hit generate. Instead of one model trying to plan, design, code, and review all at once, AgentSite splits the work:

Your prompt → PM → Designer → Developer ↔ Reviewer → Done

The PM plans the structure. The Designer builds a color palette and typography system. The Developer writes the actual code. And the Reviewer scores everything and sends it back if it's not good enough.

That last part is the thing I haven't seen anywhere else. The Reviewer agent actually checks accessibility, code quality, and visual consistency. If the score is below 7 out of 10, the Developer gets feedback and fixes it. Up to two rounds of revision. Like an actual code review, but automatic.
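If you want a rough mental model of that loop in code, here's a sketch. To be clear, this isn't AgentSite's actual source; `run_agent` and `review_page` are just stand-ins for the real LLM calls:

```python
# Rough sketch of the pipeline idea, not AgentSite's actual internals.
# run_agent() and review_page() are stand-ins for real LLM calls.
from dataclasses import dataclass

MIN_SCORE = 7       # Reviewer threshold described above
MAX_REVISIONS = 2   # up to two revision rounds


@dataclass
class Review:
    score: int
    feedback: str


def run_agent(role: str, context: str) -> str:
    """Stand-in for an LLM call with a role-specific system prompt."""
    return f"[{role} output based on: {context[:40]}...]"


def review_page(code: str) -> Review:
    """Stand-in for the Reviewer scoring accessibility, code quality, consistency."""
    return Review(score=8, feedback="Looks good.")


def generate_page(prompt: str) -> str:
    plan = run_agent("PM", prompt)                  # structure and sections
    design = run_agent("Designer", plan)            # palette and typography
    code = run_agent("Developer", plan + design)    # the actual HTML/CSS/JS

    for _ in range(MAX_REVISIONS):
        result = review_page(code)
        if result.score >= MIN_SCORE:
            break
        # send the Reviewer's feedback back to the Developer and try again
        code = run_agent("Developer", code + "\n" + result.feedback)
    return code


print(generate_page("Travel agency site for trips to Venezuela"))
```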

Getting It Running

The fastest way is straight from GitHub. There's a Deploy on Railway button right in the README. Click it and Railway handles everything. No terminal, no installing stuff locally.

You can also do pip install agentsite and run it locally if you prefer. But for this demo I deployed to Railway and had a live URL running in a couple of minutes.

Setting It Up

Once the app is running you need two things: an API key and a model.

Go to the providers page in settings and paste your key. I added my Moonshot AI key since I wanted to use Kimi K2.5. But you could paste an OpenAI key, Google key, Anthropic key, whatever. You can even add multiple providers and mix them across projects.

Then go to the agents page and set your default model. I set all four agents to moonshot/kimi-k2.5. You could also set different models per agent if you want to get creative, like Kimi for coding and Claude for reviewing.
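Conceptually, the whole setup is just a provider key plus a model string per agent. The field names below are made up for illustration (the real settings live in AgentSite's UI), but the mapping looks roughly like this:

```python
# Illustrative only: the real settings live in AgentSite's UI, and these
# field names are invented for the sake of the example.
settings = {
    "providers": {
        "moonshot": {"api_key": "sk-..."},   # paste your own key
        # "openai": {"api_key": "sk-..."},   # add more providers if you want
    },
    "agents": {
        "pm":        "moonshot/kimi-k2.5",
        "designer":  "moonshot/kimi-k2.5",
        "developer": "moonshot/kimi-k2.5",
        "reviewer":  "moonshot/kimi-k2.5",   # or point this at a different model
    },
}
```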

This is the part DeepSite can't do. New model drops tomorrow? Come back, change the dropdown, done.

Building Ukelele Travels

Alright, the fun part. I created a project called Ukelele Travels, a travel agency that sends people to Venezuela.

The AgentSite dashboard showing my projects including Ukelele Travels

Inside the project you get the detail page where you manage all your pages, see the brand identity, and get an overview of what's been generated.

The Ukelele Travels project detail page

I created a home page, typed my prompt, and let the four agents do their thing. Here's the page builder after generation — chat on the left, live preview on the right.

The Home page builder showing the generated Ukelele Travels site

All of this from one prompt. Kimi K2.5 running through four agents. The PM planned the structure, the Designer picked colors and typography, the Developer wrote the code, and the Reviewer checked it and sent feedback until it passed. No templates, no frameworks, just clean HTML, CSS, and JS.

Tracking Everything

One thing I added that I find really useful is the analytics page. You can see exactly how many tokens each agent used, the cost breakdown, and what happened during each generation.

The analytics page showing token usage and cost breakdown

When you're paying for API calls it's nice to know where the tokens are going. The Reviewer agent barely uses anything compared to the Developer, which makes sense since it's reading and scoring rather than writing full pages of code.
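The math behind that breakdown is simple: input and output tokens times the provider's per-token price, summed per agent. Here's a back-of-the-envelope version with placeholder prices and made-up token counts:

```python
# Back-of-the-envelope cost math, the same idea the analytics page tracks.
# Prices and token counts below are placeholders, not real Moonshot rates.
PRICE_PER_M_INPUT = 0.60    # $ per 1M input tokens (placeholder)
PRICE_PER_M_OUTPUT = 2.50   # $ per 1M output tokens (placeholder)

usage = {
    # agent: (input_tokens, output_tokens) for one generation (made-up numbers)
    "pm":        (1_200, 600),
    "designer":  (1_800, 900),
    "developer": (4_500, 6_000),
    "reviewer":  (7_000, 400),   # reads a lot, writes very little
}

for agent, (inp, out) in usage.items():
    cost = inp / 1e6 * PRICE_PER_M_INPUT + out / 1e6 * PRICE_PER_M_OUTPUT
    print(f"{agent:<10} {inp + out:>6} tokens  ${cost:.4f}")
```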

Why I Think This Approach Is Better

When you ask one model to do everything, you get a compromise. The planning is okay. The design is generic. The code works but it's messy. And nobody checks it.

When you split the work across specialized agents, each one only has to be good at its job. The PM doesn't need to write CSS. The Developer doesn't pick colors. The Reviewer just reviews. And because each agent gets the output from the previous one as context, everything stays consistent.

I've been running different models through this pipeline for a while now and the multi-agent output is consistently better than any single-prompt approach I've tried. Not a little better, noticeably better.

About Kimi K2.5

Since I used it for this whole demo, some quick notes. Kimi K2.5 is one of the most consistent models I've run through AgentSite. Most models have at least one role where they struggle. Great at coding but the design system is bland. Good at planning but the code output is sloppy.

Kimi K2.5 was solid across all four agents. It's also fast, which matters when you're running four sequential agents plus revision loops. And it never broke the structured JSON schema that the agents use to pass data between each other. Some models hallucinate extra keys or mess up the nesting. Kimi didn't.
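That schema point deserves a concrete example. Each agent hands a structured payload to the next, so a model that invents keys or mangles nesting breaks the chain. Here's a minimal sketch of the kind of validation that catches it; the schema is invented for illustration, not AgentSite's real format:

```python
# Minimal sketch of validating an agent-to-agent handoff.
# The schema below is invented for illustration, not AgentSite's real format.
import json

DESIGNER_KEYS = {"palette", "typography", "spacing"}


def parse_designer_output(raw: str) -> dict:
    """Reject handoffs with missing or hallucinated top-level keys."""
    data = json.loads(raw)  # raises if the model didn't return valid JSON
    extra = set(data) - DESIGNER_KEYS
    missing = DESIGNER_KEYS - set(data)
    if extra or missing:
        raise ValueError(f"bad handoff: extra={extra}, missing={missing}")
    return data


good = '{"palette": ["#0B3D2E", "#F4A261"], "typography": {"body": "Inter"}, "spacing": 8}'
print(parse_designer_output(good))
```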

That said, the whole point of AgentSite is that you're not married to any model. I built Ukelele Travels with Kimi K2.5 today. Tomorrow I might use Gemini or Claude for the next project. Just change the dropdown.

Try It Yourself

GitHub: github.com/jhd3197/AgentSite

One-click deploy buttons for Railway, Render, and Heroku right in the README. Or pip install agentsite if you want it local.

It's MIT licensed and free. You just bring your own API keys.

If you try it out let me know what model you used and what you built. I'm genuinely curious to see what people come up with.

Top comments (4)

Bhavin Sheth

Love this idea of splitting the work between PM → Designer → Developer → Reviewer instead of forcing one model to do everything. The reviewer loop is especially smart — it feels much closer to how real teams work. Also really cool that you can swap models anytime without changing the pipeline. Great build and very practical for anyone experimenting with AI site generation.

ANIRUDDHA ADAK

Brilliant!

Azhar Mehmood

This is an incredible approach! 🚀

Splitting the work between PM → Designer → Developer → Reviewer really mirrors how real teams function, and the automated reviewer loop is genius — it ensures quality and consistency that a single model often misses.

I especially love that you can swap models anytime without breaking the pipeline. It makes experimentation and iteration so much easier.

I've been exploring similar multi-agent workflows for AI-driven projects, and seeing the structured JSON communication between agents is a game-changer. Curious — have you tried combining different models for each agent, like Kimi for coding and Gemini for reviewing? Would love to see how the output differs!

PEACEBINFLOW

This is a really clean way of articulating a problem a lot of us feel but don’t always name.

The part that resonates most isn’t “multi-agent” as a buzzword, it’s the separation of intent, taste, execution, and judgment. That PM → Designer → Developer → Reviewer flow mirrors how good teams actually work — and more importantly, where single-prompt tools usually collapse. Asking one model to plan, design, code, and self-critique is basically asking it to argue with itself and somehow win.

The Reviewer agent is the real differentiator here. Most site generators stop at “it runs.” Introducing an explicit quality loop — accessibility, consistency, basic standards — changes the output from acceptable to defensible. That’s the difference between a demo and something you could actually ship or maintain.

I also like that you didn’t lock this to a model. The industry moves too fast for that kind of coupling. Treating models as interchangeable workers instead of the product itself feels like the right abstraction layer — especially when different models peak in different roles.

The analytics piece is underrated too. Once people start paying real money for tokens, visibility becomes part of UX. Knowing where cost and effort are actually going makes experimentation sustainable instead of reckless.

Overall this feels less like “AI builds websites” and more like AI simulates a small team with guardrails. That framing matters. Curious to see how this evolves once people start pushing it with larger sites or more opinionated design constraints — that’s where this architecture should really shine.

Solid work. This is one of the more thoughtful takes I’ve seen in this space.