
Anton Minin Baranovskii

Originally published at antonmb.com

How I Accidentally Built an LLM Orchestration System in the Browser

Two years ago, I built Litseller.

At the time, I was not thinking about LLM orchestration or the architecture of such systems. I was simply solving a specific problem: how to quickly generate structured content for a book catalog.

Looking back now, I can see that it was essentially a full LLM orchestration system, implemented not on the backend but directly in the browser.

What It Actually Was

It is important to frame this correctly.

Litseller was not an LLM service.

It was a classic web application with a catalog, on top of which I embedded an LLM as a data generation tool inside the admin area.

The entire orchestration happened in a simple flow: Editor UI → GPT API → JSON → validation → backend save.

  • No queues.
  • No workers.
  • No server-side orchestration.
  • No complex infrastructure.

Everything was built on React, prompts, and chains of requests.
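
In code, the core loop looked roughly like the sketch below. This is a reconstruction, not the original source: the endpoint path and field checks are stand-ins, and the real editor had more prompts around each call.

```typescript
// Reconstructed sketch of the browser-side flow; the backend endpoint
// and validation checks are stand-ins for the real ones.
async function generateAndSave(title: string, apiKey: string) {
  // Editor UI -> GPT API
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [
        { role: "system", content: "Reply with valid JSON only." },
        { role: "user", content: `Generate catalog data for the book "${title}".` },
      ],
    }),
  });

  // GPT API -> JSON
  const completion = await response.json();
  const book = JSON.parse(completion.choices[0].message.content);

  // Validation (deliberately light in the original system)
  if (!book.title || !book.description) {
    throw new Error("Model returned incomplete JSON");
  }

  // Backend save (the .NET API)
  await fetch("/api/books", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(book),
  });
}
```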

Architecture

The system was split into three layers.

  • Frontend: Next.js admin area, editor, and all LLM logic.
  • Backend: .NET API, validation, and persistence.
  • Storage: SQL Server and S3.

The LLM logic did not live in the backend at all. Calls were made directly from the browser.

Generation Pipeline

Instead of one large request, I built a pipeline.

  1. Check whether the model knows the book.
  2. Clarify the title if needed.
  3. Select a category.
  4. Generate the main information.
  5. Select content blocks such as summary, quotes, themes, and other sections.
  6. Generate each block separately.
  7. Assemble the JSON.
  8. Translate the content into other languages.
  9. Validate the result.
  10. Save the final data.

This was already real orchestration, just without a separate orchestration service.
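
A condensed sketch of that pipeline, assuming a hypothetical askModel helper that wraps one chat-completion call and parses the JSON reply:

```typescript
// Hypothetical helper: one chat-completion call, parsed as JSON.
declare function askModel<T>(prompt: string): Promise<T>;

async function runPipeline(title: string) {
  // Steps 1-2: check the model knows the book, clarify the title
  const check = await askModel<{ known: boolean; canonicalTitle: string }>(
    `Do you know the book "${title}"? Reply as JSON: {"known": boolean, "canonicalTitle": string}`
  );
  if (!check.known) throw new Error("Model does not know this book");

  // Step 3: select a category
  const category = await askModel<{ category: string }>(
    `Pick one category for "${check.canonicalTitle}". Reply as JSON: {"category": string}`
  );

  // Step 4: generate the main information
  const info = await askModel<{ author: string; description: string }>(
    `Generate main catalog info for "${check.canonicalTitle}" as JSON.`
  );

  // Steps 5-6: generate each content block separately
  const blocks: Record<string, unknown> = {};
  for (const block of ["summary", "quotes", "themes"]) {
    blocks[block] = await askModel(
      `Generate the "${block}" block for "${check.canonicalTitle}" as JSON.`
    );
  }

  // Step 7: assemble the JSON (steps 8-10, translation, validation,
  // and saving, followed the same pattern)
  return { ...info, category: category.category, blocks };
}
```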

Why Blocks Worked Well

Generating content block by block was one of the strongest decisions.

I did not ask the model to generate the entire book page at once.

  • Summary was generated separately.
  • Characters were generated separately.
  • Quotes were generated separately.
  • Themes were generated separately.

This gave me better quality control, the ability to regenerate individual parts, more stable JSON, and fewer errors.

In practice, it became a manual version control layer over LLM output.
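
Roughly what the per-block setup looked like; the prompts and the askModel helper here are illustrative, not the originals:

```typescript
declare function askModel<T>(prompt: string): Promise<T>;

type BlockName = "summary" | "characters" | "quotes" | "themes";

// Each block has its own prompt, so any block can be regenerated
// on its own without touching the rest of the page.
const blockPrompts: Record<BlockName, (title: string) => string> = {
  summary: (t) => `Write a short summary of "${t}" as JSON: {"text": string}`,
  characters: (t) => `List the main characters of "${t}" as JSON: {"characters": string[]}`,
  quotes: (t) => `Give 3 notable quotes from "${t}" as JSON: {"quotes": string[]}`,
  themes: (t) => `List the key themes of "${t}" as JSON: {"themes": string[]}`,
};

// The editor called this for one block at a time, so a bad "quotes"
// result could be regenerated without re-running "summary".
async function regenerateBlock(title: string, block: BlockName) {
  return askModel(blockPrompts[block](title));
}
```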

Prompt Engineering

The prompts were simple, but structured.

  • Strict JSON format.
  • Clear instructions.
  • Minimal magic.

Context was passed explicitly: title, author, categories, language, and previous blocks.

  • No conversation memory.
  • No complex state.
  • No tool calling.
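
A hypothetical prompt builder in that spirit; the field names are mine, but the idea is the same: everything the model needs arrives in the prompt, every time.

```typescript
// All context is passed explicitly on every call - no conversation
// memory, no hidden state.
interface PromptContext {
  title: string;
  author: string;
  categories: string[];
  language: string;
  previousBlocks: Record<string, unknown>;
}

function buildBlockPrompt(block: string, ctx: PromptContext): string {
  return [
    `You generate content for a book catalog.`,
    `Book: "${ctx.title}" by ${ctx.author}.`,
    `Categories: ${ctx.categories.join(", ")}. Language: ${ctx.language}.`,
    `Already generated blocks: ${JSON.stringify(ctx.previousBlocks)}.`,
    `Now generate the "${block}" block.`,
    `Reply with valid JSON only, no prose, no markdown.`,
  ].join("\n");
}
```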

The Most Controversial Decision

The API key was stored in localStorage.

The reason was simple: the editor was not a public interface, there were very few users, the priority was to launch quickly, and backend orchestration would have made the system much more complex.

It was a conscious decision. The risks were understood and accepted.

Additionally, access was restricted manually through Cloudflare.
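
For concreteness, the controversial part was essentially this (the storage key name is made up):

```typescript
// The API key lived in localStorage and went straight into the
// request header of a browser-side call.
async function callModel(messages: unknown[]) {
  const apiKey = localStorage.getItem("openai_api_key");
  if (!apiKey) throw new Error("No API key set in the editor");

  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Readable by anyone who can open the page - acceptable here only
      // because the editor was private and gated through Cloudflare.
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "gpt-4", messages }),
  });
  return res.json();
}
```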

Weaknesses

Looking at it honestly today, the weaknesses are clear.

  • API key on the client.
  • No centralized rate limiting.
  • No centralized control over calls.
  • No retry or backoff mechanism.
  • No proper schema validation.
  • Weak error handling.
  • No observability.
  • Too much complex logic in the browser.

It was a strong MVP, but not a production-grade LLM platform.

Strengths

At the same time, the system worked and produced real results.

  • Very fast development.
  • Minimal infrastructure.
  • High flexibility.
  • Full control through the UI.
  • Convenient manual refinement.
  • Modular generation.
  • Real production workflow.

The LLM was a tool, not the core of the system.

What I Would Do Differently Today

If I were building it again today, I would change the architecture (a sketch of the first few changes follows the list).

  • Move LLM calls into a backend gateway.
  • Remove the API key from the client.
  • Add queues and retries.
  • Introduce strict JSON schema validation.
  • Add logging and tracing.
  • Implement rate limiting.
  • Separate orchestration from the UI.
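
As a rough sketch of the first three points, assuming a Next.js route handler and zod for schema validation (neither is what Litseller shipped with):

```typescript
import { z } from "zod";

// Strict schema for the model's output; parse() throws on bad JSON.
const BookSchema = z.object({
  title: z.string(),
  author: z.string(),
  description: z.string().min(1),
});

// Simple retry with exponential backoff: 1s, 2s, 4s, ...
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 1000 * 2 ** i));
    }
  }
}

// Backend gateway: the browser calls this route, never the LLM directly.
export async function POST(request: Request) {
  const { prompt } = await request.json();

  const book = await withRetry(async () => {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // The key now lives only on the server.
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
      }),
    });
    if (!res.ok) throw new Error(`Upstream error: ${res.status}`);
    const data = await res.json();
    // Validation failures throw, which triggers a retry.
    return BookSchema.parse(JSON.parse(data.choices[0].message.content));
  });

  return Response.json(book);
}
```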

Main Takeaway

The most interesting part for me is that I did not design an LLM system.

I was simply solving a problem.

Only later did I realize that I had built an architecture that is now called LLM orchestration.

Sometimes good engineering decisions first look like chaos and intuition. Only later do they become a clear architecture that can be improved consciously.
