DEV Community: Alexander Ivanov

I Built a Python Prompt Orchestrator for Structured LLM Pipelines

Alexander Ivanov — Fri, 29 May 2026 04:00:51 +0000

Most LLM applications eventually hit the same problem:

prompts become unmanageable.

At first, everything fits into a single string.

Then you add:

summaries
RAG
memory
safety checks
token budgets
conversation compaction
provider switching

And suddenly your prompt pipeline becomes harder to maintain than the model itself.

So I built prompt_orchestrator.

What is it?

prompt_orchestrator is a Python module for structured prompt orchestration with:

static/semi-stable/dynamic prompt layout
configurable summarization providers
optional RAG integration
safety heuristics
token budgeting
centralized configuration
prompt efficiency analysis

The goal was simple:

Make prompt pipelines deterministic, modular, and production-friendly.

Structured prompt sections

The orchestrator separates prompts into:

static parts
semi-stable parts
dynamic conversation context

This improves:

cacheability
token efficiency
prompt readability
debugging
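A minimal sketch of the three-layer layout, assuming a hypothetical `PromptLayout` class (the name and fields are illustrative, not the module's API). The only idea it shows is ordering: the most stable text is rendered first, so consecutive requests start with the same prefix.

```python
from dataclasses import dataclass, field


@dataclass
class PromptLayout:
    """Hypothetical container for the three prompt layers."""
    static: list[str] = field(default_factory=list)       # system rules, rarely change
    semi_stable: list[str] = field(default_factory=list)  # summaries, user profile
    dynamic: list[str] = field(default_factory=list)      # latest conversation turns

    def render(self) -> str:
        # Most stable content first, most volatile content last.
        parts = self.static + self.semi_stable + self.dynamic
        return "\n\n".join(p.strip() for p in parts if p.strip())


layout = PromptLayout(
    static=["You are a support assistant."],
    semi_stable=["Summary so far: the user asked about billing."],
    dynamic=["User: how do I get a refund?"],
)
prompt = layout.render()
```

Only `dynamic` changes between turns here, so everything before it stays byte-identical from request to request.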

Works with or without RAG

The module supports optional RAG providers.

It integrates directly with rag_orchestrator and compatible retrieval systems.

One particularly useful detail:

Both projects share a compatible DocChunk structure, so retrieved chunks can be handed to the prompt layer without a conversion step.
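To illustrate why a shared chunk shape helps, here is a sketch under stated assumptions. The post does not list the fields of DocChunk, so `ChunkLike`, `RetrievedChunk`, and the `text`/`source` attributes below are hypothetical stand-ins, not the real structure.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class ChunkLike(Protocol):
    """Hypothetical minimal shape; the real DocChunk may carry other fields."""
    text: str
    source: str


@dataclass
class RetrievedChunk:
    # Stand-in for what a retriever might return.
    text: str
    source: str
    score: float = 0.0


def format_context(chunks: Iterable[ChunkLike], limit: int = 3) -> str:
    """Render up to `limit` chunks as a numbered context block for a prompt."""
    lines = []
    for i, chunk in enumerate(list(chunks)[:limit], start=1):
        lines.append(f"[{i}] ({chunk.source}) {chunk.text}")
    return "\n".join(lines)
```

Because `format_context` only relies on the attributes both sides agree on, the retriever output needs no adapter.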

Safety checks included

The project includes lightweight safety heuristics for:

injection detection
contradiction checks

without requiring a separate moderation service.
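A rough idea of what a lightweight injection heuristic can look like, assuming simple regular expressions. The patterns and function names are illustrative only; the project's actual rules are not shown in this post.

```python
import re

# Illustrative patterns only, not the project's rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?(system|previous) prompt", re.I),
    re.compile(r"reveal (your|the) (system )?prompt", re.I),
]


def injection_hits(text: str) -> list[str]:
    """Return the patterns that match the text; an empty list means no hit."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]


def is_suspicious(text: str) -> bool:
    return bool(injection_hits(text))
```

A check like this runs locally in microseconds, which is the trade-off against a moderation service: cheap and predictable, but it only catches phrasing it has been told about.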

Summary providers

Supported summary backends:

OpenAI
Ollama
deterministic local fallback
custom providers

So the orchestration layer is not tied to a single vendor.
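One way to keep the summary step vendor-neutral is to treat a provider as a plain callable and fall back to a deterministic local summary when it fails. This is a sketch under that assumption; `SummaryProvider`, `summarize`, and `local_fallback_summary` are hypothetical names.

```python
from typing import Callable, Optional, Sequence

# A provider is any callable that turns a list of messages into a summary.
SummaryProvider = Callable[[Sequence[str]], str]


def local_fallback_summary(messages: Sequence[str], max_chars: int = 200) -> str:
    """Deterministic fallback: first sentence of each message, then truncate."""
    firsts = [m.strip().split(". ")[0].rstrip(".") for m in messages if m.strip()]
    return "; ".join(firsts)[:max_chars]


def summarize(messages: Sequence[str], provider: Optional[SummaryProvider] = None) -> str:
    """Try the configured provider; fall back to the local summary on any failure."""
    if provider is not None:
        try:
            return provider(messages)
        except Exception:
            pass  # e.g. network error or missing API key
    return local_fallback_summary(messages)
```

An OpenAI or Ollama backend would then be just another function with the same signature.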

Token-aware orchestration

The orchestrator includes:

token counting via tiktoken
automatic trimming
prompt fitting
configurable token budgets

which becomes critical for long-running conversations.
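The trimming idea can be sketched as follows, assuming the simplest policy: always keep the system text, then keep as many of the newest turns as fit. `fit_to_budget` and `whitespace_count` are hypothetical helpers; `tiktoken_counter` uses the real `tiktoken.get_encoding` call but is optional, so the sketch also runs without the package.

```python
from typing import Callable, Sequence


def whitespace_count(text: str) -> int:
    """Crude stand-in counter used when tiktoken is not installed."""
    return len(text.split())


def tiktoken_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Build a counter backed by tiktoken (imported lazily)."""
    import tiktoken  # third-party: pip install tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))


def fit_to_budget(
    system: str,
    turns: Sequence[str],
    budget: int,
    count: Callable[[str], int] = whitespace_count,
) -> list[str]:
    """Keep the system text plus the newest turns that fit within `budget` tokens."""
    remaining = budget - count(system)
    kept: list[str] = []
    for turn in reversed(turns):      # walk from newest to oldest
        cost = count(turn)
        if cost > remaining:
            break                     # this turn and all older ones are dropped
        kept.append(turn)
        remaining -= cost
    return [system] + kept[::-1]      # restore chronological order
```

Passing `count=tiktoken_counter()` swaps the crude word count for real token counts without touching the trimming logic.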

Designed for integration

The module was intentionally designed to integrate into existing systems.

It does not force:

a framework
an agent runtime
a specific LLM provider
a database stack

Tests and simulations

The repository already includes:

interactive simulations
safety simulations
conversation replay tests
console pipelines

which makes experimentation easy.

Installation

From a local clone of the repository:

pip install -e .

Final thoughts

A lot of current LLM tooling focuses on:

agents
autonomous loops
framework ecosystems

But prompt orchestration itself is still an unsolved infrastructure problem.

This project focuses specifically on making that layer cleaner and easier to reason about.

I Built a Lightweight Python RAG Flow Orchestrator That Works with SQLite, PGVector and Qdrant

Alexander Ivanov — Thu, 28 May 2026 16:28:50 +0000

Most RAG frameworks today assume:

a huge dependency graph
mandatory LLM orchestration
opinionated pipelines
complex configuration

But many real-world systems need something simpler.

Especially when:

you already have an existing pipeline
you want local/offline execution
you need predictable retrieval
you do not want every step delegated to an LLM

So I built rag-orchestrator.

What makes it different?

The project was designed around one key idea:

RAG infrastructure should be modular, lightweight, and database-agnostic.

Works with multiple vector databases

The orchestrator supports:

SQLite
PGVector
Qdrant

through an abstract storage layer.

This means you can switch backends without rebuilding the whole pipeline.
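As an illustration of what an abstract storage layer buys you, here is a sketch with a hypothetical `VectorStore` interface and a toy in-memory backend. The class and method names are assumptions; the real adapters for SQLite, PGVector, and Qdrant are not shown in this post.

```python
import math
from abc import ABC, abstractmethod


class VectorStore(ABC):
    """Hypothetical backend interface that concrete adapters would subclass."""

    @abstractmethod
    def add(self, doc_id: str, vector: list[float], text: str) -> None: ...

    @abstractmethod
    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]: ...


class InMemoryStore(VectorStore):
    """Toy backend: keeps rows in a dict and ranks by cosine similarity."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def add(self, doc_id: str, vector: list[float], text: str) -> None:
        self._rows[doc_id] = (vector, text)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(doc_id, cosine(query, vec)) for doc_id, (vec, _) in self._rows.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def retrieve_ids(store: VectorStore, query: list[float], k: int = 2) -> list[str]:
    # Pipeline code depends only on the interface, not on a concrete backend.
    return [doc_id for doc_id, _ in store.search(query, k)]
```

`retrieve_ids` never learns which backend it is talking to, which is what makes a backend switch a configuration change rather than a rewrite.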

Fully pluggable architecture

The project provides abstraction layers for:

Embeddings
Retrievers
Cleaners
Vector stores
Processing steps

You can easily plug in:

your own embedding provider
your own retriever
custom preprocessing logic
external pipelines

without rewriting internal logic.
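Pluggability of this kind can be sketched with ordinary callables. Everything below is hypothetical (`IngestPipeline`, the two cleaners, and the placeholder `length_embedder`); it only shows how custom steps slot in without changes to the pipeline itself.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

Cleaner = Callable[[str], str]
Embedder = Callable[[str], list[float]]


def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


def strip_html_tags(text: str) -> str:
    return re.sub(r"<[^>]+>", "", text)


def length_embedder(text: str) -> list[float]:
    """Placeholder embedder so the sketch runs offline; swap in a real model."""
    return [float(len(text)), float(text.count(" ") + 1)]


@dataclass
class IngestPipeline:
    embedder: Embedder
    cleaners: list[Cleaner] = field(default_factory=list)

    def process(self, raw: str) -> tuple[str, list[float]]:
        text = raw
        for clean in self.cleaners:   # each step is an ordinary callable
            text = clean(text)
        return text, self.embedder(text)


pipeline = IngestPipeline(
    embedder=length_embedder,
    cleaners=[strip_html_tags, collapse_whitespace],
)
cleaned, vector = pipeline.process("<p>Hello   world</p>\n")
```

Replacing the embedder or adding a cleaner means passing a different function, nothing more.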

Minimal LLM usage

One important design decision:

The orchestrator works without an LLM for almost the entire pipeline.

LLMs are only required at a single step where they actually add value.

This makes the system:

cheaper
faster
more deterministic
easier to debug

Minimal configuration

The module intentionally requires very few input parameters.

The goal was:

fast onboarding
simple integration
production-friendly defaults

Tested and production-oriented

The repository already includes:

integration tests
runnable scripts
usage examples

You can inspect them directly in the scripts/ directory.

Easy integration into existing systems

The project was built to integrate into:

existing RAG pipelines
enterprise systems
AI backends
local AI stacks
internal search systems

instead of forcing users into a completely new ecosystem.

Installation

```bash
pip install rag-orchestrator
```

Why this matters

A lot of modern RAG tooling is becoming increasingly framework-heavy.

But many production systems actually need:

predictability
portability
low overhead
composability

rather than autonomous agent complexity.

This project focuses exactly on that.

PromptMan: REST API-First Prompt Registry for Real LLM Infrastructure

Alexander Ivanov — Wed, 20 May 2026 02:54:27 +0000

Large Language Models changed the way modern systems are built.
Prompts are no longer “just text” — they have become infrastructure:

behavioral contracts for LLMs,
reusable business logic,
configuration artifacts,
optimization targets,
security-sensitive assets.

As soon as teams start iterating on prompts, they immediately encounter classic infrastructure problems:

How should prompts be versioned?
How do multiple services share them?
How can teams enforce RBAC?
How are prompts audited?
How do you scale prompt access under concurrent load?
How do you keep prompts fully on-premise?

This is why prompt registries are becoming a separate software category.

For engineering teams, especially backend-focused teams, the ideal solution usually includes:

REST API access,
RBAC,
immutable version history,
tagging and search,
authentication,
automation support,
horizontal scalability,
cloud and on-premise deployment,
and no SaaS dependency.

Below is an updated overview of the current ecosystem.

Existing Solutions

PromptHub

Cloud prompt manager with UI collaboration, prompt versioning, evaluations, and experimentation tools.

REST API: Yes
RBAC: Partial
On-Premise: No
Scaling: SaaS
License: Freemium

PromptLayer

Focused mainly on LLM observability, request logging, analytics, and tracing.

REST API: Yes
RBAC: Limited
On-Premise: No
Scaling: SaaS
License: Freemium

LangSmith

LLM tracing, monitoring, evaluation, and debugging platform from LangChain.

REST API: Yes
RBAC: Partial
On-Premise: Enterprise only
Scaling: SaaS
License: Freemium

Promptfoo

Open-source framework focused on prompt testing, evaluation, regression analysis, and CI/CD workflows.

REST API: Partial
RBAC: No
On-Premise: Yes
Scaling: CI/CD
License: Free

Flowise

Visual low-code builder for LLM pipelines and AI workflows.

REST API: Yes
RBAC: Limited
On-Premise: Yes
Scaling: Docker/Kubernetes
License: Free / Enterprise

PromptPerfect

Automatic prompt optimization platform focused on prompt rewriting and quality improvements.

REST API: Yes
RBAC: No
On-Premise: No
Scaling: SaaS
License: Paid

Notion

General-purpose knowledge management platform sometimes used as ad-hoc prompt storage.

REST API: Yes
RBAC: Limited
On-Premise: No
Scaling: SaaS
License: Freemium

Obsidian

Local Markdown-based knowledge system frequently used for personal prompt collections.

REST API: No
RBAC: No
On-Premise: Yes (local)
Scaling: Git/local filesystem
License: Free

Dendron

VSCode-centered hierarchical note system.

REST API: No
RBAC: No
On-Premise: Yes (local)
Scaling: Git/local filesystem
License: Free

PromptMan

PromptMan takes a very different architectural approach compared to most tools in this space.

It is designed primarily as a REST API-first prompt registry rather than a SaaS UI product.

The HTTP API is the main integration surface.
The UI intentionally acts as a lightweight companion client over the same API.

This makes PromptMan closer to infrastructure software than to a browser-oriented prompt workspace.

Core Architecture

PromptMan provides:

REST API-first architecture
Immutable prompt versioning
Prompt storage by project + name
Structured prompt fields: role, task, context, constraints, output format, examples
RBAC with three roles: admin, developer, viewer
Authentication for both API and UI
Access + refresh token sessions
Per-project access control
Audit metadata: created_by, updated_by, timestamps
Prompt tagging and AND/OR search
Pagination and server-side sorting
Automatic DB migrations
Semantic versioning
Runtime version endpoint
Sensitive configuration encryption
Bootstrap admin initialization
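To make "immutable versioning" and "storage by project + name" concrete, here is an in-memory sketch. It is not PromptMan's implementation or API: `PromptRegistry`, `PromptVersion`, and the integer version counter are assumptions chosen for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PromptVersion:
    version: int
    body: str
    created_by: str          # audit metadata
    created_at: datetime


class PromptRegistry:
    """In-memory illustration; a real registry would persist to a database."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], list[PromptVersion]] = {}

    def publish(self, project: str, name: str, body: str, user: str) -> PromptVersion:
        history = self._versions.setdefault((project, name), [])
        record = PromptVersion(
            version=len(history) + 1,
            body=body,
            created_by=user,
            created_at=datetime.now(timezone.utc),
        )
        history.append(record)  # append-only: earlier versions are never edited
        return record

    def get(self, project: str, name: str, version: Optional[int] = None) -> PromptVersion:
        history = self._versions[(project, name)]
        return history[-1] if version is None else history[version - 1]
```

Publishing a change always creates a new record, so a service pinned to version 1 keeps getting exactly the text it was tested against.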

Optimization Features

PromptMan also includes built-in prompt optimization workflows.

Features include:

Optimization profiles: fast, quality, ultra
Multiple provider support: Ollama, OpenAI-compatible APIs, Anthropic, Gemini, Groq, Mistral
Dynamic model discovery
Per-user optimization configuration
Heuristic fallback optimizer
Leo optimizer backend integration

Unlike many SaaS products, PromptMan supports fully local optimization flows using Ollama.

Plugin System (EPS)

One of the largest additions since earlier versions is the extensible plugin system.

PromptMan now supports:

Dynamic plugin loading
Hot plugin reload
Runtime plugin isolation
Detached plugin signatures
Trusted signer validation
Modal plugin sessions
Plugin hooks
Endpoint injection
UI control rendering
Plugin RBAC
Plugin health monitoring

Plugins can expose their own REST endpoints automatically:

/v1/plugins//

The platform also supports signed plugins through detached signature sidecars and trusted signer registries.

This makes PromptMan extensible without modifying the core application.
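The detached-signature idea can be sketched as below. Two loud assumptions: the sidecar format (a small JSON document) is invented for illustration, and HMAC-SHA256 with a shared secret stands in for whatever signature scheme PromptMan actually uses, which is more likely asymmetric.

```python
import hashlib
import hmac
import json

# Hypothetical registry: signer id -> shared secret. A real deployment would
# more likely store public keys and verify asymmetric signatures.
TRUSTED_SIGNERS = {"team-platform": b"demo-secret"}


def sign_plugin(payload: bytes, signer: str, key: bytes) -> str:
    """Produce the sidecar content that would sit next to the plugin file."""
    digest = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return json.dumps({"signer": signer, "sha256_hmac": digest})


def verify_plugin(payload: bytes, sidecar: str, trusted: dict[str, bytes]) -> bool:
    """Accept the plugin only if the signer is trusted and the digest matches."""
    try:
        meta = json.loads(sidecar)
        key = trusted[meta["signer"]]
        claimed = str(meta["sha256_hmac"])
    except (ValueError, KeyError, TypeError):
        return False  # malformed sidecar or unknown signer
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), claimed.encode())
```

The plugin file itself is never modified: the signature lives in a separate sidecar, and loading is refused when the signer is unknown or the payload has changed.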

Prompt Efficiency Analyzer

PromptMan now includes a built-in Prompt Efficiency Analyzer plugin.

The analyzer:

works fully locally,
requires no external LLM calls,
evaluates prompt stability,
analyzes predictability,
measures cache friendliness,
estimates prompt efficiency characteristics.

This is particularly useful for teams trying to optimize prompt cost and cache reuse patterns in production systems.
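As a rough illustration of a local, LLM-free cache-friendliness measure, the sketch below computes how much of a set of prompts is covered by their common prefix. The metric and the name `stable_prefix_ratio` are assumptions for illustration, not the analyzer's actual scoring.

```python
import os.path


def stable_prefix_ratio(prompts: list[str]) -> float:
    """Share of the average prompt length covered by the prefix common to all prompts."""
    if not prompts or not all(prompts):
        return 0.0
    prefix = os.path.commonprefix(prompts)  # character-wise, despite the module name
    average_len = sum(len(p) for p in prompts) / len(prompts)
    return len(prefix) / average_len


ratio = stable_prefix_ratio(["SYSTEM\nabc", "SYSTEM\nxyz"])
```

A value near 1.0 suggests most of each prompt is reusable prefix; a value near 0.0 suggests volatile content appears early and defeats prefix caching.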

Scalability And Infrastructure

PromptMan was designed with backend deployment patterns in mind.

Supported databases:

SQLite
PostgreSQL
MySQL/MariaDB (via SQLAlchemy)

Deployment Modes

Local single-node deployment
Docker deployment
Kubernetes deployment
Horizontally scaled multi-instance deployment

Horizontal Scaling

The architecture is stateless.

Multiple PromptMan instances can run behind a load balancer while sharing PostgreSQL as the central state store.

The repository also contains:

Locust-based load testing harness,
benchmark charts,
concurrency validation,
cache performance measurements,
race-condition tests.

Measured Performance

PromptMan includes real benchmark results in the repository.

Highlights from current measurements:

Cache-heavy workloads scale linearly under concurrent load.
Hot optimization paths sustain high throughput with zero failures.
PostgreSQL sync mode showed the best balanced production characteristics.
SQLite remains highly competitive for small local teams.
Cache reuse produced ~100× throughput improvement compared to cold optimization paths.

This is unusually infrastructure-focused for a prompt management tool.

Security Model

PromptMan emphasizes self-hosted security controls:

100% on-premise capable
Encrypted password hashes
Encrypted API tokens
RBAC enforcement
Signed plugin validation
Refresh token isolation
Authentication for both API and UI

Prompts never need to leave internal infrastructure.

Docker Images

Official container images are available via:

Docker Hub
GitHub Container Registry

Comparison Table

| Tool | REST API | RBAC | On-Premise | Scaling | License |
|---|---|---|---|---|---|
| PromptHub | Yes | Partial | No | SaaS | Freemium |
| PromptLayer | Yes | Limited | No | SaaS | Freemium |
| LangSmith | Yes | Partial | Enterprise only | SaaS | Freemium |
| Promptfoo | Partial | No | Yes | CI/CD | Free |
| Flowise | Yes | Limited | Yes | Docker/K8s | Free / Enterprise |
| PromptPerfect | Yes | No | No | SaaS | Paid |
| Notion | Yes | Limited | No | SaaS | Freemium |
| Obsidian | No | No | Yes (local) | Git/local | Free |
| Dendron | No | No | Yes (local) | Git/local | Free |
| PromptMan | Yes | Yes | Yes | Horizontal | Free |

Why PromptMan Stands Out

Most prompt tools today optimize for:

browser collaboration,
prompt experimentation,
analytics dashboards,
SaaS workflows.

PromptMan instead optimizes for:

backend integration,
API semantics,
concurrent multi-user access,
infrastructure deployment,
self-hosting,
operational predictability.

That makes it particularly attractive for:

backend-heavy teams,
internal AI platforms,
regulated environments,
private deployments,
multi-service architectures,
CI/CD-driven prompt workflows.

In practice, PromptMan behaves less like a “prompt editor” and more like infrastructure software for LLM systems.

A useful analogy is:

PromptMan is closer to “PostgreSQL for prompts” than to a collaborative SaaS workspace.

For teams that need a local, secure, horizontally scalable, API-driven prompt registry with real engineering semantics, PromptMan is currently one of the most infrastructure-oriented open-source solutions available.