Kaushik Manjunatha

Monolithic vs Plugin-Based Medical AI Architectures

When I started working on medical AI infrastructure at a university hospital, the first real question wasn't about models or FHIR or GPU scheduling. It was simpler: how do you structure a system that multiple research teams can use, without it turning into a nightmare to maintain?
That question led me to rethink how medical AI actually gets deployed in practice.

The Monolithic Approach - and why it breaks
The most common starting point is a monolithic setup: one codebase, one container, one pipeline. You take your model, wrap it in a FastAPI endpoint, dockerize it, and ship it.

It works. Until a second team wants to plug in a different model.
Suddenly you're touching the core codebase for every new AI service. Deployments become risky. The team maintaining the core is now a bottleneck for everyone else. And nobody is happy.
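To make that coupling concrete, here's a minimal sketch of the pattern (all names are hypothetical, not our actual code): a single dispatch table in the core that every team has to edit to ship a new model.

```python
# Monolithic core: every new AI service means touching this module.

def run_ct_segmentation(data: bytes) -> dict:
    """Team A's model, baked into the core codebase."""
    return {"model": "ct-segmentation", "result": "..."}

def run_ecg_classifier(data: bytes) -> dict:
    """Team B's model -- added later by editing the same core."""
    return {"model": "ecg-classifier", "result": "..."}

# The central registry: every team's deployment goes through this dict,
# so every addition is a change (and a redeploy) of the shared core.
MODEL_REGISTRY = {
    "ct-segmentation": run_ct_segmentation,
    "ecg-classifier": run_ecg_classifier,
}

def handle_request(model_name: str, data: bytes) -> dict:
    try:
        return MODEL_REGISTRY[model_name](data)
    except KeyError:
        raise ValueError(f"Unknown model: {model_name}")
```

Every entry in that registry is another team whose release schedule is now coupled to yours.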

In a hospital research environment, this happens faster than you'd expect. Different clinical teams have different models, different input formats, different timelines, and they all need to run on the same shared infrastructure.

The Plugin-Based Alternative
The idea is straightforward: separate the core gateway from the AI services completely.

The gateway handles infrastructure - FHIR communication, job queuing, container orchestration, GPU detection. Individual AI models live in their own repositories as self-contained plugins. Each plugin follows a defined interface: receive input, run inference, return output in a standard format (in our case, a FHIR DiagnosticReport).

The gateway doesn't care what's inside the plugin. It just knows how to talk to it.
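In code, that separation boils down to a small interface the gateway depends on. Here's a sketch of what such a contract can look like (hypothetical names, not the actual gateway code): the gateway sees only the interface, and each plugin lives in its own repository.

```python
from abc import ABC, abstractmethod

class InferencePlugin(ABC):
    """The contract every AI plugin implements; the gateway knows nothing else."""

    @abstractmethod
    def run(self, payload: bytes) -> dict:
        """Run inference and return a FHIR-DiagnosticReport-shaped dict."""

class EchoPlugin(InferencePlugin):
    """A trivial plugin, used here only to show the shape of the contract."""

    def run(self, payload: bytes) -> dict:
        return {
            "resourceType": "DiagnosticReport",
            "status": "final",
            "conclusion": f"received {len(payload)} bytes",
        }

def gateway_dispatch(plugin: InferencePlugin, payload: bytes) -> dict:
    """Gateway side: hand the job to whatever plugin is registered."""
    report = plugin.run(payload)
    if report.get("resourceType") != "DiagnosticReport":
        raise ValueError("plugin violated the output contract")
    return report
```

Adding a new model is now a new class in a new repository, not an edit to the core.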

The tradeoff
Plugin-based architectures add upfront complexity. You need a well-defined interface contract between the gateway and plugins. If that contract is poorly designed early on, you pay for it with every new plugin.

In a healthcare context, FHIR is a natural choice for that contract: it's a standard, it's interoperable, and hospitals already speak it. But it comes with a steep learning curve, especially FHIR R4B, and it is not exactly lightweight.
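Even a lightweight shape check on the gateway side catches malformed plugin output early. A minimal sketch, not a real FHIR validator (for real validation you'd reach for a FHIR library), assuming R4/R4B, where `status` and `code` are the required DiagnosticReport fields besides `resourceType`:

```python
# Required DiagnosticReport fields in FHIR R4/R4B (cardinality 1..1).
REQUIRED_FIELDS = ("resourceType", "status", "code")

def looks_like_diagnostic_report(resource: dict) -> bool:
    """Cheap structural check the gateway can run on every plugin response."""
    if resource.get("resourceType") != "DiagnosticReport":
        return False
    return all(field in resource for field in REQUIRED_FIELDS)

# Example of a minimal report a plugin might return.
report = {
    "resourceType": "DiagnosticReport",
    "status": "final",
    "code": {"text": "AI inference result"},
    "conclusion": "No abnormality detected.",
}
```

It won't catch everything a full profile validation would, but it rejects the most common contract violations before they reach clinical systems.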

So, which one should you choose?
For a single-model, single-team setup, the monolithic approach is faster and easier to reason about. Just don't over-engineer it.

But in a research hospital environment, where multiple teams need to deploy different models on shared GPU infrastructure, the plugin-based approach pays off quickly. It shifts the bottleneck from the architecture itself to the interface contract, and that's a much better problem to have.

I work on medical AI infrastructure in Germany. If you've dealt with similar architectural decisions or have a different approach, I'd genuinely like to hear about it in the comments.
