How do you compare software architectures objectively?
It's a challenge many teams face.
In this series, we're taking a hands-on approach with the EventStream AI Monitor, a platform designed for intelligent event processing in distributed systems. Rather than relying on abstract concepts or gut feelings, we're grounding our decisions in hard data. Our methodology is a phased, systematic evaluation where we fix most variables and isolate the technology under test (such as architecture, database, or framework) to truly understand its characteristics.
EventStream AI Monitor is an intelligent layer for monitoring distributed systems. It receives events, applies AI for classification and summarization, and triggers automated actions based on rules. Given the critical nature of such a system, choosing the right foundational technologies is paramount.
Our approach is systematic and structured in multiple phases:
- Phase 1: Compare architectural styles (Hexagonal, Clean, Onion) using fixed technologies: FastAPI and PostgreSQL.
- Phase 2: Compare databases (PostgreSQL, MongoDB) using the selected architecture, with FastAPI and Kafka fixed.
- Phase 3: Compare messaging systems (Kafka, RabbitMQ, Local Queue) using the selected architecture and database, with the framework fixed.
- Phase 4: Compare frameworks (FastAPI, Django, Flask) using the selected architecture, database, and messaging system.
- Phase 5: Compare AI integration methods (Hugging Face API, Local Models) using the fully defined stack.
By fixing variables like the framework and database in Phase 1, we ensure that any observed differences are caused by the architectural structures themselves, rather than side effects from other components. This isolation is essential for making fair and reliable comparisons.
Today, we're sharing the results from the very first phase: the static benchmarking of our Hexagonal Architecture.
What We Measure Before Runtime
Before running load tests or benchmarks, we want answers to a few non-negotiable questions:
- Is the code structurally simple, or is it already drifting into complexity?
- Does the architecture enforce boundaries, or does it rely on discipline?
- Can the system be tested without infrastructure?
- Are there type inconsistencies that will fail at runtime?
If these fail, performance does not matter. The system is already compromised.
Toolchain
All checks run against the src directory:
- `pytest-cov` for coverage
- `radon` for cyclomatic complexity and maintainability
- `ruff` for linting and structural issues
- `mypy` in strict mode for type safety
- `pydeps` and `pipdeptree` for dependency boundaries
Together, these tools expose architectural quality before a single request is served.
Results — Hexagonal Architecture v1
Cyclomatic complexity average: 1.52
This is exactly where it should be. Small functions, explicit orchestration, no hidden logic.
Anything above 3 at this stage is a design smell.
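As a rough illustration of what `radon` counts (this is a minimal stdlib sketch, not radon's actual algorithm): cyclomatic complexity starts at 1 per function and grows with each branching construct.

```python
import ast

# Branching nodes that add a decision point, roughly mirroring
# how cyclomatic complexity is counted (1 + number of branches).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Return an approximate complexity score per function."""
    tree = ast.parse(source)
    scores: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1 + sum(isinstance(child, BRANCH_NODES)
                            for child in ast.walk(node))
            scores[node.name] = score
    return scores

code = """
def classify(event):
    if event.get("level") == "error":
        return "alert"
    return "info"
"""
print(cyclomatic_complexity(code))  # {'classify': 2}
```

A single `if` yields a score of 2; an average of 1.52 across a codebase means most functions branch at most once.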
Maintainability index above 90 across most modules
This confirms low cognitive load. The separation between domain, application, and adapters is working.
Core coverage between 87 and 100 percent
The business logic is protected. This is the only part that must be fully deterministic.
Infrastructure coverage is weak
main.py is not covered. The repository layer sits at around 46 percent.
This is expected at this stage and is not a unit-testing concern. It will be addressed with integration tests.
The Only Metric That Actually Matters Here
mypy reported 10 type errors.
This is the signal that matters.
Examples:
- ORM-to-domain mismatch: `Column[str]` versus `str`
- Invalid typing in the session factory used for dependency injection
- Pydantic v2 migration issues: `Config` versus `model_config`
These are runtime failures waiting to happen.
Static typing caught them before execution.
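The ORM-to-domain mismatch is typically fixed with an explicit mapping function at the adapter boundary. A minimal sketch, with a plain class standing in for a SQLAlchemy model (`EventModel` and `to_domain` are illustrative names, not the project's actual code):

```python
from dataclasses import dataclass

class EventModel:
    """Stand-in for an ORM model, where attributes are declared as
    Column[str] at the class level but hold plain values per instance."""
    def __init__(self, id: str, message: str) -> None:
        self.id = id
        self.message = message

@dataclass(frozen=True)
class DomainEvent:
    id: str
    message: str

def to_domain(row: EventModel) -> DomainEvent:
    # Explicit conversion at the boundary: the domain object carries
    # concrete str fields, so mypy no longer sees Column[str] leaking
    # out of the adapter layer.
    return DomainEvent(id=str(row.id), message=str(row.message))

event = to_domain(EventModel(id="42", message="queue lag detected"))
print(event)  # DomainEvent(id='42', message='queue lag detected')
```

Keeping this conversion in the repository adapter is what preserves the dependency direction: the domain never imports the ORM.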
Architectural Read
At this stage, Hexagonal Architecture shows:
- Low accidental complexity
- Clear dependency direction
- Testable core
- Isolated infrastructure
This is what we want before scaling the system.
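These properties follow from the ports-and-adapters shape itself. A minimal sketch of the pattern (the names are illustrative, not the project's actual modules): the core depends only on a `Protocol`, and infrastructure plugs in from outside.

```python
from typing import Protocol

class NotificationPort(Protocol):
    """Port: the core sees only this interface, never the infrastructure."""
    def notify(self, message: str) -> None: ...

class TriageEvent:
    """Core use case: pure logic, no infrastructure imports."""
    def __init__(self, notifier: NotificationPort) -> None:
        self._notifier = notifier

    def execute(self, level: str, message: str) -> str:
        action = "alert" if level == "error" else "log"
        if action == "alert":
            self._notifier.notify(message)
        return action

class ConsoleNotifier:
    """Adapter: one concrete implementation, swappable without touching the core."""
    def notify(self, message: str) -> None:
        print(f"NOTIFY: {message}")

# The core is testable with any object satisfying the port:
class SpyNotifier:
    def __init__(self) -> None:
        self.sent: list[str] = []
    def notify(self, message: str) -> None:
        self.sent.append(message)

spy = SpyNotifier()
assert TriageEvent(spy).execute("error", "disk full") == "alert"
assert spy.sent == ["disk full"]
```

Because `TriageEvent` only knows the port, the high core coverage and isolated infrastructure reported above are structural properties, not discipline.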
Next Steps
- Fix `mypy` errors: address the type-related issues identified, particularly the ORM-to-domain mapping problems.
- Implement integration tests: expand our test suite to cover interactions between layers (e.g., API to Use Case to Repository).
- Execute dynamic benchmarking: once the core logic is stable, we'll move to measuring performance metrics like response time and throughput under simulated load.
- Repeat the process: apply the same static and dynamic benchmarking to the Clean and Onion Architectures.
- Compare and decide: analyze the results from all three architectures to make an informed decision for Phase 2.
Bottom Line
Architecture should not be a belief system. It should be measured.
Static analysis is the cheapest place to catch structural problems. If you skip this step, you are deferring cost to runtime.
Full details are available at EventStream Build Blog
Full code, scripts, and benchmark outputs: eventstream-ai-monitor
Fernando Magalhães
Founder, FM ByteShift Software
Building systems that do not break under real load