How do you compare software architectures objectively?
It's a challenge many teams face.
In this series, we're taking a hands-on approach with the EventStream AI Monitor, a platform designed for intelligent event processing in distributed systems. Rather than relying on abstract concepts or gut feelings, we're grounding our decisions in hard data. Our methodology is a phased, systematic evaluation where we fix most variables and isolate the technology under test (such as architecture, database, or framework) to truly understand its characteristics.
EventStream AI Monitor is an intelligent layer for monitoring distributed systems. It receives events, applies AI for classification and summarization, and triggers automated actions based on rules. Given the critical nature of such a system, choosing the right foundational technologies is paramount.
Our approach is systematic and structured in multiple phases:
- Phase 1: Compare architectural styles (Hexagonal, Clean, Onion) using fixed technologies: FastAPI and PostgreSQL.
- Phase 2: Compare databases (PostgreSQL, MongoDB) using the selected architecture, with FastAPI and Kafka fixed.
- Phase 3: Compare messaging systems (Kafka, RabbitMQ, Local Queue) using the selected architecture and database, with the framework fixed.
- Phase 4: Compare frameworks (FastAPI, Django, Flask) using the selected architecture, database, and messaging system.
- Phase 5: Compare AI integration methods (Hugging Face API, Local Models) using the fully defined stack.
By fixing variables like the framework and database in Phase 1, we ensure that any observed differences are caused by the architectural structures themselves, rather than side effects from other components. This isolation is essential for making fair and reliable comparisons.
Today, we're sharing the results from the very first phase: the static benchmarking of our Hexagonal Architecture.
What We Measure Before Runtime
Before running load tests or benchmarks, we want answers to a few non-negotiable questions:
- Is the code structurally simple, or is it already drifting into complexity?
- Does the architecture enforce boundaries, or does it rely on discipline?
- Can the system be tested without infrastructure?
- Are there type inconsistencies that will fail at runtime?
If these fail, performance does not matter. The system is already compromised.
Toolchain
All checks run against the src directory:
- `pytest-cov` for coverage
- `radon` for cyclomatic complexity and maintainability
- `ruff` for linting and structural issues
- `mypy` in strict mode for type safety
- `pydeps` and `pipdeptree` for dependency boundaries
Together, these tools expose architectural quality before a single request is served.
Results — Hexagonal Architecture v1
Cyclomatic complexity average: 1.52
This is exactly where it should be. Small functions, explicit orchestration, no hidden logic.
Anything above 3 at this stage is a design smell.
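As a rough illustration of what `radon` counts (this is a minimal stdlib sketch, not radon's actual algorithm): cyclomatic complexity starts at 1 per function and grows with each branching construct.

```python
import ast

# Branching nodes that add a decision point, roughly mirroring
# how cyclomatic complexity is counted (1 + number of branches).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Return an approximate complexity score per function."""
    tree = ast.parse(source)
    scores: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1 + sum(isinstance(child, BRANCH_NODES)
                            for child in ast.walk(node))
            scores[node.name] = score
    return scores

code = """
def classify(event):
    if event.get("level") == "error":
        return "alert"
    return "info"
"""
print(cyclomatic_complexity(code))  # {'classify': 2}
```

A single `if` yields a score of 2; an average of 1.52 across a codebase means most functions branch at most once.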
Maintainability index above 90 across most modules
This confirms low cognitive load. The separation between domain, application, and adapters is working.
Core coverage between 87 and 100 percent
The business logic is protected. This is the only part that must be fully deterministic.
Infrastructure coverage is weak
main.py is not covered. The repository layer sits at around 46 percent.
This is expected at this stage and is not a unit-testing concern. It will be addressed with integration tests.
The Only Metric That Actually Matters Here
mypy reported 10 type errors.
This is the signal that matters.
Examples:
- ORM-to-domain mismatch: `Column[str]` versus `str`
- Invalid typing in the session factory used for dependency injection
- Pydantic v2 migration issues: `Config` versus `model_config`
These are runtime failures waiting to happen.
Static typing caught them before execution.
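The ORM-to-domain mismatch is typically fixed with an explicit mapping function at the adapter boundary. A minimal sketch, with a plain class standing in for a SQLAlchemy model (`EventModel` and `to_domain` are illustrative names, not the project's actual code):

```python
from dataclasses import dataclass

class EventModel:
    """Stand-in for an ORM model, where attributes are declared as
    Column[str] at the class level but hold plain values per instance."""
    def __init__(self, id: str, message: str) -> None:
        self.id = id
        self.message = message

@dataclass(frozen=True)
class DomainEvent:
    id: str
    message: str

def to_domain(row: EventModel) -> DomainEvent:
    # Explicit conversion at the boundary: the domain object carries
    # concrete str fields, so mypy no longer sees Column[str] leaking
    # out of the adapter layer.
    return DomainEvent(id=str(row.id), message=str(row.message))

event = to_domain(EventModel(id="42", message="queue lag detected"))
print(event)  # DomainEvent(id='42', message='queue lag detected')
```

Keeping this conversion in the repository adapter is what preserves the dependency direction: the domain never imports the ORM.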
Architectural Read
At this stage, Hexagonal Architecture shows:
- Low accidental complexity
- Clear dependency direction
- Testable core
- Isolated infrastructure
This is what we want before scaling the system.
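These properties follow from the ports-and-adapters shape itself. A minimal sketch of the pattern (the names are illustrative, not the project's actual modules): the core depends only on a `Protocol`, and infrastructure plugs in from outside.

```python
from typing import Protocol

class NotificationPort(Protocol):
    """Port: the core sees only this interface, never the infrastructure."""
    def notify(self, message: str) -> None: ...

class TriageEvent:
    """Core use case: pure logic, no infrastructure imports."""
    def __init__(self, notifier: NotificationPort) -> None:
        self._notifier = notifier

    def execute(self, level: str, message: str) -> str:
        action = "alert" if level == "error" else "log"
        if action == "alert":
            self._notifier.notify(message)
        return action

class ConsoleNotifier:
    """Adapter: one concrete implementation, swappable without touching the core."""
    def notify(self, message: str) -> None:
        print(f"NOTIFY: {message}")

# The core is testable with any object satisfying the port:
class SpyNotifier:
    def __init__(self) -> None:
        self.sent: list[str] = []
    def notify(self, message: str) -> None:
        self.sent.append(message)

spy = SpyNotifier()
assert TriageEvent(spy).execute("error", "disk full") == "alert"
assert spy.sent == ["disk full"]
```

Because `TriageEvent` only knows the port, the high core coverage and isolated infrastructure reported above are structural properties, not discipline.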
Next Steps
- Fix `mypy` errors: address the type-related issues identified, particularly the ORM-to-domain mapping problems.
- Implement integration tests: expand our test suite to cover interactions between layers (e.g., API to Use Case to Repository).
- Execute dynamic benchmarking: once the core logic is stable, we'll move to measuring performance metrics like response time and throughput under simulated load.
- Repeat the process: apply the same static and dynamic benchmarking to the Clean and Onion Architectures.
- Compare and decide: analyze the results from all three architectures to make an informed decision for Phase 2.
Bottom Line
Architecture should not be a belief system. It should be measured.
Static analysis is the cheapest place to catch structural problems. If you skip this step, you are deferring cost to runtime.
Full details are available at EventStream Build Blog
Full code, scripts, and benchmark outputs: eventstream-ai-monitor
Fernando Magalhães
Founder, FM ByteShift Software
Building systems that do not break under real load