Daniele Frasca for AWS Community Builders

Posted on Mar 17

Three Responsibilities of a Global Application (Part 2)

#aws #serverless #observability #governance

In Part 1, I explained that at a global scale, trust is part of the architecture. Not trust as a feeling, but trust as something the system must enforce and prove.

In this article, I aim to explain the 3 distinct responsibilities that enable the system to grow organically.

Why global systems feel complex

Most complexity in global systems does not come from services. It comes from mixing concerns.
I have seen this pattern many times:

CloudWatch dashboards are used to answer audit questions
CloudTrail logs are pulled into debugging workflows
Metrics start carrying tenant identifiers just to be safe

None of these is wrong in isolation, but together they create systems that are:

Hard to operate
Hard to explain
Hard to defend

The problem is that the system is trying to answer too many different questions at once. It is the same idea we apply to code: the Single Responsibility Principle.
My job brought me to a point where I stopped thinking in terms of architectures and started thinking in terms of responsibilities. No matter how the application is built, it must answer the same 3 questions:

How does work actually happen?
How can we prove that work happened correctly?
How do we know whether the system is healthy?

When these responsibilities are clearly separated, decisions become easier. When they are mixed, every discussion becomes a mess.

Responsibility #1 — Doing the work (execution)

This is the responsibility most devs are comfortable with.

It is where:

Business logic runs
Requests are processed
Events are handled
Workflows progress

In AWS terms, this is:

AWS Lambda
Step Functions
EventBridge
DynamoDB
SQS
SNS

This responsibility answers one question only:

What does the system do for the business?

And it should be optimised for:

Correctness
Scalability
Resilience
Isolation

Problems start when this responsibility is overloaded.
Examples:

embedding compliance logic directly into business code
adding "just in case" logging everywhere without structure
leaking operational concerns into domain logic

Execution code should focus on doing the work, not explaining or defending it.
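As a rough illustration of what "focused on doing the work" can look like, here is a minimal sketch of a Lambda-style handler. The `Order` type, the `price_order` rule, and the event field names are all hypothetical; the point is only that the business rule is a pure function and the handler contains no audit or compliance branches.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical domain object, used only for this illustration.
    order_id: str
    quantity: int
    unit_price_cents: int


def price_order(order: Order) -> int:
    """Pure business rule: total price in cents. No audit, no metrics, no compliance checks."""
    if order.quantity <= 0:
        raise ValueError("quantity must be positive")
    return order.quantity * order.unit_price_cents


def handler(event: dict, context: object = None) -> dict:
    """Thin entry point: parse the input, do the work, return the result."""
    order = Order(
        order_id=event["orderId"],
        quantity=int(event["quantity"]),
        unit_price_cents=int(event["unitPriceCents"]),
    )
    total = price_order(order)
    body = {"orderId": order.order_id, "totalCents": total}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Proof and health signals still exist in a design like this, but they are produced around the handler (by the platform, the account configuration, and the telemetry pipeline) rather than woven into `price_order`.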

Responsibility #2 — Proving the work (evidence and control)

This responsibility exists because someone outside the team will ask questions like:

Who had access?
Who changed production?
What data moved where?
Was logging enabled at the time?

This responsibility is not about debugging. It is about proof.

In AWS, this responsibility is expressed through things like:

AWS CloudTrail
IAM configuration and access records
Configuration history
Retention policies
AWS Audit Manager

A common issue is teams trying to reuse execution or observability data as evidence. That usually fails because:

Logs change format
Metrics are aggregated
Dashboards get deleted
Devs remember things differently

Evidence systems must be:

Complete
Consistent
Tamper‑resistant

This is why this responsibility must be separate from execution.
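To make "tamper-resistant" a bit more concrete, here is a small sketch of a hash-chained record list, where each record commits to the one before it. This is only an illustration of the property, not a description of how any AWS service stores evidence, and the function names are my own. In a real system I would rely on managed features rather than building this myself.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    # Serialise with sorted keys so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> dict:
    """Append a record whose hash covers both its payload and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered record breaks the chain."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

Compare this with an application log: nothing stops a log line from being reformatted or dropped later, and nobody would be able to tell. Strictly speaking, a chain like this makes tampering detectable rather than impossible, which is usually what an auditor needs.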

Responsibility #3 — Understanding the system (operations)

This responsibility answers a very different question:

Is the system healthy right now?

Not:

What happened to tenant X?
Who changed this?

But:

Are errors increasing?
Is latency degrading?
Is this regional or global?

And the answers come from:

Metrics
Alerts
SLOs
Dashboards

In AWS environments, this usually means:

CloudWatch metrics
Amazon Managed Service for Prometheus
Service telemetry
Alerts

These services exist to trigger investigation and action.
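The question "is this regional or global?" can be answered from aggregated counts alone. Below is a minimal sketch under my own assumptions: the input shape (region mapped to an errors/requests pair), the 1% threshold, and the function names are all hypothetical, and a real setup would express this as alarms on metrics rather than application code.

```python
def error_rate(errors: int, requests: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def classify_incident(per_region: dict, threshold: float = 0.01) -> str:
    """Classify health from per-region (errors, requests) counts.

    Returns "healthy", "global", or "regional: <regions>".
    """
    breaching = sorted(
        region
        for region, (errors, requests) in per_region.items()
        if error_rate(errors, requests) > threshold
    )
    if not breaching:
        return "healthy"
    if len(breaching) == len(per_region):
        return "global"
    return "regional: " + ", ".join(breaching)
```

Notice what the function does not need: no tenant, no user, no request ID. It only says where to start looking. One caveat of this sketch is that with a single region configured, any breach is reported as "global".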

Why mixing responsibilities breaks systems

Once these 3 responsibilities are separated, many things become obvious. For example:

Metrics with tenantId - That is usually execution detail leaking into operations, and this is not what metrics are actually meant for.
CloudWatch dashboards as audit - Dashboards explain system behaviour while auditors need immutable, verifiable evidence.
Debugging incidents by scrolling through CloudTrail - CloudTrail is excellent at answering who did what, but it is a terrible tool for answering why the system is behaving this way right now.

Each of these feels OK on its own, but at scale they create confusion about what the system is actually trying to tell us.
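For the tenantId case, one way to keep the detail without turning it into a metric is to separate dimensions from plain properties. The sketch below builds a record in the CloudWatch embedded metric format as I understand it: `Region` is declared as a dimension, while `tenantId` is only a top-level property that stays searchable in the logs. The namespace, metric name, and function name are hypothetical.

```python
import json
import time


def emf_record(region: str, tenant_id: str, latency_ms: float) -> str:
    """Build one embedded-metric-format log line with a low-cardinality dimension."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": "GlobalApp/Orders",  # hypothetical namespace
                    "Dimensions": [["Region"]],  # only Region becomes a metric dimension
                    "Metrics": [{"Name": "LatencyMs", "Unit": "Milliseconds"}],
                }
            ],
        },
        "Region": region,
        "LatencyMs": latency_ms,
        # Not listed under Dimensions, so it is a log property, not a metric dimension.
        "tenantId": tenant_id,
    }
    return json.dumps(record)
```

The operational signal (latency per region) stays small and cheap to alert on, and the question "what happened to tenant X?" is answered from the log records instead of from the metric.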

Benefits of separation

Once responsibilities are separated, conversations change.

Instead of:

Should we centralise logs?

I ask:

Which responsibility are we trying to serve?

Instead of:

Why can't I just add tenantId to metrics?

I ask:

Is this an operational signal or an accounting question?

Instead of:

Why is governance slowing us down?

I ask:

Which responsibility are we trying to satisfy?

The trade‑offs are the same, but they are made explicit.

Conclusion

From a governance angle, global systems do not fail because they are distributed. They fail because we ask one system to do everything at once.
Separating responsibilities does not reduce complexity; it puts complexity where it belongs.