Hello Cerbi

Posted on Jun 1

We Keep Buying Better Observability Tools for Worse Logs

#discuss #devops #security #observability

Most teams do not have an observability problem.

They have a logging behavior problem.

That sounds like splitting hairs, but it is not.

An observability problem sounds like this:

“We need better dashboards.”

“We need better alerting.”

“We need better search.”

“We need better retention.”

“We need a better vendor.”

A logging behavior problem sounds like this:

“Why did someone log a token?”

“Why does every service use a different correlation field?”

“Why are half our logs unstructured strings?”

“Why are we paying to ingest data nobody can query?”

“Why did this debug log become permanent infrastructure?”

That second group is the one we avoid talking about.

Because it is not solved by buying another dashboard.

Bad logs are usually created by good developers

I do not think most bad logs come from careless developers.

They come from normal developers under pressure.

Production is broken.

The error is vague.

The customer is waiting.

The incident channel is getting louder.

Someone needs context fast.

So someone adds this:

_logger.LogInformation("User login failed: {@User}", user);

Or this:

console.log("payment request", requestBody);

Or this:

logger.info("Auth header: {}", authorizationHeader);

Nobody thinks they are creating risk.

They are trying to solve a problem.

That is what makes logging hard. The same instinct that helps debug production can also leak sensitive data, pollute dashboards, and create expensive garbage.

A wiki page is not a control

Most companies have logging standards.

Somewhere.

Maybe in Confluence.

Maybe in a platform engineering document.

Maybe in a security policy.

Maybe in a PDF last updated by someone who left three reorganizations ago.

The standards usually say reasonable things:

Use structured logging.
Include correlation IDs.
Do not log secrets.
Do not log PII.
Use standard severity levels.
Include service and environment metadata.

All good.

But here is the problem:

A document does not change behavior.

A tired developer at 2 AM is not going to stop and lovingly reread the logging policy.

They are going to log whatever helps them understand the issue.

That is not a character flaw. That is how real systems get operated.

So the question is not:

“Do we have logging standards?”

The better question is:

“Can developers accidentally bypass them?”

If the answer is yes, the standard is mostly a suggestion.

Dashboards cannot fix what the app already emitted

Observability tools are useful.

Search is useful.

Dashboards are useful.

Alerts are useful.

Retention policies are useful.

Pipelines are useful.

But most of those tools operate after the log already exists.

If the app emits a token, the token has already left the process.

If the app emits an email address, the email address has already left the process.

If the app emits junk fields, your dashboard gets junk fields.

If every service names the same concept differently, your query layer becomes a crime scene.

{
  "correlationId": "abc-123"
}

{
  "corr_id": "abc-123"
}

{
  "trace": "abc-123"
}

{
  "requestThingy": "abc-123"
}

Congrats. You now have four standards.

Which is another way of saying you have zero standards.
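
To make the cost concrete, here is a rough Python sketch of the shim every consumer ends up writing when four names mean the same thing. The alias list is just the four examples above, and `normalize_correlation` is a hypothetical helper, not part of any real library.

```python
# Canonical name plus the variants from the examples above.
# Assumption: these three aliases always mean "correlation ID" in your services.
CANONICAL_FIELD = "correlationId"
ALIASES = ("corr_id", "trace", "requestThingy")


def normalize_correlation(event):
    """Return a copy of the event with known aliases renamed to the canonical field."""
    normalized = dict(event)
    for alias in ALIASES:
        if alias in normalized:
            value = normalized.pop(alias)
            # If the canonical field is already set, keep it and drop the alias.
            normalized.setdefault(CANONICAL_FIELD, value)
    return normalized


events = [
    {"correlationId": "abc-123"},
    {"corr_id": "abc-123"},
    {"trace": "abc-123"},
    {"requestThingy": "abc-123"},
]

# After normalization, all four events can be joined on one field.
ids = {normalize_correlation(e)[CANONICAL_FIELD] for e in events}
```

It works, but every new spelling means another entry in the alias list, in every consumer that needs to join on the field.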

We need to govern logs before they spread

I think logging needs to be treated more like an enterprise control.

Not in the annoying “please fill out this 47-step process before writing code” way.

I mean simple guardrails close to the developer workflow.

Things like:

{
  "requiredFields": [
    "correlationId",
    "eventName",
    "serviceName",
    "environment"
  ],
  "disallowedFields": [
    "password",
    "token",
    "ssn",
    "creditCardNumber",
    "authorizationHeader"
  ]
}

Now the policy is not just a paragraph.

It is something the system can check.

A log can be evaluated before it leaves the app.

A risky field can be blocked, redacted, warned on, or tagged.

A missing field can be detected.

A temporary exception can be tracked.

That last part matters.

Because enterprise systems need escape hatches. Sometimes teams need relaxed rules during a migration, rollout, or incident. Fine. But those exceptions should be visible.

Invisible exceptions become permanent architecture.

And permanent architecture is where temporary hacks go to buy furniture.
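
A minimal sketch of what "something the system can check" could look like, in Python. This is not how Cerbi implements it. The `evaluate` function and the `[REDACTED]` marker are assumptions for illustration, and the policy is the JSON from above written as a dict.

```python
from typing import Dict, List, Tuple

REDACTED = "[REDACTED]"  # hypothetical placeholder value

# The example policy from above.
POLICY = {
    "requiredFields": ["correlationId", "eventName", "serviceName", "environment"],
    "disallowedFields": ["password", "token", "ssn", "creditCardNumber", "authorizationHeader"],
}


def evaluate(event: Dict[str, object], policy: Dict[str, List[str]] = POLICY) -> Tuple[Dict[str, object], List[str]]:
    """Check one structured log event; return (safe_copy, violations)."""
    violations = []
    safe = dict(event)  # never mutate the caller's event

    # Detect missing required fields.
    for field in policy["requiredFields"]:
        if field not in event:
            violations.append(f"Missing required field: {field}")

    # Redact disallowed fields, matching names case-insensitively.
    disallowed = {name.lower() for name in policy["disallowedFields"]}
    for key in event:
        if key.lower() in disallowed:
            safe[key] = REDACTED
            violations.append(f"Disallowed field detected: {key}")

    return safe, violations


event = {"eventName": "login_failed", "serviceName": "auth", "environment": "prod", "token": "abc"}
safe_event, problems = evaluate(event)
```

Here `safe_event["token"]` is the redaction marker, and `problems` lists the missing `correlationId` and the disallowed `token`. Redaction is only one possible action. The same check could block, warn, or tag instead.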

This is what I am building with Cerbi

This is the problem I am working on with Cerbi.

Cerbi is not meant to replace Serilog, NLog, Log4j, Logback, Pino, Winston, Zap, Datadog, Splunk, Application Insights, OpenSearch, or whatever else teams already use.

That would be a terrible sales pitch and an even worse migration plan.

The point is different:

Govern logs before they leave the application.

Cerbi’s tagline is simple:

We stop it at the source.

The idea is to put logging rules closer to where logs are created, not only after they land in a vendor.

What I think logging governance should include

The first piece is runtime governance.

Some logging behavior only exists when the app runs. Fields are dynamic. Context comes from middleware. Values come from requests. Static analysis can help, but it cannot see everything.

Runtime governance can tag violations like:

{
  "GovernanceProfileUsed": "payments-prod-v3",
  "GovernanceViolations": [
    "Missing required field: correlationId",
    "Disallowed field detected: token"
  ],
  "GovernanceRelaxed": false
}

Now governance becomes measurable.

You can see which apps follow the rules.

You can see which teams are drifting.

You can see which exceptions were allowed.

You can see which logs are safe enough to send downstream.

That is much better than hoping code review caught everything.
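
Once events carry tags like the ones above, measuring drift is a small aggregation. A hedged Python sketch follows. The field names come from the example above, `violations_by_profile` is a hypothetical helper, and the second profile name is made up.

```python
from collections import Counter


def violations_by_profile(events):
    """Count governance violations per profile across tagged log events."""
    counts = Counter()
    for event in events:
        profile = event.get("GovernanceProfileUsed", "<no profile>")
        counts[profile] += len(event.get("GovernanceViolations", []))
    return counts


tagged_events = [
    {
        "GovernanceProfileUsed": "payments-prod-v3",
        "GovernanceViolations": [
            "Missing required field: correlationId",
            "Disallowed field detected: token",
        ],
        "GovernanceRelaxed": False,
    },
    # Hypothetical second profile with a clean event.
    {"GovernanceProfileUsed": "orders-prod-v1", "GovernanceViolations": [], "GovernanceRelaxed": False},
]

report = violations_by_profile(tagged_events)
```

In this sample, `report` shows two violations for `payments-prod-v3` and zero for the made-up `orders-prod-v1` profile.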

The next step is scanning repos

The next thing I want to push further is a repository scanner.

Because before you govern new logs, you probably need to know how bad the current state is.

A scanner could look for patterns like:

logger.info(user)
logger.error(requestBody)
console.log(token)
log.Debug("Customer data: " + customer)

It could also look for:

Unstructured log messages
Missing correlation IDs
Unsafe field names
Inconsistent event names
No governance profile attached
Sensitive data near log calls

That gives teams a simple starting question:

“What is our logging risk right now?”

Not in theory.

In the repo.

That matters for platform teams, security teams, architecture reviews, migrations, and audits.
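
To show how small a first version could be, here is a naive line-based scanner in Python. It is a sketch under heavy assumptions: the regexes only cover the call shapes shown in this post, the risky-name list is hypothetical, and a real scanner would parse code instead of pattern-matching lines.

```python
import re

# Log calls in the shapes used in this post (logger.info, console.log, log.Debug, ...).
LOG_CALL = re.compile(
    r"\b(?:_?logger|log|console)\.(?:loginformation|log|info|warn|error|debug)\s*\((.*)\)",
    re.IGNORECASE,
)
# Hypothetical list of identifiers that should not be logged wholesale.
RISKY_NAME = re.compile(
    r"\b(?:user|requestBody|token|customer|password|authorizationHeader)\b",
    re.IGNORECASE,
)
# String literals are stripped so message text does not count as an identifier.
STRING_LITERAL = re.compile(r"\"[^\"]*\"|'[^']*'")


def scan(source):
    """Return (line_number, risky_identifier) for each suspicious log call."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        call = LOG_CALL.search(line)
        if not call:
            continue
        arguments = STRING_LITERAL.sub("", call.group(1))
        hit = RISKY_NAME.search(arguments)
        if hit:
            findings.append((number, hit.group(0)))
    return findings


sample = '''logger.info(user)
logger.error(requestBody)
console.log(token)
log.Debug("Customer data: " + customer)
logger.info("login failed", extra={"eventName": "login_failed"})
'''
findings = scan(sample)
```

On this sample the first four lines are flagged and the last one is not. It will produce false positives and miss plenty, but even a crude count per repo gives the starting question a number.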

AI can help, but it should not be in charge

I also think AI can help with logging governance.

But not as magic.

I do not want AI silently creating production policy like an intern with admin rights.

Useful AI assistance would look more like this:

Suggest required fields based on existing log patterns.
Detect fields that look sensitive.
Explain why a log violates policy.
Recommend safer structured log shapes.
Convert messy string logs into structured events.
Suggest starter governance profiles for a repo.

AI should recommend.

Humans should approve.

The system should enforce.

That is the boring version.

Which usually means it is the version that might actually survive contact with enterprise reality.

Multi-cloud makes this harder

Most teams do not send logs to one place anymore.

Some go to Azure.

Some go to AWS.

Some go to GCP.

Some go to a SIEM.

Some go to object storage.

Some go to a data platform.

Some go to an observability vendor.

That is normal now.

But logging rules should not disappear because the destination changed.

If a field is unsafe, it is unsafe before it reaches Azure.

If a field is required, it is required before it reaches AWS.

If a log is missing governance metadata, it is missing that metadata before it reaches GCP.

The destination should not define the discipline.

The application should emit governed logs from the start.
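
One way to picture this, using Python's standard `logging` module: a filter attached to the logger runs once, before any handler, so every destination receives the same governed record. This is a sketch, not Cerbi's implementation. The `GovernanceFilter` and `ListHandler` classes and the `payments` logger name are assumptions, and the two handlers stand in for real cloud or vendor sinks.

```python
import logging

DISALLOWED = {"password", "token", "ssn", "creditCardNumber", "authorizationHeader"}


class GovernanceFilter(logging.Filter):
    """Redact disallowed fields on the record before any handler sees it."""

    def filter(self, record):
        for name in DISALLOWED:
            if hasattr(record, name):
                setattr(record, name, "[REDACTED]")
        return True  # keep the record, just with redacted values


class ListHandler(logging.Handler):
    """Stand-in for a real destination; it only collects records."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("payments")  # hypothetical service logger
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addFilter(GovernanceFilter())

destination_a, destination_b = ListHandler(), ListHandler()
logger.addHandler(destination_a)
logger.addHandler(destination_b)

# Fields passed via `extra` become attributes on the log record.
logger.info("payment request", extra={"token": "abc-123", "correlationId": "abc-123"})
```

Both handlers see the redacted `token` and the untouched `correlationId`. One caveat in this particular library: a logger-level filter only applies to records logged on that logger, not to records propagated up from child loggers.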

My question for other developers

This is the part I am genuinely curious about.

Do your teams actually enforce logging standards?

Not document them.

Enforce them.

Do you scan repos for unsafe logging?

Do you block or redact sensitive fields before logs leave the app?

Do you rely on code review?

Do you rely on your observability vendor?

Do you have a logging policy that everyone technically agrees with but nobody thinks about until something breaks?

I am building Cerbi because I think this is a real gap.

But I want to hear from people who have dealt with this in real systems.

Have logs ever saved you?

Have they ever lied to you?

And would you want logging governance in your developer workflow, or would it feel like one more enterprise control pretending to help?
