I was lying in my four-year-old's bed at 8:30 PM (long story, Spider-Man was involved) when a Slack message came in:
yoo, we're logging all tenant config secrets in plain text in prod
Not dev. Not staging. Production. Tenant config. Plain text.
The line that surfaced it was raise e. But switching that to bare raise would not have prevented the incident. The secret was already baked into the exception message string before the raise happened. raise e was the symptom. The actual problem was that nobody had decided what the config object was allowed to look like in a log.
That's what this article is about.
What raise e does and why it matters here
When you write this:
try:
    result = process_tenant_config(config)
except Exception as e:
    logger.exception("Failed during tenant processing")
    raise e
Python re-raises the exception bound to e. But the more common leak path is simpler than that: a sensitive object gets interpolated into the exception message string before the raise even happens.
This can happen two ways. The object itself gets interpolated:
raise ValueError(f"Failed processing config for {config}")
Or fields get pulled out directly, which leaks regardless of what __repr__ does:
raise ValueError(
    f"Failed processing tenant={config.tenant_id} api_key={config.api_key}"
)
In the first case, the f-string calls format(config), which ends up at str(config), and str falls back to __repr__ unless the class defines its own __str__. For a plain Python class, the default repr is <TenantConfig object at 0x...>, which is safe. The risk shows up when you're using dataclasses, attrs, Pydantic, or any framework that generates a field-level repr automatically. Those are designed to be helpful for debugging, so they serialize everything by default:
from dataclasses import dataclass
@dataclass
class TenantConfig:
    tenant_id: str
    api_key: str
    webhook_secret: str
config = TenantConfig("t-123", "sk-secret", "whsec-secret")
print(repr(config))
# TenantConfig(tenant_id='t-123', api_key='sk-secret', webhook_secret='whsec-secret')
By default, dataclasses include every field in their generated __repr__, which is great for debugging but dangerous when the model contains credentials, tokens, API keys, or other secrets.
Whatever str(config) returns (for a dataclass, that is the generated repr) gets baked into the exception message. logger.exception() then logs that string verbatim as the last line of the traceback. By the time raise e runs, the damage is already done.
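Here is a minimal sketch of that claim, using only the standard library. The process_tenant_config body and the run_and_capture helper are made up for illustration; the point is only that the handler logs the same secret whether it ends in raise e or a bare raise.

```python
import io
import logging
from dataclasses import dataclass


@dataclass
class TenantConfig:
    tenant_id: str
    api_key: str
    webhook_secret: str


def process_tenant_config(config):
    # Hypothetical failure: the whole config is interpolated into the message.
    raise ValueError(f"Failed processing config for {config}")


def run_and_capture(reraise_bound_name: bool) -> str:
    """Run the handler once and return everything it logged."""
    stream = io.StringIO()
    logger = logging.getLogger(f"leak_demo.{reraise_bound_name}")
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False  # keep the demo output out of the root logger

    config = TenantConfig("t-123", "sk-secret", "whsec-secret")
    try:
        try:
            process_tenant_config(config)
        except Exception as e:
            logger.exception("Failed during tenant processing")
            if reraise_bound_name:
                raise e
            raise
    except ValueError:
        pass  # swallow the re-raised error so the demo can continue
    return stream.getvalue()


with_raise_e = run_and_capture(reraise_bound_name=True)
with_bare_raise = run_and_capture(reraise_bound_name=False)

# Both variants logged the secret: the message was built before either raise ran.
print("sk-secret" in with_raise_e, "sk-secret" in with_bare_raise)
```

The log call happens before either form of raise, so the choice between them has no effect on what lands in the log.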
The reason raise e shows up in this story at all is that it made the handler look intentional and correct during code review. It is valid Python. It does what it says. Nothing about it flags as wrong. But it was sitting in an error path that, after a new configuration feature was deployed, started routing failures through a handler that had a TenantConfig in scope. The config object ended up in the exception message. The exception message ended up in the logs. Two days of tenant secrets, in plain text, faithfully recorded.
The sleeping data leak woke up.
Why this class of bug is hard to catch
The four things that combined to cause the incident:
- An object with sensitive data in scope
- Code somewhere that interpolated that object into an exception message
- A logging setup that captured exception context
- A code path change that connected all three for the first time
Any one of those in isolation is fine. Logging exceptions is correct. Having sensitive objects in your codebase is unavoidable. Code path changes happen in every deploy. The problem is that there's no obvious place to look for this combination, and it can sit quietly for a long time before the right conditions arrive.
This is exactly the kind of thing code review misses, because each individual piece looked normal.
Safeguard 1: control what your objects look like when serialized
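This is the fix that actually holds. If a sensitive object can never serialize its secrets, it doesn't matter how the object ends up in an exception message or a log statement: the leak can't happen. (Code that reads config.api_key directly and interpolates it is a separate path, and no __repr__ will stop that.)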
Give sensitive models an explicit __repr__ that only includes what you'd be comfortable seeing in a log:
class TenantConfig:
    def __init__(self, tenant_id, api_key, webhook_secret, integration_type):
        self.tenant_id = tenant_id
        self.api_key = api_key
        self.webhook_secret = webhook_secret
        self.integration_type = integration_type
    def __repr__(self):
        return (
            f"TenantConfig("
            f"tenant_id={self.tenant_id!r}, "
            f"integration_type={self.integration_type!r}"
            f")"
        )
    def __str__(self):
        return self.__repr__()
Now f"Failed processing {config}" produces "Failed processing TenantConfig(tenant_id='t-123', integration_type='webhook')". No API key. No webhook secret. No matter what exception handler it passes through or what logging setup captures it.
This is the fix you want because it's unconditional. You're not relying on everyone who touches the codebase to remember to sanitize before logging. The object itself is safe by default.
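It helps to see why both __repr__ and __str__ are worth defining. The sketch below uses a trimmed-down, two-field version of the class (an assumption made to keep it short) and walks through the common ways an object gets turned into text. Some of them go through str, some through repr, and containers always use repr for their elements.

```python
class TenantConfig:
    def __init__(self, tenant_id, api_key):
        self.tenant_id = tenant_id
        self.api_key = api_key

    def __repr__(self):
        return f"TenantConfig(tenant_id={self.tenant_id!r})"

    def __str__(self):
        return self.__repr__()


config = TenantConfig("t-123", "sk-secret")

renderings = [
    f"{config}",              # format() falls through to __str__
    f"{config!r}",            # explicit repr conversion
    "%s" % config,            # %-formatting with str
    "%r" % config,            # %-formatting with repr
    str([config]),            # lists render their elements with repr
    str({"config": config}),  # so do dicts
]

for text in renderings:
    print(text)
```

One thing this does not cover: vars(config) and config.__dict__ still hold the raw attribute values, so dumping those into a log bypasses both methods.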
If you're using Pydantic, SecretStr handles individual secret fields at the field level. It masks the value in repr and str by default, showing SecretStr('**********'), and requires .get_secret_value() to access the raw string:
from pydantic import BaseModel, SecretStr
class TenantConfig(BaseModel):
    tenant_id: str
    api_key: SecretStr
    webhook_secret: SecretStr
    integration_type: str
    def __repr__(self):
        return f"TenantConfig(tenant_id={self.tenant_id!r}, integration_type={self.integration_type!r})"
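If you're not on Pydantic, the same idea can be approximated with a small wrapper. This is a rough sketch, not Pydantic's implementation: the Secret class below is hypothetical and only loosely modeled on the masking behavior described above. What it buys you is coverage for the second leak path, where someone interpolates config.api_key directly.

```python
class Secret:
    """Hypothetical wrapper that masks its value whenever it is rendered as text."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The only way to get the raw string back out.
        return self._value

    def __repr__(self) -> str:
        return "Secret('**********')"

    def __str__(self) -> str:
        return "**********"


class TenantConfig:
    def __init__(self, tenant_id: str, api_key: str) -> None:
        self.tenant_id = tenant_id
        self.api_key = Secret(api_key)  # wrapped at the boundary

    def __repr__(self) -> str:
        return f"TenantConfig(tenant_id={self.tenant_id!r}, api_key={self.api_key!r})"


config = TenantConfig("t-123", "sk-secret")

# Direct field interpolation, the path a safe __repr__ on the model cannot cover.
direct = f"tenant={config.tenant_id} api_key={config.api_key}"
print(direct)
```

The raw value is still reachable through get_secret_value(), so this makes leaking a deliberate act rather than an accident. It does not make it impossible.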
If you want to suppress specific fields from Pydantic's default repr without replacing the whole thing, Field(repr=False) does it per field:
from pydantic import BaseModel, Field
class TenantConfig(BaseModel):
    tenant_id: str
    api_key: str = Field(repr=False)
    webhook_secret: str = Field(repr=False)
    integration_type: str
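Standard-library dataclasses have an equivalent switch: dataclasses.field(repr=False). A minimal sketch, reusing the field names from the examples above:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class TenantConfig:
    tenant_id: str
    api_key: str = field(repr=False)         # left out of the generated __repr__
    webhook_secret: str = field(repr=False)  # left out of the generated __repr__
    integration_type: str = "webhook"


config = TenantConfig("t-123", "sk-secret", "whsec-secret")

print(repr(config))
# TenantConfig(tenant_id='t-123', integration_type='webhook')

# Dataclasses do not generate __str__, so str() falls back to the same repr.
message = f"Failed processing config for {config}"

# repr=False only affects __repr__. asdict() still returns the raw values.
dumped = asdict(config)
```

That last point matters: if anything logs asdict(config), the secrets are back. Hiding a field from the repr is not the same as hiding it from every serializer.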
Safeguard 2: sanitize at the logging layer
Repr control on the model is your first line of defense. A logging sanitizer is the catch-all for anything that slips through, including cases where someone logs a dict of config values directly rather than the model object.
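import logging
SENSITIVE_KEYS = frozenset({
    "api_key", "secret", "token", "password",
    "credential", "webhook_secret", "private_key",
})
class SanitizingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # record.args is a dict only for %(name)s-style calls that pass a mapping.
        if isinstance(record.args, dict):
            record.args = self._sanitize(record.args)
        return True
    def _sanitize(self, data: dict) -> dict:
        return {
            k: "***REDACTED***" if k in SENSITIVE_KEYS else v
            for k, v in data.items()
        }
# Filters attached to a logger are skipped for records propagated from child
# loggers, so attach the filter to the root logger's handlers instead.
# Run this after logging is configured (for example, after logging.basicConfig()).
for handler in logging.getLogger().handlers:
    handler.addFilter(SanitizingFilter())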
Worth knowing: this filter only intercepts record.args when it's a dict, which only happens with %-style dict formatting:
logger.error("api_key=%(api_key)s", {"api_key": "sk-secret"}) # filter works
logger.error(f"api_key={api_key}") # filter does nothing
logger.error("api_key=%s", api_key) # filter does nothing
With f-strings, the message is fully formatted before the log record is created, so record.args is an empty tuple by the time the filter runs. With positional %s arguments, record.args is a tuple of bare values with no key names to match against. A lot of application code logs one of those two ways, so treat this as a safety net for dict-style structured logging, not a complete solution on its own.
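One way to cover the pre-formatted cases is to scrub the final rendered text instead of the arguments. The sketch below does that in a Formatter subclass, which also sees the traceback text. The RedactingFormatter name and the regex are assumptions for illustration. The pattern only catches key=value shapes for the listed key names, so this is best-effort and does not replace safe models.

```python
import io
import logging
import re

# Hypothetical pattern: a known key name, "=", then a run of non-space characters.
SECRET_PATTERN = re.compile(
    r"\b(api_key|webhook_secret|token|password)=\S+", re.IGNORECASE
)


class RedactingFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        rendered = super().format(record)  # message plus traceback, if any
        return SECRET_PATTERN.sub(r"\1=***REDACTED***", rendered)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(RedactingFormatter("%(levelname)s %(message)s"))

logger = logging.getLogger("redaction_demo")
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

api_key = "sk-secret"
logger.error(f"api_key={api_key}")   # pre-formatted f-string
logger.error("api_key=%s", api_key)  # positional argument
try:
    raise ValueError(f"bad config api_key={api_key}")
except ValueError:
    logger.exception("Failed during tenant processing")  # secret sits in the traceback

output = stream.getvalue()
print(output)
```

A secret that shows up without a recognizable key in front of it goes straight through, which is why this belongs behind repr control and not in place of it.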
For structlog, add a processor instead:
import structlog
SENSITIVE_KEYS = frozenset({
    "api_key", "secret", "token", "password",
    "credential", "webhook_secret", "private_key",
})
def sanitize_processor(logger, method, event_dict):
    for key in SENSITIVE_KEYS:
        if key in event_dict:
            event_dict[key] = "***REDACTED***"
    return event_dict
structlog.configure(
    processors=[
        sanitize_processor,
        structlog.processors.JSONRenderer(),
    ]
)
The structlog version is more useful in practice because structured logging tends to pass data as explicit key-value pairs, which means the processor actually sees the fields. The stdlib filter is most useful when you have existing code passing dicts to logger.error().
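Both versions above only look at top-level keys. If your events carry nested dicts (a tenant block inside the event, say), a recursive variant is a small step further. This is a sketch under that assumption: redact and sanitize_nested_processor are hypothetical names, and the function is written as plain Python with the three-argument processor shape used above, so it runs without structlog installed.

```python
SENSITIVE_KEYS = frozenset({
    "api_key", "secret", "token", "password",
    "credential", "webhook_secret", "private_key",
})
REDACTED = "***REDACTED***"


def redact(value):
    """Return a copy of value with sensitive keys masked at any depth."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, tuple):
        return tuple(redact(item) for item in value)
    return value


def sanitize_nested_processor(logger, method_name, event_dict):
    # Same call shape as sanitize_processor, but returns a redacted copy.
    return redact(event_dict)


event = {
    "event": "config loaded",
    "tenant": {"tenant_id": "t-123", "api_key": "sk-secret"},
    "targets": [{"url": "https://example.invalid/hook", "token": "abc"}],
}
result = sanitize_nested_processor(None, "info", event)
print(result)
```

It builds a copy instead of mutating in place, so the caller's dict keeps its real values for the code that still needs them.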
Safeguard 3: write the tests
This is the piece that prevents the next version of this incident. Without tests, a future refactor can quietly undo a safe __repr__, and nobody notices until the logs tell them.
def test_tenant_config_repr_does_not_expose_secrets():
    config = TenantConfig(
        tenant_id="tenant-abc",
        api_key="sk-super-secret-do-not-log",
        webhook_secret="whsec-also-secret",
        integration_type="webhook",
    )
    config_repr = repr(config)
    assert "sk-super-secret-do-not-log" not in config_repr
    assert "whsec-also-secret" not in config_repr
    assert "tenant-abc" in config_repr
def test_tenant_config_str_does_not_expose_secrets():
    config = TenantConfig(
        tenant_id="tenant-abc",
        api_key="sk-super-secret-do-not-log",
        webhook_secret="whsec-also-secret",
        integration_type="webhook",
    )
    assert "sk-super-secret-do-not-log" not in str(config)
def test_exception_containing_tenant_config_does_not_expose_secrets():
    config = TenantConfig(
        tenant_id="tenant-abc",
        api_key="sk-super-secret-do-not-log",
        webhook_secret="whsec-also-secret",
        integration_type="webhook",
    )
    try:
        raise ValueError(f"Processing failed for {config}")
    except ValueError as e:
        assert "sk-super-secret-do-not-log" not in str(e)
The third test is the important one. It explicitly verifies what happens when the config object ends up in an exception message, which is the exact path that caused the original leak. It's two minutes to write and it would have caught this before it reached production.
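You can push the same check one step further and assert on what the logger actually writes, traceback included. A sketch using only the standard library: the logger name is arbitrary, and the compact TenantConfig is repeated here just so the snippet runs on its own.

```python
import io
import logging


class TenantConfig:
    def __init__(self, tenant_id, api_key, webhook_secret, integration_type):
        self.tenant_id = tenant_id
        self.api_key = api_key
        self.webhook_secret = webhook_secret
        self.integration_type = integration_type

    def __repr__(self):
        return (
            f"TenantConfig(tenant_id={self.tenant_id!r}, "
            f"integration_type={self.integration_type!r})"
        )

    def __str__(self):
        return self.__repr__()


def test_logged_exception_does_not_expose_secrets():
    # Capture the fully formatted log output, including the traceback.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("tenant_processing_test")
    logger.addHandler(handler)
    logger.propagate = False

    config = TenantConfig(
        tenant_id="tenant-abc",
        api_key="sk-super-secret-do-not-log",
        webhook_secret="whsec-also-secret",
        integration_type="webhook",
    )
    try:
        raise ValueError(f"Processing failed for {config}")
    except ValueError:
        logger.exception("Failed during tenant processing")
    finally:
        logger.removeHandler(handler)

    logged = stream.getvalue()
    assert "sk-super-secret-do-not-log" not in logged
    assert "whsec-also-secret" not in logged
    assert "tenant-abc" in logged
```

If you run pytest, its caplog fixture can do the capturing for you. The StringIO handler here just keeps the example free of dependencies.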
When secrets are already in your logs
Once they're there, the engineering fix is the easy part. The harder conversation is what to do about the data that already exists.
Logs in production systems are not one thing. They're in your log aggregator, your SIEM, long-term retention storage, compliance archives, monitoring pipelines that may have already exported them somewhere downstream. "Scrub the logs" means figuring out every place those logs landed and understanding what deletion looks like in each one.
Some systems have APIs for deletion. Some have retention policies that make deletion complicated or impossible. Some require opening a support ticket with a vendor. Some require you to accept that the data exists, document it, and focus on reducing exposure going forward.
What you don't want to be figuring out for the first time at midnight during an active incident:
- Where do your logs go? All destinations, not just the primary one.
- Who owns each destination?
- What are your deletion options per system?
- Do you have a compliance obligation to notify anyone if production secrets are exposed, and what's the notification window?
Rotate the secrets regardless. Treat anything that was in the logs as potentially compromised even without evidence of access. That's standard practice, not an overreaction. Just do it deliberately: a middle-of-the-night rotation done hastily can break customer integrations, which is its own incident. Build the plan, communicate it, execute it carefully.
The actual architectural problem
Logging is treated like a default in most codebases. A thing that happens rather than a thing that's decided. You add logger.exception() when debugging, it stays, and nobody ever asks: what does this object actually look like when it gets serialized?
That's the question that needs to be on your design checklist for any model that touches secrets. Not as an afterthought when something leaks. As part of building the model.
__repr__ is not just for debugging. It's the contract your object has with anything that might serialize it. The danger isn't Python's default object repr: <TenantConfig object at 0x...> is actually fairly safe. The danger is the moment a framework, dataclass, model library, or custom implementation starts serializing fields automatically. At that point, you've implicitly made a decision about what belongs in logs, whether you meant to or not.
Quick checklist
When building a model that handles secrets:
- For models that may be serialized into logs, define __repr__ and __str__ explicitly rather than relying on framework-generated defaults.
- Use SecretStr (Pydantic) or Field(repr=False) for individual secret fields.
- Write a test that asserts secret values don't appear in repr(), str(), or exception messages containing the object.
At the logging layer:
- Add a sanitizing filter or structlog processor that redacts known sensitive keys.
- Know what exc_info=True and logger.exception() actually serialize in your stack.
- Understand the f-string limitation: pre-formatted messages bypass field-level filters.
Before you have an incident:
- Map where your logs go. All of them.
- Know your deletion options per destination.
- Know your compliance notification obligations and timelines.
This is loosely based on a real incident I talked through in Episode 8 of Chaotic Commits. Ghost Spider made it out fine.