Logging Mastery: Debug Like Your Life Depends On It
He was staring at an error message at 2 AM. Production was down. Customers were angry. And the error log was completely useless: "Something went wrong."
That's when he realized: he'd spent years learning debugging tools, testing frameworks, profilers. But he'd never learned how to log properly.
Logging is the difference between knowing what happened and guessing. It's the difference between fixing a bug in 5 minutes and spending 5 hours in the dark.
This is the story of mastering logging—not as an afterthought, but as a first-class citizen of your code.
The Truth About Print Debugging
Everyone starts here. Something breaks. You add a print statement. Run it again. Look at the output. Remove the print statement.
It works for small programs. For tiny bugs. For code that runs once and never again.
But production code? Code that runs unattended? Code where you can't just run it again whenever you want?
Print debugging fails catastrophically.
Print statements in production are like leaving a trail of breadcrumbs in the dark. You hope someone finds them later.
Real logging is different. It's systematic. It's queryable. It persists. It's your insurance policy for when things go wrong at 3 AM on a Sunday.
The Logging Hierarchy
Before you write a single log line, you need to understand levels. They're not optional—they're the language that separates signal from noise.
DEBUG
Verbose information. What value did this variable have? What parameters were passed? What function was called? In production, you turn this off. In development, you drown in it.
INFO
Important business events. "User logged in." "Order placed." "Backup completed." These tell the story of your application running normally.
WARNING
Something unexpected happened, but we recovered. "Database query took 5 seconds." "File not found, using default." "Connection retried 3 times before succeeding."
ERROR
Something broke. The operation failed. But the application didn't crash. "Payment processing failed for order #123." "Email send timed out." These need attention.
CRITICAL
The system is failing. Pages down. Database offline. These wake people up at 3 AM. Use them sparingly and only when you mean it.
The magic: you can set the logging level globally. In production, show ERROR and CRITICAL only. During debugging, show DEBUG and up. Same code, different visibility.
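This global-threshold behavior can be sketched with the standard library alone; the logger name and messages below are illustrative:

```python
import logging

logging.basicConfig(level=logging.ERROR)  # production: only ERROR and CRITICAL
logger = logging.getLogger("orders")

logger.debug("cart contents: %s", ["sku-1", "sku-2"])     # suppressed
logger.info("order placed")                               # suppressed
logger.error("payment processing failed for order #123")  # emitted

logging.getLogger().setLevel(logging.DEBUG)  # debugging: show DEBUG and up
logger.debug("now visible")                  # emitted
```

Same calls, same code; only the threshold on the root logger changed.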
Structured Logging: The Game Changer
Most developers log like this:
`logger.info(f"User {username} logged in from {ip_address}")`
That works for humans reading logs. It fails for machines searching them.
Structured logging is different:
```python
logger.info("user_login", extra={
    "username": username,
    "ip_address": ip_address,
    "user_id": user.id,
    "timestamp": datetime.now().isoformat(),
    "session_duration": 0
})
```
Now you can query: "Show me all logins from this IP address." Or "How many login attempts from user #5 today?" Or "What's the average time between login and logout?"
Structured logging transforms logs from "stuff that happened" into "queryable event streams."
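With one JSON object per line (the shape assumed here matches the fields above), those queries become a few lines of Python, no grep archaeology required:

```python
import json

# A few structured log lines, as they might appear in a log file (illustrative)
raw_logs = """
{"message": "user_login", "username": "alice", "ip_address": "10.0.0.5"}
{"message": "user_login", "username": "bob", "ip_address": "10.0.0.9"}
{"message": "user_login", "username": "alice", "ip_address": "10.0.0.9"}
""".strip().splitlines()

events = [json.loads(line) for line in raw_logs]

# "Show me all logins from this IP address"
from_ip = [e["username"] for e in events if e["ip_address"] == "10.0.0.9"]
print(from_ip)  # ['bob', 'alice']
```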
The Setup That Works
Step 1: Configure Logging in Your Application
```python
import logging
import json
from datetime import datetime

# Attributes present on every LogRecord; anything beyond these arrived via extra=
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({})))

class StructuredFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            "timestamp": datetime.now().isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        # extra= keys are merged directly onto the record, so collect any
        # attribute that isn't part of a standard LogRecord
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                log_data[key] = value
        return json.dumps(log_data, default=str)

# Configure root logger
handler = logging.StreamHandler()
handler.setFormatter(StructuredFormatter())
logging.root.addHandler(handler)
logging.root.setLevel(logging.INFO)
```

One subtlety worth knowing: `logging` doesn't store the `extra=` dict under a single key—it merges each key straight into the record's `__dict__`. That's why the formatter compares against the standard `LogRecord` attributes instead of looking for `record.extra`.
Step 2: Log Thoughtfully
```python
# Good logging
logger = logging.getLogger(__name__)

def process_payment(order_id, amount, payment_method):
    logger.info("payment_initiated", extra={
        "order_id": order_id,
        "amount": amount,
        "payment_method": payment_method
    })
    try:
        result = payment_gateway.charge(amount, payment_method)
        logger.info("payment_successful", extra={
            "order_id": order_id,
            "transaction_id": result.id,
            "amount": amount
        })
        return result
    except PaymentError as e:
        logger.error("payment_failed", extra={
            "order_id": order_id,
            "error": str(e),
            "amount": amount,
            "attempt": 1
        })
        raise
```
Notice: you're logging at the right moments. Entry point. Success. Failure. Each log has context—order ID, amounts, transaction IDs. When something fails, you have everything you need to reproduce it.
Step 3: Centralize and Query Your Logs
Logs scattered across 50 server instances are useless. You need centralization.
For small projects: ELK Stack (Elasticsearch, Logstash, Kibana) or Loki (lighter weight)
For managed solutions: Datadog, New Relic, Splunk
For bootstrapped teams: CloudWatch (AWS), Cloud Logging (GCP, formerly Stackdriver), or even PostgreSQL with good indexing
The principle: all logs in one place. Searchable. Filterable. Queryable.
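Even before adopting a full aggregation stack, the standard library can get logs out of the process and onto disk where a shipper (Filebeat, promtail, the CloudWatch agent) can tail them into the central store. A minimal sketch—the file path and service name are illustrative:

```python
import logging
import logging.handlers
import os
import tempfile

# One log file per service; a log shipper tails it into the central store.
log_path = os.path.join(tempfile.gettempdir(), "payments.log")

handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=10_000_000, backupCount=5  # cap local disk usage
)
handler.setFormatter(logging.Formatter(
    '{"ts": "%(asctime)s", "level": "%(levelname)s", "msg": "%(message)s"}'
))

logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("payment_successful")
handler.flush()
```

Rotation matters here: without `maxBytes`/`backupCount`, a chatty service will eventually fill the disk it runs on.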
The Questions You Can Now Answer
With proper logging, you can answer questions that used to be unanswerable:
"What happened right before the system crashed?" — Look at logs 5 minutes prior, working backwards
"How often does this specific error occur?" — Count error events with that message
"Which user is experiencing the problem?" — Filter by user_id and see their event stream
"Is this a widespread issue or isolated?" — See how many instances/regions are affected
"How long did the operation take?" — Compare timestamp of start and end events
"What's the pattern before failures?" — Look at sequences of logs leading to errors
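The duration question falls straight out of the timestamps in the event stream. A sketch with illustrative events shaped like the payment logs above:

```python
from datetime import datetime

# Two events for the same order, as the payment code would emit (illustrative)
start = {"message": "payment_initiated", "order_id": 123,
         "timestamp": "2026-01-10T03:00:01.250000"}
end = {"message": "payment_successful", "order_id": 123,
       "timestamp": "2026-01-10T03:00:03.750000"}

# Subtract the parsed timestamps to get the operation's duration
elapsed = (datetime.fromisoformat(end["timestamp"])
           - datetime.fromisoformat(start["timestamp"]))
print(elapsed.total_seconds())  # 2.5
```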
Common Logging Mistakes
Logging Too Much
Every single operation? Every variable change? That's noise. In production, you're drowning in INFO logs and can't see the errors. Log business events, not implementation details.
Logging Too Little
Only logging errors? When that error happens at 3 AM, you have no context. What led to it? What state was the system in? Log the journey, not just the crash.
Not Logging Context
A log that says "Request failed" is useless without knowing: which request? Which user? What parameters? Always include enough context to act on the log.
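One stdlib way to guarantee that context rides along on every line is `logging.LoggerAdapter`; the `request_id` and `user_id` fields here are hypothetical:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s %(message)s [request_id=%(request_id)s user_id=%(user_id)s]",
)

base = logging.getLogger("api")
# Every log line from this adapter carries the same context automatically
log = logging.LoggerAdapter(base, {"request_id": "req-42", "user_id": 5})

log.info("request failed")
# INFO request failed [request_id=req-42 user_id=5]
```

Bind the adapter once per request, and "Request failed" stops being an orphan—you know which request, for which user.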
Forgetting About Performance
Logging has cost. Disk I/O, network for centralization, storage. High-frequency logging in tight loops can slow down your application. Be selective.
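Two standard-library habits keep that cost down in hot paths—lazy `%`-formatting, and an explicit level guard around genuinely expensive work. A sketch:

```python
import logging

logger = logging.getLogger("hot_path")
logger.setLevel(logging.INFO)

def expensive_summary(items):
    # Imagine this walks a large structure; we only want to pay for it
    # when DEBUG is actually enabled.
    return ",".join(str(i) for i in items)

items = list(range(5))

# 1. Lazy %-formatting: the string is only interpolated if the level passes.
#    Note the argument expression itself is still evaluated here.
logger.debug("summary: %s", expensive_summary(items))

# 2. Guard genuinely expensive work explicitly, so it is skipped entirely:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("summary: %s", expensive_summary(items))
```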
The Real Payoff
Last week, a customer reported an issue: their data wasn't syncing. He pulled up the logs. Found it in 90 seconds: a race condition between two services, happening only when data arrived in a specific order.
With good logs, he could see: Service A sent data at 10:23:05. Service B read it at 10:23:06. Service C expected it at 10:23:05 and timed out. The whole story, right there in the logs.
Without those logs? He'd be guessing for hours. Maybe days. Instead, he had the answer immediately.
That's the power of logging mastery. Not just fixing bugs faster. But moving from "we have a problem" to "here's why and here's the fix" in minutes instead of hours.
Your Next Step
This week, audit your logging. Look at one critical service. Ask yourself:
If this failed right now, could I figure out why in 5 minutes?
Do my logs have enough context?
Can I search them?
Are they structured or free-form?
If the answer to any of these is "no," start improving. Add structured logging. Centralize your logs. Query them. Learn what information actually matters.
Your future self—the one debugging at 3 AM—will thank you.
The difference between a 2-minute fix and a 2-hour debugging session is often just one thing: good logs.
Invest in logging infrastructure now, before you need it at 3 AM on a Sunday when your production system is down.
Part of the Developer Mastery Series
Previously: VS Code Mastery: From Environments to Excellence
Master your tools, master your debugging, master your craft
Built with Python logging, structured JSON, centralized log aggregation, and the hard-earned wisdom of debugging at 3 AM more times than he'd like to admit. Written by Andrew • January 2026