Logs have always been my most fundamental tool for seeing inside a system and understanding what's happening. However, over the years, especially with the widespread adoption of distributed systems and microservice architectures, I've repeatedly seen how inadequate unstructured logging, that is, plain-text logging, can be. Casually calling print() or logger.info() with free text might seem easy at first, but it can turn into a nightmare as things scale up.
In this post, I'll explain the difficulties unstructured logging has caused me, why it falls short, and how I've tried to escape this situation. I want to illustrate the topic with concrete situations I've encountered, both in the backend of my own side project and in a client project.
Limitations of Traditional Logging
Traditionally, developers and system administrators produced logs in plain text to understand the state of applications and systems. The Syslog format is a good example: its header has a fixed structure, but the message part is mostly free text. Early in my career, especially in small projects or with a single monolithic application, I used this method frequently. System components such as Nginx access logs or PostgreSQL error logs generally follow a similar pattern.
However, while working on a production ERP system with hundreds of different operations, each navigating through different microservices, this approach quickly hit a wall. Finding the relevant logs to understand why a shipment slip wasn't approved became akin to searching for a needle in a haystack. Since each developer logged in a different format and at a different detail level, there was no single standard.
May 28 09:34:12 webserver appname[1234]: User 'alice' failed login from 192.168.1.100: Invalid credentials.
May 28 09:34:13 dbserver postgres[5678]: [2345] ERROR: duplicate key value violates unique constraint "users_pkey"
May 28 09:34:14 appserver appname[1235]: Order 789 processed successfully.
Log lines like the above, while meaningful on their own, become nearly impossible to piece together into a complete picture of an event when they come from different services and are buried among thousands of other lines. On the server infrastructures of the early 2000s, scanning logs with grep was reasonably effective. But in today's systems, which produce gigabytes of logs per hour, that approach is no longer sustainable.
The Parsing Nightmare and Its Costs
One of the biggest problems with unstructured logs is that we constantly have to write regex-based parsing rules to extract meaningful data. Imagine hundreds of services written by different teams in a production system. Each service's log format is slightly different, and these formats can change over time. In such a scenario, writing a separate parsing rule for each log format and keeping them constantly updated turns into a complete nightmare.
In one of my projects, we had dozens of files dedicated solely to grok filters in Logstash to manage these parsing rules. Whenever a new log format arrived, we had to spend days testing to make sure existing rules weren't broken, which consumed a significant portion of the time allocated for new features. On one occasion, a poorly written grok rule after a log format change left us unable to correctly analyze a critical service's error logs for 3 days. During that window, a potential cyberattack attempt or a critical performance issue could easily have gone unnoticed.
⚠️ Regex Dependency
Regex-based parsing can negatively impact performance and is a highly error-prone process for complex log formats. An incorrect regex not only prevents you from getting the right data but can also skyrocket the CPU usage of your log collection systems.
%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{WORD:app_name}\[%{INT:pid}\]: User '%{WORD:username}' failed login from %{IP:client_ip}: %{GREEDYDATA:message}
A grok rule like the one above can parse this particular log line. However, if the format changes even slightly (for example, if the application is renamed to a hyphenated service name such as order-service, which %{WORD} does not match), the rule silently stops working. Keeping such rules in sync with every format change significantly increases operational costs. Furthermore, the CPU and memory that log collection tools (Fluentd, Logstash, etc.) spend running these regexes become a significant cost at large log volumes. In a system with a log flow of around 300 MB/s, I've seen Logstash queues swell and logs start dropping due to parsing errors. Issues like these compromise the reliability of logs and limit our real-time monitoring capabilities.
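To make the fragility concrete, here is a rough Python equivalent of that kind of rule, written against the syslog-style sample line from earlier. It is a sketch, not a faithful port of the grok pattern library: the timestamp and IP regexes are simplified assumptions, and the reworded message is a hypothetical format change.

```python
import re

# Simplified stand-in for the grok rule; real grok patterns (e.g. IP) are stricter.
LOGIN_FAILED = re.compile(
    r"^(?P<timestamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<app_name>\w+)\[(?P<pid>\d+)\]: "
    r"User '(?P<username>\w+)' failed login from "
    r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}): "
    r"(?P<message>.*)$"
)


def parse(line: str):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LOGIN_FAILED.match(line)
    return match.groupdict() if match else None


original = ("May 28 09:34:12 webserver appname[1234]: "
            "User 'alice' failed login from 192.168.1.100: Invalid credentials.")
# The same event after a developer rewords the message (hypothetical change).
reworded = ("May 28 09:34:12 webserver appname[1234]: "
            "Login failed for user 'alice' from 192.168.1.100: Invalid credentials.")

print(parse(original))
print(parse(reworded))  # None: the rule silently extracts nothing
```

Note that the failure is silent: parse() simply returns None, which is exactly how a broken rule can hide a critical service's errors for days.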
Loss of Correlation and Context
In distributed systems, a user request can pass through multiple microservices. For example, an order creation process might go through the order service, then to the inventory service, then to the payment service, and finally to the notification service. When using unstructured logging, each of these different services produces its logs in its own format. When you want to diagnose a problem, finding the answer to the question "which services did this order process go through, what happened at each step?" becomes nearly impossible.
While working on a complex supply chain integration for a client project, I spent days trying to understand why an order was stuck in the "processing" status. Manually comparing logs from different services, trying to reconstruct the event by looking at timestamps, was torture. Ultimately, I understood how critical it was to include values like trace_id and span_id in the logs. However, parsing and correlating these IDs consistently in unstructured logs was another challenge.
# Service A Log
2026-06-01T10:00:01Z serviceA[100]: Request received for order 123.
# Service B Log
2026-06-01T10:00:02Z serviceB[200]: Checking stock for item X. Order ID: 123.
# Service C Log
2026-06-01T10:00:03Z serviceC[300]: Payment processed for order 123.
Trying to combine logs with just the Order ID, as in the example above, isn't always sufficient. Especially in systems with heavy asynchronous operations, logs from different processes with the same Order ID can get mixed up. Without a unique request identifier like trace_id, it's very difficult to understand the entire lifecycle of a process and identify bottlenecks. This posed a major obstacle, especially when I was trying to design "real-time dashboards." Relying solely on plain text logs to understand which service is causing how much latency feels like a gamble to me. I delved deeper into this topic in my article titled [related: Observability Techniques in Distributed Systems].
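With a trace_id on every line, reconstructing a request becomes a group-by instead of a manual timestamp comparison. The minimal sketch below assumes JSON lines with hypothetical trace_id, service, and timestamp fields. Two requests share order 123 here, but they stay separate because their trace_ids differ.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical JSON log lines from three services.
raw_lines = [
    '{"timestamp": "2026-06-01T10:00:01+00:00", "service": "serviceA", "trace_id": "t-1", "order_id": 123}',
    '{"timestamp": "2026-06-01T10:00:01+00:00", "service": "serviceA", "trace_id": "t-2", "order_id": 123}',
    '{"timestamp": "2026-06-01T10:00:02+00:00", "service": "serviceB", "trace_id": "t-1", "order_id": 123}',
    '{"timestamp": "2026-06-01T10:00:05+00:00", "service": "serviceC", "trace_id": "t-1", "order_id": 123}',
]


def group_by_trace(lines):
    """Parse JSON lines and group them by trace_id, each group ordered by timestamp."""
    traces = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        record["timestamp"] = datetime.fromisoformat(record["timestamp"])
        traces[record["trace_id"]].append(record)
    for records in traces.values():
        records.sort(key=lambda r: r["timestamp"])
    return traces


def step_durations(records):
    """Seconds between each step and the next one within a single trace."""
    return [
        (prev["service"], (curr["timestamp"] - prev["timestamp"]).total_seconds())
        for prev, curr in zip(records, records[1:])
    ]


traces = group_by_trace(raw_lines)
print(step_durations(traces["t-1"]))
```

The per-step numbers are only a rough proxy, since the gap before the next log line is attributed to the earlier service. For real per-service latency, span-based tracing with start and end times per span is the proper tool.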
Weakness in Metric Generation and Alerting Mechanisms
Logs are not just for debugging; they are also crucial for monitoring the overall health of a system and deriving performance metrics. Unstructured logs also bring significant limitations in this regard. For example, if you want to derive the error rate or average response time of a specific endpoint from logs, you again need complex parsing rules.
In the backend of one of my side projects, I wanted to monitor API response times and error codes from the logs. Initially, I tried extracting the status code and request time fields from Nginx access logs with grep and counting them with a simple script. However, this only showed the status at the Nginx layer and didn't reflect the actual latencies or errors inside the application. And since the application logs recorded this information as free text, getting reliable metrics out of them was very difficult.
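Once the application writes each request as a structured record, the same metrics need no regex at all. A minimal sketch, assuming hypothetical endpoint, status, and duration_ms fields and treating only 5xx responses as errors:

```python
import json
from collections import defaultdict

# Hypothetical application-level request records, one JSON object per log line.
records = [
    {"endpoint": "/orders", "status": 200, "duration_ms": 120},
    {"endpoint": "/orders", "status": 500, "duration_ms": 340},
    {"endpoint": "/orders", "status": 200, "duration_ms": 100},
    {"endpoint": "/login", "status": 401, "duration_ms": 30},
]
lines = [json.dumps(r) for r in records]  # what the log file would contain


def endpoint_metrics(json_lines):
    """Request count, 5xx error rate, and mean latency per endpoint."""
    stats = defaultdict(lambda: {"count": 0, "errors": 0, "total_ms": 0})
    for line in json_lines:
        rec = json.loads(line)
        s = stats[rec["endpoint"]]
        s["count"] += 1
        s["errors"] += int(rec["status"] >= 500)
        s["total_ms"] += rec["duration_ms"]
    return {
        endpoint: {
            "count": s["count"],
            "error_rate": s["errors"] / s["count"],
            "avg_ms": s["total_ms"] / s["count"],
        }
        for endpoint, s in stats.items()
    }


print(endpoint_metrics(lines))
```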
ℹ️ Importance of Reliable Metrics
Reliable metrics allow you to instantly understand your system's health status and proactively address potential issues. Deriving metrics from unstructured logs is often both a slow and error-prone process.
On one occasion, a performance degradation caused by an exhausted PostgreSQL connection pool showed up in the logs only as an ambiguous "connection timeout" message. I had to write a custom fail2ban-like rule to count that message automatically and trigger an alert. Manual workarounds like this increase operational load and reduce proactive observability. If the logs had been structured, a metric like connection_pool_error_count could have been derived directly, with an alert firing automatically once a threshold was exceeded. I explained how I handled such situations in my article [related: PostgreSQL Performance Tuning Guide].
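With structured records, that hand-built alert reduces to counting a named field. Below is a minimal sketch of a sliding-window threshold check. It assumes each record carries a hypothetical event name and a ts value in seconds, and that records arrive in time order; the threshold and window are arbitrary examples.

```python
from collections import deque


class ThresholdAlert:
    """Count matching events in a sliding time window and report when the
    count exceeds a threshold."""

    def __init__(self, event: str, threshold: int, window_s: float):
        self.event = event
        self.threshold = threshold
        self.window_s = window_s
        self.times = deque()

    def observe(self, record: dict) -> bool:
        """Feed one structured record; return True if the alert should fire."""
        if record.get("event") != self.event:
            return False
        now = record["ts"]
        self.times.append(now)
        # Drop matches that have fallen out of the window.
        while self.times and now - self.times[0] > self.window_s:
            self.times.popleft()
        return len(self.times) > self.threshold


alert = ThresholdAlert("connection_pool_timeout", threshold=3, window_s=60)
events = [{"event": "connection_pool_timeout", "ts": t} for t in (0, 10, 20, 30, 200)]
print([alert.observe(e) for e in events])
```

The fourth timeout within 60 seconds trips the alert; by t=200 the earlier ones have aged out. Matching on an exact field value is what makes this trivial compared with fuzzy text matching.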
Security and Compliance Challenges
Logs also play a critical role in security. In the event of an attack or a data breach, logs are the first source to consult to understand how the incident unfolded. However, unstructured logs have significant vulnerabilities in this regard as well. The accidental writing of sensitive data (PII - Personally Identifiable Information) into logs poses a great risk.
I was horrified when I saw user passwords or credit card numbers accidentally written into debug logs in a production ERP system. Since these logs were in plain text, it was very difficult to automatically detect and mask or delete them. Manually scanning logs and cleaning up such data was both time-consuming and error-prone. This situation was unacceptable in terms of compliance with regulations like GDPR or KVKK.
# An example of unstructured log containing sensitive data
2026-06-01T10:05:10Z auth_service[500]: User login failed for 'john.doe@example.com' with password 'MySecretPassword123'.
When a log line like this is recorded as plain text, anyone with log access can reach sensitive data. With structured logging, we can control log fields more granularly and automatically mask fields containing sensitive data or avoid logging them altogether. Furthermore, even with system tools like auditd, ensuring the auditability of free-text logs is very difficult. Tracking all interactions of a specific user or process on the system is only possible with structured and contextually rich logs. This is as important as managing SELinux or AppArmor profiles.
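As an example of field-level control, a Python logging filter can mask known-sensitive keys before any handler writes them. A minimal sketch, assuming sensitive data travels in a hypothetical fields dict passed via extra, and that the set of sensitive key names is known in advance:

```python
import io
import json
import logging

# Field names treated as sensitive; this set is an assumption for the sketch.
SENSITIVE_KEYS = {"password", "card_number", "email"}


class RedactingFilter(logging.Filter):
    """Mask sensitive keys in a record's structured `fields` dict."""

    def filter(self, record: logging.LogRecord) -> bool:
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            record.fields = {
                key: "***REDACTED***" if key in SENSITIVE_KEYS else value
                for key, value in fields.items()
            }
        return True  # never drop the record, only mask it


class JsonLineFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({"level": record.levelname,
                           "message": record.getMessage(),
                           **getattr(record, "fields", {})})


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())  # on the handler, so every emitted record is covered
handler.setFormatter(JsonLineFormatter())
log = logging.getLogger("auth_service")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

log.warning("User login failed",
            extra={"fields": {"email": "john.doe@example.com",
                              "password": "MySecretPassword123",
                              "reason": "invalid_credentials"}})
print(stream.getvalue().strip())
```

Masking is a safety net, not a substitute for never passing passwords to the logger in the first place. It also only works because the value sits under a known key; the same password embedded in a free-text message would slip through.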
Transition to Structured Logging and Its Benefits
All these challenges led me to structured logging. Structured logging is the practice of formatting log messages according to a predefined structure (usually JSON or key-value pairs). This makes logs easily readable, parsable, and queryable by machines.
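To illustrate the two common shapes, here is the same event rendered as JSON and as logfmt-style key=value pairs. The field names are hypothetical, and the quoting rule in to_key_value is a simplified assumption rather than a complete logfmt implementation.

```python
import json

event = {"level": "ERROR", "service": "auth_service", "event": "login_failed",
         "username": "alice", "client_ip": "192.168.1.100"}


def to_json(fields: dict) -> str:
    return json.dumps(fields)


def to_key_value(fields: dict) -> str:
    """Render key=value pairs; values containing spaces, quotes, or '=' are quoted."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        if text == "" or any(c in text for c in ' "='):
            text = json.dumps(text)  # JSON string quoting also escapes inner quotes
        parts.append(f"{key}={text}")
    return " ".join(parts)


print(to_json(event))
print(to_key_value(event))
```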
My preference has generally been logging in JSON format. Especially in my FastAPI and Python projects, configuring the logging module to output in JSON format has incredibly sped up both my development and operational processes. Adding a trace_id to the log context and having it automatically reflected in every log line became possible with just a few lines of code.
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record, self.datefmt),
            "level": record.levelname,
            "message": record.getMessage(),
            # Keys passed via `extra=` become attributes on the record.
            "service": getattr(record, "service", "unknown"),
            # Fallback only: a random ID per line cannot correlate anything,
            # so pass the request's trace_id in `extra` whenever one exists.
            "trace_id": getattr(record, "trace_id", str(uuid.uuid4())),
            "extra_data": getattr(record, "extra_data", {}),
        }
        return json.dumps(log_entry)

# Logger setup
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Example usage: custom fields go under "extra_data" so the formatter picks them up
logger.info(
    "User logged in successfully",
    extra={"service": "auth_service", "extra_data": {"username": "mustafa"}},
)
Using a JSON formatter like the one above, my logs now look like this:
{"timestamp": "2026-06-01 10:15:00,123", "level": "INFO", "message": "User logged in successfully", "service": "auth_service", "trace_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef", "extra_data": {"username": "mustafa"}}
This format can be parsed, indexed, and queried directly by log collection systems (Elasticsearch, Loki, Splunk, etc.). Now, when an error occurs, I can filter all logs with a specific trace_id in seconds, or count the INFO or ERROR logs of a specific service. This gives me immense power in terms of observability.
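For that trace_id filter to work, every line of a request must carry the same ID; a per-line random fallback will not do. One way to achieve this is a contextvars variable plus a logging filter. This is a minimal sketch, assuming the trace_id either arrives with the request (for example in a header) or is generated at the entry point; handle_request is a hypothetical stand-in for a real handler.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the current request's trace_id, isolated per thread and per asyncio task.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace_id onto every record that passes through."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


class TraceJsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({"level": record.levelname,
                           "message": record.getMessage(),
                           "trace_id": record.trace_id})


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TraceIdFilter())
handler.setFormatter(TraceJsonFormatter())
logger = logging.getLogger("order_service")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(order_id, incoming_trace_id=None):
    # Reuse the caller's trace_id if one was sent, otherwise start a new trace.
    token = trace_id_var.set(incoming_trace_id or str(uuid.uuid4()))
    try:
        logger.info("Request received for order %s", order_id)
        logger.info("Payment processed for order %s", order_id)
    finally:
        trace_id_var.reset(token)  # do not leak the ID into unrelated work


handle_request(123, incoming_trace_id="abc-123")
print(stream.getvalue())
```

Because the ID lives in the context rather than being passed to every call, business code keeps calling logger.info() as usual. In a web framework, the set and reset calls would typically live in a request middleware.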
Of course, structured logging also has its trade-offs. Log lines get somewhat larger because every key name is repeated on every line. However, this overhead is small compared to the gains in analysis and debugging. Furthermore, repeated keys compress well and fields can be indexed directly, so log collection systems can often store and query structured data efficiently; total costs do not necessarily rise and can even end up lower than with unstructured logs. I've personally observed how much difference structured logs make in query performance, especially when working with GIN indexes in PostgreSQL or JSON data types in Redis.
Conclusion: Structured Logging is Not a Luxury, It's a Necessity
While unstructured logging might seem easy for small projects or initially, it quickly leads you to a dead end as your systems grow, become distributed, and operational complexity increases. In my 20 years of field experience, I've seen firsthand how systems that don't take logging practices seriously can result in disasters.
Transitioning to structured logging is not just a development preference; it's a necessity for operational efficiency, security, and compliance. It lets us better understand the inner workings of our systems, detect problems faster, and intervene proactively. I strongly recommend re-evaluating your logging strategy for your next project or when improving your existing systems. It requires some effort up front, but the long-term payoff far outweighs it.