loading...
Cover image for Logging done right, makes your life bright!

Logging done right, makes your life bright!

grhegde09 profile image Gajanan H ・4 min read

I have been working in application development on embedded devices for a few years now. All the embedded platforms that we work on do not have a good development environment. Some platforms don't even have GDB. So we have to fully rely on logs to analyse the issues.

When the only way to debug your code is through logs, some people become lazy and don't log judiciously. Instead, a lot of information is logged, irrespective of the importance of the information or the frequency of logs. When you work in a team of 15+ developers(in which some are lazy!) the logs can easily bloat up. In a recent project, we had so many logs that the logging rate was ~2 Mega Bytes per minute. It is a nightmare to comb through thousands of lines of logs to find the logs you are interested in and analyse issues.

Following are some ways in which we improved our situation.

A better logger

Multiple log levels(duh!):

One of the most important(and common) feature for any logger is the ability to distinguish logs based on their importance. We defined the following four levels for logs.

  1. Error - For fatal error scenarios.
  2. Warning - For non-fatal errors.
  3. Info - For important information which can aid in finding problems easily.
  4. Debug - For everything else. Including function entry, exit logs to a developer's sad story!

We added ability to select the logging level through a config file in the logger module for each run of the application. This helped us a lot. For example, we could just turn off all the logs in debug level and we would have much less logs to go through.

Module level logging:

Though multiple log levels were very helpful, they were not sufficient. What we found is that the ability to log based on modules is also very important. So we created an identifier for each module(say each library we created). We added the ability to select the log level for each of these modules via a config file. This was a boon! Now we could customise the logs however we wanted. For example, if I was debugging Module A which depended on Module B, I could configure the logger module to enable debug level logs in Module A, info level logs in Module B and error level logs for all other modules.

A log with all the information:

Every line in our logger had a lot of information. It included, time at which the log was written, log level, thread name which generated the log, actual log, fully qualified name of the function which generated the log, etc. Having all these fields in each log has helped us a lot in debugging the issues.

Runtime control:

After we had the above features only thing left to build the ultimate logger was to add the ability to do these customisations during run time. The typical behaviour we had in our systems was that once the application is in some problematic state, it would remain in that state. So it would be very helpful to have the ability to customise log levels for modules in run time. So we used a file watcher to watch the config file for changes and whenever a change was detected, reloaded the logger module config in memory. Voila! We had the ultimate logger!

Better awareness about logging

Having a good logger is just a part of the equation. Having logs placed judiciously with appropriate information throughout the code is also very important. Not having logs in important parts of the code is as bad as having unnecessary logs in the code. Following are some thumb-rules we followed to improve out logging.

  • If a module generates too many logs, make the logs periodic instead of event based. For example, if a video encoder is logging all the information for every frame, it will generate too many logs in a second. Instead all the information can be accumulated and logged at an interval say 1 second.
  • Avoid redundancy in logs. i.e. Do not log function name, thread id as part of the log message as it already gets added to log by the logger.
  • Follow the logging levels strictly.
  • Have all the input arguments logged in Info level in the APIs.
  • Have function Entry, Exit logs in Info level in the APIs(as an exception to the other rule.)
  • Default log level for all utilities should be Error. This change can be done after the utilities have been thoroughly tested and not expected to have any bugs.
  • Add a log for every else in your code.

Something extra wouldn't hurt

LogAnalyzer plugin for sublime text

I was a windows user for first 3 years of my professional life. I loved using applications like Notepad++ with Analyse Plugin and LogExpert. I used to love these tools as they helped me in analysing the logs quickly. Since then I have switched to macOs and have struggled to get a good tool to analyse logs. Notepad++ on wine is OK, but key bindings are a pain. TextWrangler is a good tool. But I wish it did more than what it does.

So I decided to write a small plugin on Sublime text to help me analyse the logs. I wrote some dirty code over a weekend to get the basic functionality I wanted and it has pretty much stuck at that state. Please feel free to use/improve the plugin here.

PS: This is my first article. So please shower some love and pardon my mistakes! Do provide your feedback :)

Log well. Live well!

Discussion

pic
Editor guide
Collapse
phlash909 profile image
Phil Ashby

Having also worked on embedded systems for a good number of years including those that end up in space (GDB definitely not an option!), I now encourage a slightly different approach:

  • Write debugging / logging user stories, this helps the team keep in mind why they are adding log lines..
  • Never have a run-time module/level selection mechanism, because that elusive bug will only show up once (en.wikipedia.org/wiki/Heisenbug), instead use a log filter to achieve the same results, while retaining the detail when it's needed.
  • Maybe think of logging / debugging working like a hardware logic analyser, keep as much recent info in a circular buffer as you can afford storage, take snapshots when 'interesting' events occur (like crashes!), that way you have the detail leading up to the event, but are not overwhelmed by volume, nor exhaust storage when everything is working. We caught two live bugs from space using this technique :)
Collapse
fossterer profile image
Shashank Sabniveesu

Good suggestions Phil! I was initially interested to find out what Gajanan meant by

So we used a file watcher to watch the config file for changes and whenever a change was detected, reloaded the logger module config in memory.

But you seem to be advising something different. Can you give an example of how you used

a log filter to achieve the same results
?

Collapse
phlash909 profile image
Phil Ashby

Hi Shashank, Gajanan makes a good point about being able to adjust logging levels at runtime /while an application is exhibiting a fault/, he says: "once the application is in some problematic state, it would remain in that state".

This is an assumption that is not always true (sadly!), hence my approach of creating a logging system as a detailed (debug level) history recorder albeit with limited capacity. Provided faults are detectable (eg: system crash, exceptions), then the debug level history buffer can be saved at that point, hopefully with enough duration & detail to find the root cause. If you are familiar with the concept of core dump files, this is similar, but more manageable than a dirty great memory blob, and with event history, not just a memory snapshot at the point of failure.

If, in addition to crash traces, we also want to create a long term activity log, we would apply a filter (eg: log levels, log sources or both) to the history buffer, and emit these filtered events to suitable storage (eg: log file).

Such a filter can respond to configuration changes in much the same way that Gajanan describes for setting a 'log config' at source, that is: detecting file changes and reloading itself, thereby changing the activity log content. The important difference is that the debug level history buffer and crash dumps are not affected by changes to an activity filter.

FYI: In one system, we had a separate management card that held the event history in RAM for a number of daughter boards, and filtered this information to disk storage. The management card had a CLI, so activity logging could be managed interactively as well as via configuration files (Cisco IOS has similar features).

Thread Thread
grhegde09 profile image
Gajanan H Author

I mostly agree with Phill. However unfortunate it is, the following is true! 1. It may be very hard to reproduce an issue with proper log settings. 2. System may not remain in the problematic state forever.

But like I mentioned the size of the log file on some systems are limited. We have a rotating log file of 50MB size. You may not believe me when I say 50MB is not sufficient to capture the logs for 30 minutes when we set log level to debug/trace. We don't have a way of backing up these logs to some server to analyse later. What we have is just 50MB! So we have to apply filters at the generation point!

We have to smartly design our logging system after considering all the limitations of the target system!

Thread Thread
phlash909 profile image
Phil Ashby

Sucks to be limited to 50MB and no way to siphon off to longer term storage! How about allocating some of that space to a short term circular log at debug level, enough to capture a few seconds of history or a couple of cycles through the main application loop (probably <1MB?), then as/when bad things happen, copy that buffer to a new file and raise the alarm... you'll likely get 50+ chances to see what's going wrong before running out of space and having to remove the oldest logs.

Collapse
benaryorg profile image
#benaryorg

As an ops person there is something that I had to deal with recently that kind of fits in here.

A logfile should consist of lines where every single one is exactly one message.
No interleaved messages, no messages longer than one line.
This way I can view it at runtime, tail it easily, comb through them and filter them.

less has some amazing features like a tail -f mode (using F), or filtering (using &), but if I cannot quickly find out why your app suddenly crashed with just less, then I'd call your logformat garbage.

Sorry for that strong wording, but there's situations where every seconds counts, and these are the situations you should be logging for.

Collapse
aptituz profile image
Patrick Schönfeld

„Sorry for that strong wording, but there's situations where every seconds counts, and these are the situations you should be logging for.“

As another kinda Ops-Person I have to disagree.

Logging has to fit and be useful for several cases, including the situations and avoidance of the situations you describe. It does not make that much sense, to define boundaries for the usefulness of logging by almost the simplest of all tools we have in our belt. That‘s like saying that every dish in the kitchen needs to be cookable with the microwave, because there are situations were ... well, you know: cooking needs to happen quick.

In fact it is the other way round: if your tool is too simple for the job, then maybe you should use different tools. Like, lets say, use the tools made for analysing, storing and querying log data.

Collapse
benaryorg profile image
#benaryorg

if your tool is too simple for the job, then maybe you should use different tools. Like, lets say, use the tools made for analysing, storing and querying log data.

But most of these tools don't belong on a server, or aren't quite applicable in a server-only environment.

Logging has to fit and be useful for several cases, including the situations and avoidance of the situations you describe

And in a lot of cases these extra information you want to be included does not fit in the environment described.

So I am proposing that we just keep writing minimal log-files with information needed to find and fix problems fast, for example as NCSA Common Log Format files.

And for everything else you log to a different sink.
That could be the tools for analysing logs, for example an ELK stack.
You log all the information to the ELK stack and keep on the system only the minimum log-information.

A prime example would be profiling information.
Tideways is really nice for profiling, but you wouldn't want that information to be interspersed within your access log, right?
So you put it in a different sink (in that case directly over the net).

That‘s like saying that every dish in the kitchen needs to be cookable with the microwave, because there are situations were ... well, you know: cooking needs to happen quick.

So in those terms what I want from developers is to always have a microwave pizza on top of the refrigerator so one can make that when in a hurry fast.
If you have time for cooking then you can always spare the time to get out the more hard to reach food below the pizza.

Collapse
ghost profile image
Ghost

I would recommend a different sort of logging level:

Debug - Any info a developer might ever want. This should be as verbose as possible and only be enabled when debugging (duh).
Info - Information you generally want, but don't need for everyday business. Start/Stop of services, certain events happening, etc.
Warning - Warnings are reserved for potential error sources. For instance, when a config file was not found and the default is assumed instead, or when a primary data source is not available and a secondary is used instead.
Error - Errors are for when a process fails irrecoverably, but the application as a whole remains stable. For instance, when a program tries to save a file, but write permission was denied.
Fatal - These logs indicate that tje program encountered an error from which it cannot recover. Include as much information as possible here.

I know this is quite similar to what the original post suggested, but distinguishing between fatal and non-fatal errors is very important.

Collapse
pedromanoel profile image
Pedro Manoel Fabiano Alves Evangelista

Your article is really well written and easy to understand! I would say I'm a beginner in logging and this article really helped.

I'm curious if you have a guideline for how to write log messages and what should be logged. Just like writing commit messages, I have a hard time with this. For instance, sometimes the name of the function is clear enough, and writing a message that simply states the function name with different wording seems to be redundant. What's your experience with this?

Collapse
grhegde09 profile image
Gajanan H Author

Glad that the article helped you.

We have set up our logging macros in such a way that we get the function name which generated the log for each line of the log file. So mentioning from which function the log got generated is totally redundant.
When a function I called returns failure, I generally log the function name which returned failure along with the error code and if possible a brief description of the error code.
Eg:

void myStupidFunction()
{
    char *fileName = "myFile.txt";
    File *pFile = fopen (fileName,"w");
    if(NULL == pFile)
    {
        LOG_ERROR("fopen failed for file:%s with error code:%d and error description:%s",fileName,errno,std::strerror(errno));
    }   
}

Collapse
phlash909 profile image
Phil Ashby

The principle Gajanan demonstrates here is simple: "never hide information from the logs" (credentials being the only exception to this rule!). In this case a function that returns an error.

I would also suggest using a machine parseable textual log format, defined in a logging story for the team, that will make life much easier for operations and your monitoring tools (SIEM, protective monitoring, etc.)

The use of macros (in C anyway) or similar language features to add consistent source code traceability is also a Good Thing (tm), that's what the FILE and LINE macros are for :)

Collapse
kritulrathod profile image
Kritz Rathod

Would be helpful to add a table for logging levels.

Info: Parameters, Calculated Values,
Error: Error Messager, Stack Trace,
Verbos: Function Entry-Exit Points, Code Flow,
etc.

Should be shared with the team for uniformity in logger usage.

Collapse
gregopet profile image
Gregor Petrin

As for log analyzing competition, Lnav is really good. It has filters, context aware navigation, can mesh multiple log files chronologically, it has zoomable histograms and it can even evaluate sql clauses against your log. It's also a single binary to install with no dependencies and you can have it on your server since it works via ncurses. Open source, too.

lnav.org

Collapse
grhegde09 profile image
Gajanan H Author

Looks interesting. Let me give it a try. Thanks for sharing :)

Collapse
paseanodesign profile image
Paesano Design, Inc.

Great stuff, thanks for the post. So many times logging is squeezed out of the project timeline with looming deadlines... :(

Collapse
junaidatique profile image
Junaid Atique

Nice article. I am using loggly to handle logs. It gives the functionality to log on different levels plus distributing modules wise. Another thing I use is to mention the function name along with module. Loggly provide nice searching through the logs.
My only suggestion is to log only required parameters. I used to log complete objects which was disastrous.

Collapse
grhegde09 profile image
Gajanan H Author

Thanks. Loggly seems interesting. Not sure how easy it will be to get it working with our system. I will give it a try.
We have name of the function from which the log is generated for every line of log. That is a very helpful piece of information.
The last point is kinda debatable. If you are accepting objects as arguments to your API, it is better to dump all the input arguments in the log (probably with 'Debug' level). In other cases, Yes, you should log only the parameters you need to debug.

Collapse
vaibks profile image
Vaibhav Shinde

Great article!
Thank you. I am loving this place. 🙂

Collapse
dallgoot profile image
dallgoot

"We added the ability to select the log level for each of these modules via a config file."
i'm very curious on this since i only have a vague idea on how to do it codewise :)

Collapse
asynccrazy profile image
Sumant H Natkar

Logging is really important when you are writing enterprise applications, we use azure blob storage for our logging.

Collapse
zegio profile image
Tony

Happen to have a working snippet for how you handled watching the config file and propagating changes live?

Collapse
grhegde09 profile image
Gajanan H Author

We used QFileSystemWatcher to watch the config file and added a thread-safe method to the logger apply the new config.
You can find some example code for QFileSystemWatcher here

Collapse
uniquer profile image
uniquer

Nice post. Just a thought if logger can capture each module logs % and throughput, that would help to debug the debug logs :)

Collapse
grhegde09 profile image
Gajanan H Author

That's a neat idea :) I will give it a try!