I have been working in application development on embedded devices for a few years now. All the embedded platforms that we work on do not have a good development environment. Some platforms don't even have GDB. So we have to fully rely on logs to analyse the issues.
When the only way to debug your code is through logs, some people become lazy and don't log judiciously. Instead, a lot of information is logged, irrespective of the importance of the information or the frequency of logs. When you work in a team of 15+ developers(in which some are lazy!) the logs can easily bloat up. In a recent project, we had so many logs that the logging rate was ~2 Mega Bytes per minute. It is a nightmare to comb through thousands of lines of logs to find the logs you are interested in and analyse issues.
Following are some ways in which we improved our situation.
A better logger
Multiple log levels(duh!):
One of the most important(and common) feature for any logger is the ability to distinguish logs based on their importance. We defined the following four levels for logs.
- Error - For fatal error scenarios.
- Warning - For non-fatal errors.
- Info - For important information which can aid in finding problems easily.
- Debug - For everything else. Including function entry, exit logs to a developer's sad story!
We added ability to select the logging level through a config file in the logger module for each run of the application. This helped us a lot. For example, we could just turn off all the logs in debug level and we would have much less logs to go through.
Module level logging:
Though multiple log levels were very helpful, they were not sufficient. What we found is that the ability to log based on modules is also very important. So we created an identifier for each module(say each library we created). We added the ability to select the log level for each of these modules via a config file. This was a boon! Now we could customise the logs however we wanted. For example, if I was debugging Module A which depended on Module B, I could configure the logger module to enable debug level logs in Module A, info level logs in Module B and error level logs for all other modules.
A log with all the information:
Every line in our logger had a lot of information. It included, time at which the log was written, log level, thread name which generated the log, actual log, fully qualified name of the function which generated the log, etc. Having all these fields in each log has helped us a lot in debugging the issues.
After we had the above features only thing left to build the ultimate logger was to add the ability to do these customisations during run time. The typical behaviour we had in our systems was that once the application is in some problematic state, it would remain in that state. So it would be very helpful to have the ability to customise log levels for modules in run time. So we used a file watcher to watch the config file for changes and whenever a change was detected, reloaded the logger module config in memory. Voila! We had the ultimate logger!
Better awareness about logging
Having a good logger is just a part of the equation. Having logs placed judiciously with appropriate information throughout the code is also very important. Not having logs in important parts of the code is as bad as having unnecessary logs in the code. Following are some thumb-rules we followed to improve out logging.
- If a module generates too many logs, make the logs periodic instead of event based. For example, if a video encoder is logging all the information for every frame, it will generate too many logs in a second. Instead all the information can be accumulated and logged at an interval say 1 second.
- Avoid redundancy in logs. i.e. Do not log function name, thread id as part of the log message as it already gets added to log by the logger.
- Follow the logging levels strictly.
- Have all the input arguments logged in Info level in the APIs.
- Have function Entry, Exit logs in Info level in the APIs(as an exception to the other rule.)
- Default log level for all utilities should be Error. This change can be done after the utilities have been thoroughly tested and not expected to have any bugs.
- Add a log for every
elsein your code.
Something extra wouldn't hurt
LogAnalyzer plugin for sublime text
I was a windows user for first 3 years of my professional life. I loved using applications like Notepad++ with Analyse Plugin and LogExpert. I used to love these tools as they helped me in analysing the logs quickly. Since then I have switched to macOs and have struggled to get a good tool to analyse logs. Notepad++ on wine is OK, but key bindings are a pain. TextWrangler is a good tool. But I wish it did more than what it does.
So I decided to write a small plugin on Sublime text to help me analyse the logs. I wrote some dirty code over a weekend to get the basic functionality I wanted and it has pretty much stuck at that state. Please feel free to use/improve the plugin here.
PS: This is my first article. So please shower some love and pardon my mistakes! Do provide your feedback :)