Serverless architectures such as AWS Lambda have created new challenges in debugging code. Without a solid logging framework in place, you could waste hours, or even days, tracking down simple defects in your functions. A strategic logging framework can be a powerful way to track down and resolve bugs.
Let’s walk through how to get the most out of logging Lambda functions. We’ll set up and troubleshoot code to find the root cause of a defect, look at some best practices for logging Lambda functions, and explore setting up alerts.
Several years ago, treating logging as an afterthought was common practice, and often seen as “good enough.” You’d push your code to production and wait. When something went wrong, you’d launch your debugger, step through your code, and track down the issue.
Now, however, with serverless architectures such as Lambda functions, stepping through code is not a simple task. You usually can’t attach a debugger to a function running in production, so your logs are often the only record of what happened. It’s therefore essential to create a logging strategy before defects occur, not after.
AWS provides built-in logging for Lambda functions through CloudWatch, which works for basic purposes. Recent updates, such as CloudWatch Logs Insights, have made the product more useful. Anything you send to console.log() in your function will be sent to CloudWatch and visible through the AWS console.
However, a log management tool like SolarWinds® Papertrail™ gives you features that CloudWatch doesn’t support, such as live tail mode (viewing logs in real time), aggregation of logs across all your services or even platforms and providers, and the ability for you (or your whole team) to monitor your logs without living in the AWS console.
For our example, we’re going to use the PaperWatch tool to create a second Lambda function that transfers our logs from CloudWatch to Papertrail. We won’t cover the setup here, but you can do this yourself by following the instructions in the Papertrail documentation.
Let’s take a look at our Lambda function. Our function is written in Node.js and retrieves the latest price of bitcoin.
As you can see, our function calls an external API (coinmarketcap.com) to get the latest information on bitcoin (which has an ID of 1). Our function then parses the response to get the price and returns the price to the user.
This is relatively simple code, and it works well when we test it. We go ahead and deploy this code to production, and for the first few weeks there aren’t any issues. Several weeks later the price of bitcoin jumps, and our function becomes very popular. Suddenly, our users start receiving intermittent errors. Worst of all, the function leaves no trace of why: it fails seemingly at random, for no obvious reason. People are complaining and we need to get our function working again.
A little logging in our function will go a long way towards debugging our issue. Let’s add in the log statements we should have added before we went live.
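As a minimal sketch of what such log statements might look like, the version below logs the outgoing request, the status code and body of the response, and the parsed price. The placeholder URL, the assumed response shape, and the `fetchResponse` and `createHandler` names are all assumptions made for illustration.

```javascript
const https = require('https');

// Placeholder endpoint (assumption): substitute the real ticker URL for the coin with ID 1.
const PRICE_URL = 'https://api.example.com/v2/ticker/1/';

// Fetch a URL and resolve with both the HTTP status code and the raw body.
function fetchResponse(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve({ statusCode: res.statusCode, body }));
    }).on('error', reject);
  });
}

// Build the handler around a fetch function so it can be run without the network.
function createHandler(fetch = fetchResponse) {
  return async () => {
    console.log('Requesting latest price from', PRICE_URL);
    const response = await fetch(PRICE_URL);
    // Log what came back before trying to parse it.
    console.log('Price API responded with status', response.statusCode);
    console.log('Price API response body:', response.body);
    // Assumed response shape: { data: { quotes: { USD: { price: <number> } } } }
    const price = JSON.parse(response.body).data.quotes.USD.price;
    console.log('Parsed price:', price);
    return { statusCode: 200, body: JSON.stringify({ price }) };
  };
}

exports.handler = createHandler();
```

Logging the status code and body before parsing is the important part: when parsing fails, the lines immediately preceding the failure show what the external service actually returned.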
Now let’s redeploy and monitor our logs using live tail mode so that we can see, in real time, what’s happening when users call our code.
Thanks to the logs, the issue is now obvious. The external API we’re calling is rate-limited, and we’ve hit the limit of the free tier. Too many calls are happening too quickly. And since we didn’t write code to handle the case of the external call failing, our function fails whenever the API rejects a request.
This would have been an especially difficult defect to track down, since the conditions for failure (heavy load) most likely exist only in production, and the rate limit resets every sixty seconds. But with the appropriate log statements, and with the ability to see the logs in real time as users call our function, finding and addressing the issue is quick and easy.
For a more robust function, our next steps would be to pay for API access so that the rate limits are removed, and to add a check for the various response codes our external call might return so that we handle errors appropriately. But while we’re in Papertrail, let’s go ahead and set up an alert so that if we hit the rate limit again, we’ll get an email.
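The response-code check could be sketched roughly as follows. This assumes the external API signals rate limiting with HTTP 429 (the standard “Too Many Requests” status), that the upstream response is available as `{ statusCode, body }`, and that the price sits at `data.quotes.USD.price`. The `parsePrice` and `toResult` names and the log markers such as `RATE_LIMIT_EXCEEDED` are hypothetical, chosen so that a saved search has a distinctive string to match.

```javascript
// Extract the price from the body, or return null if the body is not what we expect.
function parsePrice(body) {
  try {
    // Assumed response shape: { data: { quotes: { USD: { price: <number> } } } }
    const price = JSON.parse(body).data.quotes.USD.price;
    return typeof price === 'number' ? price : null;
  } catch (err) {
    return null;
  }
}

// Turn an upstream response ({ statusCode, body }) into the result our handler returns.
function toResult(response) {
  if (response.statusCode === 429) {
    // Distinctive marker so a log search or alert can match this case.
    console.error('RATE_LIMIT_EXCEEDED price API returned 429');
    return { statusCode: 503, body: JSON.stringify({ error: 'Price temporarily unavailable' }) };
  }
  if (response.statusCode < 200 || response.statusCode >= 300) {
    console.error('UPSTREAM_ERROR price API returned status', response.statusCode);
    return { statusCode: 502, body: JSON.stringify({ error: 'Price service error' }) };
  }
  const price = parsePrice(response.body);
  if (price === null) {
    console.error('UPSTREAM_PARSE_ERROR unexpected response body');
    return { statusCode: 502, body: JSON.stringify({ error: 'Price service error' }) };
  }
  return { statusCode: 200, body: JSON.stringify({ price }) };
}
```

With this in place, a rate-limited call produces a clear error response and a searchable log line instead of an unhandled exception.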
Setting up an alert in Papertrail is easy. We simply search for the type of log entry we want to trigger the alert, then save that search using “Save & Setup an Alert.”
Now we enter the details of our alert and save.
Setting up logging with AWS and Papertrail is simple, and it pays off. Logging is easy to take for granted until something goes wrong. Without logging in place, we’d have spent a long time trying to figure out why our function was failing. Papertrail allowed us not only to see the logs easily, but also to watch the failure in real time, so we could quickly debug, fix, and redeploy.