Corey Cleary

Posted on Feb 5, 2019 • Originally published at coreycleary.me

Why should your Node.js application not handle log routing?

#javascript #node #devops

Originally published at coreycleary.me. This is a cross-post from my content blog. I publish new content every week or two, and you can sign up to my newsletter if you'd like to receive my articles directly to your inbox! I also regularly send cheatsheets and other freebies.

It is not the responsibility of the application to route logs.

12 Factor says that logs should go to STDOUT. WAT? WHY?

I just configured my whole application code to write logs to custom log files. What's wrong with that?

Logging is one of those things that can sometimes be a black box for developers. Maybe you have a dedicated DevOps person who takes care of logging infrastructure for you, or maybe it's your first time working on this side of things.

It can be one of those things you leave until last to take care of while you're too busy working on writing code. Many do, making the "best practices" around logging seem like something you can just ignore, if you even understand them in the first place...

We're going to take a look at deciphering the reasons behind the best practices of decoupling your logging from your application, and where you should actually log to. And for purposes of this post, "log routing" - as referenced in the title - refers to picking up and pushing the logs to an intended logging target that is not your application or application process.

Best practices illuminated

You may have heard of the 12 Factor App before, which is considered the canonical "best practices" document around creating modern, scalable applications.

From the "12 Factor App best practices regarding logs":

A twelve-factor app never concerns itself with routing or storage of its output stream. It should not attempt to write to or manage logfiles. Instead, each running process writes its event stream, unbuffered, to stdout.... In staging or production deploys, each process’ stream will be captured by the execution environment, collated together with all other streams from the app, and routed to one or more final destinations for viewing and long-term archival. These archival destinations are not visible to or configurable by the app, and instead are completely managed by the execution environment.

That's a lot to decipher, so let's break it down.

"A twelve-factor app never concerns itself with routing or storage of its output stream."

The first major reason why you don't want your application code to handle routing of logs itself is due to separation of concerns. We often think of this separation in terms of pieces of code between services and between services themselves, but this applies to the more "infrastructural" components as well. Your application code should not handle something that should be handled by infrastructure.

This code below is an example of highly coupled application code.

const { createLogger, transports, winston } = require('winston');
const winston-mongodb = require('winston-mongodb');

// log to two different files
const logger = createLogger({
  transports: [
    new transports.File({ filename: 'combined.log' }),

  ],
  exceptionHandlers: [
    new transports.File({ filename: 'exceptions.log' })
  ]
});

// log to MongoDB
winston.add(winston.transports.MongoDB, options);

Let's leave the deployment environment concerns out of it for a moment, which we'll look at later, and instead focus on the application itself.

Just by having the application handle logging, it now has taken on another "concern" under its wing. By defining what the logging outputs are, the application now handles both application/business logic AND logging logic.

What if you need to change your logging location later? That's another code change and deployment (and more if you have a strenuous QA/change control/deployment process). And what if you get a logfile name wrong? Again, another code change and deployment.

This is not to say your application should take an extreme stance towards logging and avoid log statements as well - you do have to log something, after all - but it is to say that the log routing adds another layer which does not belong in the application if you want to decouple components of your code and keep your application code clean.

Next,

"It should not attempt to write to or manage logfiles. Instead, each running process writes its event stream, unbuffered, to stdout." (Side note: although it specifically mentions `stdout`, I take it to mean `stdout` and `stderr`, and cursory Google searches seem to confirm this.)

I already discussed above why logging to outputs like files and databases is not a good practice from a separation of concerns perspective. But this is where the environment concerns start to be addressed.

In Node.js applications, you are still logging to something and that is the console (using usually either console.log() or console.error()).

The console, under the hood, prints to stdout for console.log() and stderr for console.error(), so simply by using this module, it looks like we pass this test.

And this test exists for a reason: if you've worked with physical or even virtual (but not container/cloud) servers before, you might have had only a handful of them, or at least a size that was manageable enough to manually configure the logfiles, their locations, and any other setup.

Now imagine your application has had some big success and is onboarding hundreds of new users everyday. Your team has begun migrating to a cloud-based environment, and you have to plan for your application scaling on-demand from 1 instances to 50. You won't know where those instances are running, so you can't control where exactly the logfiles get written to.

It's more useful to have stream | target, as opposed to target -> (your routing solution) -> target. Streams give us the ability to pipe anywhere, composing together powerful pipelines. If you've ever used Linux/Unix, you can build up powerful operations simply by piping streams together, like searching for text within a file: cat example.txt | grep sometext. stdout/stderr gives you this power. For example, you could pipe from stdout to a logfile if you wanted to.

Furthermore, cloud applications are ephemeral. They can spin up, spin down, crash, etc. which means the logs are ephemeral too.

So while we started out looking at why an application should not handle routing logs to files/databases/other persistent storage targets, this brings up the question: is it ok to log to those targets at all?

Next,

"In staging or production deploys, each process’ stream will be captured by the execution environment, collated together with all other streams from the app, and routed to one or more final destinations for viewing and long-term archival. These archival destinations are not visible to or configurable by the app, and instead are completely managed by the execution environment."

This helps answer that question. It's OK to route the logs to persistent storage (and you, in fact, absolutely should) if the execution environment does this routing from the stdout/stderr logs.

This also re-affirms the separation of concerns covered earlier. We can't be sure where a logfile might end up. And if a container crashes - and logfiles weren't being picked up by a log router in the first place - you're screwed. Good luck debugging the reason your application crashed in the first place.

Cool, but then how do you manage logs in production? Is there a tool that picks up whatever is sent to stdout/stderr?

This is actually where the log routing piece comes in, the whole thing this post has attempted to dissuade you from handling from within your application code.

For simplicity's sake, assume you're using Docker for your containers as part of your cloud environment. The Docker daemon that runs on your Docker host - not your container - will by default pick up the logs from stdout/stderr from your container(s).

You configure the Docker daemon to use a logging driver, which does the actual log routing work of picking them up and routing them to a given storage target, like so:

In the daemon.json file,

{
  "log-driver": "splunk", // just using Splunk as an example, it could be another storage type
  "log-opts": {
    "splunk-token": "",
    "splunk-url": "",
    ...
  }
}

You can view a list of logging drivers - which, again, do the work of picking up the logs and routing them - supported by Docker here. The list includes Greylog, Splunk, syslog, and other log aggregators you might be familiar with.

Routing the logs somewhere is important so that, in case your application crashes, boots up with scaling up, shuts down with scaling down, you have a persistent storage location from which to view them.

But it's important that this is done at the infrastructural level, for the reason discussed above.

A complete logging picture based on what's been discussed here would look like:

Wrapping up

To summarize the reasons you don't want to handle routing from your application and, by extension, to something other than stdout/stderr:

keeping the log routing responsibility out of your application code:
- keeps the code cleaner
- makes log routing locations easier to change without deployments
scaling applications/containers means it's harder to have control over logfiles
scaling applications also means they're more ephemeral, meaning logfiles might not be there depending on the state of the container
writing to, say a file or database, over stdout/stderr ties you down to those log targets, takes away your flexibility to pipe the output of stdout/stderr to whatever targets you want, and change this on the fly

To address one last question you might be having: what if you're not using a cloud environment or containers?

My thoughts on this are as follows. The approach I've laid out here is still useful, because:

you may one day move from physical or virtual servers to a cloud / container approach, making the migration path much easier on yourself or the team that will be doing the work
you still keep that separation of concerns
you can always just pipe the `stdout` to a log file or other persistent storage target and get the same advantages as a Docker daemon would provide

As you are working on implementing logging or reviewing your current logging code - if you are deciding between using a logging framework versus console.log() and console.error(), I wrote a post on that that can help you make the decision here. Just make sure keep this post here in mind and write to stdout/stderr from the logging framework unless you absolutely have reason to write to something else.

I'm writing a lot of new content to help make Node and JavaScript easier to understand. Easier, because I don't think it needs to be as complex as it is sometimes. If you enjoyed this post and found it helpful here's that link again to subscribe to my newsletter!

Top comments (7)

panta82 • Feb 6 '19

The only caveat to this is that log style that is good to appear during development in terminal (colors, descriptive text messages) is different from log style that is good for production (meatadata, timestamps). So I would still use a logging library instead console.log, just configure it differently in dev and prod.

kurisutofu • Feb 6 '19

Good article!
And good timing as far as I'm concerned :)

I'm writing an API and outputting to console.log/console.error and was wondering if it was better/recommended to write a log function early and use it everywhere or keep my output to the stdout.
I was supposed to research that tonight so you saved me time!

Mihail Malo • Feb 24 '19

I would definitely use a function, since you will need to timestamp and tag your logs, log structured data, etc.

And what if you want to extract part of the application into an internal library?Libraries shouldn't touch IO, they can only accept a logger injection.

And what if you want to use the code in a frontend web application? Who will route your logs there?

So yeah, always use a function/library, just don't try to connect to the logging server instead of stdout yourself, especially in a stateless deployment.

Mihail Malo • Feb 22 '19 • Edited

const winston-mongodb

That's an interesting identifier you got there.
Not valid, but interesting.

Anthony Jackman • Feb 5 '19

@corey . Awesome article. Right article, right time. I am getting to that point in my own project. Now to investigate how to get it done using nginx hosted on Azure and DigitalOcean. Thank you!