Dave McAllister for Scalyr

Posted on Oct 15, 2019 • Originally published at scalyr.com

An In-Depth Guide to Nginx Metrics


In our guides Zen and the Art of System Monitoring and How to Monitor Nginx: The Essential Guide, we cover our monitoring philosophy. We also recommend a specific set of metrics to monitor and alerts to set for maximum Nginx happiness.

Here, we'd like to dive into the nitty-gritty of those essential Nginx metrics. We'll discuss what exactly they mean and why they're important. This will also serve as a primer for some of the (perhaps esoteric) terminology associated with web servers.

You can think of this as a companion to the official Nginx documentation and as an appendix to our Nginx monitoring guide.

For now, this guide covers only the metrics available via ngx_http_stub_status_module, plus those associated with the F/OSS version of Nginx. More comprehensive metrics are available via ngx_http_status_module. (This is included with the commercial version, Nginx Plus.)

So roll up your sleeves, grab your Slanket or Snuggie, and let's talk Nginx metrics.

Data Sources

Metrics are available from two sources:

Nginx Status Modules: This is the most direct way to get the goods. You can get data by polling a configurable web page (such as /status), or through variables written to log files. Polling is the best way to go, since Nginx does not provide embedded variables for everything in the status module.
Log Files: Nginx, like most web servers, maintains an "access log" with a record of each request handled. You can use log_format to instruct Nginx to include variable values in each log record. Monitoring tools that can read log records can use these variables to create more metrics.

It's All About the Connections

The status module metrics are mostly about connections, so it's worth talking about those in more detail.

An Nginx server generally plays two roles. Its primary role is to serve content to clients. Content can be web pages, images, video, or raw API data. Clients used to be only web browsers, but now they include native mobile apps and other servers that consume your APIs.

Nginx's secondary role is as a proxy between clients and "upstream servers." Upstream servers do more of the heavy lifting for a given request. They run application servers like Unicorn for Ruby on Rails, or Tomcat for Java. These application servers handle the "dynamic" part of dynamic web pages.

As a proxy, Nginx can do a few things really, really well. It can serve local static content (images and non-dynamic HTML files) very quickly. It can act as a cache for content from upstream servers, and it can load-balance requests among several upstream servers.

Establishing Connections

Before servers and clients can talk to each other, they have to establish a formal network-level connection. Application-level data flows over these connections via the higher-level protocols that Nginx supports. These protocols are most commonly HTTP. But they can also be WebSockets, SPDY, TCP streams, and even several mail protocols.

One layer down from the application data (at the TCP layer of networking), a "handshake" process occurs. This handshake negotiates the establishment of a connection. The specific details are beyond the scope of this article, but you can learn more on Wikipedia if you're so inclined. For even more on the specifics of connections and how they relate to HTTP, see A Software Developer's Guide to HTTP - Connections.

Nginx connections can be in one of several states:

Accepted: A connection moves to the Accepted state after the TCP handshake. It then takes one of three sub-states:

Idle / Waiting: This connection is not currently sending or receiving data. This typically happens on a keep-alive connection between a finished response and the next request. As of HTTP/1.1, all connections are persistent (i.e., remain open) unless declared otherwise.

Reading: Nginx is reading data from the client.

Writing: Nginx is writing data to the client.

Handled: This means nginx has finished writing data to the client. It has successfully finished and closed the request.

Dropped: This means nginx ended the connection before finishing the request. (This usually happens because of a resource or configuration limit.)

Nginx Status Module Metrics

Now let's take a look at the metrics available via ngx_http_stub_status_module:

Active connections: The current number of active (accepted) connections from clients. Includes all connections with the statuses Idle / Waiting, Reading, and Writing.

accepts : The total number of accepted connections from clients since the nginx master process started. Note that reloading configurations or restarting worker processes will not reset this metric. If you terminate and restart the master process, you will reset the metric.

handled : The total number of handled connections from clients since the nginx master process started. This will be lower than accepts only in cases where a connection is dropped before it is handled.

requests: The total number of client requests since the nginx master process started. A request is an application-level (HTTP, SPDY, etc.) event. It occurs when a client requests a resource via the application protocol. A single connection can (and often does) make many requests. So most of the time, there are more requests than accepted/handled connections.

Reading : The current number of (accepted) connections from clients where nginx is reading the request. Measured at the time the status module was queried.

Writing : The current number of connections from clients where nginx is writing a response back to the client.

Waiting : The current number of connections from clients that are in the Idle / Waiting state.
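
To make these seven values concrete, here is a minimal Python sketch that parses the text the stub status page returns and derives two numbers the page does not report directly. The sample text follows the layout shown in the Nginx documentation, but the figures are illustrative, and the names parse_stub_status and derived_metrics are our own, not part of Nginx.

```python
import re

# Sample response body in the layout documented for stub_status.
# The numbers are illustrative, not taken from a real server.
SAMPLE_STATUS = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

STATUS_PATTERN = re.compile(
    r"Active connections:\s*(?P<active>\d+)\s+"
    r"server accepts handled requests\s+"
    r"(?P<accepts>\d+)\s+(?P<handled>\d+)\s+(?P<requests>\d+)\s+"
    r"Reading:\s*(?P<reading>\d+)\s+"
    r"Writing:\s*(?P<writing>\d+)\s+"
    r"Waiting:\s*(?P<waiting>\d+)"
)


def parse_stub_status(text):
    """Return the seven stub_status values as a dict of ints."""
    match = STATUS_PATTERN.search(text)
    if match is None:
        raise ValueError("unrecognized stub_status output")
    return {name: int(value) for name, value in match.groupdict().items()}


def derived_metrics(status):
    """Compute values that the status page does not report directly."""
    handled = status["handled"]
    return {
        # Connections that were accepted but never handled (dropped).
        "dropped": status["accepts"] - handled,
        # Average number of requests served per handled connection.
        "requests_per_connection": status["requests"] / handled if handled else 0.0,
    }


status = parse_stub_status(SAMPLE_STATUS)
print(status)
print(derived_metrics(status))
```

Because accepts, handled, and requests are cumulative counters, a monitoring tool would normally poll the page on an interval and work with the difference between two consecutive readings rather than the raw totals.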

Default Access Log Variables

It's beyond the scope of this guide to dive into every nginx log variable. Instead, we're going to take a close look at a few variables that are particularly important when it comes to monitoring.

The default nginx access log format (the predefined combined format, which nginx uses when you don't specify another format in your configuration file) includes the following variables:

$body_bytes_sent : The number of bytes sent to the client as the response body (not including the response header). This allows you to monitor individual response sizes. It also gives you a rough measure of outbound bandwidth.

$http_referer : The HTTP Referer header from the incoming HTTP request. The browser sets this header to the URL of the page that links to the current requested resource.

Two notes on this:

The word "referrer," at least in English, is spelled with two Rs, but the original misspelling from the HTTP spec (RFC 1945) managed to stick around.
Since the client that makes the request also sets this header, it may not always be accurate. Malicious clients can change or "spoof" this header, which results in referral spam.

$http_user_agent : The User-Agent header from the incoming HTTP request. This identifies the specific browser, bot, or other software that issued the request. It can include other info, too, such as the client's operating system. The format is slightly different between human-operated web browsers and bots, but the theme is the same. Wikipedia has some awesome nitty-gritty details on user agent strings.

$remote_addr : The IP address of the client making the request. If the request passed through an intermediate device, such as a NAT firewall, web proxy, or your load balancer, this will be the address of the last device to relay the request.

$remote_user : The username supplied if HTTP Basic authentication is used for the request.

$request : The raw HTTP request line. An example of a (familiar) request line is as follows:

GET /community/guides/an-in-depth-guide-to-nginx-metrics/ HTTP/1.1

This is actually a compound variable made from three sub-variables (each of which is accessible individually if needed):

$request_method $request_uri $server_protocol

Let's break down each of these variables:

$request_method : The HTTP method of the request. The most common methods used by browsers are GET and POST, but the spec also includes HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT. (You can find details on each in the preceding link.)
$request_uri : The URI of the requested page, including query arguments. Even if nginx returns a different resource to the client, for instance because a rewrite rule was applied, this field will still log the URI of the original request.
$server_protocol : The application-level protocol and version used in the request. You'll most likely see HTTP/1.0 or HTTP/1.1, but nginx supports SPDY, WebSockets, and several mail protocols as well.

$time_local : The local (server) time the request was received. Shown in the format dd/Mon/yyyy:hh:mm:ss UTC-offset, where Mon is the abbreviated month name (for example, 15/Oct/2019:13:55:36 +0000).

$status : The numeric HTTP Status Code of the response. This is a key variable to monitor. It tells you about errors, missing pages, and other unusual events.
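
As a rough illustration of how a monitoring tool might read these variables back out of a log record, here is a Python sketch that parses one line in the combined layout. It assumes the standard field order of that format (remote address, a literal dash, remote user, bracketed time, quoted request line, status, body bytes, quoted referer, quoted user agent). The sample line, its IP address, and its user agent are made up, and parse_combined is a name we chose for this example.

```python
import re
from datetime import datetime

# Field order of nginx's predefined "combined" format (assumed here):
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

# Hypothetical log line, used only for illustration.
SAMPLE_LINE = (
    '203.0.113.7 - - [15/Oct/2019:13:55:36 +0000] '
    '"GET /community/guides/an-in-depth-guide-to-nginx-metrics/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"'
)


def parse_combined(line):
    """Parse one combined-format line into a dict, or return None on mismatch."""
    match = COMBINED_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    # %b expects English month abbreviations (Python's default C locale).
    record["time_local"] = datetime.strptime(
        record["time_local"], "%d/%b/%Y:%H:%M:%S %z"
    )
    # Split the request line into its three sub-variables when well formed.
    parts = record["request"].split(" ")
    if len(parts) == 3:
        (record["request_method"],
         record["request_uri"],
         record["server_protocol"]) = parts
    return record


record = parse_combined(SAMPLE_LINE)
print(record["request_method"], record["request_uri"], record["status"])
```

From records like this, counting responses by the first digit of status (2xx, 4xx, 5xx) is a simple way to turn the log into an error-rate metric.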

Additional Request Variables

On top of the defaults, check out these variables to track even more metrics.

$bytes_sent : The total number of bytes sent to the client in the response, including headers. This is similar to $body_bytes_sent, but it provides a more complete picture.

$content_length : The HTTP content length request header field. This is the total size (in bytes) of the body of the request, as sent by the client.

$request_length : The full request length (in bytes). This includes the request line, header, and body, as calculated by nginx.

If you're interested in monitoring overall incoming bandwidth, use $request_length. Because $content_length is drawn from a request header, the client calculates it. This means it has the potential to be spoofed (in the case of a DDoS attack, for example).

$request_time and $upstream_response_time : $request_time is the total time taken for nginx (and any upstream servers) to process a request and send a response. Nginx measures time in seconds with millisecond resolution. This is the primary source you should use for your server's response time metric.

Nginx starts the clock as soon as it reads the first bytes from the client request. It stops the clock after it sends the last bytes to the client. Note that this includes the processing time for upstream servers. If you want to break out those metrics, use $upstream_response_time, which only measures the upstream server's response time.

The language can be a bit confusing here, so let's clarify. Even though the variable is "request" time, it actually measures the elapsed time of the full request-response cycle (from the nginx server's perspective).
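
To separate upstream time from everything else, you can subtract one value from the other. The Python sketch below assumes both variables have been added to a custom log format and arrive as strings. It also assumes that $upstream_response_time may hold several values separated by commas or colons when more than one upstream was contacted, and a dash when no upstream was involved. The function names are our own.

```python
import re


def upstream_total(upstream_response_time):
    """Sum every time listed in an $upstream_response_time field.

    The field may hold one value ("0.120"), several values separated by
    commas or colons ("0.100, 0.250"), or "-" when no upstream was used.
    """
    total = 0.0
    for part in re.split(r"\s*[,:]\s*", upstream_response_time.strip()):
        if part and part != "-":
            total += float(part)
    return total


def non_upstream_time(request_time, upstream_response_time):
    """Time in the request-response cycle not spent waiting on upstreams."""
    remainder = float(request_time) - upstream_total(upstream_response_time)
    # Guard against small negative results caused by rounding in the log.
    return max(0.0, remainder)


print(non_upstream_time("0.500", "0.120"))
```

The remainder is not pure nginx processing time. It also covers reading the request from the client and writing the response back, so slow clients inflate it.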

$uri : The current URI of the request. Internal to nginx, this value may change during request processing (i.e., in the case of rewrites). The value that nginx logs represents the URI of the resource the client receives.

This differs from $request_uri in that $request_uri does not reflect any URL rewrites internal to the nginx server.

$gzip_ratio : The compression ratio of the response (the ratio between the original and compressed response sizes). This applies if you've enabled gzip response compression. Through ngx_http_gzip_module, this feature pipes responses through gzip before sending them to the client. It can reduce the size of responses by 50% or more and provide significant outbound bandwidth savings. Even though gzip is extremely fast, there is still material overhead in the compression process, both in terms of CPU usage and response time. In many cases, the shorter transfer time of a smaller response offsets some of this overhead.

Be sure to monitor your resource usage closely if you decide to use gzip compression. Further details about compression and decompression are available from NGINX directly.
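
A ratio can be harder to reason about than a percentage, so here is a small Python sketch that converts a logged $gzip_ratio into the fraction of bytes saved. It assumes the ratio is original size divided by compressed size, as described above, and that a response that was not compressed is logged as a dash. The function name gzip_savings is hypothetical.

```python
def gzip_savings(gzip_ratio):
    """Convert a logged $gzip_ratio string into the fraction of bytes saved.

    A ratio of 2.00 means the original was twice the compressed size,
    so half of the bytes were saved.
    """
    if gzip_ratio in ("-", ""):
        # Assumed placeholder for responses that were not compressed.
        return 0.0
    ratio = float(gzip_ratio)
    if ratio <= 0:
        return 0.0
    return 1.0 - 1.0 / ratio


print(gzip_savings("2.00"))
```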

$connection : The connection serial number. Nginx assigns a unique number to each connection. When multiple requests occur on a single connection, they will all have the same connection serial number. Serial numbers reset when the master nginx process terminates. This means serial numbers get reused, so they won't be unique for long periods of time.

$connection_requests : The number of requests made through this $connection.
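
Logged together, these two variables show how well clients are reusing connections. The Python sketch below assumes a custom log format that emits both fields and that each record has already been parsed into a (connection, connection_requests) pair. The sample pairs are invented.

```python
def requests_per_connection(records):
    """Return the highest request count seen for each connection serial number.

    records is an iterable of (connection, connection_requests) pairs,
    one pair per logged request.
    """
    highest = {}
    for connection, connection_requests in records:
        highest[connection] = max(highest.get(connection, 0), connection_requests)
    return highest


# Hypothetical parsed records: connection 101 served three requests.
sample_records = [(101, 1), (101, 2), (102, 1), (101, 3)]
print(requests_per_connection(sample_records))
```

A large share of connections that only ever reach a count of 1 can be a hint that keep-alive is disabled or that clients are not reusing connections.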

Additional Server Level Variables

$host : The DNS name the client used to reach your server, as seen in the Host HTTP header. If that header is empty or missing, nginx substitutes the name in the first server_name directive in your nginx configuration.

$http_HEADERNAME and $upstream_http_HEADERNAME : This follows the pattern of $http_referer and $http_user_agent above. Nginx allows you to log any HTTP request headers by referencing $http_ and the header name. (You must convert the header name to lowercase and replace dashes with underscores.)

You can also access the headers returned by any upstream servers by adding "upstream_" to the front of the desired header name.
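
The naming rule is mechanical, so it is easy to express in code. This Python sketch turns a header name into the matching variable name. The helper header_variable is our own illustration, not something Nginx provides.

```python
def header_variable(header_name, upstream=False):
    """Build the nginx variable name for an HTTP header.

    The header name is lowercased and dashes become underscores.
    Request headers get the $http_ prefix; headers returned by an
    upstream server get $upstream_http_.
    """
    suffix = header_name.lower().replace("-", "_")
    prefix = "$upstream_http_" if upstream else "$http_"
    return prefix + suffix


print(header_variable("User-Agent"))
print(header_variable("Content-Type", upstream=True))
```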

$msec : The current time in seconds, with millisecond resolution, since the Unix epoch of 1/1/1970 (for example, 1571147736.123). This allows you to determine the exact time at which a request took place.

Open Question: Is this the time that nginx receives the request, or the time that nginx writes the log statement after the completion of the request?
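
Whichever moment it captures, a logged $msec value is straightforward to convert into a readable timestamp. This Python sketch assumes the value is a string of seconds with a fractional millisecond part, which is how the Nginx documentation describes it. The sample value is made up.

```python
from datetime import datetime, timezone


def msec_to_datetime(msec):
    """Convert a logged $msec string (seconds.milliseconds) to a UTC datetime."""
    return datetime.fromtimestamp(float(msec), tz=timezone.utc)


# Hypothetical logged value.
timestamp = msec_to_datetime("1571147736.123")
print(timestamp.isoformat())
```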

$pid : The Process ID (PID) of the nginx worker that handled the request. This is useful for tracking the metrics and workload of each worker individually.

Note: Worker processes can (and do) crash. The nginx master process can also stop and restart worker processes, so PIDs can come and go. The underlying server can also reuse PIDs.

$server_addr and $server_name : The IP address ($server_addr) or name ($server_name) of the nginx server that accepted a request. This is useful in a multi-server, load-balancing environment. In that case, you need to monitor which requests (and which metrics) each server handles.

Note: Computation of the IP address requires a system call unless you specifically bind an address using the listen directive in your configuration. Keep in mind that system calls can add significant overhead and impact server performance. Unless you need the full IP, $server_name is a better choice for many installations.

This post was updated by Casey Dunham. Casey recently launched his own security business and is known for his unique approaches to all areas of application security, stemming from his 10+ year career as a professional software developer. His strengths include secure SDLC development consulting; threat modeling; developer training; and auditing web, mobile, and desktop applications for security flaws.

To find out more about Scalyr and monitoring Nginx, check out our docs. Or try Scalyr for yourself.

This blog is an updated version of one originally written in 2017.
