DEV Community

Peter Cooper

List unique paths requested by popularity in NGINX logs (access.log)

We had a big pile of NGINX access.log files for our site and wanted to quickly know all of the unique paths that had been requested.

If your access.log file(s) follow a reasonably standard format that looks like this:

127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"

... then you can use this solution:

awk -F\" '{print $2}' access.log | awk '{print $2}' | sort | uniq -c | sort -g

The output will look like this:

[lots of stuff here]
    104 /xmlrpc.php
    114 /wp-includes/wlwmanifest.xml
    121 /robots.txt
    161 /feed/
    336 /
   3056 //xmlrpc.php
  53786 /wp-login.php

So what's going on?

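awk -F\" '{print $2}' access.log uses the double quote as the field separator, so each line is split at every quotation mark, and prints the second field. That is the request line between the first pair of quotes (GET / HTTP/1.1 in the example above).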

awk '{print $2}' then skips the HTTP verb (GET/POST/PUT/etc.) and prints out the path (which follows the space after the HTTP verb).
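To see the two awk stages in isolation, here is a small sketch that runs them on the sample line from above. The variable names line, request, and path are only for illustration.

```shell
# Sample log line from the post, stored in a variable for the demo.
line='127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"'

# Stage 1: split on double quotes; field 2 is the request line.
request=$(printf '%s\n' "$line" | awk -F'"' '{print $2}')

# Stage 2: split on whitespace; field 2 is the path that follows the verb.
path=$(printf '%s\n' "$request" | awk '{print $2}')

echo "$request"   # GET / HTTP/1.1
echo "$path"      # /
```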

sort orders the paths so that identical ones end up on adjacent lines, which matters because..

uniq -c only collapses adjacent duplicate lines, leaving one line per unique path. The -c prefixes each of those lines with the number of times that path occurred.
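A quick way to convince yourself that the sort step matters: uniq only compares neighbouring lines, so unsorted input gives split counts. Below is a minimal sketch with three made-up paths. The trailing awk is only there to trim the padding that uniq -c adds, so the two results are easy to compare.

```shell
# Three hypothetical request paths; "/" appears twice, but not on adjacent lines.
paths=$(printf '%s\n' / /feed/ /)

# Without sort: the two "/" lines are not neighbours, so each is counted on its own.
unsorted=$(printf '%s\n' "$paths" | uniq -c | awk '{print $1, $2}')

# With sort: identical paths become adjacent and are counted together.
sorted=$(printf '%s\n' "$paths" | sort | uniq -c | awk '{print $1, $2}')

echo "$unsorted"
echo "---"
echo "$sorted"
```

The first listing reports "/" twice with a count of 1 each, while the second reports it once with a count of 2.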

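sort -g then sorts the lines in ascending numeric order by that count, so the most requested paths end up at the bottom. (-g is the "general numeric" sort; since the counts are plain integers, sort -n gives the same result.)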

Want the result in descending numeric order? Use sort -gr instead.
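If you only care about the busiest paths, one option is to wrap the pipeline in a small shell function and cut the output off with head. This is a sketch rather than part of the original one-liner: top_paths, its arguments, and the demo log entries are all hypothetical, and it uses sort -nr, which orders plain integer counts the same way as -gr.

```shell
# top_paths FILE [N]: print the N most requested paths (default 10), busiest first.
top_paths() {
  awk -F'"' '{print $2}' "$1" | awk '{print $2}' |
    sort | uniq -c | sort -nr | head -n "${2:-10}"
}

# Demo on a throwaway log with made-up entries in the format shown earlier.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
127.0.0.1 - - [19/Oct/2020:06:26:59 +0000] "GET /wp-login.php HTTP/1.1" 404 178 "-" "-"
127.0.0.1 - - [19/Oct/2020:06:27:01 +0000] "POST /wp-login.php HTTP/1.1" 404 178 "-" "-"
127.0.0.1 - - [19/Oct/2020:06:27:05 +0000] "GET /robots.txt HTTP/1.1" 200 52 "-" "-"
EOF

# Prints a single line: the count 2 followed by /wp-login.php.
top_paths "$demo_log" 1
rm -f "$demo_log"
```

Because the GET and the POST hit the same path, they are counted together; the verb is dropped by the second awk before anything is tallied.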
