We had a big pile of NGINX access.log files for our site and wanted to quickly know all of the unique paths that had been requested.
If your access.log file(s) follow a reasonably standard format that looks like this:
127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"
... then you can use this solution:
awk -F\" '{print $2}' access.log | awk '{print $2}' | sort | uniq -c | sort -g
The output will look like this:
[lots of stuff here]
    104 /xmlrpc.php
    114 /wp-includes/wlwmanifest.xml
    121 /robots.txt
    161 /feed/
    336 /
   3056 //xmlrpc.php
  53786 /wp-login.php
So what's going on?
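awk -F\" '{print $2}' access.log splits each line on every double quotation mark and prints the second field, which is the request line between the first pair of quotes (GET / HTTP/1.1 in the example above).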
awk '{print $2}' then skips the HTTP verb (GET/POST/PUT/etc.) and prints out the path (which follows the space after the HTTP verb).
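To see the two awk stages on their own, here is a small sketch that feeds the example line from above through each one. The variable names (line, request, path) are just ones I picked for illustration.

```shell
# The example log line from above, stored in a variable for the demo
line='127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"'

# Stage 1: split on double quotes, keep field 2 (the request line)
request=$(printf '%s\n' "$line" | awk -F\" '{print $2}')
echo "$request"   # prints: GET / HTTP/1.1

# Stage 2: split on whitespace, keep field 2 (the path)
path=$(printf '%s\n' "$request" | awk '{print $2}')
echo "$path"      # prints: /
```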
sort sorts the output so that identical paths end up on adjacent lines, which..
uniq -c then collapses into a list of the unique paths. The -c prefixes each path with the number of times it occurred.
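Why bother with sort before uniq? Because uniq only collapses duplicate lines that sit next to each other. A quick sketch with three made-up paths (not taken from a real log) shows the difference:

```shell
# Three hypothetical paths; "/" appears twice but not on adjacent lines
without_sort=$(printf '%s\n' / /feed/ / | uniq -c | awk 'END {print NR}')
with_sort=$(printf '%s\n' / /feed/ / | sort | uniq -c | awk 'END {print NR}')

echo "lines without sort: $without_sort"   # prints: lines without sort: 3
echo "lines with sort: $with_sort"         # prints: lines with sort: 2
```

Without the sort, "/" is reported twice with a count of 1 each, which is not what we want.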
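sort -g then sorts the lines in ascending numeric order by that count, so the most requested paths end up at the bottom. (For plain whole-number counts like these, sort -n gives the same order.)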
Want the result in descending numeric order? Use sort -gr instead.
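If you want to try the whole pipeline without touching your real logs, here is a minimal sketch that runs it against a throwaway file. The log lines are made-up sample data in the same format as the example above, and sample_log is just a name I chose.

```shell
# Write a few hypothetical log lines to a temporary file
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"
127.0.154.222 - - [19/Oct/2020:06:27:01 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"
127.0.154.222 - - [19/Oct/2020:06:27:02 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"
127.0.154.222 - - [19/Oct/2020:06:27:03 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-" "-"
127.0.154.222 - - [19/Oct/2020:06:27:04 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"
127.0.154.222 - - [19/Oct/2020:06:27:05 +0000] "GET / HTTP/1.1" 301 178 "-" "-"
EOF

# Same pipeline as above, with -gr so the busiest path comes first
counts=$(awk -F\" '{print $2}' "$sample_log" | awk '{print $2}' | sort | uniq -c | sort -gr)
echo "$counts"

# Clean up the temporary file
rm -f "$sample_log"
```

With this sample data, /wp-login.php should come out on top with a count of 3, followed by / with 2 and /robots.txt with 1. The exact amount of leading whitespace before each count depends on your version of uniq.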
 

 
    