DEV Community

Daniel Kukula

Parse Nginx access log with nimble_parsec

Lately I have been playing with string parsing and found nimble_parsec, a library that does the job perfectly. I will show you how to convert nginx logs into something readable in Elixir. A typical log line looks like:

127.0.0.1 - - [25/Dec/2020:08:15:53 +0000] 
"GET /img/key.svg HTTP/1.1" 200 8305 
"http://localhost/styles/css/app.css" 
"Mozilla/5.0 Firefox/84.0"

These fields are:

remote_addr - remote_user [time_local] 
"request" status bytes_sent 
"referer" 
"user_agent"

I found the description on Stack Overflow.

To start matching on our string we need to define our parser in a new file:

defmodule NginxParsex do
  import NimbleParsec

  defparsec( :ngnix_parser, integer(1))
end

This just matches a string starting with a single-digit integer:

iex(1)> log = "127.0.0.1 - - "
iex(2)> NginxParsex.ngnix_parser(log)
{:ok, [1], "27.0.0.1 - - ", %{}, {1, 0}, 1}

What happens here is that our parser matches the first digit and returns a tuple with :ok, a list of the items that matched ([1]), and the rest of the string. The remaining items are additional data used by the parser: a context map, the current position as a {line, line start offset} tuple, and the byte offset reached in the input.
We can extract the full IP address by creating a parser like this:

  ip = 
    integer(min: 1, max: 3)
    |> ignore(string("."))
    |> integer(min: 1, max: 3)
    |> ignore(string("."))
    |> integer(min: 1, max: 3)
    |> ignore(string("."))
    |> integer(min: 1, max: 3)

  defparsec( :ngnix_parser, ip)

After running it we get:

iex(1)>   NginxParsex.ngnix_parser(log)
{:ok, [127, 0, 0, 1], " - - ", %{}, {1, 0}, 9}
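
The repeated dot-and-octet steps in the integer version can also be written once and duplicated. The following is only a sketch, assuming nimble_parsec is available as a dependency; the module name IpSketch and the variable octet are made up for illustration and are not part of the parser built in this post.

```elixir
defmodule IpSketch do
  import NimbleParsec

  # One octet: one to three digits, returned as an integer.
  # Note that this does not check the 0..255 range.
  octet = integer(min: 1, max: 3)

  # The first octet, followed by exactly three repetitions of ".<octet>".
  ip = octet |> duplicate(ignore(string(".")) |> concat(octet), 3)

  defparsec(:ip, ip)
end

{:ok, octets, rest, _context, _position, _offset} = IpSketch.ip("127.0.0.1 - - ")
# octets is [127, 0, 0, 1] and rest is " - - "
```

The result has the same shape as the hand-written chain, so the choice between the two is mostly a matter of taste.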

This parser extracts all the numbers and omits the dots. We don't need the individual integers, though; a string is enough, so we can instead extract the address as a string that consists of digits and dots:

ip = ascii_string([?., ?0..?9], min: 7, max: 15) 

Now we get back:

iex(1)> NginxParsex.ngnix_parser(log) 
{:ok, ["127.0.0.1"], " - - ", %{}, {1, 0}, 9}

The next part, " - - ", is not really useful for us, so we can omit it using the ignore combinator:

  ip = 
    ascii_string([?., ?0..?9], min: 7, max: 15)
    |> ignore(string(" - - "))

And our first test string is parsed:

iex(1)>   NginxParsex.ngnix_parser(log)
{:ok, ["127.0.0.1"], "", %{}, {1, 0}, 14}

The problem is that the second dash may be a remote_user. We will skip it with eventually, which skips input until the given combinator matches, and match on the opening [:

  ip = 
    ascii_string([?., ?0..?9], min: 7, max: 15)
    |> ignore(eventually(ascii_char([?[])))

As our current log was missing this part, I'll add it:

log = ~s(127.0.0.1 - - [25/Dec/2020:08:15:53 +0000] )
iex(1)>   NginxParsex.ngnix_parser(log)
{:ok, ["127.0.0.1"], "25/Dec/2020:08:15:53 +0000] ", %{}, {1, 0}, 15}
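
If the remote_user value is wanted rather than skipped, one possible alternative to eventually is to match the fields explicitly. This is a sketch under assumptions: it expects the literal " - " separator after the address and a user name without spaces, and the names RemoteUserSketch, field and prefix are hypothetical.

```elixir
defmodule RemoteUserSketch do
  import NimbleParsec

  ip = ascii_string([?., ?0..?9], min: 7, max: 15)

  # Any run of non-space characters; nginx writes "-" when no user is set.
  field = ascii_string([not: ?\s], min: 1)

  defparsec(
    :prefix,
    ip
    |> ignore(string(" - "))
    |> concat(field)
    |> ignore(string(" ["))
  )
end

{:ok, [ip, user], _rest, _context, _position, _offset} =
  RemoteUserSketch.prefix("127.0.0.1 - alice [25/Dec/2020:08:15:53 +0000] ")

# ip is "127.0.0.1" and user is "alice"
```

The trade-off is that this version fails on lines that deviate from the expected layout, while eventually simply skips ahead to the first [.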

The next part we need to match is the date string. This is very well explained in the documentation; we just need to make some adjustments. We will also extract our date and time parsers into their own variables. Now our module looks like this:

defmodule NginxParsex do
  import NimbleParsec

  ip = 
    ascii_string([?., ?0..?9], min: 7, max: 15)

  date =
    integer(2)
    |> ignore(string("/"))
    |> ascii_string([?a..?z, ?A..?Z], 3)
    |> ignore(string("/"))
    |> integer(4)

  time =
    integer(2)
    |> ignore(string(":"))
    |> integer(2)
    |> ignore(string(":"))
    |> integer(2)
    |> ignore(string(" "))
    |> ignore(ascii_char([?-, ?+]))
    |> ignore(integer(4))


  defparsec( :ngnix_parser,
    ip
    |> ignore(eventually(ascii_char([?[])))
    |> concat(date)
    |> ignore(string(":"))
    |> concat(time)
    |> ignore(string("] "))
  )
end

When running the code we get:

iex(1)> NginxParsex.ngnix_parser(log)                        
{:ok, ["127.0.0.1", 25, "Dec", 2020, 8, 15, 53], "", %{}, {1, 0}, 43}
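
The result is a flat list, so the meaning of each value depends on its position. As an optional variation, nimble_parsec's unwrap_and_tag can label each value. The sketch below applies it to the date part only; the module name TaggedDateSketch and the tag names are chosen for illustration.

```elixir
defmodule TaggedDateSketch do
  import NimbleParsec

  # Same date layout as above, but every captured value gets a label.
  date =
    unwrap_and_tag(integer(2), :day)
    |> ignore(string("/"))
    |> unwrap_and_tag(ascii_string([?a..?z, ?A..?Z], 3), :month)
    |> ignore(string("/"))
    |> unwrap_and_tag(integer(4), :year)

  defparsec(:date, date)
end

{:ok, fields, _rest, _context, _position, _offset} =
  TaggedDateSketch.date("25/Dec/2020:08:15:53 +0000")

# fields is the keyword list [day: 25, month: "Dec", year: 2020]
month = fields[:month]
```

The rest of this post keeps the untagged list, which is shorter to write but has to be matched positionally.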

Cool, our date and time are parsed. We can now add more of the line:

log = ~s(127.0.0.1 - - [25/Dec/2020:08:15:53 +0000] "GET /img/key.svg HTTP/1.1" 200 8305 "http://localhost/styles/css/app.css" "Mozilla/5.0 Firefox/84.0")

Now we need to match the string inside quotes:

  string_in_quotes =
    ignore(ascii_char([?"]))
    |>  ascii_string([not: ?"], min: 1)
    |> ignore(ascii_char([?"]))

  defparsec( :ngnix_parser,
    ip
    |> ignore(eventually(ascii_char([?[])))
...
    |> concat(string_in_quotes)
  )

Our result:

NginxParsex.ngnix_parser(log)                                                                            
{:ok, ["127.0.0.1", 25, "Dec", 2020, 8, 15, 53, "GET /img/key.svg HTTP/1.1"],
 " 200 8305 \"http://localhost/styles/css/app.css\" \"Mozilla/5.0 Firefox/84.0\"",
 %{}, {1, 0}, 70}
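
The request comes back as one string. If the method, path and protocol are needed separately, plain String.split may be enough. This is a small sketch that is not part of the parser itself; RequestSketch and its split function are hypothetical names, and it assumes the three parts are separated by single spaces.

```elixir
defmodule RequestSketch do
  # Splits a request line such as "GET /img/key.svg HTTP/1.1" into its parts.
  # Anything that does not have exactly three parts is returned as an error.
  def split(request) do
    case String.split(request, " ", parts: 3) do
      [method, path, protocol] ->
        {:ok, %{method: method, path: path, protocol: protocol}}

      _other ->
        {:error, request}
    end
  end
end

{:ok, parts} = RequestSketch.split("GET /img/key.svg HTTP/1.1")
# parts.method is "GET", parts.path is "/img/key.svg", parts.protocol is "HTTP/1.1"
```

Returning an error tuple instead of raising keeps malformed request lines from stopping the processing of a whole log file.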

What's left are some spaces, numbers and two quoted strings. We can reuse the parts we already have, and our full parser now looks like this:

  defparsec( :ngnix_parser,
    ip
    |> ignore(eventually(ascii_char([?[])))
    |> concat(date)
    |> ignore(string(":"))
    |> concat(time)
    |> ignore(string("] "))
    |> concat(string_in_quotes)
    |> ignore(string(" "))
    |> integer(min: 1)
    |> ignore(string(" "))
    |> integer(min: 1)
    |> ignore(string(" "))
    |> concat(string_in_quotes)
    |> ignore(string(" "))
    |> concat(string_in_quotes)
  )

The result is now:

{:ok,
 ["127.0.0.1", 25, "Dec", 2020, 8, 15, 53, "GET /img/key.svg HTTP/1.1", 200,
  8305, "http://localhost/styles/css/app.css", "Mozilla/5.0 Firefox/84.0"], "",
 %{}, {1, 0}, 144}

To retrieve the data we can pattern match on the result:

{:ok,
 [ip, day, month, year, hour, minute, seconds, request, code, size, referrer, user_agent],
 _, _, _, _} = NginxParsex.ngnix_parser(log)
{:ok,
 ["127.0.0.1", 25, "Dec", 2020, 8, 15, 53, "GET /img/key.svg HTTP/1.1", 200,
  8305, "http://localhost/styles/css/app.css", "Mozilla/5.0 Firefox/84.0"], "",
 %{}, {1, 0}, 144}
iex(111)> user_agent
"Mozilla/5.0 Firefox/84.0"
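
The match above raises a MatchError when a line does not parse, because a failed parse returns a tuple that starts with :error instead of :ok. A hedged sketch of handling both outcomes follows; the module ResultSketch and its fields function are illustrative names, and the parser is passed in as a function so that the sketch does not depend on a particular module.

```elixir
defmodule ResultSketch do
  # `parser` is any one-argument function that returns a NimbleParsec result
  # tuple, for example &NginxParsex.ngnix_parser/1.
  def fields(line, parser) do
    case parser.(line) do
      {:ok, fields, _rest, _context, _position, _offset} ->
        {:ok, fields}

      {:error, reason, _rest, _context, _position, _offset} ->
        {:error, reason}
    end
  end
end
```

With the parser from this post it could be called as ResultSketch.fields(log, &NginxParsex.ngnix_parser/1).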

Now that we have all the variables, we can process them and wrap them in a map:

  @month_map %{
    "Jan" => 1,
    "Feb" => 2,
    "Mar" => 3,
    "Apr" => 4,
    "May" => 5,
    "Jun" => 6,
    "Jul" => 7,
    "Aug" => 8,
    "Sep" => 9,
    "Oct" => 10,
    "Nov" => 11,
    "Dec" => 12
  }
  %{
    ip: ip,
    date: Date.new!(year, @month_map[month], day),
    time: Time.new!(hour, minute, seconds),
    request: request,
    code: code,
    size: size,
    referrer: URI.decode(referrer),
    user_agent: user_agent
 }
%{
  code: 200,
  date: ~D[2020-12-25],
  ip: "127.0.0.1",
  referrer: "http://localhost/styles/css/app.css",
  request: "GET /img/key.svg HTTP/1.1",
  size: 8305,
  time: ~T[08:15:53],
  user_agent: "Mozilla/5.0 Firefox/84.0"
}

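
The same conversion can be wrapped in a function that takes the parsed list directly. This is one possible arrangement rather than the only one: LogEntrySketch and to_map are hypothetical names, the date and time are combined into a single NaiveDateTime (which needs Elixir 1.11 or later), and the time zone offset is still dropped because the time parser ignores it.

```elixir
defmodule LogEntrySketch do
  # Month abbreviation to month number: %{"Jan" => 1, ..., "Dec" => 12}.
  @month_numbers Map.new(
                   Enum.with_index(~w(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec), 1)
                 )

  # Expects the twelve values in the order the parser emits them.
  def to_map([ip, day, month, year, hour, minute, second, request, code, size, referrer, agent]) do
    # Map.fetch!/2 raises a KeyError for an unknown month abbreviation.
    month_number = Map.fetch!(@month_numbers, month)

    %{
      ip: ip,
      timestamp: NaiveDateTime.new!(year, month_number, day, hour, minute, second),
      request: request,
      code: code,
      size: size,
      referrer: URI.decode(referrer),
      user_agent: agent
    }
  end
end
```

It could then be fed the list matched from the parser result, for example LogEntrySketch.to_map(fields).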

Thanks for reading. I tested it on a 2MB file that I have on my local machine, and it parsed every line to the end.
The full file we wrote today can be found below:

defmodule NginxParsex do
  import NimbleParsec

  ip =
    ascii_string([?., ?0..?9], min: 7, max: 15)

  date =
    integer(2)
    |> ignore(string("/"))
    |> ascii_string([?a..?z, ?A..?Z], 3)
    |> ignore(string("/"))
    |> integer(4)

  time =
    integer(2)
    |> ignore(string(":"))
    |> integer(2)
    |> ignore(string(":"))
    |> integer(2)
    |> ignore(string(" "))
    |> ignore(ascii_char([?-, ?+]))
    |> ignore(integer(4))

  string_in_quotes =
    ignore(ascii_char([?"]))
    |> ascii_string([not: ?"], min: 1)
    |> ignore(ascii_char([?"]))

  defparsec( :ngnix_parser,
    ip
    |> ignore(eventually(ascii_char([?[])))
    |> concat(date)
    |> ignore(string(":"))
    |> concat(time)
    |> ignore(string("] "))
    |> concat(string_in_quotes)
    |> ignore(string(" "))
    |> integer(min: 1)
    |> ignore(string(" "))
    |> integer(min: 1)
    |> ignore(string(" "))
    |> concat(string_in_quotes)
    |> ignore(string(" "))
    |> concat(string_in_quotes)
  )
end
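
For running a parser over a whole log file, as in the 2MB test mentioned above, something along these lines might work. It is a sketch, not the code used for that test: LogFileSketch and count_results are hypothetical names, the parser is passed in as a function, and Enum.frequencies_by needs Elixir 1.10 or later.

```elixir
defmodule LogFileSketch do
  # Streams the file line by line and counts how many lines parsed (:ok)
  # and how many did not (:error). `parser` is a one-argument function
  # such as &NginxParsex.ngnix_parser/1.
  def count_results(path, parser) do
    path
    |> File.stream!()
    # Drop the trailing newline so it does not end up in the unparsed rest.
    |> Stream.map(&String.trim_trailing/1)
    |> Stream.map(parser)
    # The first tuple element of every result is :ok or :error.
    |> Enum.frequencies_by(&elem(&1, 0))
  end
end
```

Streaming keeps memory use flat regardless of file size, since only one line is held at a time.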