
Chapin Bryce


Useful Commands for Log Analysis

Over the past few months, I’ve been performing more and more analysis in Linux environments and had the opportunity to refine my go-to commands, picking up a few new (to me) tricks. In this post, I’ll share some of the techniques I like to use and encourage you to share other tips/tricks you’ve used to perform analysis on Linux systems, either as your analysis environment or as your target evidence (or both)!

Photo by Daniel Leone on Unsplash

This is written at the introductory level, to help those who may not have experienced performing analysis within a bash/sh/zsh (or other) command line environment before.

Warning: this post does not contain Python

Spoiler — I didn’t get as far this weekend on this post as I wanted, so I’ll keep this one short and put another one out soon with more tips & tricks.

For those newer to the command line

If you are newer to the bash command line, please use the manual (man) pages for documentation. It takes a little while to understand how they are written and where the detail you are looking for lives, so it is best to start using them early. Here's an example of using the man command to learn more about ls:
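For example, on a typical GNU/Linux system the following opens the manual page for ls (output truncated; the exact text varies by distribution):

$ man ls
NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...
...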

Copies of common man pages are found on linux.die.net, though they should also be available on the same system where the command is installed, as shown above. There are also great resources such as explainshell.com and Julia Evans' reference illustrations & cheat sheets (such as this one about the man command).

For those experienced with the command line, please share your favorite resources!

Working with logs

Log data is not only a very common source of evidence on Linux platforms but is also easier to work with at the command line since it is, generally, semi-structured text data.

Identifying the most interesting content in unknown logs

As part of the process, we in DFIR like to preview a log file and get a sense of what is useful versus what is noise. To assist with this data reduction, we can use a few tools and processes to cut down on review time.

One trick I like to employ is the use of less in combination with grep -v. In the example below, we will be looking at a server's auth.log, a common log file, and we are interested in seeing successful authentications. While some of us may know from experience to start looking for strings such as "Accepted publickey", we will walk through getting to that point using the grep -v method. While grep is a great utility for searching datasets, here we want to find the inverse of our pattern and need to use the -v parameter:
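A sketch of how that iteration might look; the excluded messages here are only examples, and the noise in your auth.log will differ:

$ less auth.log
$ grep -v "Disconnected from" auth.log | less
$ grep -v "Disconnected from" auth.log | grep -v "Received disconnect" | less
$ grep -v "Disconnected from" auth.log | grep -v "Received disconnect" | grep -v "pam_unix" | less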

As seen above, we keep adding new log messages to remove from our output until we see something of interest (i.e., the Accepted publickey statement). Now, we can instead grep for Accepted publickey as shown below:
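Something along these lines (paging through less is optional):

$ grep "Accepted publickey" auth.log | less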

A few notes on this method:

  • We can use OR statements (|) to form one larger grep statement, as sketched after this list, though do what is most comfortable for you
  • If the log dataset is small enough it may be best to scroll through the text file
  • Inversely to this method, we could start by searching for IP addresses, usernames, and timestamps depending on how much of that information is already known (as sometimes we aren’t lucky enough to have any of those indicators up-front)
  • After identifying what string is useful, go back and confirm you didn’t accidentally over-exclude content through the use of one or more of your patterns
  • We interchangeably use fgrep and egrep in place of grep. These are variations on the standard grep interface; fgrep is essentially an alias for grep -F
  • fgrep runs much faster as it only searches for fixed strings. It is a good default, as a fair amount of the time we are searching for a string without any patterns.
  • egrep allows for extended patterns and is a bit slower. It changes the behavior of the patterns, as further detailed in man re_format
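To illustrate the OR approach mentioned above, the separate exclusions can be collapsed into a single egrep (or grep -E) call; the patterns are the same illustrative ones used earlier:

$ egrep -v "Disconnected from|Received disconnect|pam_unix" auth.log | less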

Pulling out useful statistics from log files

Another useful technique is to extract information such as ‘how many IP addresses attempted authentication to the machine’ and the related ‘what usernames were they using’. To do this, using the same log as before, we can leverage grep, less, and awk.

Let’s use a pattern we discovered previously, “Invalid user”, to pull these types of answers. The below output shows the first 5 attempts, using head, where we see the username and IP address in the same message.
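A sketch of that step; the log line shown is illustrative (the hostname, username, and IP address are made up), but it follows the standard sshd auth.log layout:

$ fgrep "Invalid user" auth.log | head -n 5
Sep 19 05:26:21 examplehost sshd[10041]: Invalid user admin from 192.0.2.10
...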

Since we want to extract only the IP address for the first part of the question, let's use the awk command. This command allows us to process text, in this case printing selected columns of data. By default, awk will split on whitespace, though we can change the delimiter if needed. While awk has many functions, we will use the print feature to select the column with the IP address. Since awk will split on spaces, we will select the 10th column (column numbering starts at 1).
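Using the sample line above, the IP address lands in column 10:

$ fgrep "Invalid user" auth.log | awk '{print $10}'
192.0.2.10
...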

Great — we now have a list of IP addresses. Let’s now generate some stats using sort and uniq. As the names suggest, we will generate a unique list of IP addresses and gather a count of how many times they appear in those messages:
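The full pipeline looks like this (the counts and addresses are illustrative):

$ fgrep "Invalid user" auth.log | awk '{print $10}' | sort | uniq -c | sort -rn
   1203 192.0.2.10
    411 198.51.100.23
     87 203.0.113.5
...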

The new portion of the pipeline, sort | uniq -c | sort -rn, is what generates our nicely formatted list. The uniq command requires sorted input to properly deduplicate, uniq -c provides a count in addition to a deduplicated list, and finally sort -rn provides a numerically sorted (-n) list in reversed order (-r). Since this is a statement I use fairly often, I have made two aliases that I find useful:
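A sketch of what such aliases can look like in ~/.bashrc or ~/.zshrc; usort is the name referenced later in this post, while the second alias is a purely hypothetical variant:

alias usort='sort | uniq -c | sort -rn'                  # unique values with counts, most frequent first
alias usort10='sort | uniq -c | sort -rn | head -n 10'   # hypothetical variant: top 10 only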

And now I can re-run the prior command using the alias:
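Which shortens the earlier pipeline to something like:

$ fgrep "Invalid user" auth.log | awk '{print $10}' | usort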

A few notes on this method:

  • Using space delimiters is dangerous, especially in log files. Imagine, for example, if a username (somehow) contained a space character. We would no longer be able to use column 10 as our IP address column for that row and would need to employ a different technique
  • The aliases provided only read from stdin. This works for my use case but is an important consideration. Worst-case scenario, we can always run cat $file | usort to leverage the alias.
  • Adding in the username, or any other field, would be as easy as specifying an additional column number in the awk statement. We would have to reconsider how we generate statistics though, as the usort alias will read the whole line when providing the counts.
  • We can use other tools, such as cut to provide similar functionality to awk. Find the ones you like and can remember and use those :)

One last piece on useful statistics: we can quickly generate larger counts using the wc utility. Leveraging the above command, we will use wc to get a count of the number of lines containing "Invalid user":

$ fgrep Invalid\ user auth.log | wc -l 
3549

This utility allows us to count other values, such as characters and words, but in this case we specified -l to get only the number of lines.

Sorry for the abrupt and early stop, but I wanted to memorialize this before it became another multi-weekend project that took too long to release. I hope to continue to put out smaller posts like this, hoping they help someone looking to bring more bash/sh/zsh (or other shell) command line work into their casework.

Next post ideas:

  • Working with JSON data at the command line
  • Writing useful loops
  • List of useful aliases and one-liners

Thoughts on the above? Leave a comment below!

Originally posted 2018–09–30

Update 2019–01–18: Since writing this post, I’ve come across a useful tool for prototyping longer or iterative bash statements. The ultimate plumber, up, is a really great tool for testing new statements and saving the final iteration to a script for re-use.



Top comments (1)

Daniel Joos

Hey! Just want to share tldr.ostera.io/ . Community Driven Manpages. Awesome.

Thanks for your great work in this article. :)