In a Linux environment, working with text-based files is part of the daily workflow. Whether you're parsing logs, cleaning structured data, or automating system reporting, tools like awk, sed, grep, find, and their counterparts are indispensable.
This article walks you through a realistic scenario to learn and apply these powerful tools in a step-by-step format.
Table of Contents
- Scenario Overview
- Step 1: Set Up the Practice Environment
- Step 2: Find the Log Files (find)
- Step 3: Filter Failed Logins (grep)
- Step 4: Remove Comments (sed)
- Step 5: Extract IP Addresses (awk)
- Step 6: Cut Out Usernames (cut)
- Step 7: Remove Duplicates (sort + uniq)
- Step 8: Normalize Case (tr)
- Step 9: Chain Commands (xargs)
- Full Workflow Example
- Command Cheat Sheet
- Summary
Scenario: Audit and Clean a Login Activity Log
You're a junior system administrator tasked with auditing a collection of login activity logs. These files:
- Are spread across multiple directories
- Contain both successful and failed logins
- Include redundant entries
- Use a structured `key=value` format common in system-generated logs
Your goal: extract meaningful data, clean it, and produce a simple report.
Step 1: Set Up the Practice Environment
To simulate a real-world situation, you'll create a set of log files to work with.
Create a Directory
mkdir -p ~/log_audit/logs/2025
cd ~/log_audit/logs/2025
Create Sample Log Files
cat <<EOF > log1.txt
[2025-06-01 08:12:55] LOGIN: user=john src=192.168.1.10 status=SUCCESS
[2025-06-01 08:13:02] LOGIN: user=mary src=192.168.1.15 status=FAIL
[2025-06-01 08:13:02] LOGIN: user=mary src=192.168.1.15 status=FAIL
[2025-06-01 08:15:42] LOGIN: user=alice src=10.0.0.5 status=SUCCESS
EOF
cat <<EOF > log2.txt
# Generated by login system
[2025-06-01 08:18:22] LOGIN: user=bob src=192.168.1.20 status=FAIL
[2025-06-01 08:19:00] LOGIN: user=john src=192.168.1.10 status=SUCCESS
EOF
These files mimic logs from a PAM-enabled login system. Each entry contains a timestamp, user, source IP, and login result.
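Before moving on, a quick listing confirms both sample files exist; the expected output below assumes the setup commands above ran without errors.
ls ~/log_audit/logs/2025
# Expected: log1.txt  log2.txt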
Step 2: Find the Log Files (find)
find ~/log_audit/logs -name "*.txt"
- `find`: Searches for files in a directory tree.
- `-name "*.txt"`: Filters for `.txt` log files.
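If anything other than regular files ever matches the pattern, adding -type f narrows the search to files only; this is an optional refinement rather than something the sample data requires.
find ~/log_audit/logs -name "*.txt" -type f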
Step 3: Filter Failed Logins (grep)
grep "status=FAIL" *.txt
- `grep`: Searches for patterns in text.
- `"status=FAIL"`: Finds entries for failed login attempts.
Use the -i flag for case-insensitive matching:
grep -i "status=fail" *.txt
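If you only want to know how many failed attempts each file contains, grep can count matching lines per file; with the sample logs this reports two failures in log1.txt and one in log2.txt.
grep -c "status=FAIL" *.txt
# log1.txt:2
# log2.txt:1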
Step 4: Remove Comments (sed)
sed '/^#/d' log2.txt
- `sed`: A stream editor for filtering and transforming text.
- `/^#/d`: Removes lines that begin with `#` (comments or metadata).
To update the file directly:
sed -i '/^#/d' log2.txt
- `/^#/`: Matches lines that start with `#`.
- `d`: Deletes those lines from the stream.

This command does not modify the file unless `-i` is added.
Be cautious with this command: it can remove meaningful information that was commented out intentionally, for example:
#2025-06-01: Backup completed
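Before deleting in place, you may want to preview exactly which lines would go, or keep a backup copy. Both variants below use standard sed options; the attached `.bak` suffix is GNU sed behavior.
sed -n '/^#/p' log2.txt      # print only the comment lines that would be removed
sed -i.bak '/^#/d' log2.txt  # delete in place, keeping the original as log2.txt.bak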
Step 5: Extract IP Addresses (awk)
- `awk`: A powerful text-processing tool used to scan, extract, and manipulate text, especially structured data.
awk -F 'src=| status' '{print $2}' log1.txt
- `-F 'src=| status'`: Defines a custom field separator using a regular expression.
- `{print $2}`: Extracts the IP address from the line.
In Greater Detail:
-F 'src=| status'
- `-F` sets the field separator (the character(s) that split each line into fields).
- `'src=| status'` is a regular expression that uses two delimiters:
  - `src=` marks the beginning of the IP address.
  - `status` marks the end of the IP address.

This means:
- Field `$1`: everything before `src=`
- Field `$2`: the value between `src=` and `status` (the IP address)
- Field `$3`: everything after `status`
Example:
[2025-06-01 08:12:55] LOGIN: user=john src=192.168.1.10 status=SUCCESS
Becomes:
- `$1`: `[2025-06-01 08:12:55] LOGIN: user=john`
- `$2`: `192.168.1.10`
- `$3`: `=SUCCESS` (the leading `=` remains because the separator is ` status`, not `status=`)
If you get different results, check your syntax: an extra space in the delimiter is a common culprit.
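The same separator works on filtered input, so it can be combined with the grep pattern from Step 3 to list only the source IPs of failed attempts; on the sample log1.txt this prints 192.168.1.15 twice.
grep "status=FAIL" log1.txt | awk -F 'src=| status' '{print $2}'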
Step 6: Cut Out Usernames (cut)
grep "user=" log1.txt | cut -d '=' -f 2 | cut -d ' ' -f1
- First `cut`: Extracts everything after `user=`.
- Second `cut`: Removes trailing text after the space.
In Greater Detail:
- `grep "user=" log1.txt`
  - Filters lines in `log1.txt` that contain the string `user=`.
  - Why? Ensures you're only processing lines with login information.
- `cut -d '=' -f 2`
  - Extracts the part after `user=`.
  - `-d '='`: Sets the delimiter to `=`.
  - `-f 2`: Selects the second field (the part after `user=`).
- `cut -d ' ' -f1`
  - Trims off everything after the username.
  - `-d ' '`: Sets the delimiter to a space.
  - `-f1`: Selects the first field (the username only).
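For comparison, a single awk invocation with the alternation-style separator from Step 5 extracts the same usernames; this is just an alternative, not a replacement for practicing cut.
awk -F 'user=| src' '{print $2}' log1.txt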
Step 7: Remove Duplicates (sort + uniq)
sort log1.txt | uniq
To count how many times each line appears:
sort log1.txt | uniq -c
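Combined with the grep filter from Step 3, the count makes repeated failures stand out; on the sample log1.txt the duplicated mary entry appears with a count of 2.
grep "status=FAIL" log1.txt | sort | uniq -c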
Step 8: Normalize Case (tr)
awk -F 'status=' '{print $2}' log1.txt | tr '[:upper:]' '[:lower:]'
- `tr`: Translates characters.
- Converts uppercase status values to lowercase.
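Normalizing case before counting keeps values like SUCCESS and success from being tallied separately; on the sample log1.txt the pipeline below reports two fail and two success entries.
awk -F 'status=' '{print $2}' log1.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c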
Step 9: Chain Commands (xargs)
find . -name "*.txt" | xargs grep "status=SUCCESS" | awk -F 'user=| src' '{print $2}' | sort | uniq
- `xargs`: Converts `find` output into arguments for `grep`.
- Result: A clean list of users with successful logins.
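If any log file names ever contain spaces, the null-separated form of this pattern is safer; -print0 and -0 are standard find/xargs options on Linux systems.
find . -name "*.txt" -print0 | xargs -0 grep "status=SUCCESS" | awk -F 'user=| src' '{print $2}' | sort | uniq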
Full Workflow Example: Cleaned Login Report
find . -name "*.txt" | xargs sed '/^#/d' | grep "status=SUCCESS" | awk -F 'user=| src' '{print $2}' | sort | uniq > clean_logins.txt
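If the audit needs to run regularly, the pipeline can be wrapped in a small script. The sketch below is one possible packaging; the file name audit_logins.sh, the default log directory, and the default report name are illustrative assumptions, not part of the original workflow.
#!/usr/bin/env bash
# audit_logins.sh (hypothetical name) - rebuild the cleaned login report on demand
set -eu

LOG_DIR="${1:-$HOME/log_audit/logs}"   # directory to scan (assumed default)
REPORT="${2:-clean_logins.txt}"        # report file to write (assumed default)

find "$LOG_DIR" -name "*.txt" \
  | xargs sed '/^#/d' \
  | grep "status=SUCCESS" \
  | awk -F 'user=| src' '{print $2}' \
  | sort | uniq > "$REPORT"

echo "Report written to $REPORT"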
Command Cheat Sheet
| Tool | Purpose | Example |
|---|---|---|
| `find` | Locate files in directories | `find . -name "*.txt"` |
| `grep` | Search for matching lines | `grep "status=FAIL"` |
| `sed` | Edit or delete lines in streams | `sed '/^#/d'` |
| `awk` | Extract and process fields | `awk -F 'src=\| status' '{print $2}'` |
| `cut` | Remove selected portions of text | `cut -d '=' -f 2` |
| `sort` | Sort lines in text files | `sort log.txt` |
| `uniq` | Remove or count duplicate lines | `uniq -c` |
| `tr` | Translate or delete characters | `tr '[:upper:]' '[:lower:]'` |
| `xargs` | Pass input as arguments to commands | `xargs grep "SUCCESS"` |
Summary
- `awk` and `sed` are essential for processing structured text on Linux systems.
- When combined with tools like `grep`, `cut`, `sort`, `uniq`, `tr`, `xargs`, and `find`, you have a full-featured toolkit for automated log parsing, data cleaning, and report generation.
- The scenario above mirrors a common task in system administration: making sense of raw logs and producing actionable insights.
Need a Challenge? Try the following:
Now that you've mastered the basics:
- Use `awk` to generate formatted CSV reports
- Automate your workflows with `cron`
- Explore real system logs with `journalctl` or `/var/log/secure`
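As a starting point for the first challenge, one possible approach (an illustration, not the only way) is to split on all three key names and print comma-separated fields; on the sample data the first line becomes john,192.168.1.10,SUCCESS.
awk -F 'user=| src=| status=' '{print $2 "," $3 "," $4}' log1.txt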