In a Linux environment, working with text-based files is part of the daily workflow. Whether you're parsing logs, cleaning structured data, or automating system reporting, tools like `awk`, `sed`, `grep`, `find`, and their counterparts are indispensable.

This article walks you through a realistic scenario to learn and apply these powerful tools in a step-by-step format.
## Table of Contents

- Scenario Overview
- Step 1: Set Up the Practice Environment
- Step 2: Find the Log Files (`find`)
- Step 3: Filter Failed Logins (`grep`)
- Step 4: Remove Comments (`sed`)
- Step 5: Extract IP Addresses (`awk`)
- Step 6: Cut Out Usernames (`cut`)
- Step 7: Remove Duplicates (`sort` + `uniq`)
- Step 8: Normalize Case (`tr`)
- Step 9: Chain Commands (`xargs`)
- Full Workflow Example
- Command Cheat Sheet
- Summary
## Scenario: Audit and Clean a Login Activity Log

You're a junior system administrator tasked with auditing a collection of login activity logs. These files:

- Are spread across multiple directories
- Contain both successful and failed logins
- Include redundant entries
- Use a structured `key=value` format common in system-generated logs

Your goal: extract meaningful data, clean it, and produce a simple report.
## Step 1: Set Up the Practice Environment

To simulate a real-world situation, you'll create a set of log files to work with.

### Create a Directory

```bash
mkdir -p ~/log_audit/logs/2025
cd ~/log_audit/logs/2025
```
### Create Sample Log Files

```bash
cat <<EOF > log1.txt
[2025-06-01 08:12:55] LOGIN: user=john src=192.168.1.10 status=SUCCESS
[2025-06-01 08:13:02] LOGIN: user=mary src=192.168.1.15 status=FAIL
[2025-06-01 08:13:02] LOGIN: user=mary src=192.168.1.15 status=FAIL
[2025-06-01 08:15:42] LOGIN: user=alice src=10.0.0.5 status=SUCCESS
EOF

cat <<EOF > log2.txt
# Generated by login system
[2025-06-01 08:18:22] LOGIN: user=bob src=192.168.1.20 status=FAIL
[2025-06-01 08:19:00] LOGIN: user=john src=192.168.1.10 status=SUCCESS
EOF
```
These files mimic logs from a PAM-enabled login system. Each entry contains a timestamp, user, source IP, and login result.
## Step 2: Find the Log Files (`find`)

```bash
find ~/log_audit/logs -name "*.txt"
```

- `find`: Searches for files in a directory tree.
- `-name "*.txt"`: Filters for `.txt` log files.
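`find` accepts many more filters than a name pattern. As a small variation you might try on the practice directory from Step 1 (the extra flags below are illustrative, not part of the original workflow), you can restrict the search to regular files modified within the last day:

```bash
# -type f limits results to regular files;
# -mtime -1 keeps only files modified within the last 24 hours
find ~/log_audit/logs -type f -name "*.txt" -mtime -1
```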
## Step 3: Filter Failed Logins (`grep`)

```bash
grep "status=FAIL" *.txt
```

- `grep`: Searches for patterns in text.
- `"status=FAIL"`: Finds entries for failed login attempts.

Use the `-i` flag for case-insensitive matching:

```bash
grep -i "status=fail" *.txt
```
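If you only need to know how many failures each file contains, rather than the matching lines themselves, a small variation on the same command does the counting for you:

```bash
# -c prints the number of matching lines per file instead of the lines themselves
grep -c "status=FAIL" *.txt
```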
## Step 4: Remove Comments (`sed`)

```bash
sed '/^#/d' log2.txt
```

- `sed`: A stream editor for filtering and transforming text.
- `/^#/`: Matches lines that start with `#` (comments or metadata).
- `d`: Deletes those lines from the stream.

This command does not modify the file unless `-i` is added. To update the file directly:

```bash
sed -i '/^#/d' log2.txt
```

Be cautious with in-place editing: it can remove meaningful information that was commented out intentionally, for example:

```text
#2025-06-01: Backup completed
```
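If you do edit in place, one safer variation (assuming GNU `sed`, where a suffix attached to `-i` names a backup file) keeps an untouched copy of the original:

```bash
# Deletes comment lines in place, but first saves the original as log2.txt.bak
sed -i.bak '/^#/d' log2.txt
```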
## Step 5: Extract IP Addresses (`awk`)

`awk` is a powerful text-processing tool used to scan, extract, and manipulate text, especially structured data.

```bash
awk -F 'src=| status' '{print $2}' log1.txt
```

- `-F 'src=| status'`: Defines a custom field separator using a regex.
- `{print $2}`: Extracts the IP address from the line.

### In Greater Detail: `-F 'src=| status'`

- `-F` sets the field separator (the character(s) that split each line into fields).
- `'src=| status'` is a regular expression with two delimiters:
  - `src=` marks the beginning of the IP address.
  - A space followed by `status` marks the end of the IP address.

This means:

- Field `$1`: everything before `src=`
- Field `$2`: the value between `src=` and ` status`, i.e. the IP address
- Field `$3`: everything after ` status` (the remaining `=SUCCESS` or `=FAIL`)

Example:

```text
[2025-06-01 08:12:55] LOGIN: user=john src=192.168.1.10 status=SUCCESS
```

Becomes:

- `$1`: `[2025-06-01 08:12:55] LOGIN: user=john`
- `$2`: `192.168.1.10`
- `$3`: `=SUCCESS`

If you receive different results, check your syntax; you may have an extra space in the delimiter.
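Once the field separator is in place, `awk` can do more than print fields. As a small extension of the same idea (not part of the original workflow), it can tally how many login attempts came from each source IP:

```bash
# Count attempts per source IP across both sample logs;
# the /src=/ pattern skips comment lines, and $2 is the IP
# thanks to the 'src=| status' separator
awk -F 'src=| status' '/src=/ {count[$2]++} END {for (ip in count) print ip, count[ip]}' log1.txt log2.txt
```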
## Step 6: Cut Out Usernames (`cut`)

```bash
grep "user=" log1.txt | cut -d '=' -f 2 | cut -d ' ' -f1
```

- First `cut`: Extracts everything after `user=`.
- Second `cut`: Removes trailing text after the space.

### In Greater Detail

- `grep "user=" log1.txt`
  - Filters lines in `log1.txt` that contain the string `user=`.
  - Why? Ensures you're only processing lines with login information.
- `cut -d '=' -f 2`
  - Extracts the part after `user=`.
  - `-d '='`: Sets the delimiter to `=`.
  - `-f 2`: Selects the second field (the part after `user=`).
- `cut -d ' ' -f1`
  - Trims off everything after the username.
  - `-d ' '`: Sets the delimiter to a space.
  - `-f1`: Selects the first field (the username only).
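For comparison, the same usernames can be pulled out with a single `awk` call instead of `grep` plus two `cut`s; a minimal equivalent sketch using the separator trick from Step 5:

```bash
# Split on 'user=' or ' src' so the username lands in field 2
awk -F 'user=| src' '/user=/ {print $2}' log1.txt
```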
## Step 7: Remove Duplicates (`sort` + `uniq`)

```bash
sort log1.txt | uniq
```

To count how many times each line appears:

```bash
sort log1.txt | uniq -c
```
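To see the most repeated entries first, feed the counts back into `sort`:

```bash
# uniq -c prefixes each line with its count; sort -rn puts the most frequent entries on top
sort log1.txt | uniq -c | sort -rn
```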
## Step 8: Normalize Case (`tr`)

```bash
awk -F 'status=' '{print $2}' log1.txt | tr '[:upper:]' '[:lower:]'
```

- `tr`: Translates characters.
- Converts uppercase status values to lowercase.
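Combining this with the counting idea from Step 7 gives a quick, case-insensitive tally of login outcomes; a small sketch:

```bash
# Extract the status value, normalize its case, then count successes vs. failures
awk -F 'status=' '{print $2}' log1.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c
```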
## Step 9: Chain Commands (`xargs`)

```bash
find . -name "*.txt" | xargs grep "status=SUCCESS" | awk -F 'user=| src' '{print $2}' | sort | uniq
```

- `xargs`: Converts `find` output into arguments for `grep`.
- Result: A clean list of users with successful logins.
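If your log paths might contain spaces, a common hardening (supported by GNU `find` and `xargs`) is to pass NUL-separated file names instead of newline-separated ones:

```bash
# -print0 and -0 keep filenames with spaces or newlines intact when handed to grep
find . -name "*.txt" -print0 | xargs -0 grep "status=SUCCESS"
```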
## Full Workflow Example: Cleaned Login Report

```bash
find . -name "*.txt" | xargs sed '/^#/d' | grep "status=SUCCESS" | awk -F 'user=| src' '{print $2}' | sort | uniq > clean_logins.txt
```
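If you run this audit regularly, the whole pipeline can live in a small wrapper script. This is a minimal sketch; the script name, argument handling, and default paths are illustrative assumptions rather than part of the original workflow:

```bash
#!/usr/bin/env bash
# audit_logins.sh -- example wrapper around the report pipeline above.
# The script name, arguments, and default paths are illustrative assumptions.

LOG_DIR="${1:-$HOME/log_audit/logs}"   # directory tree to search for *.txt logs
REPORT="${2:-clean_logins.txt}"        # output file for the cleaned report

find "$LOG_DIR" -name "*.txt" -print0 \
  | xargs -0 sed '/^#/d' \
  | grep "status=SUCCESS" \
  | awk -F 'user=| src' '{print $2}' \
  | sort | uniq > "$REPORT"

echo "Wrote $(wc -l < "$REPORT") unique users to $REPORT"
```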
## Command Cheat Sheet

| Tool | Purpose | Example |
|---|---|---|
| `find` | Locate files in directories | `find . -name "*.txt"` |
| `grep` | Search for matching lines | `grep "status=FAIL"` |
| `sed` | Edit or delete lines in streams | `sed '/^#/d'` |
| `awk` | Extract and process fields | `awk -F 'src=\| status' '{print $2}'` |
| `cut` | Remove selected portions of text | `cut -d '=' -f 2` |
| `sort` | Sort lines in text files | `sort log.txt` |
| `uniq` | Remove or count duplicate lines | `uniq -c` |
| `tr` | Translate or delete characters | `tr '[:upper:]' '[:lower:]'` |
| `xargs` | Pass input as arguments to commands | `xargs grep "SUCCESS"` |
## Summary

- `awk` and `sed` are essential for processing structured text on Linux systems.
- When combined with tools like `grep`, `cut`, `sort`, `uniq`, `tr`, `xargs`, and `find`, you have a full-featured toolkit for automated log parsing, data cleaning, and report generation.
- The scenario above mirrors a common task in system administration: making sense of raw logs and producing actionable insights.
## Need a Challenge? Try the Following

Now that you've mastered the basics:

- Use `awk` to generate formatted CSV reports (see the sketch below)
- Automate your workflows with `cron`
- Explore real system logs with `journalctl` or `/var/log/secure`
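As a head start on the first challenge, here is one possible sketch of a `key=value` to CSV conversion with `awk`; the column names and field layout assume the sample log format used above:

```bash
# One possible key=value -> CSV conversion (column names are an assumption)
awk -F 'user=| src=| status=' '
BEGIN { print "timestamp,user,src,status" }          # CSV header row
/LOGIN/ {
    ts = $1
    gsub(/^\[|\] LOGIN: *$/, "", ts)                 # strip the brackets and the LOGIN: label
    print ts "," $2 "," $3 "," $4
}' log1.txt log2.txt
```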