Learn how to use grep, awk, sed, and jq for efficient Linux text processing. This practical guide covers syntax, real-world examples, and best practices for sysadmins, developers, and data engineers.
Source: Dev Resource Hub
If you’ve spent any time working in Linux, you know text processing is non-negotiable. Whether you’re parsing gigabytes of server logs, extracting insights from CSV files, automating config edits, or wrangling JSON from APIs—you need tools that work fast and flexibly.
The good news? Linux comes with four built-in powerhouses: grep, awk, sed, and jq. Each has a unique superpower, but together they handle 90% of text-related tasks. In this guide, we’re skipping the dry theory and focusing on what you can actually use today. Let’s dive in.
Introduction to Linux Text Processing Tools
Text processing in Linux boils down to four core tasks: searching, extracting, editing, and parsing structured data. These tools are lightweight, pre-installed on most distributions, and designed for command-line efficiency. Here’s a quick breakdown of their roles:
- grep: The “search master” for finding patterns in text
- awk: The “data wizard” for extracting and calculating from structured text
- sed: The “stream editor” for batch text modifications
- jq: The “JSON hero” for filtering and transforming JSON data
1. grep: Find Text Like a Pro
grep (short for Global Regular Expression Print) is your first stop for locating lines that match a pattern. It’s lightning-fast, even on large files, and supports regular expressions for granular searches.
Key Features
- Works with basic and extended regular expressions
- Searches recursively through directories
- Offers case-insensitive, line numbering, and inverse match options
Practical Examples
Basic Search: Find all “ERROR” entries in a log file:
grep "ERROR" server.log
Case-Insensitive + Line Numbers: Catch “error” or “Error” with line numbers:
grep -i -n "error" server.log
Recursive Search: Find “TODO” comments in all Python files under the current directory (the --include filter limits the recursive search to .py files):
grep -r --include="*.py" "TODO" .
Invert Match: Show lines that don’t contain “DEBUG” (great for filtering noise):
grep -v "DEBUG" server.log
2. awk: Extract & Analyze Structured Data
awk isn’t just a tool—it’s a mini-programming language for text. It excels at processing line-by-line structured data (like CSVs or logs with fixed fields) by splitting lines into columns and applying logic.
Key Features
- Splits lines into customizable fields (default: whitespace)
- Supports conditionals, loops, and arithmetic
- Uses BEGIN/END blocks for setup/teardown tasks
Practical Examples
Extract CSV Fields: Print names and cities from users.csv (columns: name,age,city):
awk -F',' '{print $1", "$3}' users.csv
Output:
Alice, New York
Bob, London
Charlie, Paris
Conditional Filtering: List users older than 30:
awk -F',' '$2 > 30 {print $1}' users.csv
Calculate Totals: Sum all ages in the CSV:
awk -F',' '{sum += $2} END {print sum}' users.csv
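BEGIN/END in One Pass: The BEGIN/END blocks from the feature list are handy for headers and summaries. A minimal sketch, assuming the same users.csv with no header row, that prints a title line up front and the average age at the end:
awk -F',' 'BEGIN {print "Age report"} {sum += $2; count++} END {print "Average:", sum / count}' users.csv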
3. sed: Batch Edit Text Streams
sed (Stream Editor) is built for modifying text without opening files. It’s perfect for find-and-replace, deleting lines, or inserting content—especially in scripts.
Key Features
- Performs in-place edits or outputs to the terminal
- Uses regex for pattern matching
- Non-interactive (ideal for automation)
Practical Examples
Find-and-Replace: Replace “ERROR” with “WARNING” in server.log (preview first):
sed 's/ERROR/WARNING/g' server.log
In-Place Edit: Modify the file directly (use -i.bak instead of -i to keep a .bak backup):
sed -i 's/ERROR/WARNING/g' server.log
Delete Lines: Remove all lines containing “DEBUG”:
sed '/DEBUG/d' server.log
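Insert Content: sed can also add lines, as noted in the features above. A quick sketch using GNU sed’s i (insert) command to place a marker line before line 1 (the “# reviewed” text is just an example; POSIX sed needs the i\ form with the text on the next line):
sed '1i # reviewed' server.log
The a command works the same way but appends after the matched line instead of inserting before it.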
4. jq: Tame JSON Data
With APIs and JSON configs everywhere, jq is a must-have for parsing JSON in the command line. It turns messy JSON into readable output and lets you filter/transform data with simple syntax.
Key Features
- Queries nested JSON objects/arrays
- Supports filtering, mapping, and aggregation
- Formats output for readability
Practical Examples
Given data.json:
[
{"name": "Alice", "age": 25, "city": "New York"},
{"name": "Bob", "age": 30, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Paris"}
]
Extract Names: Get all names from the JSON array:
jq '.[].name' data.json
Filter by Age: Find users older than 30:
jq '.[] | select(.age > 30) | .name' data.json
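Map and Aggregate: The mapping and aggregation features mentioned above take only a small step further. A sketch against the same data.json that collects all names into a single array, then computes the average age:
jq 'map(.name)' data.json
jq 'map(.age) | add / length' data.json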
Combining Tools: Real-World Pipelines
The real magic happens when you chain these tools with Linux pipes (|). Here are two common workflows:
Example 1: Analyze Web Server Logs
Extract IPs and URLs from 404 errors in access.log:
grep "404" access.log | awk '{print $1, $7}'
Example 2: Transform JSON Logs
Filter /api endpoints and replace status “200” with “OK” in api.log:
jq '.[] | select(.endpoint | startswith("/api"))' api.log | sed 's/"status": 200/"status":"OK"/g'
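Because sed is editing serialized JSON here, a jq-only version is less fragile. A sketch under the same assumptions about api.log’s structure (a JSON array with endpoint and status fields):
jq '.[] | select(.endpoint | startswith("/api")) | .status |= (if . == 200 then "OK" else . end)' api.log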
Pro Tips for Mastery
- Test Regex Incrementally: Complex patterns break easily; test parts first with grep -E (extended regex).
- Backup Before Editing: Always use sed -i.bak to create backups, or test commands without -i first.
- Learn Common Flags:
  - grep: -i (case-insensitive), -r (recursive)
  - awk: -F (field separator), END (final action)
  - sed: s/pattern/replace/g (global replace)
  - jq: .[] (iterate arrays), select() (filter)
- Use man Pages: man grep or man jq has deep docs for edge cases.
Final Thoughts
grep, awk, sed, and jq aren’t just “tools”—they’re time-savers that turn tedious text tasks into one-liners. The more you experiment with them (start small: parse a log, edit a CSV), the more they’ll become second nature.
What’s your go-to text processing workflow? Drop a comment below—we’d love to hear how you use these tools in your projects!