Tiger Smith

Linux Text Processing: Master grep, awk, sed & jq for Developers

Learn how to use grep, awk, sed, and jq for efficient Linux text processing. This practical guide covers syntax, real-world examples, and best practices for sysadmins, developers, and data engineers.

Source of the article: Dev Resource Hub


If you’ve spent any time working in Linux, you know text processing is non-negotiable. Whether you’re parsing gigabytes of server logs, extracting insights from CSV files, automating config edits, or wrangling JSON from APIs—you need tools that work fast and flexibly.

The good news? Linux puts four powerhouses at your fingertips: grep, awk, sed, and jq. Each has a unique superpower, but together they handle 90% of text-related tasks. In this guide, we’re skipping the dry theory and focusing on what you can actually use today. Let’s dive in.

Introduction to Linux Text Processing Tools

Text processing in Linux boils down to four core tasks: searching, extracting, editing, and parsing structured data. These tools are lightweight and designed for command-line efficiency; grep, awk, and sed come pre-installed on virtually every distribution, while jq usually needs a quick package install (e.g., apt install jq). Here’s a quick breakdown of their roles:

  • grep: The “search master” for finding patterns in text
  • awk: The “data wizard” for extracting and calculating from structured text
  • sed: The “stream editor” for batch text modifications
  • jq: The “JSON hero” for filtering and transforming JSON data
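
To see how the four roles chain together, here’s a quick sketch (events.json and its level/message fields are hypothetical, purely for illustration): jq parses and filters the JSON, sed normalizes a label, and grep counts the result.

jq -r '.[] | select(.level == "error") | .message' events.json | sed 's/timeout/TIMEOUT/g' | grep -c "TIMEOUT"

Don’t worry if this looks dense; every piece is covered below.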

1. grep: Find Text Like a Pro

grep (short for Global Regular Expression Print) is your first stop for locating lines that match a pattern. It’s lightning-fast, even on large files, and supports regular expressions for granular searches.

Key Features

  • Works with basic and extended regular expressions (an -E example appears below)
  • Searches recursively through directories
  • Offers case-insensitive, line numbering, and inverse match options

Practical Examples

Basic Search: Find all “ERROR” entries in a log file:

grep "ERROR" server.log
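
A related option worth knowing (an addition to the original example): -C N prints N lines of context around each match, which helps when you need to see what led up to an error.

grep -C 2 "ERROR" server.log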

Case-Insensitive + Line Numbers: Catch “error” or “Error” with line numbers:

grep -i -n "error" server.log

Recursive Search: Find “TODO” comments in every Python file under the current directory (a plain *.py glob wouldn’t reach into subdirectories, so we scope the recursive search with GNU grep’s --include):

grep -r --include="*.py" "TODO" .

Invert Match: Show lines that don’t contain “DEBUG” (great for filtering noise):

grep -v "DEBUG" server.log
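
Extended Regex: The -E flag enables extended regular expressions, so alternation needs no backslashes. Match “ERROR” or “WARN” in one pass:

grep -E "ERROR|WARN" server.log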

2. awk: Extract & Analyze Structured Data

awk isn’t just a tool—it’s a mini-programming language for text. It excels at processing line-by-line structured data (like CSVs or logs with fixed fields) by splitting lines into columns and applying logic.

Key Features

  • Splits lines into customizable fields (default: whitespace)
  • Supports conditionals, loops, and arithmetic
  • Uses BEGIN/END blocks for setup/teardown tasks (a BEGIN example appears below)

Practical Examples

Extract CSV Fields: Print names and cities from users.csv (columns: name,age,city):

awk -F',' '{print $1", "$3}' users.csv

Output:

Alice, New York
Bob, London
Charlie, Paris

Conditional Filtering: List users older than 30:

awk -F',' '$2 > 30 {print $1}' users.csv

Calculate Totals: Sum all ages in the CSV:

awk -F',' '{sum += $2} END {print sum}' users.csv
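
BEGIN Blocks: As promised in the features list, BEGIN runs once before any input is read, which makes it handy for printing a header. (This extra example is not from the original article, but it uses only standard awk.)

awk -F',' 'BEGIN {print "NAME - CITY"} {print $1" - "$3}' users.csv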

3. sed: Batch Edit Text Streams

sed (Stream Editor) is built for modifying text without opening files. It’s perfect for find-and-replace, deleting lines, or inserting content—especially in scripts.

Key Features

  • Performs in-place edits or outputs to the terminal
  • Uses regex for pattern matching
  • Non-interactive (ideal for automation)

Practical Examples

Find-and-Replace: Replace “ERROR” with “WARNING” in server.log (preview first):

sed 's/ERROR/WARNING/g' server.log

In-Place Edit: Modify the file directly (use -i.bak instead of -i to keep a backup copy with a .bak extension):

sed -i 's/ERROR/WARNING/g' server.log
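
Portability note (not in the original article): on macOS/BSD sed, -i requires an explicit suffix argument, so pass an empty string if you don’t want a backup:

sed -i '' 's/ERROR/WARNING/g' server.log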

Delete Lines: Remove all lines containing “DEBUG”:

sed '/DEBUG/d' server.log
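
Insert Content: sed can also add lines, not just change or delete them. A small sketch using GNU sed’s one-line i syntax (POSIX sed needs i\ followed by a newline) to put a note above every “ERROR” line:

sed '/ERROR/i # review this entry' server.log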

4. jq: Tame JSON Data

With APIs and JSON configs everywhere, jq is a must-have for parsing JSON on the command line. It turns messy JSON into readable output and lets you filter and transform data with simple syntax.

Key Features

  • Queries nested JSON objects/arrays
  • Supports filtering, mapping, and aggregation (an aggregation example appears below)
  • Formats output for readability

Practical Examples

Given data.json:

[
  {"name": "Alice", "age": 25, "city": "New York"},
  {"name": "Bob", "age": 30, "city": "London"},
  {"name": "Charlie", "age": 35, "city": "Paris"}
]

Extract Names: Get all names from the JSON array:

jq '.[].name' data.json
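
By default, jq prints JSON strings with their quotes ("Alice"). Add the -r flag for raw output, which is handy when piping names into other tools:

jq -r '.[].name' data.json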

Filter by Age: Find users older than 30:

jq '.[] | select(.age > 30) | .name' data.json
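
Aggregate Values: As the features list promised, jq can aggregate too. Sum every age with the built-in map and add filters:

jq 'map(.age) | add' data.json

Output:

90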

Combining Tools: Real-World Pipelines

The real magic happens when you chain these tools with Linux pipes (|). Here are two common workflows:

Example 1: Analyze Web Server Logs

Extract client IPs and requested URLs from 404 responses in access.log (this assumes the common/combined log format, where $1 is the client IP and $7 is the request path; the spaces around 404 keep the pattern from matching stray numbers elsewhere in the line):

grep ' 404 ' access.log | awk '{print $1, $7}'
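
A natural follow-up (an extension of the example above): rank the IPs that trigger the most 404s.

grep ' 404 ' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -5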

Example 2: Transform JSON Logs

Filter /api endpoints and rewrite status 200 as “OK” in api.log (this assumes api.log contains a JSON array of objects with endpoint and status fields):

jq '.[] | select(.endpoint | startswith("/api"))' api.log | sed 's/"status": 200/"status": "OK"/g'
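
The sed step works because jq pretty-prints its output predictably, but string-matching serialized JSON is brittle. A more robust variant keeps the transformation inside jq with its update-assignment operator:

jq '.[] | select(.endpoint | startswith("/api")) | .status = "OK"' api.log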

Pro Tips for Mastery

  • Test Regex Incrementally: Complex patterns break easily—test parts first with grep -E (extended regex); see the sketch after this list.
  • Backup Before Editing: Always use sed -i.bak to create backups, or test commands without -i first.
  • Learn Common Flags:
      • grep: -i (case-insensitive), -r (recursive)
      • awk: -F (field separator), END (final action)
      • sed: s/pattern/replace/g (global replace)
      • jq: .[] (iterate arrays), select() (filter)
  • Use man Pages: man grep or man jq have deep docs for edge cases.
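
For example, build a timestamp-plus-level pattern one piece at a time (the log format here is hypothetical):

grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' server.log
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} (ERROR|WARN)' server.log

If the first command matches and the second doesn’t, you know exactly which part of the pattern broke.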

Final Thoughts

grep, awk, sed, and jq aren’t just “tools”—they’re time-savers that turn tedious text tasks into one-liners. The more you experiment with them (start small: parse a log, edit a CSV), the more they’ll become second nature.

What’s your go-to text processing workflow? Drop a comment below—we’d love to hear how you use these tools in your projects!
