TL;DR
As an AWS infrastructure engineer, I got tired of scrolling through hundreds of lines of JSON just to find simple information. So I built hawk - a CLI tool that brings pandas-like data analysis to JSON, YAML, and CSV files with a unified query language.
🔗 GitHub Repository
📦 Try it now
The Problem: JSON Hell in Daily DevOps
Picture this: It's 2 AM, you're troubleshooting a production issue, and you need to quickly check the status of an EC2 instance:
aws ec2 describe-instances --instance-ids i-1234567890abcdef0
What you get back is a 100+ line JSON blob. You need one piece of information: is the instance running?
What you have to do:
- Scroll through endless nested objects
- Mentally parse the structure
- Find the needle in the haystack
- Repeat this process dozens of times per day
This workflow is broken. There has to be a better way.
Existing Tools Fall Short
awk: Powerful but Limited
# Works for CSV, but JSON? Good luck.
awk -F',' '{print $2}' data.csv
- ❌ No structured data support
- ❌ Complex syntax that's hard to remember
- ❌ Line-based processing doesn't fit JSON/YAML
jq: JSON-only Powerhouse
# Great for JSON, but what about YAML?
jq '.Reservations[0].Instances[0].State.Name' instance.json
- ❌ JSON-only (no YAML/CSV support)
- ❌ Complex syntax for simple operations
- ❌ Need format conversion for mixed workflows
pandas: Overkill for CLI Tasks
# Just to check instance status? Really?
import pandas as pd
import json

with open('data.json') as f:
    data = json.load(f)
# ... 10 more lines of setup
- ❌ Heavy Python environment
- ❌ Script files for simple queries
- ❌ Not designed for CLI workflows
Enter hawk: Unified Data Analysis for CLI
hawk combines the best aspects of existing tools while solving their limitations:
✅ Unified syntax across JSON, YAML, and CSV
✅ pandas-like operations (select, group_by, aggregations)
✅ Instant structure overview with the info command
✅ Lightning fast (built in Rust)
✅ Single binary distribution
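If the pandas vocabulary is new to you, the select / group_by / count pipeline is easy to picture. Here is a minimal plain-Python emulation of the semantics (the sample users and field names are invented for illustration; hawk itself is written in Rust and does not work this way internally):

```python
import json
from collections import Counter

# Sample records, analogous to what hawk parses out of JSON/YAML/CSV
users = json.loads("""[
    {"name": "alice", "age": 34, "department": "infra"},
    {"name": "bob",   "age": 28, "department": "infra"},
    {"name": "carol", "age": 41, "department": "dev"}
]""")

# select(.age > 30): keep only the matching records
selected = [u for u in users if u["age"] > 30]

# group_by(.department) | count: tally records per group key
counts = Counter(u["department"] for u in selected)

print(dict(counts))  # {'infra': 1, 'dev': 1}
```

Each stage takes a list of records and returns a list of records (or a tally), which is why the stages compose freely with `|`.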
hawk in Action
1. Instant Structure Understanding
Before (traditional approach):
aws ec2 describe-instances > instances.json
cat instances.json | less # Scroll through 100+ lines...
After (with hawk):
aws ec2 describe-instances | hawk '. | info'
# === Data Information ===
# Total records: 1
# Type: Object Array
# Fields: 1
#
# Field Details:
# Reservations Array (e.g., [1 items])
#
# Array Fields:
# Reservations [1 items]
#   └─ Groups, Instances, OwnerId, ReservationId
Game changer. You instantly understand the data structure without scrolling or guessing.
2. Simple Data Access
# Check instance state (clean table output)
hawk '.Reservations[0].Instances[0].State' instances.json
# Code Name
# 16 running
# Extract multiple fields
hawk '.Reservations[0].Instances[0] | select_fields(InstanceId, InstanceType, State)' instances.json
# InstanceId InstanceType State
# i-1234567890abcdef0 t3.micro {Name: "running", Code: 16}
3. Real-World DevOps Scenarios
Find stopped instances:
aws ec2 describe-instances | hawk '.Reservations[0].Instances[].State | select(.Name == "stopped")'
Count instances by type:
aws ec2 describe-instances | hawk '.Reservations[0].Instances[] | group_by(.InstanceType) | count'
# InstanceType Count
# t3.micro 5
# t3.small 3
# t3.medium 2
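Conceptually, this query is just a tally over one field. A rough Python equivalent of the `group_by(.InstanceType) | count` step, over hypothetical instance records shaped like `.Reservations[].Instances[]`, would be:

```python
from collections import Counter

# Hypothetical records, shaped like the AWS describe-instances output
instances = [
    {"InstanceId": "i-01", "InstanceType": "t3.micro"},
    {"InstanceId": "i-02", "InstanceType": "t3.micro"},
    {"InstanceId": "i-03", "InstanceType": "t3.small"},
]

# group_by(.InstanceType) | count
counts = Counter(i["InstanceType"] for i in instances)
for itype, n in counts.items():
    print(itype, n)
```

hawk collapses this boilerplate into a single pipeline expression.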
Security audit - check security groups:
aws ec2 describe-instances | hawk '.Reservations[0].Instances[].SecurityGroups[0]'
# GroupName GroupId
# launch-wizard-1 sg-1234567890abcdefg
# web-server-sg sg-0987654321fedcba0
Format Flexibility: Same Syntax, Every Format
The magic of hawk is format-agnostic operations:
# JSON
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.json
# YAML (same syntax!)
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.yaml
# CSV (same syntax!)
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.csv
No more tool switching. No more syntax learning. One language, all formats.
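The trick behind format-agnosticism is simple: once any input format is parsed into the same list-of-records shape, one query serves them all. A sketch of that idea using Python's stdlib parsers for JSON and CSV (sample data invented for illustration):

```python
import csv
import io
import json

def query(records):
    """The same select | group_by | count, regardless of input format."""
    over_30 = [r for r in records if int(r["age"]) > 30]
    counts = {}
    for r in over_30:
        counts[r["department"]] = counts.get(r["department"], 0) + 1
    return counts

json_src = '[{"name": "alice", "age": 34, "department": "infra"}]'
csv_src = "name,age,department\nalice,34,infra\n"

# Both formats normalize to a list of dicts, so the query is identical
print(query(json.loads(json_src)))
print(query(list(csv.DictReader(io.StringIO(csv_src)))))
```

Note the `int(r["age"])`: CSV values arrive as strings, so a real implementation has to normalize types too, which hawk handles for you.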
Beyond AWS: Universal Data Analysis
hawk isn't just for AWS CLI. It works with any structured data:
Kubernetes Management
kubectl get pods -o json | hawk '.items[] | select(.status.phase == "Running") | count'
API Development
curl -s "https://api.github.com/users/kyotalab/repos" | hawk '.[] | select(.language == "Rust") | count'
Log Analysis
hawk '.logs[] | select(.level == "ERROR") | group_by(.source) | count' app-logs.json
Configuration Auditing
hawk '.services[] | select(.ports) | select_fields(name, ports)' docker-compose.yaml
Design Philosophy: Constraints Create Simplicity
One key decision shaped hawk's usability: intentional limitations.
Instead of supporting unlimited nested field access like .field.subfield.deep.access, hawk encourages step-by-step exploration:
# Encouraged approach
hawk '.Reservations[].Instances[]' data.json # Step 1: Get instances
hawk '.Reservations[].Instances[].Placement' data.json # Step 2: Get placement info
hawk '.Reservations[].Instances[].Placement | group_by(.AvailabilityZone) | count' data.json
Why this constraint helps:
- ✅ Debuggable: See intermediate results
- ✅ Learnable: Build understanding step-by-step
- ✅ Reliable: Fewer complex edge cases
- ✅ Readable: Clear intent in each command
Real Impact: Measurable Time Savings
Since implementing hawk in my daily workflow:
- Structure understanding: 5 minutes → 10 seconds
- Data extraction: 2-3 minutes → 30 seconds
- Quick analysis: 10 minutes → 2 minutes
- Learning curve: days (jq) → hours (hawk)
Daily savings: ~30 minutes for an infrastructure engineer handling 20+ data analysis tasks.
Technical Implementation: Why Rust?
Performance & Distribution
- Single binary: No dependency hell
- Fast startup: Critical for CLI tools
- Memory efficient: Handle large JSON files smoothly
Developer Experience
- Type safety: Catch errors at compile time
- Pattern matching: Perfect for JSON parsing
- Ecosystem: Excellent crates for CLI development (clap, serde)
Learning Opportunity
Building hawk was also a personal Rust learning project, preparing for backend development roles beyond infrastructure.
Community Response
The reception has been encouraging:
- 80+ likes on Zenn (Japanese tech platform)
- 24+ GitHub stars and growing
- Active discussions about use cases and features
What's Next?
Short-term Roadmap
- [ ] More aggregation functions (median, percentiles)
- [ ] Output format options (CSV, TSV, custom)
- [ ] Performance optimizations for large files
- [ ] Windows binary releases
Long-term Vision
- [ ] Plugin system for custom operations
- [ ] Integration with popular CLI tools
- [ ] Language bindings for other ecosystems
Try hawk Today
Installation
# macOS/Linux (Homebrew)
brew install kyotalab/tools/hawk
# Or download binary from releases
# https://github.com/kyotalab/hawk/releases
Quick Start
# Explore any JSON structure
curl -s "https://api.github.com/users/kyotalab" | hawk '. | info'
# Try with your AWS CLI output
aws ec2 describe-instances | hawk '. | info'
# Analyze CSV data
hawk '.[] | group_by(.category) | count' data.csv
Closing Thoughts
Building hawk taught me that sometimes the best solution isn't adding more features; it's removing friction from existing workflows.
Every day, developers spend countless hours fighting with data formats, scrolling through endless output, and context-switching between tools. hawk represents a small step toward developer happiness: making common tasks feel effortless.
If you're tired of JSON/YAML/CSV wrangling in your daily workflow, give hawk a try. I'd love to hear about your use cases and feedback!
Links
- GitHub Repository
- Documentation
- Issues & Feature Requests
Connect
- Comments below for discussion
What's your biggest pain point with structured data analysis in the CLI? Let me know in the comments!