Kyota Nakada

Posted on Jul 14

Why I Built hawk - A CLI Tool That Brings pandas-Like Operations to JSON/YAML/CSV

#rust #cli #devops

TL;DR

As an AWS infrastructure engineer, I got tired of scrolling through hundreds of lines of JSON just to find simple information. So I built hawk - a CLI tool that brings pandas-like data analysis to JSON, YAML, and CSV files with a unified query language.

🔗 GitHub Repository: https://github.com/kyotalab/hawk

🦅 Try it now

The Problem: JSON Hell in Daily DevOps

Picture this: It's 2 AM, you're troubleshooting a production issue, and you need to quickly check the status of an EC2 instance:

aws ec2 describe-instances --instance-ids i-1234567890abcdef0

What you get back is a 100+ line JSON blob. You need one piece of information: is the instance running?

What you have to do:

Scroll through endless nested objects
Mentally parse the structure
Find the needle in the haystack
Repeat this process dozens of times per day

This workflow is broken. There has to be a better way.

Existing Tools Fall Short

awk: Powerful but Limited

# Works for CSV, but JSON? Good luck.
awk -F',' '{print $2}' data.csv

❌ No structured data support
❌ Complex syntax that's hard to remember
❌ Line-based processing doesn't fit JSON/YAML

jq: JSON-only Powerhouse

# Great for JSON, but what about YAML?
jq '.Reservations[0].Instances[0].State.Name' instance.json

❌ JSON-only (no YAML/CSV support)
❌ Complex syntax for simple operations
❌ Need format conversion for mixed workflows

pandas: Overkill for CLI Tasks

# Just to check instance status? Really?
import pandas as pd
import json
with open('data.json') as f:
    data = json.load(f)
# ... 10 more lines of setup

❌ Heavy Python environment
❌ Script files for simple queries
❌ Not designed for CLI workflows

Enter hawk: Unified Data Analysis for CLI

hawk combines the best aspects of existing tools while solving their limitations:

✅ Unified syntax across JSON, YAML, and CSV

✅ pandas-like operations (select, group_by, aggregations)

✅ Instant structure overview with info command

✅ Lightning fast (built in Rust)

✅ Single binary distribution

hawk in Action

1. Instant Structure Understanding

Before (traditional approach):

aws ec2 describe-instances > instances.json
cat instances.json | less  # Scroll through 100+ lines...

After (with hawk):

aws ec2 describe-instances | hawk '. | info'

# === Data Information ===
# Total records: 1
# Type: Object Array
# Fields: 1
# 
# Field Details:
#   Reservations    Array      (e.g., [1 items])
#
# Array Fields:
#   Reservations    [1 items]
#     └─ Groups, Instances, OwnerId, ReservationId

Game changer. You instantly understand the data structure without scrolling or guessing.
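To make the idea of a structure overview concrete, here is a minimal Python sketch of that kind of summary. It is not hawk's implementation (hawk is written in Rust); the `summarize` function and the sample payload are made up for illustration, and the output format only loosely resembles what `info` prints.

```python
import json

def summarize(data, indent=0):
    """Return lines describing field names and value types, one level at a time."""
    lines = []
    pad = " " * indent
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, list):
                lines.append(f"{pad}{key}: Array [{len(value)} items]")
                # Peek at the first element to list its fields, if it is an object
                if value and isinstance(value[0], dict):
                    lines.append(f"{pad}  fields: {', '.join(sorted(value[0]))}")
            elif isinstance(value, dict):
                lines.append(f"{pad}{key}: Object")
                lines.extend(summarize(value, indent + 2))
            else:
                lines.append(f"{pad}{key}: {type(value).__name__}")
    return lines

# Hypothetical, heavily trimmed describe-instances payload
sample = json.loads(
    '{"Reservations": [{"Groups": [], "Instances": [], '
    '"OwnerId": "123", "ReservationId": "r-1"}]}'
)
summary = summarize(sample)
print("\n".join(summary))
```

The point is the same as with `info`: one pass over the top level tells you which fields hold arrays and what is inside them, before you write any query.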

2. Simple Data Access

# Check instance state (clean table output)
hawk '.Reservations[0].Instances[0].State' instances.json
# Code  Name
# 16    running

# Extract multiple fields
hawk '.Reservations[0].Instances[0] | select_fields(InstanceId, InstanceType, State)' instances.json
# InstanceId           InstanceType  State
# i-1234567890abcdef0  t3.micro      {Name: "running", Code: 16}
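For readers coming from Python, the field selection above is conceptually a projection of one record onto a few keys. A rough sketch, assuming a hypothetical helper that happens to share the name `select_fields` and a made-up record (this illustrates the operation, not how hawk implements it):

```python
def select_fields(record, *names):
    """Return a copy of `record` restricted to `names`, skipping absent keys."""
    return {name: record[name] for name in names if name in record}

# Hypothetical record shaped like one entry of Instances[]
instance = {
    "InstanceId": "i-1234567890abcdef0",
    "InstanceType": "t3.micro",
    "State": {"Code": 16, "Name": "running"},
    "Placement": {"AvailabilityZone": "us-east-1a"},
}

row = select_fields(instance, "InstanceId", "InstanceType", "State")
print(row)
```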

3. Real-World DevOps Scenarios

Find stopped instances:

aws ec2 describe-instances | hawk '.Reservations[0].Instances[].State | select(.Name == "stopped")'

Count instances by type:

aws ec2 describe-instances | hawk '.Reservations[0].Instances[] | group_by(.InstanceType) | count'
# InstanceType  Count
# t3.micro      5
# t3.small      3
# t3.medium     2
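If `group_by ... | count` is unfamiliar, it corresponds to counting records per distinct value of a field. A small Python sketch of that idea, using made-up records and a hypothetical `group_count` helper (not hawk's code):

```python
from collections import Counter

# Hypothetical, trimmed-down records shaped like entries of Instances[]
instances = [
    {"InstanceId": "i-01", "InstanceType": "t3.micro"},
    {"InstanceId": "i-02", "InstanceType": "t3.small"},
    {"InstanceId": "i-03", "InstanceType": "t3.micro"},
]

def group_count(records, field):
    """Count records per distinct value of `field`, ignoring records without it."""
    return Counter(r[field] for r in records if field in r)

counts = group_count(instances, "InstanceType")
for instance_type, n in counts.most_common():
    print(f"{instance_type}\t{n}")
```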

Security audit - check security groups:

aws ec2 describe-instances | hawk '.Reservations[0].Instances[].SecurityGroups[0]'
# GroupName         GroupId
# launch-wizard-1   sg-1234567890abcdef0
# web-server-sg     sg-0987654321fedcba0

Format Flexibility: Same Syntax, Every Format

The magic of hawk is format-agnostic operations:

# JSON
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.json

# YAML (same syntax!)
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.yaml

# CSV (same syntax!)
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.csv

No more tool switching. No more syntax learning. One language, all formats.
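One way to picture format-agnostic querying is a loader that turns every input into the same in-memory shape (a list of records) so that a single filter works on all of them. The sketch below is an assumption about the general technique, not a description of hawk's internals; `load_records` and `older_than` are hypothetical names, and YAML is left out only because parsing it in Python needs a third-party library.

```python
import csv
import io
import json

def load_records(text, fmt):
    """Parse `text` into a list of dicts regardless of the source format."""
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        # DictReader yields one dict per row, keyed by the header line
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")

def older_than(records, age):
    # CSV values arrive as strings, so coerce before comparing
    return [r for r in records if int(r["age"]) > age]

json_text = '[{"name": "Ann", "age": 41}, {"name": "Bob", "age": 25}]'
csv_text = "name,age\nAnn,41\nBob,25\n"

json_hits = older_than(load_records(json_text, "json"), 30)
csv_hits = older_than(load_records(csv_text, "csv"), 30)
print([r["name"] for r in json_hits], [r["name"] for r in csv_hits])
```

Once the inputs share a shape, the query layer never needs to know which file format it started from.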

Beyond AWS: Universal Data Analysis

hawk isn't just for AWS CLI. It works with any structured data:

Kubernetes Management

kubectl get pods -o json | hawk '.items[] | select(.status.phase == "Running") | count'

API Development

curl -s "https://api.github.com/users/kyotalab/repos" | hawk '.[] | select(.language == "Rust") | count'

Log Analysis

hawk '.logs[] | select(.level == "ERROR") | group_by(.source) | count' app-logs.json

Configuration Auditing

hawk '.services[] | select(.ports) | select_fields(name, ports)' docker-compose.yaml

Design Philosophy: Constraints Create Simplicity

One key decision shaped hawk's usability: intentional limitations.

Instead of supporting unlimited nested field access like .field.subfield.deep.access, hawk encourages step-by-step exploration:

# Encouraged approach
hawk '.Reservations[].Instances[]' data.json              # Step 1: Get instances
hawk '.Reservations[].Instances[].Placement' data.json    # Step 2: Get placement info
hawk '.Reservations[].Instances[].Placement | group_by(.AvailabilityZone) | count' data.json

Why this constraint helps:

✅ Debuggable: See intermediate results
✅ Learnable: Build understanding step-by-step
✅ Reliable: Fewer complex edge cases
✅ Readable: Clear intent in each command
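The same step-by-step style can be sketched in Python, where each stage is a named intermediate you can print and inspect before moving on. The data here is made up and only mimics the shape of a describe-instances response:

```python
from collections import Counter

# Hypothetical data shaped like a describe-instances response
data = {
    "Reservations": [
        {"Instances": [
            {"Placement": {"AvailabilityZone": "us-east-1a"}},
            {"Placement": {"AvailabilityZone": "us-east-1b"}},
        ]},
        {"Instances": [
            {"Placement": {"AvailabilityZone": "us-east-1a"}},
        ]},
    ]
}

# Step 1: flatten Reservations[].Instances[] into one list
instances = [i for r in data["Reservations"] for i in r["Instances"]]

# Step 2: pull out the placement info of each instance
placements = [i["Placement"] for i in instances]

# Step 3: count instances per availability zone
zone_counts = Counter(p["AvailabilityZone"] for p in placements)

print(len(instances), len(placements), dict(zone_counts))
```

If step 3 gives a surprising answer, you can look at `placements` or `instances` directly, which is the debugging benefit the constraint is meant to encourage.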

Real Impact: Measurable Time Savings

Since adopting hawk in my daily workflow, my own rough estimates are:

⏱️ Structure understanding: 5 minutes → 10 seconds
🔍 Data extraction: 2-3 minutes → 30 seconds
📊 Quick analysis: 10 minutes → 2 minutes
💡 Learning curve: Days (jq) → Hours (hawk)

Daily savings: ~30 minutes for an infrastructure engineer handling 20+ data analysis tasks per day.

Technical Implementation: Why Rust?

Performance & Distribution

Single binary: No dependency hell
Fast startup: Critical for CLI tools
Memory efficient: Handle large JSON files smoothly

Developer Experience

Type safety: Catch errors at compile time
Pattern matching: A natural fit for handling the different kinds of JSON values (objects, arrays, strings, numbers)
Ecosystem: Excellent crates for CLI development (clap, serde)

Learning Opportunity

Building hawk was also a personal Rust learning project, one that helps me prepare for backend development roles beyond infrastructure.

Community Response

The reception has been encouraging:

80+ likes on Zenn (Japanese tech platform)
24+ GitHub stars and growing
Active discussions about use cases and features

What's Next?

Short-term Roadmap

[ ] More aggregation functions (median, percentiles)
[ ] Output format options (CSV, TSV, custom)
[ ] Performance optimizations for large files
[ ] Windows binary releases

Long-term Vision

[ ] Plugin system for custom operations
[ ] Integration with popular CLI tools
[ ] Language bindings for other ecosystems

Try hawk Today

Installation

# macOS/Linux (Homebrew)
brew install kyotalab/tools/hawk

# Or download binary from releases
# https://github.com/kyotalab/hawk/releases

Quick Start

# Explore any JSON structure
curl -s "https://api.github.com/users/kyotalab" | hawk '. | info'

# Try with your AWS CLI output
aws ec2 describe-instances | hawk '. | info'

# Analyze CSV data
hawk '.[] | group_by(.category) | count' data.csv

Closing Thoughts

Building hawk taught me that sometimes the best solution isn't adding more features—it's removing friction from existing workflows.

Every day, developers spend countless hours fighting with data formats, scrolling through endless output, and context-switching between tools. hawk represents a small step toward developer happiness: making common tasks feel effortless.

If you're tired of JSON/YAML/CSV wrangling in your daily workflow, give hawk a try. I'd love to hear about your use cases and feedback!

Connect

💬 Comments below for discussion

What's your biggest pain point with structured data analysis in CLI? Let me know in the comments! 👇
