Kyota Nakada

Why I Built hawk - A CLI Tool That Brings pandas-Like Operations to JSON/YAML/CSV

TL;DR

As an AWS infrastructure engineer, I got tired of scrolling through hundreds of lines of JSON just to find simple information. So I built hawk - a CLI tool that brings pandas-like data analysis to JSON, YAML, and CSV files with a unified query language.

🔗 GitHub Repository

🦅 Try it now

The Problem: JSON Hell in Daily DevOps

Picture this: It's 2 AM, you're troubleshooting a production issue, and you need to quickly check the status of an EC2 instance:

aws ec2 describe-instances --instance-ids i-1234567890abcdef0

What you get back is a 100+ line JSON blob. You need one piece of information: is the instance running?

What you have to do:

  1. Scroll through endless nested objects
  2. Mentally parse the structure
  3. Find the needle in the haystack
  4. Repeat this process dozens of times per day

This workflow is broken. There has to be a better way.

Existing Tools Fall Short

awk: Powerful but Limited

# Works for CSV, but JSON? Good luck.
awk -F',' '{print $2}' data.csv
  • ❌ No structured data support
  • ❌ Complex syntax that's hard to remember
  • ❌ Line-based processing doesn't fit JSON/YAML

jq: JSON-only Powerhouse

# Great for JSON, but what about YAML?
jq '.Reservations[0].Instances[0].State.Name' instance.json
  • ❌ JSON-only (no YAML/CSV support)
  • ❌ Complex syntax for simple operations
  • ❌ Need format conversion for mixed workflows

pandas: Overkill for CLI Tasks

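# Just to check instance status? Really?
import json
import pandas as pd

with open('data.json') as f:
    data = json.load(f)
# Flatten Reservations -> Instances into one row per instance
instances = pd.json_normalize(data['Reservations'], record_path='Instances')
print(instances[['InstanceId', 'State.Name']])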
  • ❌ Heavy Python environment
  • ❌ Script files for simple queries
  • ❌ Not designed for CLI workflows

Enter hawk: Unified Data Analysis for CLI

hawk combines the best aspects of existing tools while solving their limitations:

✅ Unified syntax across JSON, YAML, and CSV

✅ pandas-like operations (select, group_by, aggregations)

✅ Instant structure overview with info command

✅ Lightning fast (built in Rust)

✅ Single binary distribution

hawk in Action

1. Instant Structure Understanding

Before (traditional approach):

aws ec2 describe-instances > instances.json
cat instances.json | less  # Scroll through 100+ lines...

After (with hawk):

aws ec2 describe-instances | hawk '. | info'

# === Data Information ===
# Total records: 1
# Type: Object Array
# Fields: 1
# 
# Field Details:
#   Reservations    Array      (e.g., [1 items])
#
# Array Fields:
#   Reservations    [1 items]
#     └─ Groups, Instances, OwnerId, ReservationId

Game changer. You instantly understand the data structure without scrolling or guessing.
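
To make the idea concrete, here is a rough Python sketch of what a structure summary like this does conceptually. It is not hawk's implementation (hawk is written in Rust); the `summarize` function and the trimmed `sample` document are hypothetical names and data invented for illustration.

```python
import json

def summarize(data):
    """Return (field, type name, hint) rows for the top-level keys of a JSON object."""
    rows = []
    for key, value in data.items():
        if isinstance(value, list):
            hint = f"[{len(value)} items]"
        elif isinstance(value, dict):
            hint = "{" + ", ".join(sorted(value)) + "}"
        else:
            hint = repr(value)
        rows.append((key, type(value).__name__, hint))
    return rows

# Hypothetical, heavily trimmed stand-in for `aws ec2 describe-instances` output
sample = json.loads('{"Reservations": [{"OwnerId": "123456789012", "Instances": []}]}')
for field, kind, hint in summarize(sample):
    print(f"{field:<16}{kind:<8}{hint}")
```

The point is the same as with info: you look at field names, types, and sizes first, and only then decide which path to query.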

2. Simple Data Access

# Check instance state (clean table output)
hawk '.Reservations[0].Instances[0].State' instances.json
# Code  Name
# 16    running

# Extract multiple fields
hawk '.Reservations[0].Instances[0] | select_fields(InstanceId, InstanceType, State)' instances.json
# InstanceId           InstanceType  State
# i-1234567890abcdef0  t3.micro      {Name: "running", Code: 16}

3. Real-World DevOps Scenarios

Find stopped instances:

aws ec2 describe-instances | hawk '.Reservations[0].Instances[].State | select(.Name == "stopped")'
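
One thing to keep in mind: the query above indexes `.Reservations[0]`, so it only looks at the first reservation, while describe-instances can return several. As a plain-Python sketch of the same filter applied across every reservation (the `instances_in_state` helper and the sample data are hypothetical, written only for this illustration):

```python
def instances_in_state(response, state):
    """Yield every instance whose State.Name equals `state`, across all reservations."""
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == state:
                yield instance

# Hypothetical sample shaped like describe-instances output
response = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-aaa", "State": {"Code": 16, "Name": "running"}}]},
        {"Instances": [{"InstanceId": "i-bbb", "State": {"Code": 80, "Name": "stopped"}}]},
    ]
}
stopped = [i["InstanceId"] for i in instances_in_state(response, "stopped")]
print(stopped)
```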

Count instances by type:

aws ec2 describe-instances | hawk '.Reservations[0].Instances[] | group_by(.InstanceType) | count'
# InstanceType  Count
# t3.micro      5
# t3.small      3
# t3.medium     2
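
For readers who think in Python, group_by followed by count is roughly a `Counter` over one field. This is a minimal sketch of the concept, assuming a flat list of instance records; `count_by` and the sample records are hypothetical and not part of hawk.

```python
from collections import Counter

def count_by(records, field):
    """Group records by one field and count the members of each group."""
    return Counter(record.get(field) for record in records)

# Hypothetical records standing in for the instances selected by the query
instances = [
    {"InstanceId": "i-aaa", "InstanceType": "t3.micro"},
    {"InstanceId": "i-bbb", "InstanceType": "t3.small"},
    {"InstanceId": "i-ccc", "InstanceType": "t3.micro"},
]
counts = count_by(instances, "InstanceType")
for instance_type, n in counts.most_common():
    print(f"{instance_type:<14}{n}")
```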

Security audit - check security groups:

aws ec2 describe-instances | hawk '.Reservations[0].Instances[].SecurityGroups[0]'
# GroupName         GroupId
# launch-wizard-1   sg-1234567890abcdef0
# web-server-sg     sg-0987654321fedcba0

Format Flexibility: Same Syntax, Every Format

The magic of hawk is format-agnostic operations:

# JSON
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.json

# YAML (same syntax!)
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.yaml

# CSV (same syntax!)
hawk '.users[] | select(.age > 30) | group_by(.department) | count' data.csv

No more tool switching. No more syntax learning. One language, all formats.
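
The general idea behind format-agnostic queries is to parse every input into the same in-memory shape before the query runs. Here is a small standard-library Python sketch of that idea; it is not how hawk is implemented, `load_records` and `departments_over` are hypothetical helpers, and YAML is left out only because parsing it in Python needs a third-party package.

```python
import csv
import io
import json
from collections import Counter

def load_records(text, fmt):
    """Parse text into a list of dicts, whatever the source format."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")

def departments_over(records, min_age):
    """Count records per department where age > min_age."""
    # CSV fields arrive as strings, so normalize age to int before comparing
    return Counter(r["department"] for r in records if int(r["age"]) > min_age)

# Hypothetical sample data: the same two users in both formats
JSON_TEXT = '[{"name": "Ann", "age": 41, "department": "ops"}, {"name": "Bo", "age": 25, "department": "ops"}]'
CSV_TEXT = "name,age,department\nAnn,41,ops\nBo,25,ops\n"

from_json = departments_over(load_records(JSON_TEXT, "json"), 30)
from_csv = departments_over(load_records(CSV_TEXT, "csv"), 30)
print(from_json == from_csv)
```

Once both inputs are lists of records, the filter and the grouping no longer care where the data came from, which is what makes a single query syntax possible.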

Beyond AWS: Universal Data Analysis

hawk isn't just for AWS CLI. It works with any structured data:

Kubernetes Management

kubectl get pods -o json | hawk '.items[] | select(.status.phase == "Running") | count'

API Development

curl -s "https://api.github.com/users/kyotalab/repos" | hawk '.[] | select(.language == "Rust") | count'

Log Analysis

hawk '.logs[] | select(.level == "ERROR") | group_by(.source) | count' app-logs.json

Configuration Auditing

hawk '.services[] | select(.ports) | select_fields(name, ports)' docker-compose.yaml

Design Philosophy: Constraints Create Simplicity

One key decision shaped hawk's usability: intentional limitations.

Instead of supporting unlimited nested field access like .field.subfield.deep.access, hawk encourages step-by-step exploration:

# Encouraged approach
hawk '.Reservations[].Instances[]' data.json              # Step 1: Get instances
hawk '.Reservations[].Instances[].Placement' data.json    # Step 2: Get placement info
hawk '.Reservations[].Instances[].Placement | group_by(.AvailabilityZone) | count' data.json

Why this constraint helps:

  • ✅ Debuggable: See intermediate results
  • ✅ Learnable: Build understanding step-by-step
  • ✅ Reliable: Fewer complex edge cases
  • ✅ Readable: Clear intent in each command

Real Impact: Measurable Time Savings

Since implementing hawk in my daily workflow:

  • ⏱️ Structure understanding: 5 minutes β†’ 10 seconds
  • πŸ” Data extraction: 2-3 minutes β†’ 30 seconds
  • πŸ“Š Quick analysis: 10 minutes β†’ 2 minutes
  • πŸ’‘ Learning curve: Days (jq) β†’ Hours (hawk)

Daily savings: roughly 30 minutes for an infrastructure engineer handling 20+ data analysis tasks a day.

Technical Implementation: Why Rust?

Performance & Distribution

  • Single binary: No dependency hell
  • Fast startup: Critical for CLI tools
  • Memory efficient: Handle large JSON files smoothly

Developer Experience

  • Type safety: Catch errors at compile time
  • Pattern matching: Perfect for JSON parsing
  • Ecosystem: Excellent crates for CLI development (clap, serde)

Learning Opportunity

Building hawk was also a personal Rust learning project, preparing for backend development roles beyond infrastructure.

Community Response

The reception has been encouraging:

  • 80+ likes on Zenn (Japanese tech platform)
  • 24+ GitHub stars and growing
  • Active discussions about use cases and features

What's Next?

Short-term Roadmap

  • [ ] More aggregation functions (median, percentiles)
  • [ ] Output format options (CSV, TSV, custom)
  • [ ] Performance optimizations for large files
  • [ ] Windows binary releases

Long-term Vision

  • [ ] Plugin system for custom operations
  • [ ] Integration with popular CLI tools
  • [ ] Language bindings for other ecosystems

Try hawk Today

Installation

# macOS/Linux (Homebrew)
brew install kyotalab/tools/hawk

# Or download binary from releases
# https://github.com/kyotalab/hawk/releases

Quick Start

# Explore any JSON structure
curl -s "https://api.github.com/users/kyotalab" | hawk '. | info'

# Try with your AWS CLI output
aws ec2 describe-instances | hawk '. | info'

# Analyze CSV data
hawk '.[] | group_by(.category) | count' data.csv

Closing Thoughts

Building hawk taught me that sometimes the best solution isn't adding more features; it's removing friction from existing workflows.

Every day, developers spend countless hours fighting with data formats, scrolling through endless output, and context-switching between tools. hawk represents a small step toward developer happiness: making common tasks feel effortless.

If you're tired of JSON/YAML/CSV wrangling in your daily workflow, give hawk a try. I'd love to hear about your use cases and feedback!


Connect

  • 💬 Comments below for discussion

What's your biggest pain point with structured data analysis in CLI? Let me know in the comments! 👇
