
Priyank Malviya

7 CLI Tools Every Data Engineer Needs in 2025

Why CLI Tools Still Matter

In a world of flashy GUIs and cloud consoles, CLI tools remain the secret weapon of productive data engineers.

Why?

  • Speed — No mouse hunting, no loading screens
  • Scriptability — Automate everything with bash, Python, or cron
  • Remote-friendly — SSH into any server and you're productive immediately
  • Composability — Pipe output between tools (tool1 | tool2 | tool3)
  • Lower resource usage — No Electron apps eating 2GB of RAM

The best data engineers I know live in the terminal. Here are the 7 CLI tools that will 10x your productivity in 2025.


1. Frosty — AI Agent for Multi-Database Operations 🥇

GitHub: https://github.com/Gyrus-Dev/frosty

Best for: Natural language database operations across Snowflake, PostgreSQL, SQL Server

Price: Free (open-source)

What It Does

Frosty is an AI agent that translates plain English into database operations. Instead of writing SQL or DDL manually, you type:

"show me my top 10 customers by revenue last quarter"
"set up MFA for all users without it"
"why is my warehouse spend up 40% this month?"
"create a data pipeline for S3 CSV ingestion with daily loads"

Frosty handles schema discovery, query generation, safety validation, and execution.

Why It's #1

Feature | Frosty | Traditional Tools
Natural language input | ✅ |
Multi-database support | ✅ (Snowflake + Postgres + SQL Server) | Varies
Auto-generate DDL | ✅ |
Safety gates (block DROP) | ✅ |
Self-hosted | ✅ | Sometimes
Price | Free | $0–$$$

Real-World Use Cases

Data Engineering:

> "set up a data pipeline for S3 CSV ingestion"
# Generates: external stage, file format, table, COPY INTO, task schedule
# Time: 5 minutes vs. 2-3 hours manually

Security:

> "enforce MFA for all users without it"
# Generates ALTER statements, shows preview, executes with approval

Cost Governance:

> "why is warehouse spend up 40% this month?"
# Queries metering views, returns itemized breakdown with recommendations

Setup

git clone https://github.com/Gyrus-Dev/frosty.git
cd frosty
pip install -r requirements.txt
# Configure .env with credentials + API key
python -m src.frosty_ai.objagents.main

Model Support

  • OpenAI (GPT-4, GPT-4o)
  • Anthropic (Claude 3.5, Claude 3)
  • Google (Gemini 2.0, Gemini 1.5)
  • Ollama (local, free)

Swap models in one .env line — no code changes.

Safety

  • DROP unconditionally blocked in code
  • CREATE OR REPLACE requires explicit approval
  • Read-only modes for analyst roles
  • Audit logging of all executed statements
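
The rules above can be sketched as a small check that runs before a statement is executed. This is an illustration only, not Frosty's actual implementation: the names `Verdict` and `check_statement`, the `read_only` flag, and the regex-based matching are all assumptions made for the example.

```python
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    BLOCK = "block"


# Hypothetical patterns; a real gate would parse the SQL instead of pattern-matching it.
_DROP = re.compile(r"^\s*DROP\b", re.IGNORECASE)
_CREATE_OR_REPLACE = re.compile(r"^\s*CREATE\s+OR\s+REPLACE\b", re.IGNORECASE)
_READ_ONLY_OK = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE)\b", re.IGNORECASE)

# Every statement that passes through the gate is recorded with its verdict.
audit_log: list[tuple[str, str]] = []


def check_statement(sql: str, read_only: bool = False) -> Verdict:
    """Classify one SQL statement before it is executed."""
    if _DROP.match(sql):
        verdict = Verdict.BLOCK            # DROP is never allowed
    elif read_only and not _READ_ONLY_OK.match(sql):
        verdict = Verdict.BLOCK            # analyst roles may only read
    elif _CREATE_OR_REPLACE.match(sql):
        verdict = Verdict.NEEDS_APPROVAL   # destructive replace needs a human
    else:
        verdict = Verdict.ALLOW
    audit_log.append((sql, verdict.value))
    return verdict


print(check_statement("DROP TABLE users"))
print(check_statement("CREATE OR REPLACE VIEW v AS SELECT 1"))
print(check_statement("INSERT INTO t VALUES (1)", read_only=True))
```

Matching on the leading keyword keeps the sketch short, but it would miss a DROP hidden inside a multi-statement string, which is why a production gate would want a proper SQL parser.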

Verdict

Must-have for any data engineer working with Snowflake, PostgreSQL, or SQL Server. The time savings on repetitive tasks alone justify installation. A unified multi-database interface is scheduled to launch in late April 2025.

Rating: ⭐⭐⭐⭐⭐ (5/5)


2. dbt (Data Build Tool) — SQL Transformations

Website: https://getdbt.com

Best for: Analytics engineering, SQL-based transformations in data warehouses

Price: Free (Core), $$$ (Cloud)

What It Does

dbt lets you write SQL transformations that are version-controlled, tested, and documented. You write SELECT statements; dbt generates the DDL, resolves dependencies between models, and runs them in the right order.

Example Workflow

-- models/customers.sql
{{
  config(
    materialized='table',
    tags=['customers', 'core']
  )
}}

select
    customer_id,
    sum(revenue) as total_revenue,
    count(order_id) as order_count
from {{ ref('orders') }}
group by 1

Run with:

dbt run --select customers
dbt test --select customers
dbt docs generate

Key Features

  • Modular SQL — Use ref() to build dependency graphs
  • Testing — Built-in assertions (unique, not_null, accepted_values)
  • Documentation — Auto-generated data dictionary
  • CI/CD — Integrate with GitHub Actions, GitLab CI
  • Environments — Dev, staging, prod with separate schemas
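
To make the ref() dependency graph concrete, here is a minimal Python sketch of the idea: scan each model's SQL for ref() calls, then sort the models so that every dependency runs first. It is not dbt's implementation. The regex, the `build_order` helper, and the `top_customers` model are assumptions for illustration.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names ordered so that each model's dependencies come first."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical project: model name -> SQL text.
models = {
    "orders": "select * from raw.orders",
    "customers": "select customer_id from {{ ref('orders') }} group by 1",
    "top_customers": "select * from {{ ref('customers') }} limit 10",
}
print(build_order(models))  # ['orders', 'customers', 'top_customers']
```

Because the order is derived from the SQL itself, adding a new model never requires editing a separate list of run steps.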

When to Use It

✅ Building analytics models in Snowflake/BigQuery/Redshift

✅ Team collaboration on SQL transformations

✅ Need for testing and documentation

❌ Real-time streaming (dbt runs in batches; use a streaming system such as Kafka for that part of the pipeline)

❌ Non-SQL transformations (use Python + Pandas instead)

Verdict

Industry standard for analytics engineering. If you're doing SQL transformations in a data warehouse, dbt should be in your stack.

Rating: ⭐⭐⭐⭐⭐ (5/5)


3. pgcli / mycli — Enhanced Database Clients

GitHub: https://github.com/dbcli/pgcli (PostgreSQL)

GitHub: https://github.com/dbcli/mycli (MySQL/MariaDB)

Best for: Interactive SQL with auto-completion and syntax highlighting

Price: Free (open-source)

What They Do

These are drop-in replacements for psql and mysql with superpowers:

  • Auto-completion — Table names after FROM, column names after WHERE
  • Syntax highlighting — Colorized SQL in the terminal
  • Smart completion — Context-aware suggestions
  • Multi-line mode — Write complex queries across lines
  • Fuzzy search — Quickly find tables/columns
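
As a rough illustration of what context-aware completion means, the sketch below picks table or column suggestions from the last SQL keyword before the cursor. The real completers in pgcli and mycli are more sophisticated; the `SCHEMA` dictionary, the keyword sets, and the `suggest` function are hypothetical.

```python
import re

# Hypothetical schema, used only for this illustration.
SCHEMA = {
    "customers": ["customer_id", "customer_name", "created_at"],
    "orders": ["order_id", "customer_id", "amount"],
}

TABLE_KEYWORDS = {"FROM", "JOIN", "INTO", "UPDATE"}
COLUMN_KEYWORDS = {"SELECT", "WHERE", "BY", "ON", "AND", "OR"}


def suggest(text_before_cursor: str) -> list[str]:
    """Suggest table or column names based on the most recent SQL keyword."""
    tokens = re.findall(r"[A-Za-z_]+", text_before_cursor)
    # A trailing letter or underscore means the user is midway through a name.
    prefix = tokens.pop().lower() if re.search(r"[A-Za-z_]$", text_before_cursor) else ""
    for token in reversed(tokens):
        keyword = token.upper()
        if keyword in TABLE_KEYWORDS:
            candidates = set(SCHEMA)
        elif keyword in COLUMN_KEYWORDS:
            candidates = {col for cols in SCHEMA.values() for col in cols}
        else:
            continue
        return sorted(name for name in candidates if name.startswith(prefix))
    return []


print(suggest("SELECT * FROM "))
print(suggest("SELECT * FROM customers WHERE cust"))
```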

Example

# Instead of psql
psql -h localhost -U postgres -d mydb

# Use pgcli
pgcli -h localhost -U postgres -d mydb

Now type:

SELECT c.  -- Auto-completes: customer_id, customer_name, created_at, ...
FROM customers c
WHERE c.   -- Auto-completes column names again

Key Features

Feature | pgcli/mycli | psql/mysql
Auto-completion | ✅ | ❌ (basic in psql)
Syntax highlighting | ✅ |
Smart completion | ✅ |
Multi-line mode | ✅ | Manual (\e in psql)
Fuzzy search | ✅ |

When to Use It

✅ Daily SQL work in PostgreSQL or MySQL

✅ Want better UX than default CLI

✅ Working with large schemas (auto-completion saves time)

❌ Need GUI features (ER diagrams, visual query builder)

❌ Working with unsupported databases (only Postgres/MySQL)

Verdict

Instant productivity boost for PostgreSQL and MySQL users. Installation takes 30 seconds; you'll wonder how you lived without it.

Rating: ⭐⭐⭐⭐⭐ (5/5)


4. ingestr — Data Copying Between Databases

GitHub: https://github.com/bruin-data/ingestr

Best for: Copying data between databases without custom code

Price: Free (open-source)

What It Does

ingestr copies data from source to destination databases with support for incremental loads. No custom Python scripts needed.

Example

# Copy an entire table
# (flag names follow the ingestr docs; run `ingestr ingest --help` to confirm for your version)
ingestr ingest \
  --source-uri 'postgresql://user:pass@localhost/mydb' \
  --source-table 'users' \
  --dest-uri 'snowflake://user:pass@account/dbname' \
  --dest-table 'analytics.users'

# Incremental load (append only): copy only rows newer than the last load
ingestr ingest \
  --source-uri 'postgres://...' \
  --source-table 'orders' \
  --dest-uri 'snowflake://...' \
  --dest-table 'analytics.orders' \
  --incremental-strategy append \
  --incremental-key created_at

Supported Sources/Destinations

  • PostgreSQL
  • MySQL
  • SQL Server
  • BigQuery
  • Snowflake
  • Redshift
  • CSV/Parquet files

Key Features

  • Incremental loads: append, delete+insert, create+replace
  • Schema auto-detection — No manual DDL writing
  • Scheduling — Integrate with cron, Airflow, Prefect
  • Logging — Row counts, timing, error handling
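
The append strategy with an incremental key boils down to a high-water mark: find the largest key already in the destination and copy only source rows above it. The sketch below shows that idea with two in-memory SQLite databases. It does not reflect ingestr's internals. The `incremental_append` helper and the sample `orders` rows are made up for the example.

```python
import sqlite3


def incremental_append(src, dst, table: str, key: str) -> int:
    """Copy rows from src to dst whose key exceeds dst's current maximum.

    table and key are interpolated into SQL, so pass trusted identifiers only.
    """
    high_water = dst.execute(f"SELECT MAX({key}) FROM {table}").fetchone()[0]
    if high_water is None:
        # Destination is empty: the first run copies everything.
        rows = src.execute(f"SELECT * FROM {table}").fetchall()
    else:
        rows = src.execute(
            f"SELECT * FROM {table} WHERE {key} > ?", (high_water,)
        ).fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        dst.commit()
    return len(rows)


# Two in-memory databases stand in for the source and the destination.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE orders (order_id INTEGER, created_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2025-01-01"), (2, "2025-01-02")],
)

first = incremental_append(src, dst, "orders", "created_at")   # copies both rows
src.execute("INSERT INTO orders VALUES (3, '2025-01-03')")
second = incremental_append(src, dst, "orders", "created_at")  # copies only the new row
print(first, second)  # 2 1
```

Note the limitation this shares with any append-only approach: rows that are updated in place, or that arrive late with an older key value, are not picked up.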

When to Use It

✅ Copying data between databases regularly

✅ Need incremental loads (CDC-like behavior)

✅ Want to avoid custom ETL scripts

❌ Complex transformations (use dbt or Spark)

❌ Real-time streaming (use Kafka + connectors)

Verdict

Huge time-saver for data replication tasks. Replaces hours of custom script writing with a single command.

Rating: ⭐⭐⭐⭐ (4/5)


5. fzf — Fuzzy Finder

GitHub: https://github.com/junegunn/fzf

Best for: Quickly searching files, commands, history

Price: Free (open-source)

What It Does

fzf is a general-purpose fuzzy finder that lets you search anything in the terminal interactively.

Examples

# Search command history
# Press Ctrl+R, start typing, fzf finds matching commands

# Find and open a file (quoted so paths with spaces survive)
vim "$(fzf)"

# Search SQL files in a project
cat "$(find . -name '*.sql' | fzf)"

# Kill a process interactively
ps aux | fzf | awk '{print $2}' | xargs kill -9

# Preview file contents while searching
fzf --preview 'head -100 {}'

Key Features

  • Fuzzy matching — Type custmr and it finds customer_orders.sql
  • Interactive — Filter results as you type
  • Preview — See file contents before selecting
  • Keybindings — Ctrl+R for history, Ctrl+T for files
  • Integration — Works with vim, emacs, bash, zsh
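
The core of fuzzy matching is a subsequence test: the typed characters must appear in the candidate in the same order, but not necessarily next to each other. Here is a minimal Python sketch of that test. fzf itself also scores and ranks matches, which this example leaves out, and the file list is invented.

```python
def fuzzy_match(pattern: str, candidate: str) -> bool:
    """True if every character of pattern appears in candidate, in order."""
    remaining = iter(candidate.lower())
    # `ch in remaining` consumes the iterator up to the match, which enforces order.
    return all(ch in remaining for ch in pattern.lower())


# Hypothetical file list.
files = ["customer_orders.sql", "products.sql", "README.md"]
matches = [name for name in files if fuzzy_match("custmr", name)]
print(matches)  # ['customer_orders.sql']
```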

When to Use It

✅ Large codebases with many files

✅ Frequent command history searches

✅ Want to navigate faster in the terminal

❌ Prefer GUI file explorers

❌ Working with small projects (less value)

Verdict

Universal productivity booster. Once you get used to fzf, you'll miss it in every other terminal session.

Rating: ⭐⭐⭐⭐⭐ (5/5)


6. jq — JSON Processor

GitHub: https://github.com/jqlang/jq

Best for: Parsing, filtering, transforming JSON in the terminal

Price: Free (open-source)

What It Does

jq is a lightweight command-line JSON processor. It's essential for working with APIs, logs, and JSON-heavy data pipelines.

Examples

# Pretty-print JSON
curl https://api.example.com/data | jq

# Extract specific fields
curl https://api.example.com/users | jq '.[] | {name, email}'

# Filter by condition
curl https://api.example.com/orders | jq '.[] | select(.status == "pending")'

# Aggregate data
curl https://api.example.com/orders | jq '[.[] | .amount] | add'

# Transform structure
curl https://api.example.com/data | jq '{total: .count, items: .results}'

Key Features

  • Expressive filter language — Filter, map, reduce JSON
  • Chaining — Pipe multiple operations
  • Error handling — Graceful failures on malformed JSON
  • Colorized output — Readable by default
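
If jq filters feel unfamiliar, it can help to see them next to the equivalent Python. The sketch below mirrors three of the filters from the examples on a small, made-up orders payload. The field names and values are assumptions, not a real API response.

```python
import json

# Hypothetical payload, shaped like the orders response assumed in the jq examples.
raw = """
[
  {"id": 1, "status": "pending", "amount": 40},
  {"id": 2, "status": "shipped", "amount": 60}
]
"""
orders = json.loads(raw)

# jq '.[] | select(.status == "pending")'
pending = [o for o in orders if o["status"] == "pending"]

# jq '[.[] | .amount] | add'
total = sum(o["amount"] for o in orders)

# jq '.[] | {id, status}'
projected = [{"id": o["id"], "status": o["status"]} for o in orders]

print(pending)
print(total)
print(projected)
```

The jq versions are shorter and need no script file, which is the main reason to reach for jq in a pipeline.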

When to Use It

✅ Working with REST APIs

✅ Parsing JSON logs

✅ Transforming JSON for downstream systems

❌ XML data (use xmllint instead)

❌ Complex transformations (use Python + Pandas)

Verdict

Essential for any data engineer working with APIs or JSON data. The learning curve is steep but worth it.

Rating: ⭐⭐⭐⭐⭐ (5/5)


7. httpie — HTTP Client

Website: https://httpie.io

Best for: API testing and debugging

Price: Free (CLI), $$$ (Desktop app)

What It Does

httpie is a user-friendly HTTP client for the terminal. It's like curl but with readable syntax and colorized output.

Examples

# GET request
http GET https://api.example.com/users

# POST with JSON body
http POST https://api.example.com/users name="John" email="john@example.com"

# With authentication
http --auth user:pass GET https://api.example.com/protected

# Download a file
http --download GET https://example.com/file.zip

# Set custom headers
http GET https://api.example.com/data X-API-Key:abc123

Key Features

Feature | httpie | curl
Readable syntax | ✅ | ❌ (flags everywhere)
Colorized output | ✅ | ❌ (needs flags)
JSON auto-formatting | ✅ | ❌ (pipe to jq)
Interactive mode | |
Download files | ✅ |
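
Much of httpie's readable syntax comes from its request items: `Header:value` sets a header, `key=value` adds a string field to the JSON body, and `key:=value` adds a raw JSON value. The Python sketch below illustrates how such items could be split. It is not httpie's own parser. The `parse_items` helper and the `age:=30` item are added for illustration, and query parameters (`key==value`) are not handled.

```python
import json
import re

# Key, then separator (":=" must be tried before "=" and ":"), then value.
ITEM = re.compile(r"^([^:=]+)(:=|=|:)(.*)$")


def parse_items(items: list[str]) -> tuple[dict, dict]:
    """Split httpie-style request items into (headers, json_body)."""
    headers, body = {}, {}
    for item in items:
        match = ITEM.match(item)
        if match is None:
            raise ValueError(f"not a request item: {item!r}")
        key, sep, value = match.groups()
        if sep == ":":
            headers[key] = value            # Header:value
        elif sep == "=":
            body[key] = value               # string field
        else:
            body[key] = json.loads(value)   # raw JSON field (number, bool, list, ...)
    return headers, body


headers, body = parse_items(
    ["name=John", "email=john@example.com", "age:=30", "X-API-Key:abc123"]
)
print(json.dumps(body))
print(headers)
```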

When to Use It

✅ Testing REST APIs

✅ Debugging webhook payloads

✅ Quick API experiments

❌ Production scripts (curl is more universal)

❌ Complex authentication flows (use Postman/Insomnia)

Verdict

Better than curl for interactive API work. The syntax is intuitive, and the output is immediately readable.

Rating: ⭐⭐⭐⭐ (4/5)


Honorable Mentions

These didn't make the top 7 but are still worth knowing:

Tool | Purpose
bat | cat with syntax highlighting and Git integration
ripgrep (rg) | Faster grep for code search
tmux | Terminal multiplexer for session management
peepDB | Quick database table inspection without SQL
Sling | Another data copying tool (alternative to ingestr)
Meltano | ELT orchestration with CLI interface

How to Get Started

This Weekend

Pick 2-3 tools and install them:

# Frosty (AI database agent)
git clone https://github.com/Gyrus-Dev/frosty.git
cd frosty && pip install -r requirements.txt

# pgcli (if you use PostgreSQL)
pip install pgcli

# fzf (fuzzy finder)
brew install fzf  # macOS
sudo apt install fzf  # Ubuntu

# jq (JSON processor)
brew install jq  # macOS
sudo apt install jq  # Ubuntu

# httpie (HTTP client)
brew install httpie  # macOS
sudo apt install httpie  # Ubuntu

Next Week

  • Try each tool on a real task (not just installation)
  • Replace one existing workflow with a CLI alternative
  • Share what you learned with your team

Long-Term

  • Build a personal toolkit of 10-15 go-to CLI tools
  • Script repetitive tasks using these tools
  • Contribute to open-source CLI projects you use

Final Thoughts

CLI tools aren't about rejecting modern GUIs — they're about having the right tool for the job.

Some tasks are better in a GUI (ER diagrams, visual query building). But for:

  • Quick data exploration
  • Repetitive admin tasks
  • Automation and scripting
  • Remote server work

CLI tools are unmatched.

Start with Frosty for database operations, add fzf for navigation, and jq for JSON work. You'll be shocked at how much faster you move.


Want to try Frosty?

  • 🐙 GitHub — Star to get notified about multi-DB launch
  • 💬 Discord — Join the community
  • 📧 Contact — Questions or demo requests

About the author: This list was compiled by the team behind Frosty, an AI agent for multi-database operations. We live in the terminal and love sharing tools that make data engineering faster.
