DEV Community: Arya Gorjipour

I wanted a task tracker that gets out of my way and remembers what I shipped. So I built shipflow

Arya Gorjipour — Tue, 07 Jul 2026 13:00:00 +0000

It's Friday afternoon. Your async standup update is due, and the question is the same one it's always been: what did you actually get done this week?

Your options:

Open Linear / Notion / Todoist and scroll a board that's 40% stale, 30% aspirational, and 30% things you finished weeks ago but never dragged into the Done column.
Scroll git log --oneline and try to reverse-engineer a week of intent from commit messages you wrote for a completely different audience.
Stare at the ceiling and type "worked on the auth refactor, some bug fixes" for the third week running.

All three are bad. The first makes you maintain a second job to feed a tool. The second has the data but none of the framing — commits are about code, not intentions. The third is a quiet lie you tell because reconstructing the truth costs more than it's worth.

shipflow is what sits in the gap. It's a single Rust binary that lives in your terminal. You add a one-line intention, you mark it done when it ships, and on Friday you run one command that tells you exactly what you shipped — in a format you can paste straight into your standup. No board to maintain. No cloud. No due dates nagging you. Just cargo install shipflow.

$ shipflow add "Ship auth refactor" --tags rust,auth
Added 01JABC12 — Ship auth refactor

$ shipflow done auth
Done 01JABC12 — Ship auth refactor
  linked: a1b2c3d Fix OAuth callback redirect

$ shipflow report week --format md
# What I shipped — this week
...

That's basically the whole product surface.

Why I quit my task manager

I was a Todoist/Notion/Linear user for years, and I kept hitting the same wall: those tools are project management software wearing a personal-productivity costume. They want due dates, priorities, sub-tasks, labels, recurring schedules, automations, and a board you tend like a garden. None of that is free — the cost is that maintaining the tool becomes its own task. And the punchline is that after all that ceremony, none of them could cleanly answer the only question I cared about at the end of the week: what did I ship?

The data was technically in there, buried under filters and views. But "the data is technically in there" is exactly the failure mode. I'd context-switch out of my terminal, find the right project, drag a card, set a status, and lose the thread of what I was actually doing. The tool was pulling me out of flow to feed itself.

The opposite extreme is no better. A TODO.md and a scattering of // TODO comments have zero memory and zero reflection — they tell you what's left, never what's done. They rot. They never produce a Friday summary.

shipflow is the middle I wanted: fast enough to capture an intention without leaving the keyboard, with just enough memory to tell me what I finished when I ask. It tracks two things and refuses to track a third. Intentions (what I mean to ship) and shipped work (what I actually did). It deliberately has no due dates, no priorities, no reminders, no recurring tasks, no sub-tasks. That's not a roadmap gap — it's the entire point.

What I actually get out of it

Two things, every week.

The Friday report is the whole reason this exists. I run one command and get a clean, dated summary of everything I shipped, formatted as Markdown:

$ shipflow report week --format md

# What I shipped — this week

_Generated 2026-06-27 16:41 UTC_

## Summary

- **Shipped:** 5
- **Linked commits:** 4
- **Focus areas:** `rust` (3), `cli` (2), `auth` (1)

## Shipped

### Ship auth refactor (2026-06-27)

**Tags:** `rust` `auth`

**Commit:** `a1b2c3d` — Fix OAuth callback redirect

### Fix parser edge case on empty input (2026-06-25)

**Tags:** `rust` `cli`

**Commit:** `9f0e1d2` — Handle zero-length token stream

That output goes straight into my standup channel. No reconstructing, no embellishing, no staring at the ceiling. The week wrote itself down as it happened, and the report just reads it back. There's a plain-text table version too (report week with no --format) for when I just want to look at it in the terminal.

The second payoff is the one I didn't expect: it kills the "I did nothing this week" feeling. Every developer knows that Friday sensation — the vague conviction that you spun your wheels for five days. It's almost always false, and you can't argue yourself out of it, because the feeling isn't made of evidence. shipflow hands you the evidence. Five shipped tasks, four of them tied to real commits, three of them in the area you said you'd focus on. There's a name for this — the progress principle: the strongest day-to-day motivator at work is the sense of making progress, and we are genuinely bad at noticing our own. A report week is a cheap, honest correction to that blind spot. It changed how the end of my week feels more than any productivity app ever did, and it did it by being less of an app.

Time periods are today, week, month, and all, so the same trick works for a monthly retro or a year-in-review.

How it actually works

The full loop is four verbs.

# Inside a git repo — tasks live in .shipflow/tasks.json
shipflow add "Fix parser edge case" --tags rust,cli
shipflow add "Write weekly reflection" --note "Keep it short"

shipflow list --status open          # what's still on the hook
shipflow done parser                 # mark done; interactive commit linking when TTY
shipflow report week                 # the payoff

shipflow status                      # counts + current git context
shipflow board                       # optional kanban TUI (Open / Done)

Two details worth calling out, because they're the difference between a tool I tolerate and one I actually reach for:

You don't need IDs. shipflow done parser works because shipflow resolves a query against both ULID prefixes and partial titles. I added the task as "Fix parser edge case" and I close it by typing the word I remember. If the query is ambiguous it tells me which tasks matched instead of guessing.

done links the commit that did the work. When you're in a git repo and on a TTY, marking a task done shows you your recent commits and lets you attach one:

Recent commits:
  0) Skip
  1) a1b2c3d Fix OAuth callback redirect
  2) 9f0e1d2 Handle zero-length token stream
  3) 7c4b8a0 Bump clap to 4.5
Link commit [0-5]:

Pick one and the report now ties your stated intention to the actual SHA that shipped it. For scripts and hooks there's --commit <sha> and --no-link, so nothing breaks when there's no terminal attached.

Who this is for (and who it isn't)

Good fit:

Solo developers who live in the terminal and want intention tracking without leaving it.
Anyone who has to produce a weekly status, standup update, or async retro and resents rebuilding it from memory.
People who want per-project history — what got shipped in this repo — without a central account.
Anyone allergic to maintaining a board as a second job.

Bad fit (and I'll be explicit about this):

Teams that need a shared board, assignment, or real-time collaboration. shipflow is single-user and local-first by design. You can commit .shipflow/tasks.json to make shipped work team-visible in a repo, but there's no sync, no assignment, no notifications, no presence.
Anyone who needs deadlines. There are no due dates, no reminders, no recurring tasks. If your work is deadline-driven, this is a real gap, not an oversight — it's the wrong tool and I'd rather tell you now.
People who want priorities, sub-tasks, or dependencies. Not here, not planned. shipflow is intention + reflection, not a planner.
People who want a GUI or a mobile app. It's a CLI with an optional terminal kanban view. That's the entire interface.

The scope is deliberately small. If you've ever closed a task manager thinking "I just wanted a notepad that remembers what I finished," that's the exact size of this thing.

Under the hood

For the readers who like the implementation decisions — these are all documented as ADRs in the repo, but the short version:

Storage is a pretty-printed JSON file, not a database. tasks.json, serialized with serde_json pretty printing. The reasoning: universal tooling (jq, any editor), predictable git diffs, and zero learning curve. You can read your own data without shipflow installed. A schema version field is baked in so future formats can migrate cleanly.
Writes are atomic. Every mutation writes to a temp file and renames over the original (fs::rename), with a Windows-specific path because Windows won't rename over an existing file the way Unix does. A killed process mid-write doesn't corrupt your task list.
Git is shelled out, not linked. It calls git rev-parse, git log, and git cat-file as subprocesses rather than pulling in libgit2. For what shipflow needs — repo root, current branch, recent commits, commit validation — subprocess git is plenty, and it keeps the binary small and the compile fast. If git isn't installed, shipflow degrades gracefully to no-git mode instead of failing.
Task IDs are ULIDs. Sortable, collision-resistant, and merge-friendly. That last property matters: if two branches each add a task to a per-repo tasks.json, ULIDs don't collide the way sequential integers would, so the merge is sane.
Per-repo by default, global as a fallback. Inside a git repo, tasks live in <repo>/.shipflow/tasks.json. Outside one, they fall back to your XDG config dir (via the directories crate). --global forces the global store from anywhere. Want your intentions private? echo ".shipflow/" >> .gitignore. Want shipped work visible to the team? Commit it. Your call, per repo.
The kanban TUI is behind a default-on feature flag. board is built on ratatui, gated behind a tui feature that's on by default. cargo install shipflow --no-default-features gives you a leaner binary with just the CLI if you never want the board.
Boring-on-purpose quality bar. Clippy is configured to deny unwrap_used and expect_used — no panics smuggled in through lazy error handling. Errors are typed with thiserror and printed with owo-colors, which respects NO_COLOR and CLICOLOR and turns itself off when stdout isn't a TTY. Logging is silent unless you set RUST_LOG. Edition 2024, MSRV 1.88.

Tradeoffs worth being honest about

A few decisions have real downsides, and you should know them before you commit:

It's a whole-file JSON rewrite on every mutation. Add a task, the entire tasks.json is re-serialized and atomically rewritten; reads load the whole file into memory. For personal task lists — hundreds, low thousands of entries — this is invisibly fast and the simplicity is worth it. If you somehow accumulate hundreds of thousands of tasks, this is the wrong design and you'll feel it. Personal task lists don't reach that scale, which is exactly why JSON was the right call here and would be the wrong call for, say, a log indexer.

report week and report month are rolling windows, not calendar-aligned. "This week" means the last 7 days from right now, and "this month" means the last 30 — they don't reset on Monday or on the 1st. (today is calendar-aligned to local midnight.) For a standup this is usually what you want, but if you expect a Monday-to-Sunday boundary, it'll surprise you. Honest sharp edge.

There's no edit and no reopen yet. You can add and done, but you can't currently rename a task, retag it, or move a done task back to open. If you fat-finger a title, today the fix is editing tasks.json directly (which, to be fair, is exactly why it's human-readable JSON). These are the two gaps I feel most often myself.

Local-first means no sync. Repo-mode tasks travel with the repo if you commit them; global tasks don't travel at all. If you work across a desktop and a laptop and want one unified list, shipflow won't give you that out of the box.

Install and try it

Linux / macOS:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/Aryagorjipour/shipflow/releases/latest/download/shipflow-installer.sh | sh

Windows (PowerShell):

irm https://github.com/Aryagorjipour/shipflow/releases/latest/download/shipflow-installer.ps1 | iex

From crates.io:

cargo install shipflow
# or, if you have it:
cargo binstall shipflow

Minimal build without the TUI:

cargo install shipflow --no-default-features

Then the thirty-second tour:

shipflow add "Try shipflow" --tags meta
shipflow list
shipflow done try
shipflow report week --format md

Generate shell completions while you're at it:

shipflow completions fish > ~/.config/fish/completions/shipflow.fish
shipflow completions zsh  > ~/.zfunc/_shipflow
shipflow completions bash > ~/.local/share/bash-completion/completions/shipflow

MSRV is Rust 1.88, edition 2024, dual-licensed MIT OR Apache-2.0.

Where it's thin (PRs welcome)

shipflow is young — v0.1.4 as I write this — and I've kept the scope tight on purpose. But there are a handful of genuine gaps where contributions would land cleanly, all of them well-defined:

edit and reopen. Rename/retag a task, and move a done task back to open. The two I miss most. The storage layer already round-trips tasks by ID, so this is mostly a command and a TaskStore method each.
Calendar-aligned report windows. An option to make week mean Monday-to-now instead of a rolling 7 days. The period logic lives in one small enum in report.rs.
A JSON report format. report does text and md today; a --format json would make it pipeable into other tooling the way jq lovers expect.
Linking more than one commit to a task, or auto-suggesting by branch name.

The repo has CI, integration tests against the real binary (assert_cmd), ADRs for the design decisions, and an issue/PR template. If you pick something up, the contribution path is documented.

A note on context

I built shipflow because I quit a tool that wanted to run my life when all I wanted was to remember what I'd done. I dogfood it daily now, and the part that genuinely stuck wasn't the adding of tasks — every tool does that — it was the Friday report. Closing the week with concrete evidence of what shipped, instead of a vague feeling that I'd been busy, turned out to matter more than any feature I'd been paying for.

It's open source because if you've felt that same Friday dread, you might as well have the antidote too.

About

Arya Gorjipour — backend-focused engineer (.NET, Go, Rust), based in Tehran. Maintainer of logdive and shipflow.

GitHub: @Aryagorjipour
Twitter/X: @Arysmart1
LinkedIn: arysmart

Issues, bug reports, and pull requests welcome. If shipflow ends up writing your standup for you, I'd like to hear about it.

logdive v0.3.0 — the one where I finally added parens (and four more things my heart wanted)

Arya Gorjipour — Sun, 07 Jun 2026 14:00:00 +0000

v0.2.0 was a good release. I was happy with it. I used it.

Then I kept trying to write (level=error OR level=warn) AND service=payments and the tool just... didn't know what parens were. Three separate times. Same query. Same sigh. Same manual rewrite to flatten it.

I shipped logdive to scratch my own itch. My itch wasn't done itching.

v0.3.0 is five things my heart kept asking for while using v0.2. Parenthesised queries are the headline — I literally promised that in the v0.2 article. But there's also pagination in both the CLI and the API, case-insensitive level queries (because level=ERROR and level=error should absolutely be the same thing), a distrreleases/tag/v0.3.0oless Docker image that doesn't need curl to healthcheck itself, and a website that now exists.

# The thing I kept trying to write
logdive query '(level=error OR level=warn) AND service=payments'

# Page through results instead of drowning in them
logdive query 'service=payments' --limit 50 --offset 100

# Or through the API — same thing, different surface
curl 'http://localhost:4000/query?q=service%3Dpayments&limit=50&offset=100'

# These are finally identical. They were not.
logdive query 'level=ERROR'
logdive query 'level=error'

# Smaller container. No curl. Healthchecks itself.
docker pull ghcr.io/aryagorjipour/logdive:0.3.0

cargo install logdive logdive-api --force

417 tests passing. Five milestones. Binaries still 3.9 MB and 4.2 MB.

M1 — Parenthesised queries

This was the one. The v0.2 article literally put it at the top of the contributions list and called it "the v0.3 flagship." Accountability shipped.

The v0.2 grammar had two levels: OR > AND. AND binds tighter. Good enough until you need OR of multiple conditions grouped with AND of something else. Then you're doing De Morgan algebra on your query and that's not what you want at midnight.

query     := or_expr [ TIME_RANGE ]
or_expr   := and_expr (OR and_expr)*
and_expr  := clause (AND clause)*
clause    := field OP value
           | field CONTAINS string
           | "(" or_expr ")"    ← new
           | TIME_RANGE

(level=error OR level=warn) AND service=payments generates:

WHERE ((lower(level) = ? OR lower(level) = ?) AND json_extract(fields, '$.service') = ?)

The inner group gets its own SQL sub-expression. You can't construct something that silently breaks precedence — the generator always parenthesises each nesting level.

The new AST variant is Clause::Group(Box<QueryNode>). The Box is there because Rust won't let you have a recursively-sized type without it, which is a very Rust thing to be strict about. An arena would be cleaner if query parsing were a hot path. It isn't, so — Box.

M2 — `--offset` and the rename I had to make

The additive part: --offset is now a real flag.

logdive query 'service=payments' --limit 50
logdive query 'service=payments' --limit 50 --offset 50
logdive query 'service=payments' --limit 50 --offset 100

--offset 0 and no flag at all are the same thing. Default limit is still 1000. --limit 0 still means "all of them, good luck."

Adding offset meant execute(query, conn, Option<usize>) had to become execute(query, conn, QueryOptions). A bare Option<usize> for one parameter is fine. For two it starts getting philosophical. The struct should have been there from v0.1. It's there now.

The breaking part: --format on the query subcommand is now --output.

# v0.2
logdive query 'level=error' --format json

# v0.3
logdive query 'level=error' --output json

--format already existed on ingest to pick the input log format (JSON, logfmt, plain). Two different --format flags doing two different things on two different subcommands is a documentation problem that keeps getting worse. One word to fix it. I did not add a deprecated alias — a deprecated alias that silently works is just confusion that lives for three more versions.

M3 — HTTP pagination

GET /query now takes ?offset=. Mirrors --offset exactly.

curl 'http://localhost:4000/query?q=level%3Derror&limit=50&offset=0'
curl 'http://localhost:4000/query?q=level%3Derror&limit=50&offset=50'

The benchmark number I didn't expect: page 1 at 100k rows costs ~42 ms, deep page at offset 2450 costs ~50 ms. 8 ms overhead to skip 2450 rows. That's because LIMIT x OFFSET y in SQLite counts forward from zero — no scroll cursor, no magic. For building a UI on top of the API it's fine. For "give me rows 500,000 through 500,050" — use a time range query, it'll be faster and make more sense anyway.

M4 — `level=ERROR` and `level=error` are the same query now

This one seems like it should have been there from the start. It wasn't. If your service logs WARN and you searched level=warn, you got nothing and probably thought the tool was broken.

# All three hit the same index, return the same rows
logdive query 'level=ERROR'
logdive query 'level=warn'
logdive query 'level=Warning'

Implementation: a functional expression index.

CREATE INDEX IF NOT EXISTS idx_level_norm ON log_entries(lower(level));

The executor routes every level field lookup through lower(level) = ? with a Rust-lowercased bind value.

The wrong path I went down first: ALTER TABLE ADD COLUMN level_norm TEXT GENERATED ALWAYS AS (lower(level)) STORED. This works in SQLite! But it means existing databases need a migration and new installs use CREATE TABLE. You need a version guard to tell them apart. The functional index approach needs none of that — CREATE INDEX IF NOT EXISTS is idempotent, runs on every Indexer::open(), picks up existing databases automatically.

It's in docs/traps.md now. That file is starting to earn its name.

The benchmark result: lowercase, uppercase, mixed-case level queries on 100k rows — all three at ~51 ms. Identical. The index is doing exactly what it's supposed to do.

M5 — Distroless Docker and `--health-check`

The v0.2 Dockerfile had this healthcheck:

HEALTHCHECK CMD curl -fs http://localhost:4000/version || exit 1

curl is 3.6 MB. It's in the image to make one TCP connection every 30 seconds. I finally stopped accepting this.

Runtime stage is now gcr.io/distroless/cc-debian12:nonroot. No shell, no package manager, no curl, uid 65532. Container drops to ~15 MB.

Since distroless has no shell, CMD curl ... in the Dockerfile gets rejected at build time. Good. That rejection forced me to do the right thing.

logdive-api now takes --health-check:

logdive-api --health-check
# opens TcpStream::connect("127.0.0.1:<port>")
# exits 0 if the server is up, exits 1 if it isn't

HEALTHCHECK CMD ["/usr/local/bin/logdive-api", "--health-check"]

The binary checks itself. No curl. No shell.

One trap: you can't do RUN mkdir -p /data in a distroless runtime stage. No shell to interpret it. You have to create the directory in the builder stage and copy it across:

# Builder
RUN mkdir -p /data

# Runtime
COPY --from=builder /data /data

The error message when you forget this is not particularly helpful. Now it's here instead.

Breaking changes

Three things that will break something:

Scope	Old	New	Fix
CLI	`query --format json`	`query --output json`	one word in your scripts
`logdive-core` lib	`execute(q, conn, Option<usize>)`	`execute(q, conn, QueryOptions)`	see below
Docker	`curl GET /version` healthcheck	`--health-check` TCP flag	update compose / k8s probes

Library migration:

// v0.2
execute(&query, &conn, Some(1000))?;
execute(&query, &conn, None)?;

// v0.3
use logdive_core::executor::QueryOptions;
execute(&query, &conn, QueryOptions { limit: Some(1000), offset: None })?;
execute(&query, &conn, QueryOptions::default())?;  // no limit, no offset

QueryOptions::default() is zero-offset, no-limit. If you were passing None before, it's a drop-in.

Benchmarks

New groups for v0.3 features (100k rows):

What	Scenario	Number
OR queries	2-branch, 50% match	68 ms
OR queries	4-branch, 100% match	99 ms
OR queries	JSON field	2.5 ms
Paren groups	`(A OR B) AND C`, 12.5% match	45 ms
Case-insensitive level	lowercase / UPPERCASE / Mixed	~51 ms, identical
Pagination	page 1	42 ms
Pagination	deep page (offset 2450)	50 ms

Ingest numbers haven't changed (v0.2 ingest paths weren't touched):

What	Number
Batched insert, 10k rows	~189k rows/s
Parse + insert, 10k rows	~150k rows/s

The 4-branch OR at 99 ms looks alarming until you remember it's returning every single row from a 100k-row corpus. The bottleneck is serialisation. The query engine itself is fine.

Tradeoffs, I'll be honest

--format → --output will break your scripts. No alias. One word, then it's clean. If you want to be annoyed at me about it — fair, but this was the right call.

Distroless means no shell in the container. You can't docker exec -it mycontainer bash anymore. Use gcr.io/distroless/cc-debian12:debug if you need to poke around. The tradeoff is worth it, but it's a real operational change worth knowing about before you're in an incident.

Deep offset pagination has a cost. SQLite walks from row zero every time. For page 50 it's fine. For page 50,000, consider a time-range query instead — it'll be faster and is usually what you actually wanted anyway.

The full landing page redesign is still pending. The site has v0.3.0 content — accurate numbers, updated terminal preview. The full Astro 5 + Tailwind 4 redesign is waiting on a design file. Not v0.3's problem.

The website is live

Speaking of which: aryagorjipour.github.io/logdive is updated and real.

Stat cards reflect the current benchmarks. Terminal preview shows --output json and a paren query. The roadmap section is accurate. There's a GitHub stars counter that does a client-side fetch and falls back to — gracefully if the API is having a day.

Go look at it. Tell me what's wrong with it.

What's next — and I'm taking a break after this

v0.4.0 planned scope:

Benchmark suite at 500k rows (100k isn't stressful enough for the executor's real hot paths)
Query latency improvements
--output yaml and --output csv
Windows --follow — the (dev, ino) rotation check has been Unix-only since v0.2
Configurable retention by source/tag

But honestly: after this release I'm stepping back from logdive for a bit to work on some other projects. v0.3.0 is in a clean state. prerelease-check.sh passes all 11 steps. 417 tests green. The breaking changes are documented.

Good place to breathe.

If something genuinely breaks (security issue, data loss) — file an issue, I'll look. Everything else waits for v0.4.0.

Repo: github.com/Aryagorjipour/logdive
Website: aryagorjipour.github.io/logdive
Crates: logdive · logdive-core · logdive-api
Docker: ghcr.io/aryagorjipour/logdive

Arya Gorjipour — backend engineer, logdive maintainer.
@Aryagorjipour · @Arysmart1

If you run cargo bench on your machine and the numbers are interesting — I want to see them. If you debug a real incident with v0.3.0 — I really want to hear about that.

logdive v0.2.0: OR queries, follow mode, and the features I shipped after I tried using my own tool

Arya Gorjipour — Tue, 26 May 2026 18:00:00 +0000

Two weeks after I shipped logdive v0.1.0, I tried using it.
Within an hour I wanted to write level=error OR level=warn. The README said no.
Within a day I wanted to tail a growing log file. The README said no.
Within a week I had a 1.8 GB index from a CI pipeline and no way to trim it. The README didn't say anything, because I hadn't thought about it.
logdive v0.2.0 ships everything the v0.1 README told you not to ask for.

Twenty seconds: ingest a couple of structured lines, then query them with OR. No daemon. No config file. One binary.

What's new at a glance

# Tail a live log file — finally.
logdive ingest --file ./logs/app.log --follow

# logfmt and plain text, not just JSON.
logdive ingest --file ./nginx-access.log --format logfmt --tag nginx
logdive ingest --file ./app-old.log --format plain --timestamp-now

# OR. Just OR.
logdive query 'level=error OR level=warn'

# Trim the index before it eats your disk.
logdive prune --older-than 30d

# Run the whole thing in Docker. Multi-arch. /data persists.
docker run -d -v logdive-data:/data -p 4000:4000 ghcr.io/aryagorjipour/logdive

# Tell a client what the running server can do, with one call.
curl http://localhost:4000/version
# → {"version":"0.2.0","formats":["json","logfmt","plain"],"capabilities":["query","stats","version"]}

Available now:

cargo install logdive logdive-api --force
# or
docker pull ghcr.io/aryagorjipour/logdive:0.2.0

Six numbered milestones. 330 tests passing. Both binaries still under 10 MB (logdive 3.9 MB, logdive-api 4.2 MB). The compiled Docker image lands at 97 MB.

"v1 non-goals" was an aspirational list. Then I read it back.

The v0.1 README had a tidy little section calling six things explicit non-goals. Some of them genuinely should be non-goals (multi-machine clustering, log shipping daemons). Three of them were just "things I didn't want to write yet, in a costume."

OR queries. I argued in the v0.1 post that AND covered the dominant query pattern and OR would roughly double the parser. Both were true. Both were also irrelevant the first time I needed level=error OR level=warn during a real incident and couldn't write it. "I don't want to grow the parser" is a feeling, not an engineering position.

Follow mode. v0.1 was a batch tool. Pipe a file in, query it, walk away. The first time I wanted to watch a container that was actively misbehaving, the missing --follow flag turned a five-second loop into "re-ingest every thirty seconds." That's not a tool — that's a chore the tool is now causing.

Non-JSON ingestion. "v1 is JSON-only" sounded principled. It stopped sounding principled the first time someone tried logdive against nginx access logs or a legacy Java service emitting Apache-style plaintext with a [ERROR] prefix.

Pruning. This one I just didn't think about. Then I ran logdive against a CI pipeline for two weeks and the index hit 1.8 GB. The fix in v0.1 was rm ~/.logdive/index.db && start over. The kind of UX I'd mock in someone else's tool.

So: v0.2.0 ships all four. Plus a Docker image, plus a versioned API, because if you're going to break your own scope rules you may as well do it once.

M1 — OR

The query language now has a real two-level grammar:

query    := and_expr (OR and_expr)*
and_expr := clause (AND clause)*

AND binds tighter than OR, the way it does in SQL and the way your fingers expect:

logdive query 'level=error OR level=warn'
logdive query 'level=error AND service=payments OR level=warn AND tag=worker'
# Reads as: (level=error AND service=payments) OR (level=warn AND tag=worker)

Parenthesised groups ((a OR b) AND c) are still out — that's the headline v0.3 milestone. AND+OR covers ~95% of real queries; the remaining 5% you can reshape with De Morgan's laws and a small sigh.

Implementation note for anyone curious: it's still a hand-written recursive-descent parser, no parser-combinator library, ~340 lines of pure Rust in crates/core/src/query.rs. Two new methods (parse_or_expr, parse_and_expr), one new AST variant (AndGroup { clauses: Vec<Clause> }), and a SQL generator that always parenthesises each AND-group so WHERE clauses come out unambiguous:

WHERE (level = ? AND json_extract(fields, '$.service') = ?)
   OR (level = ? AND tag = ?)

One breaking change for library users: QueryNode::And(Vec<Clause>) is now QueryNode::Or(Vec<AndGroup>). Even single-clause queries wrap in the two-level structure — uniformity for the executor, slight clumsiness if you were pattern-matching. CLI users see nothing.

M2 — logfmt and plain

logdive ingest gained a --format flag with three values:

# Default — JSON, identical to v0.1
logdive ingest --file app.log

# logfmt — service=payments user_id=4812 duration_ms=120
logdive ingest --file legacy.log --format logfmt

# Plain text — whole line becomes the `message` field
logdive ingest --file random-app.log --format plain

The logfmt parser is hand-written: roughly 200 lines, no nom, no pest. It handles escaped quotes (\", \\), bareword booleans (debug → debug=true, the Heroku convention), hyphenated and dotted keys (request-id, user.id), and the last-write-wins rule on duplicate keys that go-kit/log uses. A malformed token inside an otherwise-fine line is skipped to the next whitespace boundary; only a truly fatal condition — empty input, no parseable pairs, an unterminated quote — drops the whole line. Real-world logfmt is messy; the parser tries to be the more polite party.

Plain is what it sounds like. The entire line becomes LogEntry::message. No timestamp parsing, no level extraction, no heuristics — because every plaintext format encodes those differently and a wrong guess silently corrupts your last 2h queries.

For formats without their own timestamps, a new --timestamp-now flag stamps each entry with the current ingestion time:

docker logs my-container | logdive ingest --format plain --timestamp-now

Opt-in. v0.1's skip-if-no-timestamp policy is still the default, because fabricating timestamps without explicit consent is the kind of foot-gun you discover at 3am.

M3 — `--follow`

logdive ingest --file ./logs/app.log --follow
# Ctrl-C to exit. Detects rotation. Handles truncation.

The implementation lives in crates/core/src/follow.rs, gated behind #[cfg(unix)]. It's built on the notify crate for inotify/FSEvents/kqueue plus a FileTailer struct that tracks (dev, ino, offset) and reacts to two conditions:

Rotation — (dev, ino) on the watched path changes. The handle gets dropped, the path is reopened, the offset resets to zero.
Truncation — same inode, but the file size shrank below the tracked offset. someone-ran > /var/log/app.log. Offset resets, reading continues from the top.

Both checks run on every read. The tailer starts at EOF (matches tail -f), buffers partial lines until the newline arrives, and strips \r\n for the Windows-line-ending crowd. Fifteen unit tests in follow.rs cover the boring edges: 20 KiB single line spanning three read buffers, unicode, blank lines, mid-rotation gap where the file briefly doesn't exist.

The CLI loop wires it together with ctrlc for clean SIGINT/SIGTERM handling and std::sync::mpsc for the watcher → ingest channel. The CLI stays fully synchronous — no tokio dependency, because watching a file and reading lines does not need an async runtime, and adding one for the sake of fashion was tempting and I resisted.

One real limitation: Windows. The (dev, ino) trick uses std::os::unix::fs::MetadataExt. The module is gated; on Windows the flag is rejected with an error. Cross-platform rotation detection is a clean v0.3 contribution.

M4 — `prune` and `LOGDIVE_DB`

logdive prune --older-than 30d
logdive prune --before 2026-01-01
logdive prune --older-than 7d --yes  # skip [y/N] prompt for cron

Two mutually-exclusive flags, one of which is required (clap's ArgGroup enforces this). --older-than accepts a count plus a unit (m/h/d); --before accepts RFC 3339, naive UTC datetime, or a bare date. By default prune counts the doomed rows, shows you the number, and asks for confirmation. --yes skips for scripts.

Under the hood it's DELETE FROM log_entries WHERE timestamp < ?1 followed by VACUUM. The VACUUM is a separate statement because SQLite refuses to run it inside an explicit transaction — a thing I learned by writing the obvious code first, hitting the error, and reading the SQLite docs second.

Comparison is strict <, so a row whose timestamp exactly equals the cutoff is kept. Surprising? Yes, slightly. Documented? Now it is.

Also in M4: the API used to honour LOGDIVE_DB as an environment-variable fallback for --db. The CLI did not. v0.2 fixes the asymmetry — both binaries now respect it, with the command-line flag taking precedence when both are set.

M5 — `GET /version` and configurable CORS

The HTTP API got a third endpoint:

$ curl http://localhost:4000/version
{"version":"0.2.0","formats":["json","logfmt","plain"],"capabilities":["query","stats","version"]}

Three fields, all compile-time constants. version is env!("CARGO_PKG_VERSION"). formats is LogFormat::ALL.iter().map(|f| f.name()).collect() — adding a new format to core automatically propagates here, no manual maintenance. capabilities is hard-coded today, sorted alphabetically with a test pinning that ordering so future additions can't accidentally break a client that did assert_eq!.

The endpoint never touches the database. It's reachable before the first ingest, after a prune --yes, during a fresh Docker volume's first millisecond — which is exactly what makes it a good HEALTHCHECK target (see M6).

CORS is now configurable via --cors-origins or LOGDIVE_API_CORS_ORIGINS. Disabled by default (same-origin only). Pass * to allow any origin, or a comma-separated list for specific ones. Mixing * with explicit origins is rejected at startup because it's meaningless and almost certainly a typo:

logdive-api --cors-origins 'https://app.example.com,https://staging.example.com'
logdive-api --cors-origins '*'                  # development convenience
logdive-api                                      # production default — no CORS at all

The implementation is tower-http's CorsLayer, restricted to GET only because the API is and stays read-only. The router wiring is the only place the methods list lives, so adding write endpoints would force a deliberate code change rather than a one-line config fix. This is the kind of friction you want.

One trap I hit during M5 worth flagging: tower-http = "0.6" checks axum version compatibility at the trait-bound level. My first cargo add tower-http --features cors got axum 0.7.5 from cache where 0.7.6+ is needed. The error said Service trait not satisfied. cargo update -p axum fixed it but the error message gave me zero hints about the root cause.

M6 — Multi-arch Docker

docker pull ghcr.io/aryagorjipour/logdive:0.2.0   # or :latest

docker run -d \
  --name logdive \
  -v logdive-data:/data \
  -p 4000:4000 \
  ghcr.io/aryagorjipour/logdive

curl http://localhost:4000/version

Multi-stage build with cargo-chef so dependency compilation is cached across source-only changes. debian:bookworm-slim runtime. Non-root user (logdive, UID 1000). /data volume for index persistence. EXPOSE 4000. HEALTHCHECK curls /version every 30 seconds — no DB access, no false negatives during a long query.

Both binaries ship in one image. Default ENTRYPOINT is logdive-api; the CLI is reachable through --entrypoint logdive:

docker run --rm \
  -v logdive-data:/data \
  -v /var/log/app:/logs:ro \
  --entrypoint logdive \
  ghcr.io/aryagorjipour/logdive \
  ingest --file /logs/app.log --tag production

Two image-level environment defaults that don't exist on host installs:

LOGDIVE_DB=/data/index.db — points both binaries at the persistent volume.
LOGDIVE_API_HOST=0.0.0.0 — overrides the binary's loopback default so -p 4000:4000 actually does something.

The second one matters: on a host install the API binds 127.0.0.1 for security. In a container, that would make the published port unreachable. The override is the right default for the container. It is also why the README is explicit that exposing port 4000 with no reverse proxy puts a read-only API on the public internet. Read-only is defence in depth, not access control.

GitHub Actions builds for linux/amd64 and linux/arm64 via docker buildx + QEMU, pushes to GHCR with the workflow's built-in GITHUB_TOKEN (no PAT to rotate). Builds run on every push to main, every push to a release/v* branch, every v* tag, and on PRs without pushing.

One CI gotcha I'll save you the debug session for: cache-to: type=gha, mode=max reliably 502s on cache export for this workspace. The cache backend disagrees with how large the intermediate-layer export is. mode=min works. PR builds skip cache write entirely. The current .github/workflows/docker.yml has the working config.

While I was in there, I also fixed a v0.1 UX paper-cut: logdive-api used to refuse to start if the database file was missing. That made sense for --db /home/me/typo.db; it was hostile for docker run -v fresh-volume:/data. v0.2 auto-creates an empty index with the right schema on first run, prints a one-line note to stderr explaining what just happened, and otherwise lets the server come up. Genuinely bad paths (non-existent parent, permission denied) still fail fast.

The updated query grammar, in one place

query    := and_expr (OR and_expr)*
and_expr := clause (AND clause)*
clause   := field OP value
          | field CONTAINS string
          | TIME_RANGE
field    := [a-zA-Z_][a-zA-Z0-9_.]*
OP       := "=" | "!=" | ">" | "<"
TIME_RANGE := "last" duration | "since" datetime
duration := number ("m" | "h" | "d")

AND, OR, CONTAINS, last, since, true, and false are case-insensitive. Known fields (timestamp, level, message, tag) hit indexed columns. Everything else routes through json_extract(fields, '$.<key>') — slower than a real index, but works against arbitrary JSON shapes without forcing a schema. SQL output always parenthesises each AND-group so the WHERE clause is unambiguous regardless of how many disjuncts you stack.

Tradeoffs, honestly

A few v0.1 tradeoffs are resolved. New ones replaced them.

Multi-format ingestion adds a small dispatch cost. Every line goes through a LogFormat match arm before hitting its parser. ~1–2% overhead on the benchmark suite, which I'd characterise as "not a thing" but felt obliged to mention.

Follow mode is Unix-only. The rotation check uses (dev, ino) from std::os::unix::fs::MetadataExt. Windows compiles fine but rejects --follow at runtime. A notify-based fallback for the rotation half is a real contribution opportunity.

The Docker image is 97 MB. debian:bookworm-slim + two binaries + curl + ca-certificates. A musl-static build against a distroless runtime would cut this to ~10 MB but adds linker complexity with bundled rusqlite and blake3. Deferred.

LOGDIVE_API_HOST=0.0.0.0 in the container is correct but dangerous. The README repeats the warning twice; the API stays read-only specifically so a misconfigured deployment leaks data, not the world.

OR without parens covers ~95%. (level=error OR level=warn) AND service=payments requires duplicating the service clause as level=error AND service=payments OR level=warn AND service=payments. Parenthesised expressions are the v0.3 headline.

Performance

Fresh cargo bench numbers from v0.2.0. Ingest and query paths weren't touched in v0.2, so these track v0.1 to within run-to-run variance:

Operation	Throughput / Latency
Ingestion, batched insert (10k rows)	~206k lines/sec
Ingestion, parse + insert end-to-end (10k rows)	~164k lines/sec
Query on known field, empty result (100k rows)	~20 µs
Query on known field, 25% match (100k rows, LIMIT 1000)	~51 ms
Query on JSON field, 25% match (100k rows, LIMIT 1000)	~4.0 ms
Query on JSON field, 0% match — full scan (100k rows)	~68 ms
`CONTAINS` full-table scan (100k rows)	~37–41 ms
3-clause `AND` chain, mixed known + JSON (100k rows)	~23 ms

Reproducible from crates/core/benches/. cargo bench runs the suite; criterion writes HTML reports to target/criterion. The 25%-match number is the most variance-sensitive (it walks the result set rather than short-circuiting on a count); your hardware will differ.

Release binary sizes: logdive 3.9 MB, logdive-api 4.2 MB. The "<10 MB" budget from v0.1 still holds, with room.

Upgrading from v0.1.0

CLI users:

cargo install logdive logdive-api --force
# or
docker pull ghcr.io/aryagorjipour/logdive:0.2.0

No CLI flags were removed or renamed. Every v0.1 command still works.

Library users (logdive-core directly): two breaking changes.

// Before
match node {
    QueryNode::And(clauses) => /* ... */,
}

// After
match node {
    QueryNode::Or(groups) => {
        for group in groups {
            for clause in &group.clauses { /* ... */ }
        }
    }
}

// Before
parse_line(line)

// After
parse_line(LogFormat::Json, line)  // explicit format selector

Both are mechanical migrations.

Call for contributions

v0.1's contribution list had seven items. Four shipped in v0.2 (OR, non-JSON formats, follow mode, Docker image). Three remain, plus the work v0.2 exposed:

Parenthesised query expressions. The v0.3 flagship. Recursive descent on the two-level grammar, SQL generator update that emits balanced parens correctly. Well-scoped.
A browser UI. The GET /version endpoint exists specifically to make feature detection easy. The API is CORS-configurable. A weekend with React/Svelte/HTMX and you have a real UI.
Generated columns for hot JSON fields. The big win on JSON-field query speed. Mark a field as "promote to indexed column" and get known-field performance for it.
Windows rotation detection. A notify-based fallback to replace the (dev, ino) check that's currently Unix-only.
Distroless / musl Docker image. 97 MB → ~10 MB. Real engineering, real win.
More log formats. Apache common log format, syslog RFC 5424, journalctl JSON. LogFormat is set up for additions — one enum variant, one parser module, one dispatcher arm.
Benchmarks on more hardware. Run cargo bench on your machine, PR the README table.

The repo has CI, integration tests against tempfile-backed SQLite, conventional-commit history, and a release process that's a documented sequence of one-line commands. Bug reports and PRs at github.com/Aryagorjipour/logdive.

A short field guide to shipping v0.2 of anything

Five things v0.2 taught me that the v0.1 release didn't:

Auto-create on first run beats fail-fast in containers. Host CLIs should fail loudly on missing files. Containers with fresh volumes should not. Same code, two correct behaviours, switched by whether you trust the path.
cargo publish --workspace is a Cargo 1.90+ feature. Before that, dry-running all three crates simultaneously fails because the downstream ones can't resolve their unpublished dep. Publish core, wait 30 seconds, publish the rest. The scripts/prerelease-check.sh in the repo handles both versions.
tower-http and axum are coupled at the trait-bound level. Bumping one without the other compiles, then explodes when you .layer() it. cargo update -p axum after cargo add tower-http is muscle memory now.
GHA cache mode=max is a 502 generator on real workspaces. mode=min works. PR builds should skip cache write entirely.
QEMU multi-arch Rust builds are slow but reliable. ~12 minutes cold, ~2 minutes warm with cargo-chef. Native arm64 runners would halve this. Until then, you wait.

Each of these took a CI run or a stack trace to learn. None of them are in any "publishing a Rust workspace" tutorial I can find. They're in this article now, which is the only reason I'm slightly less annoyed about having learned them.

About

Arya Gorjipour — backend engineer, logdive maintainer.

GitHub: @Aryagorjipour
X / Twitter: @Arysmart1
LinkedIn: arysmart

If you ship a v0.3 contribution from the list above, I want to hear about it. If you use logdive to debug a real incident, I want to hear about that too. The most interesting bug reports get a hat tip in the next release post.

I wanted jq with memory, time ranges, and filters. So I built logdive

Arya Gorjipour — Tue, 28 Apr 2026 18:00:00 +0000

Your app is in production. Something broke at 2am. Your options are:

grep through a rotated log file, squinting at terminal output.
Chain together half a dozen jq pipes until the command line becomes unreadable.
Page an SRE to query your observability stack, assuming you have one.
Spin up Loki or Elastic locally, spend two hours on config, and then do the actual investigation.

All four of these suck. Either you're limited to flat text tooling, or you're paying for infrastructure complexity you don't need.

logdive is what sits in the gap. It's a single Rust binary. You drop it anywhere, point it at a log file or pipe Docker output into it, and you get a fast, queryable index on your local machine. No daemon. No cloud. No YAML. Just cargo install logdive.

# Ingest logs from a file or pipe from stdin.
logdive ingest --file ./logs/app.log
docker logs my-container | logdive ingest --tag my-container

# Query the index.
logdive query 'level=error AND service=payments last 2h'
logdive query 'message contains "timeout"' --format json | jq

# Inspect what you've indexed.
logdive stats

# Optionally expose a read-only HTTP API for remote querying.
logdive-api --db ./logdive.db --port 4000
curl 'http://127.0.0.1:4000/query?q=level%3Derror&limit=100'

That's the whole product surface.

Why logdive exists

Every backend engineer has hit the wall this is built for: your application emits perfectly good structured JSON logs. Tools for querying that JSON locally are stuck in the extremes:

jq is for a single file, one-shot, no memory, no time ranges, no filters-across-files.
Loki, Datadog, Elastic, Splunk all demand infrastructure, cost, and configuration that's overkill for a side project, small team, or personal investigation.

The target user is a backend engineer who wants jq with memory, filters, and time ranges — without YAML files, without a running daemon they didn't ask for, without a monthly bill.

Rust makes this credible in a way no other language quite does: a single self-contained binary with no runtime, zero-copy parsing, SQLite bundled directly into the binary, and real concurrency for ingestion. This is the kind of tool Rust is genuinely good at.

Who this is for (and who it isn't)

Good fit:

Backend engineers debugging production incidents from local log copies.
Small teams without a dedicated observability budget.
Anyone who's ever built a 4-stage jq pipeline and wished it was searchable afterward.
Folks running Docker locally who want docker logs my-container | logdive ingest and instant querying.
CI pipelines that need to grep through structured output of a previous step.

Bad fit (and I'll be explicit about this):

Multi-machine, networked indexes. logdive is single-host by design.
Real-time log tailing / tail -f style follow mode. Not in v0.1.0.
Anything needing authentication on the HTTP endpoint. The v1 API assumes the network layer handles access control.
Massive enterprise-scale log volumes. SQLite handles a lot, but if you're indexing 100GB/day, you want Loki.
Non-JSON log formats (plaintext, logfmt, syslog). v1 is JSON-only.

The whole scope is deliberately small. v0.1.0 ships what a side project or small team needs, nothing more.

The query language

Small enough to fit in your head, expressive enough to be useful:

level=error
level=error AND service=payments
message contains "database timeout"
level=error last 2h
tag=api AND status > 499 since 2026-04-15
user_id=4812 AND duration_ms > 500

Operators: =, !=, >, <, CONTAINS. Time ranges: last Nm/Nh/Nd or since <datetime>. Clauses chain with AND. Known fields (timestamp, level, message, tag) hit SQLite indexes directly. Unknown fields go through json_extract() on a JSON blob — slower but works for arbitrary JSON shapes.

No OR in v0.1.0 — it's the single biggest v1 non-goal. AND covers the dominant query pattern, and adding OR requires a two-level grammar plus precedence handling that would roughly double the parser. Deferred to v2 deliberately.

Under the hood

For readers who like the implementation details:

Three-crate Cargo workspace. logdive-core is pure library (parser, indexer, query engine — no I/O at module level), logdive is the CLI binary, logdive-api is the HTTP server binary. Each is independently publishable.
SQLite via rusqlite with the bundled feature. Zero infrastructure, battle-tested, ships inside the binary at ~500KB.
Hybrid storage. Known fields (timestamp, level, message, tag) are real indexed columns. Everything else is stored in a JSON blob queryable via SQLite's json_extract(). This is the only way to handle arbitrary JSON shapes without a schema-bound design.
Hand-written recursive descent query parser. ~200 lines of Rust enums. No parser-combinator dependency. Better error messages than generated parsers, and honestly, it was one of the most satisfying parts of the project to write.
Blake3 row hashing for deduplication. INSERT OR IGNORE on a unique hash column means re-ingesting a file (or dealing with log rotation) is free. No duplicate rows. The hash is cheap — negligible per-line cost.
Batched inserts at 1000 rows per transaction. Standard SQLite throughput pattern.
Axum HTTP API. Read-only via SQLITE_OPEN_READ_ONLY, blocking SQLite work wrapped in tokio::task::spawn_blocking so it doesn't block Tokio's worker threads, graceful shutdown on Ctrl-C and SIGTERM.

The full architecture is documented in the repo's README.

Performance

Representative numbers on an Acer Nitro 5 laptop, measured via criterion:

Operation	Throughput / Latency
Ingestion, batched insert (10k rows)	~210k lines/sec
Ingestion, parse + insert end-to-end (10k rows)	~166k lines/sec
Query on known field, empty result (100k rows)	~17 μs
Query on known field, 25% match (100k rows, LIMIT 1000)	~39 ms
Query on JSON field, 25% match (100k rows, LIMIT 1000)	~3.6 ms
Query on JSON field, 0% match (full scan, 100k rows)	~68 ms
`CONTAINS` full-table scan (100k rows)	~36–40 ms
3-clause `AND` chain (100k rows)	~22 ms

Release binaries at 3.7 MB (logdive) and 4.1 MB (logdive-api) — well under the 10 MB target, thanks to LTO + strip + panic=abort in the release profile.

Run cargo bench in the repo to get your own baseline.

Tradeoffs worth being honest about

A few design decisions have real downsides that users should know:

Timestamps compared as lexical TEXT. This is correct for ISO-8601-shaped timestamps (they sort chronologically when compared as strings), but any exotic timestamp format will silently misorder. Default timestamps from modern structured loggers are ISO-8601, so in practice this is rarely a problem — but it's a real sharp edge.

No index on json_extract expressions. Queries on unknown JSON fields fall back to full table scans. 100k rows scans in ~68ms which is still fast, but if you're hammering the same unknown field constantly, it's slower than a known-column query by 1000x. A future version could promote frequently-queried JSON fields to real columns.

Single-host only. There's no clustering story. If you need distributed query across machines, you want Loki or Elastic.

No authentication on the HTTP API. Deliberate for v1. If you expose logdive-api beyond localhost, put a reverse proxy with auth in front of it. The binary defaults to binding 127.0.0.1 for a reason.

Install and try it

From crates.io:

cargo install logdive logdive-api

From prebuilt binaries: grab the tarball for your platform from the GitHub Releases page. Linux x86_64 and macOS arm64 are built on every tag push.

From source:

git clone https://github.com/Aryagorjipour/logdive
cd logdive
cargo build --release

MSRV: Rust 1.85 (edition 2024). Dual-licensed MIT OR Apache-2.0.

Try the included examples:

logdive --db /tmp/demo.db ingest --file examples/app.log
logdive --db /tmp/demo.db ingest --file examples/nginx.log
logdive --db /tmp/demo.db stats
logdive --db /tmp/demo.db query 'level=error AND service=payments'

Call for contributions

v0.1.0 is deliberately small, but there's a clear set of high-value v2 features that would benefit hugely from community help. If you want to contribute, these are genuine needs:

OR operator in the query language. The single most-requested feature implied by v1's scope. Extends the parser to handle a two-level grammar (clauses joined by OR, OR-groups joined by AND) and the SQL generator to emit parenthesized disjunctions. Non-trivial but well-scoped.

Non-JSON log format support. Plaintext and logfmt are the obvious next formats. Would plug in as additional parser implementations alongside parse_line in logdive-core.

Follow mode (-f / tail-and-index). Watch a log file for new lines and ingest them as they appear. Good use of tokio::fs and notify crate patterns.

A browser UI. The HTTP API is ready for one — someone with frontend chops could build a single-page React/Svelte/HTMX UI that talks to logdive-api and gives people a browser-based query interface.

Generated columns for frequently-queried JSON fields. The big performance win. Would let users mark certain JSON fields as "promote to indexed column" and get known-field query performance for those.

Benchmarks on more hardware. If you run the existing cargo bench suite on your machine, an issue/PR updating the README's Performance section with a broader sample would be genuinely useful.

Docker image for the HTTP API. Dockerfile for logdive-api with a volume mount for the index database. Natural next step for users who want to run the API as a service.

The repo has CI, benchmarks, clean test coverage, and a documented contribution workflow. Issues and pull requests at github.com/Aryagorjipour/logdive.

A note on context

logdive started as the final project in a Rust learning journey. The framing mattered — I wanted a project that was small enough to finish, demanding enough to exercise real Rust (parsers, SQLite, async, concurrency, CLI, HTTP), and useful enough that I'd actually keep using it afterward.

What I underestimated: how much of the effort lives in the parts that aren't writing code. Setting up a clean workspace. Choosing the right abstractions between core and binaries. Writing benchmarks that actually measure what you think they measure. Testing an HTTP server with tower::ServiceExt::oneshot. Packaging a three-crate workspace for crates.io when one crate depends on another and you're publishing for the first time. Each of these had at least one subtle gotcha.

The project is open source because the next person hitting the "jq vs Datadog" wall might as well benefit from it, and because Rust has given me enough that I want to give something back.

About

Arya Gorjipour — backend engineer, Rust learner, logdive maintainer.

GitHub: @Aryagorjipour
Twitter/X: @Arysmart1
LinkedIn: arysmart

Issues, bug reports, and pull requests welcome. If you end up using logdive to debug a real production incident, I'd love to hear about it.

DEV Community: Arya Gorjipour

I wanted a task tracker that gets out of my way and remembers what I shipped. So I built shipflow

Why I quit my task manager

What I actually get out of it

How it actually works

Who this is for (and who it isn't)

Under the hood

Tradeoffs worth being honest about

Install and try it

Where it's thin (PRs welcome)

A note on context

Links

About

logdive v0.3.0 — the one where I finally added parens (and four more things my heart wanted)

M1 — Parenthesised queries

M2 — --offset and the rename I had to make

M3 — HTTP pagination

M4 — level=ERROR and level=error are the same query now

M5 — Distroless Docker and --health-check

Breaking changes

Benchmarks

Tradeoffs, I'll be honest

The website is live

What's next — and I'm taking a break after this

logdive v0.2.0: OR queries, follow mode, and the features I shipped after I tried using my own tool

What's new at a glance

"v1 non-goals" was an aspirational list. Then I read it back.

M1 — OR

M2 — logfmt and plain

M3 — --follow

M4 — prune and LOGDIVE_DB

M5 — GET /version and configurable CORS

M6 — Multi-arch Docker

The updated query grammar, in one place

Tradeoffs, honestly

Performance

Upgrading from v0.1.0

Call for contributions

A short field guide to shipping v0.2 of anything

Links

About

I wanted jq with memory, time ranges, and filters. So I built logdive

Why logdive exists

Who this is for (and who it isn't)

The query language

Under the hood

Performance

Tradeoffs worth being honest about

Install and try it

Call for contributions

A note on context

Links

About

M2 — `--offset` and the rename I had to make

M4 — `level=ERROR` and `level=error` are the same query now

M5 — Distroless Docker and `--health-check`

M3 — `--follow`

M4 — `prune` and `LOGDIVE_DB`

M5 — `GET /version` and configurable CORS