Indexing Solana Programs in Rust: Notes From a Python Backend Engineer

#rust #postgres #solana #cryptocurrency

TL;DR

I built a small Solana program activity indexer in Rust to pressure-test the patterns I rely on every day in Python — cursor-based syncs, idempotent ingestion, mockable I/O — against an unfamiliar language and an unfamiliar chain. The repo is here: https://github.com/tyu1996/SPAI. This post walks through the three design choices I'd defend in a review.

Why I built it

I've spent the last three years writing FastAPI and MSSQL backends for retail and hospitality platforms running across 15+ distributed sites. The work taught me to value boring, idempotent, restart-safe systems above almost everything else.

I'm now actively transitioning toward Rust and, longer term, Web3 infrastructure work. I needed a portfolio project that:

Is small enough that one engineer can finish it.
Exercises async Rust, a real database, and a real external API.
Shows the same engineering instincts I'd bring to a production team.

A Solana program indexer fit. The Solana JSON-RPC is well-documented, the data model is messy enough to be interesting, and the requirements naturally push you toward the kind of decisions you want a candidate to make on their own.

What it does

For each configured program ID, the service:

Asks Solana RPC for signatures that touched the program since the last one it saw.
Fetches the full transaction payload for each new signature.
Stores normalized metadata (slot, block time, success, fee) plus the raw JSON in Postgres.
Records ingestion errors instead of failing the batch.

There's a small Axum HTTP API on top — /health, /programs, /programs/:id/transactions, /transactions/:signature — and a minimal static dashboard for browsing.

Three design choices worth sharing

1. Cursor-based incremental sync

Each tracked program row carries a last_seen_signature and last_seen_slot. When the worker polls RPC, it passes until = last_seen_signature. The RPC walks backward from the newest signature and stops when it reaches until, so only signatures newer than the cursor come back. The cursor advances only when no slot has been recorded yet or the new slot is greater than or equal to the recorded one:

UPDATE tracked_programs
SET last_seen_signature = $2, last_seen_slot = $3, updated_at = now()
WHERE program_id = $1
  AND (last_seen_slot IS NULL OR last_seen_slot <= $3);

That guard matters more than it looks. Out-of-order processing inside a batch is a real possibility; with the guard, a late write that carries an older slot matches no row and becomes a no-op, so the cursor can never rewind.
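To make the guard concrete, here is a minimal in-memory sketch of the same rule in Rust. The Cursor type and its advance method are hypothetical names invented for this illustration, not code from the repo; the real check lives in the SQL WHERE clause above.

```rust
/// In-memory stand-in for one `tracked_programs` row (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
struct Cursor {
    last_seen_signature: Option<String>,
    last_seen_slot: Option<u64>,
}

impl Cursor {
    /// Mirrors the SQL guard: the update applies only when no slot is
    /// recorded yet, or the candidate slot is >= the recorded one.
    /// Returns true when the cursor was updated.
    fn advance(&mut self, signature: &str, slot: u64) -> bool {
        let allowed = match self.last_seen_slot {
            None => true,
            Some(current) => current <= slot,
        };
        if allowed {
            self.last_seen_signature = Some(signature.to_owned());
            self.last_seen_slot = Some(slot);
        }
        allowed
    }
}
```

Calling advance("sigB", 120) and then advance("sigA", 100) leaves the cursor at sigB, because the second call carries an older slot. A later call with the same slot (120) is still accepted, which matches the <= in the SQL.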

2. Idempotent ingestion

Solana transactions are effectively immutable once finalized, which makes them a perfect fit for upserts. Every ingestion path uses ON CONFLICT DO UPDATE or ON CONFLICT DO NOTHING:

transactions(signature) is a primary key — re-ingesting refreshes metadata without duplicating rows.
program_transactions(program_id, signature) is the many-to-many join table; inserts there use DO NOTHING.
Errors are written to a separate ingestion_errors table, not raised, so one bad signature never blocks the rest of the batch.

The practical payoff: I can drop the database, replay a backfill, and end up with the exact same state. That's the property I always want on the worker side of a system.
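The replay property can be sketched without a database. The snippet below is an in-memory model, not the repo's code: Store, TxRow, and ingest_batch are hypothetical names, the maps stand in for the Postgres tables, and the error list is assumed to be a plain append-only log (so the idempotency claim here covers the transactions and join tables only).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Normalized metadata for one transaction (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct TxRow {
    slot: u64,
    fee: u64,
    success: bool,
}

/// In-memory stand-in for the three tables described above.
#[derive(Debug, Default)]
struct Store {
    transactions: BTreeMap<String, TxRow>,            // keyed by signature
    program_transactions: BTreeSet<(String, String)>, // (program_id, signature)
    ingestion_errors: Vec<(String, String)>,          // (signature, message)
}

impl Store {
    fn ingest(&mut self, program_id: &str, signature: &str, parsed: &Result<TxRow, String>) {
        match parsed {
            Ok(row) => {
                // Like ON CONFLICT (signature) DO UPDATE: overwrite, never duplicate.
                self.transactions.insert(signature.to_owned(), row.clone());
                // Like ON CONFLICT DO NOTHING: inserting an existing pair is a no-op.
                self.program_transactions
                    .insert((program_id.to_owned(), signature.to_owned()));
            }
            // Record the failure and keep going instead of aborting the batch.
            Err(message) => self
                .ingestion_errors
                .push((signature.to_owned(), message.clone())),
        }
    }
}

/// Ingests every item; a bad signature never stops the ones after it.
fn ingest_batch(store: &mut Store, program_id: &str, batch: &[(&str, Result<TxRow, String>)]) {
    for (signature, parsed) in batch {
        store.ingest(program_id, signature, parsed);
    }
}
```

Running ingest_batch twice over the same batch leaves transactions and program_transactions unchanged after the second pass, and a failed item in the middle of the batch does not prevent later items from landing.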

3. A mockable RPC trait

The RPC client is behind a trait:

#[async_trait]
pub trait SolanaRpc: Send + Sync {
    async fn signatures_for_program(
        &self,
        program_id: &str,
        until: Option<String>,
        limit: usize,
    ) -> Result<Vec<ProgramSignature>, AppError>;

    async fn transaction_json(&self, signature: &str) -> Result<Value, AppError>;
}

The production implementation wraps solana_client::nonblocking::rpc_client::RpcClient. Tests pass a fake implementation. The parser is a pure function over serde_json::Value, so the most error-prone code path — turning a wildly nested transaction JSON into rows — is unit-testable without standing up a chain.
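For a feel of what a fake looks like, here is a dependency-free sketch. It is deliberately simplified and rests on several assumptions: the trait is synchronous (the real one is async via async_trait), payloads are plain String instead of serde_json::Value, and FakeRpc, Worker, poll_once, and the shapes of ProgramSignature and AppError are hypothetical stand-ins rather than the repo's definitions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
struct ProgramSignature {
    signature: String,
    slot: u64,
}

#[derive(Debug, PartialEq)]
enum AppError {
    Rpc(String),
}

/// Synchronous simplification of the trait above.
trait SolanaRpc: Send + Sync {
    fn signatures_for_program(
        &self,
        program_id: &str,
        until: Option<String>,
        limit: usize,
    ) -> Result<Vec<ProgramSignature>, AppError>;

    fn transaction_json(&self, signature: &str) -> Result<String, AppError>;
}

/// Test double backed by canned data; `signatures` is ordered newest first.
struct FakeRpc {
    signatures: Vec<ProgramSignature>,
    payloads: HashMap<String, String>,
}

impl SolanaRpc for FakeRpc {
    fn signatures_for_program(
        &self,
        _program_id: &str,
        until: Option<String>,
        limit: usize,
    ) -> Result<Vec<ProgramSignature>, AppError> {
        // Walk from newest to oldest and stop when the cursor is reached.
        Ok(self
            .signatures
            .iter()
            .take_while(|s| Some(&s.signature) != until.as_ref())
            .take(limit)
            .cloned()
            .collect())
    }

    fn transaction_json(&self, signature: &str) -> Result<String, AppError> {
        self.payloads
            .get(signature)
            .cloned()
            .ok_or_else(|| AppError::Rpc(format!("unknown signature {}", signature)))
    }
}

/// The worker only knows about the trait object, never the concrete client.
struct Worker {
    rpc: Arc<dyn SolanaRpc>,
}

impl Worker {
    /// One poll: list new signatures, fetch each payload, and collect
    /// per-signature failures instead of aborting the batch.
    fn poll_once(
        &self,
        program_id: &str,
        until: Option<String>,
    ) -> Result<(Vec<String>, Vec<AppError>), AppError> {
        let mut payloads = Vec::new();
        let mut errors = Vec::new();
        for sig in self.rpc.signatures_for_program(program_id, until, 100)? {
            match self.rpc.transaction_json(&sig.signature) {
                Ok(payload) => payloads.push(payload),
                Err(err) => errors.push(err),
            }
        }
        Ok((payloads, errors))
    }
}
```

A test builds a FakeRpc with a handful of signatures, wraps it in Arc::new, and hands it to Worker. Swapping in the production client is then a change to one constructor call, which is the whole point of putting the client behind a trait.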

What surprised me coming from Python

A few honest observations after a few weeks in idiomatic async Rust:

Arc<dyn Trait> is the async equivalent of "just inject the dependency." Once that clicked, the rest of the dependency wiring stopped feeling foreign.
sqlx::test is a quietly excellent feature: each test gets a fresh database with migrations applied. Coming from Python, where I'd usually hand-roll fixtures, it felt like cheating.
thiserror and anyhow together give you most of what HTTPException plus structured logging give you in FastAPI, with sharper boundaries between library and application errors.

If you've shipped a Rust ingestion service in production and any of this looks naive, I'd genuinely like to hear about it. Repo: https://github.com/tyu1996/SPAI.