DEV Community

ohmygod

The Firedancer Security Checklist: 7 DeFi Assumptions That Break in Solana's Multi-Client Era — And the Defense Patterns to Fix Each One

Solana's single-client era is over. Firedancer — Jump Crypto's ground-up C/C++ validator rewrite — is live on mainnet, and by mid-2026, a meaningful share of stake is expected to run it alongside Agave (the original Rust client) and Jito-Solana.

This is the most significant architectural shift in Solana's history. Client diversity is a net positive for network resilience (Ethereum's post-Merge experience proves it), but it also invalidates assumptions that DeFi protocols have baked into their smart contracts since day one.

I've audited Solana programs that would silently break under multi-client conditions. Here are the seven assumptions that need to die — and the concrete defense patterns to replace them.


1. "400ms Slots Are Guaranteed"

The old assumption: Every slot takes ~400ms. Liquidation bots, oracle updates, and time-sensitive instructions can rely on this cadence.

Why it breaks: Firedancer leaders can produce blocks faster than Agave validators can verify them. When a high-performance Firedancer leader fills a block to capacity, slower validators may skip votes — stretching effective finality beyond 400ms for that slot.

The defense pattern:

// Anchor: Never assume wall-clock time from slot math
use anchor_lang::prelude::*;

#[account]
pub struct TimeGuard {
    pub last_oracle_slot: u64,
    pub last_oracle_unix: i64,
}

pub fn validate_oracle_freshness(
    guard: &TimeGuard,
    clock: &Clock,
    max_slot_drift: u64,   // e.g., 5 slots
    max_time_drift: i64,   // e.g., 10 seconds
) -> Result<()> {
    // Check BOTH slot distance AND unix timestamp
    let slot_drift = clock.slot.saturating_sub(guard.last_oracle_slot);
    let time_drift = clock.unix_timestamp.saturating_sub(guard.last_oracle_unix);

    require!(
        slot_drift <= max_slot_drift && time_drift <= max_time_drift,
        ErrorCode::StaleOracle
    );
    Ok(())
}

Key insight: Use Clock::unix_timestamp as the primary time reference and slot numbers only as a secondary sanity check. Never convert slots to seconds with a hardcoded multiplier.
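A back-of-envelope sketch of how the hardcoded multiplier drifts. The slot numbers and timestamps below are invented for illustration; the point is that elapsed slots times 0.4 can disagree badly with elapsed unix time:

```python
# Hypothetical Clock sysvar readings taken ~100 slots apart (values invented)
reading_a = {"slot": 250_000_000, "unix_timestamp": 1_750_000_000}
reading_b = {"slot": 250_000_100, "unix_timestamp": 1_750_000_052}

elapsed_slots = reading_b["slot"] - reading_a["slot"]
elapsed_secs = reading_b["unix_timestamp"] - reading_a["unix_timestamp"]

# The hardcoded-multiplier approach vs. what the chain clock actually says
assumed_secs = elapsed_slots * 0.4
actual_ms_per_slot = elapsed_secs / elapsed_slots * 1000

print(f"slot math says {assumed_secs:.0f}s elapsed; "
      f"clock says {elapsed_secs}s (~{actual_ms_per_slot:.0f}ms/slot)")
```

Under load or with skipped slots, real per-slot time routinely exceeds 400ms — which is exactly when stale-oracle checks matter most.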


2. "Transaction Ordering Is Deterministic Within a Block"

The old assumption: Transactions in a block execute in a predictable, sequential order.

Why it breaks: Firedancer's transaction scheduler uses a fundamentally different algorithm than Agave's. While both produce valid blocks, the ordering of non-conflicting transactions can differ between clients. This matters for protocols that implicitly depend on execution order — particularly those using priority fees as a sequencing mechanism.

The defense pattern:

// Make your program order-independent with explicit sequence numbers
#[account]
pub struct OrderedAction {
    pub sequence: u64,
    pub expected_state_hash: [u8; 32], // hash of expected pre-state
}

pub fn execute_ordered(
    ctx: Context<ExecuteOrdered>,
    action: OrderedAction,
) -> Result<()> {
    let state = &ctx.accounts.protocol_state;

    // Verify we're operating on the expected state
    let current_hash = hash_state(state);
    require!(
        action.expected_state_hash == current_hash,
        ErrorCode::StateMismatch
    );

    // Process with explicit ordering
    require!(
        action.sequence == state.next_sequence,
        ErrorCode::SequenceError
    );

    // ... execute logic ...
    Ok(())
}

Key insight: If your protocol's correctness depends on transaction ordering, you have a latent bug. Make state transitions explicitly sequenced or provably order-independent.
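The same discipline, sketched off-chain in Python (all names here are illustrative): a state machine that accepts actions only in their declared sequence is indifferent to how the leader's scheduler happened to order them within the block:

```python
# Minimal sketch of explicit sequencing: each action carries its own sequence
# number, and the state rejects anything that isn't the expected next step.
class ProtocolState:
    def __init__(self) -> None:
        self.next_sequence = 0
        self.balance = 0

    def execute(self, sequence: int, delta: int) -> None:
        if sequence != self.next_sequence:
            raise ValueError(
                f"sequence error: expected {self.next_sequence}, got {sequence}"
            )
        self.balance += delta
        self.next_sequence += 1

state = ProtocolState()
state.execute(0, 100)
state.execute(1, -30)
print(state.balance)  # net of the two actions

# An out-of-order action fails no matter where a client placed it in the block
try:
    state.execute(5, 10)
except ValueError as e:
    print(e)
```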


3. "Compute Unit Costs Are Stable"

The old assumption: A given instruction costs roughly the same CUs across validator versions.

Why it breaks: Firedancer reimplements the BPF/SBF runtime. Identical programs can consume slightly different CU counts on Firedancer vs. Agave due to implementation differences in the virtual machine. Programs that calculate fees or refunds based on metered CU consumption may produce inconsistent results.

The defense pattern:

// Don't calculate fees from metered CUs
// Instead, use fixed fee schedules per operation type

pub const FEE_TABLE: &[(OperationType, u64)] = &[
    (OperationType::Swap, 5000),        // lamports
    (OperationType::Deposit, 3000),
    (OperationType::Withdraw, 4000),
    (OperationType::Liquidate, 8000),
];

pub fn calculate_fee(op: OperationType) -> u64 {
    FEE_TABLE.iter()
        .find(|(t, _)| *t == op)
        .map(|(_, fee)| *fee)
        .unwrap_or(5000) // safe default
}

Key insight: Treat CU metering as a resource limit, not a pricing mechanism. Fixed fee schedules are deterministic regardless of which client processes the transaction.


4. "My RPC Node's View Is the Network's View"

The old assumption: simulateTransaction on your RPC node gives you an accurate preview of what will happen on-chain.

Why it breaks: If your RPC runs Agave but the current leader runs Firedancer (or vice versa), simulation results can diverge from actual execution — particularly for edge cases in CPI depth handling, account data serialization, and error codes.

The defense pattern:

# Multi-client simulation: query both client types
import httpx
import asyncio

AGAVE_RPC = "https://your-agave-rpc.example.com"
FIREDANCER_RPC = "https://your-firedancer-rpc.example.com"

async def simulate_multi_client(tx_base64: str) -> dict:
    """Simulate on both clients and flag divergence."""
    async with httpx.AsyncClient() as client:
        payload = {
            "jsonrpc": "2.0", "id": 1,
            "method": "simulateTransaction",
            "params": [tx_base64, {"encoding": "base64"}]
        }

        agave, firedancer = await asyncio.gather(
            client.post(AGAVE_RPC, json=payload),
            client.post(FIREDANCER_RPC, json=payload),
        )

    a_result = agave.json()["result"]["value"]
    f_result = firedancer.json()["result"]["value"]

    diverged = (
        a_result.get("err") != f_result.get("err") or
        abs(a_result["unitsConsumed"] - f_result["unitsConsumed"]) > 1000
    )

    if diverged:
        print(f"WARNING: CLIENT DIVERGENCE: Agave={a_result['err']}, "
              f"Firedancer={f_result['err']}")

    return {
        "diverged": diverged,
        "agave": a_result,
        "firedancer": f_result,
    }

Key insight: For high-value transactions (liquidations, large swaps, governance votes), simulate against both client implementations before submitting. Treat divergence as a red flag.


5. "Finality = Optimistic Confirmation"

The old assumption: Once a transaction is "confirmed" (a supermajority of stake has voted on its block), it's safe to act on.

Why it breaks: In a multi-client network, optimistic confirmation can be misleading if client diversity is unevenly distributed. If 60% of stake runs Agave and 40% runs Firedancer, a transaction confirmed by Agave validators alone hasn't been validated by a truly independent implementation. A consensus bug in Agave could still cause a rollback.

The defense pattern:

import { Connection, Commitment } from '@solana/web3.js';

async function waitForRobustFinality(
  connection: Connection,
  signature: string,
): Promise<boolean> {
  // Step 1: Wait for "finalized" (supermajority, 2/3+ stake)
  const result = await connection.confirmTransaction(
    signature,
    'finalized' as Commitment,
  );

  if (result.value.err) {
    throw new Error(`Transaction failed: ${JSON.stringify(result.value.err)}`);
  }

  // Step 2: Verify the slot is rooted
  const status = await connection.getSignatureStatus(signature);
  if (status.value?.confirmationStatus !== 'finalized') {
    throw new Error('Transaction not yet rooted');
  }

  // Step 3: For bridges — add a slot buffer
  const txSlot = status.value.slot;
  const SAFETY_BUFFER = 10;

  while (true) {
    const currentSlot = await connection.getSlot('finalized');
    if (currentSlot >= txSlot + SAFETY_BUFFER) break;
    await new Promise(r => setTimeout(r, 400));
  }

  return true;
}

Key insight: Bridges and cross-chain protocols should require finalized commitment level plus a slot buffer. The cost of waiting 15-20 seconds is negligible compared to the cost of a consensus-bug-induced rollback.
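A rough sense of where that 15-20 second figure comes from, assuming a bank is rooted roughly 32 slots behind the tip (an approximation) plus the 10-slot buffer from the pattern above:

```python
ROOT_DEPTH = 32     # approx. slots behind the tip before a bank is rooted
SAFETY_BUFFER = 10  # extra slots, as in the pattern above

for ms_per_slot in (400, 500):  # nominal vs. mildly degraded slot times
    wait_s = (ROOT_DEPTH + SAFETY_BUFFER) * ms_per_slot / 1000
    print(f"{ms_per_slot}ms/slot -> ~{wait_s:.1f}s extra latency")
```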


6. "Error Codes Are Portable"

The old assumption: Custom program error codes (and their wrapping in TransactionError) behave identically across clients.

Why it breaks: Firedancer and Agave may surface different error metadata for the same underlying failure — different log formatting, different wrapping of CPI errors, and potentially different handling of instruction-level error propagation. Bot code that pattern-matches on error strings can break silently.

The defense pattern:

// Define error codes as explicit, documented constants.
// Note: Anchor offsets custom errors by 6000, so the first variant is 6000.
#[error_code]
pub enum ProtocolError {
    #[msg("Oracle price is stale")]
    StaleOracle,          // 6000
    #[msg("Insufficient collateral")]
    Undercollateralized,  // 6001
    #[msg("Rate limit exceeded")]
    RateLimited,          // 6002
    #[msg("Sequence mismatch")]
    SequenceMismatch,     // 6003
}
// Bot-side: parse error codes numerically
function parseProtocolError(err: any): number | null {
  const match = JSON.stringify(err).match(/"Custom":(\d+)/);
  return match ? parseInt(match[1]) : null;
}

Key insight: Numeric error codes from your program are portable. Everything else — log messages, error wrapping, instruction trace formatting — is client-implementation-specific.


7. "Account Data Layout Is Just My Concern"

The old assumption: As long as my program serializes/deserializes correctly, account data layout doesn't matter for security.

Why it breaks: Firedancer's account storage and snapshot system is independent from Agave's. During network upgrades, validator restarts, or snapshot loading, subtle differences in how each client handles account data boundaries, padding, and rent-exempt minimums can surface.

The defense pattern:

// Always use discriminators — Anchor does this automatically
// For raw programs, implement it manually:

pub const ACCOUNT_DISCRIMINATOR: [u8; 8] = [0x01, 0x02, 0x03, 0x04,
                                            0x05, 0x06, 0x07, 0x08];

pub fn deserialize_checked(data: &[u8]) -> Result<MyState> {
    // Note: mem::size_of is not the Borsh-serialized size; check only the
    // discriminator prefix here and let try_from_slice reject short data
    require!(data.len() >= 8, ErrorCode::InvalidAccountData);

    require!(data[..8] == ACCOUNT_DISCRIMINATOR,
             ErrorCode::InvalidDiscriminator);

    let state = MyState::try_from_slice(&data[8..])?;

    require!(state.version <= CURRENT_VERSION,
             ErrorCode::UnsupportedVersion);

    Ok(state)
}

Key insight: Use Anchor's built-in discriminator system. If you're writing raw programs, implement 8-byte discriminators, version fields, and post-deserialization invariant checks.
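Off-chain readers should apply the same checks. A minimal Python sketch for an indexer or bot, assuming a hypothetical account layout of an 8-byte discriminator, a u8 version, and a u64 field:

```python
import struct

DISCRIMINATOR = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
CURRENT_VERSION = 2
LAYOUT = "<BQ"  # u8 version + u64 value, little-endian

def deserialize_checked(data: bytes) -> dict:
    """Validate discriminator and version before trusting the layout."""
    if len(data) < 8 + struct.calcsize(LAYOUT):
        raise ValueError("account data too short")
    if data[:8] != DISCRIMINATOR:
        raise ValueError("invalid discriminator")
    version, value = struct.unpack_from(LAYOUT, data, 8)
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported version {version}")
    return {"version": version, "value": value}

raw = DISCRIMINATOR + struct.pack(LAYOUT, 1, 42)
print(deserialize_checked(raw))
```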


The Multi-Client Security Audit Checklist

Before Firedancer adoption reaches critical mass, audit your programs against this checklist:

| # | Check | Risk Level |
|---|-------|------------|
| 1 | No hardcoded slot-to-time conversions | Critical |
| 2 | State transitions are order-independent or explicitly sequenced | Critical |
| 3 | Fees/refunds don't depend on metered CU consumption | High |
| 4 | Bot infrastructure simulates against both clients | High |
| 5 | Bridge/cross-chain ops use finalized + slot buffer | Critical |
| 6 | Error handling uses numeric codes, not string matching | High |
| 7 | All accounts use discriminators and version fields | High |
| 8 | Oracle freshness uses unix timestamps, not slot math | Critical |
| 9 | No implicit dependency on leader client implementation | High |
| 10 | Integration tests run against both Agave and Firedancer test validators | Medium |
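Checklist item 1 is easy to pre-screen mechanically. A rough lint sketch (the regex is a heuristic and will need tuning for your codebase) that flags likely slot-to-time arithmetic:

```python
import re

# Heuristic: slot variables multiplied by a constant, or helper names that
# imply slot-to-time conversion
SUSPECT = re.compile(
    r"slot\w*\s*\*\s*\d|\d+\s*\*\s*slot|slots?_to_(secs|ms)",
    re.IGNORECASE,
)

def scan(source: str) -> list[int]:
    """Return 1-based line numbers that look like hardcoded slot-to-time math."""
    return [i for i, line in enumerate(source.splitlines(), 1)
            if SUSPECT.search(line)]

sample = """let eta_ms = remaining_slots * 400;
let fresh = clock.unix_timestamp - last_update < MAX_AGE;
let secs = slots_to_secs(n);"""
print(scan(sample))  # lines 1 and 3 look suspicious
```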

Setting Up Multi-Client Testing

Here's the minimum viable CI configuration:

# .github/workflows/multi-client-test.yml
name: Multi-Client Security Tests
on: [push, pull_request]

jobs:
  test-agave:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: coral-xyz/setup-anchor@v3
        with:
          anchor-version: '0.30.1'
          solana-version: '1.18.26'
      - run: anchor test

  test-firedancer:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: coral-xyz/setup-anchor@v3
        with:
          anchor-version: '0.30.1'
      - run: |
          # NOTE: the install/start commands below are illustrative; follow
          # the Firedancer project's current docs for real setup
          curl -sSf https://install.firedancer.io | sh
          fdctl test-validator &
          sleep 5
          anchor test --skip-local-validator

  compare-results:
    needs: [test-agave, test-firedancer]
    runs-on: ubuntu-latest
    steps:
      - run: echo "Both clients passed"

The Bigger Picture: Lessons From Ethereum

Ethereum's multi-client journey offers clear warnings:

  • The Prysm dominance problem (2021-2022): When one client ran >66% of validators, a bug in that client could have caused irreversible finality failures.
  • The consensus bug of October 2023: Besu and Nethermind briefly diverged on state root calculations, causing temporary chain splits.

Solana is entering this same phase. The difference is that Solana's DeFi protocols are often more latency-sensitive than Ethereum's, making multi-client divergence more dangerous for real-time operations like liquidations and oracle updates.

The silver lining: Client diversity makes the network dramatically more resilient to single-implementation bugs. Every outage Solana experienced before Firedancer was fundamentally a monoculture problem. The multi-client era fixes the existential risk — but only if DeFi protocols update their assumptions to match.


TL;DR

Firedancer is great for Solana. But it invalidates seven assumptions that most DeFi programs silently depend on:

  1. Slot timing — use unix timestamps, not slot math
  2. Transaction ordering — make logic order-independent
  3. CU costs — use fixed fee schedules
  4. RPC accuracy — simulate against both clients
  5. Finality — use finalized + buffer for high-value ops
  6. Error codes — match numerically, not by string
  7. Data layout — use discriminators and versioning

The protocols that audit for multi-client correctness now will survive the transition. The ones that don't will discover these bugs the expensive way.


This is part of my DeFi Security Research series. I audit smart contracts and build security tooling. If your protocol hasn't been audited for multi-client compatibility, now is the time.
