The Bug That Cost Me Three Weeks: Why Your SL/TP Logic Is Probably Wrong

#algorithmictrading #rust #trading #engineering

This is the story of a production bug I fixed, turned into a book. It's also why most algorithmic traders fail.

Every algorithmic trader thinks they understand stop-loss and take-profit (SL/TP). Most are wrong. Not subtly wrong — catastrophically wrong in ways that don't show up in backtesting but destroy live systems.

This is the opening chapter of my second book, and it's the reason I wrote the whole series.

The Naive Implementation

Here's what the original code looked like in our production system:

use rust_decimal::Decimal; // assumption: Decimal comes from the rust_decimal crate

struct Position {
    entry_price: Decimal,
    sl_price: Decimal,
    tp_price: Decimal,
    sl_triggered: bool,
    tp_triggered: bool,
}

fn check_sl_tp(position: &mut Position, current_price: Decimal) {
    if current_price <= position.sl_price && !position.sl_triggered {
        position.sl_triggered = true;
        close_position(position);
    }

    if current_price >= position.tp_price && !position.tp_triggered {
        position.tp_triggered = true;
        close_position(position);
    }
}

Looks reasonable, right? Check price against SL/TP levels, set a flag, close if triggered.

It failed in production. Here's why.

Failure Mode 1: Flag-Based Checking Doesn't Track What Actually Happened

The problem with sl_triggered: bool is that it tells you that something happened, but not what actually happened.

Consider this sequence in a fast-moving market:

T=0:    Price at $105.00, position long with SL at $100.00
T=1:    Price drops to $99.95 (below SL!)
T=2:    Your check runs, sets sl_triggered = true
T=3:    Your system submits close order at $99.95
T=4:    Price bounces back to $100.50
T=5:    Exchange confirms: fill at $99.95

Your code set sl_triggered = true when price crossed $100.00. The exchange filled you at $99.95. Your flag doesn't tell you what fill price you actually got.

More critically: In step T=4, before the exchange confirmed the fill, your code thought "SL triggered, position closed." But the position wasn't actually closed yet — it was in flight.

This is the state vs event confusion. Your flag tracks an event (trigger), not a state (position actually closed).

Failure Mode 2: Re-Entry on the Next Tick

Here's where it gets really bad. After the SL triggers:

T=10:   Price moves back up to $102.00
T=11:   Your strategy sees "price is $102, no open position"
T=12:   Strategy decides to re-enter long
T=13:   New position opened at $102.00
T=14:   Price drops again to $99.90
T=15:   Another SL triggered, another loss

Your system has no memory that the previous close was an SL close. It just sees "no position, price looks good, buy."

This is the re-entry problem, and it can end up costing more than the original loss, because every re-entry can run into the same stop again.
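One way to give the system that missing memory is to record why the last position closed and to gate new entries on it. The following is a minimal sketch, not the production fix: `CloseReason`, `LastClose`, `reentry_allowed`, and the cooldown parameter are all hypothetical names, and the idea of a fixed cooldown after a stop-loss is an assumed policy chosen only for illustration.

```rust
// Hypothetical sketch: none of these names come from the original system.
// Timestamps are plain u64 nanoseconds to keep the example stdlib-only.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloseReason {
    StopLoss,
    TakeProfit,
    Manual,
}

/// What the strategy remembers about the most recent close.
#[derive(Debug, Clone, Copy)]
struct LastClose {
    reason: CloseReason,
    closed_at_ns: u64,
}

/// Returns true if a new entry is allowed at `now_ns`.
/// Only stop-loss closes start a cooldown; `sl_cooldown_ns` is an assumed policy knob.
fn reentry_allowed(last_close: Option<LastClose>, now_ns: u64, sl_cooldown_ns: u64) -> bool {
    match last_close {
        // No previous close: nothing to guard against.
        None => true,
        // Previous close was a stop-loss: wait out the cooldown.
        Some(lc) if lc.reason == CloseReason::StopLoss => {
            now_ns.saturating_sub(lc.closed_at_ns) >= sl_cooldown_ns
        }
        // Take-profit or manual close: no restriction in this sketch.
        Some(_) => true,
    }
}
```

With something like this in place, the strategy at T=11 would see "no open position, but the last close was a stop-loss" and could decline to buy until the cooldown has passed.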

The Correct Mental Model

SL/TP should be state-based, not event-based. Instead of "did we trigger?" think "what should we do given the current state and price?"

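// PartialEq is required for the `position.state != PositionState::Open` comparison in check_sl_tp.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PositionState {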
    Open,           // Position is active, checks are running
    ClosePending,   // Close order submitted, waiting for fill
    Closed,         // Position fully exited, no more checks
}

struct Position {
    entry_price: Decimal,
    sl_price: Decimal,
    tp_price: Decimal,
    state: PositionState,

    // Track the close order, not just the trigger
    close_order_id: Option<u64>,
    close_trigger_price: Option<Decimal>,
    opened_at_ns: u64,
    close_submitted_at_ns: Option<u64>,
    closed_at_ns: Option<u64>,
}

Key difference: The close_order_id field tracks the actual close order, not just a trigger flag. If you have a close order ID, the position is in ClosePending state. If it's filled, the state transitions to Closed.

enum SLTPAction {
    Nothing,
    TriggerSL,
    TriggerTP,
}

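// Pure check for a long position (SL below entry, TP above): returns an action and mutates nothing.
fn check_sl_tp(position: &Position, current_price: Decimal) -> SLTPAction {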
    if position.state != PositionState::Open {
        return SLTPAction::Nothing;
    }

    if current_price <= position.sl_price {
        SLTPAction::TriggerSL
    } else if current_price >= position.tp_price {
        SLTPAction::TriggerTP
    } else {
        SLTPAction::Nothing
    }
}

fn on_action(position: &mut Position, action: SLTPAction, current_price: Decimal) -> Result<(), Error> {
    match action {
        SLTPAction::Nothing => Ok(()),
        SLTPAction::TriggerSL | SLTPAction::TriggerTP => {
            let order_id = submit_close_order(position, current_price)?;
            position.state = PositionState::ClosePending;
            position.close_order_id = Some(order_id);
            position.close_trigger_price = Some(current_price);
            position.close_submitted_at_ns = Some(current_timestamp_ns());
            Ok(())
        }
    }
}
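The code above submits the close order and moves the position to ClosePending, but the ClosePending → Closed transition is only described in prose. A minimal sketch of that last step might look like the following. It uses a trimmed-down `Position` with integer cents instead of Decimal so it stays stdlib-only; `on_close_fill`, `FillError`, and the `close_fill_price` field are hypothetical additions, and the sketch assumes the close fills in one piece (no partial fills).

```rust
// Sketch only: a reduced Position with prices in integer cents.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PositionState {
    Open,
    ClosePending,
    Closed,
}

#[derive(Debug)]
struct Position {
    state: PositionState,
    close_order_id: Option<u64>,
    close_trigger_price: Option<i64>, // cents
    close_fill_price: Option<i64>,    // cents; hypothetical field for this sketch
    closed_at_ns: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
enum FillError {
    NotClosePending,
    UnknownOrder,
}

/// Apply an exchange fill confirmation.
/// The state moves to Closed only if the fill matches the close order we are waiting on.
/// Returns fill price minus trigger price, in cents (negative is worse for a long).
fn on_close_fill(
    position: &mut Position,
    order_id: u64,
    fill_price: i64,
    now_ns: u64,
) -> Result<i64, FillError> {
    if position.state != PositionState::ClosePending {
        return Err(FillError::NotClosePending);
    }
    if position.close_order_id != Some(order_id) {
        return Err(FillError::UnknownOrder);
    }
    // Transition on the confirmed event, and record what actually happened.
    position.state = PositionState::Closed;
    position.close_fill_price = Some(fill_price);
    position.closed_at_ns = Some(now_ns);
    let trigger = position.close_trigger_price.unwrap_or(fill_price);
    Ok(fill_price - trigger)
}
```

Recording the fill price next to the trigger price is what the boolean flag could never do: it keeps "what we asked for" and "what we got" as separate facts.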

Why Backtesting Misses This

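In backtesting, prices are usually bar-based (OHLC). The SL/TP check happens once per bar at the close, and a simple backtester typically treats the resulting fill as instantaneous, so there is no window in which an order is in flight. In live trading, you're checking every tick. A tick-based system might check SL/TP 100 times per second.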

The bug manifests in live trading because:

1. Price crosses SL
2. Your check runs, returns TriggerSL
3. You submit close order
4. Meanwhile, price bounces back above SL
5. Your check runs again, sees price above SL, does nothing
6. But your close order is still pending...

The flag-based approach has no record that a close order is already in flight. Once price is back above SL, nothing stops the rest of the system from trading again.
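The gap between bar-close checks and tick checks can be shown with the numbers from the first timeline ($105.00, $99.95, $100.50, SL at $100.00). This is a small illustrative sketch, with prices in integer cents and two hypothetical helpers, `first_sl_hit` and `bar_close`; it assumes a long position and a bar whose close is simply its last tick.

```rust
/// Index of the first price at or below the stop, if any (long position).
fn first_sl_hit(prices: &[i64], sl_price: i64) -> Option<usize> {
    prices.iter().position(|&p| p <= sl_price)
}

/// Close of a bar built from these ticks: the last tick, if there is one.
fn bar_close(ticks: &[i64]) -> Option<i64> {
    ticks.last().copied()
}
```

Fed the three ticks 10500, 9995, 10050 with a stop at 10000, `first_sl_hit` finds the breach at the second tick. Fed only the bar close (10050), it finds nothing. A backtest that checks once per bar at the close never sees the stop trigger, so it never exercises the code path that handles a pending close.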

The Actual Production Failure

Our original system had this flow:

1. Position opened at $100.00, SL = $98.00, TP = $102.00
2. Price drops to $97.50
3. check_sl_tp() sets sl_triggered = true
4. close_position() called
5. Position state set to "closing" (but not "closed")
6. Order submitted to exchange
7. Network latency = 50ms
8. Price bounces back to $99.00
9. check_sl_tp() runs again — price above SL, does nothing
10. Strategy continues to next tick
11. Exchange confirms fill at $97.50
12. Position is now closed

All good so far. But then:

13. Next tick arrives
14. Strategy sees: "no open position, price is $99, this looks like a buy"
15. New position opened at $99.00
16. Price drops again to $97.50
17. Another SL triggered

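The problem: in production, steps 13-15 ran while the close order was still in flight, that is, before the confirmation in steps 11-12 had actually arrived. The strategy saw "no position" because it treated the "closing" state as flat, even though the exchange had not yet confirmed "closed."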

We fixed this by adding close order tracking — the system now knows that a close is pending and doesn't allow new positions until the close is confirmed.
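As a rough illustration of that rule (not our production code), the entry gate can be reduced to a single predicate over position states. `can_open_new_position` is a hypothetical name, and the sketch assumes the strategy holds at most one position per instrument, so any position that is not confirmed Closed blocks a new entry.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PositionState {
    Open,
    ClosePending,
    Closed,
}

/// True only when every tracked position is confirmed Closed (or there are none).
/// ClosePending counts as "still in the market": the close order may not have filled yet.
fn can_open_new_position(states: &[PositionState]) -> bool {
    states.iter().all(|s| *s == PositionState::Closed)
}
```

The important design choice is that ClosePending blocks entries exactly like Open does. Treating "closing" as "flat" is what let the strategy re-enter in steps 13-15.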

What I Learned

Two things:

1. State machines over flags. Every position should follow a clear state machine: Open → ClosePending → Closed. Transitions happen on confirmed events, not on trigger signals.

2. Backtesting lies to you. The bug never appeared in backtesting because we checked once per bar. In live trading, the race condition happens between ticks. Your backtest looks perfect. Your live account doesn't.

This is Chapter 1 of "The Circuit Breaker Problem" — one of five books in the Trading System Engineering Bundle. All written by an engineer who actually built a production trading engine from scratch. Code templates included.

What's in the bundle:

Order Engine Architecture — FIFO matching, order book data structures
The Circuit Breaker Problem — SL/TP bugs, trailing stops, re-entry prevention
Data Pipeline — TVC3 binary format, ring buffers, VPIN
Risk Management Engineering — commission handling, Kelly criterion, drawdown
Backtest Architecture — look-ahead bias, slippage modeling, walk-forward analysis

$20 per book, $80 for the bundle. Free preview: Book 2, Chapter 1.