1kto1m

Posted on Jun 22

How I Measure Strategy Performance in My Polymarket Trading Bot

#polymarket #programming #analytics #trading

After 11,717 trades across five months, I realized I had been measuring the wrong things.

Win rate looked good. P&L was positive. The bot was running 24/7 without crashing.

But I had no idea whether the edge was still there or slowly disappearing.

This post is about the metrics that actually matter, what I was tracking wrong, and how I rebuilt the analytics layer to detect edge decay before it hurt the account.

For more detail about the bot's strategy and architecture, see the earlier articles in this series.

The metric I trusted too early

When the bot started making money, I watched one number: total P&L.

That was a mistake.

P&L tells you what happened. It does not tell you why, or whether it will keep happening.

A bot can show positive P&L while the underlying edge is already gone — if it is running on momentum from earlier trades, slowly draining a reserve, or just getting lucky across a short window.

I needed a different layer of measurement entirely.

1. Break-even win rate, not raw win rate

The first thing I stopped reporting was raw win rate in isolation.

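A 75% win rate sounds good. But each share pays out $1 if it wins, so at an average entry price of 75¢ a win earns 25¢ and a loss costs 75¢. At a 75% win rate that nets to zero (0.75 × 25¢ − 0.25 × 75¢ = 0), before fees. There is no profit. There is no edge.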

Win rate only means something relative to average entry price.

What I track now:

break_even_rate = average_entry_price
actual_win_rate = wins / total_trades
edge = actual_win_rate - break_even_rate

If edge is positive, the strategy is working.
If edge drifts toward zero, the strategy is failing — even if raw win rate looks fine.

Key insight:
Win rate without entry price is a vanity metric.
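
As a minimal sketch (not the bot's actual code), the edge calculation above could look like this in Python. It assumes every share pays out $1 on a win, prices are stored in dollars (0.75 means 75¢), and every trade is the same size; the `Trade` class and `edge` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    entry_price: float  # price paid per share in dollars (0.75 == 75 cents)
    won: bool           # True if the market resolved in the position's favor


def edge(trades):
    """Return (actual_win_rate, break_even_rate, edge) for resolved trades.

    Assumes equal position size per trade and a $1 payout per winning share,
    so the break-even win rate equals the average entry price.
    """
    if not trades:
        raise ValueError("need at least one resolved trade")
    break_even_rate = sum(t.entry_price for t in trades) / len(trades)
    actual_win_rate = sum(t.won for t in trades) / len(trades)
    return actual_win_rate, break_even_rate, actual_win_rate - break_even_rate


# Made-up sample: three wins and one loss, all entered at 75 cents
sample = [Trade(0.75, True)] * 3 + [Trade(0.75, False)]
print(edge(sample))  # (0.75, 0.75, 0.0)
```

With unequal position sizes, a size-weighted average entry price would be the more faithful break-even line.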

2. Expected value per trade

Raw P&L hides trade quality behind trade volume.

A bot making $0.001 per trade on 11,000 trades looks the same as a bot making $0.05 per trade on 220 trades when you just look at total P&L.

They are completely different strategies.

What I calculate:

ev_per_trade = total_pnl / total_trades

For my bot across 11,717 trades:

$292 / 11,717 = $0.0249 per trade

That is the number I watch. Not total P&L.

If EV per trade starts dropping week over week while trade volume holds constant, something is wrong: either entry quality is slipping or the market is repricing faster than the signal.

Key insight:
EV per trade is the cleanest single measure of strategy health.
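
The arithmetic above is small enough to check directly. This sketch uses a hypothetical `ev_per_trade` helper to reproduce the two-bot comparison and the all-time figure quoted in this section.

```python
def ev_per_trade(total_pnl: float, total_trades: int) -> float:
    """Average net P&L per trade, in dollars."""
    if total_trades <= 0:
        raise ValueError("total_trades must be positive")
    return total_pnl / total_trades


# Two bots with the same $11 total P&L but very different trade quality
high_volume = ev_per_trade(11.0, 11_000)  # about $0.001 per trade
low_volume = ev_per_trade(11.0, 220)      # $0.05 per trade

# The all-time figure from this post
print(f"{ev_per_trade(292, 11_717):.4f}")  # 0.0249
```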

3. Rolling edge window

Total EV per trade is useful but slow to signal problems.

If the edge disappears in week 18, averaging across all 18 weeks buries the signal.

I added a rolling 7-day EV window alongside the all-time figure.

What I watch:

ev_7d = pnl_last_7_days / trades_last_7_days
ev_30d = pnl_last_30_days / trades_last_30_days
ev_all = total_pnl / total_trades

When ev_7d drops significantly below ev_30d, that is an early warning sign.

When ev_7d goes negative while ev_30d is still positive, the bot pauses automatically for review.

Key insight:
Short-window EV detects edge decay weeks before it shows up in total P&L.
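
One way the rolling windows and the pause rule could be sketched in Python is shown below. The trade format (a list of `(timestamp, net_pnl)` tuples), the function names, and the `warn_ratio` of 0.5 standing in for "significantly below" are all assumptions for illustration, not values taken from the bot.

```python
from datetime import datetime, timedelta


def rolling_ev(trades, now, days):
    """Average net P&L per trade over the last `days` days, or None if empty."""
    cutoff = now - timedelta(days=days)
    window = [pnl for ts, pnl in trades if cutoff < ts <= now]
    return sum(window) / len(window) if window else None


def edge_status(trades, now, warn_ratio=0.5):
    """Classify strategy health from the 7-day and 30-day EV windows."""
    ev_7d = rolling_ev(trades, now, 7)
    ev_30d = rolling_ev(trades, now, 30)
    if ev_7d is None or ev_30d is None:
        return "no_data"
    if ev_7d < 0 < ev_30d:
        return "pause"  # short window negative while long window still positive
    if ev_30d > 0 and ev_7d < warn_ratio * ev_30d:
        return "warn"   # short window well below the long window
    return "ok"


# Made-up sample: ten good trades 20 days ago, four losing trades 2 days ago
now = datetime(2024, 6, 22)
history = [(now - timedelta(days=20), 0.10)] * 10
history += [(now - timedelta(days=2), -0.05)] * 4
print(edge_status(history, now))  # pause
```

Returning a status string instead of pausing directly keeps the measurement separate from whatever the execution layer does with it.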

4. Win rate by entry price bucket

Not all entries are equal.

A trade entered at 70¢ has a different break-even line than one entered at 83¢. Mixing them into a single win rate hides which part of the strategy is working and which is not.

I split trades into price buckets and track win rate independently for each.

Buckets:

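70–74¢ → break-even win rate between 70% and 74%
75–79¢ → break-even win rate between 75% and 79%
80–83¢ → break-even win rate between 80% and 83%

The exact break-even line for a bucket is the average entry price of the trades in it. If win rate in the 80–83¢ bucket drops below 80%, that bucket is certainly losing money, even if overall win rate looks fine because the 70–74¢ bucket is carrying it.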

This is where most of my signal quality problems first appeared.

Key insight:
Aggregate win rate hides which entry prices are actually profitable.
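
A rough sketch of the bucket split, assuming resolved trades are available as `(entry_cents, won)` tuples with the entry price in whole cents. The bucket boundaries come from this section; the function name and data format are hypothetical.

```python
BUCKETS = [(70, 74), (75, 79), (80, 83)]  # inclusive entry-price ranges in cents


def bucket_stats(trades):
    """Per-bucket win rate, break-even rate, and edge for (entry_cents, won) trades."""
    stats = {}
    for lo, hi in BUCKETS:
        in_bucket = [(cents, won) for cents, won in trades if lo <= cents <= hi]
        if not in_bucket:
            continue
        n = len(in_bucket)
        win_rate = sum(won for _, won in in_bucket) / n
        # Break-even is the average entry price, as a fraction of the $1 payout
        break_even = sum(cents for cents, _ in in_bucket) / n / 100
        stats[f"{lo}-{hi}c"] = {
            "trades": n,
            "win_rate": win_rate,
            "break_even": break_even,
            "edge": win_rate - break_even,
        }
    return stats


# Made-up sample: both buckets win 80% of the time, but only one is profitable
sample = [(72, True)] * 8 + [(72, False)] * 2 + [(82, True)] * 8 + [(82, False)] * 2
for label, row in bucket_stats(sample).items():
    print(label, row)
```

In the sample, an 80% win rate is comfortably profitable at 72¢ and slightly negative at 82¢, which is the effect an aggregate win rate hides.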

5. Win rate by coin pair

XRP accounts for about 44% of open positions. BNB is another 19%.

If XRP momentum signals deteriorate (for example, because more competition enters that specific market), the overall win rate drops, but the cause is invisible unless you track pairs separately.

What I track:

win_rate_XRP, ev_XRP
win_rate_BNB, ev_BNB
win_rate_ETH, ev_ETH
win_rate_SOL, ev_SOL
win_rate_BTC, ev_BTC
win_rate_DOGE, ev_DOGE

When one pair starts underperforming, I can reduce allocation to that pair without shutting down the whole bot.

Key insight:
Pair-level analytics let you tune allocation without full strategy changes.
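
Grouping by pair is a small amount of code. The sketch below assumes resolved trades arrive as `(pair, won, net_pnl)` tuples; the function names and sample numbers are illustrative only.

```python
from collections import defaultdict


def pair_stats(trades):
    """Win rate and EV per trade for each pair, from (pair, won, net_pnl) tuples."""
    grouped = defaultdict(list)
    for pair, won, net_pnl in trades:
        grouped[pair].append((won, net_pnl))
    stats = {}
    for pair, rows in grouped.items():
        n = len(rows)
        stats[pair] = {
            "trades": n,
            "win_rate": sum(won for won, _ in rows) / n,
            "ev": sum(pnl for _, pnl in rows) / n,
        }
    return stats


def worst_pair(stats):
    """Name of the pair with the lowest EV per trade."""
    return min(stats, key=lambda pair: stats[pair]["ev"])


# Made-up sample: XRP breaks even, DOGE loses money
sample = [("XRP", True, 0.25)] * 3 + [("XRP", False, -0.75)]
sample += [("DOGE", True, 0.25), ("DOGE", False, -0.75)]
print(worst_pair(pair_stats(sample)))  # DOGE
```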

6. Fee drag tracking

My strategy only works because I use limit orders to earn maker rebates instead of paying taker fees.

That assumption is easy to forget to verify.

What I log per trade:

gross_pnl = raw outcome
fee = maker_rebate or taker_fee (signed: rebate positive, taker fee negative)
net_pnl = gross_pnl + fee

I run a weekly check on sum(fee) across all trades. If taker fees start appearing consistently, something in the execution layer is wrong — the bot is falling back to market orders somewhere.

At 78 trades per day, a switch from maker rebate to taker fee can flip the strategy from profitable to negative without any change in win rate.

Key insight:
Fee drag is invisible in gross P&L but can eliminate the entire edge.
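
The weekly fee check could be sketched as follows. It assumes each trade is logged as a dict with `gross_pnl` and a signed `fee` (positive for a maker rebate, negative for a taker fee); the `max_taker_share` threshold is a hypothetical parameter, not a value from the bot.

```python
def fee_report(trades, max_taker_share=0.0):
    """Summarize signed fees and flag trades that paid taker fees."""
    total_fee = sum(t["fee"] for t in trades)
    net_pnl = sum(t["gross_pnl"] + t["fee"] for t in trades)
    taker_trades = sum(1 for t in trades if t["fee"] < 0)
    taker_share = taker_trades / len(trades) if trades else 0.0
    return {
        "total_fee": total_fee,
        "net_pnl": net_pnl,
        "taker_trades": taker_trades,
        "taker_share": taker_share,
        # Any taker fee above the allowed share suggests a market-order fallback
        "alert": taker_share > max_taker_share,
    }


# Made-up week: two maker fills and one fill that paid a taker fee
week = [
    {"gross_pnl": 0.25, "fee": 0.01},
    {"gross_pnl": -0.75, "fee": 0.01},
    {"gross_pnl": 0.25, "fee": -0.02},
]
print(fee_report(week)["alert"])  # True
```

Counting taker fills separately from summing fees matters because a few taker fees can be masked by rebates in the total.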

7. Stuck position tracking

Some positions from May still had not resolved by mid-June.

Those are not losses. But they are not counted in win rate either.

If I ignore them, my win rate calculation becomes optimistic — it only counts resolved trades, which skews toward the cleaner outcomes.

What I track:

open_duration = now - trade_timestamp (for unresolved positions)
positions_stuck_over_48h = count where open_duration > 48h
capital_locked = sum(cost) where open_duration > 48h

When capital_locked climbs above a threshold, it flags for manual review. Stuck positions are an operational risk that does not appear in any P&L figure.

Key insight:
Capital locked in stuck positions is a real cost that P&L ignores entirely.
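
A minimal sketch of the stuck-position check, assuming open positions are available as `(trade_timestamp, cost)` tuples. The 48-hour cutoff comes from this section; `capital_limit` is a hypothetical threshold, since the actual value is not given here.

```python
from datetime import datetime, timedelta

STUCK_AFTER = timedelta(hours=48)


def stuck_report(open_positions, now, capital_limit):
    """Count unresolved positions older than 48h and the capital locked in them."""
    stuck = [(ts, cost) for ts, cost in open_positions if now - ts > STUCK_AFTER]
    capital_locked = sum(cost for _, cost in stuck)
    return {
        "positions_stuck_over_48h": len(stuck),
        "capital_locked": capital_locked,
        "needs_review": capital_locked > capital_limit,
    }


# Made-up sample: two positions past the cutoff, one recent
now = datetime(2024, 6, 15)
positions = [
    (now - timedelta(hours=72), 10.0),
    (now - timedelta(hours=10), 5.0),
    (now - timedelta(days=20), 7.5),
]
print(stuck_report(positions, now, capital_limit=15.0))
```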

8. The dashboard I actually use

Every morning I look at five numbers, in this order:

ev_7d - is the short-window edge positive?
win_rate_7d vs break_even_rate - is the edge above the break-even line?
fee_drag_7d - are maker rebates holding or slipping toward taker fees?
capital_locked - how much is sitting in stuck positions?
worst_pair_ev_7d - which pair is pulling down the average?

If all five look healthy, the bot runs without intervention.

If any one of them crosses a threshold, I review before the next trading session.

That is the full picture. Not P&L. Not total win rate. Those five numbers.
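
The morning check can be expressed as one function over those five numbers. This is a sketch under assumptions: the metric names follow the list above, `fee_drag_7d` is taken to be the sum of signed fees over seven days (negative means net taker fees), and the default thresholds are placeholders rather than the bot's real limits.

```python
def morning_check(metrics, fee_floor=0.0, capital_limit=50.0):
    """Return the names of dashboard metrics that crossed a threshold."""
    flags = []
    if metrics["ev_7d"] <= 0:
        flags.append("ev_7d")
    if metrics["win_rate_7d"] <= metrics["break_even_rate"]:
        flags.append("win_rate_7d")
    if metrics["fee_drag_7d"] < fee_floor:
        flags.append("fee_drag_7d")
    if metrics["capital_locked"] > capital_limit:
        flags.append("capital_locked")
    if metrics["worst_pair_ev_7d"] < 0:
        flags.append("worst_pair_ev_7d")
    return flags


# Made-up snapshot in which everything is within limits
snapshot = {
    "ev_7d": 0.02,
    "win_rate_7d": 0.80,
    "break_even_rate": 0.76,
    "fee_drag_7d": 0.5,
    "capital_locked": 10.0,
    "worst_pair_ev_7d": 0.005,
}
print(morning_check(snapshot))  # []
```

An empty list means the bot can keep running; any entry names the metric to review first.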

What changed after building this

Before proper analytics, I was flying blind with a positive balance.

After:

I caught one pair (DOGE) underperforming for 11 days before it showed up in total P&L
I caught a one-day slip into taker fees caused by an execution bug
I identified that my 80-83¢ entries were marginally negative and tightened the entry filter

None of those would have been visible in a simple P&L chart.

Closing thought

A bot that makes money and a bot with a real edge are not the same thing.

The difference shows up in the metrics you choose to look at.

Next in the series: how I handle the Chainlink oracle gap - the small but meaningful difference between Binance spot price and the oracle price that actually settles each market.

The full bot code is on GitHub: github.com/Duclos76/confidence-surfing-bot

Top comments (3)

Hiren Kava • Jun 22

Excellent breakdown—especially the distinction between raw win rate and win rate relative to entry price. The rolling EV, price-bucket analysis, and explicit fee tracking turn the dashboard from a reporting tool into an actual risk-control system. I would also consider tracking EV relative to capital deployed, since EV per trade can hide differences in position size and capital duration.

Tom • Jun 22

Brilliant post. The distinction between raw win rate and break-even win rate relative to entry price is something 90% of retail algo traders completely miss.

The fee_drag metric tracking is especially interesting. A fallback to market orders silently destroying an edge is a nightmare scenario.

How do you currently visualize this on your morning dashboard? Are you just printing logs/tables, or plotting the EV rolling windows on a timeline? We've been working on an open-source Canvas charting library (Exeria Charts) specifically because rendering custom time-series data like "Rolling EV vs Price" usually chokes standard charting tools.

Really appreciate the transparency in this breakdown!

1kto1m • Jun 22

[Image: trading analytics chart with EV rolling window]