<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: eedgee</title>
    <description>The latest articles on DEV Community by eedgee (@eedgee).</description>
    <link>https://dev.to/eedgee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3964824%2F2e516174-fe20-4e37-b37b-e7d046e5158c.png</url>
      <title>DEV Community: eedgee</title>
      <link>https://dev.to/eedgee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eedgee"/>
    <language>en</language>
    <item>
      <title>Why Your Backtest Is Lying to You — 3 Tests That Catch Lookahead Bias, Overfitting, and Fantasy Fills</title>
      <dc:creator>eedgee</dc:creator>
      <pubDate>Tue, 09 Jun 2026 14:06:22 +0000</pubDate>
      <link>https://dev.to/eedgee/why-your-backtest-is-lying-to-you-3-tests-that-catch-lookahead-bias-overfitting-and-fantasy-2bnc</link>
      <guid>https://dev.to/eedgee/why-your-backtest-is-lying-to-you-3-tests-that-catch-lookahead-bias-overfitting-and-fantasy-2bnc</guid>
      <description>&lt;p&gt;Almost every strategy that dies in production looked &lt;em&gt;great&lt;/em&gt; in a backtest. The backtest wasn't unlucky — it was wrong, in one of three specific, detectable ways. Here's each one, the exact test that catches it, and why your usual metrics never warn you.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Lookahead bias — the silent killer
&lt;/h2&gt;

&lt;p&gt;It's almost never a deliberate &lt;code&gt;shift(-1)&lt;/code&gt;. It hides in subtle places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural indicators computed over the whole series&lt;/strong&gt; — swing highs/lows, pivots, "the trend", regime labels. If the value at bar &lt;em&gt;t&lt;/em&gt; depends on bars after &lt;em&gt;t&lt;/em&gt;, every signal derived from it is contaminated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global-statistic normalization&lt;/strong&gt; — z-scoring with the full-sample mean/std, fitting a scaler on all data.&lt;/li&gt;
&lt;li&gt;
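&lt;strong&gt;Resampling/fills that peek&lt;/strong&gt;: &lt;code&gt;resample('D').last()&lt;/code&gt; labels each day with its start (pandas' default for daily bars), so forward-filling it onto intraday bars with &lt;code&gt;ffill&lt;/code&gt; hands every bar that day's close before it happens; the same goes for using a daily close to trade the same day's open.&lt;/li&gt;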
&lt;li&gt;
&lt;strong&gt;Label leakage in ML&lt;/strong&gt; — targets overlapping features in time; train/test folds sharing information.&lt;/li&gt;
&lt;/ul&gt;
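&lt;p&gt;A cheap unit test for the normalization leak: recompute a feature on a truncated history and check that the overlapping bars are unchanged. The sketch below (numpy/pandas; the function names are illustrative, not from any library) contrasts a full-sample z-score, whose value at bar &lt;em&gt;t&lt;/em&gt; shifts when later bars are appended, with an expanding-window one that only uses data up to &lt;em&gt;t&lt;/em&gt;.&lt;/p&gt;

```python
import numpy as np
import pandas as pd


def zscore_full_sample(x: pd.Series) -> pd.Series:
    """LEAKY: mean and std come from the whole series, future included."""
    return (x - x.mean()) / x.std()


def zscore_causal(x: pd.Series, min_periods: int = 20) -> pd.Series:
    """Causal: bar t is scaled only with data observed up to and including t."""
    mean = x.expanding(min_periods=min_periods).mean()
    std = x.expanding(min_periods=min_periods).std()
    return (x - mean) / std


def leaks_future(feature_fn, x: pd.Series, cut: int) -> bool:
    """True if the first `cut` feature values change once later bars exist."""
    with_future = feature_fn(x).iloc[:cut].to_numpy()
    without_future = feature_fn(x.iloc[:cut]).to_numpy()
    return not np.allclose(with_future, without_future, equal_nan=True)


rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.standard_normal(500).cumsum())

print(leaks_future(zscore_full_sample, prices, cut=250))  # True
print(leaks_future(zscore_causal, prices, cut=250))       # False
```

&lt;p&gt;The same truncation check works for swing points, regime labels, or any fitted scaler: if a feature's past values change when the future is appended, it leaks.&lt;/p&gt;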

&lt;p&gt;&lt;strong&gt;Why metrics don't warn you:&lt;/strong&gt; a leaking backtest produces a &lt;em&gt;beautiful&lt;/em&gt; equity curve — high Sharpe, high win rate, shallow drawdowns. Those numbers can't distinguish a real edge from a leak, because a leak makes them all &lt;em&gt;better&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The test — execution-delay scan:&lt;/strong&gt; re-run the strategy delaying execution by 0, 1, 2, 3 bars.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clean edge:&lt;/strong&gt; Sharpe decays &lt;em&gt;gently and smoothly&lt;/em&gt; — no cliff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lookahead:&lt;/strong&gt; Sharpe is huge at delay 0 (or the illegal delay −1) and &lt;strong&gt;falls off a cliff at delay 1&lt;/strong&gt;, often to ~0 or negative.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The smoothness &lt;em&gt;is&lt;/em&gt; the proof. A vertical drop between delay 0 and 1 is damning.&lt;/p&gt;
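&lt;p&gt;A minimal sketch of the delay scan, assuming you already have a per-bar signal/position series and per-bar returns. The timing convention (signal known at the close of bar &lt;em&gt;t&lt;/em&gt;, so delay 0 is a same-bar fill), the 252-bar annualization, and the 50% "cliff" threshold are assumptions of this example, not a standard.&lt;/p&gt;

```python
import numpy as np
import pandas as pd


def sharpe(r: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe of per-bar returns, zero risk-free rate assumed."""
    r = r.dropna()
    return float(np.sqrt(periods_per_year) * r.mean() / r.std())


def delay_scan(signal: pd.Series, returns: pd.Series, delays=(0, 1, 2, 3)) -> pd.Series:
    """Sharpe of the same signal when execution lags by d bars.

    Assumed timing: signal[t] is known at the close of bar t and returns[t]
    is earned over bar t, so d=0 is a same-bar fill and d=1 trades next bar.
    """
    return pd.Series({d: sharpe(signal.shift(d) * returns) for d in delays})


def has_cliff(scan: pd.Series, ratio: float = 0.5) -> bool:
    """Heuristic flag: Sharpe at delay 1 falls below `ratio` x Sharpe at delay 0."""
    return bool(scan[0] > 0 and ratio * scan[0] > scan[1])


rng = np.random.default_rng(1)
returns = pd.Series(rng.normal(0.0, 0.01, 2000))
leaky = np.sign(returns)  # "knows" the sign of the bar it trades: pure lookahead

scan = delay_scan(leaky, returns)
print(scan.round(2))
print(has_cliff(scan))  # True
```

&lt;p&gt;The deliberately leaky signal shows the signature described above: an absurd Sharpe at delay 0 and roughly nothing from delay 1 on. Run the same scan on your real signal and read the shape, not just the level.&lt;/p&gt;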

&lt;blockquote&gt;
&lt;p&gt;Rule of thumb: always design and report at &lt;strong&gt;delay ≥ 1&lt;/strong&gt;. If your edge needs same-bar execution, it's a leak, not an edge.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  2. Overfitting — the luckiest config, not an edge
&lt;/h2&gt;

&lt;p&gt;The more configurations you tried, the more likely the "winner" is just the luckiest draw. A Sharpe of 2.0 means something very different after 1,000 trials than after 1.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deflated Sharpe Ratio (DSR):&lt;/strong&gt; adjusts your Sharpe for how many configs you tried (plus short samples, skew, fat tails). Brutal and correct — the same track record can show DSR 0.97 as a one-shot and 0.01 once you admit it was the best of 300. Count &lt;em&gt;every&lt;/em&gt; parameter you eyeballed and discarded.&lt;/li&gt;
&lt;li&gt;
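&lt;strong&gt;PBO via CSCV:&lt;/strong&gt; feed it the per-period returns of every config you tried (one column each). It cuts the sample into S equal time blocks and, for every way of choosing half of them as in-sample, picks the in-sample winner and checks where it ranks on the remaining out-of-sample half. PBO is the fraction of splits in which that winner lands at or below the out-of-sample median; around 0.5 or higher means your selection is essentially picking noise.&lt;/li&gt;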
&lt;/ul&gt;
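&lt;p&gt;For reference, here is a compact sketch of PSR and DSR following the formulas in the Bailey and López de Prado papers cited below, using only numpy and the standard library. Sharpe ratios are per-period (not annualized); &lt;code&gt;sr_variance&lt;/code&gt;, the variance of the Sharpe estimates across all the configs you tried, is something you have to supply from your own search log.&lt;/p&gt;

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()
EULER_GAMMA = 0.5772156649015329


def psr(returns, sr_benchmark: float = 0.0) -> float:
    """Probabilistic Sharpe Ratio: P(true SR exceeds sr_benchmark), per-period units."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std(ddof=1)
    z = (r - r.mean()) / r.std(ddof=0)
    skew = np.mean(z**3)
    kurt = np.mean(z**4)  # raw kurtosis: 3.0 for normal returns
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr**2)
    return STD_NORMAL.cdf((sr - sr_benchmark) * math.sqrt(n - 1) / denom)


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Expected best per-period SR among n_trials configs with zero true edge."""
    if n_trials == 1:
        return 0.0
    return math.sqrt(sr_variance) * (
        (1.0 - EULER_GAMMA) * STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )


def deflated_sharpe(returns, n_trials: int, sr_variance: float) -> float:
    """DSR: PSR against the SR you'd expect from the luckiest of n_trials."""
    return psr(returns, expected_max_sharpe(n_trials, sr_variance))


rng = np.random.default_rng(7)
daily = rng.normal(0.0008, 0.01, 1000)  # hypothetical daily returns
print(round(psr(daily), 3))
print(round(deflated_sharpe(daily, n_trials=300, sr_variance=0.001), 3))
```

&lt;p&gt;Passing &lt;code&gt;n_trials=1&lt;/code&gt; makes the benchmark zero and DSR collapses to plain PSR; raising it moves the benchmark up, which is exactly the "best of 300" penalty described above.&lt;/p&gt;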

&lt;blockquote&gt;
&lt;p&gt;See Bailey &amp;amp; López de Prado on PSR/DSR, and Bailey-Borwein-López de Prado-Zhu on PBO.&lt;/p&gt;
&lt;/blockquote&gt;
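&lt;p&gt;A minimal CSCV sketch, assuming a T×N array of per-period returns (one column per config) and using per-period Sharpe as the performance measure. The default of 16 blocks means C(16, 8) = 12,870 splits; fewer blocks run faster at the cost of a noisier estimate.&lt;/p&gt;

```python
from itertools import combinations

import numpy as np


def _sharpe(block: np.ndarray) -> np.ndarray:
    """Per-column (non-annualized) Sharpe of a T x N returns matrix."""
    return block.mean(axis=0) / block.std(axis=0, ddof=1)


def pbo_cscv(returns, n_splits: int = 16) -> float:
    """Probability of Backtest Overfitting via CSCV.

    returns: T x N array, one column of per-period returns per config.
    n_splits: even number S of equal, contiguous time blocks.
    """
    returns = np.asarray(returns, dtype=float)
    n_rows, n_configs = returns.shape
    usable = n_rows - n_rows % n_splits          # trim so blocks are equal
    blocks = np.array_split(returns[:usable], n_splits)
    logits = []
    for is_ids in combinations(range(n_splits), n_splits // 2):
        oos_ids = [i for i in range(n_splits) if i not in is_ids]
        is_perf = _sharpe(np.vstack([blocks[i] for i in is_ids]))
        oos_perf = _sharpe(np.vstack([blocks[i] for i in oos_ids]))
        winner = int(np.argmax(is_perf))                 # in-sample best config
        rank = oos_perf.argsort().argsort()[winner] + 1  # its OOS rank, 1..N
        omega = rank / (n_configs + 1)                   # relative rank in (0, 1)
        logits.append(np.log(omega / (1.0 - omega)))
    # PBO = share of splits where the IS winner sits at or below the OOS median
    return float(np.mean(0.0 >= np.array(logits)))


rng = np.random.default_rng(3)
noise = rng.normal(0.0, 0.01, size=(1000, 50))   # 50 configs, zero true edge
edge = noise.copy()
edge[:, 0] += 0.003                              # config 0 has real drift

print(pbo_cscv(noise, n_splits=8))
print(pbo_cscv(edge, n_splits=8))
```

&lt;p&gt;With a genuine drift in one column, the in-sample winner keeps its rank out of sample and PBO drops toward 0; on pure noise, where the winner lands out of sample is a coin flip.&lt;/p&gt;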

&lt;h2&gt;
  
  
  3. Fantasy fills &amp;amp; understated costs
&lt;/h2&gt;

&lt;p&gt;The most clarifying number: &lt;strong&gt;break-even cost&lt;/strong&gt; — the per-trade cost (bps) at which net Sharpe hits zero. Compare it to what you actually pay:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break-even 102 bps vs real cost 3 bps → robust.&lt;/li&gt;
&lt;li&gt;Break-even 4 bps vs real cost 3 bps → you're trading for your broker.&lt;/li&gt;
&lt;/ul&gt;
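&lt;p&gt;Break-even cost falls out of one line of algebra if costs are linear in turnover: net Sharpe is zero exactly when mean net return is zero, i.e. when cost per unit of turnover equals total gross return divided by total turnover. The sketch assumes a per-period &lt;code&gt;turnover&lt;/code&gt; series (for example &lt;code&gt;position.diff().abs()&lt;/code&gt;) and treats "per-trade cost" as cost per unit of traded notional.&lt;/p&gt;

```python
import pandas as pd


def net_returns(gross: pd.Series, turnover: pd.Series, cost_bps: float) -> pd.Series:
    """Gross returns minus a linear cost of cost_bps per unit of traded notional."""
    return gross - turnover * cost_bps / 1e4


def break_even_cost_bps(gross: pd.Series, turnover: pd.Series) -> float:
    """Cost (bps per unit turnover) at which mean net return, and so net Sharpe, is 0."""
    total_turnover = float(turnover.sum())
    if total_turnover == 0.0:
        return float("inf")
    return 1e4 * float(gross.sum()) / total_turnover


gross = pd.Series([0.002, -0.001, 0.003, 0.0])
turnover = pd.Series([1.0, 0.0, 1.0, 0.0])  # e.g. position.diff().abs()

be = break_even_cost_bps(gross, turnover)
print(round(be, 6))                        # 20.0
real_cost_bps = 3.0
print(f"margin: {be / real_cost_bps:.1f}x")  # margin: 6.7x
```

&lt;p&gt;Dividing break-even by your real all-in cost gives the margin of safety: 102/3 leaves plenty of room, 4/3 does not.&lt;/p&gt;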

&lt;p&gt;High-turnover strategies die here. Futures traders: don't let the backtest fill your roll at the stale settlement price of an illiquid expiring contract — charge a conservative roll spread and confirm fills sit on the liquid contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Out-of-sample discipline that works
&lt;/h2&gt;

&lt;p&gt;A single train/test split is one noisy draw. Use walk-forward: select parameters on each training window, score them on the &lt;em&gt;next, unseen&lt;/em&gt; window, stitch the OOS pieces. The number that matters is the &lt;strong&gt;IS→OOS degradation&lt;/strong&gt; — a real edge degrades a little; an overfit one collapses.&lt;/p&gt;
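&lt;p&gt;The walk-forward loop can be sketched as follows, assuming a DataFrame with one column of per-period returns per config and "best in-sample Sharpe" as the selection rule (both assumptions; substitute your own). It returns the stitched out-of-sample series plus the mean in-sample and out-of-sample Sharpe, whose gap is the IS→OOS degradation.&lt;/p&gt;

```python
import numpy as np
import pandas as pd


def walk_forward(returns: pd.DataFrame, train: int, test: int):
    """Rolling walk-forward over a T x N frame of per-config returns.

    Each step picks the column with the best in-sample Sharpe on `train` bars,
    then scores that choice, untouched, on the next `test` bars.
    Returns (stitched OOS returns, mean IS Sharpe, mean OOS Sharpe).
    """
    pieces, is_scores, oos_scores = [], [], []
    for start in range(0, len(returns) - train - test + 1, test):
        is_win = returns.iloc[start : start + train]
        oos_win = returns.iloc[start + train : start + train + test]
        is_sharpe = is_win.mean() / is_win.std()      # per-column, non-annualized
        best = is_sharpe.idxmax()                     # chosen on IS data only
        chosen = oos_win[best]
        is_scores.append(is_sharpe[best])
        oos_scores.append(chosen.mean() / chosen.std())
        pieces.append(chosen)
    return pd.concat(pieces), float(np.mean(is_scores)), float(np.mean(oos_scores))


rng = np.random.default_rng(11)
configs = pd.DataFrame(rng.normal(0.0, 0.01, size=(1500, 40)))  # zero-edge configs

oos, is_sr, oos_sr = walk_forward(configs, train=500, test=100)
degradation = 1.0 - oos_sr / is_sr
print(f"IS Sharpe {is_sr:.3f} -> OOS Sharpe {oos_sr:.3f} ({degradation:.0%} degradation)")
```

&lt;p&gt;On zero-edge configs the chosen column's in-sample Sharpe is inflated purely by selection while its out-of-sample Sharpe hovers around zero: that gap is what collapse looks like.&lt;/p&gt;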

&lt;h2&gt;
  
  
  The honest pre-deployment checklist
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Build at execution delay ≥ 1; never report same-bar fills.&lt;/li&gt;
&lt;li&gt;Run the delay scan — no smooth decay, stop and find the leak.&lt;/li&gt;
&lt;li&gt;Count your trials; report DSR, not raw Sharpe; run PBO.&lt;/li&gt;
&lt;li&gt;Prefer a plateau parameter over the global peak.&lt;/li&gt;
&lt;li&gt;Charge real costs; confirm break-even beats them with margin.&lt;/li&gt;
&lt;li&gt;Confirm on walk-forward; report IS→OOS degradation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A backtest that passes all of these isn't guaranteed to make money. But one that fails any of them is almost guaranteed to lose it.&lt;/p&gt;




&lt;p&gt;I packaged correct, unit-tested implementations of all of these into a small numpy+pandas kit (PSR, Deflated Sharpe, PBO/CSCV, execution-delay scan, break-even cost, walk-forward) — one call to &lt;code&gt;run_full_validation()&lt;/code&gt; prints a GO / CAUTION / NO-GO verdict. It's strategy-agnostic and never sees your alpha: you pass a returns series, it returns diagnostics.&lt;/p&gt;

&lt;p&gt;If it's useful: &lt;a href="https://924499172462.gumroad.com/l/quant-validation-kit" rel="noopener noreferrer"&gt;https://924499172462.gumroad.com/l/quant-validation-kit&lt;/a&gt;&lt;br&gt;
(The methodology above is enough to self-audit; the kit just runs every test for you in one call.)&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>testing</category>
    </item>
    <item>
      <title>DIY OFAC SDN monitoring for crypto addresses — and where it silently breaks</title>
      <dc:creator>eedgee</dc:creator>
      <pubDate>Tue, 02 Jun 2026 14:33:32 +0000</pubDate>
      <link>https://dev.to/eedgee/diy-ofac-sdn-monitoring-for-crypto-addresses-and-where-it-silently-breaks-20lm</link>
      <guid>https://dev.to/eedgee/diy-ofac-sdn-monitoring-for-crypto-addresses-and-where-it-silently-breaks-20lm</guid>
      <description>&lt;p&gt;If your product touches crypto and you have any AML/sanctions obligation, sooner or later someone asks: &lt;em&gt;"How do we know if an address we interact with lands on the OFAC SDN list?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The reassuring part: the data is &lt;strong&gt;free&lt;/strong&gt;. The U.S. Treasury publishes the Specially Designated Nationals (SDN) list, including the crypto addresses tied to sanctioned entities, as public downloads. Chainalysis even gives away a free sanctions screening API and an on-chain oracle. So the instinct is: &lt;em&gt;I'll just poll it myself.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can. It's also a deceptively deep little pipeline, and the ways it breaks are quiet — which is the dangerous kind. Here's the honest map of building it yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The naive version
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. download the SDN data (XML/CSV from treasury.gov)
# 2. extract the crypto addresses (the "Digital Currency Address" fields)
# 3. compare against the set you saw last time
# 4. if a watched address newly appears (or disappears), alert someone
&lt;/span&gt;
&lt;span class="n"&gt;sdn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_sdn_list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_crypto_addresses&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sdn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# {"XBT": {...}, "ETH": {...}, ...}
&lt;/span&gt;&lt;span class="n"&gt;added&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last_snapshot&lt;/span&gt;
&lt;span class="n"&gt;removed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;last_snapshot&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;my_watched&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;added&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;removed&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a watched address changed on the SDN list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ship it on a cron, done? Not quite. Here's where reality leaks in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it silently breaks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The diff is harder than &lt;code&gt;==&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Addresses don't compare cleanly across chains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ethereum&lt;/strong&gt; addresses appear in mixed case (EIP-55 checksum) in some sources and lowercase in others. &lt;code&gt;0xAbC…&lt;/code&gt; and &lt;code&gt;0xabc…&lt;/code&gt; are the same address; a naive set diff sees two. Normalize to a canonical form &lt;em&gt;per chain&lt;/em&gt; before diffing, or you'll fire false alerts and miss real ones.&lt;/li&gt;
&lt;li&gt;
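&lt;strong&gt;Bitcoin&lt;/strong&gt; is mostly the opposite: case &lt;strong&gt;is&lt;/strong&gt; significant for Base58 addresses (legacy &lt;code&gt;1…&lt;/code&gt; and P2SH &lt;code&gt;3…&lt;/code&gt;), so lowercasing them corrupts them, while bech32 (&lt;code&gt;bc1…&lt;/code&gt;) is case-insensitive and canonically lowercase. On top of that, legacy, P2SH, and bech32 addresses may belong to related holdings.&lt;/li&gt;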
&lt;li&gt;OFAC re-lists and restructures entries. An address can move between SDN entries, or an entity can be re-added under a new listing. If you key your snapshot on the &lt;em&gt;entry&lt;/em&gt; instead of the &lt;em&gt;normalized address&lt;/em&gt;, a reshuffle looks like a churn of adds/removes that aren't real.&lt;/li&gt;
&lt;/ul&gt;
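&lt;p&gt;A minimal sketch of per-chain canonicalization before the set diff, covering only the ETH and XBT tickers used above; other chains pass through unchanged, which is an assumption you would need to revisit per chain. It keys snapshots on &lt;code&gt;(chain, normalized address)&lt;/code&gt; rather than on the SDN entry, and it does not validate checksums.&lt;/p&gt;

```python
def normalize_address(chain: str, address: str) -> tuple[str, str]:
    """Return a (chain, canonical address) key for snapshot diffing.

    ETH: hex addresses are case-insensitive (EIP-55 mixed case is only a
         checksum), so lowercase is a safe canonical form.
    XBT: Base58 legacy/P2SH addresses are case-sensitive and kept verbatim;
         bech32 ("bc1...") is case-insensitive and canonically lowercase.
    Any other chain is passed through as-is (an assumption to revisit per chain).
    """
    chain = chain.strip().upper()
    address = address.strip()
    if chain == "ETH":
        address = address.lower()
    elif chain == "XBT" and address.lower().startswith("bc1"):
        address = address.lower()
    return chain, address


def diff_snapshots(previous: set, current: set) -> tuple[set, set]:
    """(added, removed) between two snapshots of normalized (chain, address) keys."""
    return current - previous, previous - current


# Placeholder addresses; the bech32 one is the BIP-173 example address.
last = {
    normalize_address("ETH", "0xAbC123"),
    normalize_address("XBT", "BC1QW508D6QEJXTDG4Y5R3ZARVARY0C5XW7KV8F3T4"),
}
now = {
    normalize_address("eth", "0xabc123"),
    normalize_address("XBT", "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"),
}
added, removed = diff_snapshots(last, now)
print(added, removed)  # set() set()
```

&lt;p&gt;Because the keys are normalized addresses, an entry reshuffle on OFAC's side that keeps the same addresses produces an empty diff instead of a burst of phantom adds and removes.&lt;/p&gt;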

&lt;h3&gt;
  
  
  2. Delivery is the actual hard part
&lt;/h3&gt;

&lt;p&gt;Detecting the change is maybe 30% of the work. &lt;em&gt;Reliably telling someone&lt;/em&gt; is the other 70%:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A webhook that fails once and isn't retried is a &lt;strong&gt;silent miss&lt;/strong&gt;. Your endpoint was redeploying for 90 seconds; the one alert that mattered fell on the floor.&lt;/li&gt;
&lt;li&gt;No &lt;strong&gt;delivery log&lt;/strong&gt; means you can't answer "were we notified?" — which is exactly the question an examiner or your own incident review will ask.&lt;/li&gt;
&lt;li&gt;Unsigned webhooks mean the receiver can't trust the payload. You want &lt;strong&gt;HMAC-SHA256&lt;/strong&gt; signatures so the other side can verify it's really you.&lt;/li&gt;
&lt;li&gt;The moment you add email and Telegram as channels, each has its own failure modes (bounces, rate limits, bot token expiry) and you're now running three delivery systems.&lt;/li&gt;
&lt;/ul&gt;
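&lt;p&gt;The signing and retry pieces are small with the standard library. In this sketch, &lt;code&gt;send&lt;/code&gt; is a hypothetical transport callable that raises on failure, and the header names, the &lt;code&gt;timestamp.body&lt;/code&gt; signing scheme, and the backoff schedule are illustrative choices rather than any particular provider's format. Every attempt lands in &lt;code&gt;delivery_log&lt;/code&gt;, which is what lets you answer "were we notified?" later.&lt;/p&gt;

```python
import hashlib
import hmac
import json
import time


def sign_payload(secret: bytes, body: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over "timestamp.body"; the receiver recomputes and compares."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, timestamp: int, signature: str) -> bool:
    """Receiver side: constant-time comparison against the recomputed signature."""
    return hmac.compare_digest(sign_payload(secret, body, timestamp), signature)


def deliver_with_retries(send, event: dict, secret: bytes, delivery_log: list,
                         max_attempts: int = 5, base_delay: float = 1.0,
                         sleep=time.sleep) -> bool:
    """Call send(body, headers) with exponential backoff, logging every attempt."""
    body = json.dumps(event, sort_keys=True).encode()
    for attempt in range(1, max_attempts + 1):
        ts = int(time.time())
        headers = {
            "X-Signature-Timestamp": str(ts),          # hypothetical header names
            "X-Signature-SHA256": sign_payload(secret, body, ts),
        }
        try:
            send(body, headers)
            delivery_log.append({"event": event, "attempt": attempt, "status": "delivered"})
            return True
        except Exception as exc:  # any transport error counts as a failed attempt
            delivery_log.append({"event": event, "attempt": attempt,
                                 "status": "failed", "error": repr(exc)})
            if attempt != max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    return False


calls = []


def flaky_send(body: bytes, headers: dict) -> None:
    """Stand-in endpoint that fails twice (say, mid-redeploy), then accepts."""
    calls.append(headers)
    if 3 > len(calls):
        raise ConnectionError("endpoint redeploying")


log = []
event = {"chain": "ETH", "address": "0xabc123", "change": "added"}
ok = deliver_with_retries(flaky_send, event, b"shared-secret", log, sleep=lambda s: None)
print(ok, [entry["status"] for entry in log])  # True ['failed', 'failed', 'delivered']
```

&lt;p&gt;Signing the timestamp together with the body lets the receiver reject stale replays; in production the log would go to durable storage rather than a Python list.&lt;/p&gt;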

&lt;h3&gt;
  
  
  3. The watcher dies and nobody watches the watcher
&lt;/h3&gt;

&lt;p&gt;This is the one that actually bites people. Cron jobs fail silently. Treasury tweaks the XML schema and your parser throws — but only in the logs nobody reads. The poller has been dead for three weeks and everything &lt;em&gt;looks&lt;/em&gt; fine because no news looks identical to good news. You need a &lt;strong&gt;dead-man's switch&lt;/strong&gt;: something that alarms when the pipeline &lt;em&gt;stops&lt;/em&gt; producing, not just when it finds a change.&lt;/p&gt;
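&lt;p&gt;A dead-man's switch can be as small as a heartbeat file: the poller writes a timestamp only after it has fetched &lt;em&gt;and&lt;/em&gt; parsed the list, and a separate check (on a different schedule, ideally on a different host) alarms when that timestamp gets too old. The file location and the three-hour threshold below are placeholder assumptions; wire &lt;code&gt;is_stale()&lt;/code&gt; to whatever pages a human.&lt;/p&gt;

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def record_success(path: Path, now: Optional[float] = None) -> None:
    """Call only at the very end of a run that fetched AND parsed the list."""
    stamp = time.time() if now is None else now
    path.write_text(json.dumps({"last_success": stamp}))


def is_stale(path: Path, max_age_s: float = 3 * 3600, now: Optional[float] = None) -> bool:
    """True if the poller has never succeeded, or not within max_age_s."""
    if not path.exists():
        return True
    last = json.loads(path.read_text())["last_success"]
    current = time.time() if now is None else now
    return current - last > max_age_s


heartbeat = Path(tempfile.mkdtemp()) / "sdn_poller_heartbeat.json"  # placeholder path
print(is_stale(heartbeat))                              # True: never succeeded
record_success(heartbeat, now=1_000_000.0)
print(is_stale(heartbeat, now=1_000_000.0 + 10 * 60))   # False: 10 minutes old
print(is_stale(heartbeat, now=1_000_000.0 + 4 * 3600))  # True: 4 hours old
```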

&lt;h3&gt;
  
  
  4. Freshness vs. politeness
&lt;/h3&gt;

&lt;p&gt;How often do you poll? Too rare and you're stale when it counts; too aggressive and you're hammering a government endpoint. You'll want conditional requests (ETag / If-Modified-Since), sane backoff, and a defensible "we re-check every N" story you can put in front of an auditor.&lt;/p&gt;
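&lt;p&gt;Conditional requests take a few lines with &lt;code&gt;urllib&lt;/code&gt;: send the validators from the previous download and treat HTTP 304 as "nothing changed". Whether a given Treasury endpoint returns &lt;code&gt;ETag&lt;/code&gt; or &lt;code&gt;Last-Modified&lt;/code&gt; headers is something to confirm against the live response; this sketch simply skips whichever one is missing, and &lt;code&gt;SDN_URL&lt;/code&gt; in the usage comment is a stand-in for the download URL you already poll.&lt;/p&gt;

```python
import urllib.error
import urllib.request


def fetch_if_changed(url: str, cache: dict, opener=urllib.request.urlopen,
                     timeout: float = 30.0):
    """Conditional GET: return the new body, or None if the server answers 304.

    `cache` keeps the ETag / Last-Modified validators between runs.
    """
    req = urllib.request.Request(url)
    if cache.get("etag"):
        req.add_header("If-None-Match", cache["etag"])
    if cache.get("last_modified"):
        req.add_header("If-Modified-Since", cache["last_modified"])
    try:
        resp = opener(req, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: nothing to download or diff
            return None
        raise
    try:
        body = resp.read()
        cache["etag"] = resp.headers.get("ETag")
        cache["last_modified"] = resp.headers.get("Last-Modified")
    finally:
        resp.close()
    return body


# Usage (SDN_URL is a placeholder for the list URL you poll):
# cache = {}
# body = fetch_if_changed(SDN_URL, cache)  # None means "not modified"
```

&lt;p&gt;Persist &lt;code&gt;cache&lt;/code&gt; between runs (a small JSON file is enough); a 304 is still a successful poll and should refresh the heartbeat.&lt;/p&gt;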

&lt;h2&gt;
  
  
  What "done right" actually requires
&lt;/h2&gt;

&lt;p&gt;If you build it yourself, get these four things right or don't bother:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent diffing on normalized, per-chain canonical addresses&lt;/strong&gt; — not raw string equality, not entry-keyed snapshots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signed webhooks + retries with backoff&lt;/strong&gt;, plus email/Telegram fan-out that degrades gracefully.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A delivery-status history&lt;/strong&gt; you can point at to prove every detected change was actually dispatched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A dead-man's switch on the pipeline itself&lt;/strong&gt;, because silence is the failure you won't notice.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is exotic. It's just &lt;em&gt;boring, and easy to get 80%-right in a way that fails exactly when it matters.&lt;/em&gt; That gap — between "it runs" and "I'd stake an audit on it" — is the whole job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Or don't build it
&lt;/h2&gt;

&lt;p&gt;I got tired of watching every crypto team rebuild this same plumbing, so I packaged the boring layer as &lt;a href="https://ofacalert.com" rel="noopener noreferrer"&gt;OFAC Alert&lt;/a&gt;: hourly-refreshed SDN data, normalized cross-chain diffing, HMAC-signed webhooks with retries, delivery history, batch screening, and a REST API (&lt;a href="https://api.ofacalert.com" rel="noopener noreferrer"&gt;live docs&lt;/a&gt;). If the piece you actually want is "tell me the moment a watched address changes," that's exactly what its &lt;a href="https://ofacalert.com/ofac-sdn-change-alerts" rel="noopener noreferrer"&gt;OFAC SDN change alerts&lt;/a&gt; do. The free tier monitors one address with no signup gate, so you can see the shape of it.&lt;/p&gt;

&lt;p&gt;To be clear about scope: it is &lt;strong&gt;not&lt;/strong&gt; a Chainalysis/TRM/Elliptic replacement — no risk scoring, no clustering, no enterprise contract. It's the monitoring-and-delivery layer for the free sanctions data, built once so it's reliable and not your problem.&lt;/p&gt;

&lt;p&gt;But honestly — whether you use it or roll your own — get those four things right. The data being free is the easy part. Staking your compliance posture on a cron job is the part that keeps people up at night.&lt;/p&gt;

</description>
      <category>cryptocurrency</category>
      <category>compliance</category>
      <category>webdev</category>
      <category>api</category>
    </item>
  </channel>
</rss>
