SleepyQuant Weekly · 2026W16

#ai #quant #mlx #buildinpublic

This week in paper trading

Round-trips: 464
Win rate: 38.1%
Realized PnL: -34.58 USDT
Net return: +20.23%
Max drawdown: 3.14%
R:R ratio: 0.8
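
One way to read the win rate and R:R together is through the break-even win rate. The sketch below assumes R:R means average win divided by average loss, and it ignores fees and slippage. The function names are illustrative, not part of the engine.

```python
def breakeven_win_rate(rr: float) -> float:
    """Win rate at which expectancy is zero for a given reward-to-risk ratio."""
    return 1.0 / (1.0 + rr)


def expectancy_in_r(win_rate: float, rr: float) -> float:
    """Expected result per trade, in units of the average loss (R)."""
    return win_rate * rr - (1.0 - win_rate)


WIN_RATE = 0.381  # from this week's stats
RR = 0.8          # from this week's stats; assumed to be avg win / avg loss

print(f"break-even win rate: {breakeven_win_rate(RR):.1%}")              # 55.6%
print(f"expectancy per trade: {expectancy_in_r(WIN_RATE, RR):+.3f} R")  # -0.314 R
```

Under those assumptions, a 0.8 R:R needs a win rate near 55.6% to break even, so 38.1% works out to roughly -0.31 R per trade, which points the same way as the negative realized PnL.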

Failure vault: what broke, what changed

Past 7 days · 49 losing trades · total -24.63 USDT

Execution Slippage cluster × 25 across APT/USDT, BNB/USDT, ETH/USDT, LINK/USDT
Technical Failure cluster × 24 across APT/USDT, ARB/USDT, ATOM/USDT, AVAX/USDT
APT/USDT — Execution Slippage × 5 (-1.32 USDT, avg -0.26 per trade)

Strategy adjustments shipped / queued for next week:

[65% conf] Scanner-wide: cut position size by 25% and tighten the stop loss
[60% conf] Global: scan interval 8 → 12 minutes to filter noise
[85% conf] Temporarily remove APT/USDT from scan list for 48 hours
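
The three adjustments above could be expressed as a config change along these lines. This is a minimal sketch, not the engine's actual code: `ScannerConfig`, its fields, and the 100 USDT starting size are hypothetical. The stop-loss change is left out because the new stop level is not stated.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ScannerConfig:
    """Hypothetical config shape; field names are assumptions."""
    position_size_usdt: float
    scan_interval_min: int
    # symbol -> time at which it may be scanned again
    excluded_until: dict = field(default_factory=dict)


def apply_weekly_adjustments(cfg: ScannerConfig, now: datetime) -> ScannerConfig:
    """Return a new config with this week's three adjustments applied."""
    excluded = dict(cfg.excluded_until)
    excluded["APT/USDT"] = now + timedelta(hours=48)  # 48-hour removal
    return replace(
        cfg,
        position_size_usdt=cfg.position_size_usdt * 0.75,  # cut size by 25%
        scan_interval_min=12,                              # 8 -> 12 minutes
        excluded_until=excluded,
    )


def is_scannable(cfg: ScannerConfig, symbol: str, now: datetime) -> bool:
    """A symbol is scannable unless its exclusion window is still open."""
    until = cfg.excluded_until.get(symbol)
    return until is None or now >= until


now = datetime(2026, 4, 20, tzinfo=timezone.utc)  # arbitrary example timestamp
old = ScannerConfig(position_size_usdt=100.0, scan_interval_min=8)
new = apply_weekly_adjustments(old, now)
print(new.position_size_usdt, new.scan_interval_min)
print(is_scannable(new, "APT/USDT", now + timedelta(hours=24)))
```

Storing an expiry time instead of deleting the symbol means the exclusion lapses on its own after 48 hours, with no manual step to restore it.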

News that mattered

🔥 Trending: Bio Protocol (BIO) — Rank #365 (via CoinGecko Trending)
🔥 Trending: Pudgy Penguins (PENGU) — Rank #108 (via CoinGecko Trending)
🔥 Trending: RaveDAO (RAVE) — Rank #33 (via CoinGecko Trending)
🔥 Trending: Based (BASED) — Rank #722 (via CoinGecko Trending)
🔥 Trending: Bitcoin (BTC) — Rank #1 (via CoinGecko Trending)

One operating insight

The main lesson this week is simple: trust the quiet tape.

When the engine scans widely but trades narrowly, that usually means the filters are doing their job. A lower trade count is cheaper than forcing mediocre entries, especially when the failure vault is already pointing at repeat mistakes like noisy confirmation, weak follow-through, or execution drift. The right response is not "make the bot trade more." The right response is to tighten the decision path, preserve RAM for the live stack, and keep publishing the real numbers so the system can keep learning in public.

Stack and infra

The stack right now:

Apple M1 Max, 64GB unified memory
MLX Qwen 3.6 35B-A3B 8-bit quant (primary inference)
A lightweight CLI layer for build-time automation
12 AI agents coordinating in one local process
Binance spot + futures paper trading via ccxt

This week's model swap from 4-bit to 8-bit traded raw decode speed (from about 50 tokens per second down to about 10) for sharper data-aware evaluation. That trade is worthwhile for content quality scoring but less so for high-frequency scan loops, which still rely on cached deterministic signals.
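
To put the decode-speed trade in wall-clock terms, here is a back-of-the-envelope sketch. The 200-token response length is a hypothetical figure chosen for illustration, and the estimate ignores prompt processing time.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to generate a response, ignoring prompt processing."""
    return output_tokens / tokens_per_second


OUTPUT_TOKENS = 200  # hypothetical response length, not a measured value

for label, tps in (("4-bit", 50.0), ("8-bit", 10.0)):
    secs = decode_seconds(OUTPUT_TOKENS, tps)
    print(f"{label}: ~{secs:.0f} s per {OUTPUT_TOKENS}-token response")
# 4-bit: ~4 s per 200-token response
# 8-bit: ~20 s per 200-token response
```

A 5x slowdown per call is easy to absorb in an occasional scoring job, but it adds up quickly inside a loop that runs on every scan.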

If you're building local-first trading systems, hit reply and tell me what you optimize for first: speed, cost, or control. The next issue covers the inverted-control experiment: running the same signal backward on a parallel paper book to test whether the edge is real or whether the bot is anti-correlated with itself.
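
As a rough preview of the idea, an inverted book can be sketched as the same trades with the sign of the gross PnL flipped, while both books still pay costs. Everything here is an assumption for illustration: the function names, the flat per-trade cost, and the sample numbers are hypothetical, and the real experiment may be set up differently.

```python
def book_totals(gross_pnls, cost_per_trade):
    """Net PnL of the forward book and of the book taking the opposite side."""
    forward = sum(g - cost_per_trade for g in gross_pnls)
    inverted = sum(-g - cost_per_trade for g in gross_pnls)
    return forward, inverted


def classify(forward, inverted):
    """With non-negative costs, at most one of the two books can be profitable."""
    if forward > 0:
        return "edge"
    if inverted > 0:
        return "anti-correlated"
    return "costs dominate"


gross = [0.50, -0.30, -0.40]  # hypothetical gross PnL per trade, in USDT
forward, inverted = book_totals(gross, cost_per_trade=0.05)
print(round(forward, 2), round(inverted, 2), classify(forward, inverted))
```

The two totals always sum to minus twice the total cost. When both books lose, the signal carries little information either way and fees plus slippage are doing the damage.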

Compiled from live operating data. Every number in this issue came from the running system, not a deck.