<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sean  |   Mnemox</title>
    <description>The latest articles on DEV Community by Sean  |   Mnemox (@mnemox).</description>
    <link>https://dev.to/mnemox</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3786702%2Fbf4cad1f-0803-4d73-8e52-a2549df97374.png</url>
      <title>DEV Community: Sean  |   Mnemox</title>
      <link>https://dev.to/mnemox</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mnemox"/>
    <language>en</language>
    <item>
      <title>I Let AI Invent Its Own Trading Strategies From Scratch. Here's What Happened.</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Mon, 16 Mar 2026 17:31:53 +0000</pubDate>
      <link>https://dev.to/mnemox/i-let-ai-invent-its-own-trading-strategies-from-scratch-heres-what-happened-1e2g</link>
      <guid>https://dev.to/mnemox/i-let-ai-invent-its-own-trading-strategies-from-scratch-heres-what-happened-1e2g</guid>
      <description>&lt;p&gt;&lt;em&gt;By Sean, CEO of Mnemox AI | March 2026&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Every AI trading bot has the same fatal flaw: amnesia.&lt;/p&gt;

&lt;p&gt;There are 200+ trading MCP servers on GitHub right now. They can execute trades, pull market data, calculate indicators. But not a single one remembers what happened yesterday. Every session starts from zero. Every mistake gets repeated. Every lesson gets lost.&lt;/p&gt;

&lt;p&gt;I spent two days running an experiment to fix this — and ended up discovering something I didn't expect at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deeper Question
&lt;/h2&gt;

&lt;p&gt;The memory problem is real, but it's actually the &lt;em&gt;second&lt;/em&gt; problem. The first one is more fundamental: why are we teaching AI how to trade at all?&lt;/p&gt;

&lt;p&gt;Think about it. Every trading bot — from simple moving average crossovers to sophisticated ML systems — starts with a human saying "here's a strategy, go execute it." The human does the thinking. The AI does the labor. And when the strategy stops working (which it always does), the human has to go back, analyze what went wrong, redesign the strategy, and re-deploy.&lt;/p&gt;

&lt;p&gt;What if we skipped the human part entirely?&lt;/p&gt;

&lt;p&gt;Not "use machine learning to optimize parameters." I mean: give AI raw price data, give it persistent memory, give it &lt;em&gt;no&lt;/em&gt; strategies whatsoever, and see if it can invent its own from scratch.&lt;/p&gt;

&lt;p&gt;The idea isn't new. Google's AlphaEvolve uses evolutionary algorithms to discover novel solutions. The Ouroboros paper explored self-modifying agents. AZR (Absolute Zero Reasoner) showed that AI can bootstrap its own training data. DGM proposed Darwinian selection for agent populations. But nobody had applied this loop — observe, hypothesize, test, eliminate, evolve — to trading with persistent memory across sessions.&lt;/p&gt;

&lt;p&gt;My hypothesis: &lt;strong&gt;an AI with memory and the freedom to fail will converge on real market structure faster than any hand-coded strategy.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The $0 Experiment
&lt;/h2&gt;

&lt;p&gt;I started with the cheapest possible test — no trading capital, just API calls. Three months of BTC/USDT hourly candles (2,184 bars, December 2025 to March 2026). A bear market — BTC dropped 16% during this period.&lt;/p&gt;

&lt;p&gt;I fed this raw data to Claude with a single instruction: &lt;em&gt;"You don't know any technical indicators. Describe what you see in your own words."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No RSI. No MACD. No Bollinger Bands. Just price, volume, open, high, low, close.&lt;/p&gt;
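&lt;p&gt;The post doesn't show its ingestion code, but the shape of the input matters: six fields per bar and nothing else. A minimal sketch (the function name is mine) converting Binance-style kline rows into those bare records:&lt;/p&gt;

```python
# Hypothetical sketch: the actual ingestion code isn't shown in the post.
# Binance-style kline rows are plain arrays; this turns them into the bare
# OHLCV records the model sees. No indicators attached.
def klines_to_ohlcv(rows):
    """Convert raw kline arrays [open_time, open, high, low, close, volume, ...]
    into plain dicts: just price and volume, nothing derived."""
    records = []
    for row in rows:
        records.append({
            "time": int(row[0]),
            "open": float(row[1]),
            "high": float(row[2]),
            "low": float(row[3]),
            "close": float(row[4]),
            "volume": float(row[5]),
        })
    return records

sample = [[1733011200000, "95000.1", "95400.0", "94800.5", "95210.3", "1234.5"]]
print(klines_to_ohlcv(sample)[0]["close"])  # 95210.3
```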

&lt;p&gt;It came back with seven patterns, each with its own name:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Breathing&lt;/strong&gt; (呼吸) — periodic expansion/contraction cycles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Giant Wave&lt;/strong&gt; (巨浪) — outsized candles that appear at turning points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staircase&lt;/strong&gt; (階梯) — sequential directional moves&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fake Door&lt;/strong&gt; (假門) — false breakouts that reverse&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exhaustion&lt;/strong&gt; (枯竭) — declining momentum at trend ends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tide&lt;/strong&gt; (潮汐) — time-of-day price flow patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Echo&lt;/strong&gt; (回聲) — price returning to prior levels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What made this interesting wasn't the patterns themselves — experienced traders would recognize most of these. What was interesting was what the AI did next: it scored each pattern for tradability and killed the weak ones. Staircase got 3/10. Fake Door got 4/10. Gone.&lt;/p&gt;

&lt;p&gt;Nobody told it to do this. The prompt didn't mention anything about scoring or elimination. It just... decided some patterns weren't worth pursuing.&lt;/p&gt;

&lt;p&gt;Then it combined the surviving patterns into a trading strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 1: Failure
&lt;/h2&gt;

&lt;p&gt;The AI's first strategy was called "Giant Wave Reversal" (巨浪逆行): when an abnormally large candle appears, trade in the opposite direction.&lt;/p&gt;

&lt;p&gt;Intuitively, this makes sense. After a big move, you'd expect a pullback. Hundreds of retail traders trade this exact pattern.&lt;/p&gt;

&lt;p&gt;The backtest results:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trades&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Win Rate&lt;/td&gt;
&lt;td&gt;30.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sharpe Ratio&lt;/td&gt;
&lt;td&gt;-1.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Return&lt;/td&gt;
&lt;td&gt;-0.21%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Terrible. The strategy lost money.&lt;/p&gt;

&lt;p&gt;But here's what matters: the system didn't just fail — it &lt;em&gt;analyzed&lt;/em&gt; why it failed. Three specific causes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Momentum continuation&lt;/strong&gt; — big candles often signal the &lt;em&gt;start&lt;/em&gt; of a trend, not the end&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop loss structure&lt;/strong&gt; — fixed-point stops were too tight for the volatility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Counter-trend bias&lt;/strong&gt; — fighting the trend is statistically unfavorable&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No human provided this analysis. The AI looked at its own results, examined the losing trades, and identified structural flaws.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 2: Evolution
&lt;/h2&gt;

&lt;p&gt;I fed the failure analysis back into the system with the same raw data. "You tried counter-trend. It failed for these reasons. Look at the data again."&lt;/p&gt;

&lt;p&gt;This time, three candidate strategies emerged:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Trades&lt;/th&gt;
&lt;th&gt;Win Rate&lt;/th&gt;
&lt;th&gt;Sharpe&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A: Ceiling Rejection&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;0.74&lt;/td&gt;
&lt;td&gt;Sample too small&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B: Trend Momentum&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;35.8%&lt;/td&gt;
&lt;td&gt;-1.40&lt;/td&gt;
&lt;td&gt;Eliminated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C: US Session Drain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;47.6%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.90&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Survived&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Strategy C — which the AI named "美盤洩洪" (US Session Drain) — was a breakthrough. The rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entry&lt;/strong&gt;: 16:00 UTC, when the 12-hour trend is down → go short&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit&lt;/strong&gt;: Take profit at +0.5%, stop loss at -0.25%, max hold 6 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk/Reward&lt;/strong&gt;: 2:1&lt;/li&gt;
&lt;/ul&gt;
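&lt;p&gt;The published rules are simple enough to express in a few lines. This is my own restatement of them, not the repo's code; the entry hour and direction are parameters, so the same template also covers the long variant that appears later:&lt;/p&gt;

```python
# Hedged sketch of the session-strategy rules described above.
# The helper names are mine; the numbers are the published rules.
def session_signal(hour_utc, close_now, close_12h_ago, entry_hour, direction):
    """Return True when the time window and the 12-hour trend filter agree.
    Strategy C uses entry_hour=16 with direction='short'."""
    if hour_utc != entry_hour:
        return False
    trend_up = close_now > close_12h_ago
    if direction == "long":
        return trend_up
    return not trend_up

def exit_levels(entry_price, direction):
    """TP +0.5%, SL -0.25% relative to entry: a 2:1 reward/risk structure."""
    if direction == "long":
        return entry_price * 1.005, entry_price * 0.9975
    return entry_price * 0.995, entry_price * 1.0025

# 16:00 UTC, 12-hour trend down, so the short signal fires:
print(session_signal(16, 95000.0, 96000.0, 16, "short"))  # True
```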

&lt;p&gt;Sharpe went from -1.20 to 1.90 in a single evolutionary cycle.&lt;/p&gt;

&lt;p&gt;But any quant will tell you: in-sample results mean nothing. You can curve-fit garbage to look profitable on historical data. The real test is out-of-sample.&lt;/p&gt;

&lt;h3&gt;
  
  
  Out-of-Sample Validation
&lt;/h3&gt;

&lt;p&gt;I ran Strategy C on a completely different 3-month period (August to November 2025) that the AI had never seen:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;In-Sample&lt;/th&gt;
&lt;th&gt;Out-of-Sample&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trades&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Win Rate&lt;/td&gt;
&lt;td&gt;47.6%&lt;/td&gt;
&lt;td&gt;59.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sharpe&lt;/td&gt;
&lt;td&gt;1.90&lt;/td&gt;
&lt;td&gt;4.09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Profit Factor&lt;/td&gt;
&lt;td&gt;1.53&lt;/td&gt;
&lt;td&gt;2.25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The out-of-sample results were &lt;em&gt;better&lt;/em&gt; than in-sample. Every metric improved. This is the opposite of overfitting — it suggests the strategy captured a genuine market structure, not noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Can It Work in Bull Markets Too?
&lt;/h2&gt;

&lt;p&gt;One strategy in one market regime proves nothing. So I ran the same process on bull market data: BTC going from $60K to $105K over four months (October 2024 to January 2025).&lt;/p&gt;

&lt;p&gt;Same rules: raw data, no indicators, no guidance. Just "look and learn."&lt;/p&gt;

&lt;p&gt;The AI discovered different patterns this time — waterfalls, valley springs, Asian fountains. But one stood out: &lt;strong&gt;Afternoon Engine&lt;/strong&gt; (午後引擎). At 14:00 UTC, something happens. The 14:00 candle alone accumulated +14.9% over the test period, far more than any other hour.&lt;/p&gt;


&lt;p&gt;Strategy E's rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entry&lt;/strong&gt;: 14:00 UTC, when the 12-hour trend is up → go long&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit&lt;/strong&gt;: TP +0.5%, SL -0.25%, max hold 6 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk/Reward&lt;/strong&gt;: 2:1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First-round results: &lt;strong&gt;70 trades, 50% win rate, Sharpe 4.97.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It didn't need a second round. The bull market has stronger structural bias, so the AI hit on the first try.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Surprising Part
&lt;/h3&gt;

&lt;p&gt;I validated Strategy E on a &lt;em&gt;downtrending&lt;/em&gt; market (June to September 2024, BTC -6.2%). The 14:00 UTC hour actually &lt;em&gt;lost&lt;/em&gt; money during this period (-5.84% cumulative). The raw time-of-day edge disappeared.&lt;/p&gt;

&lt;p&gt;But Strategy E still profited: &lt;strong&gt;57 trades, 56.1% win rate, Sharpe 6.06.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why? Because the 12-hour trend filter blocked almost all counter-trend signals. The edge isn't "trade at 14:00 UTC." The edge is "trade at 14:00 UTC &lt;em&gt;when the trend agrees&lt;/em&gt;." The trend filter is the alpha source, not the time window.&lt;/p&gt;

&lt;p&gt;(A Sharpe above 6 looks suspicious — and it should. The number is inflated by ultra-short holding periods and the 2:1 RR structure filtering out most losing scenarios. It's directionally meaningful, not a production-grade Sharpe. Take it as "this works" rather than "this is a 6-Sharpe strategy.")&lt;/p&gt;

&lt;p&gt;The AI figured this out without being told. It didn't just discover a correlation — it discovered the &lt;em&gt;mechanism&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Meta-Pattern
&lt;/h2&gt;

&lt;p&gt;Here's where it gets genuinely interesting.&lt;/p&gt;

&lt;p&gt;Strategy C and Strategy E were invented independently, from different datasets, in different market regimes (bear vs. bull). Yet they converged on the same structural template:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Time-of-day bias&lt;/strong&gt; — specific UTC hours carry persistent directional edge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trend filter&lt;/strong&gt; — 12-hour trend confirmation before entry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short holding period&lt;/strong&gt; — max 6 hours, in-and-out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asymmetric risk/reward&lt;/strong&gt; — a 2:1 TP/SL yields positive expectancy at a 50% win rate, before fees&lt;/li&gt;
&lt;/ol&gt;
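&lt;p&gt;Point 4 is just arithmetic, and worth spelling out: with TP +0.5% and SL -0.25%, a 50% win rate nets +0.125% per trade before fees, and break-even sits near a 33% win rate.&lt;/p&gt;

```python
# Checking the expectancy claim behind the 2:1 TP/SL structure.
# Fees and slippage are ignored, as in the rough claim above.
def expectancy(win_rate, tp_pct, sl_pct):
    """Expected return per trade, in percent."""
    return win_rate * tp_pct - (1.0 - win_rate) * sl_pct

print(expectancy(0.50, 0.5, 0.25))   # 0.125, i.e. +0.125% per trade
print(0.25 / (0.5 + 0.25))           # ~0.333: the break-even win rate
```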

&lt;p&gt;This meta-pattern was not programmed. It was not suggested. It emerged from two independent evolution cycles. When two completely separate experiments converge on the same solution, that's strong evidence of underlying structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Combined System
&lt;/h2&gt;

&lt;p&gt;Running both strategies together over 22 months (June 2024 to March 2026), spanning a complete bull-to-bear cycle:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Trades&lt;/th&gt;
&lt;th&gt;Win Rate&lt;/th&gt;
&lt;th&gt;Sharpe&lt;/th&gt;
&lt;th&gt;Return&lt;/th&gt;
&lt;th&gt;Max Drawdown&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;C Only (SHORT)&lt;/td&gt;
&lt;td&gt;157&lt;/td&gt;
&lt;td&gt;42.7%&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;+0.37%&lt;/td&gt;
&lt;td&gt;0.45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E Only (LONG)&lt;/td&gt;
&lt;td&gt;320&lt;/td&gt;
&lt;td&gt;49.4%&lt;/td&gt;
&lt;td&gt;4.10&lt;/td&gt;
&lt;td&gt;+3.65%&lt;/td&gt;
&lt;td&gt;0.27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C+E Combined&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;477&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;47.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.84&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+4.04%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.22%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;91% of months were profitable&lt;/strong&gt; (20 out of 22)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Max drawdown 0.22%&lt;/strong&gt; — lower than either strategy alone (natural hedging)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No human-designed entry logic.&lt;/strong&gt; The AI chose which hours to trade and which direction. The framework — 2:1 RR, 6-hour max hold, ATR-based stops — was provided by the backtest engine. The &lt;em&gt;what&lt;/em&gt; and &lt;em&gt;when&lt;/em&gt; came from the AI; the &lt;em&gt;risk management structure&lt;/em&gt; came from me&lt;/li&gt;
&lt;li&gt;Strategy E is the engine (90% of profit). Strategy C is a diversifier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The long/short combination creates a natural hedge. When the market trends up, E captures profits going long. When it trends down, C captures profits going short. Drawdown &lt;em&gt;improves&lt;/em&gt; when combined.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Experiment to Product
&lt;/h2&gt;

&lt;p&gt;The manual process — give AI data, analyze patterns, backtest, evolve — took about a day of hands-on work per strategy. Interesting as a research exercise, but not scalable.&lt;/p&gt;

&lt;p&gt;So I automated the entire loop into what I call the &lt;strong&gt;Evolution Engine&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt; — LLM analyzes raw price data, proposes candidate strategies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backtest&lt;/strong&gt; — vectorized engine tests each candidate (ATR-based stops, long/short, time-based exit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select&lt;/strong&gt; — in-sample ranking, then out-of-sample validation (Sharpe &amp;gt; 1.0, trades &amp;gt; 30, max DD &amp;lt; 20%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evolve&lt;/strong&gt; — survivors get mutated, failures go to the graveyard (but their lessons persist). Next generation. Repeat.&lt;/li&gt;
&lt;/ol&gt;
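&lt;p&gt;In code, the loop looks roughly like this. Everything below is a toy stand-in (the real engine is in the repo); only the selection gates are taken from the list above:&lt;/p&gt;

```python
import random
from dataclasses import dataclass

# Toy stand-in for the Discover / Backtest / Select / Evolve loop.
# The Sharpe, trade-count, and drawdown gates come from the post;
# everything else here is a placeholder, not the Evolution Engine API.

@dataclass
class Result:
    sharpe: float
    trades: int
    max_dd_pct: float

def discover(n, rng):
    """Stand-in for the LLM discovery step: propose n candidate strategies."""
    return [rng.random() for _ in range(n)]

def backtest(strategy, rng):
    """Stand-in for the vectorized backtest engine."""
    return Result(sharpe=rng.gauss(strategy, 1.0),
                  trades=rng.randint(5, 80),
                  max_dd_pct=rng.uniform(1.0, 30.0))

def mutate(strategy, rng):
    """Survivors get perturbed into the next generation."""
    return strategy + rng.gauss(0.0, 0.1)

def evolve(generations=3, seed=42):
    rng = random.Random(seed)
    graveyard = []                          # failures persist as lessons
    population = discover(8, rng)
    for _ in range(generations):
        survivors = []
        for strat in population:
            oos = backtest(strat, rng)      # out-of-sample gate:
            if oos.sharpe > 1.0 and oos.trades > 30 and 20.0 > oos.max_dd_pct:
                survivors.append(strat)
            else:
                graveyard.append((strat, oos))
        population = [mutate(s, rng) for s in survivors] or discover(8, rng)
    return population, graveyard

population, graveyard = evolve()
print(len(graveyard))   # eliminated candidates accumulate across generations
```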

&lt;p&gt;The Evolution Engine runs on top of &lt;strong&gt;Outcome-Weighted Memory (OWM)&lt;/strong&gt; — a five-layer memory architecture (episodic, semantic, procedural, prospective, affective) that gives the AI persistent recall across sessions. Each memory gets scored by outcome quality, context similarity, and recency when recalled — inspired by ACT-R cognitive architecture and Kelly criterion. The details are in the repo if you're curious; the key point is that the AI doesn't just remember &lt;em&gt;what&lt;/em&gt; happened, it remembers &lt;em&gt;how relevant&lt;/em&gt; each memory is to the current situation.&lt;/p&gt;
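&lt;p&gt;To make the recall scoring concrete, here is a hedged sketch of outcome-weighted recall. The weights and half-life are illustrative choices of mine, not OWM's actual defaults:&lt;/p&gt;

```python
import math

# Illustrative sketch of outcome-weighted recall: each stored memory is scored
# by outcome quality, context similarity, and recency. The 0.4/0.4/0.2 weights
# and the 72-hour half-life are assumptions, not the OWM defaults.
def recall_score(outcome, similarity, age_hours, half_life_hours=72.0):
    """outcome and similarity in [0, 1]; newer, better-outcome,
    more-similar memories rank higher at recall time."""
    recency = math.exp(-math.log(2.0) * age_hours / half_life_hours)
    return 0.4 * outcome + 0.4 * similarity + 0.2 * recency

# A profitable, closely matching, recent memory outranks a stale near-miss:
print(recall_score(0.9, 0.8, age_hours=2) > recall_score(0.5, 0.9, age_hours=500))  # True
```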

&lt;h3&gt;
  
  
  Model Comparison
&lt;/h3&gt;

&lt;p&gt;I ran the automated pipeline with three Claude models on real Binance data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost/Run&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Strategies Graduated&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Haiku&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.016&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;34.7s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Best so far&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;$0.013&lt;/td&gt;
&lt;td&gt;51.9s&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.013&lt;/td&gt;
&lt;td&gt;72.4s&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Slowest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Caveat: this is a small sample — a handful of runs per model. But the early signal is counterintuitive: the cheapest, fastest model produced the most graduated strategies. My working theory is that speed and diversity matter more than depth of reasoning for creative pattern discovery. A full evolution cycle costs less than two cents.&lt;/p&gt;

&lt;p&gt;The most compelling finding: the automated pipeline independently rediscovered 16:00 UTC as a key trading hour — the same edge that the manual experiments found. Convergent validation from a completely different process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Known Bottlenecks
&lt;/h3&gt;

&lt;p&gt;The system isn't perfect. Two issues I'm actively working on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt over-concretization&lt;/strong&gt; — all three models tend to lock onto very specific conditions (e.g., "hour_utc == 16 AND atr &amp;gt; 2.5"). This produces strategies that trigger too rarely for statistical significance. The graduated strategies had only 2 trades in out-of-sample, far below the 30-trade minimum for confidence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Graveyard feedback depth&lt;/strong&gt; — eliminated strategies get stored, but the feedback loop from graveyard → next generation isn't rich enough yet. The AI knows &lt;em&gt;that&lt;/em&gt; a strategy failed, but doesn't fully leverage &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. AI doesn't need to be taught strategies. It needs memory and permission to fail.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The biggest bottleneck in AI trading isn't model capability — it's the assumption that humans must provide the strategy. Give the AI raw data and a feedback loop, and it finds structure faster than any hand-designed system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Objective feedback (P&amp;amp;L) beats prompt engineering.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I tried various prompt strategies for pattern discovery. None of them mattered as much as simply feeding back the backtest results. "Return -0.21%, Sharpe -1.20" is more useful than ten paragraphs of trading wisdom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The speed of evolution depends on the quality of failure, not the quantity of success.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strategy C only exists because Strategy "Giant Wave Reversal" failed spectacularly and the AI could analyze &lt;em&gt;why&lt;/em&gt;. A clean failure with clear attribution is more valuable than a marginal success.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Meta-patterns are the real prize.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Individual strategies are nice. But the discovery that two independent evolution cycles converged on the same structural template (time bias + trend filter + short hold + asymmetric RR) — that's worth more than any single strategy. It suggests a universal regularity in how markets behave.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. One person + Claude Code can go from hypothesis to working product in a day.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The entire pipeline — research, backtest, analysis, Evolution Engine code, OWM memory architecture, 1,055 tests, MCP server, open source release — was built in 48 hours by one person with an AI coding assistant. That's the part I still have trouble believing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;TradeMemory Protocol is open source. The Evolution Engine, OWM memory architecture, and all 11 experiments documented in &lt;code&gt;RESEARCH_LOG.md&lt;/code&gt; are available today.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;tradememory-protocol
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;I'm not claiming this is a finished product. The over-concretization problem is real. The automated pipeline needs more diverse hypothesis generation. But the core insight — that AI can discover its own trading strategies through evolutionary memory — is validated.&lt;/p&gt;

&lt;p&gt;If you're building AI agents that make decisions in uncertain environments, the memory problem is yours too. Trading is just the most measurable version of it.&lt;/p&gt;

&lt;p&gt;Want to poke holes in the methodology? The full research log with every backtest number, every eliminated strategy, and every failed hypothesis is public. I'd rather get useful criticism now than discover blind spots later.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Update (2026-03-17):&lt;/strong&gt; Ran statistical validation against 1,000 random strategies. Both Strategy C (P96.9%) and E (P100%) beat the 95th percentile. &lt;a href="https://github.com/mnemox-ai/tradememory-protocol/blob/master/VALIDATION_RESULTS.md" rel="noopener noreferrer"&gt;Full results&lt;/a&gt;.&lt;/p&gt;
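&lt;p&gt;For readers who want to replicate the spirit of that check, a toy version: score a candidate Sharpe against a population of random strategies and report the percentile. The random-strategy model below is a placeholder, not the repo's generator:&lt;/p&gt;

```python
import random

# Toy percentile check in the spirit of the validation described above.
# Real random strategies should be backtested on the same data; here the
# null distribution is simply simulated Sharpe values.
def percentile_rank(candidate_sharpe, random_sharpes):
    """Percentage of the random population the candidate beats."""
    beaten = sum(1 for s in random_sharpes if candidate_sharpe > s)
    return 100.0 * beaten / len(random_sharpes)

rng = random.Random(0)
random_sharpes = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(percentile_rank(1.90, random_sharpes))   # a Strategy C-like Sharpe vs the null
```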




&lt;p&gt;&lt;em&gt;TradeMemory Protocol: &lt;a href="https://github.com/mnemox-ai/tradememory-protocol" rel="noopener noreferrer"&gt;github.com/mnemox-ai/tradememory-protocol&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Full research data: &lt;a href="https://github.com/mnemox-ai/tradememory-protocol/blob/master/RESEARCH_LOG.md" rel="noopener noreferrer"&gt;RESEARCH_LOG.md&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Questions, feedback, or want to run your own evolution experiment? Open an issue on GitHub or find me on &lt;a href="https://mnemox.ai" rel="noopener noreferrer"&gt;Mnemox AI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Built an AI That Tells You If Your Idea Already Exists — And Syncs Results to Notion</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Sat, 14 Mar 2026 10:59:41 +0000</pubDate>
      <link>https://dev.to/mnemox/i-built-an-ai-that-tells-you-if-your-idea-already-exists-and-syncs-results-to-notion-3pba</link>
      <guid>https://dev.to/mnemox/i-built-an-ai-that-tells-you-if-your-idea-already-exists-and-syncs-results-to-notion-3pba</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/notion-2026-03-04"&gt;Notion MCP Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Idea Reality Tracker&lt;/strong&gt; — a dual-MCP pipeline that validates software ideas against 5 live platforms and automatically syncs structured results to a Notion database.&lt;/p&gt;

&lt;p&gt;Instead of googling "has anyone built this?" and drowning in 10 tabs of noise, you describe your idea in one sentence. In 15 seconds you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Reality Score&lt;/strong&gt; (0–100) measuring how crowded the space is&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Market momentum&lt;/strong&gt; analysis (accelerating / stable / declining)&lt;/li&gt;
&lt;li&gt;Competitor counts from GitHub, npm, PyPI, Hacker News, and Product Hunt&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Build / Pivot / Kill&lt;/strong&gt; recommendation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All automatically saved to your Notion workspace as a searchable decision log.&lt;/p&gt;
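&lt;p&gt;To illustrate the shape of the computation (the real formula lives in idea-reality-mcp, and this is not it), a toy score that turns per-platform competitor counts into a 0-100 crowding number:&lt;/p&gt;

```python
# Illustrative only: not the actual Reality Score formula.
# Total competitor count across platforms saturates at a cap, so a handful
# of hits scores low and a crowded space pins near 100.
def reality_score(counts, saturation=50):
    """counts: dict mapping platform name to number of similar projects found."""
    total = sum(counts.values())
    return round(100 * min(total, saturation) / saturation)

counts = {"github": 18, "npm": 3, "pypi": 2, "hackernews": 9, "producthunt": 1}
print(reality_score(counts))  # 66
```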

&lt;h3&gt;
  
  
  The Story Behind It
&lt;/h3&gt;

&lt;p&gt;Six months ago, I asked ChatGPT if my idea for an AI trading memory system was original. It said &lt;em&gt;"This is a unique and innovative concept!"&lt;/em&gt; I believed it and spent weeks building.&lt;/p&gt;

&lt;p&gt;Then I built &lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;idea-reality-mcp&lt;/a&gt; — a tool that scans actual platforms instead of relying on LLM knowledge. I ran my own idea through it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Score: 93. Momentum: Accelerating. Competitors: Mem0, FinMem, and dozens more.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That reality check forced me to pivot. Instead of building "yet another memory layer," I focused on what was actually different about my approach. That led me to a structural flaw I call Parametric-External Memory Resonance: when your RAG pipeline retrieves results that are too similar to what the LLM already believes, the model becomes overconfident and stops reasoning critically.&lt;/p&gt;

&lt;p&gt;The tool that checked my idea ended up being more valuable than the idea itself.&lt;/p&gt;

&lt;p&gt;Now every idea I consider goes through this pipeline, and results accumulate in my Notion workspace as a decision log — a searchable history of what I've validated, what I've killed, and why.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes This Different
&lt;/h3&gt;

&lt;p&gt;Most MCP integrations do one thing: read from a service, or write to it. This is a &lt;strong&gt;dual-MCP pipeline&lt;/strong&gt; where two independent tools collaborate through Claude to create something neither could do alone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;idea-reality-mcp knows how to scan markets but has no persistence&lt;/li&gt;
&lt;li&gt;Notion MCP knows how to create structured pages but has no market intelligence&lt;/li&gt;
&lt;li&gt;Together, they create a persistent idea validation pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And unlike ChatGPT telling you "great idea!", this tool checks reality — with numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video Demo
&lt;/h2&gt;

&lt;p&gt;No video this time; the screenshots below walk through the full end-to-end workflow.&lt;/p&gt;

&lt;p&gt;Here's a real validation session. I asked Claude to check "AI tool that generates unit tests from code comments":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foj448gq661glyqotls5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foj448gq661glyqotls5l.png" alt="Claude Desktop checking an idea and saving results to Notion" width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result: &lt;strong&gt;Reality Score 38/100&lt;/strong&gt; — medium duplicate likelihood. There's community buzz (47 HN discussions) but no dominant open-source solution yet. Claude recommended focusing on a specific workflow (e.g., JSDoc → Jest) rather than a generic solution, and saved everything to Notion with status "Checked."&lt;/p&gt;

&lt;p&gt;Here's what the Notion dashboard looks like after validating several ideas:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmzw5rs31u08a9zxulqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmzw5rs31u08a9zxulqf.png" alt="Notion Board View showing ideas grouped by Build, Kill, and Pivot" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each column represents a decision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt; (green) — low competition, go for it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kill&lt;/strong&gt; (red) — too crowded, move on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pivot&lt;/strong&gt; (yellow) — opportunity exists but needs a different angle&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Show us the code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;mnemox-ai/idea-reality-mcp&lt;/a&gt; — Python, MIT license, 318+ stars&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/idea-reality-mcp/" rel="noopener noreferrer"&gt;idea-reality-mcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To set up both MCP servers in Claude Desktop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"idea-reality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"idea-reality-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"notion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@modelcontextprotocol/server-notion"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"NOTION_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-notion-integration-token"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then just tell Claude:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Check if this idea already exists: [your idea]. Save the results to my Notion Idea Tracker."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude handles the rest — calling both MCP tools and writing the structured entry.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Notion MCP
&lt;/h2&gt;

&lt;p&gt;The system uses two MCP servers working together through Claude:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. idea-reality-mcp&lt;/strong&gt; — scans 5 platforms in parallel and returns structured market intelligence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Notion MCP&lt;/strong&gt; (&lt;code&gt;@modelcontextprotocol/server-notion&lt;/code&gt;) — writes the results into a structured Notion database.&lt;/p&gt;

&lt;p&gt;Claude Desktop orchestrates both: it calls idea-reality-mcp first, interprets the results, then calls Notion MCP to create a database entry with all the structured data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Notion Database
&lt;/h3&gt;

&lt;p&gt;The database schema captures everything the AI finds:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Idea&lt;/td&gt;
&lt;td&gt;Title&lt;/td&gt;
&lt;td&gt;The idea description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reality Score&lt;/td&gt;
&lt;td&gt;Number&lt;/td&gt;
&lt;td&gt;0–100 duplicate likelihood&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Status&lt;/td&gt;
&lt;td&gt;Select&lt;/td&gt;
&lt;td&gt;Build / Pivot / Kill / Checked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market Momentum&lt;/td&gt;
&lt;td&gt;Select&lt;/td&gt;
&lt;td&gt;Accelerating / Stable / Declining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Repos&lt;/td&gt;
&lt;td&gt;Number&lt;/td&gt;
&lt;td&gt;Direct competitor count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Stars&lt;/td&gt;
&lt;td&gt;Number&lt;/td&gt;
&lt;td&gt;Top competitor traction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HN Posts&lt;/td&gt;
&lt;td&gt;Number&lt;/td&gt;
&lt;td&gt;Community buzz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;npm / PyPI Packages&lt;/td&gt;
&lt;td&gt;Number&lt;/td&gt;
&lt;td&gt;Package ecosystem overlap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keywords&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;Extracted search terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summary&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;AI-generated strategic analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checked At&lt;/td&gt;
&lt;td&gt;Date&lt;/td&gt;
&lt;td&gt;When the scan ran&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
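&lt;p&gt;For reference, a Notion database entry for this schema boils down to a single &lt;code&gt;properties&lt;/code&gt; payload. The sketch below is illustrative: the field names on &lt;code&gt;scan_result&lt;/code&gt; are my assumptions rather than the tool's actual output shape, but the property value objects follow Notion's public API:&lt;/p&gt;

```python
# Hypothetical sketch of the Notion "properties" payload for the schema above.
# The scan_result field names are assumptions, not idea-reality-mcp's real output.
from datetime import date

def build_notion_properties(scan_result: dict) -> dict:
    """Map a scan result onto the Idea Tracker database schema."""
    return {
        "Idea": {"title": [{"text": {"content": scan_result["idea"]}}]},
        "Reality Score": {"number": scan_result["reality_score"]},
        "Status": {"select": {"name": scan_result["status"]}},  # Build / Pivot / Kill / Checked
        "Market Momentum": {"select": {"name": scan_result["momentum"]}},
        "GitHub Repos": {"number": scan_result["github_repos"]},
        "GitHub Stars": {"number": scan_result["github_stars"]},
        "HN Posts": {"number": scan_result["hn_posts"]},
        "npm / PyPI Packages": {"number": scan_result["packages"]},
        "Keywords": {"rich_text": [{"text": {"content": ", ".join(scan_result["keywords"])}}]},
        "Summary": {"rich_text": [{"text": {"content": scan_result["summary"]}}]},
        "Checked At": {"date": {"start": date.today().isoformat()}},
    }
```

&lt;p&gt;In practice Claude assembles this payload for you when it calls Notion MCP; the sketch just shows what lands in the database.&lt;/p&gt;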

&lt;h3&gt;
  
  
  Why Notion as the Dashboard
&lt;/h3&gt;

&lt;p&gt;Notion's native views turn raw data into decision intelligence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Board view&lt;/strong&gt; groups ideas by Build / Pivot / Kill — one glance shows your pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Table view&lt;/strong&gt; lets you sort by score or filter by momentum&lt;/li&gt;
&lt;li&gt;Over time, the database becomes a &lt;strong&gt;decision journal&lt;/strong&gt;: which ideas you killed, which you pursued, and whether the market validated your choice&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;idea-reality-mcp&lt;/a&gt; — Python, MIT license, 318+ GitHub stars&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/makenotion/notion-mcp-server" rel="noopener noreferrer"&gt;Notion MCP&lt;/a&gt; — official Notion MCP server&lt;/li&gt;
&lt;li&gt;Claude Desktop — orchestration layer&lt;/li&gt;
&lt;li&gt;Notion — intelligence dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Background
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/mnemox/i-gave-my-trading-agent-memory-and-it-made-everything-worse-28a3"&gt;I Gave My Trading Agent Memory and It Made Everything Worse&lt;/a&gt; — the research story behind this tool&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>notionchallenge</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Gave My Trading Agent Memory and It Made Everything Worse</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Tue, 10 Mar 2026 13:20:11 +0000</pubDate>
      <link>https://dev.to/mnemox/i-gave-my-trading-agent-memory-and-it-made-everything-worse-28a3</link>
      <guid>https://dev.to/mnemox/i-gave-my-trading-agent-memory-and-it-made-everything-worse-28a3</guid>
      <description>&lt;p&gt;&lt;em&gt;How similarity-based recall amplifies LLM confirmation bias, and a simple mechanism that breaks the feedback loop.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I spent two days and $73 watching an LLM trading agent destroy itself with its own memories. What I found wasn't a bug. It was a structural flaw in how every similarity-based memory system interacts with an LLM's internal beliefs — and the fix turned out to be counterintuitively simple: make the agent remember its failures, even when the retrieval system doesn't want to.&lt;/p&gt;

&lt;p&gt;This is the story of that experiment, what went wrong, and the open-source mechanism I built to prevent it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I'm building &lt;a href="https://github.com/mnemox-ai/tradememory-protocol" rel="noopener noreferrer"&gt;TradeMemory&lt;/a&gt;, an episodic memory layer for AI trading agents. The idea is straightforward: store every trade the agent makes — entry, exit, P&amp;amp;L, market context — and retrieve relevant past trades at decision time so the agent can learn from experience. Exactly what you'd want a human trader to do.&lt;/p&gt;

&lt;p&gt;The experimental framework is called Trade Dreaming. It runs an LLM agent through historical XAUUSD M15 bars (50,802 bars from Jan 2024 to Mar 2026), letting the agent decide on each bar whether to trade or hold. Three strategies are available: VolBreakout (VB), IntradayMomentum (IM), and PullbackEntry (PB). Starting equity is $10,000, risk is 0.25% per trade, buy-only.&lt;/p&gt;
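&lt;p&gt;The 0.25% risk rule translates into fixed-fractional position sizing. This is my own sketch of the arithmetic, with illustrative names rather than TradeMemory's actual code:&lt;/p&gt;

```python
# Fixed-fractional sizing in the spirit of the experiment's risk model:
# risk 0.25% of current equity per trade, sized off the stop distance.
# My own sketch; parameter names are illustrative, not TradeMemory's API.
def position_size(equity: float, entry: float, stop: float,
                  risk_fraction: float = 0.0025) -> float:
    """Units to buy so that a stop-out loses roughly risk_fraction of equity."""
    risk_amount = equity * risk_fraction      # e.g. $25 on $10,000
    stop_distance = abs(entry - stop)         # loss per unit if stopped out
    if stop_distance == 0:
        raise ValueError("stop must differ from entry")
    return risk_amount / stop_distance
```

&lt;p&gt;On $10,000 equity with a $5 stop distance, that caps the worst case per trade at $25, which is why even a losing streak only dents the equity curve slowly.&lt;/p&gt;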

&lt;p&gt;Before adding memory, I ran three different models through the same framework, same prompt, same data. The results were... instructive.&lt;/p&gt;

&lt;p&gt;(A note on costs: the full 2-day experiment cost $72.69 across 6,836 decisions and 40 trades. Sonnet runs at about $0.014 per decision, Haiku at $0.001. I mention this because "I ran experiments" sounds different when you know the entire budget was under $75.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Models, Three Personalities
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Haiku 3.5&lt;/th&gt;
&lt;th&gt;Sonnet 4&lt;/th&gt;
&lt;th&gt;DeepSeek-V3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Decisions&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trades executed&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trade rate&lt;/td&gt;
&lt;td&gt;11.0%&lt;/td&gt;
&lt;td&gt;0.3%&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Win rate&lt;/td&gt;
&lt;td&gt;22.7%&lt;/td&gt;
&lt;td&gt;83.3%&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Profit factor&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;td&gt;2.42&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Final equity&lt;/td&gt;
&lt;td&gt;~$9,980&lt;/td&gt;
&lt;td&gt;$10,176&lt;/td&gt;
&lt;td&gt;$10,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API cost&lt;/td&gt;
&lt;td&gt;~$0.23&lt;/td&gt;
&lt;td&gt;~$28.86&lt;/td&gt;
&lt;td&gt;~$2.50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Haiku ran 200 decisions as a preliminary screen; Sonnet and DeepSeek ran the full 2,000.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Haiku&lt;/strong&gt; was the trigger-happy intern — 22 trades in 200 decisions, 22.7% win rate, net negative. It fired at everything. Pure System 1: fast, impulsive, undiscriminating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sonnet&lt;/strong&gt; was the senior trader — 6 trades in 2,000 decisions, 83.3% win rate, profit factor 2.42. It only took 4 VolBreakout and 2 PullbackEntry setups. Zero IntradayMomentum trades. It knew what to skip.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek-V3&lt;/strong&gt; was the analyst who never left the office — 2,000 consecutive HOLD outputs. Zero trades. It found uncertainty in every setup, burned 3,000+ reasoning tokens per decision, and eventually crashed from memory accumulation at decision 1,786. Final equity: $10,000.00 exactly.&lt;/p&gt;

&lt;p&gt;A perfect behavioral spectrum: reckless → precise → paralyzed. The same prompt, the same data, and a 37x difference in trade frequency between Haiku and Sonnet. This alone is interesting — existing literature has documented that smarter models don't always trade better (GPT-4o-mini beats GPT-4o on Sharpe ratio in one benchmark) and that reasoning models overthink financial decisions. But nobody had quantified the full spectrum in a single framework before.&lt;/p&gt;

&lt;p&gt;Sonnet was clearly the winner. So I gave it memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Made Everything Worse
&lt;/h2&gt;

&lt;p&gt;The memory system stores each closed trade as an episodic record — strategy, entry/exit prices, P&amp;amp;L, market regime, session, ATR, confidence level. At each new decision, the retrieval system finds the 5 most similar past trades (scored by ATR proximity, session overlap, and regime match) and injects them into the prompt.&lt;/p&gt;
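&lt;p&gt;A minimal version of that recall step might look like the following. The weights and field names are my assumptions; the real scorer combines the same three signals (ATR proximity, session overlap, regime match) but not necessarily with this formula:&lt;/p&gt;

```python
# Minimal similarity scorer in the spirit described above.
# Field names and the 0.5 / 0.25 / 0.25 weighting are assumptions.
def relevance(query: dict, record: dict,
              w_atr: float = 0.5, w_session: float = 0.25,
              w_regime: float = 0.25) -> float:
    # ATR proximity: 1.0 when equal, shrinking toward 0 as values diverge
    hi = max(query["atr"], record["atr"])
    atr_sim = min(query["atr"], record["atr"]) / hi if hi > 0 else 1.0
    session_sim = 1.0 if query["session"] == record["session"] else 0.0
    regime_sim = 1.0 if query["regime"] == record["regime"] else 0.0
    return w_atr * atr_sim + w_session * session_sim + w_regime * regime_sim

def recall_top_k(query: dict, records: list, k: int = 5) -> list:
    """Return the k most similar past trades for this decision."""
    return sorted(records, key=lambda r: relevance(query, r), reverse=True)[:k]
```

&lt;p&gt;Note what this scorer does &lt;em&gt;not&lt;/em&gt; look at: the trade's outcome. Similarity is purely about market conditions, which is exactly why the returned sample can be all winners.&lt;/p&gt;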

&lt;p&gt;Here's what happened:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No Memory (baseline)&lt;/th&gt;
&lt;th&gt;With Memory&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trades&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;+1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Win rate&lt;/td&gt;
&lt;td&gt;83.3%&lt;/td&gt;
&lt;td&gt;57.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-26.2pp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Profit factor&lt;/td&gt;
&lt;td&gt;2.42&lt;/td&gt;
&lt;td&gt;0.94&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-1.48&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PnL&lt;/td&gt;
&lt;td&gt;+$176&lt;/td&gt;
&lt;td&gt;-$28&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-$204&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strategies&lt;/td&gt;
&lt;td&gt;VB(4) + PB(2)&lt;/td&gt;
&lt;td&gt;VB(4) + PB(1) + &lt;strong&gt;IM(2)&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;IM appeared&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The agent went from profitable to unprofitable. Profit factor dropped below 1.0. Two IntradayMomentum trades appeared — a strategy Sonnet had correctly avoided in every single one of its 2,000 no-memory decisions. Both IM trades hit their stop-losses. Combined loss: -$437, wiping out all VB and PB gains.&lt;/p&gt;

&lt;p&gt;And here's the kicker: both IM trades were entered with confidence 0.85 — the highest confidence of any trade in the entire run. The agent was most confident on its worst trades.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Debugging Rabbit Hole
&lt;/h2&gt;

&lt;p&gt;Getting to this point wasn't clean. The first attempt at adding memory revealed that the engine wasn't even storing closed trades — a bug where &lt;code&gt;_execute_decision&lt;/code&gt; didn't return the closed position. I fixed that, re-ran, got 1 trade with 1 episodic record. Pipeline verified.&lt;/p&gt;

&lt;p&gt;Then I discovered a shortcut: I could backfill episodic memory from the existing Sonnet 2000-decision JSONL log. Six trades, already completed, just needed to be converted to memory records. That saved $28 and 5 hours of re-running the full baseline.&lt;/p&gt;

&lt;p&gt;With the backfilled memory in place, I ran the full 2,000-decision memory test. That's when the profit factor cratered from 2.42 to 0.94. Two IM trades appeared. Both lost.&lt;/p&gt;

&lt;p&gt;My first fix attempt addressed three bugs at once: added loss balance to the retrieval, fixed unbalanced guidance text in the prompt, and patched the regime classifier that was tagging everything as "unknown." All 44 retrieval tests and 503 engine tests passed. Re-ran 200 decisions.&lt;/p&gt;

&lt;p&gt;IM still appeared.&lt;/p&gt;

&lt;p&gt;It took another hour of debugging to discover the engine was using the old recall function, not my new hybrid retrieval. The &lt;code&gt;hybrid.py&lt;/code&gt; I'd written was sitting there, fully tested, completely unused. Classic integration failure. I redesigned the engine to accept a pluggable &lt;code&gt;memory_recall_fn&lt;/code&gt; via dependency injection, wired in the hybrid retrieval, hit a Pydantic import error, fixed it, and finally ran the validation that worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Root Cause: 100% Positive Recall
&lt;/h2&gt;

&lt;p&gt;When I examined what the agent actually saw in its prompt at the point of the IM entries, the memory block looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Past Similar Trades
1. [VolBreakout] pnl=+$92.00  Relevance: 0.97
2. [VolBreakout] pnl=+$31.10  Relevance: 0.93
3. [VolBreakout] pnl=+$105.80 Relevance: 0.78
4. [PullbackEntry] pnl=+$19.90 Relevance: 0.78
5. [PullbackEntry] pnl=+$51.60 Relevance: 0.78
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five trades. Five winners. Zero losses. The retrieval system had done exactly what it was designed to do — find the most similar past experiences — and returned an entirely positive sample.&lt;/p&gt;

&lt;p&gt;Compare this to the no-memory prompt for the same decision point. Without memory, the agent sees the current bar, 20 recent bars, technical indicators (ATR, RSI, SMAs), and its recent trade history as a flat list. With memory, it gets an additional block of 5 "similar past trades," each with context, reflection text, and a relevance score. The agent reads: "In similar market conditions, here are 5 trades you made. All 5 were profitable. The most similar one (relevance 0.97) made $92."&lt;/p&gt;

&lt;p&gt;There is no counterexample. No memory of "this setup also failed X% of the time." The agent generalizes from a perfectly biased sample.&lt;/p&gt;

&lt;p&gt;I initially thought this was a data problem — maybe the memory just didn't have enough losses yet. But at the point of the second IM trade (around decision 1,600), the episodic memory already contained 12 records, including 3 losses. Nine of 12 records were wins (75% positive bias), and 5 had regime tagged as "unknown" due to a classifier bug. But the real issue wasn't the database composition — it was that the retrieval system picked the top 5 by similarity, and all 5 happened to be winners.&lt;/p&gt;

&lt;p&gt;This wasn't a coincidence. It's a structural property of similarity-based retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Similarity-Based Retrieval Has a Built-In Positive Bias
&lt;/h2&gt;

&lt;p&gt;Think about where winning trades cluster versus where losing trades cluster:&lt;/p&gt;

&lt;p&gt;Winning trades tend to happen in typical conditions — trending markets, London session (most liquid), normal ATR ranges, textbook setups. These are the most common market states, because strategies are designed to work in common conditions.&lt;/p&gt;

&lt;p&gt;Losing trades concentrate in atypical conditions — range-bound markets, off-hours with thin liquidity, extreme ATR spikes, edge cases. By definition, unusual conditions are less similar to any typical query.&lt;/p&gt;

&lt;p&gt;When you ask "find me trades in conditions similar to right now," you're querying against the most common market state. Winning trades dominate that region of the space. Losses are scattered in the tails, where similarity scores are inherently lower.&lt;/p&gt;

&lt;p&gt;This means &lt;strong&gt;any similarity-based retrieval system will systematically over-retrieve positive outcomes&lt;/strong&gt;, even with a perfectly balanced underlying database. The bias isn't in the data. It's in the geometry of retrieval itself.&lt;/p&gt;
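&lt;p&gt;The geometry claim is easy to demonstrate with a synthetic, perfectly balanced database. This toy model is my own construction, not the experiment's data: one feature stands in for "market typicality," wins cluster near the common state, losses sit in the tails:&lt;/p&gt;

```python
# Toy demonstration: a 50/50 win/loss database still yields an all-win top-5
# when losses live in atypical conditions. Entirely synthetic data.
wins   = [{"x": 0.1 * i, "pnl": +50} for i in range(10)]        # near x = 0
losses = [{"x": 2.0 + 0.5 * i, "pnl": -50} for i in range(10)]  # in the tails
database = wins + losses                                        # balanced: 10 vs 10

query_x = 0.0  # "conditions similar to right now" = the typical state
top5 = sorted(database, key=lambda t: abs(t["x"] - query_x))[:5]

assert all(t["pnl"] > 0 for t in top5)  # every retrieved trade is a winner
```

&lt;p&gt;Half the database is losses, yet the top-5 recall is 100% wins, purely because of where each outcome sits in feature space.&lt;/p&gt;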

&lt;h2&gt;
  
  
  Resonance: When Retrieval Confirms What the LLM Already Believes
&lt;/h2&gt;

&lt;p&gt;Here's where it gets dangerous. The biased retrieval doesn't operate in isolation — it feeds into an LLM that has its own beliefs.&lt;/p&gt;

&lt;p&gt;Every LLM carries parametric memory: knowledge baked into its weights during training. For trading, this includes everything it absorbed from financial textbooks and trading forums: "breakout trading works," "momentum strategies capture intraday moves," "the trend is your friend." These beliefs are permanent, uninspectable, and always running in the background.&lt;/p&gt;

&lt;p&gt;Current research on parametric-contextual knowledge interaction — surveyed comprehensively by Xu et al. at EMNLP 2024, with benchmarks like ConflictBank (NeurIPS 2024) and EchoQA (ICLR 2025) — focuses almost entirely on what happens when the two disagree. Six major papers and two benchmarks study the conflict axis. The implicit assumption is that agreement is good: both sources say the same thing, higher confidence, better output.&lt;/p&gt;

&lt;p&gt;Our data shows the opposite.&lt;/p&gt;

&lt;p&gt;When the retrieval system returns 5 winning VolBreakout trades, and Sonnet's parametric memory already believes "breakout trading works," the two signals amplify each other. I call this &lt;strong&gt;resonance&lt;/strong&gt;. The mechanism follows a clear chain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sonnet's weights contain the belief that breakout strategies are valid (absorbed from training data — breakout trading is one of the most-documented technical strategies in existence).&lt;/li&gt;
&lt;li&gt;The agent's first few closed trades happen to be VB winners. They get stored in episodic memory.&lt;/li&gt;
&lt;li&gt;On the next decision, retrieval finds the 5 most similar past trades. All 5 are VB winners (because winning VB trades cluster in the most common market state).&lt;/li&gt;
&lt;li&gt;Now the prompt says: "Here are 5 trades you made in similar conditions. All 5 were profitable."&lt;/li&gt;
&lt;li&gt;Parametric memory says: "Breakout works." External memory says: "Everything you've done works." Both signals point the same direction. Resonance.&lt;/li&gt;
&lt;li&gt;Confidence inflates beyond calibration. The agent starts taking IntradayMomentum entries — because parametric memory says "momentum is valid" and external memory says "I'm on a winning streak."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This maps directly onto documented LLM behavior. The "Chain of Evidence" paper (arXiv, Dec 2024) demonstrated that LLMs exhibit confirmation bias — they preferentially trust external evidence that aligns with their internal knowledge, regardless of whether that evidence is actually correct. ReDeEP (ICLR 2025 Spotlight) showed that Knowledge FFNs in transformer models overemphasize parametric knowledge while Copying Heads fail to properly integrate external context. And "No Free Lunch" (EMNLP 2025) found that RAG amplifies model confidence in biased answers — just 20% unfair samples in retrieval was enough to trigger amplification.&lt;/p&gt;

&lt;p&gt;These are all pieces of the same puzzle. Nobody had assembled them into a single causal chain: &lt;strong&gt;similarity retrieval bias + LLM confirmation bias + parametric knowledge alignment = resonance&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Human Parallel
&lt;/h2&gt;

&lt;p&gt;The parallel to behavioral finance is not a metaphor — it's mechanistically identical.&lt;/p&gt;

&lt;p&gt;Gödker, Jiao, and Smeets showed in PNAS (2021) that human investors systematically over-remember winning trades and under-remember losses. Jiang et al. in the Quarterly Journal of Economics (2025) showed that investor memory-based beliefs explain stock return expectations, with rising markets triggering positive recall feedback loops. Fudenberg, Lanzani, and Strack formalized this in the Journal of Political Economy (2024) as a "Selective Memory Equilibrium" — agents who over-remember ego-boosting experiences become overconfident.&lt;/p&gt;

&lt;p&gt;Replace "human investor's selective forgetting" with "retrieval system's similarity bias" and you get the same outcome through a different mechanism: a biased sample of past experiences that systematically overstates the probability of success.&lt;/p&gt;

&lt;p&gt;Nobody had connected these two literatures. The behavioral finance people study humans. The AI agent people study LLMs. They're describing the same phenomenon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-Resonance: The Fix Is Deliberate Conflict
&lt;/h2&gt;

&lt;p&gt;If resonance is the problem — both memory sources agreeing, amplifying confidence — then the fix is to deliberately break the agreement. I call this &lt;strong&gt;anti-resonance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When the retrieval system returns 5 winning VB trades, you force-inject at least 1 losing trade into the recall. Now the agent's prompt contains a contradiction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parametric memory: "Breakout strategies work."&lt;/li&gt;
&lt;li&gt;External memory (4 wins): "Yes, they usually work here."&lt;/li&gt;
&lt;li&gt;External memory (1 loss): "But sometimes they fail catastrophically."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent is forced to reconcile contradictory evidence instead of rubber-stamping a pre-existing belief. This is genuine reasoning — weighing competing signals, calibrating confidence, deciding whether this setup looks more like the 4 wins or the 1 loss. Without the injected loss, there's nothing to reason about.&lt;/p&gt;

&lt;p&gt;The concept has precedents at other abstraction levels. Du et al. (ICML 2024) showed multi-agent debate improves factuality through conflicting positions. De Jong et al. (CSCW 2025) explored LLMs as "epistemic provocateurs" — challenging positions to reduce human confirmation bias. But nobody had applied deliberate conflict at the retrieval level — constructing recall results that contradict the model's parametric bias.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;ensure_negative_balance&lt;/code&gt;: The Engineering Contribution
&lt;/h2&gt;

&lt;p&gt;I implemented anti-resonance as a single, generic function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ensure_negative_balance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;all_candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;is_negative&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;min_negative_ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;score_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relevance_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mechanism is post-retrieval: normal relevance ranking happens first, preserving the quality of similarity matching. Then a hard constraint is applied — at least &lt;code&gt;ceil(K × min_negative_ratio)&lt;/code&gt; of the top-K results must be negative outcomes. If there aren't enough negatives, the lowest-scored positives get swapped out for the highest-scored negatives from the full candidate pool.&lt;/p&gt;
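&lt;p&gt;The signature above implies a fairly direct body. This is my reconstruction of the mechanism as described (swap the lowest-scored positives for the highest-scored negatives), not the repository's actual implementation:&lt;/p&gt;

```python
import math
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Reconstruction of the described mechanism, not the repo's actual code:
# enforce a minimum share of negative outcomes in the top-K results.
def ensure_negative_balance(
    top: List[T],
    all_candidates: List[T],
    is_negative: Callable[[T], bool],
    min_negative_ratio: float = 0.20,
    score_key: Callable[[T], float] = lambda x: getattr(x, "relevance_score", 0.0),
) -> List[T]:
    required = math.ceil(len(top) * min_negative_ratio)
    have = sum(1 for t in top if is_negative(t))
    if have >= required:
        return top  # already balanced; leave the similarity ranking alone
    # Best negatives not already retrieved, strongest first
    spare = sorted((c for c in all_candidates
                    if is_negative(c) and c not in top),
                   key=score_key, reverse=True)
    result = list(top)
    # Weakest positives are the first to be swapped out
    positives = sorted((t for t in result if not is_negative(t)), key=score_key)
    for neg, pos in zip(spare[:required - have], positives):
        result[result.index(pos)] = neg
    return result
```

&lt;p&gt;Because the swap happens after ranking, the four strongest similarity matches survive untouched; only the weakest slot is spent on the counterexample.&lt;/p&gt;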

&lt;p&gt;The key abstraction is the &lt;code&gt;is_negative&lt;/code&gt; predicate. It decouples the balance mechanism from any specific domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Trading: losses
&lt;/span&gt;&lt;span class="nf"&gt;ensure_negative_balance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_negative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pnl&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Customer service: bad outcomes
&lt;/span&gt;&lt;span class="nf"&gt;ensure_negative_balance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_negative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;satisfaction&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Code review: failed builds
&lt;/span&gt;&lt;span class="nf"&gt;ensure_negative_balance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_negative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_passed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is domain-agnostic anti-resonance. Any system that stores outcomes, retrieves by similarity, and feeds context into an LLM with parametric knowledge will produce resonance when retrieved outcomes align with parametric beliefs. The specific domain doesn't matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation: It Works
&lt;/h2&gt;

&lt;p&gt;After integrating the hybrid recall (with &lt;code&gt;min_negative_ratio=0.20&lt;/code&gt;) into the engine, I ran a 200-decision validation — same data window, same model, new recall path:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Old Recall (memory hurts)&lt;/th&gt;
&lt;th&gt;Hybrid Recall (fixed)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Decisions&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trades&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IM trades&lt;/td&gt;
&lt;td&gt;1 (appeared)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0 (eliminated)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PnL&lt;/td&gt;
&lt;td&gt;-$154&lt;/td&gt;
&lt;td&gt;+$29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory recalls triggered&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;IntradayMomentum — the strategy that only appeared with memory and caused -$437 in losses across the full run — was completely eliminated. The single trade was a clean VB winner. All 200 decisions triggered memory recall (compared to only 41 in the old version, which had a retrieval threshold that filtered out most queries), confirming the pipeline was fully operational.&lt;/p&gt;

&lt;p&gt;The loss balance mechanism did exactly what it was designed to do: it didn't change the retrieval algorithm, didn't modify the scoring weights, didn't retrain anything. It just guaranteed that the agent would see at least one counterexample before making a decision. That single counterexample was enough to break the resonance loop and restore calibrated behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Beyond Trading
&lt;/h2&gt;

&lt;p&gt;Every LLM agent memory system has this problem. Any architecture that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Stores outcomes (positive and negative)&lt;/li&gt;
&lt;li&gt;Retrieves by similarity&lt;/li&gt;
&lt;li&gt;Feeds retrieved context into an LLM with parametric knowledge&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;...will produce resonance when retrieved outcomes align with parametric beliefs. Consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Customer service agent&lt;/strong&gt;: Retrieves 5 similar tickets, all resolved successfully → overconfident in a case that actually needs escalation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review agent&lt;/strong&gt;: Retrieves 5 similar PRs, all passed tests → misses a subtle bug pattern.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medical triage agent&lt;/strong&gt;: Retrieves 5 similar cases, all benign → misses a rare but serious condition.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The positive bias isn't in the data — it's in the geometry of retrieval. And the LLM's confirmation bias turns that geometric artifact into a confidence amplifier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model-Dependent Twist
&lt;/h2&gt;

&lt;p&gt;There's one more finding worth highlighting. The severity of resonance depends on the model's parametric confidence — and the interaction is nonlinear.&lt;/p&gt;

&lt;p&gt;Haiku (weak parametric beliefs, fast System 1) produced noise regardless of memory. It was already making bad decisions; memory didn't make them worse because there was no coherent signal to amplify.&lt;/p&gt;

&lt;p&gt;Sonnet (calibrated beliefs, deliberate System 2) was precisely where resonance struck hardest. It had accurate enough beliefs to trade well, and the retrieval bias pushed it past calibration into overconfidence.&lt;/p&gt;

&lt;p&gt;DeepSeek (overthinking, paralyzed System 2) was immune to resonance because it never traded at all. You can't amplify a decision that doesn't get made.&lt;/p&gt;

&lt;p&gt;This means &lt;strong&gt;memory hurts most for the best-calibrated models&lt;/strong&gt; — exactly the ones you'd want to give memory to. The relationship between model quality and memory benefit isn't monotonic. It has a danger zone at the exact performance level where you'd deploy an agent in production.&lt;/p&gt;

&lt;p&gt;Existing literature has studied model size vs. trading performance, and memory vs. trading performance, but never the interaction. This is, as far as I can tell from an extensive prior art search, the first empirical demonstration of the model × memory interaction effect.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Two days, $73 in API costs, 6,836 decisions, 40 trades, and one genuinely surprising finding:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The most dangerous thing you can do to a well-calibrated LLM agent is give it memory that confirms what it already believes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not wrong memories. Not hallucinated memories. Accurate, relevant, correctly retrieved memories that happen to be biased toward positive outcomes because of the geometry of similarity search. The retrieval system works perfectly. The LLM reasons coherently. And the combination produces worse decisions than no memory at all.&lt;/p&gt;

&lt;p&gt;The fix isn't better embeddings or smarter retrieval scoring. It's a structural intervention: guarantee that recall results contain enough negative outcomes to create tension with the model's parametric beliefs. Force the agent to reason about contradictory evidence instead of confirming what it already thinks.&lt;/p&gt;

&lt;p&gt;I've open-sourced &lt;code&gt;ensure_negative_balance&lt;/code&gt; as part of &lt;a href="https://github.com/mnemox-ai/tradememory-protocol" rel="noopener noreferrer"&gt;TradeMemory&lt;/a&gt;. It's 40 lines of Python. It took two days to discover why it was needed, and 30 minutes to build.&lt;/p&gt;

&lt;p&gt;The resonance problem is hiding in every RAG pipeline that feeds results into an LLM. The question is whether you'll notice before your agent gets confident enough to act on it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All data in this article comes from actual experimental runs on XAUUSD M15 bars (Jan 2024 – Mar 2026). No results are simulated or cherry-picked. The full material pack, including trade logs, prompt comparisons, and prior art analysis, is available in the project repository.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://doi.org/10.1073/pnas.2026680118" rel="noopener noreferrer"&gt;Godker, Jiao &amp;amp; Smeets (2021, PNAS)&lt;/a&gt; — Investor memory is positively biased&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2403.08319" rel="noopener noreferrer"&gt;Xu et al. (EMNLP 2024)&lt;/a&gt; — Knowledge Conflicts for LLMs: A Survey&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2412.12632" rel="noopener noreferrer"&gt;Chain of Evidence (arXiv 2412.12632)&lt;/a&gt; — LLMs prefer evidence consistent with internal memory&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2410.11414" rel="noopener noreferrer"&gt;ReDeEP, Sun et al. (ICLR 2025 Spotlight)&lt;/a&gt; — Detecting hallucination via mechanistic interpretability&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2305.14325" rel="noopener noreferrer"&gt;Du et al. (ICML 2024)&lt;/a&gt; — Multi-agent debate improves reasoning&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2505.16067" rel="noopener noreferrer"&gt;Xie et al. (2025)&lt;/a&gt; — Memory management impacts LLM agents&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/1511.05952" rel="noopener noreferrer"&gt;Schaul et al. (2015)&lt;/a&gt; — Prioritized Experience Replay&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://doi.org/10.1093/qje/qjae038" rel="noopener noreferrer"&gt;Jiang et al. (2025, QJE)&lt;/a&gt; — Investor memory and biased beliefs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2405.20138" rel="noopener noreferrer"&gt;No Free Lunch: RAG Undermines Fairness (EMNLP 2025)&lt;/a&gt; — RAG amplifies LLM confidence in biased answers&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Add a pre-build reality check to your AI agent — one line, every project</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Sun, 01 Mar 2026 09:42:40 +0000</pubDate>
      <link>https://dev.to/mnemox/add-a-pre-build-reality-check-to-your-ai-agent-one-line-every-project-46e5</link>
      <guid>https://dev.to/mnemox/add-a-pre-build-reality-check-to-your-ai-agent-one-line-every-project-46e5</guid>
      <description>&lt;p&gt;Your AI coding agent just spent 3 hours building a DNS propagation checker. You were impressed. The code was clean, tests passed, CLI looked great. Then you searched GitHub: 47 repos doing exactly the same thing. One of them has 2,000+ stars and a published npm package.&lt;/p&gt;

&lt;p&gt;The agent never checked. You never asked it to. Nobody does.&lt;/p&gt;

&lt;p&gt;This is one of the most common failure modes of AI-assisted development. Not bad code. Not wrong architecture. Just building something that already exists, because the agent was never told to look first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The blind spot
&lt;/h2&gt;

&lt;p&gt;Claude Code, Cursor, Windsurf, GitHub Copilot -- they are all excellent at writing code. Give them a spec and they will produce working software. But they have zero awareness of what already exists in the ecosystem.&lt;/p&gt;

&lt;p&gt;They don't search GitHub before scaffolding a new project. They don't check if there's already a popular npm package for what you described. They don't scan Hacker News to see if someone shipped the same idea last week.&lt;/p&gt;

&lt;p&gt;The result: you invest hours (or days) into something that already has mature alternatives. Or you ship a clone without knowing, then find out when someone drops a link in your comments section.&lt;/p&gt;

&lt;h2&gt;
  
  
  One tool, five sources, real data
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;idea-reality-mcp&lt;/a&gt; is an MCP server that searches five real-time sources -- GitHub, Hacker News, npm, PyPI, and Product Hunt -- and returns a &lt;code&gt;reality_signal&lt;/code&gt; score from 0 to 100.&lt;/p&gt;

&lt;p&gt;It uses live API data, not LLM opinions. Every number in the result traces back to an actual search query you can verify. The scoring formula is documented and intentionally simple: no ML, no black box.&lt;/p&gt;

&lt;p&gt;Add one line to your agent's instructions file and it checks automatically before writing any new code. Here is what that looks like in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the results look like
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Example 1: "CLI tool for DNS propagation monitoring"&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reality_signal: 75
top_competitor: dns-propagation-checker (1,847 stars)
github_repos: 47
npm_packages: 12
hn_mentions: 23
verdict: High competition. Multiple mature tools exist.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your agent sees signal 75 and warns you: "There are 47 existing repos in this space, including one with nearly 2K stars. Want me to find a differentiation angle instead?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 2: "MCP server for pre-build idea validation"&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reality_signal: 15
top_competitor: none with significant traction
github_repos: 3
npm_packages: 0
hn_mentions: 2
verdict: Low competition. Greenfield opportunity.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Signal 15. The agent proceeds with confidence. The space is open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 3: "React component library for dashboards"&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reality_signal: 85
top_competitor: tremor (4,200 stars)
github_repos: 312
npm_packages: 89
hn_mentions: 67
verdict: Very crowded. Strong incumbents with active development.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Signal 85. The agent stops and says: "This space has 312 repos and multiple well-funded projects. I'd recommend either targeting a specific niche or contributing to an existing library."&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: one line per platform
&lt;/h2&gt;

&lt;p&gt;The MCP tool description already tells your agent when and how to call &lt;code&gt;idea_check&lt;/code&gt;. You just need a one-line hint in your agent's instruction file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before starting any new project, run idea_check first.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop this into the right file for your platform:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.cursorrules&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windsurf&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.windsurfrules&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Copilot&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is it. The tool handles scoring thresholds, competitor analysis, and pivot suggestions on its own. You do not need to spell out the logic in your instruction file -- that is the MCP server's job.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works under the hood
&lt;/h2&gt;

&lt;p&gt;The tool connects via MCP (Model Context Protocol), so any MCP-compatible agent can call it natively. When triggered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your idea text goes through a 3-stage keyword extraction pipeline (90+ intent anchors, 80+ synonym expansions).&lt;/li&gt;
&lt;li&gt;Five sources are queried in parallel using async HTTP.&lt;/li&gt;
&lt;li&gt;Results are scored with a weighted formula: GitHub repo count, star concentration, npm/PyPI package density, HN discussion volume, and Product Hunt presence.&lt;/li&gt;
&lt;li&gt;The agent receives a structured response with the signal, evidence list, top competitors, and pivot suggestions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total latency: roughly 3 seconds for a deep scan across all five sources.&lt;/p&gt;
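&lt;p&gt;The parallel fan-out in step 2 is what keeps total latency close to the slowest single source rather than the sum of all five. A minimal sketch of that pattern with &lt;code&gt;asyncio&lt;/code&gt; (the &lt;code&gt;search_*&lt;/code&gt; stubs and their return shapes stand in for the real API clients, which are not shown here):&lt;/p&gt;

```python
import asyncio

# Illustrative stand-ins for the real source clients; each returns a hit count.
async def search_github(q): return {"source": "github", "count": 47}
async def search_hn(q): return {"source": "hn", "count": 23}
async def search_npm(q): return {"source": "npm", "count": 12}
async def search_pypi(q): return {"source": "pypi", "count": 4}
async def search_producthunt(q): return {"source": "producthunt", "count": 2}

async def query_all(idea):
    # All five sources are queried concurrently, so wall-clock time is
    # bounded by the slowest source, not the sum of all five.
    tasks = [f(idea) for f in (search_github, search_hn, search_npm,
                               search_pypi, search_producthunt)]
    return await asyncio.gather(*tasks)

results = asyncio.run(query_all("dns propagation checker"))
```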

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# pip&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;idea-reality-mcp

&lt;span class="c"&gt;# uv (recommended)&lt;/span&gt;
uvx idea-reality-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key required. No account. No data storage. Works entirely through live, public API queries.&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; for higher rate limits (optional). Set &lt;code&gt;PRODUCTHUNT_TOKEN&lt;/code&gt; to include Product Hunt data (optional).&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it now
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;mnemox-ai/idea-reality-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web demo&lt;/strong&gt;: &lt;a href="https://mnemox.ai/check" rel="noopener noreferrer"&gt;mnemox.ai/check&lt;/a&gt; -- test any idea without installing anything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent instruction templates&lt;/strong&gt;: &lt;a href="https://github.com/mnemox-ai/idea-reality-mcp/blob/master/examples/agent-instructions.md" rel="noopener noreferrer"&gt;examples/agent-instructions.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Registry&lt;/strong&gt;: &lt;code&gt;io.github.mnemox-ai/idea-reality-mcp&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your agent does not need to guess. Make it search.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://github.com/mnemox-ai" rel="noopener noreferrer"&gt;Sean&lt;/a&gt; at Mnemox. 148 tests passing. MIT licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I asked ChatGPT if my idea was original. GitHub said 847 repos already exist.</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Fri, 27 Feb 2026 15:01:41 +0000</pubDate>
      <link>https://dev.to/mnemox/i-asked-chatgpt-if-my-idea-was-original-github-said-847-repos-already-exist-500l</link>
      <guid>https://dev.to/mnemox/i-asked-chatgpt-if-my-idea-was-original-github-said-847-repos-already-exist-500l</guid>
      <description>&lt;p&gt;Last month I mass-deleted 6 hours of code.&lt;/p&gt;

&lt;p&gt;Claude had spent the entire time enthusiastically helping me build something that already had 12 competitors on GitHub. The top one had over 1,000 stars.&lt;/p&gt;

&lt;p&gt;Here's the pattern every developer hits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Developer has an idea&lt;/li&gt;
&lt;li&gt;Asks ChatGPT: "Is this original?"&lt;/li&gt;
&lt;li&gt;ChatGPT says: "That's a great idea! Here's how to build it..."&lt;/li&gt;
&lt;li&gt;Developer spends 2 weeks building&lt;/li&gt;
&lt;li&gt;Searches GitHub → finds 847 repos doing the same thing&lt;/li&gt;
&lt;li&gt;The top one has 9,000 stars and a funded team behind it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The AI didn't lie. It just didn't search.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "just Google it" doesn't work
&lt;/h2&gt;

&lt;p&gt;You might think: just search before you build. But manual searching has problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You search GitHub&lt;/strong&gt; → find repos, but miss npm packages and HN discussions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You search one query&lt;/strong&gt; → miss synonyms ("LLM monitoring" vs "AI observability" vs "model telemetry")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You check star counts&lt;/strong&gt; → but don't check PyPI/npm for existing packages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You spend 30 minutes&lt;/strong&gt; → and still aren't sure if you missed something&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real issue: there's no standardized way to do a comprehensive market scan across all developer platforms at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  What if your AI agent searched before coding?
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;idea-reality-mcp&lt;/a&gt; — an MCP server that searches real data before you build.&lt;/p&gt;

&lt;p&gt;One command. Five sources. Quantified signal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"AI code review tool"
→ reality_signal: 90/100
→ 847 GitHub repos (top: reviewdog, 9,094 ⭐)
→ 254 Hacker News mentions
→ Verdict: "Extremely high existing coverage"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It searches GitHub, Hacker News, npm, PyPI, and Product Hunt in parallel, then returns a 0-100 reality signal based on actual API data — not LLM opinions.&lt;/p&gt;

&lt;h2&gt;
  
  
  We search. They guess.
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;ChatGPT / Copilot&lt;/th&gt;
&lt;th&gt;idea-reality-mcp&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generates opinion from training data&lt;/td&gt;
&lt;td&gt;Searches live APIs in real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sources&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None (hallucination-prone)&lt;/td&gt;
&lt;td&gt;GitHub, HN, npm, PyPI, Product Hunt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Great idea!" (usually)&lt;/td&gt;
&lt;td&gt;reality_signal: 73, 2,341 repos found&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Verifiable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes — every number links to a real API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;~3 seconds (parallel async)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How it works (30 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
uvx idea-reality-mcp

&lt;span class="c"&gt;# Or add to Claude Desktop / Cursor config:&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"mcpServers"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"idea-reality"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"command"&lt;/span&gt;: &lt;span class="s2"&gt;"uvx"&lt;/span&gt;,
      &lt;span class="s2"&gt;"args"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"idea-reality-mcp"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then ask your AI agent: &lt;em&gt;"Check if anyone has built a CLI tool for DNS propagation monitoring"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The agent calls &lt;code&gt;idea_check&lt;/code&gt; automatically and gets back:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;reality_signal&lt;/strong&gt;: 0-100 score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top similar projects&lt;/strong&gt; with star counts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HN discussion&lt;/strong&gt; evidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pivot suggestions&lt;/strong&gt; if the space is crowded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No API key needed. No account. No storage. It's a protocol, not a SaaS.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scoring is intentionally simple
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Quick mode: GitHub repos (60%) + stars (20%) + HN mentions (20%)
Deep mode:  GitHub (25%) + stars (10%) + HN (15%) + npm (20%) + PyPI (15%) + PH (15%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every weight is documented. Every number comes from a real API call you can verify. No ML black box.&lt;/p&gt;

&lt;p&gt;I chose explainability over sophistication because when you're deciding whether to invest weeks into a project, you need to trust the data — not a magic number.&lt;/p&gt;
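&lt;p&gt;To make the "no black box" claim concrete, here is a sketch of quick mode as a plain weighted sum. The 60/20/20 weights are the documented ones; the saturation caps are illustrative assumptions, not the tool's actual normalization constants:&lt;/p&gt;

```python
def quick_signal(repo_count, top_stars, hn_mentions):
    """Combine quick-mode evidence into a 0-100 signal using the
    documented weights: repos 60%, stars 20%, HN mentions 20%."""
    def saturate(value, cap):
        # Map a raw count onto 0-1, flattening out beyond the cap.
        return min(value, cap) / cap

    score = (0.60 * saturate(repo_count, 500)
             + 0.20 * saturate(top_stars, 5000)
             + 0.20 * saturate(hn_mentions, 100))
    return round(score * 100)
```

&lt;p&gt;Every term is auditable: change one input count and you can predict exactly how the signal moves.&lt;/p&gt;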

&lt;h2&gt;
  
  
  Make your AI check automatically
&lt;/h2&gt;

&lt;p&gt;The most powerful pattern: add one line to your AI agent's instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Claude Code&lt;/strong&gt; (&lt;code&gt;CLAUDE.md&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before starting any new project, run idea_check to verify the idea hasn't been built already.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For Cursor&lt;/strong&gt; (&lt;code&gt;.cursorrules&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When the user describes a new project idea, always run idea_check first.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your agent will search before coding — every time, automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it found that surprised me
&lt;/h2&gt;

&lt;p&gt;Some results from real checks on the &lt;a href="https://mnemox.ai/check" rel="noopener noreferrer"&gt;web demo&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymzxspahjjfs7ivz8pd4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymzxspahjjfs7ivz8pd4.png" alt="idea-reality-mcp demo result" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"MCP server for monitoring LLM calls"&lt;/strong&gt; → Signal 68. Turns out there are several observability tools, but none MCP-native. Worth building with differentiation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"AI-powered code review"&lt;/strong&gt; → Signal 90. Massively crowded. reviewdog alone has 9K stars. Don't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Pet acupuncture booking app"&lt;/strong&gt; → Signal 12. Almost nothing exists. Niche, but the market might also be tiny.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The signal doesn't tell you whether to build — it tells you what you're walking into, backed by data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open source, zero storage
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;120 tests&lt;/strong&gt;, all passing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MIT license&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero storage&lt;/strong&gt; — nothing is logged or saved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero accounts&lt;/strong&gt; — no signup, no API key needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works offline&lt;/strong&gt; (dictionary-based keyword extraction for MCP mode)&lt;/li&gt;
&lt;li&gt;Published on &lt;strong&gt;PyPI&lt;/strong&gt;, &lt;strong&gt;MCP Registry&lt;/strong&gt;, &lt;strong&gt;Smithery&lt;/strong&gt;, and 10+ directories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The web demo at &lt;a href="https://mnemox.ai/check" rel="noopener noreferrer"&gt;mnemox.ai/check&lt;/a&gt; uses Claude Haiku for smarter keyword extraction, but the MCP tool itself needs zero external dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;mnemox-ai/idea-reality-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web demo&lt;/strong&gt;: &lt;a href="https://mnemox.ai/check" rel="noopener noreferrer"&gt;mnemox.ai/check&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install&lt;/strong&gt;: &lt;code&gt;uvx idea-reality-mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Registry&lt;/strong&gt;: &lt;code&gt;io.github.mnemox-ai/idea-reality-mcp&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you use Claude Code or Cursor daily, add it to your agent instructions. It takes 30 seconds and saves hours.&lt;/p&gt;

&lt;p&gt;What's the worst "I should have searched first" moment you've had? Drop your idea in the comments — I'll run it through the tool and reply with the real numbers. 🔍&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://github.com/mnemox-ai" rel="noopener noreferrer"&gt;Sean&lt;/a&gt; at Mnemox.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>mcp</category>
    </item>
    <item>
      <title>idea-reality-mcp v0.3.0: How We Built Chinese Language Support Into an MCP Server</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Thu, 26 Feb 2026 15:31:25 +0000</pubDate>
      <link>https://dev.to/mnemox/idea-reality-mcp-v030-how-we-built-chinese-language-support-into-an-mcp-server-4e32</link>
      <guid>https://dev.to/mnemox/idea-reality-mcp-v030-how-we-built-chinese-language-support-into-an-mcp-server-4e32</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2jq4a7pmzs1vbyzskl2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2jq4a7pmzs1vbyzskl2.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — idea-reality-mcp is an MCP server that checks if your project idea already exists. v0.3.0 adds a 3-stage keyword extraction pipeline and full Chinese/mixed-language support (150+ term mappings across 15+ domains).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When users typed ideas in Chinese like &lt;code&gt;LINE Bot 自動客服系統&lt;/code&gt;, our v0.2 keyword extraction would either return raw Chinese characters or miss the intent entirely. Every search query was garbage.&lt;/p&gt;

&lt;p&gt;For a tool used by Taiwanese developers, this was unacceptable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: 3-Stage Pipeline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stage A: Clean &amp;amp; Map
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Map Chinese terms to English equivalents (150+ terms)&lt;/li&gt;
&lt;li&gt;Hard-filter boilerplate words (&lt;code&gt;ai&lt;/code&gt;, &lt;code&gt;tool&lt;/code&gt;, &lt;code&gt;platform&lt;/code&gt;, &lt;code&gt;system&lt;/code&gt;...)&lt;/li&gt;
&lt;li&gt;Normalize hyphens, extract compound terms&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stage B: Intent Anchors
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Detect 1-2 intent signals from a curated set of 90+ anchors&lt;/li&gt;
&lt;li&gt;Covers: monitoring, evaluation, agents, RAG, scheduling, billing, scraping, deployment, and more&lt;/li&gt;
&lt;li&gt;Example: &lt;code&gt;排程任務管理工具&lt;/code&gt; → anchor: &lt;code&gt;scheduling&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stage C: Synonym Expansion
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;80+ synonym groups generate 3-8 varied search queries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scheduling&lt;/code&gt; expands to: &lt;code&gt;cron&lt;/code&gt;, &lt;code&gt;job queue&lt;/code&gt;, &lt;code&gt;task scheduler&lt;/code&gt;, &lt;code&gt;periodic&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Avoids duplicate words (fixed a bug where &lt;code&gt;redis redis&lt;/code&gt; could appear)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Chinese Coverage
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;CHINESE_TECH_MAP&lt;/code&gt; isn't just tech terms. We mapped 150+ terms across 15+ domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tech/SaaS&lt;/strong&gt;: 監控→monitoring, 爬蟲→scraping, 快取→caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medical&lt;/strong&gt;: 中醫→tcm, 針灸→acupuncture, 病歷→medical record&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal&lt;/strong&gt;: 合約→contract, 律師→lawyer, 判決→court ruling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Education&lt;/strong&gt;: 教學→teaching, 考試→exam, 學習→learning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;And more&lt;/strong&gt;: agriculture, aerospace, religion, art, gaming, government...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key design decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort by key length (longest first) so &lt;code&gt;客戶關係&lt;/code&gt; matches before &lt;code&gt;客戶&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Never return raw Chinese — if we can't map it, we strip it cleanly&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;追蹤&lt;/code&gt; maps to &lt;code&gt;tracking&lt;/code&gt; (general), not &lt;code&gt;tracing&lt;/code&gt; (infra-specific)&lt;/li&gt;
&lt;/ul&gt;
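&lt;p&gt;The first two design decisions can be sketched in a few lines. The map slice here is illustrative (the real &lt;code&gt;CHINESE_TECH_MAP&lt;/code&gt; has 150+ entries), and the CJK range check is one simple way to enforce the no-leakage rule:&lt;/p&gt;

```python
# Illustrative slice of the mapping; the real CHINESE_TECH_MAP has 150+ entries.
CHINESE_TECH_MAP = {
    "客戶關係": "crm",
    "客戶": "customer",
    "監控": "monitoring",
    "排程": "scheduling",
}

def map_chinese_terms(text):
    # Longest key first, so 客戶關係 wins over its prefix 客戶.
    for zh in sorted(CHINESE_TECH_MAP, key=len, reverse=True):
        text = text.replace(zh, " " + CHINESE_TECH_MAP[zh] + " ")
    # Never leak raw Chinese: strip anything left in the CJK ideograph range.
    cleaned = "".join(c for c in text if ord(c) not in range(0x4E00, 0xA000))
    return " ".join(cleaned.split())
```

&lt;p&gt;So &lt;code&gt;客戶關係管理系統&lt;/code&gt; becomes &lt;code&gt;crm&lt;/code&gt; rather than &lt;code&gt;customer&lt;/code&gt; plus leftover characters.&lt;/p&gt;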

&lt;h2&gt;
  
  
  Quality Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;pytest&lt;/td&gt;
&lt;td&gt;93/93 passing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Golden eval (54 ideas)&lt;/td&gt;
&lt;td&gt;100% anchor hit rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Junk ratio&lt;/td&gt;
&lt;td&gt;4% average&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TW Chinese tests (99 cases)&lt;/td&gt;
&lt;td&gt;98%+ pass rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chinese char leakage&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
uvx idea-reality-mcp

&lt;span class="c"&gt;# Or try online (no install)&lt;/span&gt;
&lt;span class="c"&gt;# https://mnemox.ai/check&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqei2xl7mnf3oi1899ocb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqei2xl7mnf3oi1899ocb.png" alt="Reality Check web interface with an input field showing an AI code review idea" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjxllub0eb9hkkecbtfi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjxllub0eb9hkkecbtfi.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslrctry2cj2oqq9uo09r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslrctry2cj2oqq9uo09r.png" alt="Reality Signal score of 90 with a red gauge indicating high competition" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kgfw2c2jxm67uw5p1yf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kgfw2c2jxm67uw5p1yf.png" alt="Evidence grid showing 664,818 GitHub repos, 24 HN posts, and 70,408 top stars with similar projects list" width="800" height="1212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezsad7dpa7zwiw500905.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezsad7dpa7zwiw500905.gif" alt=" " width="640" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — star if useful&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pypi.org/project/idea-reality-mcp/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; — &lt;code&gt;pip install idea-reality-mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/mnemox-ai/idea-reality-mcp/releases/tag/v0.3.0" rel="noopener noreferrer"&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mnemox.ai/check" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;MIT licensed. Built by &lt;a href="https://mnemox.ai" rel="noopener noreferrer"&gt;Mnemox AI&lt;/a&gt; in Taipei.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>mcp</category>
      <category>opensource</category>
    </item>
    <item>
      <title>From 2 sources to 5: How I upgraded my "idea reality check" MCP server in one day</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Wed, 25 Feb 2026 08:49:49 +0000</pubDate>
      <link>https://dev.to/mnemox/from-2-sources-to-5-how-i-upgraded-my-idea-reality-check-mcp-server-in-one-day-3gjh</link>
      <guid>https://dev.to/mnemox/from-2-sources-to-5-how-i-upgraded-my-idea-reality-check-mcp-server-in-one-day-3gjh</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7az7oakpyx2l3c1bept.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7az7oakpyx2l3c1bept.png" alt=" " width="726" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is a follow-up to &lt;a href="https://dev.to/mnemox/stop-your-ai-agent-from-building-what-already-exists-2mdj"&gt;Stop Your AI Agent From Building What Already Exists&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.1 had a blind spot
&lt;/h2&gt;

&lt;p&gt;Two weeks ago I shipped &lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;idea-reality-mcp&lt;/a&gt; — an MCP server that checks if your idea already exists before your AI starts coding.&lt;/p&gt;

&lt;p&gt;It worked. But it only looked at two places: GitHub and Hacker News.&lt;/p&gt;

&lt;p&gt;That meant it missed entire categories. npm has 297,000+ packages related to MCP alone. PyPI has its own ecosystem. Product Hunt has thousands of launched products that never made it to GitHub.&lt;/p&gt;

&lt;p&gt;Two sources weren't enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.2: five sources, one command
&lt;/h2&gt;

&lt;p&gt;The new version scans GitHub, Hacker News, npm, PyPI, and Product Hunt in parallel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx idea-reality-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;quick&lt;/strong&gt; — GitHub + HN only (fast, same as v0.1)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;deep&lt;/strong&gt; — all five sources at once via &lt;code&gt;asyncio.gather()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
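That fan-out can be sketched in a few lines. The stub fetchers below are placeholders standing in for the real per-source clients (the shipped implementations query live APIs); only the `asyncio.gather()` pattern is the point:

```python
import asyncio

# Hypothetical per-source fetchers -- stubs for illustration only.
async def fetch_github(query): return {"source": "github", "count": 1359}
async def fetch_hn(query): return {"source": "hn", "count": 254}
async def fetch_npm(query): return {"source": "npm", "count": 120}
async def fetch_pypi(query): return {"source": "pypi", "count": 45}
async def fetch_producthunt(query): return {"source": "producthunt", "count": 12}

async def deep_scan(query: str):
    # All five sources are queried concurrently, so total latency is
    # roughly the slowest source, not the sum of all five.
    results = await asyncio.gather(
        fetch_github(query),
        fetch_hn(query),
        fetch_npm(query),
        fetch_pypi(query),
        fetch_producthunt(query),
    )
    return {r["source"]: r["count"] for r in results}

evidence = asyncio.run(deep_scan("AI trading bot for gold"))
```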

&lt;p&gt;Here's a real test with depth="deep":&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query&lt;/td&gt;
&lt;td&gt;"AI trading bot for gold"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;reality_signal&lt;/td&gt;
&lt;td&gt;82 / 100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;duplicate_likelihood&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sources_used&lt;/td&gt;
&lt;td&gt;GitHub + HN + npm + PyPI (PH skipped, no token)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub repos&lt;/td&gt;
&lt;td&gt;1,359&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HN mentions&lt;/td&gt;
&lt;td&gt;254 (across 3 keyword variants)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;top_similars&lt;/td&gt;
&lt;td&gt;GOLD_ORB (XAUUSD EA, 186 stars)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pivot_hints&lt;/td&gt;
&lt;td&gt;"Consider niche differentiator or plugin for existing tools"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An 82 with 1,359 repos means: the space is crowded, but the tool also found a specific competitor (GOLD_ORB) that I could study before deciding whether to proceed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed under the hood
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;New sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt; — hits the registry JSON API (&lt;code&gt;/-/v1/search&lt;/code&gt;), free, no auth needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI&lt;/strong&gt; — scrapes search HTML with regex fallback (no official search API exists)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt; — optional GraphQL v2, requires &lt;code&gt;PRODUCTHUNT_TOKEN&lt;/code&gt;. No token? Gracefully skipped, zero config stays zero config.&lt;/li&gt;
&lt;/ul&gt;
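To show how little the npm source needs, here's a hedged sketch against the registry's public search endpoint. `npm_search_url` and `npm_package_count` are illustrative helper names, not the project's actual client:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NPM_SEARCH = "https://registry.npmjs.org/-/v1/search"

def npm_search_url(query: str, size: int = 5) -> str:
    # Plain JSON API: no token, no auth headers.
    return f"{NPM_SEARCH}?{urlencode({'text': query, 'size': size})}"

def npm_package_count(query: str) -> int:
    # Live call -- returns the registry's total match count for the query.
    with urlopen(npm_search_url(query)) as resp:
        return json.load(resp)["total"]

url = npm_search_url("mcp server")
```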

&lt;p&gt;&lt;strong&gt;Smarter keyword extraction:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;v0.1 just sorted words by length. v0.2 detects compound terms ("machine learning", "web app"), prioritizes technical keywords (React, Docker, FastAPI), and generates a 4th query variant optimized for registry searches.&lt;/p&gt;
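A rough sketch of that extraction order: compounds first, then known tech terms, then the v0.1-style length fallback. The compound-term and tech-keyword lists here are illustrative, not the shipped ones:

```python
# Illustrative vocabularies -- the real lists are larger.
COMPOUND_TERMS = {"machine learning", "web app", "trading bot"}
TECH_KEYWORDS = {"react", "docker", "fastapi", "mcp", "python"}

def extract_keywords(idea: str, limit: int = 4) -> list[str]:
    text = idea.lower()
    # 1) compound terms beat single words
    found = [t for t in COMPOUND_TERMS if t in text]
    words = [w.strip(".,") for w in text.split()]
    # 2) known technical keywords come next
    found += [w for w in words if w in TECH_KEYWORDS and w not in found]
    # 3) fall back to longest remaining words, v0.1-style
    rest = sorted((w for w in words if w not in found and len(w) > 3),
                  key=len, reverse=True)
    return (found + rest)[:limit]

keywords = extract_keywords("AI trading bot for gold with FastAPI")
```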

&lt;p&gt;&lt;strong&gt;New scoring weights for deep mode:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub repos:  25%    GitHub stars: 10%
HN mentions:   15%    npm packages: 20%
PyPI packages: 15%    Product Hunt: 15%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Product Hunt is unavailable, its weight redistributes automatically across the other sources.&lt;/p&gt;
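That redistribution amounts to renormalizing whatever weights remain. A minimal sketch, where the per-source sub-scores (0 to 1) are made-up inputs for the demo:

```python
DEEP_WEIGHTS = {
    "github_repos": 0.25, "github_stars": 0.10, "hn": 0.15,
    "npm": 0.20, "pypi": 0.15, "producthunt": 0.15,
}

def reality_signal(sub_scores: dict[str, float]) -> int:
    # Keep only sources that actually returned data, then rescale the
    # remaining weights so they still sum to 1.0.
    weights = {k: w for k, w in DEEP_WEIGHTS.items() if k in sub_scores}
    total = sum(weights.values())
    return round(sum(sub_scores[k] * w / total for k, w in weights.items()) * 100)

# Product Hunt missing (no token): its 15% spreads across the rest.
score = reality_signal({"github_repos": 0.9, "github_stars": 0.7,
                        "hn": 0.8, "npm": 0.85, "pypi": 0.6})
```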

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;v0.1: 2 sources, 31 tests&lt;/li&gt;
&lt;li&gt;v0.2: 5 sources, 73 tests&lt;/li&gt;
&lt;li&gt;Still zero config for basic usage&lt;/li&gt;
&lt;li&gt;Still one install command&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The scoring is better but still rule-based. v0.3 will likely add LLM-powered analysis for the "deep" mode — using the raw data from all five sources to generate a more nuanced assessment instead of just a weighted formula.&lt;/p&gt;

&lt;p&gt;If you're building with Claude, Cursor, or any MCP-compatible tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx idea-reality-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — MIT licensed, zero dependencies beyond Python.&lt;/p&gt;

&lt;p&gt;Built by &lt;a href="https://mnemox.ai" rel="noopener noreferrer"&gt;Mnemox&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://dev.to/mnemox/stop-your-ai-agent-from-building-what-already-exists-2mdj"&gt;Stop Your AI Agent From Building What Already Exists&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Stop Your AI Agent From Building What Already Exists</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Tue, 24 Feb 2026 18:08:11 +0000</pubDate>
      <link>https://dev.to/mnemox/stop-your-ai-agent-from-building-what-already-exists-2mdj</link>
      <guid>https://dev.to/mnemox/stop-your-ai-agent-from-building-what-already-exists-2mdj</guid>
      <description>&lt;h2&gt;
  
  
  I wasted 6 hours building something that already had 847 GitHub repos
&lt;/h2&gt;

&lt;p&gt;Last month I told Claude: "Build me an AI-powered food recommendation engine."&lt;/p&gt;

&lt;p&gt;It did. Beautifully. Clean code, tests passing, README done.&lt;/p&gt;

&lt;p&gt;Then I searched GitHub. &lt;strong&gt;847 repos.&lt;/strong&gt; Twelve of them had over 100 stars. Some were updated &lt;em&gt;that same week&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I had just mass-produced another clone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem isn't coding speed — it's decision correctness
&lt;/h2&gt;

&lt;p&gt;Every AI coding tool in 2026 makes you build faster. Cursor, Claude Code, Copilot — they're all racing to write code at the speed of thought.&lt;/p&gt;

&lt;p&gt;But none of them ask the one question that matters:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Should you build this at all?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  So I built a reality check that lives inside the workflow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;Idea Reality MCP&lt;/a&gt; is an MCP server — not a website, not a dashboard, not another SaaS validator.&lt;/p&gt;

&lt;p&gt;It's a tool your AI agent calls &lt;em&gt;before&lt;/em&gt; it starts building.&lt;/p&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx idea-reality-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to Claude Desktop config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"idea-reality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"idea-reality-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then just tell Claude: "Check if this idea already exists before we build it."&lt;/p&gt;

&lt;h2&gt;
  
  
  What it returns
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reality_signal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duplicate_likelihood"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"evidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"repo_count"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;847&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high_star_repos"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mention_count"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"top_similars"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"food-rec-ai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"stars"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2340&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"updated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-18"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pivot_hints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Space is saturated. Consider vertical-specific targeting."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Most existing tools are generic — niche wins."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An 82 means: &lt;strong&gt;stop. Research first. Pivot or differentiate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 15 means: &lt;strong&gt;green light. The space is open.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP, not a website?
&lt;/h2&gt;

&lt;p&gt;Idea validators already exist as websites — IdeaProof, ValidatorAI, DimeADozen, FounderPal. There are dozens.&lt;/p&gt;

&lt;p&gt;But they all require you to &lt;strong&gt;leave your workflow&lt;/strong&gt;, open a browser, type your idea, wait for a report, then go back to coding.&lt;/p&gt;

&lt;p&gt;That's the wrong architecture. The check should happen &lt;em&gt;inside the moment you decide to build&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;MCP makes this possible. Your AI agent calls &lt;code&gt;idea_check()&lt;/code&gt; the same way it calls any other tool. No context switch. No extra tab. No friction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IDEA → reality check → BUILD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IDEA → BUILD → discover competition → regret
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The scoring is intentionally simple
&lt;/h2&gt;

&lt;p&gt;v0.1.0 uses three signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub repo count&lt;/strong&gt; (keyword search across 3 query variants)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub star/recency&lt;/strong&gt; (are top repos actively maintained?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hacker News mentions&lt;/strong&gt; (has this been discussed in the last 12 months?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weighted formula: &lt;code&gt;(github_repos × 0.6) + (github_stars × 0.2) + (hn_mentions × 0.2)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Is it perfect? No. Is it better than zero signal? Absolutely.&lt;/p&gt;
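For concreteness, the formula as code, assuming each signal has already been normalized to a 0-100 scale (the example inputs are made up):

```python
# The v0.1.0 weighted formula: repo count dominates, stars and HN
# mentions each contribute a fifth.
def reality_signal_v01(github_repos: float, github_stars: float,
                       hn_mentions: float) -> float:
    return github_repos * 0.6 + github_stars * 0.2 + hn_mentions * 0.2

# A crowded space: the heavy repo signal pushes the score high.
score = reality_signal_v01(github_repos=95, github_stars=70, hn_mentions=60)
```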

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This is v0.1.0. The roadmap includes ProductHunt scanning, deeper keyword extraction, and an opt-in "idea memory dataset" — a global record of what people have checked and what happened next.&lt;/p&gt;

&lt;p&gt;If you're building with Claude, Cursor, or any MCP-compatible tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx idea-reality-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/mnemox-ai/idea-reality-mcp" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; — MIT licensed, zero dependencies beyond Python.&lt;/p&gt;

&lt;p&gt;Built by &lt;a href="https://mnemox.ai" rel="noopener noreferrer"&gt;Mnemox&lt;/a&gt; — we're building protocol-layer intelligence for AI builders.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://dev.to/mnemox/why-your-ai-trading-agent-needs-a-memory-and-how-we-built-one-kjo"&gt;Why Your AI Trading Agent Needs a Memory&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjs7h0pg2omwc4o7g61yi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjs7h0pg2omwc4o7g61yi.png" alt=" " width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why Your AI Trading Agent Needs a Memory — and How We Built One</title>
      <dc:creator>Sean  |   Mnemox</dc:creator>
      <pubDate>Mon, 23 Feb 2026 13:35:36 +0000</pubDate>
      <link>https://dev.to/mnemox/why-your-ai-trading-agent-needs-a-memory-and-how-we-built-one-kjo</link>
      <guid>https://dev.to/mnemox/why-your-ai-trading-agent-needs-a-memory-and-how-we-built-one-kjo</guid>
      <description>&lt;p&gt;Every AI trading assistant I've used has the same problem: amnesia.&lt;/p&gt;

&lt;p&gt;You ask Claude to analyze a gold trade. It gives you solid analysis — identifies the London session breakout, notes the resistance level, suggests a stop loss. Great.&lt;/p&gt;

&lt;p&gt;Next week, the exact same setup appears. And Claude has zero memory of what happened last time. Did that breakout work? Did the stop loss get hit? It doesn't know. It can't know.&lt;/p&gt;

&lt;p&gt;That's not how real traders think. A veteran trader carries thousands of pattern recognitions in their head. They call it "feel for the market" — but it's really just &lt;strong&gt;memory refined into judgment over time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So I asked: what if we could give AI that same kind of memory?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: AI Agents Are Stateless
&lt;/h2&gt;

&lt;p&gt;Most AI trading tools today work like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You give the AI some market data&lt;/li&gt;
&lt;li&gt;It analyzes and gives a recommendation&lt;/li&gt;
&lt;li&gt;The conversation ends&lt;/li&gt;
&lt;li&gt;Next time, it starts from zero&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There's no learning loop. No way for the AI to say "last time I saw this pattern in Asian session, it failed 4 out of 5 times — I should be cautious."&lt;/p&gt;

&lt;p&gt;Existing solutions don't solve this either. Trading journals are built for humans, not agents. Backtesting frameworks test strategies, but don't give the AI a persistent memory it can query in real-time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: A 3-Layer Memory Architecture
&lt;/h2&gt;

&lt;p&gt;We built &lt;a href="https://github.com/mnemox-ai/tradememory-protocol" rel="noopener noreferrer"&gt;TradeMemory Protocol&lt;/a&gt; — an open-source memory layer for AI trading agents.&lt;/p&gt;

&lt;p&gt;It has three layers, inspired by how human traders actually develop expertise:&lt;/p&gt;

&lt;h3&gt;
  
  
  L1: Raw Trade Memory
&lt;/h3&gt;

&lt;p&gt;Every trade is automatically recorded with full context — entry price, exit price, stop loss, take profit, timeframe, session, outcome, and the AI's reasoning at the time.&lt;/p&gt;

&lt;p&gt;Think of it as a perfect trading journal that never forgets a detail.&lt;/p&gt;
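A sketch of what one L1 record might carry, covering the context listed above. Field names are illustrative, not the protocol's exact schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class TradeRecord:
    symbol: str
    direction: str          # "long" or "short"
    entry: float
    exit: float
    stop_loss: float
    take_profit: float
    timeframe: str          # e.g. "M15"
    session: str            # "asian", "london", "ny"
    outcome: str            # "win" or "loss"
    reasoning: str          # the AI's rationale at decision time
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

trade = TradeRecord("XAUUSD", "long", 2910.5, 2924.0, 2904.0, 2926.0,
                    "M15", "london", "win",
                    "London breakout above resistance")
row = asdict(trade)   # ready to persist as one L1 journal entry
```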

&lt;h3&gt;
  
  
  L2: Pattern Memory
&lt;/h3&gt;

&lt;p&gt;This is where it gets interesting. A reflection engine periodically reviews L1 data and extracts patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"London session breakouts on XAUUSD: 73% win rate (n=41)"&lt;/li&gt;
&lt;li&gt;"Counter-trend entries during NFP: 23% win rate — avoid"&lt;/li&gt;
&lt;li&gt;"Pullback entries after strong trend days: 81% win rate when RSI &amp;lt; 40"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI discovers what works and what doesn't — from its own history.&lt;/p&gt;
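The L1-to-L2 step boils down to grouping closed trades by setup and computing win rates over each bucket. A toy version, with a made-up grouping key of session plus setup:

```python
from collections import defaultdict

def win_rates(trades: list[dict]) -> dict[str, tuple[float, int]]:
    # Bucket outcomes by (session, setup), then report win rate and
    # sample size per bucket -- the "73% win rate (n=41)" shape above.
    buckets = defaultdict(list)
    for t in trades:
        buckets[(t["session"], t["setup"])].append(t["outcome"] == "win")
    return {f"{s}/{p}": (sum(o) / len(o), len(o))
            for (s, p), o in buckets.items()}

history = (
    [{"session": "london", "setup": "breakout", "outcome": "win"}] * 3
    + [{"session": "london", "setup": "breakout", "outcome": "loss"}]
)
stats = win_rates(history)   # {'london/breakout': (0.75, 4)}
```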

&lt;h3&gt;
  
  
  L3: Strategy Memory
&lt;/h3&gt;

&lt;p&gt;L2 patterns get promoted into active strategy adjustments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Asian session detected → reduce position size by 0.8x (based on lower win rate)"&lt;/li&gt;
&lt;li&gt;"Strong trend day + pullback setup → increase confidence, full position"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is memory becoming real-time judgment. The AI equivalent of "feel for the market."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP?
&lt;/h2&gt;

&lt;p&gt;We built this on Anthropic's &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; (MCP) because it solves distribution. Any MCP-compatible AI agent — Claude, GPT-based agents, open-source models — can plug into TradeMemory and immediately get persistent memory.&lt;/p&gt;

&lt;p&gt;The protocol exposes 7 tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;record_trade&lt;/code&gt; — Log a trade with full context&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_trade_history&lt;/code&gt; — Query past trades with filters&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reflect_on_trades&lt;/code&gt; — Trigger pattern discovery&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_patterns&lt;/code&gt; — Retrieve discovered patterns&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_strategy_adjustments&lt;/code&gt; — Get real-time strategy modifications&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_memory_stats&lt;/code&gt; — Dashboard of memory state&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;search_memory&lt;/code&gt; — Semantic search across all memory layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No API keys to manage, no separate dashboard. The AI talks directly to its own memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reflection Engine
&lt;/h2&gt;

&lt;p&gt;The core innovation is the &lt;code&gt;ReflectionEngine&lt;/code&gt;. After enough L1 trades accumulate, it uses Claude's API to analyze the history and extract patterns.&lt;/p&gt;

&lt;p&gt;It's essentially the AI reflecting on its own decisions — what worked, what didn't, and why. The patterns it discovers get stored in L2, and the strongest patterns get promoted to L3 as active strategy adjustments.&lt;/p&gt;

&lt;p&gt;This is inspired by the &lt;a href="https://arxiv.org/abs/2303.11366" rel="noopener noreferrer"&gt;Reflexion framework&lt;/a&gt; and the &lt;a href="https://arxiv.org/abs/2311.13743" rel="noopener noreferrer"&gt;FinMem paper&lt;/a&gt; from 2023, which proved that layered memory architectures improve LLM trading performance. We took that academic insight and engineered it into a production-ready, pluggable protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Usage
&lt;/h2&gt;

&lt;p&gt;My own quantitative trading system, NG_Gold (trading XAUUSD on MT5), is the first production user of TradeMemory Protocol. The system runs three strategies — VolBreakout, Pullback Entry, and IntradayMomentum — and every trade flows through the memory system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/mnemox-ai/tradememory-protocol.git
&lt;span class="nb"&gt;cd &lt;/span&gt;tradememory-protocol
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python &lt;span class="nt"&gt;-m&lt;/span&gt; pytest tests/ &lt;span class="nt"&gt;-v&lt;/span&gt;  &lt;span class="c"&gt;# 36 tests passing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the full &lt;a href="https://github.com/mnemox-ai/tradememory-protocol/blob/master/docs/QUICK_START.md" rel="noopener noreferrer"&gt;Quick Start Guide&lt;/a&gt; for MCP integration setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Demo with real trade data flowing through L1 → L2 → L3&lt;/li&gt;
&lt;li&gt;More platform adapters (Interactive Brokers, crypto DEX)&lt;/li&gt;
&lt;li&gt;Multi-agent memory sharing (multiple AI agents learning from the same trade history)&lt;/li&gt;
&lt;li&gt;Community-contributed pattern libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mnemox-ai/tradememory-protocol" rel="noopener noreferrer"&gt;github.com/mnemox-ai/tradememory-protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture docs&lt;/strong&gt;: &lt;a href="https://github.com/mnemox-ai/tradememory-protocol/blob/master/docs/ARCHITECTURE.md" rel="noopener noreferrer"&gt;ARCHITECTURE.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built by &lt;a href="https://mnemox.ai" rel="noopener noreferrer"&gt;Mnemox&lt;/a&gt; in Taipei. We build memory infrastructure for AI agents.&lt;/p&gt;

&lt;p&gt;If you're working on AI trading agents or have ideas about what memory patterns would be useful, I'd love to hear from you in the comments or on GitHub.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>trading</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
