<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Charbel</title>
    <description>The latest articles on DEV Community by Charbel (@charbull).</description>
    <link>https://dev.to/charbull</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3707644%2F81e2b118-4e82-4758-b87c-8110f3e58bf8.png</url>
      <title>DEV Community: Charbel</title>
      <link>https://dev.to/charbull</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/charbull"/>
    <language>en</language>
    <item>
      <title>Building a Stock Advisor on a Coral Dev Board</title>
      <dc:creator>Charbel</dc:creator>
      <pubDate>Wed, 13 May 2026 03:25:20 +0000</pubDate>
      <link>https://dev.to/charbull/building-a-stock-advisor-on-a-coral-dev-board-from-edge-tpu-bugs-to-working-tpu-inference-156</link>
      <guid>https://dev.to/charbull/building-a-stock-advisor-on-a-coral-dev-board-from-edge-tpu-bugs-to-working-tpu-inference-156</guid>
      <description>&lt;p&gt;A few months ago I set out to answer a simple question: can I build a &lt;strong&gt;scientific framework for deciding when to sell my Google RSUs&lt;/strong&gt; instead of making decisions based on gut feeling?&lt;/p&gt;

&lt;p&gt;The answer turned out to be "sort of, but the process taught me far more than the answer did." This post covers the full arc: hardware choices, architecture decisions, the bugs that kept predictions stuck at 0.00%, and finally a working system running inference in about 2.5 ms on the Edge TPU.&lt;/p&gt;

&lt;p&gt;I also added a second model — a direction classifier that predicts whether price will go up or down — to complement the original price regression model. The dual-model results are instructive and sometimes humbling.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hardware Stack
&lt;/h2&gt;

&lt;p&gt;I started with what I had: a &lt;strong&gt;&lt;a href="https://www.coral.ai/docs/dev-board/get-started/" rel="noopener noreferrer"&gt;Google Coral Dev Board&lt;/a&gt;&lt;/strong&gt; sitting on my shelf. The Coral has an Edge TPU coprocessor connected to the CPU via PCIe — not the USB Accelerator version, the on-chip variant. It's discontinued hardware, but it's genuinely capable for what I needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HP Victus RTX 3050 → primary training environment
Coral Dev Board    → inference + sentiment (2W idle, always on)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight that drove the architecture: &lt;strong&gt;you don't need the same hardware for training and inference&lt;/strong&gt;. The Coral is terrible at training (no backprop support) but excellent at fast, cheap, power-efficient inference.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Conv1D and Not LSTM
&lt;/h2&gt;

&lt;p&gt;The Coral TPU's supported op set is frozen at 2019. This matters enormously:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;TPU Support&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CONV_2D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Full&lt;/td&gt;
&lt;td&gt;Conv1D maps here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ReLU6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;Bounded output suits INT8; used here instead of plain ReLU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GlobalAvgPool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;BatchMatMul&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;❌ CPU fallback&lt;/td&gt;
&lt;td&gt;Kills LSTM, Transformers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LayerNorm&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;❌ CPU fallback&lt;/td&gt;
&lt;td&gt;Kills BERT-family&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GELU&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;❌ CPU fallback&lt;/td&gt;
&lt;td&gt;Use ReLU6 instead&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LSTM falls back to CPU because of &lt;code&gt;BatchMatMul&lt;/code&gt;. FinBERT falls back to CPU because of &lt;code&gt;LayerNorm&lt;/code&gt;. Conv1D runs 100% on-chip because it maps directly to &lt;code&gt;CONV_2D&lt;/code&gt;. The practical result: &lt;strong&gt;2.5ms on TPU vs ~300ms on the ARM CPU&lt;/strong&gt;.&lt;/p&gt;
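
&lt;p&gt;To make the "Conv1D maps to &lt;code&gt;CONV_2D&lt;/code&gt;" point concrete, here is a small NumPy sketch (not the converter's actual code) showing that a 1-D convolution over a &lt;code&gt;(time, features)&lt;/code&gt; window gives the same numbers as a 2-D convolution over the same data with a dummy height of 1. The helper names and the filter count of 32 are illustrative assumptions.&lt;/p&gt;

```python
import numpy as np

def conv1d_valid(x, w):
    """1-D convolution, no padding. x: (T, C), w: (K, C, F). Returns (T-K+1, F)."""
    steps = x.shape[0] - w.shape[0] + 1
    out = np.zeros((steps, w.shape[2]))
    for t in range(steps):
        # Contract the (K, C) patch against the (K, C, F) kernel.
        out[t] = np.tensordot(x[t:t + w.shape[0]], w, axes=([0, 1], [0, 1]))
    return out

def conv2d_valid(x, w):
    """2-D convolution, no padding. x: (H, W, C), w: (KH, KW, C, F)."""
    rows = x.shape[0] - w.shape[0] + 1
    cols = x.shape[1] - w.shape[1] + 1
    out = np.zeros((rows, cols, w.shape[3]))
    for i in range(rows):
        for j in range(cols):
            patch = x[i:i + w.shape[0], j:j + w.shape[1]]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
window = rng.normal(size=(60, 52))     # 60 days x 52 features, as in the post
kernel = rng.normal(size=(3, 52, 32))  # kernel size 3, 32 filters (illustrative)

y1 = conv1d_valid(window, kernel)
# Add a height-1 axis to both the input and the kernel, then drop it again.
y2 = conv2d_valid(window[None, :, :], kernel[None, :, :, :])[0]

print("same result:", np.allclose(y1, y2))
```

&lt;p&gt;Because the two computations are identical, a Conv1D layer needs no op beyond the 2-D convolution the Edge TPU already supports.&lt;/p&gt;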




&lt;h2&gt;
  
  
  The Feature Set: 52 Indicators Across 7 Groups
&lt;/h2&gt;

&lt;p&gt;The input is a 60-day window of 52 features per day, computed from OHLCV data for the target ticker plus SPY (market proxy) and VIX (fear gauge):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Group 1 – Price/Volume (5)      close_norm, OHLC ratios, volume deviation
Group 2 – Returns &amp;amp; RVol (6)    1d/5d/20d returns, log-return, realized vol
Group 3 – Momentum (11)         RSI×3, Stochastic K/D, Williams%R, MFI, CCI, ROC×3
Group 4 – MACD family (4)       line, signal, histogram, histogram delta
Group 5 – Trend &amp;amp; MAs (12)      close vs MA5/10/20/50/100/200, Bollinger, ATR, ADX, DI+/-
Group 6 – Volume (4)            OBV, vol ratio, CMF, vol momentum
Group 7 – Market context (10)   SPY returns, VIX z-score, relative strength,
                                 calendar cyclicals, 52w high/low distances
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The price model outputs three log-return predictions for 1-day, 3-day, and 5-day forward closes. The direction model outputs three up-probabilities for the same horizons.&lt;/p&gt;
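
&lt;p&gt;A log-return prediction becomes a price target by multiplying the last close by &lt;code&gt;exp(r)&lt;/code&gt;. The sketch below assumes that convention; the function name and the example log-returns are made up for illustration and are not model output.&lt;/p&gt;

```python
import math

def log_return_to_price(last_close, log_return):
    """Convert a predicted forward log-return into a price target."""
    return last_close * math.exp(log_return)

last_close = 314.74                                  # example closing price
predicted = {"1d": 0.0, "3d": 0.01, "5d": -0.02}     # hypothetical log-returns

targets = {h: log_return_to_price(last_close, r) for h, r in predicted.items()}
for horizon, price in targets.items():
    pct = (price / last_close - 1.0) * 100.0
    print(f"{horizon}: ${price:.2f} ({pct:+.2f}%)")
```

&lt;p&gt;Note that a log-return of exactly 0.0 maps back to the last close unchanged, which is what a model stuck at zero output looks like on the dashboard.&lt;/p&gt;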




&lt;h2&gt;
  
  
  Bug 1: The Scaler That Refitted Itself
&lt;/h2&gt;

&lt;p&gt;For weeks the model was outputting this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1-day    →  $ 314.74  ▼ 0.00%
3-day    →  $ 314.74  ▼ 0.00%
5-day    →  $ 314.74  ▼ 0.00%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inference code was silently fitting a brand new &lt;code&gt;RobustScaler&lt;/code&gt; on 2 years of current data when the scaler file wasn't found:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BUG — silently refits if the file doesn't exist
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SCALER_PATH&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RobustScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ← fits on 2 years of live data
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model was trained with a scaler fit on &lt;strong&gt;10 years of data across 30 tickers&lt;/strong&gt;. Different statistics, different scaling — the model received garbage inputs and output zeros.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fix — crash loudly instead of silently producing wrong results
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SCALER_PATH&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scaler not found. Copy price_model_scaler_params.npz from your &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;training machine. Never refit the scaler at inference time.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
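
&lt;p&gt;One way to keep the scaler frozen is to save its per-feature statistics once at training time and only ever load them at inference. The sketch below is a NumPy-only stand-in, not the project's actual code: it assumes the &lt;code&gt;.npz&lt;/code&gt; file stores two arrays under the hypothetical keys &lt;code&gt;center&lt;/code&gt; and &lt;code&gt;scale&lt;/code&gt;, and it computes the median and interquartile range directly, which is what &lt;code&gt;RobustScaler&lt;/code&gt; uses by default.&lt;/p&gt;

```python
import os
import tempfile
import numpy as np

def fit_robust_params(train_features):
    """Per-feature median and interquartile range (RobustScaler's defaults)."""
    center = np.median(train_features, axis=0)
    q25, q75 = np.percentile(train_features, [25, 75], axis=0)
    scale = q75 - q25
    scale[scale == 0.0] = 1.0   # guard against constant features
    return center, scale

def load_scaler_params(path):
    """Load frozen parameters; fail loudly if the file is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Scaler not found at {path}")
    with np.load(path) as data:
        return data["center"], data["scale"]   # key names are assumptions

def transform(features, center, scale):
    """Apply the frozen statistics; never refit here."""
    return (features - center) / scale

# Training machine: fit once on the training set and save.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 52))   # synthetic stand-in for real features
center, scale = fit_robust_params(train)
path = os.path.join(tempfile.mkdtemp(), "price_model_scaler_params.npz")
np.savez(path, center=center, scale=scale)

# Inference machine: load and apply.
loaded_center, loaded_scale = load_scaler_params(path)
live_window = rng.normal(size=(60, 52))
scaled_window = transform(live_window, loaded_center, loaded_scale)
print(scaled_window.shape)
```

&lt;p&gt;Storing plain arrays rather than a pickled scaler object also avoids needing scikit-learn on the inference device, though that is a design preference rather than a requirement.&lt;/p&gt;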






&lt;h2&gt;
  
  
  Bug 2: GlobalAveragePooling1D vs Flatten
&lt;/h2&gt;

&lt;p&gt;Using &lt;code&gt;Flatten&lt;/code&gt; instead of &lt;code&gt;GlobalAveragePooling1D&lt;/code&gt; caused only 2 of 40 ops to run on the TPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# WRONG — Flatten breaks the TPU execution graph
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Flatten&lt;/span&gt;&lt;span class="p"&gt;()(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# RIGHT — GlobalAveragePooling1D maps to MEAN (TPU-native)
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GlobalAveragePooling1D&lt;/span&gt;&lt;span class="p"&gt;()(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Bug 3: BatchNormalization Splits the Graph
&lt;/h2&gt;

&lt;p&gt;Even after fixing the above, the edgetpu-compiled model output all-zeros. The &lt;code&gt;edgetpu_compiler&lt;/code&gt; log revealed why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DEQUANTIZE         1   Operation is working on an unsupported data type
CONV_2D            1   Mapped to Edge TPU
CONV_2D            4   More than one subgraph is not supported
FULLY_CONNECTED    3   More than one subgraph is not supported
MAX_POOL_2D        1   More than one subgraph is not supported
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only 2 ops out of ~20 ran on the TPU. &lt;code&gt;BatchNormalization&lt;/code&gt; uses float32 accumulators. When TFLite quantizes the graph it inserts a &lt;code&gt;DEQUANTIZE&lt;/code&gt; node — and &lt;code&gt;DEQUANTIZE&lt;/code&gt; is unsupported on the Edge TPU. This creates a subgraph boundary. The TPU runs everything before the first &lt;code&gt;DEQUANTIZE&lt;/code&gt; (one Conv), and everything after (pooling, dense layers, output) runs on CPU with uninitialized output quantization &lt;code&gt;(scale=0.0, zp=0)&lt;/code&gt;, which dequantizes to all-zeros.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; remove &lt;code&gt;BatchNormalization&lt;/code&gt; entirely and switch &lt;code&gt;use_bias=False&lt;/code&gt; → &lt;code&gt;use_bias=True&lt;/code&gt;. The inputs are already RobustScaler-normalized, so BN isn't needed for stability. ReLU6 keeps activations bounded for INT8.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before — BatchNorm causes DEQUANTIZE → subgraph split → zeros on TPU
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Conv1D&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BatchNormalization&lt;/span&gt;&lt;span class="p"&gt;()(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relu6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After — clean graph, 100% TPU execution
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Conv1D&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relu6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this change, the compiler log became all &lt;code&gt;Mapped to Edge TPU&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bug 4: Reading the Wrong Quantization Scale
&lt;/h2&gt;

&lt;p&gt;Even with all ops on the TPU, inputs showed &lt;code&gt;Std: 28.89 | Unique Levels: 149&lt;/code&gt; — meaning values were being crushed to the INT8 boundary. The model was receiving a barcode of extreme values instead of a price chart.&lt;/p&gt;

&lt;p&gt;The cause: reading the input scale from the wrong field.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# WRONG — reads per-channel WEIGHT scales of the first Conv layer
&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in_d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantization_parameters&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;scales&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# a tiny weight-magnitude value like 0.003
&lt;/span&gt;
&lt;span class="c1"&gt;# RIGHT — reads the per-tensor INPUT scale from the calibration dataset
&lt;/span&gt;&lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;zp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in_d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantization&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# correctly ~0.039 (= 5/127)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;quantization_parameters['scales']&lt;/code&gt; is an array of per-channel weight scales — one per Conv filter. &lt;code&gt;quantization&lt;/code&gt; is the plain &lt;code&gt;(scale, zero_point)&lt;/code&gt; 2-tuple the TFLite INT8 converter computes from the representative calibration data for the &lt;em&gt;input&lt;/em&gt; tensor. Using the weight scale to quantize a &lt;code&gt;[-5, 5]&lt;/code&gt; input means a value of &lt;code&gt;1.0&lt;/code&gt; quantizes to &lt;code&gt;1.0/0.003 = 333&lt;/code&gt;, clips to 127, and 90%+ of the input space collapses to the boundary. After the fix: &lt;code&gt;Std: 24.32 | Unique Levels: 152&lt;/code&gt;. Real predictions.&lt;/p&gt;
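
&lt;p&gt;The effect is easy to reproduce without a TPU. This NumPy sketch applies standard affine INT8 quantization to inputs spanning &lt;code&gt;[-5, 5]&lt;/code&gt; using the two scales quoted above. The zero point of 0 and the helper names are assumptions for illustration; in real code both values should come from the interpreter's input details.&lt;/p&gt;

```python
import numpy as np

def quantize_int8(x, scale, zero_point):
    """Affine INT8 quantization: round(x / scale) + zero_point, clipped to [-128, 127]."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def saturated_fraction(q):
    """Share of values sitting on either INT8 boundary."""
    return float(np.mean((q == 127) | (q == -128)))

x = np.linspace(-5.0, 5.0, 1001)            # inputs across the scaled feature range

wrong = quantize_int8(x, 0.003, 0)          # weight-scale value: far too small
right = quantize_int8(x, 5.0 / 127.0, 0)    # input scale of about 0.039

print("saturated with weight scale:", saturated_fraction(wrong))
print("saturated with input scale: ", saturated_fraction(right))
print("distinct levels:", len(np.unique(wrong)), "vs", len(np.unique(right)))
```

&lt;p&gt;With the weight scale, more than 90% of the inputs land on a boundary value; with the input scale, almost none do and the full INT8 range is used.&lt;/p&gt;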




&lt;h2&gt;
  
  
  Multi-Ticker Training: Why 30 Stocks Instead of 1
&lt;/h2&gt;

&lt;p&gt;Training only on GOOGL gives ~2,300 bars — thin for a 60-day sequence model. Training on 30 tickers gives 55,560 sequences and forces the model to learn &lt;strong&gt;generalizable price dynamics&lt;/strong&gt; rather than GOOGL-specific patterns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEFAULT_TICKERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AAPL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MSFT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVDA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;META&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AMZN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TSLA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# mega-cap tech
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JPM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;V&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                              &lt;span class="c1"&gt;# financials
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JNJ&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UNH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PFE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ABBV&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                &lt;span class="c1"&gt;# healthcare
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XOM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CVX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                               &lt;span class="c1"&gt;# energy
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WMT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                  &lt;span class="c1"&gt;# consumer/industrial
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AMD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INTC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TSM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                       &lt;span class="c1"&gt;# semiconductors
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XLK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XLF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XLE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XLV&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SPY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                         &lt;span class="c1"&gt;# sector ETFs
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Preventing Data Leakage: The Embargo Gap
&lt;/h2&gt;

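&lt;p&gt;Adjacent sequences in a sequence model share almost all their data. Sequence 100 uses days 40–99; sequence 101 uses days 41–100. A plain chronological train/val split puts sequences on either side of the boundary into different sets even though they overlap by up to 59 days, creating look-ahead leakage. The fix is to drop an embargo gap around the split point:&lt;br&gt;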
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;EMBARGO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SEQ_LEN&lt;/span&gt;  &lt;span class="c1"&gt;# must be &amp;gt;= SEQ_LEN
&lt;/span&gt;
&lt;span class="n"&gt;split&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;train_end&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;EMBARGO&lt;/span&gt;
&lt;span class="n"&gt;val_start&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;EMBARGO&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;train_end&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;X_val&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;val_start&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
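
&lt;p&gt;A quick way to convince yourself the gap works is to count how many calendar days the training and validation windows share, with and without it. The sketch below uses a synthetic 1,000-day series and indexes each sequence by its first day, which is an assumed convention for illustration only.&lt;/p&gt;

```python
import numpy as np

SEQ_LEN = 60
EMBARGO = SEQ_LEN

n_days = 1000
# Sequence i covers days i .. i + SEQ_LEN - 1 (convention assumed for this sketch).
starts = np.arange(n_days - SEQ_LEN + 1)

split = int(len(starts) * 0.85)
train_end = split - EMBARGO
val_start = split + EMBARGO

def days_covered(seq_starts):
    """Set of every day index touched by the given sequences."""
    return {int(s) + k for s in seq_starts for k in range(SEQ_LEN)}

# Split with the embargo gap.
train_days = days_covered(starts[:train_end])
val_days = days_covered(starts[val_start:])

# Naive split at the same point, no gap.
naive_train_days = days_covered(starts[:split])
naive_val_days = days_covered(starts[split:])

overlap_with_gap = len(train_days.intersection(val_days))
overlap_naive = len(naive_train_days.intersection(naive_val_days))
print("shared days with embargo:", overlap_with_gap)
print("shared days without embargo:", overlap_naive)
```

&lt;p&gt;On this toy series the naive split shares 59 days between the two sets, while the embargoed split shares none.&lt;/p&gt;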






&lt;h2&gt;
  
  
  Adding a Direction Model
&lt;/h2&gt;

&lt;p&gt;The price model can cheat by predicting "slightly positive" for everything and still score a low regression loss on bull-market data. A direction model predicts binary up/down, which is harder to game:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Price model: linear head, Huber loss
&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;loss_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;losses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Huber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Direction model: sigmoid head, binary cross-entropy
&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;direction_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;loss_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;binary_crossentropy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both models train in one command with &lt;code&gt;--mode both&lt;/code&gt;, sharing the same dataset and producing all deployment artifacts including automatic Edge TPU compilation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Walk-Forward CV Results
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Price model CV Mean : 1d=53.5%  3d=56.4%  5d=57.7%
Direction model CV  : 1d=52.0%  3d=55.5%  5d=55.9%

Held-out val:
  Price      1d=53.0%  3d=56.8%  5d=57.6%
  Direction  1d=52.6%  3d=56.2%  5d=57.6%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both models cross 54% directional accuracy on the 5-day horizon, the threshold I treat as evidence of a real edge. Results are consistent across all 4 folds with no suspicious outlier fold.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Backtest Results
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mode            ROI      Ann. ROI   Sharpe   Drawdown   Trades   Win Rate
─────────────────────────────────────────────────────────────────────────
Price only    +2.63%    +1.34%     -0.72    -5.96%       14      57.1%
Direction     +16.48%   +8.13%     +0.36   -11.68%       30      46.7%
Fusion        +2.76%    +1.40%     -0.99    -5.80%       10      40.0%
─────────────────────────────────────────────────────────────────────────
Buy &amp;amp; Hold   +98.95%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three modes underperform buy-and-hold on GOOGL over 3 years. This is the right conclusion for RSU decisions: in a sustained bull trend the default should be to hold, and the bar for the model to recommend a sale should be high. The system's value is in providing a rigorous framework for when to deviate from holding, not in trading actively.&lt;/p&gt;




&lt;h2&gt;
  
  
  The System Running Live
&lt;/h2&gt;

&lt;p&gt;After fixing all four bugs, both models run on the Edge TPU at roughly 2.5 ms each:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;════════════════════════════════════════════════════════════════════
  📈  GOOG Advisor  |  Coral Edge TPU Dev Board  [FUSION mode]
  2026-04-27 07:03:58
════════════════════════════════════════════════════════════════════
  Last close : $342.32  ▲ 4.57 (1.35%)  [592ms]

────────────────────────────────────────────────────────────────────
  📊 Technical Analysis  (18 indicators)
────────────────────────────────────────────────────────────────────
  🟢  RSI-14 69.7 → Above midline
  🟢  RSI trend +4.6 → accelerating upward
  🟢  MACD 10.17 &amp;gt; Signal 7.43 → Bullish
  🔴  MACD histogram contracting → momentum fading
  🟢  Price $342.32 &amp;gt; MA50 $308.57
  🟢  Price $342.32 &amp;gt; MA200 $276.80
  🟢  MA5 &amp;gt; MA10 &amp;gt; MA20 → Momentum stacked bullish
  ⚪  BB %B 0.81 → mid-band territory
  🟢  ADX 29.9 strong | DI+ 36 &amp;gt; DI- 16 → bullish trend
  ⚪  Volume 1.1× avg → average participation
  🔴  MFI 80.5 → overbought money flow

────────────────────────────────────────────────────────────────────
  📰 News Sentiment
────────────────────────────────────────────────────────────────────
  Source     : yfinance+GoogleRSS  (2006ms)
  Headlines  : 9 scored  /  11 filtered
  Ticker     : +0.1717  →  BULLISH  (58% confidence)
  Macro      : +0.1343  →  NEUTRAL  [gate: —]

  +           +0.000  Chicago Capital LLC Reduces Stock Holdings in Alphabet Inc
  +█          +0.158  Why Alphabet (GOOG, GOOGL) Is a Compelling AI Investment i
  +████       +0.486  Alphabet Stock (GOOG) Opinions on Upcoming Q1 Earnings and
  +███        +0.346  Tanager Wealth Management LLP Has $37.11 Million Stock Pos
  +           +0.000  Alphabet Inc. (GOOG) Laps the Stock Market: Here's Why
  +█          +0.175  Alphabet Inc. $GOOG Stock Holdings Lowered by Natural Inve
  +           +0.000  Lbp Am Sa Trims Stock Holdings in Alphabet Inc. $GOOG
  +███        +0.380  Is GOOG Stock a Buy Ahead of Q1 Earnings and Amid Fragile

────────────────────────────────────────────────────────────────────
  TECHNICAL VERDICT : 🟢 BUY 🟢  (score: +6)
  ADJUSTED VERDICT  : 🟢 BUY 🟢
  CONFIDENCE        : MEDIUM
  FUSION SIGNAL     : 🟢 BUY 🟢  (price + direction)

────────────────────────────────────────────────────────────────────
  🤖 Price Model  [Coral Edge TPU (price) ⚡  2.6ms]

  1-day    →  $ 343.32  ▲ 0.29%
  3-day    →  $ 344.93  ▲ 0.76%
  5-day    →  $ 346.44  ▲ 1.20%

  Day-trade (1d)  BUY → SELL  +0.29%
  Swing     (3d)  BUY → SELL  +0.76%
  Week      (5d)  BUY → SELL  +1.20%

────────────────────────────────────────────────────────────────────
  🧭 Direction Model  [Coral Edge TPU (direction) ⚡  2.6ms]

  1-day    →  ▲  52.7%  ██████████
  3-day    →  ▲  55.5%  ███████████
  5-day    →  ▲  56.6%  ███████████

────────────────────────────────────────────────────────────────────
  Key levels : MA50 $308.57  MA200 $276.80  52wH $344.90  52wL $152.80
════════════════════════════════════════════════════════════════════
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;592ms total latency — data fetch + 52-feature engineering + two TPU inferences. Results pushed to Telegram automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Remove BatchNorm from the start.&lt;/strong&gt; For quantized edge deployment, &lt;code&gt;BatchNormalization&lt;/code&gt; is a trap. The right design is &lt;code&gt;Conv1D(use_bias=True) → ReLU6&lt;/code&gt;. Pre-normalized inputs make BN redundant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read the edgetpu compiler log immediately.&lt;/strong&gt; The compiler exits with code 0 even when only 2 of 40 ops map to the TPU. The &lt;code&gt;.log&lt;/code&gt; file it writes alongside the compiled model is the only way to know.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use weighted horizon agreement.&lt;/strong&gt; The fusion signal's &lt;code&gt;MIN_AGREEMENT=2&lt;/code&gt; gate treats all three horizons equally. The 1-day prediction is noisier than the 5-day but counts the same. A weighted agreement score matching prediction weights &lt;code&gt;[0.5, 0.3, 0.2]&lt;/code&gt; would be more accurate.&lt;/p&gt;
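&lt;p&gt;A minimal sketch of what a weighted gate could look like, in contrast to a plain count of agreeing horizons. The function names, the +1/-1 encoding, the &lt;code&gt;threshold&lt;/code&gt; value, and the mapping of the weights to the 1-day, 3-day, and 5-day horizons in that order are all assumptions for illustration, not the project's actual code:&lt;/p&gt;

```python
# Hypothetical sketch: weights are assumed to map to (1-day, 3-day, 5-day).
HORIZON_WEIGHTS = (0.5, 0.3, 0.2)


def weighted_agreement(directions, weights=HORIZON_WEIGHTS):
    """Return a score in [-1, 1] from per-horizon calls (+1 up, -1 down)."""
    return sum(w * d for w, d in zip(weights, directions))


def fused_signal(directions, weights=HORIZON_WEIGHTS, threshold=0.4):
    """Map the weighted score to BUY / SELL / HOLD (threshold is illustrative)."""
    score = weighted_agreement(directions, weights)
    if score >= threshold:
        return "BUY"
    if -score >= threshold:
        return "SELL"
    return "HOLD"


if __name__ == "__main__":
    for calls in ([1, 1, 1], [1, 1, -1], [-1, 1, 1], [-1, -1, 1]):
        print(calls, round(weighted_agreement(calls), 2), fused_signal(calls))
```

&lt;p&gt;Under a count-based gate, any two agreeing horizons are enough. Here, a 1-day "down" against 3-day and 5-day "up" nets to zero and yields HOLD, so it matters which horizon dissents. If the 1-day horizon really is the noisiest, it may be worth ordering the weights so that it carries the least weight rather than the most.&lt;/p&gt;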

&lt;p&gt;&lt;strong&gt;Add bear-regime training data.&lt;/strong&gt; The sell signal never triggered once in 250 backtest days. The training window skews bullish. Explicitly oversampling high-VIX / drawdown windows would help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use FinBERT instead of VADER for sentiment.&lt;/strong&gt; VADER was designed for social media. Financial language ("impairment charge," "above consensus," "guidance raised") isn't in its vocabulary.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Project Is Actually For
&lt;/h2&gt;

&lt;p&gt;The technical goals were always secondary to three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A better framework for RSU selling decisions than my gut feeling.&lt;/strong&gt; Replacing "the stock feels extended" with "RSI is 70, direction model sees 52.7% up on 1-day but 56.6% on 5-day, price model predicts +1.20% over the week — this is not a signal to sell."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hands-on experience with ML systems at the hardware layer.&lt;/strong&gt; Understanding why BatchNorm breaks INT8 graphs, how subgraph splitting silently produces zeros, and why &lt;code&gt;quantization_parameters['scales'][0]&lt;/code&gt; vs &lt;code&gt;quantization[0]&lt;/code&gt; is the difference between a working model and a broken one.&lt;/p&gt;
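&lt;p&gt;For context on why the choice of field matters: in the TFLite interpreter's tensor details, &lt;code&gt;quantization&lt;/code&gt; is a &lt;code&gt;(scale, zero_point)&lt;/code&gt; tuple, while &lt;code&gt;quantization_parameters&lt;/code&gt; is a dict holding &lt;code&gt;scales&lt;/code&gt; and &lt;code&gt;zero_points&lt;/code&gt; arrays. Whichever one is read, the value feeds the same affine formula. The sketch below shows only that arithmetic, with made-up numbers and a hypothetical helper name:&lt;/p&gt;

```python
def dequantize(q_values, scale, zero_point):
    """Map raw INT8 outputs back to floats: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    raw = [-128, 0, 127]  # raw INT8 values (made-up example)
    print(dequantize(raw, scale=0.5, zero_point=-128))  # [0.0, 64.0, 127.5]
    # Illustration only: if the scale comes back as 0.0 from the wrong field,
    # every prediction collapses to zero regardless of the raw output.
    print(dequantize(raw, scale=0.0, zero_point=0))
```

&lt;p&gt;A zero scale is one plausible way to end up with predictions pinned at 0.00%. That is an assumption here, not something this section confirms.&lt;/p&gt;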

&lt;p&gt;&lt;strong&gt;A concrete signal about whether quantitative finance is genuinely interesting.&lt;/strong&gt; The answer: yes, but the gap between "57% directional accuracy" and "beating buy-and-hold" is enormous. That gap is where the real research lives.&lt;/p&gt;




</description>
      <category>tensorflow</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Taught a 4B Parameter LLM to Play Wordle on a Mac M4 (Using GRPO)</title>
      <dc:creator>Charbel</dc:creator>
      <pubDate>Tue, 13 Jan 2026 18:26:05 +0000</pubDate>
      <link>https://dev.to/charbull/i-taught-a-4b-parameter-llm-to-play-wordle-on-a-mac-m4-using-grpo-i9k</link>
      <guid>https://dev.to/charbull/i-taught-a-4b-parameter-llm-to-play-wordle-on-a-mac-m4-using-grpo-i9k</guid>
      <description>&lt;p&gt;DeepSeek-R1 changed the conversation. Their paper, &lt;a href="https://arxiv.org/abs/2501.12948" rel="noopener noreferrer"&gt;"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"&lt;/a&gt;, showed that reasoning behavior can be incentivized through reinforcement learning.&lt;/p&gt;

&lt;p&gt;But DeepSeek was trained on massive clusters. I have a &lt;strong&gt;MacBook Pro M4&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I spent a few weeks answering a specific question: &lt;strong&gt;Can we replicate this reasoning behavior on a consumer device, using a small model (Gemma-3 4B), without any supervised fine-tuning (SFT)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I chose &lt;strong&gt;Wordle&lt;/strong&gt; as the testbed. While simple, it requires state tracking, hypothesis testing, and information theory—a perfect microcosm for testing "reasoning" capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MLX? (The Technology Stack)
&lt;/h2&gt;

&lt;p&gt;I chose Apple's &lt;strong&gt;MLX&lt;/strong&gt; framework over PyTorch for three specific technical reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Unified Memory Access:&lt;/strong&gt; Training with GRPO requires generating multiple "rollouts" (completions) in parallel. On a standard GPU, moving these massive tensors between VRAM and RAM is a bottleneck. MLX is optimized for the M-series Unified Memory architecture, allowing zero-copy access to arrays.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Quantization Struggle:&lt;/strong&gt; In the PyTorch ecosystem, libraries like &lt;code&gt;bitsandbytes&lt;/code&gt; (crucial for loading models in 4-bit/8-bit) have historically had unstable support on Apple Silicon.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Forcing Local Constraints:&lt;/strong&gt; Using a cloud GPU is an "escape hatch." By forcing myself to train locally, I had to confront the actual hardware limits (bandwidth vs. capacity) that shape modern LLM architecture.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Challenge: "Straight-to-RL"
&lt;/h2&gt;

&lt;p&gt;Most RL pipelines start with &lt;strong&gt;Supervised Fine-Tuning (SFT)&lt;/strong&gt;. You show the model thousands of expert games, and &lt;em&gt;then&lt;/em&gt; use RL to polish the strategy.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I wanted to test the &lt;strong&gt;"Cold Start"&lt;/strong&gt; problem. Can a 4B parameter model learn the rules and strategy of Wordle &lt;em&gt;purely&lt;/em&gt; through trial and error, guided only by a reward function?&lt;/li&gt;
&lt;li&gt;I wanted to see whether &lt;strong&gt;GRPO (Group Relative Policy Optimization)&lt;/strong&gt; could be the algorithm that does it, teaching the rules &lt;em&gt;and&lt;/em&gt; the strategy simultaneously rather than in separate stages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It turns out, skipping SFT with a 4B parameter model is a high-wire act.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The "Final Final" Loop (Reward Hacking)
&lt;/h2&gt;

&lt;p&gt;An RL agent does not learn what you &lt;em&gt;want&lt;/em&gt; it to learn; it learns what you &lt;em&gt;incentivize&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In my early runs, the model discovered a loophole. It realized that making a "bad guess" (a word that doesn't fit the clues) resulted in a penalty. But it also realized that if it just output garbage or looped the word &lt;code&gt;Final Final Final&lt;/code&gt; forever, the penalty was sometimes &lt;em&gt;less&lt;/em&gt; severe (or delayed).&lt;/p&gt;

&lt;p&gt;The model converged on a strategy of &lt;strong&gt;inaction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt; I had to engineer a &lt;code&gt;format_fail_penalty&lt;/code&gt; that was unequivocally the worst possible outcome (-200 reward). I effectively told the model: &lt;em&gt;"You can lose the game, but if you mess up the JSON format or refuse to play, you will regret it."&lt;/em&gt;&lt;/p&gt;
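&lt;p&gt;A minimal sketch of that ordering of penalties. Only the -200 value comes from the run described above. The JSON schema (a single &lt;code&gt;"guess"&lt;/code&gt; key), the other reward values, and the function name are hypothetical stand-ins:&lt;/p&gt;

```python
import json

FORMAT_FAIL_PENALTY = -200.0  # value from the post: the worst possible outcome
BAD_GUESS_PENALTY = -20.0     # hypothetical: guess contradicts known clues
WIN_REWARD = 100.0            # hypothetical


def score_completion(completion, secret, gray_letters=frozenset()):
    """Score one model completion; malformed output is always the worst case."""
    try:
        guess = json.loads(completion)["guess"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, wrong shape, or no "guess" key (e.g. "Final Final Final").
        return FORMAT_FAIL_PENALTY
    if not isinstance(guess, str) or len(guess) != len(secret) or not guess.isalpha():
        return FORMAT_FAIL_PENALTY
    guess = guess.upper()
    if guess == secret:
        return WIN_REWARD
    if set(gray_letters).intersection(guess):
        # Reuses a letter already known to be absent from the word.
        return BAD_GUESS_PENALTY
    return 0.0


if __name__ == "__main__":
    print(score_completion("Final Final Final", "CRANE"))
    print(score_completion('{"guess": "BLIMP"}', "CRANE", gray_letters={"B"}))
    print(score_completion('{"guess": "crane"}', "CRANE"))
```

&lt;p&gt;The point of the design is the gap: no legal move, however bad, should ever score as low as refusing to play.&lt;/p&gt;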

&lt;h2&gt;
  
  
  2. Policy Collapse at Rank 64 vs. Rank 16 (Same Learning Rate)
&lt;/h2&gt;

&lt;p&gt;There is a misconception that "Higher Rank LoRA = Better."&lt;/p&gt;

&lt;p&gt;I initially tried training with a LoRA Rank of 64 and a standard learning rate. The result was a catastrophic &lt;strong&gt;Policy Collapse&lt;/strong&gt;. The win rate dropped to 0%, and the model's outputs degraded into gibberish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Insight:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Model Sensitivity:&lt;/strong&gt; Smaller models (4B) are incredibly sensitive to hyperparameter swings compared to the massive reasoning models described in research papers.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Gradient Clipping:&lt;/strong&gt; This became non-negotiable. Without aggressive gradient clipping, the "Straight-to-RL" updates were too volatile, shattering the weights before they could settle.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Rank Reduction:&lt;/strong&gt; Dropping to Rank 16 stabilized the training. It forced the model to learn efficient updates rather than overfitting to the noise of early random exploration.&lt;/li&gt;
&lt;/ol&gt;
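&lt;p&gt;For readers who have not used it, gradient clipping by global norm can be sketched in plain Python as below. This is a framework-free illustration with nested lists standing in for parameter tensors. The function name and the &lt;code&gt;max_norm&lt;/code&gt; value are assumptions, not the settings used in these runs:&lt;/p&gt;

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale all gradients together so their combined L2 norm stays
    at or below max_norm. Returns (clipped_grads, original_norm)."""
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in layer] for layer in grads]
    return clipped, total_norm


if __name__ == "__main__":
    grads = [[3.0], [4.0]]  # two "layers", global norm = 5.0
    clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
    print(norm, clipped)
```

&lt;p&gt;Because every gradient is scaled by the same factor, the update keeps its direction and only its size is capped, which is what keeps a single noisy rollout batch from wrecking the adapter weights.&lt;/p&gt;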

&lt;h2&gt;
  
  
  3. The Hardware Bottleneck (KV Cache vs. M4)
&lt;/h2&gt;

&lt;p&gt;I am running this on an M4 Pro with 48GB of Unified Memory using the &lt;strong&gt;MLX&lt;/strong&gt; framework. During my training runs, my tokens-per-second would suddenly crash by 8x. I initially thought it was a memory leak in my code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Culprit: The KV Cache.&lt;/strong&gt;&lt;br&gt;
In GRPO, you generate multiple "rollouts" (completions) for every prompt to calculate the group advantage.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating text is cheap.&lt;/li&gt;
&lt;li&gt;Generating text &lt;em&gt;inside a gradient tape&lt;/em&gt; with &lt;code&gt;num_generations=4&lt;/code&gt; is expensive.&lt;/li&gt;
&lt;/ul&gt;
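&lt;p&gt;The "group advantage" mentioned above can be sketched as follows: each rollout's reward is normalized against the other rollouts for the same prompt. This is a simplified illustration. The use of the population standard deviation and the &lt;code&gt;eps&lt;/code&gt; value are assumptions, and implementations differ on both:&lt;/p&gt;

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group:
    advantage_i = (reward_i - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, num_generations=4 rollouts, made-up rewards.
    rewards = [-200.0, -20.0, 0.0, 100.0]
    for r, a in zip(rewards, group_advantages(rewards)):
        print(f"reward={r:8.1f}  advantage={a:+.3f}")
```

&lt;p&gt;If all rollouts in a group earn the same reward, every advantage is zero and that prompt contributes no learning signal, which is one reason a larger group helps even though it costs memory.&lt;/p&gt;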

&lt;p&gt;On Apple Silicon, the &lt;strong&gt;Key-Value (KV) Cache&lt;/strong&gt; grows linearly with the group size. Each parallel generation requires its own massive cache. Once that cache filled the Unified Memory, the system fell back to heavy Swap Memory (20GB+ Swap Used), crippling performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Lesson:&lt;/strong&gt; If you are training locally, &lt;code&gt;num_generations&lt;/code&gt; is your most expensive hyperparameter. I had to tune the batch size and group size specifically to hover around 40GB RAM usage to prevent swapping.&lt;/p&gt;
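&lt;p&gt;A back-of-the-envelope estimate makes the linear growth concrete. The model dimensions below are round illustrative numbers, not the actual Gemma-3 4B configuration, and the estimate covers only the cache itself (weights and activations kept for the backward pass come on top):&lt;/p&gt;

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   num_generations, bytes_per_value=2):
    """Estimate KV cache size: one key and one value vector per layer,
    per KV head, per token, per parallel rollout."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * num_generations


if __name__ == "__main__":
    # Illustrative dimensions only (assumed, not read from a model config).
    dims = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
    for n in (1, 2, 4, 8):
        gib = kv_cache_bytes(num_generations=n, **dims) / 2**30
        print(f"num_generations={n}: {gib:.2f} GiB")
```

&lt;p&gt;Doubling &lt;code&gt;num_generations&lt;/code&gt; doubles the cache, so a group size that fits comfortably at short sequence lengths can tip into swap once completions get long.&lt;/p&gt;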
&lt;h2&gt;
  
  
  4. Prompting: Symbols vs. English
&lt;/h2&gt;

&lt;p&gt;I originally fed the model raw Wordle grids (e.g., &lt;code&gt;'xxx✓x'&lt;/code&gt;). It struggled to track state. I switched to a structured text summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Current Knowledge:**
*   **Correct Position (Green):** `A _ _ _ _`
*   **Wrong Position (Yellow):** 'O', 'R', 'T', 'U'
*   **Not in Word (Gray):** B, E, I, S
*   **Words Already Guessed:** ARISE, ABOUT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explicitly summarizing the state in natural language gave the model a "scratchpad" to reason from. It transformed the problem from "Visual Pattern Matching" to "Logical Deduction."&lt;/p&gt;
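&lt;p&gt;A sketch of how such a summary could be generated from the guess history. The function names are hypothetical, duplicate letters are handled in a simplified way, and &lt;code&gt;AUTOR&lt;/code&gt; is a made-up secret chosen only because it reproduces the example state above:&lt;/p&gt;

```python
from collections import Counter


def feedback(guess, secret):
    """Score a guess: 'G' right spot, 'Y' wrong spot, 'X' not in word."""
    marks = ["X"] * len(guess)
    unmatched = Counter()
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            marks[i] = "G"
        else:
            unmatched[s] += 1
    for i, g in enumerate(guess):
        if marks[i] == "X" and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1
    return "".join(marks)


def summarize(history, secret):
    """Render the guess history as a structured text summary."""
    greens = ["_"] * len(secret)
    yellows, grays = set(), set()
    for guess in history:
        for i, (letter, mark) in enumerate(zip(guess, feedback(guess, secret))):
            if mark == "G":
                greens[i] = letter
            elif mark == "Y":
                yellows.add(letter)
            else:
                grays.add(letter)
    # Simplification: a repeated letter can be gray in one slot and green or
    # yellow in another, so drop anything already known to be in the word.
    grays -= yellows | set(greens)
    yellows -= set(greens)
    quoted = ", ".join(f"'{c}'" for c in sorted(yellows))
    return "\n".join([
        "**Current Knowledge:**",
        f"*   **Correct Position (Green):** `{' '.join(greens)}`",
        f"*   **Wrong Position (Yellow):** {quoted}",
        f"*   **Not in Word (Gray):** {', '.join(sorted(grays))}",
        f"*   **Words Already Guessed:** {', '.join(history)}",
    ])


if __name__ == "__main__":
    print(summarize(["ARISE", "ABOUT"], secret="AUTOR"))
```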

&lt;h2&gt;
  
  
  Results &amp;amp; Analysis
&lt;/h2&gt;

&lt;p&gt;My first attempt at training from Turn 1 (starting from scratch) failed. The 4B model was too "dumb" to stumble upon a winning strategy randomly.&lt;/p&gt;

&lt;p&gt;I implemented a &lt;strong&gt;Curriculum Strategy&lt;/strong&gt; to fix this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Single Guess History:&lt;/strong&gt; I first trained on prompts that already had one previous guess. This gave the model enough context to start learning basic constraints.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Random History (0-4 Turns):&lt;/strong&gt; Once the model stabilized, I expanded the dataset to include games with 0 to 4 turns of history.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By feeding the model synthetic data with random histories (0-4 turns), I created a "Zone of Proximal Development" where the model could actually learn.&lt;/p&gt;

&lt;p&gt;I evaluated the trained LoRA adapter against the base Gemma-3 model on 150 unseen games.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Win Rate Improvement (Zero-Shot)
&lt;/h3&gt;

&lt;p&gt;Without any game history (starting from scratch), the base model is effectively guessing randomly. The RL training provided a massive boost in reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F098gwa6nusl64vux4svr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F098gwa6nusl64vux4svr.png" alt="Win Rate Comparison" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base Model:&lt;/strong&gt; 4.7% Win Rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GRPO Trained:&lt;/strong&gt; 16.0% Win Rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; A &lt;strong&gt;~3.4x improvement&lt;/strong&gt; in win rate (4.7% to 16.0%) without seeing a single expert game.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. The Power of Context (With History)
&lt;/h3&gt;

&lt;p&gt;When provided with partial game history (e.g., entering the game at Turn 3), the model's ability to deduce the answer improved sharply. This suggests the model learned to &lt;strong&gt;use constraints&lt;/strong&gt; (Green/Yellow letters) rather than just memorizing words.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9owvha4gykw0tw9bp8x3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9owvha4gykw0tw9bp8x3.png" alt="Cumulative Wins With History" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GRPO Trained:&lt;/strong&gt; 31.3% Win Rate (Red Line)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base Model:&lt;/strong&gt; 16.0% Win Rate (Blue Line)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Creativity vs. Consistency (Temperature)
&lt;/h3&gt;

&lt;p&gt;I benchmarked the model at Temperature 0.9 (Creative) vs. 0.1 (Deterministic).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Temp 0.1:&lt;/strong&gt; Consistently outperformed high temperature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temp 0.9:&lt;/strong&gt; Win rates dropped significantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Insight:&lt;/strong&gt; For logic/reasoning tasks, "creativity" is often detrimental. The model performs best when forced to be deterministic, reducing the chance of hallucinating a strategy that violates the rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Work &amp;amp; Comparison
&lt;/h2&gt;

&lt;p&gt;This project sits at the intersection of two recent approaches to "Reasoning" models:&lt;/p&gt;

&lt;p&gt;DeepSeek-R1 (Zero): Uses pure RL with sparse outcome rewards (Win/Loss). This often fails on small models because they never stumble onto the solution (the "Cold Start" problem).&lt;/p&gt;

&lt;p&gt;Supervised Reinforcement Learning (Deng et al., Oct 2025): Solves the Cold Start problem by using Expert Trajectories to provide dense, step-by-step rewards based on similarity to human reasoning.&lt;/p&gt;

&lt;p&gt;My Approach (Wordle-RL) takes a third path. I solved the Cold Start problem without the Expert Trajectories that Deng et al. rely on. Instead of supervising with Data, I supervised with Information Theory.&lt;/p&gt;

&lt;p&gt;By calculating the Entropy of every guess, I generated the same kind of "Dense, Step-wise Rewards" that Deng et al. advocate for, but I did it using pure computation rather than human datasets.&lt;/p&gt;
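&lt;p&gt;The entropy of a guess can be computed by partitioning the remaining candidate words by the feedback pattern each would produce. The sketch below assumes every candidate is equally likely and uses a tiny made-up word list. How the entropy value is scaled into a reward is not shown here, and the function names are hypothetical:&lt;/p&gt;

```python
import math
from collections import Counter


def feedback(guess, secret):
    """Score a guess: 'G' right spot, 'Y' wrong spot, 'X' not in word."""
    marks = ["X"] * len(guess)
    unmatched = Counter()
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            marks[i] = "G"
        else:
            unmatched[s] += 1
    for i, g in enumerate(guess):
        if marks[i] == "X" and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1
    return "".join(marks)


def guess_entropy(guess, candidates):
    """Expected information (in bits) gained from a guess, assuming each
    remaining candidate is equally likely to be the secret."""
    patterns = Counter(feedback(guess, secret) for secret in candidates)
    total = len(candidates)
    return -sum((n / total) * math.log2(n / total) for n in patterns.values())


if __name__ == "__main__":
    candidates = ["CAT", "COT", "DOG", "DIG"]  # toy 3-letter word list
    for guess in candidates:
        print(guess, round(guess_entropy(guess, candidates), 3))
```

&lt;p&gt;In this toy list, &lt;code&gt;DOG&lt;/code&gt; produces a different pattern for each of the four candidates (2 bits), while &lt;code&gt;CAT&lt;/code&gt; cannot tell &lt;code&gt;DOG&lt;/code&gt; from &lt;code&gt;DIG&lt;/code&gt; (1.5 bits). A reward tied to this number is dense: every guess gets a score, not just the one that ends the game.&lt;/p&gt;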

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project proves that we don't always need massive clusters to do interesting RL research.&lt;/p&gt;

&lt;p&gt;By combining Apple MLX for efficient local training and Heuristic Rewards (Entropy) as a substitute for expert data, I was able to train a small model to "reason" about game states. It learned to burn guesses to find vowel positions and navigate the trade-off between exploration and exploitation.&lt;/p&gt;

&lt;p&gt;The code is open source. If you have an M-series Mac, you can run this today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The code and logs are available on GitHub: &lt;a href="https://github.com/charbull/wordle-rl-gemma" rel="noopener noreferrer"&gt;https://github.com/charbull/wordle-rl-gemma&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deng et al. (2025):&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2510.25992" rel="noopener noreferrer"&gt;Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning&lt;/a&gt; (An alternative approach using data instead of math for dense rewards).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-R1:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2501.12948" rel="noopener noreferrer"&gt;Incentivizing Reasoning Capability in LLMs via Reinforcement Learning&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>deepseek</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>applesilicon</category>
    </item>
  </channel>
</rss>
