A few months ago I set out to answer a simple question: can I build a scientific framework for deciding when to sell my Google RSUs instead of making decisions based on gut feeling?
The answer turned out to be "sort of, but the process taught me far more than the answer did." This post covers the full arc — hardware choices, architecture decisions, the bugs that kept predictions stuck at 0.00%, and finally a working system running at 2.5ms on the Edge TPU.
I also added a second model — a direction classifier that predicts whether price will go up or down — to complement the original price regression model. The dual-model results are instructive and sometimes humbling.
The Hardware Stack
I started with what I had: a Google Coral Dev Board sitting on my shelf. The Coral has an Edge TPU coprocessor connected to the CPU via PCIe — not the USB Accelerator version, the on-chip variant. It's discontinued hardware, but it's genuinely capable for what I needed.
HP Victus RTX 3050 — primary training environment
Coral Dev Board → inference + sentiment (2W idle, always on)
The key insight that drove the architecture: you don't need the same hardware for training and inference. The Coral is terrible at training (no backprop support) but excellent at fast, cheap, power-efficient inference.
Why Conv1D and Not LSTM
The Coral Edge TPU's supported op set has been effectively frozen since 2019. This matters enormously:
| Operation | TPU Support | Notes |
|---|---|---|
| CONV_2D | ✅ Full | Conv1D maps here |
| ReLU6 | ✅ Native | NOT regular ReLU |
| GlobalAvgPool | ✅ Native | |
| BatchMatMul | ❌ CPU fallback | Kills LSTM, Transformers |
| LayerNorm | ❌ CPU fallback | Kills BERT-family |
| GELU | ❌ CPU fallback | Use ReLU6 instead |
LSTM falls back to CPU because of BatchMatMul. FinBERT falls back to CPU because of LayerNorm. Conv1D runs 100% on-chip because it maps directly to CONV_2D. The practical result: 2.5ms on TPU vs ~300ms on the ARM CPU.
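To make the constraint concrete, here's a minimal sketch — not the exact architecture from this project — of a Conv1D stack built only from ops the table marks as TPU-mappable: CONV_2D, ReLU6, MEAN (GlobalAveragePooling1D), and FULLY_CONNECTED:

```python
import tensorflow as tf

def build_tpu_friendly_model(seq_len=60, n_features=52, n_outputs=3):
    """Every layer maps to a TPU-supported op: CONV_2D, ReLU6,
    MEAN (GlobalAveragePooling1D), and FULLY_CONNECTED."""
    inp = tf.keras.Input(shape=(seq_len, n_features))
    x = inp
    for filters in (32, 64):
        # Conv1D lowers to CONV_2D; bias stays on because BatchNorm is avoided
        x = tf.keras.layers.Conv1D(filters, 3, padding="same", use_bias=True)(x)
        x = tf.keras.layers.Activation("relu6")(x)   # ReLU6, not plain ReLU
    x = tf.keras.layers.GlobalAveragePooling1D()(x)  # lowers to MEAN
    out = tf.keras.layers.Dense(n_outputs)(x)        # FULLY_CONNECTED
    return tf.keras.Model(inp, out)

model = build_tpu_friendly_model()
```

Layer counts and filter widths here are illustrative; the point is the op vocabulary, not the exact topology.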
The Feature Set: 52 Indicators Across 7 Groups
The input is a 60-day window of 52 features per day, computed from OHLCV data for the target ticker plus SPY (market proxy) and VIX (fear gauge):
Group 1 – Price/Volume (5) close_norm, OHLC ratios, volume deviation
Group 2 – Returns & RVol (6) 1d/5d/20d returns, log-return, realized vol
Group 3 – Momentum (11) RSI×3, Stochastic K/D, Williams%R, MFI, CCI, ROC×3
Group 4 – MACD family (4) line, signal, histogram, histogram delta
Group 5 – Trend & MAs (12) close vs MA5/10/20/50/100/200, Bollinger, ATR, ADX, DI+/-
Group 6 – Volume (4) OBV, vol ratio, CMF, vol momentum
Group 7 – Market context (10) SPY returns, VIX z-score, relative strength, calendar cyclicals, 52w high/low distances
The price model outputs three log-return predictions for 1-day, 3-day, and 5-day forward closes. The direction model outputs three up-probabilities for the same horizons.
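Since the price head predicts forward log-returns, turning its outputs into dollar targets is a single exponential. A sketch (function and variable names are mine):

```python
import numpy as np

def logrets_to_prices(last_close, pred_logrets):
    """Convert predicted forward log-returns into price targets.
    pred_logrets holds the cumulative 1d/3d/5d log-returns."""
    return last_close * np.exp(np.asarray(pred_logrets))

# e.g. a $314.74 last close and small positive horizon predictions
targets = logrets_to_prices(314.74, [0.0029, 0.0076, 0.0120])
```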
Bug 1: The Scaler That Refitted Itself
For weeks the model was outputting this:
1-day → $ 314.74 ▼ 0.00%
3-day → $ 314.74 ▼ 0.00%
5-day → $ 314.74 ▼ 0.00%
The inference code was silently fitting a brand new RobustScaler on 2 years of current data when the scaler file wasn't found:
# BUG — silently refits if the file doesn't exist
if not os.path.exists(SCALER_PATH):
    scaler = RobustScaler()
    scaler.fit(feat.values)  # ← fits on 2 years of live data
The model was trained with a scaler fit on 10 years of data across 30 tickers. Different statistics, different scaling — the model received garbage inputs and output zeros.
# Fix — crash loudly instead of silently producing wrong results
if not os.path.exists(SCALER_PATH):
    raise FileNotFoundError(
        "Scaler not found. Copy price_model_scaler_params.npz from your "
        "training machine. Never refit the scaler at inference time."
    )
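One way to ship the training-time scaler without pickling sklearn objects is to store its fitted arrays in an `.npz`. This is a sketch under the assumption that the file simply holds `center_` and `scale_` — the post doesn't show the actual file layout:

```python
import numpy as np

def save_scaler_params(scaler, path):
    # A fitted RobustScaler is fully determined by center_ and scale_
    np.savez(path, center=scaler.center_, scale=scaler.scale_)

def load_and_apply(path, feat_values):
    params = np.load(path)
    # identical to scaler.transform(), but with the *training* statistics
    return (feat_values - params["center"]) / params["scale"]
```

The benefit is that inference only needs numpy, and the transform is guaranteed to use the 10-year, 30-ticker statistics the model was trained with.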
Bug 2: GlobalAveragePooling1D vs Flatten
Using Flatten instead of GlobalAveragePooling1D caused only 2 of 40 ops to run on the TPU:
# WRONG — Flatten breaks the TPU execution graph
x = tf.keras.layers.Flatten()(x)
# RIGHT — GlobalAveragePooling1D maps to MEAN (TPU-native)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
Bug 3: BatchNormalization Splits the Graph
Even after fixing the above, the edgetpu-compiled model output all-zeros. The edgetpu_compiler log revealed why:
DEQUANTIZE 1 Operation is working on an unsupported data type
CONV_2D 1 Mapped to Edge TPU
CONV_2D 4 More than one subgraph is not supported
FULLY_CONNECTED 3 More than one subgraph is not supported
MAX_POOL_2D 1 More than one subgraph is not supported
Only 2 ops out of ~20 ran on the TPU. BatchNormalization uses float32 accumulators. When TFLite quantizes the graph it inserts a DEQUANTIZE node — and DEQUANTIZE is unsupported on the Edge TPU. This creates a subgraph boundary. The TPU runs everything before the first DEQUANTIZE (one Conv), and everything after (pooling, dense layers, output) runs on CPU with uninitialized output quantization (scale=0.0, zp=0), which dequantizes to all-zeros.
Fix: remove BatchNormalization entirely and switch use_bias=False → use_bias=True. The inputs are already RobustScaler-normalized, so BN isn't needed for stability. ReLU6 keeps activations bounded for INT8.
# Before — BatchNorm causes DEQUANTIZE → subgraph split → zeros on TPU
x = tf.keras.layers.Conv1D(32, 3, padding="same", use_bias=False)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Activation("relu6")(x)
# After — clean graph, 100% TPU execution
x = tf.keras.layers.Conv1D(32, 3, padding="same", use_bias=True)(x)
x = tf.keras.layers.Activation("relu6")(x)
After this change, the compiler log became all Mapped to Edge TPU.
Bug 4: Reading the Wrong Quantization Scale
Even with all ops on the TPU, inputs showed Std: 28.89 | Unique Levels: 149 — meaning values were being crushed to the INT8 boundary. The model was receiving a barcode of extreme values instead of a price chart.
The cause: reading the input scale from the wrong field.
# WRONG — reads per-channel WEIGHT scales of the first Conv layer
p = in_d['quantization_parameters']
sc = p['scales'][0] # a tiny weight-magnitude value like 0.003
# RIGHT — reads the per-tensor INPUT scale from the calibration dataset
sc, zp = in_d['quantization'] # correctly ~0.039 (= 5/127)
quantization_parameters['scales'] is an array of per-channel weight scales — one per Conv filter. quantization is the plain (scale, zero_point) 2-tuple the TFLite INT8 converter computes from the representative calibration data for the input tensor. Using the weight scale to quantize a [-5, 5] input means a value of 1.0 quantizes to 1.0/0.003 = 333, clips to 127, and 90%+ of the input space collapses to the boundary. After the fix: Std: 24.32 | Unique Levels: 152. Real predictions.
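Written out as code, the correct per-tensor quantization step — and why the wrong scale saturates int8 — looks like this (a sketch; the helper name is mine):

```python
import numpy as np

def quantize_input(x, scale, zero_point):
    """Float → INT8 using the per-tensor input quantization from the
    TFLite converter: q = round(x / scale) + zero_point, clipped to int8."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

# Correct per-tensor scale (~0.039 = 5/127): a value of 1.0 lands at 25,
# well inside the int8 range.
good = quantize_input(np.array([1.0]), 5 / 127, 0)

# Wrong per-channel weight scale (~0.003): 1.0 → 333 → clipped to 127.
bad = quantize_input(np.array([1.0]), 0.003, 0)
```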
Multi-Ticker Training: Why 30 Stocks Instead of 1
Training only on GOOGL gives ~2,300 bars — thin for a 60-day sequence model. Training on 30 tickers gives 55,560 sequences and forces the model to learn generalizable price dynamics rather than GOOGL-specific patterns.
DEFAULT_TICKERS = [
    "GOOGL", "AAPL", "MSFT", "NVDA", "META", "AMZN", "TSLA",  # mega-cap tech
    "JPM", "BAC", "GS", "V", "MA",                            # financials
    "JNJ", "UNH", "PFE", "ABBV",                              # healthcare
    "XOM", "CVX",                                             # energy
    "WMT", "HD", "CAT", "UPS",                                # consumer/industrial
    "AMD", "INTC", "TSM",                                     # semiconductors
    "XLK", "XLF", "XLE", "XLV", "SPY",                        # sector ETFs
]
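When training across tickers, sequences have to be windowed per ticker so that no 60-day lookback spans a ticker boundary. A sketch of the idea (function and variable names are mine):

```python
import numpy as np

def make_sequences(per_ticker_features, per_ticker_targets, seq_len=60):
    """Build (window, target) pairs independently for each ticker so a
    window never mixes two tickers' price histories."""
    X, y = [], []
    for feats, targs in zip(per_ticker_features, per_ticker_targets):
        for t in range(seq_len, len(feats)):
            X.append(feats[t - seq_len:t])  # 60-day lookback window
            y.append(targs[t])              # forward-return target at day t
    return np.array(X), np.array(y)
```

Concatenating all tickers first and then sliding one window over the combined array would silently create windows containing two different stocks.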
Preventing Data Leakage: The Embargo Gap
Adjacent sequences in a sequence model share almost all their data. Sequence 100 uses days 40–99; sequence 101 uses days 41–100. A standard train/val split puts these in different sets, creating look-ahead leakage. The fix:
EMBARGO = SEQ_LEN # must be >= SEQ_LEN
split = int(len(X) * 0.85)
train_end = split - EMBARGO
val_start = split + EMBARGO
X_train = X[:train_end]
X_val = X[val_start:]
Adding a Direction Model
The price model can cheat by predicting "slightly positive" for everything and still minimize MAE on bull market data. A direction model predicts binary up/down, which is harder to game:
# Price model: linear head, Huber loss
out = tf.keras.layers.Dense(3, name="price_output")(x)
loss_fn = tf.keras.losses.Huber(delta=0.5)
# Direction model: sigmoid head, binary cross-entropy
out = tf.keras.layers.Dense(3, activation="sigmoid", name="direction_output")(x)
loss_fn = "binary_crossentropy"
Both models train in one command with --mode both, sharing the same dataset and producing all deployment artifacts including automatic Edge TPU compilation.
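The direction labels are just the sign of the same forward log-returns the price model regresses on — a sketch:

```python
import numpy as np

def direction_labels(forward_logrets):
    """Binary up/down targets for the 1d/3d/5d horizons.
    forward_logrets: shape (n_samples, 3) of forward log-returns."""
    return (np.asarray(forward_logrets) > 0).astype(np.float32)

labels = direction_labels([[0.01, -0.02, 0.005]])  # → [[1., 0., 1.]]
```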
The Walk-Forward CV Results
Price model CV Mean : 1d=53.5% 3d=56.4% 5d=57.7%
Direction model CV : 1d=52.0% 3d=55.5% 5d=55.9%
Held-out val:
Price 1d=53.0% 3d=56.8% 5d=57.6%
Direction 1d=52.6% 3d=56.2% 5d=57.6%
Both models cross 54% on the 5-day horizon — the threshold this project treats as evidence of a real edge. Results are consistent across all 4 folds, with no suspicious outlier fold.
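The post doesn't show the fold generator, but the 4 embargoed walk-forward folds can be sketched as expanding training windows, each separated from its validation window by the same embargo gap used in the train/val split (fold boundaries here are illustrative):

```python
def walk_forward_folds(n_samples, n_folds=4, embargo=60):
    """Expanding-window walk-forward splits with an embargo gap between
    each training window and its validation window."""
    folds = []
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size - embargo     # embargo before validation
        val_start = k * fold_size
        val_end = min(val_start + fold_size, n_samples)
        folds.append((range(0, train_end), range(val_start, val_end)))
    return folds
```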
The Backtest Results
| Mode | ROI | Ann. ROI | Sharpe | Drawdown | Trades | Win Rate |
|---|---|---|---|---|---|---|
| Price only | +2.63% | +1.34% | -0.72 | -5.96% | 14 | 57.1% |
| Direction | +16.48% | +8.13% | +0.36 | -11.68% | 30 | 46.7% |
| Fusion | +2.76% | +1.40% | -0.99 | -5.80% | 10 | 40.0% |
| Buy & Hold | +98.95% | | | | | |
All three modes underperform buy-and-hold on GOOGL over 3 years. This is the right conclusion for RSU decisions: in a sustained bull trend the default should be to hold, and the bar for the model to recommend a sale should be high. The system's value is in providing a rigorous framework for when to deviate from holding, not in trading actively.
The System Running Live
After fixing all four bugs, both models run on the Edge TPU at 2.5ms each:
════════════════════════════════════════════════════════════════════
📈 GOOG Advisor | Coral Edge TPU Dev Board [FUSION mode]
2026-04-27 07:03:58
════════════════════════════════════════════════════════════════════
Last close : $342.32 ▲ 4.57 (1.35%) [592ms]
────────────────────────────────────────────────────────────────────
📊 Technical Analysis (18 indicators)
────────────────────────────────────────────────────────────────────
🟢 RSI-14 69.7 → Above midline
🟢 RSI trend +4.6 → accelerating upward
🟢 MACD 10.17 > Signal 7.43 → Bullish
🔴 MACD histogram contracting → momentum fading
🟢 Price $342.32 > MA50 $308.57
🟢 Price $342.32 > MA200 $276.80
🟢 MA5 > MA10 > MA20 → Momentum stacked bullish
⚪ BB %B 0.81 → mid-band territory
🟢 ADX 29.9 strong | DI+ 36 > DI- 16 → bullish trend
⚪ Volume 1.1× avg → average participation
🔴 MFI 80.5 → overbought money flow
────────────────────────────────────────────────────────────────────
📰 News Sentiment
────────────────────────────────────────────────────────────────────
Source : yfinance+GoogleRSS (2006ms)
Headlines : 9 scored / 11 filtered
Ticker : +0.1717 → BULLISH (58% confidence)
Macro : +0.1343 → NEUTRAL [gate: —]
+ +0.000 Chicago Capital LLC Reduces Stock Holdings in Alphabet Inc
+█ +0.158 Why Alphabet (GOOG, GOOGL) Is a Compelling AI Investment i
+████ +0.486 Alphabet Stock (GOOG) Opinions on Upcoming Q1 Earnings and
+███ +0.346 Tanager Wealth Management LLP Has $37.11 Million Stock Pos
+ +0.000 Alphabet Inc. (GOOG) Laps the Stock Market: Here's Why
+█ +0.175 Alphabet Inc. $GOOG Stock Holdings Lowered by Natural Inve
+ +0.000 Lbp Am Sa Trims Stock Holdings in Alphabet Inc. $GOOG
+███ +0.380 Is GOOG Stock a Buy Ahead of Q1 Earnings and Amid Fragile
────────────────────────────────────────────────────────────────────
TECHNICAL VERDICT : 🟢 BUY 🟢 (score: +6)
ADJUSTED VERDICT : 🟢 BUY 🟢
CONFIDENCE : MEDIUM
FUSION SIGNAL : 🟢 BUY 🟢 (price + direction)
────────────────────────────────────────────────────────────────────
🤖 Price Model [Coral Edge TPU (price) ⚡ 2.6ms]
1-day → $ 343.32 ▲ 0.29%
3-day → $ 344.93 ▲ 0.76%
5-day → $ 346.44 ▲ 1.20%
Day-trade (1d) BUY → SELL +0.29%
Swing (3d) BUY → SELL +0.76%
Week (5d) BUY → SELL +1.20%
────────────────────────────────────────────────────────────────────
🧭 Direction Model [Coral Edge TPU (direction) ⚡ 2.6ms]
1-day → ▲ 52.7% ██████████
3-day → ▲ 55.5% ███████████
5-day → ▲ 56.6% ███████████
────────────────────────────────────────────────────────────────────
Key levels : MA50 $308.57 MA200 $276.80 52wH $344.90 52wL $152.80
════════════════════════════════════════════════════════════════════
592ms total latency — data fetch + 52-feature engineering + two TPU inferences. Results pushed to Telegram automatically.
What I'd Do Differently
Remove BatchNorm from the start. For quantized edge deployment, BatchNormalization is a trap. The right design is Conv1D(use_bias=True) → ReLU6. Pre-normalized inputs make BN redundant.
Read the edgetpu compiler log immediately. The compiler exits with code 0 even when only 2 of 40 ops map to the TPU. The .log file it writes alongside the compiled model is the only way to know.
Use weighted horizon agreement. The fusion signal's MIN_AGREEMENT=2 gate treats all three horizons equally. The 1-day prediction is noisier than the 5-day but counts the same. A weighted agreement score matching prediction weights [0.5, 0.3, 0.2] would be more accurate.
Add bear-regime training data. The sell signal never triggered once in 250 backtest days. The training window skews bullish. Explicitly oversampling high-VIX / drawdown windows would help.
Use FinBERT instead of VADER for sentiment. VADER was designed for social media. Financial language ("impairment charge," "above consensus," "guidance raised") isn't in its vocabulary.
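The weighted horizon agreement suggested above could be sketched like this — assuming the [0.5, 0.3, 0.2] weights apply to the (1d, 3d, 5d) horizons in that order, and with a threshold value that is my own placeholder:

```python
def weighted_agreement(signals, weights=(0.5, 0.3, 0.2), threshold=0.6):
    """signals: per-horizon votes in {+1, 0, -1} for (1d, 3d, 5d).
    Replaces the flat MIN_AGREEMENT=2 count with a weighted score."""
    score = sum(w * s for w, s in zip(weights, signals))
    if score >= threshold:
        return "BUY"
    if score <= -threshold:
        return "SELL"
    return "HOLD"
```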
What This Project Is Actually For
The technical goals were always secondary to three things:
A better framework for RSU selling decisions than my gut feeling. Replacing "the stock feels extended" with "RSI is 70, direction model sees 52.7% up on 1-day but 56.6% on 5-day, price model predicts +1.20% over the week — this is not a signal to sell."
Hands-on experience with ML systems at the hardware layer. Understanding why BatchNorm breaks INT8 graphs, how subgraph splitting silently produces zeros, and why quantization_parameters['scales'][0] vs quantization[0] is the difference between a working model and a broken one.
A concrete signal about whether quantitative finance is genuinely interesting. The answer: yes, but the gap between "57% directional accuracy" and "beating buy-and-hold" is enormous. That gap is where the real research lives.