<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Allen Elzayn</title>
    <description>The latest articles on DEV Community by Allen Elzayn (@0xrelogic).</description>
    <link>https://dev.to/0xrelogic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3578071%2F389c4076-ed99-4341-9c07-16863134182f.jpeg</url>
      <title>DEV Community: Allen Elzayn</title>
      <link>https://dev.to/0xrelogic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/0xrelogic"/>
    <language>en</language>
    <item>
      <title>Building a Trading Bot That Could Turn $10K into $102K: xLSTM (DL) + PPO (RL)</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Sun, 29 Mar 2026 15:36:07 +0000</pubDate>
      <link>https://dev.to/0xrelogic/building-a-trading-bot-that-could-turn-10k-into-102k-xlstm-dl-ppo-rl-3iko</link>
      <guid>https://dev.to/0xrelogic/building-a-trading-bot-that-could-turn-10k-into-102k-xlstm-dl-ppo-rl-3iko</guid>
      <description>&lt;p&gt;My trading bot lost $176 in its first real backtest.&lt;/p&gt;

&lt;p&gt;Not because of a bug. Not because of bad data. The algorithm was working exactly as designed; it just couldn't figure out when to exit trades.&lt;/p&gt;

&lt;p&gt;The bot would enter positions with 48.6% accuracy (better than random), hold them for an average of 27 bars, and then... panic. It would close winning trades too early and hold losing trades too long. Classic human behavior, except this was supposed to be an emotionless machine.&lt;/p&gt;

&lt;p&gt;That was Run 4. Two runs later (Run 5 and Run 6B), I had a system that generated $507 profit on completely unseen 2024-2025 data (1.87 years, 45,246 bars), with a Sharpe ratio of 6.94 and max drawdown of 0.98%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For perspective:&lt;/strong&gt; With proper position sizing (Half Kelly), that same system could turn $10K into $102K over the same period. Compare that to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Savings account (5% APY): $11,025&lt;/li&gt;
&lt;li&gt;S&amp;amp;P 500 (11% avg): $12,321&lt;/li&gt;
&lt;li&gt;Hedge funds (12%): $12,544&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the story of Amertume, a gold trading bot built with xLSTM (Extended Long Short-Term Memory) and PPO (Proximal Policy Optimization) that combines deep learning and reinforcement learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I wanted to build a trading system that could pass prop firm evaluations, not because I'm obsessed with trading, but because it's a perfect testbed for combining deep learning and reinforcement learning.&lt;/p&gt;

&lt;p&gt;The constraint is simple: make 10% profit without losing more than 5% in drawdown. But the challenge is hard: 97% of traders fail.&lt;/p&gt;

&lt;p&gt;This became my design goal: build a system that survives volatility without blowing up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Most Trading Bots Fail (And Why Mine Did Too)
&lt;/h2&gt;

&lt;p&gt;Before Amertume, I tried everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run 1:&lt;/strong&gt; LSTM models with basic features (overtrading problem - 1981 trades, -$867 loss)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run 2:&lt;/strong&gt; Fixed transaction costs (oscillated between 9-983 trades, unstable)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run 3:&lt;/strong&gt; Better xLSTM encoder with focal loss (hold exploit - avg 41 bars, always hitting max time)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They all had the same core problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Overtrading:&lt;/strong&gt; Run 1 executed 1981 trades in training because transaction costs were invisible (0.00004 vs 0.01 log returns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hold Exploit:&lt;/strong&gt; Run 2-3 learned to hold positions for exactly 60 bars (max time limit) instead of exiting naturally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit Paralysis:&lt;/strong&gt; Run 4 became too selective (only 37 trades in 1.87 years) but still lost money because it didn't know when to close&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But there was a deeper problem I discovered: &lt;strong&gt;1-minute data is too noisy&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 1-Minute → 15-Minute Pivot
&lt;/h3&gt;

&lt;p&gt;My first 4 encoder training attempts used 1-minute OHLCV data. The results were terrible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Encoder v1-v4 (1-minute data):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 50.3% (coin flip)&lt;/li&gt;
&lt;li&gt;Problem: Model just memorized training data&lt;/li&gt;
&lt;li&gt;Insight: Predicting next 1-minute move is basically random noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why 1-minute failed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gold moves $0.10-$0.50 per minute (mostly noise)&lt;/li&gt;
&lt;li&gt;News events cause instant spikes (unpredictable)&lt;/li&gt;
&lt;li&gt;Spread costs eat profits on short timeframes&lt;/li&gt;
&lt;li&gt;ATR(14) on 1-min = only 14 minutes of context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Encoder v5+ (15-minute data):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validation accuracy: 42.3%&lt;/li&gt;
&lt;li&gt;Test accuracy: 41.9% (8.6% edge over random 33.3%)&lt;/li&gt;
&lt;li&gt;3-class classification: UP/DOWN/NEUTRAL (random baseline = 33.3%)&lt;/li&gt;
&lt;li&gt;ATR(14) on 15-min = 3.5 hours of context&lt;/li&gt;
&lt;li&gt;Filters out microstructure noise&lt;/li&gt;
&lt;li&gt;Captures actual momentum moves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The math:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1-min: 1440 bars/day → 99% noise, 1% signal&lt;/li&gt;
&lt;li&gt;15-min: 96 bars/day → 70% noise, 30% signal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Switching to 15-minute was the breakthrough that made xLSTM encoder actually work.&lt;/p&gt;
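&lt;p&gt;The aggregation itself is a one-liner in pandas. A minimal sketch, assuming 1-minute bars sit in a DataFrame indexed by timestamp with &lt;code&gt;open/high/low/close/volume&lt;/code&gt; columns (the column names and the function are illustrative, not my actual pipeline):&lt;/p&gt;

```python
import pandas as pd
import numpy as np

def resample_ohlcv(df: pd.DataFrame, rule: str = "15min") -> pd.DataFrame:
    """Aggregate 1-minute OHLCV bars into coarser bars (e.g. 15-minute).

    Open is the first trade of the window, high/low are the extremes,
    close is the last trade, and volume sums across the window.
    """
    return df.resample(rule).agg({
        "open": "first",
        "high": "max",
        "low": "min",
        "close": "last",
        "volume": "sum",
    }).dropna()
```

&lt;p&gt;Feeding in one full day (1,440 one-minute bars) yields the 96 fifteen-minute bars quoted above.&lt;/p&gt;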

&lt;p&gt;The bot needed to understand: "Is this a breakout I should chase, or noise I should ignore?"&lt;/p&gt;

&lt;p&gt;That's where xLSTM comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is xLSTM
&lt;/h2&gt;

&lt;p&gt;xLSTM is the 2024 evolution of LSTM, created by Sepp Hochreiter, co-inventor of the original LSTM in 1997.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key innovation:&lt;/strong&gt; Instead of just remembering sequences, xLSTM has two types of memory:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;sLSTM (scalar memory):&lt;/strong&gt; Tracks single values over time with exponential gating&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perfect for: price momentum, volatility regimes, trend strength&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;mLSTM (matrix memory):&lt;/strong&gt; Stores relationships between multiple features&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perfect for: correlations (DXY vs Gold), multi-timeframe patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
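&lt;p&gt;To make the exponential-gating idea concrete, here is a toy, scalar-case sketch of an sLSTM-style update. The &lt;code&gt;w&lt;/code&gt; weights and the exact parameterisation are illustrative only, not the paper's full formulation; the point is the normaliser &lt;code&gt;n&lt;/code&gt;, which keeps the hidden state bounded even though the gates use &lt;code&gt;exp&lt;/code&gt; instead of sigmoids:&lt;/p&gt;

```python
import numpy as np

def slstm_step(x, c_prev, n_prev, w):
    """One simplified sLSTM-style update (scalar case, toy weights).

    Exponential input/forget gates replace the classic sigmoids; the
    normaliser state n accumulates the same gate mass as the cell state c,
    so c / n stays a weighted average of the bounded candidates z.
    """
    z = np.tanh(w["z"] * x)                  # candidate value, in [-1, 1]
    i = np.exp(w["i"] * x)                   # exponential input gate
    f = np.exp(w["f"] * x)                   # exponential forget gate
    o = 1.0 / (1.0 + np.exp(-w["o"] * x))    # sigmoid output gate
    c = f * c_prev + i * z                   # cell state
    n = f * n_prev + i                       # normaliser
    h = o * (c / n)                          # normalised hidden state
    return h, c, n
```

&lt;p&gt;Because &lt;code&gt;c / n&lt;/code&gt; is a weighted average of past candidates, &lt;code&gt;|h|&lt;/code&gt; never exceeds 1 regardless of how large the exponential gates get.&lt;/p&gt;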

&lt;p&gt;&lt;strong&gt;Why xLSTM (not XGBoost, Random Forest, or Transformer)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;XGBoost &amp;amp; Random Forest are powerful for tabular data but struggle with temporal dependencies. Tree-based models make predictions by averaging values in leaf nodes; if the test data falls outside the training range (common in financial markets), they simply return the nearest leaf's average. This "extrapolation ceiling" is fatal for trading, where regime changes and unprecedented volatility are the norm.&lt;/p&gt;

&lt;p&gt;Transformers solve the extrapolation problem but introduce computational overhead that's prohibitive for real-time trading. Self-attention requires memory quadratic in sequence length (O(n²)). For a 60-bar window with 25 features treated as 1,500 tokens, each attention matrix grows to 2.25 million entries per layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why xLSTM wins for trading:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;xLSTM processes sequentially, updating its memory state bar-by-bar. It can handle &lt;strong&gt;arbitrarily long context&lt;/strong&gt; without exploding memory, and it naturally captures temporal dependencies.&lt;/p&gt;

&lt;p&gt;For financial time series, this translates to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Better regime detection&lt;/strong&gt; (remembers volatility patterns from 1000+ bars ago)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster inference&lt;/strong&gt; (linear complexity vs. quadratic for Transformers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural extrapolation&lt;/strong&gt; (unlike tree-based models, can predict beyond training ranges)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less overfitting&lt;/strong&gt; (sequential processing = natural regularization)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Architecture: xLSTM + PPO + Triple Barrier
&lt;/h2&gt;

&lt;p&gt;Here's how Amertume works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw OHLCV (15-min gold prices)
    ↓
Feature Engineering (25 features)
    ↓
xLSTM Encoder (frozen, pre-trained)
    ↓
128-dim embedding (market state)
    ↓
PPO Agent (trainable)
    ↓
Action: BUY / SELL / HOLD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this architecture is hard to replicate:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The magic isn't in any single component; it's in how they're wired together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;xLSTM encoder is pre-trained separately&lt;/strong&gt; (7 training runs, 22 epochs, Focal Loss with gamma=2.0)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Then frozen&lt;/strong&gt; (no gradients during RL training)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PPO learns on top&lt;/strong&gt; of frozen embeddings (not end-to-end)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curriculum learning&lt;/strong&gt; (3 stages, each with different volatility filtering)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triple Barrier exits&lt;/strong&gt; (agent can't close positions manually)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each piece alone is standard. The combination + training procedure is what makes it work.&lt;/p&gt;
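&lt;p&gt;The freeze-then-train wiring looks roughly like this in PyTorch. This is a sketch under assumptions: any &lt;code&gt;nn.Module&lt;/code&gt; stands in for the pre-trained xLSTM encoder, and the head sizes are illustrative, not the production configuration:&lt;/p&gt;

```python
import torch
import torch.nn as nn

class FrozenEncoderPolicy(nn.Module):
    """PPO actor-critic heads on top of a frozen, pre-trained encoder.

    The encoder's parameters are frozen before RL training starts, so PPO
    gradients only update the small actor/critic heads on top of the
    fixed 128-dim market-state embedding.
    """
    def __init__(self, encoder: nn.Module, emb_dim: int = 128, n_actions: int = 3):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # freeze: no gradients flow here
        self.actor = nn.Sequential(
            nn.Linear(emb_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
        self.critic = nn.Sequential(
            nn.Linear(emb_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, x):
        with torch.no_grad():                # encoder stays fixed at inference
            z = self.encoder(x)
        return self.actor(z), self.critic(z)
```

&lt;p&gt;The 3 actions map to BUY / SELL / HOLD; the critic head provides the value estimates PPO needs for advantage computation.&lt;/p&gt;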




&lt;h2&gt;
  
  
  Want to See the Full System?
&lt;/h2&gt;

&lt;p&gt;This is just the beginning. The full blog post covers:&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete Architecture Breakdown
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Feature engineering pipeline (25 features from raw OHLCV)&lt;/li&gt;
&lt;li&gt;xLSTM pre-training with Triple Barrier labeling&lt;/li&gt;
&lt;li&gt;PPO training with curriculum learning (calm → mixed → full volatility)&lt;/li&gt;
&lt;li&gt;Dynamic ATR Triple Barrier (2:1 RR) implementation&lt;/li&gt;
&lt;/ul&gt;
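&lt;p&gt;For readers unfamiliar with Triple Barrier exits (used in the architecture above), here is a simplified sketch of the dynamic-ATR version with a 2:1 reward:risk ratio. The function and parameter names are mine for illustration, not the production code's:&lt;/p&gt;

```python
import numpy as np

def triple_barrier_label(close, atr, entry, horizon=60, rr=2.0, risk_mult=1.0):
    """Label one entry bar with a Lopez de Prado-style triple barrier.

    Returns +1 if the profit barrier (rr * risk) is touched first,
    -1 if the stop barrier is touched first, and 0 if the time limit
    (horizon bars) expires with neither barrier hit.
    """
    risk = risk_mult * atr[entry]            # stop distance scales with ATR
    upper = close[entry] + rr * risk         # take-profit: 2:1 reward:risk
    lower = close[entry] - risk              # stop-loss
    end = min(entry + horizon, len(close) - 1)
    for t in range(entry + 1, end + 1):
        if close[t] >= upper:
            return 1
        if close[t] <= lower:
            return -1
    return 0                                 # time barrier hit
```

&lt;p&gt;Because the barriers come from ATR, they widen in volatile regimes and tighten in calm ones instead of using fixed pip distances.&lt;/p&gt;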

&lt;h3&gt;
  
  
  The 6 Failed Runs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Run 1: Overtrading disaster (1981 trades, -$867)&lt;/li&gt;
&lt;li&gt;Run 2-3: Hold exploit (agent gaming the time limit)&lt;/li&gt;
&lt;li&gt;Run 4: Exit paralysis (48.6% entry accuracy but -$176 loss)&lt;/li&gt;
&lt;li&gt;Run 5: EV trap (agent refused to trade)&lt;/li&gt;
&lt;li&gt;Run 6: The breakthrough (6.94 Sharpe, 0.98% drawdown)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kelly Criterion Position Sizing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Why the $507 PnL is deliberately conservative (0.01 micro-lot stress test)&lt;/li&gt;
&lt;li&gt;Projections with proper position sizing:

&lt;ul&gt;
&lt;li&gt;1% risk: $10K → $18K (81.6% return)&lt;/li&gt;
&lt;li&gt;2% risk: $10K → $30K (206% return)&lt;/li&gt;
&lt;li&gt;Half Kelly: $10K → $102K (924% return)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The brutal truth about drawdowns and sleep quality&lt;/li&gt;

&lt;/ul&gt;
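&lt;p&gt;The Half Kelly projection above follows from the standard Kelly formula for a binary bet: f* = p - (1 - p) / b, where p is the win rate and b the reward:risk ratio. A sketch (plugging in the bot's 48.6% entry accuracy and 2:1 barriers is purely illustrative; real sizing also has to respect drawdown limits):&lt;/p&gt;

```python
def kelly_fraction(win_rate: float, reward_risk: float) -> float:
    """Full Kelly fraction for a binary bet: f* = p - (1 - p) / b."""
    return win_rate - (1.0 - win_rate) / reward_risk

def half_kelly(win_rate: float, reward_risk: float) -> float:
    """Half Kelly: keeps most of the growth with far smaller drawdowns."""
    return 0.5 * kelly_fraction(win_rate, reward_risk)
```

&lt;p&gt;With p = 0.486 and b = 2, full Kelly is 0.486 - 0.514 / 2 = 0.229, so Half Kelly risks about 11.5% of equity per trade, which is why the drawdown swings it implies are discussed honestly in the full post.&lt;/p&gt;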

&lt;h3&gt;
  
  
  Academic Comparison
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;How Amertume compares to recent papers&lt;/li&gt;
&lt;li&gt;Kalman-Enhanced DRL: 13.12 Sharpe (vs my 6.94)&lt;/li&gt;
&lt;li&gt;Why action space reduction is underappreciated&lt;/li&gt;
&lt;li&gt;Statistical significance analysis (294 trades, 95% CI)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Production Deployment
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Live testing on demo account&lt;/li&gt;
&lt;li&gt;Safety features (kill-switches, latency checks)&lt;/li&gt;
&lt;li&gt;What could go wrong (overfitting, regime change, execution issues)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Full References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;20+ academic papers cited&lt;/li&gt;
&lt;li&gt;xLSTM, PPO, Focal Loss, Triple Barrier, Kelly Criterion&lt;/li&gt;
&lt;li&gt;Comparison papers on tree-based vs deep learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://allenarch.dev/blog/combining-deep-learning-rl-trading-bot/" rel="noopener noreferrer"&gt;Read the full post on my blog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; This is educational content about machine learning and trading system design. Trading involves substantial risk of loss. I am not a financial advisor. Do your own research and never risk money you can't afford to lose.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Forge: Lightweight, Fast, and Reliable Local CI/CD</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Mon, 15 Dec 2025 16:28:10 +0000</pubDate>
      <link>https://dev.to/0xrelogic/forge-lightweight-fast-and-reliable-local-cicd-4kj8</link>
      <guid>https://dev.to/0xrelogic/forge-lightweight-fast-and-reliable-local-cicd-4kj8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1u3231c0hrp531cfwjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1u3231c0hrp531cfwjr.png" alt=" " width="800" height="441"&gt;&lt;/a&gt;&lt;em&gt;Originally published at &lt;a href="https://allenarch.dev/blog/mcp-cursor-poc-2025/" rel="noopener noreferrer"&gt;allenarch.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's a frustrating moment every developer has experienced: push code to GitHub, wait for GitHub Actions to run, realize there's a typo in the config file. Push again, wait again. Repeat until the CI/CD pipeline finally passes.&lt;/p&gt;

&lt;p&gt;The problem? We can't easily test pipelines locally. GitHub Actions only runs in the cloud. GitLab CI is the same. CircleCI? You need to set up the project first. Everything requires pushing to remote, which means unnecessary delays.&lt;/p&gt;

&lt;p&gt;That's why I built &lt;strong&gt;Forge&lt;/strong&gt; - a lightweight, fast, and reliable local CI/CD tool. Built with Rust for performance, runs using Docker, and has syntax similar to GitHub Actions so you don't need to learn a new format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Do We Need Local CI/CD?
&lt;/h2&gt;

&lt;p&gt;Imagine this scenario: you have a monorepo with 5 different services. Each service has its own test suite. Before pushing, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run tests for all services manually&lt;/li&gt;
&lt;li&gt;Build each service&lt;/li&gt;
&lt;li&gt;Lint check&lt;/li&gt;
&lt;li&gt;Format check&lt;/li&gt;
&lt;li&gt;And many more&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Or... you could push first, wait for cloud CI/CD, then find out something failed. Waste of time.&lt;/p&gt;

&lt;p&gt;Forge solves this simply: run the same pipeline as GitHub Actions, but locally. Test before pushing. Fix before committing. Save time, save frustration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Rust + Docker = Speed + Isolation
&lt;/h2&gt;

&lt;p&gt;Forge is built with Rust for performance and reliability. It ships as a single binary that can run without additional dependencies (except Docker).&lt;/p&gt;

&lt;p&gt;Docker is used for isolation. Each step in the pipeline runs in a separate container, so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependencies don't conflict between steps&lt;/li&gt;
&lt;li&gt;Environment can be customized per step&lt;/li&gt;
&lt;li&gt;Execution results are consistent and reproducible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Three-Tier Architecture
&lt;/h3&gt;

&lt;p&gt;Forge uses a three-layer architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Main Orchestrator (&lt;code&gt;src/main.rs&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse CLI commands&lt;/li&gt;
&lt;li&gt;Load and validate YAML config&lt;/li&gt;
&lt;li&gt;Resolve dependencies between stages&lt;/li&gt;
&lt;li&gt;Schedule execution (sequential or parallel)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Container Management (&lt;code&gt;src/container.rs&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker abstraction layer&lt;/li&gt;
&lt;li&gt;Image pulling and caching&lt;/li&gt;
&lt;li&gt;Container lifecycle management&lt;/li&gt;
&lt;li&gt;Volume mounting for workspace&lt;/li&gt;
&lt;li&gt;Real-time log streaming&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Execution Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stage dependency resolution (topological sort using Kahn's algorithm)&lt;/li&gt;
&lt;li&gt;Parallel execution with proper log aggregation&lt;/li&gt;
&lt;li&gt;Cache key derivation from lockfiles&lt;/li&gt;
&lt;li&gt;Secrets management and environment variable injection&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Familiar Syntax: YAML Like GitHub Actions
&lt;/h2&gt;

&lt;p&gt;One important design decision: Forge uses syntax similar to GitHub Actions and GitLab CI. Why? Because developers are already familiar with this format. No need to learn new syntax.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# forge.yaml&lt;/span&gt;
&lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;directories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/workspace/node_modules&lt;/span&gt;

&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;install&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;install&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build application&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run build&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, clean, and immediately understandable by anyone who has used GitHub Actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Container Isolation
&lt;/h3&gt;

&lt;p&gt;Each step runs in its own Docker container. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependencies don't interfere with each other&lt;/li&gt;
&lt;li&gt;Can use different images per step (Node.js for frontend, Python for ML, Rust for backend)&lt;/li&gt;
&lt;li&gt;Clean environment every time you run&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Parallel Execution
&lt;/h3&gt;

&lt;p&gt;Stages that don't depend on each other can run in parallel. Forge automatically detects dependencies and schedules execution optimally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lint-frontend&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;...&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lint-backend&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;...&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;lint-frontend&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;lint-backend&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;...&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;lint-frontend&lt;/code&gt; and &lt;code&gt;lint-backend&lt;/code&gt; will run simultaneously, then &lt;code&gt;build&lt;/code&gt; runs after both complete.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Intelligent Caching
&lt;/h3&gt;

&lt;p&gt;Forge has a smart caching system. Cache is configured at the top level of &lt;code&gt;forge.yaml&lt;/code&gt; and applies to all stages. Cache keys are automatically derived from lockfiles (package-lock.json, Cargo.lock, requirements.txt, etc.), so cache automatically invalidates when dependencies change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;directories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/workspace/node_modules&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/workspace/.next/cache&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache is stored locally at &lt;code&gt;./.forge/cache/&lt;/code&gt; and automatically managed by Forge.&lt;/p&gt;
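&lt;p&gt;The lockfile-based key derivation can be sketched in a few lines (Python here for brevity; Forge itself is written in Rust and its exact key format may differ):&lt;/p&gt;

```python
import hashlib
from pathlib import Path

LOCKFILES = ("package-lock.json", "Cargo.lock", "requirements.txt")

def cache_key(project_dir: str) -> str:
    """Derive a cache key by hashing whichever lockfiles exist.

    The key changes exactly when a lockfile's contents change, so the
    cache invalidates itself whenever dependencies change.
    """
    h = hashlib.sha256()
    for name in LOCKFILES:
        path = Path(project_dir) / name
        if path.is_file():
            h.update(name.encode())          # bind the hash to the filename
            h.update(path.read_bytes())      # and to its exact contents
    return h.hexdigest()[:16]
```

&lt;p&gt;Hashing file contents rather than timestamps keeps the key stable across fresh clones of the same repository.&lt;/p&gt;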

&lt;h3&gt;
  
  
  4. Secrets Management
&lt;/h3&gt;

&lt;p&gt;Forge supports secure secrets management. Secrets are defined in &lt;code&gt;forge.yaml&lt;/code&gt; and loaded from host environment variables (or &lt;code&gt;.env&lt;/code&gt; files). They are automatically injected into all containers as environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;API_KEY&lt;/span&gt;
    &lt;span class="na"&gt;env_var&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FORGE_API_KEY&lt;/span&gt;

&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;alpine:latest&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy.sh&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.API_KEY }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Real-Time Logging
&lt;/h3&gt;

&lt;p&gt;Logs from each step are streamed in real-time with proper formatting. If a step fails, Forge immediately stops execution (fail-fast) and displays a clear error.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Multi-Language Support
&lt;/h3&gt;

&lt;p&gt;Forge isn't tied to one programming language. Since it uses Docker, you can use any image:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;node:20&lt;/code&gt; for Node.js&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rust:1.75&lt;/code&gt; for Rust&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;python:3.11&lt;/code&gt; for Python&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;golang:1.21&lt;/code&gt; for Go&lt;/li&gt;
&lt;li&gt;Or your own custom image&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dependency Resolution: Topological Sort
&lt;/h2&gt;

&lt;p&gt;One feature I'm most proud of is dependency resolution. Forge uses Kahn's algorithm for topological sorting, so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Circular dependencies are detected before execution&lt;/li&gt;
&lt;li&gt;Self-dependencies are detected and rejected&lt;/li&gt;
&lt;li&gt;Missing dependencies are detected with clear error messages&lt;/li&gt;
&lt;li&gt;Execution order is optimized&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;B&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;A&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;C&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;A&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;D&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;B&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;C&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Forge will execute: A → (B, C parallel) → D.&lt;/p&gt;
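&lt;p&gt;Kahn's algorithm, batched into parallel-ready layers, can be sketched like this (Python for brevity; Forge's implementation is in Rust):&lt;/p&gt;

```python
from collections import deque

def schedule(stages: dict[str, list[str]]) -> list[list[str]]:
    """Layered Kahn's algorithm: each returned batch contains stages whose
    dependencies are all satisfied, so every batch can run in parallel.

    Raises ValueError for unknown dependencies and for cycles (including
    self-dependencies, which are just 1-cycles).
    """
    indegree = {s: len(deps) for s, deps in stages.items()}
    dependents = {s: [] for s in stages}
    for s, deps in stages.items():
        for d in deps:
            if d not in stages:
                raise ValueError(f"unknown dependency: {d}")
            dependents[d].append(s)

    ready = deque(sorted(s for s, n in indegree.items() if n == 0))
    batches, done = [], 0
    while ready:
        batch = sorted(ready)                # stages with no pending deps
        ready.clear()
        batches.append(batch)
        done += len(batch)
        for s in batch:
            for t in dependents[s]:
                indegree[t] -= 1
                if indegree[t] == 0:
                    ready.append(t)
    if done != len(stages):                  # leftover stages => a cycle
        raise ValueError("circular dependency detected")
    return batches
```

&lt;p&gt;On the example above it produces &lt;code&gt;[["A"], ["B", "C"], ["D"]]&lt;/code&gt;: A first, B and C together, then D.&lt;/p&gt;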

&lt;h2&gt;
  
  
  Use Cases: From Simple Scripts to Monorepos
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Simple Node.js Project
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;directories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/workspace/node_modules&lt;/span&gt;

&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;install&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;install&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run build&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rust Project with Cargo
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;directories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/workspace/target&lt;/span&gt;

&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rust:1.75&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cargo test --all-features&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build release&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rust:1.75&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cargo build --release&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Language Monorepo
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend-test&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test frontend&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace/frontend&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend-test&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test backend&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace/backend&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pytest&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build-all&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;frontend-test&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;backend-test&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build frontend&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node:20&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace/frontend&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run build&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build backend&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11&lt;/span&gt;
        &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace/backend&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python setup.py build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance: Fast Thanks to Caching and Parallel Execution
&lt;/h2&gt;

&lt;p&gt;Forge is designed to be fast. The combination of smart caching and parallel execution means pipelines can complete in seconds for small projects, or minutes for large monorepos.&lt;/p&gt;

&lt;p&gt;Benchmark from my own project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without cache&lt;/strong&gt;: ~2 minutes (download dependencies, compile, test)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With cache&lt;/strong&gt;: ~30 seconds (only compile and test)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel stages&lt;/strong&gt;: ~20 seconds (multiple stages running simultaneously)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For monorepos with 10+ services, parallel execution can cut execution time from 15 minutes to 5 minutes.&lt;/p&gt;
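&lt;p&gt;The monorepo math works out because wall-clock time is the critical path through the DAG, not the sum of stage durations. A quick sketch with made-up durations (ten independent 90-second test stages feeding one 60-second build):&lt;/p&gt;

```python
def pipeline_time(durations, deps):
    """Wall-clock time equals the longest dependency chain (critical path)."""
    finish = {}
    def t(stage):
        if stage not in finish:
            start = max((t(d) for d in deps.get(stage, [])), default=0)
            finish[stage] = start + durations[stage]
        return finish[stage]
    return max(t(s) for s in durations)

# Ten independent 90-second test stages feeding one 60-second build stage:
durations = {f"svc{i}": 90 for i in range(10)}
durations["build"] = 60
deps = {"build": list(durations)[:10]}
print(pipeline_time(durations, deps))  # 150 seconds in parallel
print(sum(durations.values()))         # 960 seconds if run serially
```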

&lt;h2&gt;
  
  
  Installation: Simple and Cross-Platform
&lt;/h2&gt;

&lt;p&gt;Forge can be installed in several ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-compiled binaries&lt;/strong&gt; (recommended):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-L&lt;/span&gt; https://github.com/0xReLogic/Forge/releases/latest/download/forge-linux-amd64 &lt;span class="nt"&gt;-o&lt;/span&gt; forge
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x forge
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;forge /usr/local/bin/

&lt;span class="c"&gt;# macOS&lt;/span&gt;
curl &lt;span class="nt"&gt;-L&lt;/span&gt; https://github.com/0xReLogic/Forge/releases/latest/download/forge-macos-amd64 &lt;span class="nt"&gt;-o&lt;/span&gt; forge
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x forge
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;forge /usr/local/bin/

&lt;span class="c"&gt;# Windows (PowerShell)&lt;/span&gt;
Invoke-WebRequest &lt;span class="nt"&gt;-Uri&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/0xReLogic/Forge/releases/latest/download/forge.exe"&lt;/span&gt; &lt;span class="nt"&gt;-OutFile&lt;/span&gt; &lt;span class="s2"&gt;"forge.exe"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Via Cargo&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;forge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;From source&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/0xReLogic/Forge
&lt;span class="nb"&gt;cd &lt;/span&gt;Forge
cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only dependency is Docker. If Docker is already installed, Forge is ready to use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow: Init, Validate, Run
&lt;/h2&gt;

&lt;p&gt;Forge has three main commands:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;forge init&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Creates a new &lt;code&gt;forge.yaml&lt;/code&gt; with a template that can be used immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;forge validate&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Validates config without executing the pipeline. Useful for checking syntax before committing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;forge run&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Executes the pipeline. Supports several flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--file &amp;lt;FILE&amp;gt;&lt;/code&gt;: Use a custom configuration file (default: &lt;code&gt;forge.yaml&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--stage &amp;lt;STAGE&amp;gt;&lt;/code&gt;: Run a specific stage and its dependencies&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--cache&lt;/code&gt;: Force enable caching&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--no-cache&lt;/code&gt;: Force disable caching&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--verbose&lt;/code&gt;: Enable verbose output with performance metrics&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--dry-run&lt;/code&gt;: Validate and preview execution without running containers
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initialize new project&lt;/span&gt;
forge init

&lt;span class="c"&gt;# Validate config&lt;/span&gt;
forge validate

&lt;span class="c"&gt;# Run full pipeline&lt;/span&gt;
forge run

&lt;span class="c"&gt;# Run specific stage&lt;/span&gt;
forge run &lt;span class="nt"&gt;--stage&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt;

&lt;span class="c"&gt;# Preview execution without running&lt;/span&gt;
forge run &lt;span class="nt"&gt;--dry-run&lt;/span&gt;

&lt;span class="c"&gt;# Run without cache&lt;/span&gt;
forge run &lt;span class="nt"&gt;--no-cache&lt;/span&gt;

&lt;span class="c"&gt;# Run with verbose output&lt;/span&gt;
forge run &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Error Handling: Fail-Fast with Clear Messages
&lt;/h2&gt;

&lt;p&gt;Forge has comprehensive error handling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validation errors&lt;/strong&gt;: Detected before execution, with clear messages about what's wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency errors&lt;/strong&gt;: Circular dependencies, self-dependencies, missing dependencies are all detected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution errors&lt;/strong&gt;: If a step fails, Forge immediately stops and displays an error with clear context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every error message is designed to help developers fix the issue quickly, not frustrate them with cryptic output.&lt;/p&gt;
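&lt;p&gt;All three dependency checks (missing, self, circular) can run before any container starts. A conceptual sketch of that validation pass (not Forge's own code):&lt;/p&gt;

```python
def validate_deps(stages):
    """Return a list of dependency errors for a stage graph before running it."""
    errors = []
    for name, deps in stages.items():
        for dep in deps:
            if dep not in stages:
                errors.append(f"stage '{name}' depends on unknown stage '{dep}'")
            if dep == name:
                errors.append(f"stage '{name}' depends on itself")
    # Cycle check: repeatedly remove stages whose dependencies are resolved.
    remaining = {n: {d for d in deps if d in stages and d != n}
                 for n, deps in stages.items()}
    while remaining:
        resolvable = [n for n, d in remaining.items()
                      if not d.intersection(remaining)]
        if not resolvable:
            errors.append(f"circular dependency among: {sorted(remaining)}")
            break
        for n in resolvable:
            del remaining[n]
    return errors

print(validate_deps({"a": ["b"], "b": ["a"]}))
# ["circular dependency among: ['a', 'b']"]
```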

&lt;h2&gt;
  
  
  Security: Secrets and Container Isolation
&lt;/h2&gt;

&lt;p&gt;Security is a priority. Forge uses several strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Container isolation&lt;/strong&gt;: Each step runs in a separate container, so there's no cross-contamination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets management&lt;/strong&gt;: Secrets are not logged, only injected into containers that need them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No network access&lt;/strong&gt;: Containers get no network access unless it's explicitly enabled in the configuration&lt;/li&gt;
&lt;/ul&gt;
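&lt;p&gt;"Secrets are not logged" typically means masking known secret values before output reaches the log stream. A minimal sketch of the idea (illustrative only, not Forge's actual mechanism):&lt;/p&gt;

```python
def redact(line, secrets):
    """Replace any known secret value appearing in a log line with a mask."""
    for value in secrets.values():
        if value:
            line = line.replace(value, "****")
    return line

secrets = {"NPM_TOKEN": "abc123", "DB_PASS": "s3cret"}
print(redact("npm error: auth failed for token abc123", secrets))
# npm error: auth failed for token ****
```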

&lt;h2&gt;
  
  
  Roadmap: The Future of Forge
&lt;/h2&gt;

&lt;p&gt;Forge is still in early development, but stable enough for production use. Some features being considered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remote execution&lt;/strong&gt;: Execute pipelines on remote servers or in the cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web UI&lt;/strong&gt;: Dashboard to monitor pipeline execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin system&lt;/strong&gt;: Support for custom plugins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrations&lt;/strong&gt;: Hooks into GitHub Actions, GitLab CI, and other hosted CI/CD systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for now, Forge is already powerful enough to handle most local CI/CD use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Local CI/CD That Should Have Existed Long Ago
&lt;/h2&gt;

&lt;p&gt;Forge is a tool that should have existed long ago. Local CI/CD should be standard practice, not an exception. With Forge, developers can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test pipelines before pushing&lt;/li&gt;
&lt;li&gt;Debug issues locally without waiting for cloud CI/CD&lt;/li&gt;
&lt;li&gt;Save time and reduce frustration&lt;/li&gt;
&lt;li&gt;Maintain consistency between local and remote environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built with Rust for performance, Docker for isolation, and familiar YAML syntax for a good developer experience. Lightweight, fast, and reliable.&lt;/p&gt;

&lt;p&gt;If you've ever been frustrated by having to push-push-push just to test CI/CD config, or if you have a monorepo that needs a complex test suite, Forge might be worth a try.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository&lt;/strong&gt;: &lt;a href="https://github.com/0xReLogic/Forge" rel="noopener noreferrer"&gt;github.com/0xReLogic/Forge&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Forge is an open-source project still in active development. Contributions, feedback, and bug reports are very welcome!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/0xReLogic/Forge" rel="noopener noreferrer"&gt;Forge GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/" rel="noopener noreferrer"&gt;Docker Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.gitlab.com/ee/ci/" rel="noopener noreferrer"&gt;GitLab CI/CD Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev/%0A![%20](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/llrqdyih9b65my4smwf4.webp)/blog/building-distributed-cron-cloudflare-workers" rel="noopener noreferrer"&gt;Building a Distributed Cron System That Scales to 1000+ Users&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev//blog/deepseek-ocr-token-compression" rel="noopener noreferrer"&gt;DeepSeek-OCR: When a Picture Is Worth 10× Fewer Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rust</category>
      <category>programming</category>
      <category>cicd</category>
      <category>devops</category>
    </item>
    <item>
      <title>MCP x Cursor PoC: Rogue MCP Servers, IDE Browsers, and Real Defenses</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Thu, 13 Nov 2025 18:06:50 +0000</pubDate>
      <link>https://dev.to/0xrelogic/mcp-x-cursor-poc-rogue-mcp-servers-ide-browsers-and-real-defenses-1dda</link>
      <guid>https://dev.to/0xrelogic/mcp-x-cursor-poc-rogue-mcp-servers-ide-browsers-and-real-defenses-1dda</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://allenarch.dev/blog/mcp-cursor-poc-2025/" rel="noopener noreferrer"&gt;allenarch.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A proof‑of‑concept published this week shows how a &lt;strong&gt;malicious Model Context Protocol (MCP) server&lt;/strong&gt; can inject JavaScript into the &lt;strong&gt;built‑in browser of an AI IDE&lt;/strong&gt; (e.g., Cursor, Windsurf, VS Code, Claude Code) and then leverage the editor's privileges for system actions. This post skips the rehash and focuses on what practitioners need: a clear threat model, a safe lab plan to reproduce (without destructive payloads), generic detections you can wire into your telemetry today, and a &lt;strong&gt;security baseline for MCP&lt;/strong&gt; deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key context:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PoC impact (news + demo):&lt;/strong&gt; Rogue MCP servers can replace login pages inside Cursor's in‑IDE browser and harvest credentials; the same capability can lead to workstation compromise (CSO Online, Nov 13, 2025; Knostic deep‑dive, Nov 5, 2025).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard &amp;amp; ecosystem:&lt;/strong&gt; MCP is an open protocol that connects LLM apps to tools/data; clients/servers exist across vendors (Anthropic/GitHub/OSS). Misconfigurations and weak session handling have already produced CVEs and field‑grade abuse paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Electron reality:&lt;/strong&gt; In‑IDE browsers are Electron webviews. Security settings (contextIsolation, sandbox, no Node for remote, strict CSP) often decide whether a UI compromise stays in the renderer or escalates to the OS.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Threat model: where this exploit actually lives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Boundaries&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;LLM/Agent ↔ MCP client ↔ MCP server ↔ IDE (Electron) ↔ in‑IDE browser ↔ OS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Assets&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Source code, credentials/tokens, SSH material, organization secrets, local files, IDE privileges.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Adversary paths&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Malicious or hijacked MCP server returns &lt;strong&gt;active content&lt;/strong&gt; (HTML/JS) to the in‑IDE browser; content manipulates UI flows (credential capture) or tries to escape the renderer via weak Electron settings (e.g., enabled Node, missing isolation, permissive IPC/preload).&lt;/li&gt;
&lt;li&gt;Related ecosystem issues raise risk: &lt;strong&gt;predictable/reused MCP session IDs&lt;/strong&gt; (CVE‑2025‑6515) enable session takeover; &lt;strong&gt;client‑side command injection&lt;/strong&gt; (CVE‑2025‑6514) abuses crafted authorization URLs.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Assumptions&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Developers add third‑party MCP servers for convenience; defaults may be permissive.&lt;/li&gt;
&lt;li&gt;Teams lack network egress controls for IDE processes; logs from Electron/IPC are sparse by default.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reproducing it safely
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Goal: Reproduce the class of issue and validate mitigations without executing real malware.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Environment&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Disposable VM (no prod creds), snapshot enabled.&lt;/li&gt;
&lt;li&gt;IDE under test: Cursor or Windsurf (latest), default + hardened variants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mock MCP server&lt;/strong&gt; under your control; two profiles: benign and "attacker."&lt;/li&gt;
&lt;li&gt;Instrumentation: process tree, file I/O events, network egress (domain/port), IPC/Electron warnings.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Procedure&lt;/strong&gt;
1) Wire IDE to the benign MCP server; confirm normal flows.
2) Switch to attacker server that returns a crafted HTML page (login impostor + harmless JS markers).
3) Observe: in‑IDE browser navigation, form capture attempts, outbound requests, attempts to use special protocols or preload bridges.
4) Flip hardening toggles (see below) and repeat. Ensure behavior changes (blocked actions, console warnings, reduced surface).&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Evidence to collect&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Renderer → unexpected &lt;strong&gt;child processes&lt;/strong&gt; or helpers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outbound connections&lt;/strong&gt; from IDE to unallowed MCP hostnames or raw IPs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File access&lt;/strong&gt; attempts from IDE renderer to sensitive paths (~/.ssh, project .env, keychains).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IPC usage&lt;/strong&gt; or errors indicating blocked Node/remote bridging.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to watch in telemetry
&lt;/h2&gt;

&lt;p&gt;These are vendor‑neutral hints you can adapt to OSQuery, Sigma, or your SIEM.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process lineage&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;An IDE parent process (Cursor, Windsurf, Electron) spawning shells or platform installers.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;File events (denylist paths)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Actor = IDE/Electron process AND path in ~/.ssh/*, ~/.config/*/tokens*, project *.env&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Network egress&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;New outbound to MCP hosts outside your &lt;strong&gt;allowlist&lt;/strong&gt; following an MCP session start.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;DNS&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Sudden spikes to newly observed MCP domains; mismatches between configured MCP endpoints and resolved targets.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Renderer security signals&lt;/strong&gt; (if you can tap logs)

&lt;ul&gt;
&lt;li&gt;Warnings about disabled contextIsolation/sandbox, attempted use of remote, window open/navigation to external origins.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pseudocode (conceptual):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes: parent in (Cursor, Windsurf, Electron) AND child in (bash, zsh, cmd.exe, powershell, wscript, osascript)&lt;/li&gt;
&lt;li&gt;Files: actor in (Cursor, Windsurf, Electron) AND path matches (ssh keys, tokens, .env)&lt;/li&gt;
&lt;li&gt;Net: dest_domain NOT IN mcp_allowlist AND proc in (Cursor, Windsurf, Electron)&lt;/li&gt;
&lt;/ul&gt;
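&lt;p&gt;One concrete shape for that pseudocode: the same three rules applied to a stream of normalized events. The field names (&lt;code&gt;proc&lt;/code&gt;, &lt;code&gt;actor&lt;/code&gt;, &lt;code&gt;dest_domain&lt;/code&gt;) and the allowlist host are illustrative, not any specific SIEM schema:&lt;/p&gt;

```python
import fnmatch

IDE_PROCS = {"Cursor", "Windsurf", "Electron"}
SHELLS = {"bash", "zsh", "cmd.exe", "powershell", "wscript", "osascript"}
SENSITIVE = ["*/.ssh/*", "*/tokens*", "*.env"]
MCP_ALLOWLIST = {"mcp.internal.example.com"}  # hypothetical approved host

def alerts(event):
    """Map one normalized telemetry event to zero or more alert strings."""
    found = []
    if (event.get("type") == "process" and event.get("parent") in IDE_PROCS
            and event.get("child") in SHELLS):
        found.append("IDE spawned a shell")
    if (event.get("type") == "file" and event.get("actor") in IDE_PROCS
            and any(fnmatch.fnmatch(event.get("path", ""), g) for g in SENSITIVE)):
        found.append("IDE touched a sensitive path")
    if (event.get("type") == "net" and event.get("proc") in IDE_PROCS
            and event.get("dest_domain") not in MCP_ALLOWLIST):
        found.append("IDE egress outside MCP allowlist")
    return found

print(alerts({"type": "process", "parent": "Cursor", "child": "zsh"}))
# ['IDE spawned a shell']
```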

&lt;h2&gt;
  
  
  Ship‑ready hardening
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Electron/IDE&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;contextIsolation: true across renderers.&lt;/li&gt;
&lt;li&gt;sandbox: true; avoid disabling sandbox via Node integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not enable Node.js for any remote/untrusted content&lt;/strong&gt;; keep remote APIs off.&lt;/li&gt;
&lt;li&gt;Define a &lt;strong&gt;strict CSP&lt;/strong&gt; (no unsafe-eval, tight script-src), limit navigation/new windows.&lt;/li&gt;
&lt;li&gt;Validate IPC senders; limit or eliminate preload bridges to the minimum.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;MCP security baseline&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Allowlist&lt;/strong&gt; MCP servers; prefer local/stdio for sensitive tooling; if HTTP(S) needed, require TLS and verify pins/org.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session IDs&lt;/strong&gt;: cryptographically random, globally unique; never reuse or expose pointers/addresses as IDs.&lt;/li&gt;
&lt;li&gt;Audit/log: server instructions/tool descriptors, tool invocations, and unexpected schema fields.&lt;/li&gt;
&lt;li&gt;Policy‑as‑code (concept): gate tool actions by origin and &lt;strong&gt;intent&lt;/strong&gt; (e.g., block filesystem writes unless explicitly approved by a rule/user step).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Network&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Egress control for IDE/Electron to the &lt;strong&gt;MCP allowlist only&lt;/strong&gt;; proxy inspection for HTML/JS "UI pages" returned by MCP.&lt;/li&gt;
&lt;li&gt;Alert on raw IP connections, unusual ports, or domain mismatches.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
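&lt;p&gt;For the session-ID item in the baseline above, "cryptographically random" in practice means drawing from the OS CSPRNG, never a counter, timestamp, or pointer value. In Python, for example:&lt;/p&gt;

```python
import secrets

def new_session_id():
    # 32 bytes from the OS CSPRNG: 256 bits of entropy, URL-safe encoding.
    return secrets.token_urlsafe(32)

session = new_session_id()
print(len(session), session != new_session_id())  # 43 True
```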

&lt;h2&gt;
  
  
  Pre‑flight checks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Renderer isolation on (contextIsolation) and sandbox enforced.&lt;/li&gt;
&lt;li&gt;[ ] No Node in remote content; preload minimized and reviewed.&lt;/li&gt;
&lt;li&gt;[ ] MCP server allowlist + TLS (pinning optional, encouraged).&lt;/li&gt;
&lt;li&gt;[ ] Session IDs: strong RNG; no reuse; no pointer‑derived IDs.&lt;/li&gt;
&lt;li&gt;[ ] Egress to MCP restricted; detections firing on anomalous proc/file/net.&lt;/li&gt;
&lt;li&gt;[ ] Attack page can't exfiltrate secrets or spawn OS helpers under hardening.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Open threads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Public defaults for Cursor/Windsurf Electron settings (isolation, sandbox, remote) in the in‑IDE browser path.&lt;/li&gt;
&lt;li&gt;Standardized &lt;strong&gt;MCP security profile&lt;/strong&gt; (policy schema + conformance tests) akin to "baseline" CIS‑style benchmarks.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.csoonline.com/article/4089046/rogue-mcp-servers-can-take-over-cursors-built-in-browser.html" rel="noopener noreferrer"&gt;CSO Online - Rogue MCP servers can take over Cursor's built‑in browser (Nov 13, 2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.knostic.ai/blog/demonstrating-code-injection-vscode-cursor" rel="noopener noreferrer"&gt;Knostic - Deep Dive: Cursor Code Injection Runtime Attacks (Nov 5, 2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol" rel="noopener noreferrer"&gt;MCP specification &amp;amp; docs (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP docs site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Anthropic - Introducing the Model Context Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jfrog.com/blog/mcp-prompt-hijacking-vulnerability/" rel="noopener noreferrer"&gt;JFrog - CVE‑2025‑6515 Prompt Hijacking in oatpp‑mcp (predictable/reused session IDs)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/oatpp/oatpp-mcp" rel="noopener noreferrer"&gt;oatpp‑mcp (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jfrog.com/blog/2025-6514-critical-mcp-remote-rce-vulnerability/" rel="noopener noreferrer"&gt;JFrog - CVE‑2025‑6514 mcp‑remote command injection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.backslash.security/blog/hundreds-of-mcp-servers-vulnerable-to-abuse" rel="noopener noreferrer"&gt;Backslash - Hundreds of MCP servers vulnerable to abuse (NeighborJack, RCE, poisoning)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://electronjs.org/docs/latest/tutorial/security" rel="noopener noreferrer"&gt;Electron - Security guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/electron/electron/issues/23506" rel="noopener noreferrer"&gt;Electron - contextIsolation/nodeIntegration discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="//allenarch.dev/blog/china-ai-dual-flywheel-2026"&gt;China's AI Dual Flywheel: Why Mainland Hardware Wins First, Hong Kong Internet Later&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//allenarch.dev/blog/meta-75b-ai-infrastructure-bet"&gt;Meta's $75B AI Infrastructure Bet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//allenarch.dev/blog/deepseek-ocr-token-compression"&gt;DeepSeek-OCR: When a Picture Is Worth 10× Fewer Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Majestic Labs vs. the Memory Wall</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Tue, 11 Nov 2025 15:47:01 +0000</pubDate>
      <link>https://dev.to/0xrelogic/majestic-labs-vs-the-memory-wall-35o6</link>
      <guid>https://dev.to/0xrelogic/majestic-labs-vs-the-memory-wall-35o6</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://allenarch.dev/blog/majestic-labs-memory-wall-2025/" rel="noopener noreferrer"&gt;allenarch.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On November 10, 2025, three former Google and Meta silicon executives announced they've raised &lt;strong&gt;$100 million&lt;/strong&gt; to build what they're calling a fundamentally different kind of AI server. Not faster chips. Not more GPUs. More memory, &lt;strong&gt;orders of magnitude more&lt;/strong&gt;, packed into a single box that could replace entire racks of today's hardware (&lt;a href="https://www.cnbc.com/2025/11/10/majestic-labs-meta-google-ai-funding.html" rel="noopener noreferrer"&gt;CNBC, Nov 10&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Majestic Labs' pitch is simple: the bottleneck in AI inference isn't compute anymore. It's memory. Specifically, the fixed compute-to-memory ratio that every GPU ships with and the KV cache bloat that comes free with every long-context request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key context:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Majestic Labs&lt;/strong&gt;: $100M raised (Series A led by Bow Wave Capital, Sept 2025); founders Ofer Shacham (CEO, ex-Google/Meta silicon lead), Sha Rabii (President, ex-Google Argos video chip lead), Masumi Reynders (COO, ex-Google TPU biz dev). Claims patent-pending architecture delivers &lt;strong&gt;1,000× typical server memory&lt;/strong&gt;. Prototypes target 2027 (&lt;a href="https://www.cnbc.com/2025/11/10/majestic-labs-meta-google-ai-funding.html" rel="noopener noreferrer"&gt;CNBC, Nov 10&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global AI capex surge&lt;/strong&gt;: Alphabet $91–93B (2025), Meta ≥$70B, Microsoft $34.9B in Q3 alone (+74% YoY), Amazon ~$118B (&lt;a href="https://www.trendforce.com/news/2025/10/30/news-hyperscalers-ramp-up-capex-amid-ai-boom-risks-lurk-microsoft-meta-alphabet-in-spotlight/" rel="noopener noreferrer"&gt;TrendForce, Oct 30&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vLLM PagedAttention&lt;/strong&gt;: 2–4× throughput vs state-of-the-art at same latency; achieves near-zero KV cache waste (&lt;a href="https://arxiv.org/pdf/2309.06180.pdf" rel="noopener noreferrer"&gt;arXiv, Sept 2023&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CXL memory pooling&lt;/strong&gt;: 100 TiB commercial pools available in 2025; XConn/MemVerge demo showed &amp;gt;5× performance boost for AI inference vs SSD (&lt;a href="https://ai-techpark.com/xconn-memverge-demo-cxl-memory-pool-for-ai-workloads/" rel="noopener noreferrer"&gt;AI-Tech Park, Oct 2025&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The memory wall isn't new, but the scale is
&lt;/h2&gt;

&lt;p&gt;You feel it first as a ceiling, not a wall. Batch a few more requests and tokens-per-second looks great, until you stretch the context or let tenant count creep up. Suddenly the GPU says no. Not because FLOPs tapped out. Because memory did.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Nvidia makes excellent GPUs and has driven incredible AI innovation. We're not trying to replace GPUs across the board we're solving for memory-intensive AI workloads where the fixed compute-to-memory ratio becomes a constraint."&lt;/p&gt;

&lt;p&gt;Ofer Shacham, Majestic Labs CEO (&lt;a href="https://www.cnbc.com/2025/11/10/majestic-labs-meta-google-ai-funding.html" rel="noopener noreferrer"&gt;CNBC, Nov 10&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translation: inference is a KV-cache business. Every token you generate requires storing attention keys and values for every previous token in the sequence. Increase context length and KV cache memory grows linearly with it (while attention compute grows quadratically). Serve multi-tenant RAG and your index footprints follow you into VRAM. Disaggregate prefill and decode and now you're passing state across workers, which means duplicating it or bottlenecking on fabric.&lt;/p&gt;
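&lt;p&gt;The KV-cache math is easy to sketch. A back-of-envelope estimator (illustrative only; the layer and head counts below are TinyLlama-1.1B-like assumptions, not figures from the article):&lt;/p&gt;

```python
def kv_cache_bytes(batch, seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Keys + values for every layer, KV head, and token in every sequence."""
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * dtype_bytes

# Assumed TinyLlama-1.1B-like config: 22 layers, 4 KV heads of dim 64
# (grouped-query attention), fp16 cache (2 bytes per element).
per_seq = kv_cache_bytes(batch=1, seq_len=2048, n_layers=22, n_kv_heads=4, head_dim=64)
print(f"{per_seq / 2**20:.1f} MiB per 2048-token sequence")  # 44.0 MiB
```

&lt;p&gt;The footprint scales linearly with both batch and context, which is why serving stacks fight over every byte of it.&lt;/p&gt;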

&lt;p&gt;The cheapest way to buy back throughput is often not "more compute." It's more room.&lt;/p&gt;

&lt;p&gt;Software has done heroic work to bend this curve. vLLM's PagedAttention achieves near-zero KV cache waste by borrowing virtual memory tricks from operating systems, delivering &lt;strong&gt;2–4× higher throughput&lt;/strong&gt; than prior systems at the same latency (&lt;a href="https://arxiv.org/pdf/2309.06180.pdf" rel="noopener noreferrer"&gt;arXiv, Sept 2023&lt;/a&gt;). NVIDIA's open-source Grove (part of Dynamo) popularized disaggregated prefill/decode workers so you can scale the hot path without over-provisioning the cold one (&lt;a href="https://developer.nvidia.com/blog/streamline-complex-ai-inference-on-kubernetes-with-nvidia-grove/" rel="noopener noreferrer"&gt;NVIDIA Developer Blog, Nov 2025&lt;/a&gt;). And CXL memory pooling moved from "interesting research" to &lt;strong&gt;100 TiB commercial deployments in 2025&lt;/strong&gt;, with demos showing &lt;strong&gt;&amp;gt;5× performance boost&lt;/strong&gt; for AI workloads vs SSD-backed memory (&lt;a href="https://ai-techpark.com/xconn-memverge-demo-cxl-memory-pool-for-ai-workloads/" rel="noopener noreferrer"&gt;AI-Tech Park, Oct 2025&lt;/a&gt;).&lt;/p&gt;
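&lt;p&gt;The PagedAttention idea can be caricatured in a few lines: instead of reserving one contiguous max-length KV buffer per request, fixed-size blocks come out of a shared pool on demand (a toy sketch of the concept, not vLLM's actual implementation):&lt;/p&gt;

```python
BLOCK_TOKENS = 16  # tokens per KV block (vLLM's default block size)

class PagedKVAllocator:
    """Toy block-table allocator: sequences grab blocks only as they grow."""
    def __init__(self, total_blocks):
        self.free = list(range(total_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        # A new block is needed only when the position crosses a block boundary.
        table = self.block_tables.setdefault(seq_id, [])
        if pos % BLOCK_TOKENS == 0:
            table.append(self.free.pop())

    def release(self, seq_id):
        # Finished sequences hand their blocks straight back to the pool.
        self.free.extend(self.block_tables.pop(seq_id, []))

alloc = PagedKVAllocator(total_blocks=64)
for pos in range(40):  # a 40-token sequence
    alloc.append_token("req-0", pos)
print(len(alloc.block_tables["req-0"]))  # 3 blocks; waste bounded by one block
```

&lt;p&gt;Per-sequence waste drops from "max length minus actual length" to at most one partial block, which is where the near-zero-waste claim comes from.&lt;/p&gt;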

&lt;p&gt;Still, the physics are stubborn. HBM ships in fixed ratios. Datacenter memory is expensive and fragmented. The only way to get "more room" today is to scale horizontally: add more nodes, duplicate state, pay the network tax.&lt;/p&gt;

&lt;p&gt;Majestic is betting that flipping the ratio at the box level changes the game. If each server carries &lt;strong&gt;1,000× typical memory&lt;/strong&gt; (their claim), you consolidate footprint, reduce duplication, and push batch/context limits higher without paying OOM tax.&lt;/p&gt;

&lt;p&gt;Prototypes won't land until 2027. Bandwidth, latency, fabric integration, and TCO will determine whether this is a real shift or just a bigger box. But the thesis is grounded: memory-bound workloads are real, growing, and under-served by today's hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a T4 tells us about the slope
&lt;/h2&gt;

&lt;p&gt;We ran a small vLLM benchmark on Google Colab (Tesla T4 16GB) to make the memory-throughput tradeoff concrete. Not production scale, just the shape of the curve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardware: Tesla T4 (16GB VRAM, Compute Capability 7.5)&lt;/li&gt;
&lt;li&gt;Model: &lt;code&gt;TinyLlama/TinyLlama-1.1B-Chat-v1.0&lt;/code&gt; (max_model_len=2048, derived from model config)&lt;/li&gt;
&lt;li&gt;Backend: vLLM with &lt;code&gt;TORCH_SDPA&lt;/code&gt; attention (fp16 fallback), &lt;code&gt;gpu_memory_utilization=0.70&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Test grid: context lengths {512, 1024, 2048} tokens × batch sizes {1, 4}&lt;/li&gt;
&lt;li&gt;Generation: 32 tokens per request, 3 iterations per config&lt;/li&gt;
&lt;/ul&gt;
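&lt;p&gt;The harness itself is small: time &lt;code&gt;llm.generate&lt;/code&gt; over the grid and divide generated tokens by wall-clock time. A condensed sketch (the prompt construction here is a crude stand-in for exact token counts, and running it needs a CUDA GPU, so the vLLM calls are kept inside a function):&lt;/p&gt;

```python
import time

def decode_tps(total_new_tokens, elapsed_s):
    """Decode throughput: new tokens across the whole batch per second."""
    return total_new_tokens / elapsed_s

def run_benchmark():
    # Mirrors the setup above; requires a CUDA GPU with vLLM installed.
    from vllm import LLM, SamplingParams
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
              max_model_len=2048, gpu_memory_utilization=0.70)
    params = SamplingParams(max_tokens=32, ignore_eos=True)
    for ctx in (512, 1024, 2048):
        for batch in (1, 4):
            prompts = ["word " * ctx] * batch  # roughly ctx tokens per prompt
            start = time.perf_counter()
            llm.generate(prompts, params)
            elapsed = time.perf_counter() - start
            print(ctx, batch, f"{decode_tps(32 * batch, elapsed):.2f} tok/s")

# Sanity check on the math: batch 4 x 32 new tokens in ~1.30 s
print(f"{decode_tps(4 * 32, 1.30):.2f} tok/s")  # 98.46 tok/s
```

&lt;p&gt;The median-of-3 iterations and the GPU memory column sit on top of this loop (memory sampled externally, e.g. via &lt;code&gt;nvidia-smi&lt;/code&gt;).&lt;/p&gt;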

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Batch&lt;/th&gt;
&lt;th&gt;Decode TPS (median)&lt;/th&gt;
&lt;th&gt;E2E Latency (median)&lt;/th&gt;
&lt;th&gt;GPU Memory Used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4.57&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;~10,990 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;98.43&lt;/td&gt;
&lt;td&gt;1.30s&lt;/td&gt;
&lt;td&gt;~11,006 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;26.85&lt;/td&gt;
&lt;td&gt;1.19s&lt;/td&gt;
&lt;td&gt;~10,988 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;96.81&lt;/td&gt;
&lt;td&gt;1.32s&lt;/td&gt;
&lt;td&gt;~11,010 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2048&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;21.59&lt;/td&gt;
&lt;td&gt;1.48s&lt;/td&gt;
&lt;td&gt;~11,390 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2048&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;80.27&lt;/td&gt;
&lt;td&gt;1.59s&lt;/td&gt;
&lt;td&gt;~11,396 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key observations:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Batch scales throughput hard.&lt;/strong&gt; Single-request runs deliver 4.57–26.85 tok/s. Batch 4 jumps to 80–98 tok/s. That's a 3.6–21× multiplier depending on context length.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long context taxes throughput and memory.&lt;/strong&gt; At batch 4, going from 512 → 2048 tokens drops TPS from 98.43 → 80.27 (-18%), while GPU memory climbs ~390 MiB. The KV cache is visible in the numbers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency stays reasonable but creeps up.&lt;/strong&gt; Median end-to-end for 32 tokens ranges 1.19–1.59s. P99 was 1.36–1.61s (not shown in table). This is a small model on modest hardware, so the absolute numbers are forgiving, but the slope is there.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
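&lt;p&gt;The quoted multipliers fall straight out of the table:&lt;/p&gt;

```python
# Decode TPS from the results table: (context, batch) -> tokens/s
tps = {(512, 1): 4.57, (1024, 1): 26.85, (2048, 1): 21.59,
       (512, 4): 98.43, (1024, 4): 96.81, (2048, 4): 80.27}

# Batch-4 vs batch-1 multiplier at each context length
mult = {ctx: tps[(ctx, 4)] / tps[(ctx, 1)] for ctx in (512, 1024, 2048)}
print({c: round(m, 1) for c, m in mult.items()})  # {512: 21.5, 1024: 3.6, 2048: 3.7}

# Context tax at batch 4: 512 -> 2048 tokens
drop = (tps[(2048, 4)] - tps[(512, 4)]) / tps[(512, 4)]
print(f"{drop:.0%}")  # -18%
```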

&lt;p&gt;This is exactly where Majestic's thesis lands. If you had 10× or 100× the memory per box, you could push batch and context higher without the OOM cliff. Long-context, multi-tenant inference, the stuff that's memory-bound today, gets headroom to breathe. The TPS-per-server number climbs, and you consolidate footprint instead of scaling horizontally and paying network tax.&lt;/p&gt;

&lt;p&gt;It's a small test on a small model. But the curve is the curve. Memory limits batch. Batch limits throughput. More memory buys you more throughput per box for the workloads that matter.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2025/11/10/majestic-labs-meta-google-ai-funding.html" rel="noopener noreferrer"&gt;Majestic Labs raises $100M for high‑memory AI servers; prototypes targeted for 2027 (CNBC, Nov 10)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2309.06180.pdf" rel="noopener noreferrer"&gt;vLLM PagedAttention paper (arXiv, Sept 2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.vllm.ai/2023/06/20/vllm.html" rel="noopener noreferrer"&gt;vLLM blog: PagedAttention explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.nvidia.com/blog/streamline-complex-ai-inference-on-kubernetes-with-nvidia-grove/" rel="noopener noreferrer"&gt;NVIDIA Grove: Streamline complex AI inference on Kubernetes (Nov 2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai-techpark.com/xconn-memverge-demo-cxl-memory-pool-for-ai-workloads/" rel="noopener noreferrer"&gt;CXL memory pooling demo: XConn/MemVerge show &amp;gt;5× boost for AI (AI-Tech Park, Oct 2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://computeexpresslink.org/blog/expanding-your-memory-footprint-with-cxl-at-fms-2025" rel="noopener noreferrer"&gt;CXL Consortium: Expanding your memory footprint with CXL at FMS 2025&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev/blog/china-ai-dual-flywheel-2026" rel="noopener noreferrer"&gt;China's AI Dual Flywheel: Why Mainland Hardware Wins First, Hong Kong Internet Later&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev/blog/meta-75b-ai-infrastructure-bet" rel="noopener noreferrer"&gt;Meta's $75B AI Infrastructure Bet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev/blog/deepseek-ocr-token-compression" rel="noopener noreferrer"&gt;DeepSeek-OCR: When a Picture Is Worth 10× Fewer Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>gpu</category>
    </item>
    <item>
      <title>China's AI Dual Flywheel: Why Mainland Hardware Wins First, Hong Kong Internet Later</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Sun, 09 Nov 2025 18:28:50 +0000</pubDate>
      <link>https://dev.to/0xrelogic/chinas-ai-dual-flywheel-why-mainland-hardware-wins-first-hong-kong-internet-later-3gi1</link>
      <guid>https://dev.to/0xrelogic/chinas-ai-dual-flywheel-why-mainland-hardware-wins-first-hong-kong-internet-later-3gi1</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://allenarch.dev/blog/china-ai-dual-flywheel-2026/" rel="noopener noreferrer"&gt;allenarch.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On November 9, 2025, UBS dropped Q3 earnings data for China A-shares: &lt;strong&gt;+12% YoY&lt;/strong&gt; growth overall, but the real story is in the splits.&lt;/p&gt;

&lt;p&gt;AI-related sectors led the charge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Media: +57%&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Electronics: +41%&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Computers: +34%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meanwhile, Hong Kong tech stocks, the ones everyone watches for AI cloud exposure, delivered mixed signals. Tencent reports this Thursday. Alibaba hasn't announced its date yet.&lt;/p&gt;

&lt;p&gt;The mainland hardware boom is real. The internet AI monetization story? Still loading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key data points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;UBS Securities&lt;/strong&gt;: A-shares Q3 earnings +12% YoY, driven by AI-related tech sectors (&lt;a href="https://www.cnbc.com/2025/11/09/chinas-earnings-season-is-underway-heres-whos-benefiting-from-ai.html" rel="noopener noreferrer"&gt;CNBC, Nov 9&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HSBC&lt;/strong&gt;: Mainland winners = &lt;strong&gt;AI infrastructure hardware&lt;/strong&gt;; Hong Kong winners = &lt;strong&gt;internet with AI cloud/models&lt;/strong&gt; (&lt;a href="https://www.cnbc.com/2025/11/09/chinas-earnings-season-is-underway-heres-whos-benefiting-from-ai.html" rel="noopener noreferrer"&gt;CNBC, Nov 9&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alibaba&lt;/strong&gt;: AI spending in e-commerce &lt;strong&gt;already break-even&lt;/strong&gt;, +12% ROAS in testing (&lt;a href="https://www.cnbc.com/2025/10/16/alibaba-says-ai-spending-for-e-commerce-taobao-tmall-is-breaking-even.html" rel="noopener noreferrer"&gt;CNBC, Oct 16&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baidu&lt;/strong&gt;: "Domestically developed chips and homegrown software" shield AI push from US export controls (&lt;a href="https://www.reuters.com/business/media-telecom/chinas-baidu-reports-first-quarter-revenue-tops-estimates-signalling-advertising-2025-05-21/" rel="noopener noreferrer"&gt;Reuters, May 21&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global capex context&lt;/strong&gt;: Alphabet $91–93B (2025), Meta ≥$70B (2025, "notably larger" in 2026), Microsoft $34.9B in Q3 alone (+74% YoY) (&lt;a href="https://www.trendforce.com/news/2025/10/30/news-hyperscalers-ramp-up-capex-amid-ai-boom-risks-lurk-microsoft-meta-alphabet-in-spotlight/" rel="noopener noreferrer"&gt;TrendForce, Oct 30&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Two cycles, different clocks
&lt;/h2&gt;

&lt;p&gt;China's AI story isn't monolithic. It's two cycles running on different timelines, and understanding the gap between them is the whole game.&lt;/p&gt;

&lt;p&gt;Mainland hardware suppliers are cashing in now. The capex surge from global hyperscalers, plus China's domestic substitution push, is hitting their order books immediately. Server components, optical transceivers, power modules, liquid cooling: all translating to actual earnings.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Given a demand upswing from AI and self-reliance, the greater tech sector's rapid earnings growth drove overall ex-financials' earnings."&lt;/p&gt;

&lt;p&gt;— Lei Meng, UBS Securities China equity strategist&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The numbers back it up: electronics +41%, computers +34%, media +57% in Q3. Not narrative, profit.&lt;/p&gt;

&lt;p&gt;Hong Kong internet platforms (Alibaba, Tencent, Baidu) are in a different phase. They're building AI cloud infrastructure and training models, spending heavily (Alibaba alone committed RMB 380 billion over three years), but the revenue from AI products is still ramping.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"In mainland China, hardware manufacturers related to AI infrastructure have benefited the most from the rally. In Hong Kong, internet names with AI-related cloud services and models have benefited the most."&lt;/p&gt;

&lt;p&gt;— Herald van der Linde, HSBC head of Asia Pacific equity strategy&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translation: the datacenter gets built first, monetization comes later.&lt;/p&gt;

&lt;p&gt;The gap between these cycles matters. Hardware suppliers get paid when the datacenter ships. Internet platforms get paid when enterprises adopt AI tools and scale usage over quarters.&lt;/p&gt;

&lt;p&gt;That's a 12–18 month lag.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ROI split: e-commerce vs cloud
&lt;/h2&gt;

&lt;p&gt;Not all AI spending is created equal. Some verticals are already cash-flow positive. Others are still in the build phase.&lt;/p&gt;

&lt;p&gt;Alibaba's e-commerce AI is already profitable. On October 16, Alibaba VP Kaifu Zhang told reporters the spend on Taobao and Tmall is breaking even, with preliminary tests showing &lt;strong&gt;+12% return on ad spend&lt;/strong&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"It's very rare to see double-digit changes in such metrics."&lt;/p&gt;

&lt;p&gt;— Kaifu Zhang, Alibaba VP&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's not hype, that's measurable ROI.&lt;/p&gt;

&lt;p&gt;The playbook is straightforward: AI personalizes search, improves virtual try-ons, optimizes ad targeting. Conversion rates lift, take rates improve, and the math closes within quarters.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.cnbc.com/2025/10/16/alibaba-says-ai-spending-for-e-commerce-taobao-tmall-is-breaking-even.html" rel="noopener noreferrer"&gt;CNBC, Oct 16&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud AI is a different story. Alibaba's cloud division, along with Tencent and Baidu, is burning cash to build datacenters, deploy GPUs, and scale inference capacity. CFO Toby Xu was clear on the August call: AI and consumption are "two major historic opportunities" requiring investments of "historic scale." Near-term margin gets deprioritized.&lt;/p&gt;

&lt;p&gt;That means capex/sales stays elevated while ROI accrues slowly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Datacenter utilization has to climb (still ramping)&lt;/li&gt;
&lt;li&gt;Enterprise adoption has to scale (early innings)&lt;/li&gt;
&lt;li&gt;Net dollar retention on AI workloads has to improve (2026 story)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;E-commerce delivers ROI in quarters. Cloud delivers in years.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hardware cycle is already paying off
&lt;/h2&gt;

&lt;p&gt;Global hyperscalers are jacking up capex, and that money flows straight to physical components. China's mainland suppliers are cashing in now.&lt;/p&gt;

&lt;p&gt;TrendForce's October numbers tell the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alphabet: &lt;strong&gt;$91–93 billion&lt;/strong&gt; in 2025 (third upward revision, was $52.5B in 2024)&lt;/li&gt;
&lt;li&gt;Meta: at least &lt;strong&gt;$70 billion&lt;/strong&gt; in 2025, "notably larger" in 2026&lt;/li&gt;
&lt;li&gt;Microsoft: &lt;strong&gt;$34.9 billion&lt;/strong&gt; in Q3 alone, +74% YoY, with 2026 growing "even faster"&lt;/li&gt;
&lt;li&gt;Amazon: on track for &lt;strong&gt;~$118 billion&lt;/strong&gt; (raised from $100B forecast)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href="https://www.trendforce.com/news/2025/10/30/news-hyperscalers-ramp-up-capex-amid-ai-boom-risks-lurk-microsoft-meta-alphabet-in-spotlight/" rel="noopener noreferrer"&gt;TrendForce, Oct 30&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That capital gets spent on GPUs/ASICs (Nvidia Blackwell, Google TPU, AMD MI300X, Huawei Ascend 910C), HBM and memory, optical transceivers (100G/800G for datacenter interconnects), server chassis and power supplies, liquid cooling systems, and networking switches.&lt;/p&gt;

&lt;p&gt;China's mainland suppliers play in everything except top-tier GPUs (export controls). But the rest of the bill of materials? They're competing and winning.&lt;/p&gt;

&lt;p&gt;Take optical transceivers, the components that convert electrical signals to light for datacenter networking. Innolight (中际旭创) ranked #1 globally in 2024 with &lt;strong&gt;$3.3 billion revenue&lt;/strong&gt;, up 122% YoY. New Yi Sheng (新易盛) jumped from 7th to 3rd place globally with &lt;strong&gt;$1.2 billion revenue&lt;/strong&gt;, up 179%. Seven of the top 10 global optical module vendors are now Chinese companies.&lt;/p&gt;

&lt;p&gt;This isn't just market share, it's capturing the upgrade cycle. Cignal AI reports 800G optical transceiver shipments will grow &lt;strong&gt;60% in 2025&lt;/strong&gt;, with 1.6T entering volume production in 2H25. When hyperscalers build out AI datacenters, these are the components going into every rack.&lt;/p&gt;

&lt;p&gt;Domestic substitution accelerates this. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Domestically developed chips and increasingly efficient homegrown software will form a strong foundation for long-term innovation in China's AI ecosystem."&lt;/p&gt;

&lt;p&gt;— Shen Dou, Baidu VP&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Reuters reported Huawei's Ascend 910C is prepping for mass shipments. It's not Blackwell-class (roughly 60% of H100 performance for inference), but for Chinese AI labs, it's the only option.&lt;/p&gt;

&lt;p&gt;When Huawei ships Ascend at scale, mainland component suppliers win twice: they supply global hyperscaler builds and domestic Chinese datacenters (China's total computing power reached 246 EFLOPS in mid-2024, ranked #2 globally).&lt;/p&gt;

&lt;p&gt;The earnings data confirms it. Hardware cycle is live.&lt;/p&gt;

&lt;p&gt;The optical transceiver numbers are particularly telling. These components sit at the intersection of global hyperscaler spending and China's domestic buildout. When Innolight reports 122% revenue growth and New Yi Sheng jumps 179%, that's not inventory build; it's actual datacenter deployment translating to component orders. The 60% shipment growth forecast for 800G modules in 2025, followed by the 1.6T volume ramp in 2H25, gives us a leading indicator: as long as those shipment numbers hold, the hardware capex cycle is intact. When those numbers roll over, that's the signal that cycle 1 is peaking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The internet monetization lag
&lt;/h2&gt;

&lt;p&gt;Alibaba Cloud, Tencent Cloud, Baidu AI Cloud are in build mode. They're burning capital to scale infrastructure and wait for enterprise customers to actually adopt and pay.&lt;/p&gt;

&lt;p&gt;HSBC flagged Hong Kong's AI winners as "internet names with AI-related cloud services and models," which is accurate, but timing is everything. Cloud AI revenue scales when enterprises move from pilot to production, datacenters fill up, and net dollar retention starts expanding. None of that happens overnight.&lt;/p&gt;

&lt;p&gt;Chinese enterprises are still early in cloud AI adoption, especially compared to U.S. companies. It's a multi-quarter ramp, and the platforms are building capacity ahead of demand.&lt;/p&gt;

&lt;p&gt;Alibaba's RMB 380 billion commitment, Tencent's and Baidu's similar investments: they're all betting enterprise adoption accelerates. The risk: if it lags, datacenter utilization stays low and ROI gets pushed out. The reward: if it scales, they own the stack and capture margin.&lt;/p&gt;

&lt;p&gt;Right now, the expense is visible. Revenue is still loading.&lt;/p&gt;

&lt;p&gt;The tell will be 2026 earnings. Watch for higher net dollar retention on AI workloads (existing customers spending more), growing backlog (contracted revenue not yet recognized), and improving gross margin on cloud AI (utilization kicking in).&lt;/p&gt;

&lt;p&gt;If those metrics show up, the second flywheel is spinning. Until then, it's a capex story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The sequence to watch
&lt;/h2&gt;

&lt;p&gt;If the dual-flywheel thesis holds, here's what should show up over the next 12–18 months:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1H26:&lt;/strong&gt; Hardware margin expansion outpaces internet. Mainland suppliers (optical, power, liquid cooling) see gross margin improve from volume leverage and better product mix. Inventory turns accelerate as orders pull forward. Operating leverage kicks in.&lt;/p&gt;

&lt;p&gt;Hong Kong platforms still show elevated capex/sales ratios (building datacenter capacity), flattish or down cloud gross margin (low utilization early in ramp), and high R&amp;amp;D spending (model development, toolchains).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2H26:&lt;/strong&gt; Internet monetization inflects. Hong Kong platforms start reporting accelerating cloud AI revenue growth, net dollar retention trending up, and gross margin expanding as utilization climbs.&lt;/p&gt;

&lt;p&gt;If that happens, the second flywheel spins, and HK internet multiples re-rate.&lt;/p&gt;

&lt;p&gt;Key indicators:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What it signals&lt;/th&gt;
&lt;th&gt;Where to find it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;800G/1.6T optical transceiver mix&lt;/td&gt;
&lt;td&gt;Upgrade cycle acceleration; 1.6T volume ramp in 2H25 signals demand pulling forward&lt;/td&gt;
&lt;td&gt;Innolight, New Yi Sheng quarterly earnings and commentary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optical module ASP trends&lt;/td&gt;
&lt;td&gt;Pricing power = tight supply; ASP compression = competition/oversupply&lt;/td&gt;
&lt;td&gt;Supplier earnings calls (gross margin discussion)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liquid cooling attach rate&lt;/td&gt;
&lt;td&gt;GPU density in new builds; higher attach = more advanced AI workloads&lt;/td&gt;
&lt;td&gt;ODM/server vendor disclosures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alibaba Cloud revenue growth&lt;/td&gt;
&lt;td&gt;Enterprise AI adoption pace&lt;/td&gt;
&lt;td&gt;Quarterly earnings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tencent Cloud gross margin&lt;/td&gt;
&lt;td&gt;Utilization and cost efficiency&lt;/td&gt;
&lt;td&gt;Quarterly earnings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Baidu AI Cloud backlog&lt;/td&gt;
&lt;td&gt;Forward visibility on revenue&lt;/td&gt;
&lt;td&gt;Earnings call commentary&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When does cycle 2 spin up?
&lt;/h2&gt;

&lt;p&gt;The mainland hardware cycle is validated. Q3 earnings showed media +57%, electronics +41%, computers +34%. Real numbers, not forecasts.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We think 'growth' may remain a key investment theme. We highlight better risk/reward in the ChiNext board, due to its accelerating earnings with long-term resilience and valuation."&lt;/p&gt;

&lt;p&gt;— Lei Meng, UBS Securities China equity strategist&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ChiNext's largest members include CATL, Innolight, and Sungrow Power: the hardware suppliers riding the capex wave.&lt;/p&gt;

&lt;p&gt;The Hong Kong internet cycle is projected. It depends on enterprise AI adoption accelerating, datacenter utilization climbing, and NDR and backlog inflecting upward.&lt;/p&gt;

&lt;p&gt;Base case: 2H26. If enterprise adoption follows the typical cloud ramp (18–24 months from pilot to production scale), Hong Kong platforms start showing meaningful AI revenue contribution in late 2026.&lt;/p&gt;

&lt;p&gt;Bull case: mid-2026. If Chinese government procurement and SOE mandates accelerate adoption (plausible, given the state AI push), the monetization phase pulls forward 2–3 quarters.&lt;/p&gt;

&lt;p&gt;Bear case: 2027. If enterprises stay cautious (macro headwinds, ROI unclear), datacenter utilization stays low, and the second flywheel doesn't spin until 2027 or later.&lt;/p&gt;

&lt;p&gt;The timing matters for positioning. Mainland hardware is paying off now. Hong Kong internet is a 2026 call option.&lt;/p&gt;

&lt;h2&gt;
  
  
  The substitution wildcard
&lt;/h2&gt;

&lt;p&gt;Export controls are forcing China to build its own stack. Near-term, that's a handicap: Huawei's Ascend delivers roughly 60% of H100 performance. Long-term? It's a wildcard.&lt;/p&gt;

&lt;p&gt;Nvidia's CUDA has 15+ years of tooling, libraries, and community code. Huawei's CANN framework works, but code has to be rewritten, performance optimization starts from scratch, and debugging cycles are longer. For Chinese AI labs racing to keep up with GPT-5-class models, that's a real disadvantage.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.reuters.com/world/china/huawei-readies-new-ai-chip-mass-shipment-china-seeks-nvidia-alternatives-sources-2025-04-21/" rel="noopener noreferrer"&gt;Reuters, Apr 21&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But compute scarcity breeds innovation. When you can't throw more GPUs at a problem, you optimize model architecture (sparsity, distillation, quantization), improve software efficiency (kernel fusion, memory management), and rethink training recipes.&lt;/p&gt;

&lt;p&gt;China's AI labs, locked out of Blackwell-class hardware, might develop techniques that deliver similar output at lower compute cost. If that happens, the "handicap" becomes a competitive edge.&lt;/p&gt;

&lt;p&gt;We'll know by 2027 whether forced substitution hurt or helped.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three things to watch
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tencent earnings (Nov 14).&lt;/strong&gt; Watch for cloud revenue growth acceleration, commentary on AI workload adoption by enterprise customers, and signals on datacenter utilization rates. If Tencent guides higher on cloud AI, that's the first signal the second flywheel is spinning up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alibaba Singles Day results (mid-Nov).&lt;/strong&gt; Alibaba already said AI e-commerce is break-even and expects "very significant" positive impact on GMV during Singles Day (Nov 11). If GMV growth beats and management attributes it to AI, that validates the fast-ROI playbook.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chinese government procurement announcements.&lt;/strong&gt; Beijing's mandate for state-funded datacenters to use domestic chips is just the start. Watch for provincial-level AI infrastructure programs, SOE cloud migration mandates, and university procurement shifts. If government spending accelerates, it pulls forward the enterprise adoption curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  The framework
&lt;/h2&gt;

&lt;p&gt;Most coverage treats "China AI" as one trend. It's not. The data splits into two cycles with different ROI timelines:&lt;/p&gt;

&lt;p&gt;Mainland hardware: triggered by global capex surge and domestic substitution. Beneficiaries are server, optical, power, and cooling suppliers. Timing: earnings impact now (Q3 2025 validated it). Risk: capex slowdown or supply chain shifts.&lt;/p&gt;

&lt;p&gt;Hong Kong internet: triggered by enterprise AI adoption and datacenter utilization. Beneficiaries are Alibaba, Tencent, Baidu cloud platforms. Timing: monetization inflects 2H26 (projected). Risk: adoption lags, utilization stays low, ROI gets pushed out.&lt;/p&gt;

&lt;p&gt;Knowing which cycle you're analyzing matters more than the headline.&lt;/p&gt;

&lt;p&gt;The hardware cycle is real. The internet cycle is loading. The gap between them is the trade.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2025/11/09/chinas-earnings-season-is-underway-heres-whos-benefiting-from-ai.html" rel="noopener noreferrer"&gt;China's earnings season is underway. Here's who's benefiting from AI (CNBC, Nov 9)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2025/10/16/alibaba-says-ai-spending-for-e-commerce-taobao-tmall-is-breaking-even.html" rel="noopener noreferrer"&gt;Alibaba says its AI spending in e-commerce is already breaking even (CNBC, Oct 16)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/media-telecom/chinas-baidu-reports-first-quarter-revenue-tops-estimates-signalling-advertising-2025-05-21/" rel="noopener noreferrer"&gt;Baidu says domestic tech will shield AI push from US curbs (Reuters, May 21)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.trendforce.com/news/2025/10/30/news-hyperscalers-ramp-up-capex-amid-ai-boom-risks-lurk-microsoft-meta-alphabet-in-spotlight/" rel="noopener noreferrer"&gt;Hyperscalers Ramp Up Capex Amid AI Boom (TrendForce, Oct 30)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/world/china/huawei-readies-new-ai-chip-mass-shipment-china-seeks-nvidia-alternatives-sources-2025-04-21/" rel="noopener noreferrer"&gt;Huawei readies new AI chip for mass shipment as China seeks Nvidia alternatives (Reuters, Apr 21)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cignal.ai/2025/05/800gbe-optics-shipments-to-grow-60-in-2025/" rel="noopener noreferrer"&gt;800GbE Optics Shipments to Grow 60% in 2025 (Cignal AI, May 7)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://view.inews.qq.com/a/20250608A04GY700" rel="noopener noreferrer"&gt;Chinese optical module vendors dominate global top 10 rankings (36Kr, Jun 8)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev//blog/compute-wars-tpu-vs-blackwell-china-ban" rel="noopener noreferrer"&gt;Compute Wars: Google's TPU Push vs Nvidia Blackwell and China's No-Sell Moment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev//blog/meta-75b-ai-infrastructure-bet" rel="noopener noreferrer"&gt;Meta's $75B AI Infrastructure Bet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev//blog/deepseek-ocr-token-compression" rel="noopener noreferrer"&gt;DeepSeek-OCR: When a Picture Is Worth 10× Fewer Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Compute Wars: Google’s TPU Push vs Nvidia Blackwell and China’s No‑Sell Moment</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Sat, 08 Nov 2025 13:53:35 +0000</pubDate>
      <link>https://dev.to/0xrelogic/compute-wars-googles-tpu-push-vs-nvidia-blackwell-and-chinas-no-sell-moment-1g2p</link>
      <guid>https://dev.to/0xrelogic/compute-wars-googles-tpu-push-vs-nvidia-blackwell-and-chinas-no-sell-moment-1g2p</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://allenarch.dev/blog/compute-wars-tpu-vs-blackwell-china-ban/" rel="noopener noreferrer"&gt;allenarch.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On November 7, 2025, Nvidia CEO Jensen Huang made it official: Blackwell chips are a no‑sell to China.&lt;/p&gt;

&lt;p&gt;"Currently, we are not planning to ship anything to China," Huang told Reuters. No active discussions. No workarounds. No China‑spec'd variants coming.&lt;/p&gt;

&lt;p&gt;Meanwhile, Beijing fired back with its own directive: state‑funded data centers must switch to domestic AI chips. Projects less than 30% complete? Pull out the foreign silicon or lose funding.&lt;/p&gt;

&lt;p&gt;The AI compute map just split in two.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key developments this week:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nvidia says it has &lt;strong&gt;"zero share"&lt;/strong&gt; of China's AI datacenter compute market (&lt;a href="https://www.reuters.com/world/china/us-block-nvidias-sale-scaled-back-ai-chips-china-information-says-2025-11-07/" rel="noopener noreferrer"&gt;Reuters, Nov 7&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;U.S. poised to block even &lt;strong&gt;scaled‑down Blackwell variants&lt;/strong&gt;, ending the China‑only SKU playbook (&lt;a href="https://www.reuters.com/world/china/us-block-nvidias-sale-scaled-back-ai-chips-china-information-says-2025-11-07/" rel="noopener noreferrer"&gt;Reuters, Nov 7&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;China mandates &lt;strong&gt;domestic chips&lt;/strong&gt; in state‑funded DCs; projects &amp;lt;30% complete must remove foreign hardware (&lt;a href="https://www.reuters.com/world/china/us-block-nvidias-sale-scaled-back-ai-chips-china-information-says-2025-11-07/" rel="noopener noreferrer"&gt;Reuters, Nov 7&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Senators urge White House to &lt;strong&gt;maintain the ban&lt;/strong&gt; on advanced AI chips to China (&lt;a href="https://www.theverge.com/news/815806/senate-resolution-trump-nvidia-chips-china" rel="noopener noreferrer"&gt;The Verge, Nov 6&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Huawei's &lt;strong&gt;Ascend 910C&lt;/strong&gt; readies mass shipments as China's domestic alternative (&lt;a href="https://www.reuters.com/world/china/huawei-readies-new-ai-chip-mass-shipment-china-seeks-nvidia-alternatives-sources-2025-04-21/" rel="noopener noreferrer"&gt;Reuters, Apr 21&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Here's what this actually means
&lt;/h2&gt;

&lt;p&gt;Nvidia's China revenue just evaporated. The money's going somewhere else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nvidia's China revenue: gone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The company already excluded China H20 shipments from its guidance back in August. CFO Colette Kress told analysts the "bottleneck is diplomacy," not demand. Now it's official: &lt;strong&gt;zero datacenter compute share&lt;/strong&gt; in the world's second‑largest AI market.&lt;/p&gt;

&lt;p&gt;To put that in perspective: China accounted for &lt;strong&gt;21.4% of Nvidia's total revenue in FY2023&lt;/strong&gt; before export controls tightened. By FY2025, that share dropped to &lt;strong&gt;13.1%&lt;/strong&gt; ($17.1 billion), and datacenter-specific sales are now effectively zero. That's an estimated &lt;strong&gt;$15+ billion&lt;/strong&gt; in annual datacenter revenue wiped out.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.reuters.com/world/china/nvidia-still-growing-china-uncertainty-clouds-outlook-2025-08-27/" rel="noopener noreferrer"&gt;Reuters, Aug 27&lt;/a&gt; | &lt;a href="https://bullfincher.io/companies/nvidia-corporation/overview" rel="noopener noreferrer"&gt;Nvidia FY2025 financials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demand doesn't disappear; it relocates.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Blackwell capacity that would've gone to China is being rerouted to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;U.S. hyperscalers&lt;/strong&gt; (Microsoft, Google, Meta) scaling trillion‑parameter models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sovereign AI programs&lt;/strong&gt; in the Middle East (UAE, Saudi Arabia buying at premium)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;European model labs&lt;/strong&gt; (Mistral, Aleph Alpha) competing for slots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result? Lead times stretched, prices firm, and buyers with early allocations holding leverage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;China accelerates substitution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Huawei's Ascend 910C is positioning as the domestic answer. Reuters reported in April that Huawei is prepping mass shipments, with plans to double output. The chip targets workloads similar to Nvidia's H100: not Blackwell‑level, but "good enough" for many Chinese AI labs.&lt;/p&gt;

&lt;p&gt;The catch: software ecosystem. Nvidia's CUDA has 15+ years of tooling, libraries, and developer mindshare. Huawei's CANN framework is catching up, but the gap is real. Training a frontier model on Ascend means rewriting code, debugging new performance profiles, and accepting slower iteration cycles for now.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.reuters.com/world/china/huawei-readies-new-ai-chip-mass-shipment-china-seeks-nvidia-alternatives-sources-2025-04-21/" rel="noopener noreferrer"&gt;Reuters, Apr 21&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How we got here
&lt;/h2&gt;

&lt;p&gt;The current export controls trace back to October 2022, when the Bureau of Industry and Security (BIS) first restricted advanced computing chips and semiconductor manufacturing equipment to China.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oct 2022:&lt;/strong&gt; Initial rules target GPUs for AI and supercomputing. Nvidia scrambles to create China‑spec variants (A800, H800) that technically comply but deliver similar performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oct 2023:&lt;/strong&gt; BIS tightens the rules. New thresholds close loopholes. Performance metrics now account for interconnect bandwidth, not just raw FLOPS. The A/H‑series workaround gets nuked.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.bis.doc.gov/index.php/documents/about-bis/newsroom/press-releases/3188-2022-10-07-press-release-advanced-computing-and-semiconductor-manufacturing-items/file" rel="noopener noreferrer"&gt;BIS, Oct 7, 2022&lt;/a&gt; · &lt;a href="https://www.bis.doc.gov/index.php/documents/about-bis/newsroom/press-releases/3360-2023-10-17-advanced-computing-update-press-release/file" rel="noopener noreferrer"&gt;BIS, Oct 17, 2023&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nov 2025:&lt;/strong&gt; Blackwell enters the picture. This time, no workarounds. The Information reports the U.S. will block even scaled‑down variants before they ship. Huang confirms: not planning to ship anything.&lt;/p&gt;

&lt;p&gt;The playbook that worked for A/H‑series is dead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who's winning (and who's scrambling)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The clear winners: anyone who locked Blackwell capacity early.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Microsoft, Google, and Meta secured multi-billion-dollar commitments months ago. Now that China demand is offline, they aren't competing for scarce supply. Lead times for new buyers? Stretching into 2026.&lt;/p&gt;

&lt;p&gt;Sovereign AI programs in the Middle East are also winning. UAE and Saudi buyers are paying a premium for early slots, betting that AI infrastructure is a strategic national priority. When compute is the new oil, you pay whatever it takes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;China: forced sprint to self-sufficiency.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Huawei's Ascend 910C becomes the default. But here's the reality check: it's not Blackwell-class. Performance reaches 60% of H100 for inference workloads, though training large-scale models remains more challenging.&lt;/p&gt;

&lt;p&gt;The bigger issue is software. CUDA has 15 years of ecosystem. CANN (Huawei's framework) works, but you're rewriting code, debugging new bottlenecks, and accepting slower iteration. For Chinese AI labs racing to keep up with frontier models, that's a meaningful handicap.&lt;/p&gt;

&lt;p&gt;Still, China has options the U.S. doesn't: state funding, captive market, and willingness to play the long game. Ascend might be 60% of H100 performance today, but in 3-5 years? The gap could narrow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nvidia: short-term pain, long-term TBD.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Zero China datacenter revenue hurts. But demand ex-China is strong enough that Blackwell is supply-constrained anyway. The real question is 2027-2028: if China builds a competitive domestic stack, does Nvidia lose strategic leverage permanently?&lt;/p&gt;

&lt;p&gt;For now, Nvidia is fine. The company guided for record revenue without China. But this is the first time in modern tech history that the world's second-largest market is completely off-limits for cutting-edge silicon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meanwhile, Google is building something different
&lt;/h2&gt;

&lt;p&gt;While everyone's watching the Nvidia-China standoff, Google is quietly advancing its own compute stack, and it might be the smartest hedge in AI infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The setup: Axion CPUs + next-gen TPU pods.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google's been pushing its custom silicon for years (TPUs have been around since 2016), but 2025 marks a shift in strategy. The company is positioning its "AI Hypercomputer" platform not as an Nvidia replacement, but as a workload-optimized alternative.&lt;/p&gt;

&lt;p&gt;The pitch: why rent scarce Nvidia capacity at premium when you can run certain workloads faster and cheaper on purpose-built hardware?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What TPUs are actually good at.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TPUs excel at specific tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inference at scale&lt;/strong&gt; (serving models to millions of users, Google's bread and butter)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training Transformer-based models&lt;/strong&gt; (the architecture Google invented)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch processing&lt;/strong&gt; where you can saturate the chip's matrix multiply units&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What they're not great at: general-purpose GPU workloads, highly dynamic graphs, or code that's deeply optimized for CUDA.&lt;/p&gt;

&lt;p&gt;The trade-off: if your workload fits TPU's sweet spot, you get better price-performance. If it doesn't, you're rewriting code and debugging performance regressions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters right now.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nvidia Blackwell supply is constrained. Lead times are 12-18 months. Prices are firm.&lt;/p&gt;

&lt;p&gt;Google TPU? Available through Google Cloud with no waitlist. Pricing is competitive (and Google controls it).&lt;/p&gt;

&lt;p&gt;For companies building on Transformer models (basically everyone doing LLMs, vision transformers, or diffusion models), TPU is a viable alternative, especially if you're already on GCP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The strategic angle.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google isn't trying to beat Nvidia in the open market. It's creating an alternative ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hyperscalers&lt;/strong&gt; (Microsoft, Meta) lock in Nvidia capacity through multi-billion-dollar deals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google&lt;/strong&gt; builds proprietary stack, keeps capacity for itself and select cloud customers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise buyers&lt;/strong&gt; caught in the middle get optionality: pay Nvidia premium or retool for TPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same playbook AWS ran with Graviton (Arm CPUs) and Trainium (AI training chips). Build your own silicon, control your destiny, offer customers a lower-cost path that happens to lock them into your cloud.&lt;/p&gt;

&lt;p&gt;The difference: Google's TPU has a 9-year head start and real production workloads (Search, YouTube, and Translate all run on TPU).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch: ecosystem lock-in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Switching from Nvidia to TPU isn't trivial. You're trading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CUDA (15+ years, massive library ecosystem) for JAX/XLA (Google's framework)&lt;/li&gt;
&lt;li&gt;Portable code that runs anywhere for Google Cloud vendor lock-in&lt;/li&gt;
&lt;li&gt;Broad hardware support for optimized-but-narrow use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For startups, that's a risky bet. For Google Cloud customers already committed? It's a natural hedge against Nvidia supply risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottom line on TPU.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google's compute push isn't about displacing Nvidia. It's about carving out workloads where purpose-built silicon wins on economics and availability.&lt;/p&gt;

&lt;p&gt;As Nvidia prioritizes hyperscaler deals and China gets locked out, Google's alternative stack becomes more attractive, not because it's better across the board, but because it's &lt;strong&gt;available&lt;/strong&gt; and &lt;strong&gt;predictable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's a compelling pitch when the alternative is waiting 18 months for Blackwell allocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means if you're...
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you're building AI products:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Blackwell supply is tight. If you don't have allocation locked, you're either paying a premium on the secondary market or waiting until mid-2026. Budget accordingly.&lt;/p&gt;

&lt;p&gt;Alternatives exist (Google TPU pods, AMD MI300X, even cloud providers reselling capacity), but switching costs are real. CUDA code doesn't magically run on TPU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're in China:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your compute options just narrowed. Ascend 910C is the domestic play, but expect tooling friction. If you're training frontier models, prepare for longer iteration cycles and higher engineering overhead.&lt;/p&gt;

&lt;p&gt;The upside: China's government is pouring money into domestic silicon. HBM production, packaging, and software toolchains are all getting funded. The ecosystem will mature; the question is how fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're watching geopolitics:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is tech decoupling in real-time. Two separate AI compute ecosystems are forming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;West:&lt;/strong&gt; Nvidia/AMD/Intel + Google TPU, optimized for performance, backed by hyperscaler capital&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;China:&lt;/strong&gt; Ascend/Kunlun/Cambricon, optimized for sovereignty, backed by state funding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They'll diverge on hardware, software, and standards. Interoperability will fade. Think of it like 5G equipment: Huawei and Ericsson serve different markets now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four things to watch
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. BIS rule updates (Q1 2026)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Watch for new performance thresholds or licensing carve-outs. If BIS tightens further, even older-gen GPUs (A100, H100) could get restricted. If they ease, limited Blackwell variants might ship under strict monitoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. China procurement mandates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Beijing's directive to state-funded DCs is just the start. Expect expanded mandates: provincial governments, SOEs, universities. The domestic chip push will accelerate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Nvidia's product segmentation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Will Nvidia create Blackwell-lite SKUs for markets outside China but below full-spec? Or is the company done with workaround products? Product roadmap will signal strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Huawei shipment volumes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If Ascend 910C scales to tens of thousands of units per quarter, China's substitution is working. If volumes stay low, the domestic stack isn't ready for prime time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Blackwell actually matters
&lt;/h2&gt;

&lt;p&gt;Blackwell isn't just "faster H100." It's a generational leap designed for trillion-parameter models and real-time inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key specs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;208 billion transistors&lt;/strong&gt; across two reticle-limited dies, connected via 10 TB/s chip-to-chip link&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2nd-gen Transformer Engine&lt;/strong&gt; with FP4 precision support (faster inference, lower memory)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5th-gen NVLink&lt;/strong&gt; enabling rack-scale systems (72 GPUs in single coherent domain)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidential computing&lt;/strong&gt; for secure multi-tenant AI workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native decompression engine&lt;/strong&gt; for database analytics (non-AI workload expansion)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The GB200 NVL72 system (72 Blackwell GPUs + 36 Grace CPUs in a rack) can serve trillion-parameter models in real-time. That's the scale needed for GPT-5-class systems or multi-modal models processing video + text simultaneously.&lt;/p&gt;

&lt;p&gt;For context: training GPT-4 took thousands of A100 GPUs over months. Blackwell systems can do similar runs in weeks, at lower power draw per FLOP.&lt;/p&gt;

&lt;p&gt;This is why the export ban matters. China's AI labs just lost access to the fastest iteration cycles in the industry.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/" rel="noopener noreferrer"&gt;Nvidia Blackwell Architecture&lt;/a&gt; · &lt;a href="https://www.nvidia.com/en-us/data-center/dgx-b200/" rel="noopener noreferrer"&gt;DGX B200&lt;/a&gt; · &lt;a href="https://www.nvidia.com/en-us/data-center/gb200-nvl72/" rel="noopener noreferrer"&gt;GB200 NVL72&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The real question nobody's asking
&lt;/h2&gt;

&lt;p&gt;The AI compute map just redrew itself along geopolitical lines.&lt;/p&gt;

&lt;p&gt;Nvidia's Blackwell stays in the West. China builds its own stack with Huawei Ascend. Two ecosystems, increasingly incompatible.&lt;/p&gt;

&lt;p&gt;For Nvidia, it's short-term pain (zero China revenue) but manageable; ex-China demand is strong enough to absorb the supply. For China, it's forced acceleration of domestic alternatives that deliver roughly 60% of H100 performance today but could close the gap in 3-5 years.&lt;/p&gt;

&lt;p&gt;The real wildcard: what happens when China's AI labs, locked out of cutting-edge hardware, start optimizing models for lower compute budgets? Efficiency gains could flip the script. Compute abundance breeds inefficiency; scarcity breeds innovation.&lt;/p&gt;

&lt;p&gt;We'll know by 2027 whether this export control strategy worked or backfired.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/world/asia-pacific/nvidia-ceo-says-company-not-planning-ship-anything-china-2025-11-07/" rel="noopener noreferrer"&gt;Nvidia CEO: no plans to ship to China&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/world/china/us-block-nvidias-sale-scaled-back-ai-chips-china-information-says-2025-11-07/" rel="noopener noreferrer"&gt;U.S. to block scaled-down AI chip variants to China&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/world/china/nvidia-cannot-sell-its-most-advanced-ai-chip-china-white-house-says-2025-11-04/" rel="noopener noreferrer"&gt;White House confirms Blackwell won't be sold to China&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/world/china/nvidia-still-growing-china-uncertainty-clouds-outlook-2025-08-27/" rel="noopener noreferrer"&gt;Nvidia still growing in China, but uncertainty clouds outlook&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theverge.com/news/815806/senate-resolution-trump-nvidia-chips-china" rel="noopener noreferrer"&gt;Senators urge Trump to continue banning Nvidia chips in China&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theregister.com/2025/11/07/china_nvidia_blackwell/" rel="noopener noreferrer"&gt;Nvidia Blackwell a no-sell in China as trade deal fails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/world/china/huawei-readies-new-ai-chip-mass-shipment-china-seeks-nvidia-alternatives-sources-2025-04-21/" rel="noopener noreferrer"&gt;Huawei readies new AI chip for mass shipment as China seeks Nvidia alternatives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bis.doc.gov/index.php/documents/about-bis/newsroom/press-releases/3188-2022-10-07-press-release-advanced-computing-and-semiconductor-manufacturing-items/file" rel="noopener noreferrer"&gt;BIS: Commerce Implements New Export Controls (Oct 2022)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bis.doc.gov/index.php/documents/about-bis/newsroom/press-releases/3360-2023-10-17-advanced-computing-update-press-release/file" rel="noopener noreferrer"&gt;BIS: Commerce Strengthens Export Controls (Oct 2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/" rel="noopener noreferrer"&gt;Nvidia Blackwell Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nvidia.com/en-us/data-center/dgx-b200/" rel="noopener noreferrer"&gt;Nvidia DGX B200&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nvidia.com/en-us/data-center/gb200-nvl72/" rel="noopener noreferrer"&gt;Nvidia GB200 NVL72&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="//allenarch.dev/blog/meta-75b-ai-infrastructure-bet"&gt;Meta's $75B AI Infrastructure Bet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//allenarch.dev/blog/meta-30b-bond-market-reaction"&gt;After Meta's $75B Bet: A $30B Bond Deal and Wall Street's Harsh Reality Check&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//allenarch.dev/blog/deepseek-ocr-token-compression"&gt;DeepSeek-OCR: When a Picture Is Worth 10× Fewer Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nvidia</category>
      <category>ai</category>
      <category>google</category>
      <category>huawei</category>
    </item>
    <item>
      <title>After Meta's $75B Bet: A $30B Bond Deal and Wall Street's Harsh Reality Check</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Mon, 03 Nov 2025 07:31:53 +0000</pubDate>
      <link>https://dev.to/0xrelogic/after-metas-75b-bet-a-30b-bond-deal-and-wall-streets-harsh-reality-check-4md7</link>
      <guid>https://dev.to/0xrelogic/after-metas-75b-bet-a-30b-bond-deal-and-wall-streets-harsh-reality-check-4md7</guid>
      <description>&lt;p&gt;Last week, I wrote about &lt;a href="https://allenarch.dev//blog/meta-75b-ai-infrastructure-bet" rel="noopener noreferrer"&gt;Meta's $75 billion infrastructure spending spree&lt;/a&gt;, the biggest AI bet in tech history.&lt;/p&gt;

&lt;p&gt;Seven days later, the market delivered its verdict: an 11% single-day drop, Meta's worst since October 2022.&lt;/p&gt;

&lt;p&gt;What happened? Meta didn't scale back. It doubled down.&lt;/p&gt;

&lt;p&gt;Between October 29 and November 3, 2025, Meta:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raised 2025 capex guidance to &lt;strong&gt;$70–$72 billion&lt;/strong&gt; (from $66–$72B)&lt;/li&gt;
&lt;li&gt;Completed a &lt;strong&gt;$30 billion bond offering&lt;/strong&gt;, the largest since 2023&lt;/li&gt;
&lt;li&gt;Warned that 2026 spending will be &lt;strong&gt;"significantly larger"&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Watched its stock tank while Alphabet rallied 6%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meanwhile, CoreWeave's $9 billion acquisition of Core Scientific collapsed, and analysts began openly warning about "circular financing" in AI infrastructure.&lt;/p&gt;

&lt;p&gt;This is the story of one week that exposed the tension between Wall Street's patience and Big Tech's AI ambitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Called (One Week Ago)
&lt;/h2&gt;

&lt;p&gt;Before diving into what happened, let me highlight what we got right in last week's analysis:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. "Wall Street Is Worried It Might Be a Bubble"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What I wrote (Oct 28):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Meta's $75B AI Infrastructure Bet: How Meta spent $75 billion in three months on AI infrastructure and why Wall Street is worried it might be a bubble."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened (Oct 29-30):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stock dropped &lt;strong&gt;11%&lt;/strong&gt; (worst day in 3 years)&lt;/li&gt;
&lt;li&gt;DA Davidson: "The levering up is truly unhealthy behavior"&lt;/li&gt;
&lt;li&gt;Axios headline: "The AI boom isn't going anywhere" (but mentions bubble concerns)&lt;/li&gt;
&lt;li&gt;Fed Chair Powell forced to address circular financing risks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Accuracy: ✓ Nailed it.&lt;/strong&gt; The bubble concern wasn't speculation; it was the dominant narrative within 48 hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Off-Balance-Sheet Debt Would Become a Theme
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What I wrote (Oct 28):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Meta doesn't own most of it [Hyperion], but they're on the hook for 16 years through a residual value guarantee... This isn't just creative accounting it's a fundamental shift."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened (Oct 31):&lt;/strong&gt;&lt;br&gt;
Bloomberg published: &lt;strong&gt;"Meta, xAI Starting Trend for Billions in Off-Balance Sheet Debt"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Direct quote from Bloomberg:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Meta is among firms popularizing a way for debt to sit completely off balance sheet, allowing enormous sums to be raised while limiting impact on its financial health."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Accuracy: ✓ We called the trend before mainstream coverage.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. ROI Timeline Was Going to Be Wall Street's Concern
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What I wrote (Oct 28):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Break-even Scenarios... Reality Check: Meta needs a combination of all three scenarios to hit reasonable ROI timelines (5-7 years)."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened (Oct 29):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zuckerberg gave &lt;strong&gt;no specific revenue targets&lt;/strong&gt; for 2026+&lt;/li&gt;
&lt;li&gt;Analysts asked repeatedly about ROI timeline on earnings call&lt;/li&gt;
&lt;li&gt;Stock tanked because "spending trajectory" worried investors more than revenue beat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Accuracy: ✓ The 5-7 year timeline concern was exactly what spooked the market.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  4. Circular Financing Risk
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What I wrote (Oct 28):&lt;/strong&gt;&lt;br&gt;
I linked to Yahoo Finance: "AI's self-investment spree sets off bubble alarms on Wall Street"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened (Oct 29-30):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Powell addressed it in Fed presser&lt;/li&gt;
&lt;li&gt;Axios deep-dive on circular financing&lt;/li&gt;
&lt;li&gt;Multiple analysts citing it as primary concern&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Accuracy: ✓ Week-old article referenced the exact issue that dominated the week's coverage.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  5. CoreWeave Supplier Power
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What I wrote (Oct 28):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"CoreWeave: $14.2 billion (6+ years)" as one of Meta's core dependencies&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened (Oct 30):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CoreWeave's acquisition of Core Scientific &lt;strong&gt;collapsed&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Core Scientific shareholders rejected $9B offer&lt;/li&gt;
&lt;li&gt;Showed supplier power in AI infrastructure market&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Accuracy: ✓ Highlighting the CoreWeave dependency proved prescient when the deal fell apart.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What We Missed
&lt;/h3&gt;

&lt;p&gt;To be fair, I &lt;strong&gt;didn't predict&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;$30B bond&lt;/strong&gt; size (I mentioned debt financing risk but not the specific deal)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;-11% stock drop magnitude&lt;/strong&gt; (I flagged bubble risk but not the severity)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2026 capex acceleration&lt;/strong&gt; being disclosed this early (thought it'd come in Q4 guidance)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But every element of the core thesis (massive spending, bubble concerns, ROI uncertainty, off-balance-sheet leverage) materialized within 7 days.&lt;/p&gt;

&lt;p&gt;Now, let's examine what actually happened.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Market's Message: Show Me the ROI
&lt;/h2&gt;

&lt;p&gt;On October 29, Meta reported Q3 earnings that beat on every metric except the one that mattered: spending trajectory.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Numbers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Wall Street View&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Revenue&lt;/td&gt;
&lt;td&gt;$51.24B (+26% YoY)&lt;/td&gt;
&lt;td&gt;Beat estimates ($49.5B)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adjusted EPS&lt;/td&gt;
&lt;td&gt;$7.25&lt;/td&gt;
&lt;td&gt;Beat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025 CapEx&lt;/td&gt;
&lt;td&gt;$70–72B&lt;/td&gt;
&lt;td&gt;Raised lower bound +$4B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026 CapEx&lt;/td&gt;
&lt;td&gt;"Significantly larger"&lt;/td&gt;
&lt;td&gt;First acceleration signal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stock (Oct 30)&lt;/td&gt;
&lt;td&gt;-11%&lt;/td&gt;
&lt;td&gt;Worst day since Oct 2022&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: Meta Q3 2025 Earnings&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What spooked investors?&lt;/strong&gt; Not current spending, but the acceleration signal.&lt;/p&gt;

&lt;p&gt;CFO Susan Li: "We expect capital expenditures dollar growth will be notably larger in 2026 than 2025, with growth primarily driven by infrastructure costs."&lt;/p&gt;

&lt;p&gt;Translation: If 2025 is $71 billion, 2026 could be $90–100 billion or more.&lt;/p&gt;
&lt;h3&gt;
  
  
  The ROI Gap
&lt;/h3&gt;

&lt;p&gt;Here's the math that scared Wall Street:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current State (Q3 2025):
AI-enhanced ad revenue: ~$7B incremental
AI infrastructure costs: ~$80B total
Gap: -$73B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026 Projection:
AI revenue: ~$15–20B optimistic
Infrastructure costs: ~$100B+
Gap: Still -$80B+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Meta is betting AI will transform its business by 2027-2030. Wall Street wants proof now.&lt;/p&gt;
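&lt;p&gt;The gap arithmetic above can be reproduced in a few lines of Python (the figures are this article's rough estimates, not reported line items):&lt;/p&gt;

```python
# Rough AI economics, in $B, using the article's estimates.
scenarios = {
    "Q3 2025 run-rate": {"ai_revenue": 7.0, "infra_cost": 80.0},
    "2026 projection": {"ai_revenue": 17.5, "infra_cost": 100.0},  # revenue = midpoint of $15-20B
}

for name, s in scenarios.items():
    gap = s["ai_revenue"] - s["infra_cost"]
    print(f"{name}: gap = ${gap:+.1f}B")
```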

&lt;h2&gt;
  
  
  The $30 Billion Bond: Financing at a Premium
&lt;/h2&gt;

&lt;p&gt;On October 30, Meta launched a "$25 billion+" bond offering. By October 31, it had closed at $30 billion, the fifth-largest corporate bond deal ever.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Structure
&lt;/h3&gt;

&lt;p&gt;Six tranches of senior notes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Maturity&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Coupon&lt;/th&gt;
&lt;th&gt;Spread&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5-year&lt;/td&gt;
&lt;td&gt;$4.0B&lt;/td&gt;
&lt;td&gt;4.20%&lt;/td&gt;
&lt;td&gt;T+50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7-year&lt;/td&gt;
&lt;td&gt;$4.0B&lt;/td&gt;
&lt;td&gt;4.60%&lt;/td&gt;
&lt;td&gt;T+70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10-year&lt;/td&gt;
&lt;td&gt;$6.5B&lt;/td&gt;
&lt;td&gt;4.875%&lt;/td&gt;
&lt;td&gt;T+78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20-year&lt;/td&gt;
&lt;td&gt;$4.5B&lt;/td&gt;
&lt;td&gt;5.50%&lt;/td&gt;
&lt;td&gt;T+88&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30-year&lt;/td&gt;
&lt;td&gt;$6.5B&lt;/td&gt;
&lt;td&gt;5.625%&lt;/td&gt;
&lt;td&gt;T+98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40-year&lt;/td&gt;
&lt;td&gt;$4.5B&lt;/td&gt;
&lt;td&gt;5.75%&lt;/td&gt;
&lt;td&gt;T+110&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: PitchBook, Oct 31, 2025&lt;/p&gt;
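&lt;p&gt;As a sanity check, the size-weighted coupon and annual coupon bill implied by the tranche table can be computed directly (coupons only; issue discounts and fees are ignored):&lt;/p&gt;

```python
# (size in $B, coupon in %) per tranche, from the table above
tranches = [(4.0, 4.20), (4.0, 4.60), (6.5, 4.875),
            (4.5, 5.50), (6.5, 5.625), (4.5, 5.75)]

total = sum(size for size, _ in tranches)               # 30.0 ($B)
wavg = sum(size * c for size, c in tranches) / total    # size-weighted coupon, %
interest = total * wavg / 100                           # annual coupon bill, $B

print(f"${total:.0f}B total, {wavg:.2f}% weighted coupon, ~${interest:.2f}B/yr in coupons")
```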

&lt;p&gt;&lt;strong&gt;For Context:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Largest corporate bond deals ever:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verizon: $49B (2013, Vodafone acquisition)&lt;/li&gt;
&lt;li&gt;Anheuser-Busch InBev: $46B (2016, SABMiller)&lt;/li&gt;
&lt;li&gt;CVS Health: $40B (2018, Aetna)&lt;/li&gt;
&lt;li&gt;Pfizer: $31B (2023, M&amp;amp;A)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meta: $30B (2025, AI infrastructure)&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every deal above Meta's was for acquisitions. Meta is borrowing $30 billion to build data centers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Issue Bonds?
&lt;/h3&gt;

&lt;p&gt;Meta has $65 billion in cash. Why borrow?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Preserve cash optionality:&lt;/strong&gt; keep powder dry for M&amp;amp;A and buybacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tax efficiency:&lt;/strong&gt; interest is tax-deductible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Off-balance-sheet leverage:&lt;/strong&gt; the Blue Owl JV keeps $27B of Hyperion off Meta's books&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Order books were 4× oversubscribed. But spreads widened vs. Meta's August 2024 deal, suggesting investors demanded a premium for AI infrastructure risk.&lt;/p&gt;

&lt;p&gt;Oracle's $18B bond deal (Sept 24) initially traded tight but widened to T+158 by October 30 as AI sentiment soured. Meta priced its 40-year bonds at T+110, a 48 bps tighter spread.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CoreWeave Collapse: A Warning Sign?
&lt;/h2&gt;

&lt;p&gt;On October 30, Core Scientific shareholders rejected CoreWeave's $9 billion acquisition offer.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Happened
&lt;/h3&gt;

&lt;p&gt;CoreWeave (Nvidia-backed AI cloud) had agreed in July to acquire Core Scientific (crypto miner-turned-data center operator) for $9B all-stock.&lt;/p&gt;

&lt;p&gt;By October:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CoreWeave stock: +235% YTD, $67B market cap&lt;/li&gt;
&lt;li&gt;Core Scientific view: "Why sell now when we could become the next CoreWeave?"&lt;/li&gt;
&lt;li&gt;Largest shareholder opposed: argued standalone value higher&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shareholders voted no. Deal terminated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It Matters
&lt;/h3&gt;

&lt;p&gt;CoreWeave has $42.9B in contracted revenue through 2032:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI: $22.4B&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meta: $14.2B&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Nvidia: $6.3B (backstop)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The collapsed deal means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CoreWeave continues as customer, not owner&lt;/li&gt;
&lt;li&gt;Core Scientific retains pricing power&lt;/li&gt;
&lt;li&gt;Meta's $14.2B contract stays with third-party supplier&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Broader Signal:&lt;/strong&gt; If AI valuations are too high for strategic M&amp;amp;A to make sense, bubble dynamics may be taking hold.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Circular Financing Question
&lt;/h2&gt;

&lt;p&gt;Here's the concern that broke into mainstream coverage:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Loop
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Nvidia invests in OpenAI, CoreWeave (billions)&lt;/li&gt;
&lt;li&gt;OpenAI, CoreWeave buy Nvidia chips (billions)&lt;/li&gt;
&lt;li&gt;Meta invests in Scale AI ($14.3B) and contracts CoreWeave compute ($14.2B)&lt;/li&gt;
&lt;li&gt;Scale AI, CoreWeave buy more Nvidia chips&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;DA Davidson analyst Gil Luria:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"They're using that capital to raise debt. It's the levering up 
that's the truly unhealthy behavior."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fed Chair Jerome Powell (Oct 29):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data center spending is "not especially interest sensitive" meaning 
spending continues regardless of rates.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Is This a Bubble?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Bull Case:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Companies generate massive cash flow ($24.7B for Meta in Q3)&lt;/li&gt;
&lt;li&gt;AI products have real users (ChatGPT: 200M+ weekly)&lt;/li&gt;
&lt;li&gt;Infrastructure has alternative uses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bear Case:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Circular investments create artificial demand&lt;/li&gt;
&lt;li&gt;Revenue concentration (CoreWeave: 71% from Microsoft)&lt;/li&gt;
&lt;li&gt;Timing mismatch: spending today, ROI 2028-2030+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My take: It's a bubble &lt;strong&gt;with substance&lt;/strong&gt;. AI is real, but the scale and speed of investment may exceed monetization by years.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hyperscaler Arms Race
&lt;/h2&gt;

&lt;p&gt;All major players raised guidance Oct 29-30:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;2025 CapEx&lt;/th&gt;
&lt;th&gt;YoY Growth&lt;/th&gt;
&lt;th&gt;Stock Reaction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$70–72B&lt;/td&gt;
&lt;td&gt;+44%&lt;/td&gt;
&lt;td&gt;-11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Alphabet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$91–93B&lt;/td&gt;
&lt;td&gt;+83%&lt;/td&gt;
&lt;td&gt;+6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microsoft&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$140B (FY26)&lt;/td&gt;
&lt;td&gt;+74%&lt;/td&gt;
&lt;td&gt;-3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$75B&lt;/td&gt;
&lt;td&gt;~40% est.&lt;/td&gt;
&lt;td&gt;+3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total: ~$380 billion&lt;/strong&gt; across four companies.&lt;/p&gt;

&lt;p&gt;For context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finland's GDP: ~$300B&lt;/li&gt;
&lt;li&gt;ExxonMobil 2024 revenue: ~$350B&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Different Reactions?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Alphabet +6%:&lt;/strong&gt; Google Cloud breaks out AI revenue separately, shows clear ROI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft -3%:&lt;/strong&gt; Azure AI growing 80%+ YoY, but enterprise contracts provide visibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta -11%:&lt;/strong&gt; AI spending flows into ad improvements that are hard to quantify separately. No standalone AI product revenue yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Lesson:&lt;/strong&gt; Wall Street tolerates massive spending if you show where revenue comes from. Google can. Microsoft can. Meta can't (yet).&lt;/p&gt;

&lt;h2&gt;
  
  
  What Zuckerberg Told Investors
&lt;/h2&gt;

&lt;p&gt;October 29 earnings call key quotes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On underinvesting risk:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"We're seeing the returns in the core business that's giving us 
a lot of confidence that we should be investing a lot more, and 
we want to make sure that we're not underinvesting."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;On excess capacity:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"If you got to a point where you overbuilt, you could have that 
as an option to offer [capacity] to third parties."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;On worst-case:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"In the very worst case, you end up with several years worth of 
excess data center capacity. You'd have some loss and depreciation 
of those assets, but over time you'd grow into that and use it."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;On timing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"It's the right strategy to aggressively front-load building capacity, 
so that way we're prepared for the most optimistic cases."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What He Didn't Say&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Notice what's missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No specific AI revenue targets for 2026+&lt;/li&gt;
&lt;li&gt;No timeline for infrastructure-as-a-service&lt;/li&gt;
&lt;li&gt;No quantification of ad improvement from AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Zuckerberg articulated a vision (superintelligence, optionality) but not a business plan (how much revenue, by when).&lt;/p&gt;

&lt;p&gt;That's why the stock dropped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ROI Math: Three Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Current State (2025)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Costs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CapEx: $71B&lt;/li&gt;
&lt;li&gt;Operating costs: ~$15B&lt;/li&gt;
&lt;li&gt;Total: ~$86B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Revenue:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-enhanced ads: ~$7B incremental&lt;/li&gt;
&lt;li&gt;Direct AI products: ~$0&lt;/li&gt;
&lt;li&gt;Gap: -$79B annually&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 1: Ad Revenue Only
&lt;/h3&gt;

&lt;p&gt;Assumption: AI improves CPM by 10%&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Meta ad revenue: ~$150B&lt;/li&gt;
&lt;li&gt;10% improvement: +$15B/year&lt;/li&gt;
&lt;li&gt;Years to break even (2025): 5.7 years&lt;/li&gt;
&lt;li&gt;Years to break even (cumulative $600B through 2028): 40 years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Not viable standalone.&lt;/p&gt;
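&lt;p&gt;The break-even figures in this scenario follow directly from the numbers above:&lt;/p&gt;

```python
annual_cost = 71 + 15        # $B/yr: capex + operating-cost estimate from the Current State section
ad_uplift = 150 * 10 / 100   # $B/yr: a 10% improvement on ~$150B of ad revenue

print(ad_uplift)                           # 15.0
print(round(annual_cost / ad_uplift, 1))   # 5.7 years to cover one year's spend
print(600 / ad_uplift)                     # 40.0 years to cover $600B cumulative
```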

&lt;h3&gt;
  
  
  Scenario 2: New AI Products
&lt;/h3&gt;

&lt;p&gt;Assumption: $50B/year by 2030&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Meta AI subscriptions: $15B (100M @ $12.50/mo)&lt;/li&gt;
&lt;li&gt;Business tools: $20B&lt;/li&gt;
&lt;li&gt;API/developer: $15B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Break-even timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2025-2028: Net -$300B cumulative&lt;/li&gt;
&lt;li&gt;2029-2032: Net -$200B cumulative
&lt;/li&gt;
&lt;li&gt;2033+: Positive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Plausible if Meta achieves OpenAI-scale monetization.&lt;/p&gt;
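&lt;p&gt;The $50B/year composition checks out arithmetically:&lt;/p&gt;

```python
subscriptions = 100e6 * 12.50 * 12 / 1e9   # 100M subscribers at $12.50/mo, in $B/yr
business_tools = 20.0                      # $B/yr, article estimate
api = 15.0                                 # $B/yr, article estimate

print(subscriptions)                          # 15.0
print(subscriptions + business_tools + api)   # 50.0
```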

&lt;h3&gt;
  
  
  Scenario 3: Infrastructure-as-a-Service
&lt;/h3&gt;

&lt;p&gt;Assumption: Sell 30% excess capacity @ $0.50/GPU-hour&lt;/p&gt;

&lt;p&gt;With 2M GPUs by 2028:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Available: 600K GPUs&lt;/li&gt;
&lt;li&gt;Revenue: ~$2.6B/year&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Nice optionality, not primary driver.&lt;/p&gt;
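&lt;p&gt;The ~$2.6B figure is straight utilization math, assuming the spare GPUs rent around the clock at the quoted rate:&lt;/p&gt;

```python
spare_gpus = 2_000_000 * 30 // 100   # 30% of a 2M-GPU fleet = 600,000 GPUs
rate = 0.50                          # $/GPU-hour
hours_per_year = 24 * 365            # 8,760

revenue = spare_gpus * rate * hours_per_year / 1e9   # $B/year
print(round(revenue, 2))   # 2.63
```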

&lt;h3&gt;
  
  
  The Realistic Path
&lt;/h3&gt;

&lt;p&gt;Meta needs all three working:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2026-2027: +$10-15B from ads&lt;/li&gt;
&lt;li&gt;2028-2029: +$20-30B from products&lt;/li&gt;
&lt;li&gt;2030+: +$5-10B from infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Break-even: 2033-2035 (8-10 years from now)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Risk: If any pillar fails, $600B in sunk costs with no recovery path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Debt That Doesn't Show Up
&lt;/h2&gt;

&lt;p&gt;Meta now has two types of AI debt:&lt;/p&gt;

&lt;h3&gt;
  
  
  On-Balance-Sheet: $30B
&lt;/h3&gt;

&lt;p&gt;The bond offering (Oct 31):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Annual interest: ~$1.5B (~5.1% size-weighted coupon per the tranche table)&lt;/li&gt;
&lt;li&gt;Debt-to-equity: ~0.05 (still low)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Off-Balance-Sheet: $21.6B
&lt;/h3&gt;

&lt;p&gt;Blue Owl/Hyperion deal structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total project: $27B&lt;/li&gt;
&lt;li&gt;Blue Owl owns 80%&lt;/li&gt;
&lt;li&gt;Blue Owl's debt: $21.6B (in SPV, not on Meta's books)&lt;/li&gt;
&lt;li&gt;Meta has residual value guarantee (16 years)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: Bloomberg, Oct 31: "Meta, xAI Starting Trend for Billions in Off-Balance Sheet Debt"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total Economic Exposure: ~$51.6B&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Meta's reported debt-to-equity (~0.05) masks true economic leverage (~0.10).&lt;/p&gt;

&lt;p&gt;Bloomberg: "Meta is among firms popularizing a way for debt to sit completely off balance sheet, allowing enormous sums to be raised while limiting impact on its financial health."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Advertising Paradox
&lt;/h2&gt;

&lt;p&gt;Irony: While investors panic about spending, Meta's ad business is booming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q3 2025 Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Revenue: $51.2B (+26% YoY)&lt;/li&gt;
&lt;li&gt;Ad impressions: +7%&lt;/li&gt;
&lt;li&gt;Price per ad: +11%&lt;/li&gt;
&lt;li&gt;Fastest growth since Q1 2024&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Drivers:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI-powered targeting&lt;/li&gt;
&lt;li&gt;Reels ads (50%+ of time on platform)&lt;/li&gt;
&lt;li&gt;Chinese retailers (Shein, Temu) despite tariffs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Paradox:&lt;/strong&gt; Meta is already seeing AI ROI in ads. Zuckerberg's claim is validated by the numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Wall Street Worries:&lt;/strong&gt; Meta can't isolate how much of the 26% growth comes from AI versus other factors. Without attribution, investors can't model returns.&lt;/p&gt;

&lt;p&gt;CNBC (Nov 1): "Meta has continued to point to how its AI investments are improving its online advertising business, but it's having a more difficult time showing how that spending will benefit the company in the future."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Energy Wildcard
&lt;/h2&gt;

&lt;p&gt;Power constraints may limit Meta's buildout:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Math
&lt;/h3&gt;

&lt;p&gt;Each GW of data center capacity requires 1 GW of power.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta's Secured Power:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ENGIE (Texas solar): 1.3 GW&lt;/li&gt;
&lt;li&gt;Hyperion (Entergy): 2 GW (operational 2030)&lt;/li&gt;
&lt;li&gt;Existing: ~2-3 GW&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: 5-6 GW&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; If 2026 capex is "significantly larger," Meta needs 8-10 GW by 2028. That's a 3-4 GW gap.&lt;/p&gt;
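&lt;p&gt;In code, the shortfall is just secured supply versus projected need (all in GW, using the article's estimates); the resulting range brackets the ~3-4 GW gap cited above:&lt;/p&gt;

```python
secured_low = 1.3 + 2 + 2    # ENGIE + Hyperion + low end of existing capacity
secured_high = 1.3 + 2 + 3   # ENGIE + Hyperion + high end of existing capacity
needed = (8, 10)             # projected requirement by 2028

gap = (needed[0] - secured_high, needed[1] - secured_low)  # best case, worst case
print(round(secured_low, 1), round(secured_high, 1))   # 5.3 6.3
print(round(gap[0], 1), round(gap[1], 1))              # 1.7 4.7
```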

&lt;p&gt;&lt;strong&gt;The Risk:&lt;/strong&gt; In 2022-23, crypto miners built data centers that sat idle for months because they couldn't secure power fast enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta's Advantage:&lt;/strong&gt; Deep pockets + 10-year commitments make utilities build dedicated infrastructure (Entergy's $1.2B transmission line).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to Watch:&lt;/strong&gt; Meta needs 2-3 GW in new power purchase agreements by mid-2026. If not, physical constraints force spending cuts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wall Street: Deeply Divided
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Bulls
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Wedbush (Dan Ives):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"While the bears will continue to yell 'AI Bubble' from their 
hibernation caves, we continue to point to this tech cap-ex 
supercycle that is driving this 4th Industrial Revolution."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Rating: Outperform&lt;/li&gt;
&lt;li&gt;Target: $900 (84% upside)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bank of America:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rating: Buy&lt;/li&gt;
&lt;li&gt;Sees 2026 as AI product revenue inflection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Bears
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;DA Davidson (Gil Luria):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"The levering up is the truly unhealthy behavior. If they get 
stuck with this capacity, they won't have anything to do with it."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Vanguard (Joe Davis):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"This has been an important backstop for the economy... but 
company-level ROI still unproven."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Consensus
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Current price: $487 (down from $545 pre-earnings)&lt;/li&gt;
&lt;li&gt;Median target: $600 (23% upside)&lt;/li&gt;
&lt;li&gt;Ratings: 42 Buy, 8 Hold, 2 Sell&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Translation:&lt;/strong&gt; Wall Street is cautiously optimistic long-term but wants proof within 2-3 quarters.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Analyst Flip-Flop Chronicles: A 24-Hour Masterclass in Changing Your Mind
&lt;/h2&gt;

&lt;p&gt;Before we look ahead to 2026, let's appreciate the comedy of errors that unfolded on October 29-30. Wall Street analysts, who'd been cheerleading Meta's AI spending for months, did a collective 180° in less than 24 hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Downgrades That Weren't Supposed to Happen
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Oppenheimer&lt;/strong&gt; (Oct 30, 9:00 AM):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Previous rating:&lt;/strong&gt; Outperform ($696 price target)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New rating:&lt;/strong&gt; Perform (removed price target entirely)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Their reasoning:&lt;/strong&gt; "Significant investment in Superintelligence despite unknown revenue opportunity mirrors 2021/2022 Metaverse spending."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Wait, what?&lt;/strong&gt; The Metaverse comparison is exactly what bears had been saying for months. Oppenheimer is now citing... the exact concern they'd been dismissing?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark&lt;/strong&gt; (Oct 30, 9:15 AM):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Previous rating:&lt;/strong&gt; Buy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New rating:&lt;/strong&gt; Hold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explanation:&lt;/strong&gt; Concerns about spending trajectory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the same analysts who, 48 hours earlier, were modeling Meta at $800-900.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Price Target Massacre
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Firm&lt;/th&gt;
&lt;th&gt;Old Target&lt;/th&gt;
&lt;th&gt;New Target&lt;/th&gt;
&lt;th&gt;% Cut&lt;/th&gt;
&lt;th&gt;Timing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bank of America&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$900&lt;/td&gt;
&lt;td&gt;$810&lt;/td&gt;
&lt;td&gt;-10%&lt;/td&gt;
&lt;td&gt;Oct 30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RBC Capital&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$840&lt;/td&gt;
&lt;td&gt;$810&lt;/td&gt;
&lt;td&gt;-3.6%&lt;/td&gt;
&lt;td&gt;Oct 30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Barclays&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$850&lt;/td&gt;
&lt;td&gt;$800&lt;/td&gt;
&lt;td&gt;-5.9%&lt;/td&gt;
&lt;td&gt;Oct 30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JPMorgan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$875&lt;/td&gt;
&lt;td&gt;$825&lt;/td&gt;
&lt;td&gt;-5.7%&lt;/td&gt;
&lt;td&gt;Oct 30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wells Fargo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$900&lt;/td&gt;
&lt;td&gt;$850&lt;/td&gt;
&lt;td&gt;-5.6%&lt;/td&gt;
&lt;td&gt;Oct 30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: CNBC, Schwab Network analyst reports, Oct 30, 2025&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The irony?&lt;/strong&gt; Every single one of these firms had &lt;strong&gt;raised&lt;/strong&gt; their price targets in the previous 30 days, citing Meta's "strong AI positioning."&lt;/p&gt;

&lt;h3&gt;
  
  
  What Changed in 24 Hours?
&lt;/h3&gt;

&lt;p&gt;Absolutely nothing fundamental changed. Meta's Q3 earnings were &lt;strong&gt;strong&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Revenue beat by 3.7%&lt;/li&gt;
&lt;li&gt;EPS beat by 8.4%&lt;/li&gt;
&lt;li&gt;User growth exceeded estimates&lt;/li&gt;
&lt;li&gt;Ad business accelerating&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;only&lt;/strong&gt; new information was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;2025 capex raised by $4B (to $70-72B range)&lt;/li&gt;
&lt;li&gt;2026 will be "significantly larger"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But here's the kicker: &lt;strong&gt;We already knew this.&lt;/strong&gt; Meta had been telegraphing higher spending for months. The Blue Owl deal ($27B) was announced October 21. The Scale AI stake ($14.3B) was September 30.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Nobody Could Have Seen This Coming" Defense
&lt;/h3&gt;

&lt;p&gt;Oppenheimer's downgrade note is a masterpiece of revisionist history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Significant investment in Superintelligence despite unknown 
revenue opportunity..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Translation: "We just realized that spending $600B on AI without a business plan might be risky."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; Where was this insight on October 28, when the price target was $696?&lt;/p&gt;

&lt;h3&gt;
  
  
  What Actually Happened
&lt;/h3&gt;

&lt;p&gt;Analysts got caught in &lt;strong&gt;groupthink&lt;/strong&gt;. When Meta's stock was rising (up 60% YTD through Oct 28), AI spending was "visionary leadership." When it dropped 11% in a day, the same spending became "reckless gambling."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Tell:&lt;/strong&gt; Not a single analyst downgrade mentioned &lt;strong&gt;new&lt;/strong&gt; information. Every concern cited (ROI uncertainty, the 2026 acceleration, the Metaverse comparison) had been true for months.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Lesson
&lt;/h3&gt;

&lt;p&gt;Wall Street doesn't predict the future. It &lt;strong&gt;narrates the present&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When stock goes up → Spending is strategic&lt;br&gt;
When stock goes down → Spending is problematic&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Real MVP:&lt;/strong&gt; The Feb 2024 analyst who maintained a Hold rating the whole time. Status: Still employed, still correct.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;These aren't just embarrassing flip-flops. They expose a deeper issue: &lt;strong&gt;Most analysts can't distinguish between speculation and strategy until the market tells them which is which.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The data that spooked Wall Street on October 30 was &lt;strong&gt;available&lt;/strong&gt; on October 28. The difference wasn't the data; it was the &lt;strong&gt;narrative&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And that's why, in the next section, we're focusing on concrete metrics rather than analyst sentiment.&lt;/p&gt;
&lt;h2&gt;
  
  
  Three Key Questions for 2026
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Can Meta Launch a Paid AI Product?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Watch for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Meta AI subscription (Q1 2026 rumored)&lt;/li&gt;
&lt;li&gt;Business AI tools&lt;/li&gt;
&lt;li&gt;Developer API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Success = $10B+ ARR by end of 2026&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Does Ad Growth Accelerate or Plateau?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Watch for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Q4 2025 (holiday): 30%+ growth?&lt;/li&gt;
&lt;li&gt;Can Meta sustain 25%+ through 2026?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Success = Growth stays above 25%&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Can Meta Secure Enough Power?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Watch for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New PPAs (need 3-4 GW by mid-2026)&lt;/li&gt;
&lt;li&gt;Hyperion milestones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Success = 8-10 GW total secured by end 2026&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Dot-Com Parallels (and Differences)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Similarities
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Massive infrastructure deployment&lt;/li&gt;
&lt;li&gt;Circular financing&lt;/li&gt;
&lt;li&gt;Valuation on potential, not profits&lt;/li&gt;
&lt;li&gt;Overbuilding risk&lt;/li&gt;
&lt;li&gt;"This time is different" narratives&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Critical Differences
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dot-com:&lt;/strong&gt; Companies burned cash, no profitability path&lt;br&gt;
&lt;strong&gt;AI boom:&lt;/strong&gt; Meta made $24.7B cash flow in Q3&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dot-com:&lt;/strong&gt; Pets.com had traffic, no business model&lt;br&gt;
&lt;strong&gt;AI boom:&lt;/strong&gt; ChatGPT has 200M+ users, clear (if uncertain) monetization&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dot-com:&lt;/strong&gt; Fiber had no moat&lt;br&gt;
&lt;strong&gt;AI boom:&lt;/strong&gt; Compute creates barriers to entry&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dot-com:&lt;/strong&gt; Mostly B2C speculation&lt;br&gt;
&lt;strong&gt;AI boom:&lt;/strong&gt; B2B (Azure, AWS) has clear enterprise ROI&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Key Difference:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dot-com: Built hoping customers would come&lt;br&gt;
AI: Building because customers are here, but capacity constrained&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Risk:&lt;/strong&gt; Not that demand is fake, but that supply scales faster than monetization. You can have real demand and still overbuild.&lt;/p&gt;
&lt;h2&gt;
  
  
  My Take: Calculated Risk or Reckless Bet?
&lt;/h2&gt;

&lt;p&gt;After analyzing October 29-November 3 data:&lt;/p&gt;
&lt;h3&gt;
  
  
  What Meta Got Right
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Timing:&lt;/strong&gt; Locked $30B bonds before credit tightened&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure:&lt;/strong&gt; Off-balance-sheet (Blue Owl) preserves flexibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diversification:&lt;/strong&gt; Multiple providers (CoreWeave, Oracle, Google)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Energy:&lt;/strong&gt; Early PPAs (ENGIE) secure long-term capacity&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  What's Risky
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Acceleration:&lt;/strong&gt; "Significantly larger" 2026 with no revenue guidance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timeline:&lt;/strong&gt; 5-7 year break-even assumes aggressive growth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Circular exposure:&lt;/strong&gt; $14.3B invested in Scale AI while also a $14.2B CoreWeave customer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentiment:&lt;/strong&gt; Wall Street skeptical, needs proof points&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  The Verdict
&lt;/h3&gt;

&lt;p&gt;This isn't a traditional bubble; there's substance. But it's a &lt;strong&gt;leverage bubble&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Meta is using debt (on and off-balance-sheet) to finance infrastructure that won't generate returns for 5-10 years, betting that AI transforms advertising and creates new product categories.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If right:&lt;/strong&gt; Meta emerges as AI infrastructure leader with $50B+ in new revenue streams by 2030.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If wrong:&lt;/strong&gt; $600B in sunk costs, years of depreciation, potential activist pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Probability:&lt;/strong&gt; I'd put it at 60/40 in Meta's favor. The ad business validates near-term ROI. But the gap between spending ($80B/year) and incremental revenue ($7B/year) is uncomfortably wide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Changes My Mind:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bullish:&lt;/strong&gt; Meta launches a paid AI product that hits $5B+ ARR in 2026&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bearish:&lt;/strong&gt; Ad growth slows below 20% while capex accelerates&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;One week after Meta announced the biggest AI infrastructure bet in history, Wall Street sent a clear message: &lt;strong&gt;Show us the money&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The $30 billion bond offering and 11% stock drop crystallize the central tension in tech right now:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Companies believe:&lt;/strong&gt; AI is the future; invest now or lose forever&lt;br&gt;
&lt;strong&gt;Investors worry:&lt;/strong&gt; Returns are 5-10 years out; what if timing is wrong?&lt;/p&gt;

&lt;p&gt;Meta's bet is that by the time superintelligent AI arrives (Zuckerberg's timeline: "five, seven years, or longer"), it will have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Infrastructure capacity to train and serve models at scale&lt;/li&gt;
&lt;li&gt;Consumer products generating $50B+ annual revenue&lt;/li&gt;
&lt;li&gt;Enterprise tools capturing B2B spending&lt;/li&gt;
&lt;li&gt;Option to sell excess capacity like AWS&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The gamble:&lt;/strong&gt; All four need to work. If even one fails, the ROI math breaks.&lt;/p&gt;

&lt;p&gt;Over the next 12-18 months, we'll find out whether Zuckerberg's vision of "aggressively front-loading capacity" was genius or the most expensive mistake in tech history.&lt;/p&gt;
&lt;h2&gt;
  
  
  Epilogue: The House Always Wins (Unless You're the House)
&lt;/h2&gt;

&lt;p&gt;Let's recap what just happened in one week:&lt;/p&gt;

&lt;p&gt;Mark Zuckerberg:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spent $30 billion (borrowed)&lt;/li&gt;
&lt;li&gt;Committed to spending "significantly larger" amounts in 2026&lt;/li&gt;
&lt;li&gt;Gave no revenue targets&lt;/li&gt;
&lt;li&gt;Watched $200+ billion in market cap evaporate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;His defense? "Better to overbuild than underbuild."&lt;/p&gt;

&lt;p&gt;That's not a business strategy. That's a slot machine philosophy.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Zuckerberg Gambler's Playbook
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional investor:&lt;/strong&gt; "Show me the ROI projections."&lt;br&gt;
&lt;strong&gt;Zuckerberg:&lt;/strong&gt; "If we overbuild, we can sell the excess."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall Street:&lt;/strong&gt; "When does this become profitable?"&lt;br&gt;
&lt;strong&gt;Zuckerberg:&lt;/strong&gt; "In the very worst case, we'd have excess capacity for several years."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysts:&lt;/strong&gt; "Can you quantify the returns?"&lt;br&gt;
&lt;strong&gt;Zuckerberg:&lt;/strong&gt; "We're seeing the returns in the core business." (Refuses to elaborate)&lt;/p&gt;

&lt;p&gt;This is the equivalent of doubling down at the blackjack table and saying, "Don't worry, if I lose, I can always get a job as a dealer."&lt;/p&gt;
&lt;h3&gt;
  
  
  The $600 Billion Bet
&lt;/h3&gt;

&lt;p&gt;Here's what Meta is actually doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Year 1-3: Spend $600B
Year 4-7: Hope AI revenue materializes
Year 8-10: Break even (maybe)
Year 11+: Profit (fingers crossed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Normal companies call this "speculation."&lt;br&gt;
Tech CEOs call it "superintelligence."&lt;br&gt;
Investors call it "Thursday."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Irony
&lt;/h3&gt;

&lt;p&gt;The funniest part? Zuckerberg's previous big bet was the Metaverse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2021-2022 Metaverse spending:&lt;/strong&gt; ~$36 billion&lt;br&gt;
&lt;strong&gt;Analyst reaction then:&lt;/strong&gt; "This is reckless."&lt;br&gt;
&lt;strong&gt;Zuckerberg's response:&lt;/strong&gt; "You don't understand the vision."&lt;br&gt;
&lt;strong&gt;Outcome:&lt;/strong&gt; Legs on avatars, eventually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2025-2026 AI spending:&lt;/strong&gt; ~$600 billion&lt;br&gt;
&lt;strong&gt;Analyst reaction now:&lt;/strong&gt; "This is reckless."&lt;br&gt;
&lt;strong&gt;Zuckerberg's response:&lt;/strong&gt; "You don't understand superintelligence."&lt;br&gt;
&lt;strong&gt;Outcome:&lt;/strong&gt; TBD 2033.&lt;/p&gt;

&lt;p&gt;The man literally got roasted for the Metaverse, watched his stock crater 70%, and his takeaway was: &lt;strong&gt;"I should bet 17× more on the next thing."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's not learning from mistakes. That's speedrunning them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Might Actually Work
&lt;/h3&gt;

&lt;p&gt;Here's the uncomfortable truth: Zuckerberg might be right.&lt;/p&gt;

&lt;p&gt;Not because the AI bet is smart (jury's out), but because &lt;strong&gt;he can afford to be wrong&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Meta generates $100 billion in annual cash flow. Even if AI flops completely, the ad business prints money. The company won't go bankrupt.&lt;/p&gt;

&lt;p&gt;Zuckerberg is playing with house money. The house being Instagram Reels ads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The calculation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Downside: $600B sunk cost over 5 years (he has it)&lt;/li&gt;
&lt;li&gt;Upside: Dominant position in AI infrastructure (priceless?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a company worth $1.7 trillion (pre-crash), betting $600B on the future isn't gambling.&lt;/p&gt;

&lt;p&gt;It's &lt;strong&gt;insurance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The question isn't whether Meta can afford the bet. It's whether &lt;strong&gt;shareholders&lt;/strong&gt; want to watch $600 billion disappear into data centers for 5-7 years before seeing returns.&lt;/p&gt;

&lt;p&gt;October 30's -11% drop was their answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Lesson
&lt;/h3&gt;

&lt;p&gt;Wall Street hates uncertainty. Zuckerberg thrives in it.&lt;/p&gt;

&lt;p&gt;He's not running a public company optimized for quarterly earnings. He's running a private kingdom that happens to be publicly traded.&lt;/p&gt;

&lt;p&gt;And because he controls 61% of voting shares through dual-class stock, he doesn't need your permission.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You (retail investor): Own shares, no voting power&lt;/li&gt;
&lt;li&gt;Zuckerberg: Owns shares, controls Meta completely&lt;/li&gt;
&lt;li&gt;The bet: $600B on AI with no timeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your options:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hold and pray&lt;/li&gt;
&lt;li&gt;Sell and watch from sidelines&lt;/li&gt;
&lt;li&gt;Short and bet against superintelligence&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Choose wisely. The house (Mark's house) always wins.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Postscript:&lt;/strong&gt; If you're reading this in 2033 and Meta is the AI infrastructure leader generating $200B/year from selling compute... Zuck, I'm available for consulting. My rates are reasonable.&lt;/p&gt;

&lt;p&gt;If you're reading this in 2033 and Meta is explaining why they need to write down $400B in stranded data center assets... I told you so. The invoice is in the mail.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://investor.fb.com/financials/" rel="noopener noreferrer"&gt;Meta Q3 2025 Earnings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pitchbook.com/news/articles/meta-completes-massive-30b-bond-deal-amid-industry-wide-ai-land-grab-scramble" rel="noopener noreferrer"&gt;Meta's $30B Bond Deal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/core-scientific-terminates-9-billion-merger-deal-with-coreweave-2025-10-30/" rel="noopener noreferrer"&gt;CoreWeave-Core Scientific Deal Terminated&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bloomberg.com/news/articles/2025-10-31/meta-xai-starting-trend-for-billions-in-off-balance-sheet-debt" rel="noopener noreferrer"&gt;Off-Balance-Sheet Debt Trend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.axios.com/2025/10/30/ai-capex-google-microsoft-meta" rel="noopener noreferrer"&gt;Axios: AI Boom Context&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2025/10/30/meta-stock-earnings-ai-spend.html" rel="noopener noreferrer"&gt;CNBC: Market Reaction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev//blog/meta-75b-ai-infrastructure-bet" rel="noopener noreferrer"&gt;Meta's $75B AI Infrastructure Bet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allenarch.dev//blog/building-distributed-cron-cloudflare-workers" rel="noopener noreferrer"&gt;Building a Distributed Cron System That Scales to 1000+ Users&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>meta</category>
    </item>
    <item>
      <title>Frontend Roundup October 2025: Next.js 16, Astro 5.15, Node 22 LTS, CSS View Transitions, and Vite+</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Wed, 29 Oct 2025 13:37:08 +0000</pubDate>
      <link>https://dev.to/0xrelogic/frontend-roundup-october-2025-nextjs-16-astro-515-node-22-lts-css-view-transitions-and-vite-2lce</link>
      <guid>https://dev.to/0xrelogic/frontend-roundup-october-2025-nextjs-16-astro-515-node-22-lts-css-view-transitions-and-vite-2lce</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Originally published on &lt;a href="https://allenarch.dev/blog/frontend-roundup-october-2025" rel="noopener noreferrer"&gt;My Blog&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For code examples, benchmarks, and detailed implementation guides, check out the full article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;October 2025 was packed with major frontend releases. I spent a few days testing these updates in production, and some of the performance improvements are genuinely impressive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next.js 16: Turbopack Goes Stable
&lt;/h2&gt;

&lt;p&gt;Next.js 16 dropped on October 21, 2025, and Turbopack is finally the default bundler. I tested it on a project with 200+ components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Development startup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Webpack: ~1083ms&lt;/li&gt;
&lt;li&gt;Turbopack: ~603ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's &lt;strong&gt;44% faster&lt;/strong&gt;. But the real win is production builds:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production builds:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Webpack: ~45 seconds&lt;/li&gt;
&lt;li&gt;Turbopack: ~17 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2.6x faster&lt;/strong&gt; builds. This isn't a micro-optimization—it's a game changer for daily workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache Components
&lt;/h3&gt;

&lt;p&gt;Next.js 16 also introduces Cache Components, an evolution of Partial Prerendering (PPR) that's more explicit and controllable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/actions.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unstable_cacheLife&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cacheLife&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;revalidateProducts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;products&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;cacheLife&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;stale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revalidate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I like: &lt;strong&gt;no more implicit caching&lt;/strong&gt;. Everything is opt-in. Dynamic code executes at request time unless you explicitly cache it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Breaking Changes
&lt;/h3&gt;

&lt;p&gt;Watch out for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20.9+ required (Node 18 dropped)&lt;/li&gt;
&lt;li&gt;Async params everywhere&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;middleware.ts&lt;/code&gt; → &lt;code&gt;proxy.ts&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
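&lt;p&gt;The async-params change is the one most likely to break existing code. A minimal standalone sketch of the before/after (the function name and object shapes here are illustrative, not actual Next.js APIs):&lt;/p&gt;

```typescript
// Next.js 15: `params` was a plain object, read synchronously.
// Next.js 16: `params` arrives as a Promise and must be awaited.
// This sketch only mimics that shape; it is not the real framework runtime.
export async function generateTitle({ params }: { params: any }) {
  const { slug } = await params; // awaiting is now required
  return `Post: ${slug}`;
}

// Simulate the framework handing in a promise-wrapped params object:
generateTitle({ params: Promise.resolve({ slug: "hello-world" }) })
  .then((title) => console.log(title)); // logs "Post: hello-world"
```

&lt;p&gt;In a real migration, the same await applies to &lt;code&gt;searchParams&lt;/code&gt; and to params in layouts, route handlers, and metadata functions.&lt;/p&gt;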

&lt;p&gt;&lt;em&gt;A detailed migration guide with production examples is coming soon.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Astro 5.15: Deployment Skew Protection
&lt;/h2&gt;

&lt;p&gt;Released October 23, 2025. This solves a subtle but annoying problem: users loading old client assets while the server runs new code.&lt;/p&gt;

&lt;p&gt;Real scenario I hit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy with breaking API changes&lt;/li&gt;
&lt;li&gt;Users still have old JavaScript cached&lt;/li&gt;
&lt;li&gt;API calls fail, error messages unclear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Astro 5.15 fixes this by automatically including deployment IDs in asset requests when deploying to Netlify. Zero config needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Access deployment ID manually if needed&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deploymentId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NETLIFY_DEPLOYMENT_ID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works seamlessly across View Transitions, Server Islands, Prefetch, and Astro Actions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Full implementation guide with adapter customization coming soon.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Node.js 22 LTS
&lt;/h2&gt;

&lt;p&gt;Node 22.21.0 became LTS on October 20, 2025. Supported until April 2027.&lt;/p&gt;

&lt;p&gt;Most notable: &lt;strong&gt;native proxy support&lt;/strong&gt; for HTTP/HTTPS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NODE_USE_ENV_PROXY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;HTTPS_PROXY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://proxy.local:8080
node app.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise environments with corporate proxies&lt;/li&gt;
&lt;li&gt;Development behind firewalls&lt;/li&gt;
&lt;li&gt;Testing with proxy tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also underrated: &lt;code&gt;--max-old-space-size&lt;/code&gt; now accepts percentages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node &lt;span class="nt"&gt;--max-old-space-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50% app.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect for containerized environments with dynamic memory allocation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Complete feature breakdown and upgrade checklist coming soon.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CSS View Transitions: Now Baseline
&lt;/h2&gt;

&lt;p&gt;October 14, 2025: View Transitions officially reached "Baseline Newly available" status. Supported in all major browsers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="c"&gt;/* Enable for page navigation */&lt;/span&gt;
&lt;span class="k"&gt;@view-transition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="py"&gt;navigation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or trigger manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;startViewTransition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startViewTransition&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;filterItems&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Firefox 144 finally implemented it, meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chrome 111+&lt;/li&gt;
&lt;li&gt;Edge 111+&lt;/li&gt;
&lt;li&gt;Firefox 144+&lt;/li&gt;
&lt;li&gt;Safari 16.4+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Always wrap in &lt;code&gt;prefers-reduced-motion&lt;/code&gt; for accessibility. Users with vestibular disorders can get sick from excessive motion.&lt;/p&gt;
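&lt;p&gt;One way to honor that preference in pure CSS is to cancel the transition animations when the user opts out of motion; a minimal sketch:&lt;/p&gt;

```css
/* Respect prefers-reduced-motion: cancel view-transition animations
   for users who opt out of motion. */
@media (prefers-reduced-motion: reduce) {
  ::view-transition-group(*),
  ::view-transition-old(*),
  ::view-transition-new(*) {
    animation: none !important;
  }
}
```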

&lt;p&gt;&lt;em&gt;Complete guide with practical examples coming soon.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Vite+: Unified JavaScript Toolchain
&lt;/h2&gt;

&lt;p&gt;Evan You announced Vite+ on October 13, 2025 at ViteConf Amsterdam. This isn't just another Vite update—it's a &lt;strong&gt;unified toolchain&lt;/strong&gt; aiming to solve JavaScript's fragmentation problem.&lt;/p&gt;

&lt;p&gt;Current landscape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different bundlers (Webpack, Rollup, esbuild, Turbopack)&lt;/li&gt;
&lt;li&gt;Different dev servers&lt;/li&gt;
&lt;li&gt;Different test runners&lt;/li&gt;
&lt;li&gt;Different plugin systems&lt;/li&gt;
&lt;li&gt;Different configs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vite+ provides an all-in-one solution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vite new      &lt;span class="c"&gt;# Scaffold projects&lt;/span&gt;
vite &lt;span class="nb"&gt;test&lt;/span&gt;     &lt;span class="c"&gt;# Run Vitest&lt;/span&gt;
vite build    &lt;span class="c"&gt;# Production builds&lt;/span&gt;
vite dev      &lt;span class="c"&gt;# Dev server&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project scaffolding&lt;/li&gt;
&lt;li&gt;Vitest integration (Jest-compatible)&lt;/li&gt;
&lt;li&gt;Browser mode testing&lt;/li&gt;
&lt;li&gt;Visual regression testing&lt;/li&gt;
&lt;li&gt;Sharding support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pricing:&lt;/strong&gt; Free for non-commercial use, paid for commercial projects. Core tools (Vite, Vitest, Oxc) stay open-source.&lt;/p&gt;

&lt;p&gt;Still in development, targeting &lt;strong&gt;public preview early 2026&lt;/strong&gt;. Early access at &lt;a href="https://viteplus.dev" rel="noopener noreferrer"&gt;viteplus.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Deep dive into Vite+ architecture coming soon.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Performance is real.&lt;/strong&gt; Next.js 16 with Turbopack isn't hype—44% faster startup and 2.6x faster builds make a noticeable difference in daily workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment reliability matters.&lt;/strong&gt; Astro's skew protection solves problems you don't realize you have until you hit weird production bugs. Zero-config solutions like this are underrated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tooling consolidation is happening.&lt;/strong&gt; Vite+ shows the industry recognizing the fragmentation problem. A unified toolchain would significantly improve developer experience.&lt;/p&gt;




&lt;p&gt;I'm working on detailed implementation guides for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next.js 16 migration strategies&lt;/li&gt;
&lt;li&gt;Astro 5.15 skew protection testing&lt;/li&gt;
&lt;li&gt;Node 22 LTS production deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://allenarch.dev/blog/frontend-roundup-october-2025" rel="noopener noreferrer"&gt;Read the full article with all code examples, benchmarks, and resources →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Connect:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Blog: &lt;a href="https://allenarch.dev" rel="noopener noreferrer"&gt;allenarch.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>frontend</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>Meta's $75B AI Infrastructure Bet: Inside the Biggest Cloud Deals of 2025</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Tue, 28 Oct 2025 03:38:36 +0000</pubDate>
      <link>https://dev.to/0xrelogic/metas-75b-ai-infrastructure-bet-inside-the-biggest-cloud-deals-of-2025-1mp7</link>
      <guid>https://dev.to/0xrelogic/metas-75b-ai-infrastructure-bet-inside-the-biggest-cloud-deals-of-2025-1mp7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Originally published on &lt;a href="https://allenarch.dev/blog/meta-75b-ai-infrastructure-bet" rel="noopener noreferrer"&gt;My Blog&lt;/a&gt;&lt;/strong&gt;  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I watched Meta's capital expenditure ratio hit 37% of revenue in Q3 2025.&lt;/p&gt;

&lt;p&gt;That's not a typo. Meta is now spending more than a third of every dollar it makes on AI infrastructure. For context, that's nearly double what they spent last year (20%), and it's the highest capex-to-revenue ratio in the company's history.&lt;/p&gt;

&lt;p&gt;But here's what really caught my attention: in just three months (September to October 2025), Meta announced &lt;strong&gt;$75.5 billion&lt;/strong&gt; in infrastructure deals. That's more than most countries spend on their entire tech sector in a decade.&lt;/p&gt;

&lt;h2&gt;
  
  
  The $75 Billion Question Nobody's Asking
&lt;/h2&gt;

&lt;p&gt;When Mark Zuckerberg talks about building "superintelligent" AI systems, most people focus on the models. But the real story is in the infrastructure, and the unprecedented way Meta is financing it.&lt;/p&gt;

&lt;p&gt;Between September 30 and October 27, 2025, Meta signed four massive deals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CoreWeave&lt;/strong&gt;: $14.2 billion (6+ years)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Oracle&lt;/strong&gt;: ~$20 billion (multi-year)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale AI&lt;/strong&gt;: $14.3 billion (49% stake)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blue Owl/Hyperion&lt;/strong&gt;: $27 billion (joint venture)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Total: $75.5 billion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For perspective, that's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Roughly 27% of Netflix's entire market cap ($280B)&lt;/li&gt;
&lt;li&gt;Equivalent to building 15 nuclear power plants&lt;/li&gt;
&lt;li&gt;Enough to buy every data center in Ireland twice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But what's really interesting isn't the size; it's the structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Private Credit Revolution: How Meta Broke Wall Street's Playbook
&lt;/h2&gt;

&lt;p&gt;Here's where it gets fascinating.&lt;/p&gt;

&lt;p&gt;Traditional tech infrastructure spending works like this: Company makes money → Company spends cash on servers → Company owns infrastructure.&lt;/p&gt;

&lt;p&gt;Meta just rewrote that playbook.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hyperion Deal: A New Financial Model
&lt;/h3&gt;

&lt;p&gt;On October 21, 2025, Meta announced a joint venture with Blue Owl Capital for the Hyperion data center in Louisiana. The structure is unlike anything I've seen in tech:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ownership Split:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Blue Owl: 80%&lt;/li&gt;
&lt;li&gt;Meta: 20%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Financing Structure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Morgan Stanley arranged $27B+ debt&lt;/li&gt;
&lt;li&gt;$2.5B equity into Special Purpose Vehicle (SPV)&lt;/li&gt;
&lt;li&gt;PIMCO as anchor lender (144A bonds, maturing 2049)&lt;/li&gt;
&lt;li&gt;Meta received $3B cash distribution upfront&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Kicker:&lt;/strong&gt;&lt;br&gt;
Meta doesn't own most of it, but they're on the hook for 16 years through a residual value guarantee.&lt;/p&gt;

&lt;p&gt;Think about that: Meta is essentially leasing infrastructure they're building, financed by private credit, with a 16-year financial commitment.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;This isn't just creative accounting; it's a fundamental shift in how tech infrastructure gets built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional Model:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Revenue → CapEx → Owned Assets → Depreciation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;New Model:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Revenue → JV Partnership → Leased Assets → Operating Expense
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The implications:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Balance sheet stays cleaner&lt;/strong&gt; (assets off-balance-sheet)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster deployment&lt;/strong&gt; (external capital accelerates build)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared risk&lt;/strong&gt; (partners absorb some downside)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher leverage&lt;/strong&gt; (debt financing amplifies scale)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But there's a catch: if AI doesn't deliver ROI, Meta is still paying rent for 16 years.&lt;/p&gt;
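&lt;p&gt;A rough back-of-envelope of that obligation, using only the deal figures above (the real amortization schedule, interest, and escalators are not public, so this is an assumption-laden sketch):&lt;/p&gt;

```python
# Spread the ~$27B Hyperion debt over the 16-year commitment to get an
# implied annual obligation. Ignores interest and the actual repayment
# schedule, neither of which has been disclosed.
debt_total_b = 27.0       # $B, Morgan Stanley-arranged debt
commitment_years = 16     # residual value guarantee period

implied_annual_b = debt_total_b / commitment_years
print(f"Implied annual obligation: ~${implied_annual_b:.1f}B/year")
# -> Implied annual obligation: ~$1.7B/year
```

&lt;p&gt;Call it roughly $1.7B a year of rent-like commitment, whether or not the AI revenue shows up.&lt;/p&gt;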

&lt;h2&gt;
  
  
  The Technical Specs: What $75B Actually Buys
&lt;/h2&gt;

&lt;p&gt;Let me break down what Meta is actually getting for this money.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hyperion Data Center (Louisiana)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scale:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Size&lt;/strong&gt;: 2,250 acres (1,700 football fields)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power&lt;/strong&gt;: 2 gigawatts (2,000 megawatts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completion&lt;/strong&gt;: 2030&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location&lt;/strong&gt;: Richland Parish, Louisiana (between Rayville and Delhi)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For Context:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2GW is enough to power 1.5 million homes&lt;/li&gt;
&lt;li&gt;That's more power than some small countries use&lt;/li&gt;
&lt;li&gt;Requires dedicated power infrastructure (Entergy is building a $1.2B transmission line)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What It Can Do:&lt;/strong&gt;&lt;br&gt;
Train multiple GPT-4-scale models simultaneously. We're talking about infrastructure that can handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hundreds of thousands of GPUs&lt;/li&gt;
&lt;li&gt;Petabytes of training data&lt;/li&gt;
&lt;li&gt;Months-long training runs without interruption&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  CoreWeave Infrastructure
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nvidia GB300 server racks&lt;/li&gt;
&lt;li&gt;72 Blackwell GPUs per rack&lt;/li&gt;
&lt;li&gt;Access through December 2031 (optional extension to 2032)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why GB300 Matters:&lt;/strong&gt;&lt;br&gt;
The Blackwell architecture (GB300) is Nvidia's latest generation. Each GPU delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2.5× performance vs previous gen (H100)&lt;/li&gt;
&lt;li&gt;Better power efficiency (critical for multi-GW facilities)&lt;/li&gt;
&lt;li&gt;Native support for FP4 precision (faster inference)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-World Impact:&lt;/strong&gt;&lt;br&gt;
According to CoreWeave's SEC filing (September 30, 2025), this infrastructure can reduce training time for large models from months to weeks.&lt;/p&gt;
&lt;h3&gt;
  
  
  Oracle Cloud Computing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Confirmed Details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$20 billion multi-year contract&lt;/li&gt;
&lt;li&gt;Part of $65 billion in Oracle Cloud Infrastructure (OCI) bookings in a single 30-day period&lt;/li&gt;
&lt;li&gt;Announced October 16, 2025&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Oracle's Projections:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FY2030: $166 billion cloud infrastructure revenue&lt;/li&gt;
&lt;li&gt;Cloud gross margins: 35% target&lt;/li&gt;
&lt;li&gt;AI database revenue: $20B by FY2030&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Meta Gets:&lt;/strong&gt;&lt;br&gt;
Flexible cloud capacity for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inference workloads (serving AI models to users)&lt;/li&gt;
&lt;li&gt;Distributed training (across multiple data centers)&lt;/li&gt;
&lt;li&gt;Backup and redundancy (if owned infrastructure fails)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Energy Infrastructure: The Hidden Cost
&lt;/h3&gt;

&lt;p&gt;On October 27, 2025, Meta signed a deal with ENGIE for &lt;strong&gt;1.3 GW of solar power&lt;/strong&gt; across four Texas projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Project: Swenson Ranch Solar&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capacity&lt;/strong&gt;: 600 MW&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location&lt;/strong&gt;: Stonewall County, Texas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta's Commitment&lt;/strong&gt;: 100% of output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational&lt;/strong&gt;: 2027&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Significance&lt;/strong&gt;: ENGIE's largest solar project (11 GW total portfolio)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters:&lt;/strong&gt;&lt;br&gt;
AI training is energy-intensive. A single training run for a large language model can consume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,287 MWh (equivalent to 120 US homes for a year)&lt;/li&gt;
&lt;li&gt;Cost: $100,000+ in electricity alone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: Stanford HAI Report, 2024&lt;/p&gt;
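&lt;p&gt;As a quick sanity check on that cost figure, the arithmetic is trivial; the ~$80/MWh industrial electricity rate below is my assumption, not from the report:&lt;/p&gt;

```python
# Sanity check on the training-run electricity cost cited above.
# The ~$80/MWh industrial rate is an assumed round number, not a sourced figure.
TRAINING_RUN_MWH = 1_287      # Stanford HAI figure for one large training run
PRICE_PER_MWH = 80.0          # assumed USD per MWh (industrial rate)

def training_electricity_cost(mwh: float, price_per_mwh: float) -> float:
    """Electricity cost in USD for a training run."""
    return mwh * price_per_mwh

cost = training_electricity_cost(TRAINING_RUN_MWH, PRICE_PER_MWH)
print(f"~${cost:,.0f}")  # lands in the $100,000+ ballpark the article cites
```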

&lt;p&gt;With 1.3 GW of dedicated solar, Meta can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce energy costs by ~30% vs grid power&lt;/li&gt;
&lt;li&gt;Meet sustainability commitments&lt;/li&gt;
&lt;li&gt;Insulate against energy price volatility&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Numbers That Made Me Skeptical
&lt;/h2&gt;

&lt;p&gt;I'm a numbers person, so let's talk about the economics.&lt;/p&gt;
&lt;h3&gt;
  
  
  Meta's Spending Trajectory
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Total CapEx&lt;/th&gt;
&lt;th&gt;% of Revenue&lt;/th&gt;
&lt;th&gt;YoY Growth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;~$50B&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;$66-72B&lt;/td&gt;
&lt;td&gt;37%&lt;/td&gt;
&lt;td&gt;+44%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026 (est)&lt;/td&gt;
&lt;td&gt;~$97B&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;+35%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: Meta Q3 2025 Earnings, Wall Street consensus estimates&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Through 2028:&lt;/strong&gt;&lt;br&gt;
Meta has outlined plans to invest &lt;strong&gt;$600 billion&lt;/strong&gt; in US data centers and infrastructure.&lt;/p&gt;

&lt;p&gt;That's $200 billion per year for three years.&lt;/p&gt;
&lt;h3&gt;
  
  
  The ROI Question
&lt;/h3&gt;

&lt;p&gt;Here's where it gets uncomfortable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current AI Revenue:&lt;/strong&gt;&lt;br&gt;
Meta doesn't break out AI-specific revenue, but analysts estimate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-enhanced ads: ~$5-8B incremental revenue (2025)&lt;/li&gt;
&lt;li&gt;Direct AI products: Minimal (mostly free)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Costs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2025 CapEx: $66-72B&lt;/li&gt;
&lt;li&gt;Operating costs: Additional $10-15B/year (estimated)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Simple Math:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Revenue from AI: ~$7B
Cost of AI infrastructure: ~$80B
Net: -$73B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's roughly a 10:1 cost-to-revenue ratio.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Does This Pay Off?
&lt;/h3&gt;

&lt;p&gt;Meta's bet is that AI will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Improve ad targeting&lt;/strong&gt; (higher CPMs, better conversion)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable new products&lt;/strong&gt; (AI assistants, business tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create infrastructure business&lt;/strong&gt; (sell excess capacity)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Bull Case Timeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2026-2027: AI products launch, revenue ramps&lt;/li&gt;
&lt;li&gt;2028-2029: Infrastructure-as-a-service business scales&lt;/li&gt;
&lt;li&gt;2030+: Positive ROI on cumulative investment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bear Case:&lt;/strong&gt;&lt;br&gt;
AI doesn't deliver proportional revenue growth, and Meta is stuck with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$600B in sunk costs&lt;/li&gt;
&lt;li&gt;16-year lease commitments&lt;/li&gt;
&lt;li&gt;Massive depreciation expenses&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Bubble Question: Are We in Dot-Com 2.0?
&lt;/h2&gt;

&lt;p&gt;I had to ask: is this sustainable?&lt;/p&gt;
&lt;h3&gt;
  
  
  The Bull Case: "It's Different This Time"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Argument 1: Cash Flow&lt;/strong&gt;&lt;br&gt;
Unlike dot-com startups, today's AI giants generate massive cash:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Meta Q3 2025 operating cash flow: $24.7B&lt;/li&gt;
&lt;li&gt;Microsoft: $30B+ per quarter&lt;/li&gt;
&lt;li&gt;Google: $25B+ per quarter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: Company earnings reports, Q3 2025&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Argument 2: Real Demand&lt;/strong&gt;&lt;br&gt;
AI infrastructure is being used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ChatGPT: 200M+ weekly active users&lt;/li&gt;
&lt;li&gt;GitHub Copilot: 1.8M+ paid subscribers&lt;/li&gt;
&lt;li&gt;Enterprise AI: $50B+ market (2025)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: OpenAI Blog, GitHub Stats&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Argument 3: Defensive Necessity&lt;/strong&gt;&lt;br&gt;
Companies aren't spending because they want to; they're spending because they have to. If Meta doesn't build this infrastructure, Google or Microsoft will, and Meta loses competitive position.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Bear Case: "Circular Financing Red Flags"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concern 1: Vendor Financing&lt;/strong&gt;&lt;br&gt;
Nvidia is investing in its customers (OpenAI, CoreWeave), who then buy Nvidia chips. That's circular.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nvidia invests $5B in Intel (October 2025)&lt;/li&gt;
&lt;li&gt;Nvidia commits $6.3B to buy CoreWeave capacity through 2032&lt;/li&gt;
&lt;li&gt;CoreWeave uses that to buy more Nvidia chips&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DA Davidson analyst Gil Luria told Yahoo Finance (October 14, 2025):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"They're using that capital to raise debt. It's the levering up that's the truly unhealthy behavior."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Concern 2: Valuation Disconnect&lt;/strong&gt;&lt;br&gt;
CoreWeave market cap: $67 billion&lt;br&gt;
CoreWeave contracted revenue: $43 billion (OpenAI + Meta + Nvidia)&lt;/p&gt;

&lt;p&gt;That's a 1.6× revenue multiple for a company that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Isn't profitable yet&lt;/li&gt;
&lt;li&gt;Has 71% revenue concentration (Microsoft)&lt;/li&gt;
&lt;li&gt;Operates in a capital-intensive business&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Concern 3: Macro Dependency&lt;/strong&gt;&lt;br&gt;
Deutsche Bank analysis (September 2025) found:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Without AI-related investment, the US economy might already be in a recession."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI spending accounted for &lt;strong&gt;1.1% of US GDP growth&lt;/strong&gt; in H1 2025.&lt;/p&gt;

&lt;p&gt;Source: Deutsche Bank Research, September 2025&lt;/p&gt;

&lt;p&gt;If AI spending slows, it could trigger broader economic weakness.&lt;/p&gt;
&lt;h3&gt;
  
  
  My Take: Bubble with Substance
&lt;/h3&gt;

&lt;p&gt;Here's what I think after digging through the data:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Yes, there are bubble characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Circular financing&lt;/li&gt;
&lt;li&gt;Aggressive valuations&lt;/li&gt;
&lt;li&gt;Hype-driven investment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;But there's real substance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Actual products with millions of users&lt;/li&gt;
&lt;li&gt;Cash-generative businesses (not dot-com burn rates)&lt;/li&gt;
&lt;li&gt;Infrastructure that will be useful regardless (compute demand is real)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Risk:&lt;/strong&gt;&lt;br&gt;
Not that AI is fake, but that the &lt;strong&gt;timing and scale&lt;/strong&gt; of returns don't match the &lt;strong&gt;timing and scale&lt;/strong&gt; of investment.&lt;/p&gt;

&lt;p&gt;Meta might be right about AI's importance but wrong about how quickly it generates revenue.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Competitive Landscape: Who's Spending What
&lt;/h2&gt;

&lt;p&gt;Meta isn't alone in this infrastructure arms race.&lt;/p&gt;
&lt;h3&gt;
  
  
  Big Tech AI CapEx (2025)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;2025 CapEx&lt;/th&gt;
&lt;th&gt;AI Focus&lt;/th&gt;
&lt;th&gt;Key Deals&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$66-72B&lt;/td&gt;
&lt;td&gt;37% of revenue&lt;/td&gt;
&lt;td&gt;CoreWeave, Oracle, Blue Owl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microsoft&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$125B (FY26 est)&lt;/td&gt;
&lt;td&gt;Azure AI, Copilot&lt;/td&gt;
&lt;td&gt;OpenAI partnership&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$82.4B&lt;/td&gt;
&lt;td&gt;TPUs, Anthropic&lt;/td&gt;
&lt;td&gt;1M TPU chips to Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$75B&lt;/td&gt;
&lt;td&gt;AWS AI services&lt;/td&gt;
&lt;td&gt;Trainium chips&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Apple&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$30B&lt;/td&gt;
&lt;td&gt;On-device AI&lt;/td&gt;
&lt;td&gt;Apple Silicon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: Company earnings, Bank of America estimates, FactSet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total: ~$400 billion&lt;/strong&gt; in AI infrastructure spending across Big Tech in 2025.&lt;/p&gt;
&lt;h3&gt;
  
  
  The CoreWeave Factor
&lt;/h3&gt;

&lt;p&gt;CoreWeave has become the infrastructure kingmaker:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Major Contracts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI: $22.4 billion&lt;/li&gt;
&lt;li&gt;Meta: $14.2 billion&lt;/li&gt;
&lt;li&gt;Nvidia: $6.3 billion (backstop)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Total: $42.9 billion&lt;/strong&gt; in contracted revenue through 2032.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stock Performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IPO: March 2025&lt;/li&gt;
&lt;li&gt;Current: +235% YTD&lt;/li&gt;
&lt;li&gt;Market cap: $67 billion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's the concern: 71% of revenue comes from Microsoft (Q2 2025). If Microsoft builds its own infrastructure or switches providers, CoreWeave's business model breaks.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I Learned: Three Key Insights
&lt;/h2&gt;

&lt;p&gt;After spending weeks researching this, three things stand out:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Infrastructure Is the New Moat
&lt;/h3&gt;

&lt;p&gt;In the AI era, the competitive advantage isn't just algorithms; it's infrastructure.&lt;/p&gt;

&lt;p&gt;Meta can't compete with OpenAI on model quality alone. But if Meta has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2GW of dedicated compute&lt;/li&gt;
&lt;li&gt;Exclusive access to latest Nvidia chips&lt;/li&gt;
&lt;li&gt;Vertically integrated stack (data centers → models → products)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then Meta can iterate faster, train bigger models, and serve more users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Insight:&lt;/strong&gt;&lt;br&gt;
AI is becoming an infrastructure business, not just a software business.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Private Credit Is Reshaping Tech Finance
&lt;/h3&gt;

&lt;p&gt;The Hyperion deal represents a new model: &lt;strong&gt;infrastructure-as-a-service financed by private credit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Traditional tech companies avoided debt (except Apple). But AI infrastructure is so capital-intensive that equity financing alone can't scale fast enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Shift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Old: Equity → CapEx → Owned assets&lt;/li&gt;
&lt;li&gt;New: Debt → JV → Leased assets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Risk:&lt;/strong&gt;&lt;br&gt;
If AI doesn't deliver, tech companies are stuck with debt obligations they can't service.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. The Winner Isn't Clear Yet
&lt;/h3&gt;

&lt;p&gt;Everyone's spending billions, but nobody knows what the winning AI product looks like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current Revenue Leaders:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ChatGPT Plus: ~$2B ARR (estimated)&lt;/li&gt;
&lt;li&gt;GitHub Copilot: ~$1B ARR&lt;/li&gt;
&lt;li&gt;Midjourney: ~$500M ARR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Spending:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Meta: $75B in deals (Q3-Q4 2025)&lt;/li&gt;
&lt;li&gt;Microsoft: $125B (FY26)&lt;/li&gt;
&lt;li&gt;Google: $82B (2025)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Math:&lt;/strong&gt;&lt;br&gt;
$400B+ in infrastructure spending to support ~$10B in AI product revenue.&lt;/p&gt;

&lt;p&gt;That's a 40:1 investment-to-revenue ratio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Question:&lt;/strong&gt;&lt;br&gt;
Will AI revenue scale 40× in the next 3-5 years? Or will infrastructure spending collapse when companies realize the ROI isn't there?&lt;/p&gt;
&lt;h2&gt;
  
  
  The Trade-offs: What Meta Is Betting On
&lt;/h2&gt;

&lt;p&gt;Let me be honest about what could go wrong.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Good
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Scale Advantage&lt;/strong&gt;&lt;br&gt;
With 2GW+ of dedicated infrastructure, Meta can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train models faster than competitors&lt;/li&gt;
&lt;li&gt;Serve billions of users without cloud costs&lt;/li&gt;
&lt;li&gt;Experiment with new architectures cheaply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Vertical Integration&lt;/strong&gt;&lt;br&gt;
Owning the full stack (data centers → chips → models → products) means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No vendor lock-in&lt;/li&gt;
&lt;li&gt;Better margins long-term&lt;/li&gt;
&lt;li&gt;Faster iteration cycles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Optionality&lt;/strong&gt;&lt;br&gt;
Even if Meta's AI products fail, the infrastructure has value:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sell excess capacity (like AWS)&lt;/li&gt;
&lt;li&gt;Lease to other companies&lt;/li&gt;
&lt;li&gt;Repurpose for other workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Not-So-Good
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Massive Capital Commitment&lt;/strong&gt;&lt;br&gt;
$600B through 2028 is irreversible. If AI doesn't deliver:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sunk costs can't be recovered&lt;/li&gt;
&lt;li&gt;Depreciation hits earnings for years&lt;/li&gt;
&lt;li&gt;Shareholder pressure mounts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Execution Risk&lt;/strong&gt;&lt;br&gt;
Building 2GW data centers is hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Construction delays (Hyperion target: 2030)&lt;/li&gt;
&lt;li&gt;Power infrastructure challenges&lt;/li&gt;
&lt;li&gt;Cooling and energy efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Competitive Pressure&lt;/strong&gt;&lt;br&gt;
If Google or Microsoft's AI products win, Meta's infrastructure advantage doesn't matter. Users will use the best product, regardless of who has the biggest data center.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Cost Calculation That Surprised Me
&lt;/h2&gt;

&lt;p&gt;Let me show you the math that made me realize how big this bet really is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta's AI Infrastructure Costs (2025-2028):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CapEx (2025-2028): $600B
Operating costs (4 years @ $15B/year): $60B
Total: $660B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Break-even Scenarios:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Ad Revenue Improvement&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assumption: AI improves ad targeting, increasing CPM by 10%&lt;/li&gt;
&lt;li&gt;Meta's 2025 ad revenue: ~$150B&lt;/li&gt;
&lt;li&gt;10% improvement: $15B/year&lt;/li&gt;
&lt;li&gt;Years to break even: 44 years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: New AI Products&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assumption: Meta launches AI assistant, business tools&lt;/li&gt;
&lt;li&gt;Target: $50B/year in new revenue by 2030&lt;/li&gt;
&lt;li&gt;Years to break even: 13 years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Infrastructure-as-a-Service&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assumption: Meta sells 30% of excess capacity&lt;/li&gt;
&lt;li&gt;Pricing: $0.50/GPU-hour (market rate)&lt;/li&gt;
&lt;li&gt;Potential revenue: $20B/year&lt;/li&gt;
&lt;li&gt;Years to break even: 33 years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reality Check:&lt;/strong&gt;&lt;br&gt;
Meta needs a combination of all three scenarios to hit reasonable ROI timelines (5-7 years).&lt;/p&gt;
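&lt;p&gt;The scenario math above is just division; here's a small sketch using the article's round numbers (the combined case assumes the three revenue streams stack additively, which is optimistic):&lt;/p&gt;

```python
# Break-even sketch for the three scenarios above, using the article's
# round numbers. All figures are in billions of USD.
TOTAL_COST_B = 660  # $600B CapEx (2025-2028) + $60B operating costs

scenarios = {
    "ad_revenue_improvement": 15,   # $15B/year from 10% better ad targeting
    "new_ai_products": 50,          # $50B/year target by 2030
    "infrastructure_service": 20,   # $20B/year from selling excess capacity
}

# Each scenario alone:
for name, annual_b in scenarios.items():
    print(f"{name}: {TOTAL_COST_B / annual_b:.0f} years")

# All three combined (assumes the streams are additive):
combined_years = TOTAL_COST_B / sum(scenarios.values())
print(f"combined: {combined_years:.1f} years")
```

Run together, the three streams bring break-even to under a decade, which is how Meta could plausibly reach the 5-7 year range only if everything lands.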

&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;I've been thinking about this for weeks, and I keep coming back to one question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Meta making the smartest bet in tech history, or the most expensive mistake?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The bull case is compelling: AI is transformative, infrastructure is a moat, and Meta has the cash flow to wait for returns.&lt;/p&gt;

&lt;p&gt;The bear case is scary: $600B in sunk costs, 16-year lease commitments, and no clear path to proportional revenue growth.&lt;/p&gt;

&lt;p&gt;What do you think? Are we watching the birth of the next AWS, or the next WeWork?&lt;/p&gt;

&lt;p&gt;Drop a comment below-I'd love to hear your take, especially if you're working in AI infrastructure or have insights on the economics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://about.fb.com/news/2025/10/meta-blue-owl-capital-develop-hyperion-data-center/" rel="noopener noreferrer"&gt;Meta Announces Joint Venture With Blue Owl Capital&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/coreweave-signs-14-billion-ai-deal-with-meta-bloomberg-news-reports-2025-09-30/" rel="noopener noreferrer"&gt;CoreWeave signs $14 billion AI infrastructure deal with Meta&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2025/10/16/oracle-confirms-meta-cloud-deal-.html" rel="noopener noreferrer"&gt;Oracle confirms Meta cloud deal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.prnewswire.com/news-releases/engie-and-meta-expand-power-purchase-agreements-to-more-than-1-3-gw-in-us-with-addition-of-new-600-mw-solar-project-302594394.html" rel="noopener noreferrer"&gt;ENGIE and Meta expand Power Purchase Agreements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2025/10/14/ai-infrastructure-boom-masks-potential-us-recession-analyst-warns.html" rel="noopener noreferrer"&gt;AI infrastructure boom masks potential U.S. recession&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.datacenterdynamics.com/en/news/meta-forms-27-billion-joint-venture-with-blue-owl-to-fund-gigawatt-scale-ai-data-center-campus-in-louisiana/" rel="noopener noreferrer"&gt;Meta forms $27bn joint venture with Blue Owl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Read More
&lt;/h2&gt;

&lt;p&gt;For more deep-dives on AI infrastructure and cost optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/0xrelogic/deepseek-ocr-when-a-picture-is-actually-worth-10x-fewer-tokens-19p6"&gt;DeepSeek-OCR: When a Picture Is Worth 10× Fewer Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connect
&lt;/h2&gt;

&lt;p&gt;-GitHub: &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;&lt;br&gt;
-Linkedin: &lt;a href="https://www.linkedin.com/in/allenelzayn/" rel="noopener noreferrer"&gt;Allen Elzayn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>cloud</category>
      <category>discuss</category>
    </item>
    <item>
      <title>DeepSeek-OCR: When a Picture Is Actually Worth 10× Fewer Tokens</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Sun, 26 Oct 2025 20:44:39 +0000</pubDate>
      <link>https://dev.to/0xrelogic/deepseek-ocr-when-a-picture-is-actually-worth-10x-fewer-tokens-19p6</link>
      <guid>https://dev.to/0xrelogic/deepseek-ocr-when-a-picture-is-actually-worth-10x-fewer-tokens-19p6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Published:&lt;/strong&gt; October 26, 2025&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Model Version:&lt;/strong&gt; DeepSeek-OCR v1 (Oct 20, 2025)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Last Verified:&lt;/strong&gt; October 26, 2025&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Originally published on &lt;a href="https://allenarch.dev/blog/deepseek-ocr-token-compression/" rel="noopener noreferrer"&gt;My Blog&lt;/a&gt;&lt;/strong&gt; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I spent three hours last week watching my API costs balloon because of one document.&lt;/p&gt;

&lt;p&gt;Not a video. Not a massive dataset. Just a 10-page PDF that needed OCR processing. The problem? Traditional OCR pipelines were spitting out thousands of tokens that my LLM had to chew through. Every. Single. Page.&lt;/p&gt;

&lt;p&gt;That's when I stumbled upon DeepSeek-OCR, and honestly, the numbers looked too good to be true.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Token Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Here's the thing about modern LLMs: they're expensive. Not because the models are bad, but because context windows eat tokens like candy.&lt;/p&gt;

&lt;p&gt;Let's say you're building a document processing pipeline. You scan an invoice, extract text with OCR, then feed it to GPT-4 for analysis. Simple, right? But that 1000-word document becomes 1000+ tokens. Multiply that by hundreds of documents daily, and suddenly you're bleeding money.&lt;/p&gt;

&lt;p&gt;Traditional OCR treats text as... well, text. One character, one token. Makes sense, until you realize there might be a smarter way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What if Text Could Be Compressed Visually?
&lt;/h2&gt;

&lt;p&gt;DeepSeek-OCR flips the script completely. Instead of converting images to text tokens, it keeps them as &lt;strong&gt;vision tokens&lt;/strong&gt;: compressed visual representations that carry the same information but use far fewer tokens.&lt;/p&gt;

&lt;p&gt;Think of it like this: you could describe a stop sign with 50 words, or you could just show someone the octagon shape and red color. Same information, drastically different bandwidth.&lt;/p&gt;

&lt;p&gt;The team at DeepSeek asked a fascinating question: "For a document with 1000 words, how many vision tokens do we actually need to decode it accurately?"&lt;/p&gt;

&lt;p&gt;The answer shocked me: &lt;strong&gt;around 100 tokens&lt;/strong&gt;. That's a 10× compression.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: Two Parts, One Goal
&lt;/h2&gt;

&lt;p&gt;DeepSeek-OCR uses a two-stage pipeline that's surprisingly elegant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: DeepEncoder (~380M parameters)
&lt;/h3&gt;

&lt;p&gt;This is the compression engine. It takes high-resolution document images and squeezes them into a minimal set of vision tokens while keeping activations low. The secret sauce? It combines SAM-base (80M) and CLIP-large (300M) in series with a 16× convolutional compressor.&lt;/p&gt;

&lt;p&gt;What I love about this design: it doesn't just blindly reduce tokens. It maintains &lt;strong&gt;low activation memory&lt;/strong&gt; even with massive images, which means you won't run into GPU memory issues with large documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: MoE Decoder (~3B parameters)
&lt;/h3&gt;

&lt;p&gt;The decoder (DeepSeek3B-MoE-A570M) takes those compressed vision tokens and reconstructs the text. It uses a Mixture-of-Experts architecture, which basically means different "expert" networks handle different parts of the task in parallel.&lt;/p&gt;

&lt;p&gt;Here's where it gets interesting: the decoder doesn't just do OCR. It understands layout, preserves formatting, and can output structured Markdown. It's not reading text; it's understanding documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Show Me the Numbers
&lt;/h2&gt;

&lt;p&gt;I'm a skeptic by nature, so I needed concrete data. Here's what the benchmarks show:&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression vs. Accuracy Trade-off
&lt;/h3&gt;

&lt;p&gt;According to the &lt;a href="https://arxiv.org/html/2510.18234v1" rel="noopener noreferrer"&gt;arXiv paper (v1)&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"When the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio &amp;lt; 10), the model can achieve decoding (OCR) precision of &lt;strong&gt;97%&lt;/strong&gt;. Even at a compression ratio of &lt;strong&gt;20&lt;/strong&gt;, the OCR accuracy still remains at about &lt;strong&gt;60%&lt;/strong&gt;."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let me break this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10× compression&lt;/strong&gt;: ~97% precision (nearly lossless)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20× compression&lt;/strong&gt;: ~60% accuracy (acceptable for certain use cases)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sweet spot is clearly around 10×, where you get massive token savings without sacrificing quality.&lt;/p&gt;
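&lt;p&gt;To make the savings concrete, here's a back-of-envelope helper (my own sketch, not anything from the DeepSeek codebase) for the 10× sweet spot:&lt;/p&gt;

```python
# Back-of-envelope token savings from visual compression, using the
# paper's headline numbers: ~10x compression at ~97% OCR precision.
def vision_tokens(text_tokens: int, compression_ratio: float = 10.0) -> int:
    """Estimated vision tokens needed to carry text_tokens worth of content."""
    return round(text_tokens / compression_ratio)

# A 1000-word page (~1000 text tokens) at the 10x sweet spot:
page_text_tokens = 1000
page_vision_tokens = vision_tokens(page_text_tokens)   # ~100 tokens
savings = 1 - page_vision_tokens / page_text_tokens    # fraction of tokens saved

print(f"{page_vision_tokens} vision tokens, {savings:.0%} fewer tokens")
```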

&lt;h3&gt;
  
  
  OmniDocBench: The Real Performance Test
&lt;/h3&gt;

&lt;p&gt;The team tested DeepSeek-OCR against two popular alternatives on OmniDocBench. The results are pretty stark:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Tokens per Page&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GOT-OCR 2.0&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-OCR&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Better&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MinerU 2.0&lt;/td&gt;
&lt;td&gt;6000+&lt;/td&gt;
&lt;td&gt;Worse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://arxiv.org/html/2510.18234v1" rel="noopener noreferrer"&gt;arXiv v1&lt;/a&gt;, &lt;a href="https://deepseek.ai/blog/deepseek-ocr-context-compression" rel="noopener noreferrer"&gt;DeepSeek Blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DeepSeek-OCR beats GOT-OCR 2.0 while using &lt;strong&gt;~60% fewer tokens&lt;/strong&gt;. And compared to MinerU 2.0? It's not even close: under 800 tokens vs 6,000+.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production Throughput
&lt;/h3&gt;

&lt;p&gt;If you're wondering about real-world performance, the numbers from their &lt;a href="https://deepseek.ai/blog/deepseek-ocr-context-compression" rel="noopener noreferrer"&gt;official blog&lt;/a&gt; are wild:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single A100-40G: &lt;strong&gt;200,000+ pages per day&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;20-node cluster (160× A100): &lt;strong&gt;33 million pages per day&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With vLLM, they're seeing ~2,500 tokens/s for PDF processing on an A100-40G (source: &lt;a href="https://github.com/deepseek-ai/DeepSeek-OCR" rel="noopener noreferrer"&gt;GitHub README&lt;/a&gt;).&lt;/p&gt;
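&lt;p&gt;Those two numbers roughly agree with each other, assuming around 1,000 output tokens per page (my assumption for a dense document page, not a figure from the README):&lt;/p&gt;

```python
# Sanity check: does ~2,500 tokens/s line up with 200K+ pages/day on one GPU?
# The ~1,000 output tokens per page is an assumed average for a dense page.
TOKENS_PER_SEC = 2_500
SECONDS_PER_DAY = 86_400
OUTPUT_TOKENS_PER_PAGE = 1_000  # assumption, not a published figure

tokens_per_day = TOKENS_PER_SEC * SECONDS_PER_DAY
pages_per_day = tokens_per_day / OUTPUT_TOKENS_PER_PAGE

print(f"{pages_per_day:,.0f} pages/day")  # consistent with the 200K+ claim
```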

&lt;h2&gt;
  
  
  Getting Your Hands Dirty: Setup
&lt;/h2&gt;

&lt;p&gt;I tried this on my local setup. Here's what you need:&lt;/p&gt;

&lt;h3&gt;
  
  
  Environment Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create fresh conda environment&lt;/span&gt;
conda create &lt;span class="nt"&gt;-n&lt;/span&gt; deepseek-ocr &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.12.9 &lt;span class="nt"&gt;-y&lt;/span&gt;
conda activate deepseek-ocr

&lt;span class="c"&gt;# Install PyTorch (CUDA 11.8)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.6.0 &lt;span class="nv"&gt;torchvision&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;0.21.0 &lt;span class="nv"&gt;torchaudio&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.6.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cu118

&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
&lt;span class="nb"&gt;cd &lt;/span&gt;DeepSeek-OCR

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Optional but recommended: FlashAttention&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;flash-attn&lt;span class="o"&gt;==&lt;/span&gt;2.7.3 &lt;span class="nt"&gt;--no-build-isolation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One gotcha: if you're using vLLM, you'll need the 0.8.5 wheel for CUDA 11.8. Download it from &lt;a href="https://github.com/vllm-project/vllm/releases/tag/v0.8.5" rel="noopener noreferrer"&gt;vLLM releases&lt;/a&gt; before installing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Start with Transformers
&lt;/h3&gt;

&lt;p&gt;The simplest way to test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-OCR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Load model with FlashAttention (faster)
&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;_attn_implementation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;flash_attention_2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# or 'eager' if no FlashAttn
&lt;/span&gt;    &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_safetensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run OCR on a document image
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;image&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;|grounding|&amp;gt;Convert the document to markdown.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;image_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;invoice.jpg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;infer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;image_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# Base resolution
&lt;/span&gt;    &lt;span class="n"&gt;image_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Processing size
&lt;/span&gt;    &lt;span class="n"&gt;crop_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;save_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  For Production: vLLM
&lt;/h3&gt;

&lt;p&gt;If you need speed at scale, vLLM is the way to go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vllm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SamplingParams&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vllm.model_executor.models.deepseek_ocr&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NGramPerReqLogitsProcessor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize model
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-OCR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;enable_prefix_caching&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mm_processor_cache_gb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;logits_processors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;NGramPerReqLogitsProcessor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Batch processing
&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;image&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Free OCR.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multi_modal_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;sampling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SamplingParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;ngram_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;whitelist_token_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;128821&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128822&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# for tables
&lt;/span&gt;    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sampling&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: As of October 2025, DeepSeek-OCR is officially supported in upstream vLLM (source: &lt;a href="https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html" rel="noopener noreferrer"&gt;vLLM docs&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Resolution Modes: Pick Your Poison
&lt;/h2&gt;

&lt;p&gt;The model supports different resolution modes depending on your needs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Resolution&lt;/th&gt;
&lt;th&gt;Vision Tokens&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tiny&lt;/td&gt;
&lt;td&gt;512×512&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;Simple text/slides&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;640×640&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;Books, reports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Base&lt;/td&gt;
&lt;td&gt;1024×1024&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;Standard documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;td&gt;1280×1280&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;High-detail docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gundam&lt;/td&gt;
&lt;td&gt;Dynamic&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Complex layouts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I typically use &lt;strong&gt;Small (100 tokens)&lt;/strong&gt; for most documents. It hits the sweet spot between quality and token efficiency.&lt;/p&gt;

&lt;p&gt;For newspapers or documents with complex tables, &lt;strong&gt;Gundam mode&lt;/strong&gt; (dynamic resolution) automatically tiles the image and uses more tokens where needed.&lt;/p&gt;
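&lt;p&gt;To make the mode-to-parameter mapping concrete, here's a small helper that translates the table above into the &lt;code&gt;base_size&lt;/code&gt;/&lt;code&gt;image_size&lt;/code&gt;/&lt;code&gt;crop_mode&lt;/code&gt; arguments of &lt;code&gt;model.infer()&lt;/code&gt;. The preset values follow the convention in the model's README; the helper itself (&lt;code&gt;MODE_PRESETS&lt;/code&gt;, &lt;code&gt;infer_kwargs_for&lt;/code&gt;) is my own sketch, not part of the DeepSeek-OCR API:&lt;br&gt;
&lt;/p&gt;

```python
# Hypothetical helper mapping the resolution modes above to the
# base_size / image_size / crop_mode arguments of model.infer().
# Preset values follow the DeepSeek-OCR README; names are my own.
MODE_PRESETS = {
    "tiny":   dict(base_size=512,  image_size=512,  crop_mode=False),  # 64 tokens
    "small":  dict(base_size=640,  image_size=640,  crop_mode=False),  # 100 tokens
    "base":   dict(base_size=1024, image_size=1024, crop_mode=False),  # 256 tokens
    "large":  dict(base_size=1280, image_size=1280, crop_mode=False),  # 400 tokens
    "gundam": dict(base_size=1024, image_size=640,  crop_mode=True),   # dynamic tiling
}

def infer_kwargs_for(mode):
    """Return a fresh kwargs dict to splat into model.infer() for a named mode."""
    return dict(MODE_PRESETS[mode.lower()])

# Usage sketch:
# result = model.infer(tokenizer, prompt=prompt,
#                      image_file='invoice.jpg', **infer_kwargs_for('small'))
```

&lt;p&gt;Note that the Transformers quick-start example earlier uses exactly the Gundam combination (&lt;code&gt;base_size=1024, image_size=640, crop_mode=True&lt;/code&gt;).&lt;/p&gt;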

&lt;h2&gt;
  
  
  What Surprised Me: The Trade-offs
&lt;/h2&gt;

&lt;p&gt;After testing this for a week, here's what I learned:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Good
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Token savings are real.&lt;/strong&gt; I processed 50 invoices that would normally cost me ~$2 in API fees. With DeepSeek-OCR doing the heavy lifting and only sending compressed context to my LLM? Under $0.30.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layout preservation works.&lt;/strong&gt; The Markdown output actually respects document structure. Tables stay as tables. Headings stay as headings. This is huge for downstream processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multilingual support is solid.&lt;/strong&gt; I threw Chinese, Arabic, and mixed-language documents at it. No complaints.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Not-So-Good
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;20× compression is tempting but risky.&lt;/strong&gt; At 60% accuracy, you'll catch most content but miss details. Fine for rough drafts, dangerous for legal docs or financial statements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complex nested tables struggle.&lt;/strong&gt; If your PDF has tables within tables with merged cells, expect some manual cleanup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPU memory matters.&lt;/strong&gt; You need a decent GPU. I tested on an RTX 3090 (24GB) and it was smooth. Anything below 16GB VRAM might struggle with large documents in high-resolution modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Gets Interesting: Future Implications
&lt;/h2&gt;

&lt;p&gt;The paper (source: &lt;a href="https://arxiv.org/html/2510.18234v1" rel="noopener noreferrer"&gt;arXiv v1&lt;/a&gt;) hints at something fascinating: using vision-text compression for &lt;strong&gt;long-context memory in LLMs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think about it: instead of storing conversation history as thousands of text tokens, what if you compressed older context into vision tokens? You could keep way more history in memory without hitting context limits.&lt;/p&gt;

&lt;p&gt;It's like how humans remember conversations: we don't replay every word in our head; we remember visual snapshots and key moments.&lt;/p&gt;

&lt;p&gt;The researchers call it "historical long-context compression and memory forgetting mechanisms in LLMs." I call it the future of context management.&lt;/p&gt;
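&lt;p&gt;A toy token-budget calculation shows why this is appealing. Everything here is illustrative: the 10× ratio is the paper's mid-range compression figure, but the "compress all but the last few turns" policy and the function itself are my own assumptions, not anything the paper specifies:&lt;br&gt;
&lt;/p&gt;

```python
# Toy illustration: keep recent turns as text tokens, "compress" older
# turns into vision tokens at an assumed 10x ratio. Policy and numbers
# are illustrative only, not from the paper.
COMPRESSION = 10  # assumed vision-text compression ratio

def effective_history(turn_token_counts, keep_recent=3):
    """Total context cost if all but the last keep_recent turns are compressed."""
    recent = turn_token_counts[-keep_recent:]
    older = turn_token_counts[:-keep_recent]
    compressed = sum(t // COMPRESSION for t in older)
    return sum(recent) + compressed

history = [1000] * 20               # 20 turns of ~1000 tokens each
print(effective_history(history))   # 4700 tokens vs 20000 uncompressed
```

&lt;p&gt;Less than a quarter of the original context budget for the same nominal history, at the cost of lossy recall of the older turns.&lt;/p&gt;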

&lt;h2&gt;
  
  
  The Cost Calculation That Made Me Switch
&lt;/h2&gt;

&lt;p&gt;Let me show you why this matters financially.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before DeepSeek-OCR:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10-page report = ~10,000 text tokens (1000/page)&lt;/li&gt;
&lt;li&gt;GPT-4 input cost: $3 per 1M tokens&lt;/li&gt;
&lt;li&gt;Cost per report: &lt;strong&gt;$0.03&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;1000 reports/day: &lt;strong&gt;$30/day&lt;/strong&gt; = &lt;strong&gt;$900/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After 10× compression:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10-page report = ~1,000 vision tokens (100/page)&lt;/li&gt;
&lt;li&gt;GPT-4 input cost: $3 per 1M tokens&lt;/li&gt;
&lt;li&gt;Cost per report: &lt;strong&gt;$0.003&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;1000 reports/day: &lt;strong&gt;$3/day&lt;/strong&gt; = &lt;strong&gt;$90/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's an &lt;strong&gt;$810/month savings&lt;/strong&gt; on input tokens alone. For a small startup processing thousands of documents daily, this is the difference between profitable and bleeding money.&lt;/p&gt;

&lt;p&gt;(Cost math based on standard GPT-4 pricing and compression ratios from &lt;a href="https://deepseek.ai/blog/deepseek-ocr-context-compression" rel="noopener noreferrer"&gt;DeepSeek Blog&lt;/a&gt;)&lt;/p&gt;
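&lt;p&gt;The arithmetic above is easy to check in a few lines. The per-page token counts and the $3 per 1M input tokens price are the assumptions stated in the lists, not measured values:&lt;br&gt;
&lt;/p&gt;

```python
# Back-of-envelope check of the cost math above. Per-page token counts
# and the $3 / 1M-token input price are the article's assumptions.
PRICE_PER_TOKEN = 3.0 / 1_000_000   # GPT-4 input, USD

def monthly_cost(pages_per_report, tokens_per_page, reports_per_day, days=30):
    tokens = pages_per_report * tokens_per_page * reports_per_day * days
    return tokens * PRICE_PER_TOKEN

text_cost = monthly_cost(10, 1000, 1000)    # raw text: 1000 tokens/page
vision_cost = monthly_cost(10, 100, 1000)   # 10x compression: 100 tokens/page
print(text_cost, vision_cost, text_cost - vision_cost)  # 900.0 90.0 810.0
```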

&lt;h2&gt;
  
  
  When Should You Use This?
&lt;/h2&gt;

&lt;p&gt;DeepSeek-OCR makes sense if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're processing &lt;strong&gt;high volumes&lt;/strong&gt; of documents (hundreds to thousands daily)&lt;/li&gt;
&lt;li&gt;Your documents have &lt;strong&gt;consistent layouts&lt;/strong&gt; (invoices, forms, reports)&lt;/li&gt;
&lt;li&gt;You need &lt;strong&gt;structured output&lt;/strong&gt; (Markdown, not just raw text)&lt;/li&gt;
&lt;li&gt;You want to &lt;strong&gt;reduce LLM API costs&lt;/strong&gt; significantly&lt;/li&gt;
&lt;li&gt;You have &lt;strong&gt;GPU infrastructure&lt;/strong&gt; (or can spin it up)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's probably &lt;strong&gt;overkill&lt;/strong&gt; if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You process 10-20 documents per month (traditional OCR is fine)&lt;/li&gt;
&lt;li&gt;You need 100% accuracy on every character (critical legal/medical docs)&lt;/li&gt;
&lt;li&gt;You don't have GPU access and can't justify cloud costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm Building With This
&lt;/h2&gt;

&lt;p&gt;I'm working on a document processing pipeline for technical RFPs (Request for Proposals). These things are monsters: 50-100 pages, tables everywhere, multiple formats.&lt;/p&gt;

&lt;p&gt;Before DeepSeek-OCR, I was using Azure Form Recognizer → text extraction → GPT-4 analysis. It worked, but the token counts were killing me.&lt;/p&gt;

&lt;p&gt;Now I'm feeding everything through DeepSeek-OCR first. It compresses the visual layout into ~100 tokens per page, preserves table structure, and the downstream GPT-4 analysis is 10× cheaper.&lt;/p&gt;

&lt;p&gt;The pipeline runs on a single A100 instance on Lambda Labs (~$1.10/hour). I process an entire RFP batch in under an hour. Previously, between OCR services and API costs, each batch ran me $50-80. Now it's under $5.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The model is fully open source under MIT license. Everything you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Paper&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2510.18234" rel="noopener noreferrer"&gt;arXiv:2510.18234&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt;: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code&lt;/strong&gt;: &lt;a href="https://github.com/deepseek-ai/DeepSeek-OCR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html" rel="noopener noreferrer"&gt;vLLM Recipe&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd start with the Transformers example first to get a feel for it, then move to vLLM if you need production speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Three key takeaways from this experiment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Token compression isn't just about size; it's about cost.&lt;/strong&gt; The ability to represent 1000 words with 100 visual tokens fundamentally changes the economics of document processing at scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vision-language models are underutilized.&lt;/strong&gt; We think of them for image Q&amp;amp;A, but their real power might be in efficient information representation. This feels like early days of what's possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open source is eating AI's lunch.&lt;/strong&gt; DeepSeek-OCR is MIT licensed, performant, and costs nothing to run locally. Three years ago, this capability would've been a proprietary API charging per page.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;Have you hit the token wall with document processing? I'm curious what problems you're trying to solve and whether this approach would work for your use case.&lt;/p&gt;

&lt;p&gt;Drop a comment or ping me; I'd love to hear what you build with this.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For deeper technical details, check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/new-deepseek-model-drastically-reduces-resource-usage-by-converting-text-and-documents-into-images-vision-text-compression-uses-up-to-20-times-fewer-tokens" rel="noopener noreferrer"&gt;Tom's Hardware coverage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.eweek.com/news/deepseek-ocr-ai-model/" rel="noopener noreferrer"&gt;eWeek analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://the-decoder.com/deepseeks-ocr-system-compresses-image-based-text-so-ai-can-handle-much-longer-documents/" rel="noopener noreferrer"&gt;The Decoder writeup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marktechpost.com/2025/10/20/deepseek-just-released-a-3b-ocr-model-a-3b-vlm-designed-for-high-performance-ocr-and-structured-document-conversion/" rel="noopener noreferrer"&gt;MarkTechPost breakdown&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;All benchmarks and quotes in this article are sourced from the official DeepSeek-OCR paper (arXiv v1, October 2025), official blog posts, and README documentation. Links provided throughout for verification.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ocr</category>
      <category>ai</category>
      <category>deepseek</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Building Streaky: Zero-Cost Production Architecture (Part 4)</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Sat, 25 Oct 2025 19:09:45 +0000</pubDate>
      <link>https://dev.to/0xrelogic/building-streaky-zero-cost-production-architecture-part-4-5h3h</link>
      <guid>https://dev.to/0xrelogic/building-streaky-zero-cost-production-architecture-part-4-5h3h</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://allenarch.dev/blog/building-distributed-cron-cloudflare-workers/" rel="noopener noreferrer"&gt;My Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  How I built a production app that handles 1000+ users for $0/month using free tiers.
&lt;/h3&gt;

&lt;h2&gt;
  
  
  Part 4: Running Production on Free Tiers
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/0xrelogic/building-streaky-a-github-streak-guardian-part-1-the-journey-40ek"&gt;Part 1&lt;/a&gt;, I shared the journey from sequential to distributed processing. In &lt;a href="https://dev.to/0xrelogic/building-streaky-solving-cloudflare-ip-blocking-with-rust-part-2-2ckp"&gt;Part 2&lt;/a&gt;, I explained the Rust VPS proxy. In &lt;a href="https://dev.to/0xrelogic/building-streaky-distributed-queue-system-with-service-bindings-part-3-46f9"&gt;Part 3&lt;/a&gt;, I dove deep into the distributed queue system.&lt;/p&gt;

&lt;p&gt;Now, let's talk about the most satisfying part: running a production app that handles 1000+ users for &lt;strong&gt;$0/month&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Building a production app is one thing. Running it sustainably is another.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handle current load (10 users/day)&lt;/li&gt;
&lt;li&gt;Scale to 1000+ users without code changes&lt;/li&gt;
&lt;li&gt;99.9% uptime&lt;/li&gt;
&lt;li&gt;Fast response times (&amp;lt; 5 seconds)&lt;/li&gt;
&lt;li&gt;Secure (encryption, authentication)&lt;/li&gt;
&lt;li&gt;Zero cost (or as close as possible)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The constraint:&lt;/strong&gt; I can't afford $50-100/month for infrastructure, and my app must survive entirely on free-tier magic, caffeine, and sheer willpower.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;After evaluating options, I settled on this stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend: Next.js on Vercel (Free tier)
Backend: Cloudflare Workers + D1 (Free tier)
Proxy: Rust on Koyeb (Free tier)
Total cost: $0/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break down each component.&lt;/p&gt;




&lt;h2&gt;
  
  
  Component 1: Frontend (Vercel)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tech:&lt;/strong&gt; Next.js 15, React 19, Tailwind CSS, shadcn/ui&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt; Vercel&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier limits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100 GB bandwidth/month&lt;/li&gt;
&lt;li&gt;Unlimited deployments&lt;/li&gt;
&lt;li&gt;Automatic HTTPS&lt;/li&gt;
&lt;li&gt;Edge network (global CDN)&lt;/li&gt;
&lt;li&gt;Serverless functions (100 GB-hours)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bandwidth: ~2 GB/month (mostly static assets)&lt;/li&gt;
&lt;li&gt;Deployments: ~20/month (development + production)&lt;/li&gt;
&lt;li&gt;Functions: ~1 GB-hour/month (NextAuth.js)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Headroom:&lt;/strong&gt; 50x capacity&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Vercel?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Best Next.js experience (they built it)&lt;/li&gt;
&lt;li&gt;Zero config deployment (git push = deploy)&lt;/li&gt;
&lt;li&gt;Automatic preview deployments&lt;/li&gt;
&lt;li&gt;Edge network (fast globally)&lt;/li&gt;
&lt;li&gt;Generous free tier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;package.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"frontend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"next dev --turbopack"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"next build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"start"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"next start"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"next"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"15.5.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"react"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"19.1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"next-auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^5.0.0-beta.29"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tailwindcss"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^3.4.0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Connect GitHub repo to Vercel&lt;/span&gt;
&lt;span class="c"&gt;# Every push to main = automatic deployment&lt;/span&gt;
git push origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0/month&lt;/p&gt;




&lt;h2&gt;
  
  
  Component 2: Backend API (Cloudflare Workers)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tech:&lt;/strong&gt; Hono framework, TypeScript&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt; Cloudflare Workers&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier limits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100,000 requests/day&lt;/li&gt;
&lt;li&gt;10ms CPU time per request&lt;/li&gt;
&lt;li&gt;128 MB memory per request&lt;/li&gt;
&lt;li&gt;Unlimited bandwidth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests: ~50/day (cron + API calls)&lt;/li&gt;
&lt;li&gt;CPU time: ~3 seconds/day (distributed across requests)&lt;/li&gt;
&lt;li&gt;Memory: ~20 MB per request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Headroom:&lt;/strong&gt; 2000x capacity&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Cloudflare Workers?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Edge network (deployed globally)&lt;/li&gt;
&lt;li&gt;Fast cold starts (&amp;lt; 10ms)&lt;/li&gt;
&lt;li&gt;No servers to manage&lt;/li&gt;
&lt;li&gt;Generous free tier&lt;/li&gt;
&lt;li&gt;Built-in cron triggers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuration (wrangler.toml):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"streaky"&lt;/span&gt;
&lt;span class="py"&gt;main&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src/index.ts"&lt;/span&gt;
&lt;span class="py"&gt;compatibility_date&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2025-10-11"&lt;/span&gt;

&lt;span class="c"&gt;# Observability&lt;/span&gt;
&lt;span class="nn"&gt;[observability]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# D1 Database Binding&lt;/span&gt;
&lt;span class="nn"&gt;[[d1_databases]]&lt;/span&gt;
&lt;span class="py"&gt;binding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"DB"&lt;/span&gt;
&lt;span class="py"&gt;database_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"streaky-db"&lt;/span&gt;
&lt;span class="py"&gt;database_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"your-database-id"&lt;/span&gt;

&lt;span class="c"&gt;# Analytics Engine Binding&lt;/span&gt;
&lt;span class="nn"&gt;[[analytics_engine_datasets]]&lt;/span&gt;
&lt;span class="py"&gt;binding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ANALYTICS"&lt;/span&gt;
&lt;span class="py"&gt;dataset&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"streaky_metrics"&lt;/span&gt;

&lt;span class="c"&gt;# Cron Triggers - Daily at 12:00 UTC&lt;/span&gt;
&lt;span class="nn"&gt;[triggers]&lt;/span&gt;
&lt;span class="py"&gt;crons&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"0 12 * * *"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c"&gt;# Service Bindings&lt;/span&gt;
&lt;span class="nn"&gt;[[services]]&lt;/span&gt;
&lt;span class="py"&gt;binding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"SELF"&lt;/span&gt;
&lt;span class="py"&gt;service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"streaky"&lt;/span&gt;

&lt;span class="c"&gt;# Environment Variables&lt;/span&gt;
&lt;span class="nn"&gt;[vars]&lt;/span&gt;
&lt;span class="py"&gt;VPS_URL&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://your-vps-url.koyeb.app"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;web/backend
npx wrangler deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0/month&lt;/p&gt;




&lt;h2&gt;
  
  
  Component 3: Database (Cloudflare D1)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tech:&lt;/strong&gt; SQLite (via D1)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier limits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5 GB storage&lt;/li&gt;
&lt;li&gt;5 million reads/day&lt;/li&gt;
&lt;li&gt;100,000 writes/day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storage: ~50 MB (users, notifications, queue)&lt;/li&gt;
&lt;li&gt;Reads: ~100/day (user queries, queue checks)&lt;/li&gt;
&lt;li&gt;Writes: ~50/day (queue updates, notifications)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Headroom:&lt;/strong&gt; 2000x capacity (writes are the binding limit; reads have roughly 50,000x)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why D1?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQLite (familiar, powerful)&lt;/li&gt;
&lt;li&gt;Integrated with Workers (no network latency)&lt;/li&gt;
&lt;li&gt;Generous free tier&lt;/li&gt;
&lt;li&gt;Automatic backups&lt;/li&gt;
&lt;li&gt;No connection pooling needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Schema:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Users table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;github_username&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;github_pat&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;discord_webhook&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;telegram_token&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;telegram_chat_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;is_active&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Notifications table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;notifications&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;channel&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;error_message&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sent_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Cron queue table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cron_queue&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;batch_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;completed_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;error_message&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Indexes&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_cron_queue_status&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;cron_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_cron_queue_batch&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;cron_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_notifications_user&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;notifications&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Management:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create database&lt;/span&gt;
npx wrangler d1 create streaky-db

&lt;span class="c"&gt;# Run migrations&lt;/span&gt;
npx wrangler d1 execute streaky-db &lt;span class="nt"&gt;--file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;schema.sql

&lt;span class="c"&gt;# Query database&lt;/span&gt;
npx wrangler d1 execute streaky-db &lt;span class="nt"&gt;--command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"SELECT * FROM users"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0/month&lt;/p&gt;




&lt;h2&gt;
  
  
  Component 4: Notification Proxy (Koyeb)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tech:&lt;/strong&gt; Rust, Axum framework&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt; Koyeb (Docker)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier limits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;512 MB RAM&lt;/li&gt;
&lt;li&gt;0.1 vCPU&lt;/li&gt;
&lt;li&gt;100 GB bandwidth/month&lt;/li&gt;
&lt;li&gt;2.5 GB disk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAM: ~20 MB (idle), ~40 MB (peak)&lt;/li&gt;
&lt;li&gt;CPU: &amp;lt; 1% (idle), ~5% (peak)&lt;/li&gt;
&lt;li&gt;Bandwidth: ~500 MB/month&lt;/li&gt;
&lt;li&gt;Disk: 85 MB (Docker image)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Headroom:&lt;/strong&gt; ~12x capacity (RAM is the bottleneck)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Koyeb?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generous free tier (512 MB RAM)&lt;/li&gt;
&lt;li&gt;Docker support&lt;/li&gt;
&lt;li&gt;Automatic HTTPS&lt;/li&gt;
&lt;li&gt;Global edge network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dockerfile:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build stage&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;rust:1.83-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    pkg-config &lt;span class="se"&gt;\
&lt;/span&gt;    libssl-dev &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="c"&gt;# Copy manifests&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; Cargo.toml Cargo.lock ./&lt;/span&gt;

&lt;span class="c"&gt;# Copy source code&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="c"&gt;# Build release binary&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;

&lt;span class="c"&gt;# Runtime stage&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; debian:bookworm-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Install runtime dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    ca-certificates &lt;span class="se"&gt;\
&lt;/span&gt;    libssl3 &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="c"&gt;# Copy binary from builder&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/target/release/streaky-server /app/&lt;/span&gt;

&lt;span class="c"&gt;# Create non-root user&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;useradd &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/false appuser &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;chown &lt;/span&gt;appuser:appuser /app/streaky-server
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; appuser&lt;/span&gt;

&lt;span class="c"&gt;# Expose port&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8000&lt;/span&gt;

&lt;span class="c"&gt;# Run application&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["./streaky-server"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Push to GitHub&lt;/li&gt;
&lt;li&gt;Connect Koyeb to repo&lt;/li&gt;
&lt;li&gt;Configure build (Dockerfile path: &lt;code&gt;server/Dockerfile&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Set environment variables (ENCRYPTION_KEY, VPS_SECRET)&lt;/li&gt;
&lt;li&gt;Deploy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0/month&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Diagram
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    User Browser                             │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ HTTPS
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              Vercel (Next.js Frontend)                      │
│  • Static pages (HTML, CSS, JS)                            │
│  • NextAuth.js (GitHub OAuth)                              │
│  • Edge network (global CDN)                               │
│  Cost: $0/month                                            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ HTTPS API calls
                         ▼
┌─────────────────────────────────────────────────────────────┐
│         Cloudflare Workers (Backend API)                    │
│  • Hono framework                                           │
│  • D1 database (SQLite)                                    │
│  • Service Bindings (distributed cron)                     │
│  • Analytics Engine                                         │
│  Cost: $0/month                                            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ HTTPS + Auth Header
                         │ (Encrypted credentials)
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              Koyeb (Rust VPS Proxy)                         │
│  • Axum web framework                                       │
│  • AES-256-GCM decryption                                  │
│  • Discord/Telegram API calls                              │
│  • Clean IP (no rate limiting)                             │
│  Cost: $0/month                                            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ HTTPS
                         ▼
┌─────────────────────────────────────────────────────────────┐
│         Discord / Telegram APIs                             │
│  • Receive notifications                                    │
│  • No rate limiting (clean IP)                             │
└─────────────────────────────────────────────────────────────┘

Total Cost: $0/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Current Usage (10 users/day)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Free Tier Limit&lt;/th&gt;
&lt;th&gt;Current Usage&lt;/th&gt;
&lt;th&gt;Headroom&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vercel&lt;/td&gt;
&lt;td&gt;100 GB bandwidth&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;50x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Workers&lt;/td&gt;
&lt;td&gt;100k req/day&lt;/td&gt;
&lt;td&gt;50 req/day&lt;/td&gt;
&lt;td&gt;2000x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare D1&lt;/td&gt;
&lt;td&gt;100k writes/day&lt;/td&gt;
&lt;td&gt;50 writes/day&lt;/td&gt;
&lt;td&gt;2000x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Koyeb&lt;/td&gt;
&lt;td&gt;512 MB RAM&lt;/td&gt;
&lt;td&gt;40 MB&lt;/td&gt;
&lt;td&gt;12x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total cost:&lt;/strong&gt; $0/month&lt;/p&gt;

&lt;h3&gt;
  
  
  Projected Usage (1000 users/day)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Free Tier Limit&lt;/th&gt;
&lt;th&gt;Projected Usage&lt;/th&gt;
&lt;th&gt;Still Free?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vercel&lt;/td&gt;
&lt;td&gt;100 GB bandwidth&lt;/td&gt;
&lt;td&gt;20 GB&lt;/td&gt;
&lt;td&gt;Yes (5x headroom)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Workers&lt;/td&gt;
&lt;td&gt;100k req/day&lt;/td&gt;
&lt;td&gt;5k req/day&lt;/td&gt;
&lt;td&gt;Yes (20x headroom)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare D1&lt;/td&gt;
&lt;td&gt;100k writes/day&lt;/td&gt;
&lt;td&gt;5k writes/day&lt;/td&gt;
&lt;td&gt;Yes (20x headroom)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Koyeb&lt;/td&gt;
&lt;td&gt;512 MB RAM&lt;/td&gt;
&lt;td&gt;200 MB&lt;/td&gt;
&lt;td&gt;Yes (2.5x headroom)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total cost:&lt;/strong&gt; Still $0/month&lt;/p&gt;

&lt;h3&gt;
  
  
  When Would I Need to Pay?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vercel:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pro tier ($20/month) needed at ~5000 users/day (the 100 GB/month bandwidth limit, per the projection above)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare Workers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paid tier ($5/month, includes 10M requests) needed at ~20,000 users/day (the 100k requests/day limit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare D1:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paid tier ($5/month) needed at ~20,000 users/day (the 100k writes/day limit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Koyeb:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paid tier ($7/month) needed at ~5000 users/day (512 MB RAM limit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;First bottleneck:&lt;/strong&gt; Koyeb RAM at ~5000 users/day&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Upgrade Koyeb to $7/month (1 GB RAM) or optimize Rust memory usage&lt;/p&gt;
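
&lt;p&gt;The thresholds above fall out of simple per-user arithmetic. A sketch, using the per-user rates implied by the 1000-users/day projection (the rates themselves are assumptions, not measurements):&lt;/p&gt;

```typescript
// Back-of-envelope ceilings for each free tier. Per-user rates are read off
// the 1000-users/day projection above: ~5 requests/day, ~5 writes/day, and
// ~20 MB of monthly bandwidth per daily active user.
function usersUntilLimit(freeTierLimit: number, perUser: number): number {
  return Math.floor(freeTierLimit / perUser);
}

// Workers: 100k requests/day at ~5 requests per user per day
const workersCeiling = usersUntilLimit(100_000, 5); // 20000 users/day

// D1: 100k writes/day at ~5 writes per user per day
const d1Ceiling = usersUntilLimit(100_000, 5); // 20000 users/day

// Vercel: ~100,000 MB bandwidth/month; each daily user adds ~20 MB/month
const vercelCeiling = usersUntilLimit(100_000, 20); // 5000 users/day

console.log({ workersCeiling, d1Ceiling, vercelCeiling });
```
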




&lt;h2&gt;
  
  
  Performance Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Response Times
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Frontend (Vercel):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time to First Byte (TTFB): ~50ms&lt;/li&gt;
&lt;li&gt;First Contentful Paint (FCP): ~200ms&lt;/li&gt;
&lt;li&gt;Largest Contentful Paint (LCP): ~500ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Backend (Cloudflare Workers):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API response time: ~100ms (cold start)&lt;/li&gt;
&lt;li&gt;API response time: ~10ms (warm)&lt;/li&gt;
&lt;li&gt;Database query time: ~5ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Notification Proxy (Koyeb):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cold start: ~10 seconds (VPS sleeping)&lt;/li&gt;
&lt;li&gt;Warm: ~3.6 seconds (VPS active)&lt;/li&gt;
&lt;li&gt;Processing time: ~100ms (decrypt + forward)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reliability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Uptime:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vercel: 99.99% (SLA)&lt;/li&gt;
&lt;li&gt;Cloudflare: 99.99% (SLA)&lt;/li&gt;
&lt;li&gt;Koyeb: 99.9% (observed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Error rates:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: &amp;lt; 0.1%&lt;/li&gt;
&lt;li&gt;Backend: &amp;lt; 0.1%&lt;/li&gt;
&lt;li&gt;Notifications: 0% (100% success rate)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scalability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Current load:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 users/day&lt;/li&gt;
&lt;li&gt;50 requests/day&lt;/li&gt;
&lt;li&gt;50 database writes/day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Theoretical capacity (free tier):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5000 users/day (Koyeb RAM bottleneck)&lt;/li&gt;
&lt;li&gt;100,000 requests/day (Cloudflare Workers)&lt;/li&gt;
&lt;li&gt;100,000 writes/day (D1)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Headroom:&lt;/strong&gt; 500x current load&lt;/p&gt;




&lt;h2&gt;
  
  
  Optimization Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Minimize Database Writes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; D1 free tier = 100k writes/day&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch queue inserts (1 transaction for N users)&lt;/li&gt;
&lt;li&gt;Cache GitHub API responses (reduce redundant queries)&lt;/li&gt;
&lt;li&gt;Cleanup old data (delete after 7 days)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 2 writes per user (queue + notification) instead of 5+&lt;/p&gt;
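
&lt;p&gt;The batching idea can be sketched as a helper that collapses N queue rows into one multi-row &lt;code&gt;INSERT&lt;/code&gt;, so a whole batch costs a single statement. Column names follow the &lt;code&gt;cron_queue&lt;/code&gt; schema above; the helper itself is illustrative, not the app's actual code:&lt;/p&gt;

```typescript
// Collapse N queue inserts into one multi-row INSERT so a batch of users
// costs one statement instead of N. (Illustrative helper.)
interface QueueRow {
  id: string;
  userId: string;
  batchId: string;
}

function buildBatchInsert(rows: QueueRow[]): { sql: string; params: string[] } {
  const placeholders = rows.map(() => "(?, ?, ?, 'pending')").join(", ");
  return {
    sql: `INSERT INTO cron_queue (id, user_id, batch_id, status) VALUES ${placeholders}`,
    params: rows.flatMap((r) => [r.id, r.userId, r.batchId]),
  };
}

const { sql, params } = buildBatchInsert([
  { id: "q1", userId: "u1", batchId: "b1" },
  { id: "q2", userId: "u2", batchId: "b1" },
]);
// In the Worker this would be passed to env.DB.prepare(sql).bind(...params).run()
```

&lt;p&gt;D1 also exposes a &lt;code&gt;batch()&lt;/code&gt; method for sending several prepared statements in one round trip, which serves the same goal for heterogeneous writes.&lt;/p&gt;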

&lt;h3&gt;
  
  
  2. Optimize Rust Memory Usage
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Koyeb free tier = 512 MB RAM&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Rust (20 MB idle vs Node.js 50 MB)&lt;/li&gt;
&lt;li&gt;Stateless design (no in-memory cache)&lt;/li&gt;
&lt;li&gt;Small Docker image (85 MB vs 200+ MB for Node.js)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 10x more capacity on same RAM&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Edge Caching
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Repeated API calls for same data&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare CDN caches static assets&lt;/li&gt;
&lt;li&gt;Vercel Edge Network caches pages&lt;/li&gt;
&lt;li&gt;SWR (stale-while-revalidate) on frontend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 90% cache hit rate, faster load times&lt;/p&gt;
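
&lt;p&gt;The stale-while-revalidate idea is simple enough to sketch in a few lines: always answer from cache immediately, and refresh in the background once the entry goes stale. (The frontend uses the SWR library; this toy version only shows the mechanism.)&lt;/p&gt;

```typescript
// Toy stale-while-revalidate cache: callers only ever wait on the network
// once, on a cold cache; afterwards stale entries are refreshed behind the
// scenes while the old value is served.
class SwrCache<T> {
  private value?: T;
  private fetchedAt = 0;

  constructor(
    private fetcher: () => Promise<T>,
    private staleMs: number,
  ) {}

  async get(): Promise<T> {
    const now = Date.now();
    if (this.value === undefined) {
      // Cold cache: the only blocking fetch
      this.value = await this.fetcher();
      this.fetchedAt = now;
    } else if (now - this.fetchedAt > this.staleMs) {
      // Stale: serve the old value now, revalidate in the background
      this.fetcher().then((v) => {
        this.value = v;
        this.fetchedAt = Date.now();
      });
    }
    return this.value;
  }
}
```
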

&lt;h3&gt;
  
  
  4. Distributed Processing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Single Worker CPU limit (30 seconds)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service Bindings (N Workers for N users)&lt;/li&gt;
&lt;li&gt;Each Worker gets fresh CPU budget&lt;/li&gt;
&lt;li&gt;Parallel processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 10x faster processing, no CPU time-limit-exceeded (TLE) errors&lt;/p&gt;
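
&lt;p&gt;The fan-out pattern can be sketched with the dispatch step abstracted away. In the real Worker the dispatch is a Service Binding call (&lt;code&gt;env.SELF.fetch&lt;/code&gt;, one sub-request per user); here it is an injected async handler so the sketch stays self-contained:&lt;/p&gt;

```typescript
// Fan-out sketch: rather than one invocation looping over every user (and
// exhausting a single CPU budget), dispatch each user to its own invocation
// and await them all in parallel.
async function fanOut<T, R>(
  items: T[],
  dispatch: (item: T) => Promise<R>,
): Promise<R[]> {
  // Each dispatch runs concurrently; routed through a Service Binding,
  // each one would also get a fresh CPU budget of its own.
  return Promise.all(items.map(dispatch));
}

// Example: process three "users" in parallel with a trivial handler
const demo = fanOut([1, 2, 3], async (x) => x * 2);
```

&lt;p&gt;Note that per-invocation sub-request limits still apply on the free plan, so very large user counts would need chunking on top of this.&lt;/p&gt;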




&lt;h2&gt;
  
  
  Monitoring &amp;amp; Observability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cloudflare Analytics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Built-in metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request count&lt;/li&gt;
&lt;li&gt;Error rate&lt;/li&gt;
&lt;li&gt;Response time (p50, p95, p99)&lt;/li&gt;
&lt;li&gt;CPU time usage&lt;/li&gt;
&lt;li&gt;Memory usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Access:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# View analytics&lt;/span&gt;
npx wrangler &lt;span class="nb"&gt;tail &lt;/span&gt;streaky

&lt;span class="c"&gt;# Real-time logs&lt;/span&gt;
npx wrangler &lt;span class="nb"&gt;tail &lt;/span&gt;streaky &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;pretty
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vercel Analytics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Built-in metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page views&lt;/li&gt;
&lt;li&gt;Unique visitors&lt;/li&gt;
&lt;li&gt;Core Web Vitals (LCP, FID, CLS)&lt;/li&gt;
&lt;li&gt;Deployment status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Access:&lt;/strong&gt; Vercel dashboard&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom Logging
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Log to Analytics Engine&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANALYTICS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeDataPoint&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;blobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user_processed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;doubles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;processingTime&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;indexes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Query analytics&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  SELECT 
    blob1 as event,
    AVG(double1) as avg_time,
    COUNT(*) as count
  FROM analytics
  WHERE timestamp &amp;gt; datetime('now', '-7 days')
  GROUP BY blob1
`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Encryption
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;All sensitive data encrypted:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub PAT (AES-256-GCM)&lt;/li&gt;
&lt;li&gt;Discord webhooks (AES-256-GCM)&lt;/li&gt;
&lt;li&gt;Telegram tokens (AES-256-GCM)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key storage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare Secrets (not in code)&lt;/li&gt;
&lt;li&gt;Koyeb environment variables (not in image)&lt;/li&gt;
&lt;/ul&gt;
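
&lt;p&gt;For illustration, an AES-256-GCM round trip in Node-flavoured TypeScript (the Workers side would use WebCrypto and the Rust proxy its own AES-GCM implementation, but the scheme, a random 12-byte nonce and 16-byte auth tag stored with the ciphertext, is the same):&lt;/p&gt;

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt: payload layout is [12-byte nonce][16-byte auth tag][ciphertext],
// base64-encoded for storage in a TEXT column.
function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh nonce per message: never reuse with GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
}

// Decrypt: final() throws if the auth tag does not match (tampering check).
function decrypt(payload: string, key: Buffer): string {
  const buf = Buffer.from(payload, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28));
  return Buffer.concat([
    decipher.update(buf.subarray(28)),
    decipher.final(),
  ]).toString("utf8");
}
```
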

&lt;h3&gt;
  
  
  2. Authentication
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NextAuth.js (GitHub OAuth)&lt;/li&gt;
&lt;li&gt;JWT tokens (signed, verified)&lt;/li&gt;
&lt;li&gt;HTTP-only cookies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JWT verification&lt;/li&gt;
&lt;li&gt;API secret headers (X-Cron-Secret)&lt;/li&gt;
&lt;li&gt;Rate limiting (60 req/min)&lt;/li&gt;
&lt;/ul&gt;
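
&lt;p&gt;The counting logic behind a 60 req/min limit can be sketched as a fixed-window counter (illustrative only; the production limiter lives in the Worker and may differ):&lt;/p&gt;

```typescript
// Fixed-window rate limiter: each key gets `limit` requests per window;
// the counter resets when a request arrives after the window elapses.
class RateLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(private limit = 60, private windowMs = 60_000) {}

  allow(key: string, now = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 }); // open a new window
      return true;
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}
```
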

&lt;h3&gt;
  
  
  3. Network Security
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;HTTPS everywhere:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vercel: Automatic HTTPS&lt;/li&gt;
&lt;li&gt;Cloudflare: Automatic HTTPS&lt;/li&gt;
&lt;li&gt;Koyeb: Automatic HTTPS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CORS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strict allowlist (only frontend domain)&lt;/li&gt;
&lt;li&gt;No wildcard origins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Security headers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;X-Content-Type-Options: nosniff&lt;/li&gt;
&lt;li&gt;X-Frame-Options: DENY&lt;/li&gt;
&lt;li&gt;Strict-Transport-Security: max-age=31536000&lt;/li&gt;
&lt;/ul&gt;
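
&lt;p&gt;These headers can live in one small helper that every response passes through (framework-agnostic sketch; with Hono this would typically run as middleware):&lt;/p&gt;

```typescript
// The security headers listed above, as a single reusable map.
function securityHeaders(): Record<string, string> {
  return {
    "X-Content-Type-Options": "nosniff",             // block MIME sniffing
    "X-Frame-Options": "DENY",                       // disallow framing
    "Strict-Transport-Security": "max-age=31536000", // force HTTPS for a year
  };
}

// Attach to any Response-like object
const headers = new Map(Object.entries(securityHeaders()));
```
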




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Free Tiers Are Generous
&lt;/h3&gt;

&lt;p&gt;Modern platforms offer incredible free tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vercel: 100 GB bandwidth&lt;/li&gt;
&lt;li&gt;Cloudflare: 100k requests/day&lt;/li&gt;
&lt;li&gt;Koyeb: 512 MB VPS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can build production apps for $0.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Architecture Matters
&lt;/h3&gt;

&lt;p&gt;Smart architecture maximizes free tier capacity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed processing (avoid CPU limits)&lt;/li&gt;
&lt;li&gt;Edge caching (reduce requests)&lt;/li&gt;
&lt;li&gt;Stateless design (minimize memory)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Rust Is Worth It
&lt;/h3&gt;

&lt;p&gt;For resource-constrained environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Roughly 2.5x less idle memory than Node.js (20 MB vs ~50 MB)&lt;/li&gt;
&lt;li&gt;5x smaller binary&lt;/li&gt;
&lt;li&gt;Blazing fast&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Monitoring Is Essential
&lt;/h3&gt;

&lt;p&gt;Even on free tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare Analytics (built-in)&lt;/li&gt;
&lt;li&gt;Vercel Analytics (built-in)&lt;/li&gt;
&lt;li&gt;Custom logging (Analytics Engine)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Plan for Scale
&lt;/h3&gt;

&lt;p&gt;Design for 100x current load:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify bottlenecks early&lt;/li&gt;
&lt;li&gt;Know when you'll need to pay&lt;/li&gt;
&lt;li&gt;Have upgrade path ready&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When to Upgrade
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Signals you need paid tier:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hitting request limits&lt;/strong&gt; (100k req/day on Workers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory errors&lt;/strong&gt; (512 MB on Koyeb)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow response times&lt;/strong&gt; (need more resources)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage limits&lt;/strong&gt; (5 GB on D1)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Upgrade path:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Koyeb:&lt;/strong&gt; $7/month (1 GB RAM) - First bottleneck at ~5000 users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Workers:&lt;/strong&gt; $5/10M requests - Needed at ~20,000 users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel:&lt;/strong&gt; $20/month (Pro) - Needed at ~5000 users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;D1:&lt;/strong&gt; $5/month - Needed at ~20,000 users&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Total cost at 5000 users:&lt;/strong&gt; $7/month (just Koyeb)&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building a production app for $0/month is possible with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smart architecture (distributed, stateless, cached)&lt;/li&gt;
&lt;li&gt;Right tech stack (Rust, Workers, D1)&lt;/li&gt;
&lt;li&gt;Generous free tiers (Vercel, Cloudflare, Koyeb)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current status:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 users/day&lt;/li&gt;
&lt;li&gt;$0/month cost&lt;/li&gt;
&lt;li&gt;99.9% uptime&lt;/li&gt;
&lt;li&gt;100% notification success rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Capacity:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can handle 5000 users/day on free tier&lt;/li&gt;
&lt;li&gt;500x current load&lt;/li&gt;
&lt;li&gt;First paid tier at $7/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; Free tiers are powerful. Use them wisely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://streakyy.vercel.app" rel="noopener noreferrer"&gt;streakyy.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/0xReLogic/Streaky" rel="noopener noreferrer"&gt;github.com/0xReLogic/Streaky&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: &lt;a href="https://github.com/0xReLogic/Streaky/tree/main/web/frontend" rel="noopener noreferrer"&gt;web/frontend&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Backend: &lt;a href="https://github.com/0xReLogic/Streaky/tree/main/web/backend" rel="noopener noreferrer"&gt;web/backend&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Proxy: &lt;a href="https://github.com/0xReLogic/Streaky/tree/main/server" rel="noopener noreferrer"&gt;server&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Series Complete
&lt;/h2&gt;

&lt;p&gt;This concludes the 4-part series on building Streaky:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1:&lt;/strong&gt; The journey from sequential to distributed processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; Solving IP blocking with Rust VPS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; Distributed queue system with Service Bindings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4:&lt;/strong&gt; Zero-cost production architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for following along! If you found this helpful, give it a reaction and follow for more content.&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Connect
&lt;/h2&gt;

&lt;p&gt;Building on free tiers? Have questions about the stack? Drop a comment!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project:&lt;/strong&gt; &lt;a href="https://github.com/0xReLogic/Streaky" rel="noopener noreferrer"&gt;Streaky&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vercel</category>
      <category>architecture</category>
      <category>worker</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building Streaky: Distributed Queue System with Service Bindings (Part 3)</title>
      <dc:creator>Allen Elzayn</dc:creator>
      <pubDate>Fri, 24 Oct 2025 17:40:30 +0000</pubDate>
      <link>https://dev.to/0xrelogic/building-streaky-distributed-queue-system-with-service-bindings-part-3-46f9</link>
      <guid>https://dev.to/0xrelogic/building-streaky-distributed-queue-system-with-service-bindings-part-3-46f9</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://allenarch.dev/blog/building-distributed-cron-cloudflare-workers/" rel="noopener noreferrer"&gt;My Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Part 3: Scaling to 1000+ Users
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/0xrelogic/building-streaky-a-github-streak-guardian-part-1-the-journey-40ek"&gt;Part 1&lt;/a&gt;, I shared the journey from sequential processing to distributed architecture. In &lt;a href="https://dev.to/0xrelogic/building-streaky-solving-cloudflare-ip-blocking-with-rust-part-2-2ckp"&gt;Part 2&lt;/a&gt;, I explained how I solved IP blocking with a Rust proxy.&lt;/p&gt;

&lt;p&gt;Now, let's dive deep into the distributed queue system that makes it all work.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Scalability Problem
&lt;/h2&gt;

&lt;p&gt;After solving the IP blocking issue, I still had the CPU time limit problem. Processing 10 users sequentially took 30+ seconds. Cloudflare Workers have a 30-second CPU time limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The constraint:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each user takes ~3 seconds to process (GitHub API + notifications)&lt;/li&gt;
&lt;li&gt;10 users × 3 seconds = 30 seconds&lt;/li&gt;
&lt;li&gt;Add any overhead = Time Limit Exceeded (TLE)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The realization:&lt;/strong&gt;&lt;br&gt;
I can't process users sequentially in a single Worker. I need to distribute the work across multiple Workers.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Solution: Service Bindings + Queue
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Core idea:&lt;/strong&gt; Instead of one Worker processing N users, spawn N Workers each processing 1 user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scheduler Worker
    |
    |-- Dispatch Worker 1 (User A)
    |-- Dispatch Worker 2 (User B)
    |-- Dispatch Worker 3 (User C)
    |-- ...
    |-- Dispatch Worker N (User N)

Each Worker:
- Fresh CPU budget (30 seconds)
- Isolated execution context
- Parallel processing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 users processed in ~10 seconds (parallel)&lt;/li&gt;
&lt;li&gt;Each Worker uses &amp;lt;5 seconds CPU time&lt;/li&gt;
&lt;li&gt;No TLE errors&lt;/li&gt;
&lt;li&gt;Scales to 1000+ users&lt;/li&gt;
&lt;/ul&gt;
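&lt;p&gt;The speedup is plain fan-out. This toy simulation (not the production code; delays scaled down from ~3 s to 50 ms per user) shows why N parallel instances finish in roughly one user's processing time instead of N times that:&lt;/p&gt;

```typescript
// Toy simulation of the fan-out: each "user" takes PROCESS_MS of wall time,
// a scaled-down stand-in for GitHub API calls plus notifications.
const PROCESS_MS = 50;
const USERS = 10;
const ids = Array.from({ length: USERS }, (_, i) => i);

function processUser(id: number) {
  return new Promise((resolve) => setTimeout(() => resolve(id), PROCESS_MS));
}

// One Worker doing everyone: total time is about USERS * PROCESS_MS.
export async function sequentialMs() {
  const start = Date.now();
  for (const id of ids) await processUser(id);
  return Date.now() - start;
}

// One Worker per user: total time is about PROCESS_MS.
export async function parallelMs() {
  const start = Date.now();
  await Promise.all(ids.map(processUser));
  return Date.now() - start;
}
```

&lt;p&gt;In production the dispatch is fire-and-forget via &lt;code&gt;ctx.waitUntil()&lt;/code&gt; rather than an awaited &lt;code&gt;Promise.all&lt;/code&gt;, but the timing intuition is the same.&lt;/p&gt;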




&lt;h2&gt;
  
  
  Key Components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Queue Table (D1 SQLite)
&lt;/h3&gt;

&lt;p&gt;The queue table tracks which users need processing and their status.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cron_queue&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;batch_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'failed'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;completed_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;error_message&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_cron_queue_status&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;cron_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_cron_queue_batch&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;cron_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_cron_queue_user&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;cron_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why D1?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Already part of the stack (no external dependencies)&lt;/li&gt;
&lt;li&gt;Fast enough for job queues (&amp;lt; 10ms queries)&lt;/li&gt;
&lt;li&gt;Supports atomic operations (CTE + UPDATE + RETURNING)&lt;/li&gt;
&lt;li&gt;Free tier: 50,000 writes/day (plenty for this use case)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Service Bindings
&lt;/h3&gt;

&lt;p&gt;Service Bindings let one Worker call another directly, without going over the public internet. Here the Worker calls itself, and each call runs as a new Worker instance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration (wrangler.toml):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[[services]]&lt;/span&gt;
&lt;span class="py"&gt;binding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"SELF"&lt;/span&gt;
&lt;span class="py"&gt;service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"streaky"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Each fetch creates a NEW Worker instance&lt;/span&gt;
&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SELF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://internal/api/cron/process-user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Cron-Secret&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SERVER_SECRET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;queueItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;queueItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why Service Bindings?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each &lt;code&gt;env.SELF.fetch()&lt;/code&gt; = new Worker instance&lt;/li&gt;
&lt;li&gt;Fresh CPU budget per instance&lt;/li&gt;
&lt;li&gt;Automatic load balancing by Cloudflare&lt;/li&gt;
&lt;li&gt;No external queue service needed (Redis, SQS, etc.)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Initialize Batch
&lt;/h3&gt;

&lt;p&gt;When the cron trigger fires, create a batch of queue items.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/services/queue.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;initializeBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;batchId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Bulk insert users to queue&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queueId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`INSERT INTO cron_queue (id, user_id, batch_id, status)
       VALUES (?, ?, ?, 'pending')`&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Atomic Queue Claiming
&lt;/h3&gt;

&lt;p&gt;The critical part: prevent race conditions when multiple Workers try to claim the same user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/services/queue.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;claimNextPendingUserAtomic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;QueueItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    WITH next AS (
      SELECT id FROM cron_queue
      WHERE status = 'pending'
      ORDER BY created_at ASC
      LIMIT 1
    )
    UPDATE cron_queue
    SET status = 'processing', started_at = datetime('now')
    WHERE id IN (SELECT id FROM next)
    RETURNING id, user_id, batch_id
  `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;QueueItem&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why atomic?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTE (WITH) + UPDATE + RETURNING in single transaction&lt;/li&gt;
&lt;li&gt;No gap between SELECT and UPDATE&lt;/li&gt;
&lt;li&gt;Prevents duplicate processing&lt;/li&gt;
&lt;li&gt;D1 SQLite guarantees atomicity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Without atomic claiming:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Worker 1: SELECT id WHERE status='pending' → Gets user A
Worker 2: SELECT id WHERE status='pending' → Gets user A (race!)
Worker 1: UPDATE status='processing' WHERE id=A
Worker 2: UPDATE status='processing' WHERE id=A
Result: Both workers process user A (duplicate!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With atomic claiming:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Worker 1: CTE + UPDATE + RETURNING → Gets user A, marks processing
Worker 2: CTE + UPDATE + RETURNING → Gets user B, marks processing
Result: No duplicates, each worker gets unique user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Scheduler (Main Worker)
&lt;/h3&gt;

&lt;p&gt;The scheduler initializes the batch and dispatches Workers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/index.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ScheduledEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ExecutionContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[Scheduled] Cron trigger fired:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Query active users&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;usersResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`SELECT id FROM users WHERE is_active = 1 AND github_pat IS NOT NULL`&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userIds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;usersResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="na"&gt;row&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[Scheduled] No active users to process&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Initialize batch&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;batchId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;initializeBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Scheduled] Batch &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; initialized with &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; users`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Dispatch Workers via Service Bindings&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queueItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;claimNextPendingUserAtomic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;queueItem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[Scheduled] No more pending users in queue&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SELF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://internal/api/cron/process-user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Cron-Secret&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SERVER_SECRET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;queueItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;queueItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
          &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Scheduled] User &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;queueItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; dispatched: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;})&lt;/span&gt;
          &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Scheduled] User &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;queueItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; dispatch failed:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Scheduled] All &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; users dispatched for batch &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ctx.waitUntil()&lt;/code&gt; keeps the async dispatches alive until they complete, even after the handler returns&lt;/li&gt;
&lt;li&gt;Each &lt;code&gt;env.SELF.fetch()&lt;/code&gt; spawns a new Worker invocation&lt;/li&gt;
&lt;li&gt;An error in one Worker doesn't affect the others&lt;/li&gt;
&lt;/ul&gt;
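&lt;p&gt;The error-isolation property can be sketched outside the Workers runtime. This is a minimal simulation, not the article's code: &lt;code&gt;dispatch()&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;env.SELF.fetch()&lt;/code&gt;, and it shows why one rejected dispatch never stops the others.&lt;/p&gt;

```typescript
// Minimal sketch of the fan-out pattern: each user gets an independent
// dispatch whose failure is caught locally, so sibling dispatches are
// unaffected. `dispatch` is a hypothetical stand-in for env.SELF.fetch().
async function dispatch(userId: string): Promise<string> {
  if (userId === "bad-user") throw new Error("network error");
  return "200";
}

async function dispatchAll(
  userIds: string[]
): Promise<{ ok: number; failed: number }> {
  let ok = 0;
  let failed = 0;
  // In a real Worker, this array of promises is what gets handed to
  // ctx.waitUntil() so the runtime keeps them alive.
  await Promise.all(
    userIds.map((id) =>
      dispatch(id)
        .then(() => {
          ok++;
        })
        .catch(() => {
          failed++; // the error stays local to this one user
        })
    )
  );
  return { ok, failed };
}
```

&lt;p&gt;Because every promise carries its own &lt;code&gt;.catch()&lt;/code&gt;, &lt;code&gt;Promise.all&lt;/code&gt; never rejects; the per-promise handlers in the scheduler above serve the same purpose.&lt;/p&gt;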

&lt;h3&gt;
  
  
  Step 4: Worker Instance (Process Single User)
&lt;/h3&gt;

&lt;p&gt;Each Worker invocation processes a single user, in isolation from the rest of the batch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/routes/cron.ts&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/process-user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Auth check&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Cron-Secret&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SERVER_SECRET&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SERVER_SECRET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;queueId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Missing queueId or userId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Idempotency check&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getQueueItemStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
        &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;skipped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Already completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
        &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;skipped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Already failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Process user&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processSingleUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;markCompleted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;errorMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unknown error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;markFailed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;errorMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[ProcessUser] Failed for user &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="c1"&gt;// Return 200 (not 500) so scheduler continues with other users&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queueId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;errorMessage&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[ProcessUser] Error:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Process user failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unknown error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idempotency protection: the queue status is checked first, so a retried dispatch never processes the same user twice&lt;/li&gt;
&lt;li&gt;Returns 200 even on failure, so the scheduler keeps dispatching the remaining users&lt;/li&gt;
&lt;li&gt;Every queue item is marked completed or failed, giving the batch a full audit trail&lt;/li&gt;
&lt;/ul&gt;
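&lt;p&gt;The idempotency guard can be isolated as a pure function. A hedged sketch follows; the &lt;code&gt;QueueStatus&lt;/code&gt; type and &lt;code&gt;shouldProcess&lt;/code&gt; name are illustrative, not taken from the codebase.&lt;/p&gt;

```typescript
// Illustrative idempotency guard: given a queue item's stored status,
// decide whether to process it or skip it with a reason.
type QueueStatus = "pending" | "processing" | "completed" | "failed";

function shouldProcess(
  status: QueueStatus
): { process: boolean; reason?: string } {
  switch (status) {
    case "completed":
      return { process: false, reason: "Already completed" };
    case "failed":
      return { process: false, reason: "Already failed" };
    default:
      // "pending" (and, depending on retry policy, "processing")
      // falls through to actual processing.
      return { process: true };
  }
}
```

&lt;p&gt;With this guard in front of the handler, a duplicate delivery of the same &lt;code&gt;queueId&lt;/code&gt; to &lt;code&gt;/process-user&lt;/code&gt; becomes a cheap no-op instead of a double notification.&lt;/p&gt;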

&lt;h3&gt;
  
  
  Step 5: Process Single User
&lt;/h3&gt;

&lt;p&gt;The actual per-user processing logic: fetch the user from D1, decrypt the GitHub PAT, check today's contributions, and send the configured notifications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/cron/process-single-user.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processSingleUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Fetch user from D1&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`SELECT id, github_username, github_pat, discord_webhook, telegram_token, telegram_chat_id
     FROM users
     WHERE id = ? AND is_active = 1`&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`User &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; not found or inactive`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_pat&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`User &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; has no GitHub PAT configured`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Initialize services&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;encryptionService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createEncryptionService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ENCRYPTION_KEY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;notificationService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createNotificationService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Decrypt GitHub PAT&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decryptedPat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;encryptionService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_pat&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Create GitHub service&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;githubService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createCachedGitHubService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;decryptedPat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Check contributions&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contributionsToday&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;githubService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getContributionsToday&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currentStreak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;githubService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getCurrentStreak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Prepare notification message&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;notificationMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;currentStreak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;contributionsToday&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;contributionsToday&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`Great job! You made &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;contributionsToday&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; contribution&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;contributionsToday&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;s&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; today! Your &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currentStreak&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-day streak is safe. Keep it up!`&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`You have not made any contributions today! Your &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currentStreak&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-day streak is at risk. Make a commit to keep it alive!`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="c1"&gt;// Send Discord notification if configured&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;discord_webhook&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;discordResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;notificationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendDiscordNotification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;discord_webhook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;notificationMessage&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;logNotification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;discord&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;discordResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;discordResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;discordResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Process] Discord notification sent to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Process] Discord notification failed for &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;discordResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Process] Error sending Discord notification to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Send Telegram notification if configured&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;telegram_token&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;telegram_chat_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;telegramResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;notificationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendTelegramNotification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;telegram_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;telegram_chat_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;notificationMessage&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;logNotification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;telegram&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;telegramResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;telegramResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;telegramResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Process] Telegram notification sent to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Process] Telegram notification failed for &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;telegramResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Process] Error sending Telegram notification to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Process] User &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; processed successfully`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Stale Item Requeuing
&lt;/h3&gt;

&lt;p&gt;What if a Worker crashes? Items stuck in "processing" need to be requeued.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/services/queue.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;requeueStaleProcessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    UPDATE cron_queue
    SET status = 'pending', started_at = NULL
    WHERE status = 'processing'
      AND started_at &amp;lt; datetime('now', '-' || ? || ' minutes')
  `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage in scheduler:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Reaper for stale processing items (10+ minutes)&lt;/span&gt;
&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nf"&gt;requeueStaleProcessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;requeued&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requeued&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Scheduled] Requeued &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;requeued&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; stale processing items`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[Scheduled] Error requeuing stale items:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Batch Cleanup
&lt;/h3&gt;

&lt;p&gt;Delete old batches to prevent database bloat.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/services/queue.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;cleanupOldBatches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;daysOld&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    DELETE FROM cron_queue
    WHERE created_at &amp;lt; datetime('now', '-' || ? || ' days')
  `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;daysOld&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage in scheduler:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Cleanup old batches (7+ days)&lt;/span&gt;
&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nf"&gt;cleanupOldBatches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deleted&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Scheduled] Cleaned up &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; old queue items`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[Scheduled] Error cleaning up old batches:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Batch Progress Tracking
&lt;/h3&gt;

&lt;p&gt;Monitor batch progress in real-time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/services/queue.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;BatchProgress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;processing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getBatchProgress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;BatchProgress&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    SELECT status, COUNT(*) as count
    FROM cron_queue
    WHERE batch_id = ?
    GROUP BY status
  `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BatchProgress&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;processing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;keyof&lt;/span&gt; &lt;span class="nb"&gt;Omit&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;BatchProgress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;total&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;API endpoint:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// web/backend/src/routes/cron.ts&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/batch/:batchId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Cron-Secret&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SERVER_SECRET&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SERVER_SECRET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;batchId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;batchId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;progress&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getBatchProgress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; 
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completed&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[BatchProgress] Error:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to get batch progress&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Performance Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before (Sequential Processing)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 users × 3 seconds = 30 seconds
CPU time: 30 seconds (at limit!)
Wall time: 30 seconds
Success rate: 0% (TLE errors)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  After (Distributed Processing)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 users / 10 Workers = 1 user per Worker
CPU time per Worker: 3 seconds
Wall time: ~10 seconds (parallel)
Success rate: 100%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scalability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Current load:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 users/day&lt;/li&gt;
&lt;li&gt;10 Workers dispatched&lt;/li&gt;
&lt;li&gt;~10 seconds total processing time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Theoretical capacity:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare Workers: 100,000 requests/day (free tier)&lt;/li&gt;
&lt;li&gt;D1 writes: 50,000/day (free tier)&lt;/li&gt;
&lt;li&gt;Bottleneck: D1 writes (2 writes per user = 25,000 users/day)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Headroom:&lt;/strong&gt; 2500x current load&lt;/p&gt;
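The bottleneck arithmetic above can be sanity-checked in a few lines. The limits and per-user counts are the figures quoted in this section; the constant names are illustrative:

```typescript
// Back-of-the-envelope capacity check using the free-tier figures above.
const WORKER_REQS_PER_DAY = 100_000; // Cloudflare Workers free tier
const D1_WRITES_PER_DAY = 50_000;    // D1 free tier
const REQS_PER_USER = 2;             // two endpoints hit per user
const WRITES_PER_USER = 2;           // queue update + notification log
const CURRENT_USERS = 10;

const byWorkers = WORKER_REQS_PER_DAY / REQS_PER_USER; // 50,000 users/day
const byD1 = D1_WRITES_PER_DAY / WRITES_PER_USER;      // 25,000 users/day
const capacity = Math.min(byWorkers, byD1);            // D1 is the bottleneck
const headroom = capacity / CURRENT_USERS;

console.log(`capacity: ${capacity} users/day, headroom: ${headroom}x`);
```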

&lt;h3&gt;
  
  
  Cost Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Free tier limits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare Workers: 100k req/day&lt;/li&gt;
&lt;li&gt;D1 database: 50k writes/day&lt;/li&gt;
&lt;li&gt;Koyeb VPS: 512MB RAM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workers: ~20 req/day (10 users × 2 endpoints)&lt;/li&gt;
&lt;li&gt;D1 writes: ~40 writes/day (queue + notifications)&lt;/li&gt;
&lt;li&gt;VPS: ~20MB RAM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0/month&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Service Bindings Are Powerful
&lt;/h3&gt;

&lt;p&gt;Each &lt;code&gt;env.SELF.fetch()&lt;/code&gt; creates a new Worker invocation with a fresh CPU budget. This is the key to scaling beyond single-Worker limits.&lt;/p&gt;
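A minimal sketch of that fan-out, assuming a `Fetcher`-shaped binding like `env.SELF` (the internal URL and the `fanOut` helper are illustrative, not the exact routes from the repo):

```typescript
// Hypothetical fan-out over a service binding: one sub-request per queue
// item, each landing on a fresh Worker invocation with its own CPU budget.
type Fetcher = { fetch(req: Request): Promise<Response> };

async function fanOut(self: Fetcher, itemIds: number[]): Promise<number> {
  const results = await Promise.allSettled(
    itemIds.map((id) =>
      self.fetch(new Request(`https://internal/cron/process?item=${id}`))
    )
  );
  // Count dispatches that reached a Worker and got a 2xx back.
  return results.filter((r) => r.status === "fulfilled" && r.value.ok).length;
}
```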

&lt;h3&gt;
  
  
  2. D1 Is Fast Enough for Queues
&lt;/h3&gt;

&lt;p&gt;No need for Redis or SQS. At this scale, D1 (SQLite) handles a job queue comfortably:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atomic claiming with CTE + UPDATE + RETURNING&lt;/li&gt;
&lt;li&gt;Fast queries (&amp;lt; 10ms)&lt;/li&gt;
&lt;li&gt;Built-in indexes&lt;/li&gt;
&lt;li&gt;A generous free tier&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Atomic Operations Prevent Races
&lt;/h3&gt;

&lt;p&gt;Without atomic claiming, multiple Workers could grab and process the same user. A CTE + UPDATE + RETURNING in a single statement solves this.&lt;/p&gt;
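The guarantee can be illustrated with a tiny in-memory model (a stand-in for the SQL version, not D1 itself): because an item is marked "processing" in the same step that selects it, two claimers pulling from the same queue never receive the same item.

```typescript
// In-memory stand-in for the atomic SQL claim: select-and-mark in one step.
interface Item {
  id: number;
  status: "pending" | "processing";
}

function claim(queue: Item[], limit: number): Item[] {
  const claimed: Item[] = [];
  for (const item of queue) {
    if (claimed.length >= limit) break;
    if (item.status === "pending") {
      item.status = "processing"; // mark in the same pass that selects
      claimed.push(item);
    }
  }
  return claimed;
}
```

Two successive `claim` calls over the same queue return disjoint items, mirroring what the single SQL statement guarantees across concurrent Workers.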

&lt;h3&gt;
  
  
  4. Idempotency Is Critical
&lt;/h3&gt;

&lt;p&gt;Check an item's status before processing it. That makes retries safe and prevents duplicate notifications.&lt;/p&gt;
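A minimal sketch of that guard (the `shouldProcess` helper and the status set are illustrative; the real check lives in the queue service):

```typescript
// Idempotency guard: only items this run has claimed as "processing" get
// worked on. Terminal or unclaimed items are skipped, so a retried dispatch
// can't send a second notification.
type Status = "pending" | "processing" | "completed" | "failed";

function shouldProcess(status: Status): boolean {
  return status === "processing";
}
```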

&lt;h3&gt;
  
  
  5. Stale Item Requeuing Is Essential
&lt;/h3&gt;

&lt;p&gt;Workers crash, and items get stuck in "processing". A reaper process requeues them after 10 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Return 200 Even on Failure
&lt;/h3&gt;

&lt;p&gt;If a Worker fails while processing one user, log the error and return 200 (not 500), so one user's failure doesn't block the other Workers.&lt;/p&gt;
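A hedged sketch of that contract (the `safeProcess` wrapper is illustrative, not the repo's exact handler):

```typescript
// Always answer 200 to the dispatcher; report per-user failure in the body
// so one bad user doesn't look like an infrastructure error.
async function safeProcess(
  work: () => Promise<void>
): Promise<{ status: number; ok: boolean; error?: string }> {
  try {
    await work();
    return { status: 200, ok: true };
  } catch (err) {
    console.error("[Process] failed:", err);
    return { status: 200, ok: false, error: String(err) };
  }
}
```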




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;This completes the 3-part series on building Streaky:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1:&lt;/strong&gt; The journey from sequential to distributed processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; Solving IP blocking with Rust VPS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; Distributed queue system with Service Bindings&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://streakyy.vercel.app" rel="noopener noreferrer"&gt;streakyy.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/0xReLogic/Streaky" rel="noopener noreferrer"&gt;github.com/0xReLogic/Streaky&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Queue Code:&lt;/strong&gt; &lt;a href="https://github.com/0xReLogic/Streaky/blob/main/web/backend/src/services/queue.ts" rel="noopener noreferrer"&gt;web/backend/src/services/queue.ts&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Connect
&lt;/h2&gt;

&lt;p&gt;Building distributed systems on Cloudflare? Have questions about Service Bindings or D1? Drop a comment!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/0xReLogic" rel="noopener noreferrer"&gt;@0xReLogic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project:&lt;/strong&gt; &lt;a href="https://github.com/0xReLogic/Streaky" rel="noopener noreferrer"&gt;Streaky&lt;/a&gt;&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>architecture</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
