I recently open-sourced my global arbitrage engine — a system that scans price gaps across international markets, generates content, and optimizes itself every 6 hours. Here's how it works under the hood.
GitHub: github.com/victorjzq/global-arbitrage-api
The Problem
A kids' coding robot costs ¥45 ($6) wholesale on 1688 (China's Alibaba) but sells for $22 on Shopee Vietnam. That's a 3.5x markup. Why does this gap exist?
Three barriers: language (1688 is Chinese-only), payment (requires Alipay), and discovery (Vietnamese sellers can't search Chinese platforms). AI eliminates all three.
I wanted a system that finds these gaps automatically — and gets smarter over time.
Architecture: 12 Engines in a Perpetual Loop
```
Scanners (3)            Engine               Output
┌──────────────┐     ┌───────────┐     ┌──────────────┐
│ Trend Gap    │────▶│ Perpetual │────▶│ API (24/7)   │
│ Price Scan   │     │ Engine    │     │ Reports      │
│ Polymarket   │     │ (6h cycle)│     │ Content (5x) │
└──────────────┘     └─────┬─────┘     │ Telegram Bot │
                           │           └──────────────┘
                     ┌─────▼─────┐
                     │ Evolution │  ← self-optimization
                     │ Loop      │    from own data
                     └───────────┘
```
The perpetual engine orchestrates everything. Every 6 hours it runs a full cycle:
- Scan — 3 scanners find opportunities across 26 categories and 9 trade routes
- Rank — score each opportunity by ROI × confidence × urgency
- Generate — turn findings into content for 5 platforms
- Evolve — analyze results and adjust weights for the next cycle
Here's the actual orchestrator code:
```python
# perpetual_engine.py — the heartbeat
from datetime import datetime

def main():
    cycle = {"start": datetime.now().isoformat(), "scanners": {}}

    # Phase 1: Scan for opportunities
    scanners = [
        ("Trend Gap Scanner", SRC / "trend_gap_scanner.py"),
        ("Price Scanner", SRC / "daily_scan.py"),
        ("Polymarket", SRC / "prediction-markets/polymarket_scanner.py"),
    ]
    for name, path in scanners:
        ok, output = run_script(name, path)
        cycle["scanners"][name] = {"success": ok}

    # Phase 2: Generate multi-platform content
    run_script("Content Engine", SRC / "content_engine.py")

    # Phase 3: Self-optimize
    run_script("Evolution Loop", SRC / "evolution_loop.py")

    # Phase 4: Record metrics for next evolution
    record_metrics(cycle)
```
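The `run_script` helper isn't shown in the excerpt above. A minimal sketch of what it might look like, assuming each engine is a standalone script launched as a subprocess (the signature and timeout are my assumptions, not the repo's actual implementation):

```python
# Hypothetical run_script helper — the real repo's version may differ.
import subprocess
import sys

def run_script(name, path, timeout=600):
    """Run one engine script in a subprocess; return (success, output)."""
    try:
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=timeout,
        )
        ok = result.returncode == 0
        return ok, result.stdout if ok else result.stderr
    except subprocess.TimeoutExpired:
        return False, f"{name} timed out after {timeout}s"
```

Running engines as subprocesses keeps them isolated: one crashing scanner marks its cycle entry as failed without taking down the orchestrator.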
The Self-Optimization Loop (The Interesting Part)
Most automation scripts do the same thing every time. This system learns from its own output.
The evolution engine analyzes historical scan data to find patterns:
```python
# evolution_loop.py — the system improves itself
from collections import Counter, defaultdict
import json

def analyze_opportunities():
    patterns = {
        "high_markup_products": [],        # Which products have 3x+ margins?
        "trending_categories": [],         # What's growing fastest?
        "best_source_markets": Counter(),  # Which source markets are most profitable?
        "best_target_markets": Counter(),  # Which target markets convert best?
        "price_ranges": defaultdict(list), # What price range has best ROI?
    }
    # ... analyzes all historical data files
    return patterns

def generate_optimization_recommendations(patterns, content_stats):
    recs = []
    # Focus on highest-markup products
    if patterns["high_markup_products"]:
        top = sorted(patterns["high_markup_products"],
                     key=lambda x: x.get("markup", 0), reverse=True)[:3]
        recs.append({
            "type": "focus_products",
            "action": "Increase scan frequency",
            "targets": [t["product"] for t in top],
        })
    return recs

def update_scanning_weights(recommendations):
    weights = {
        "product_priority": [],
        "market_priority": ["VN", "TH", "ID", "PH"],
        "scan_frequency": {"trend_gap": "6h", "price_comparison": "6h"},
    }
    for rec in recommendations:
        if rec["type"] == "focus_products":
            weights["product_priority"] = rec["targets"]
    # Saved to disk — next scan cycle picks this up automatically
    with open(weights_file, "w") as f:
        json.dump(weights, f, indent=2)
```
Every cycle, the system:
- Reads its own past results
- Identifies what worked (high margins, trending categories)
- Writes new weights to disk
- Next scan cycle reads those weights and focuses accordingly
The result: after a few dozen cycles, the system naturally converges on the most profitable product categories and markets without any manual tuning.
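The read side isn't shown in the excerpt. A scanner might load the weights with a fallback to defaults, so the very first cycle works before any evolution has run. A sketch, with the file name assumed and the schema taken from the write side above:

```python
import json
from pathlib import Path

# Defaults mirror the schema written by update_scanning_weights.
DEFAULT_WEIGHTS = {
    "product_priority": [],
    "market_priority": ["VN", "TH", "ID", "PH"],
    "scan_frequency": {"trend_gap": "6h", "price_comparison": "6h"},
}

def load_weights(weights_file="data/scanning_weights.json"):
    """Load weights written by the evolution loop, or fall back to defaults."""
    path = Path(weights_file)
    if not path.exists():
        return dict(DEFAULT_WEIGHTS)
    with open(path) as f:
        return json.load(f)
```

Because the handoff is a plain JSON file on disk, the scanners and the evolution loop stay fully decoupled: either side can be restarted or replaced without the other noticing.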
Opportunity Scoring
Not all price gaps are worth acting on. The ranker scores every opportunity:
```python
# opportunity_ranker.py
# score = ROI_potential * confidence * urgency
import hashlib

def estimate_roi(opp):
    markup = parse_markup(opp.get('markup', 1))
    roi_raw = markup - 1  # 3x markup = 200% ROI
    return normalize(roi_raw, 0, 3) * 10  # Scale to 0-10

def dedup_key(opp):
    # Hash-based deduplication across scanners
    raw = opp.get('keyword_cn', '') + opp.get('keyword_vn', '')
    return hashlib.md5(raw.encode()).hexdigest()[:12]
```
This prevents the same opportunity from showing up from multiple scanners and surfaces only the highest-value leads.
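The helpers `parse_markup` and `normalize`, and how the confidence and urgency terms enter the product, aren't shown in the excerpt. One plausible way to fill them in (all helper implementations here are sketches, and the 0-1 confidence/urgency fields are assumptions):

```python
def parse_markup(markup):
    """Accept 3.5, "3.5", or "3.5x" and return a float multiplier."""
    if isinstance(markup, str):
        markup = markup.rstrip("xX")
    return float(markup)

def normalize(value, lo, hi):
    """Clamp value into [lo, hi], then map linearly to 0..1."""
    value = max(lo, min(hi, value))
    return (value - lo) / (hi - lo)

def estimate_roi(opp):
    markup = parse_markup(opp.get("markup", 1))
    roi_raw = markup - 1                  # 3x markup = 200% ROI
    return normalize(roi_raw, 0, 3) * 10  # scale to 0-10

def score(opp):
    # confidence and urgency assumed to be 0-1 floats on the opportunity dict
    return estimate_roi(opp) * opp.get("confidence", 0.5) * opp.get("urgency", 0.5)
```

Clamping in `normalize` matters: a freak 10x markup would otherwise dominate the ranking on ROI alone, regardless of how thin the confidence is.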
The 12 Engines
| Engine | What it does |
|---|---|
| `trend_gap_scanner` | Finds products hot in China but not yet in SEA |
| `daily_scan` | Price comparison across platforms |
| `polymarket_scanner` | Prediction market arbitrage (2000+ markets) |
| `arbitrage_api` | REST API — 26 categories, 9 trade routes |
| `content_engine` | 1 data point → Twitter + LinkedIn + Reddit + Email + Video |
| `opportunity_ranker` | Scores by ROI × confidence × urgency |
| `evolution_loop` | Self-optimization from its own output |
| `perpetual_engine` | Orchestrator — runs every 6h |
| `publish_report` | Multi-channel distribution |
| `telegram_bot` | Subscription alerts |
| `system_status` | One-command dashboard |
| `md_to_html` | Reports → sellable HTML/PDF |
Fork It and Make Money
```bash
git clone https://github.com/victorjzq/global-arbitrage-api.git
cd global-arbitrage-api
pip3 install requests pytrends

# Run a full cycle: scan → rank → content → evolve
python3 src/perpetual_engine.py

# Start the API
python3 src/api_server.py
# → http://localhost:8899/api/top

# Deploy 24/7
bash start.sh
```
The system is designed to be extended:
- Add new markets: Africa, Latin America, Middle East
- Add new scanners: new platforms, new data sources
- Improve the evolution algorithm: add ML models, A/B test strategies
- Add monetization: the API is ready for RapidAPI, reports for Gumroad
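Adding a scanner mostly means following the `(name, path)` convention the orchestrator already uses: a standalone script that writes its findings somewhere the ranker can read. A skeleton under those assumptions (the output path and the opportunity schema here are hypothetical, not the repo's actual contract):

```python
# my_custom_scanner.py — skeleton for a new data source (hypothetical schema)
import json
from datetime import datetime

def scan():
    """Return a list of opportunity dicts; replace with a real data source."""
    return [{
        "product": "example-widget",
        "source_market": "CN",
        "target_market": "VN",
        "markup": "3.1x",
        "found_at": datetime.now().isoformat(),
    }]

def main(out_path="data/my_custom_scanner.json"):
    opportunities = scan()
    with open(out_path, "w") as f:
        json.dump(opportunities, f, indent=2)
    print(f"Found {len(opportunities)} opportunities")

if __name__ == "__main__":
    main()
```

Registering it would then be one more tuple in the orchestrator's `scanners` list.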
Real Results
Here's what the system found in its latest scan (March 2026):
| Product | China (1688) | Vietnam (Shopee) | Markup |
|---|---|---|---|
| Kids STEM Robot | ¥45 ($6) | 550k VND ($22) | 3.5x |
| Pet GPS Tracker | ¥38 ($5) | 450k VND ($18) | 3.4x |
| Foldable Keyboard | ¥28 ($4) | 320k VND ($13) | 3.3x |
| Solar WiFi Camera | ¥75 ($10) | 850k VND ($34) | 3.2x |
These aren't theoretical — the scanner finds new opportunities every 6 hours and ranks them by actionability.
Lessons Learned
- Self-optimization beats manual tuning. Let the system tell you what's working.
- Deduplication is critical. Multiple scanners will find the same opportunity — hash-based dedup prevents noise.
- The perpetual loop pattern is reusable. Scan → Score → Act → Learn works for any data pipeline, not just arbitrage.
- Start with the orchestrator. Build `perpetual_engine.py` first, then plug in scanners one at a time.
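The reusable Scan → Score → Act → Learn shape from the lessons above boils down to a very small loop. A generic sketch, independent of arbitrage (function names and the `state` threading are my framing, not code from the repo):

```python
def perpetual_loop(scan, score, act, learn, state, cycles=1):
    """Generic Scan → Score → Act → Learn loop; `state` carries learned weights."""
    for _ in range(cycles):
        items = scan(state)                           # find candidates using current state
        ranked = sorted(items, key=score, reverse=True)  # best opportunities first
        act(ranked)                                   # publish / alert / trade
        state = learn(state, ranked)                  # fold results back into state
    return state
```

Swap in a log scraper for `scan` and an alerting hook for `act` and the same loop runs a monitoring pipeline instead of an arbitrage one.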
The full code is MIT licensed. Fork it, extend it, build your own arbitrage engine.
GitHub: github.com/victorjzq/global-arbitrage-api
Newsletter (free): victorjia.substack.com
If you have questions about the architecture or want to contribute, drop a comment or open an issue on GitHub.