I've been building in the Canada-China B2B trade space for a while, and the biggest friction I kept running into was this: global buyers don't know how to find the right Chinese supplier, and Chinese suppliers have no efficient way to reach international buyers.
The traditional approach — Alibaba, trade shows, cold email — is slow, expensive, and heavily relationship-dependent. I wanted to fix this with LLMs.
The Problem
China's small commodity export market (小商品出海) is massive. Yiwu alone processes over $70B in annual wholesale trade. Yet a Canadian retailer trying to source bamboo kitchenware, or an Australian importer looking for OEM pet toys, has no good way to describe what they want and get matched with the right factory.
Search engines return SEO spam. Alibaba is a catalogue you have to manually browse. Trade shows require flights to Guangzhou. The entire process assumes you already know who you're looking for.
The Approach: Intent-Based Semantic Matching
Instead of keyword search, I built an intent graph — a dual-sided store of buyer demands (DEMAND) and supplier capabilities (SUPPLY).
When a buyer submits a requirement like:
"Need 500 units bamboo cutting boards for Canadian retail, FSC certified, budget $8-12 USD"
The LLM parser extracts structured fields: product type, quantity, certifications, market, budget. This becomes a DEMAND intent.
On the other side, supplier data (crawled + manually verified) is stored as SUPPLY intents with product categories, MOQ, certifications, and export experience.
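To make the two intent shapes concrete, here's a minimal sketch in Python. The field names (`core_need`, `moq`, `quality_score`, etc.) are illustrative, drawn from the description above, not the actual MapleBridge schema:

```python
from dataclasses import dataclass, field

@dataclass
class DemandIntent:
    core_need: str                     # free-text summary, later embedded
    category: str                      # node in the product taxonomy
    quantity: int
    budget_usd: tuple[float, float]    # (min, max) acceptable unit price
    certifications: list[str] = field(default_factory=list)
    market: str = ""

@dataclass
class SupplyIntent:
    core_need: str
    category: str
    moq: int                           # minimum order quantity
    price_usd: tuple[float, float]     # quoted unit price range
    certifications: list[str] = field(default_factory=list)
    export_markets: list[str] = field(default_factory=list)
    quality_score: float = 0.5         # verification + past match success

# The bamboo cutting board example from above, parsed into a DEMAND intent:
demand = DemandIntent(
    core_need="FSC-certified bamboo cutting boards for retail",
    category="kitchenware/bamboo",
    quantity=500,
    budget_usd=(8.0, 12.0),
    certifications=["FSC"],
    market="CA",
)
```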
The matching engine compares intents using:
- Category alignment — hierarchical taxonomy filter
- Semantic similarity — embedding cosine similarity between core_need fields
- Structural compatibility — quantity vs MOQ, budget vs price range, certifications
- Supplier quality score — verification status, past match success rate
Pairs scoring ≥ 0.7 trigger email notifications to both parties.
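The four signals above can be sketched as a single scoring function. The weights, penalty factors, and the category hard-gate below are my illustration of the idea, not the published formula; intents are plain dicts here to keep the sketch self-contained:

```python
import math

def cosine(a, b):
    # Embedding cosine similarity between the two core_need vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def match_score(demand, supply, d_vec, s_vec, weights=(0.4, 0.3, 0.3)):
    """Illustrative scoring: category gate, then a weighted blend of
    semantic similarity, structural compatibility, and supplier quality."""
    # Hard gate: top-level taxonomy category must align.
    if not supply["category"].startswith(demand["category"].split("/")[0]):
        return 0.0
    semantic = cosine(d_vec, s_vec)
    # Structural compatibility: quantity vs MOQ, budget vs price band,
    # required certifications as a subset check. Each miss penalizes.
    structural = 1.0
    if demand["quantity"] < supply["moq"]:
        structural *= 0.3
    lo, hi = supply["price_usd"]
    if hi < demand["budget_usd"][0] or lo > demand["budget_usd"][1]:
        structural *= 0.5
    if not set(demand["certifications"]) <= set(supply["certifications"]):
        structural *= 0.4
    w_sem, w_struct, w_quality = weights
    return (w_sem * semantic
            + w_struct * structural
            + w_quality * supply["quality_score"])
```

A compatible pair with similar embeddings clears the 0.7 notification threshold; a category mismatch scores zero regardless of how similar the text looks.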
Tech Stack
- Backend: FastAPI + SQLite
- LLM parsing: GPT-4o-mini (English) + Qwen qwen-plus (Chinese), with automatic language routing
- Supplier discovery: Self-hosted SearXNG + BeautifulSoup crawler + AI validation
- Webhook API: accepts buyer demands from AI agents, Telegram bot, or direct API
- Deployment: Docker Compose on Alibaba Cloud ECS
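The "smart language routing" between the two parser models can be as simple as counting CJK characters. This is a naive sketch of the idea, not the production routing logic, and the 0.3 threshold is an assumption:

```python
def route_model(text: str) -> str:
    """Route Chinese-dominant text to qwen-plus, everything else to
    gpt-4o-mini, based on the share of CJK Unified Ideograph characters."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return "qwen-plus" if cjk / max(len(text), 1) > 0.3 else "gpt-4o-mini"
```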
The Hard Part: Supply-Side Data Quality
The hard part wasn't the embeddings — it was data quality. Most supplier websites are SEO-optimized but content-poor. The AI validation step rejects ~70% of crawled URLs as not genuine B2B suppliers.
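Before paying for an LLM validation call on every crawled URL, a cheap heuristic pre-filter can discard the obvious SEO shells. The specific signals and threshold below are illustrative, not the actual validation pipeline:

```python
def prefilter_supplier_page(html_text: str, min_signals: int = 2) -> bool:
    """Cheap pre-filter run before the (expensive) LLM validation step.
    Genuine B2B supplier pages tend to mention order terms, manufacturing
    capability, or certifications; SEO shells rarely do."""
    text = html_text.lower()
    signals = 0
    if any(k in text for k in ("moq", "minimum order", "起订量")):
        signals += 1  # B2B order terms
    if any(k in text for k in ("oem", "odm", "factory", "manufacturer")):
        signals += 1  # manufacturing capability
    if any(k in text for k in ("iso 9001", "rohs", "fsc")):
        signals += 1  # certification mentions
    if len(text) > 2000:
        signals += 1  # content-poor pages are usually short
    return signals >= min_signals
```

Pages that pass still go through LLM validation; the filter just keeps the ~70% of junk URLs from burning tokens.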
Open API
```shell
curl -X POST https://maplebridge.io/api/v1/webhook/manus \
  -H "Content-Type: application/json" \
  -d '{
    "demand": "1000 units wireless earbuds, CE certified, for Canadian market",
    "contact_email": "buyer@company.com",
    "source": "api"
  }'
```
Full docs open-sourced at: https://github.com/jinjihuang88-ui/maplebridge-open
Launched today on Product Hunt: https://www.producthunt.com/posts/maplebridge-io — free for buyers. Happy to answer questions about the LLM matching architecture.
Top comments (1)
The 70% rejection rate on crawled supplier URLs is the number I wish more people talked about when building AI-powered data products. I run a financial data site covering 8,000+ stock tickers across 12 languages, and the data quality challenge is remarkably similar — the raw data from APIs and scraped sources is full of confident-looking garbage. A supplier website that looks legitimate but has no real product information is the exact same problem as a stock page where the LLM generates plausible-sounding analysis with fabricated financial metrics.
Your dual-LLM routing (GPT-4o-mini for English, QWEN for Chinese) is a smart architectural choice. I use a local Llama 3 instance for content generation across 12 languages, and the quality gap between languages the model was primarily trained on vs. secondary languages is massive. Dutch and German content generated by the same model that produces solid English analysis often has what I call "translationese" — grammatically correct but semantically flat. Having a dedicated model for Chinese-language supplier data probably catches nuance that a single model would miss entirely.
The intent graph approach vs keyword search is also interesting from an SEO perspective. Traditional B2B platforms optimize for keywords, which means suppliers game the system with keyword stuffing. Semantic matching at the intent level should be much harder to game since you need to actually describe real capabilities. How are you handling the cold start problem — getting enough supply-side intents populated before buyers start matching?