## The Problem

Every China-to-North America B2B sourcing project hits the same wall: information asymmetry at scale.

A North American buyer searches "custom packaging supplier" on Alibaba — 50,000 listings. A Chinese factory sends 200 cold emails — 2 replies. Both sides are swimming in noise, not signal.

The root issue isn't a lack of information. It's a matching efficiency problem. Keyword-based directories can't bridge the gap between a buyer's intent ("I need FDA-compliant silicone kitchen tools, 500 MOQ, shipping to Toronto") and a supplier's actual capabilities.

We built MapleBridge.io to tackle this specifically for the China-North America corridor. This post covers the technical architecture, especially the intent parsing layer and the open matching protocol we published.

## Core Architecture: Structured Intent Extraction

The key insight is that a buyer's sourcing request contains far more signal than keywords. Take this example:

> "I need a factory that makes custom silicone kitchen tools, 500 pcs, FDA food contact standards, ship to Toronto"

This encodes:

- Category: silicone kitchenware
- Process: custom/OEM
- MOQ: 500 units
- Certification: FDA food contact
- Destination: Canada (no Section 301 tariffs — this matters)
- Implicit: export-capable, must have test reports

A keyword search catches "silicone kitchen." Everything else is lost.

### LLM-Based Intent Parser

We use an LLM to extract structured JSON from natural language:
```python
def parse_sourcing_intent(text: str) -> dict:
    prompt = f"""
    Extract structured sourcing intent from this buyer request.
    Return JSON with these fields:
    - category: product category
    - subcategory: specific product type
    - moq_min / moq_max: quantity range
    - certifications: list of required certs (FDA, UL, CE, CPSC, Health Canada, etc.)
    - destination: target market (US, Canada, etc.)
    - budget_usd: price range if mentioned
    - timeline_weeks: delivery timeline
    - custom_requirements: special needs

    Request: {text}
    """
    # Smart routing: Chinese context → QWEN, English → GPT-4o-mini
    model = "qwen-plus" if is_chinese_context(text) else "gpt-4o-mini"
    return call_ai(model, prompt)
```
Output:
```json
{
  "category": "kitchenware",
  "subcategory": "silicone kitchen tools",
  "moq_min": 500,
  "moq_max": 500,
  "certifications": ["FDA_food_contact"],
  "destination": "Canada",
  "budget_usd": null,
  "timeline_weeks": null,
  "custom_requirements": "custom branding, silicone material"
}
```
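One practical detail the post glosses over: an LLM reply isn't guaranteed to be well-formed JSON with every field present. Here's a minimal validation sketch — `validate_intent` and `REQUIRED_FIELDS` are hypothetical names, not part of the MapleBridge codebase:

```python
import json

# Hypothetical guard, not from the MapleBridge codebase. LLMs occasionally
# return malformed JSON or omit fields, so validate before matching.
REQUIRED_FIELDS = {"category", "moq_min", "certifications", "destination"}

def validate_intent(raw: str) -> dict:
    """Parse the model's JSON reply and normalize missing optional fields."""
    intent = json.loads(raw)  # raises on malformed output
    missing = REQUIRED_FIELDS - intent.keys()
    if missing:
        raise ValueError(f"intent missing required fields: {sorted(missing)}")
    # Optional fields default to None so downstream code can rely on the keys
    for key in ("budget_usd", "timeline_weeks", "custom_requirements"):
        intent.setdefault(key, None)
    return intent
```

Failing fast here keeps a half-parsed request from silently producing zero matches downstream.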
### Supplier Profile Schema

Matching requires structured data on both sides. Our supplier profiles:
```json
{
  "supplier_id": "SUP_001",
  "categories": ["kitchenware", "silicone products"],
  "moq_range": {"min": 200, "max": 5000},
  "certifications": ["FDA", "LFGB", "BPA_free"],
  "export_experience": ["US", "Canada", "EU"],
  "lead_time_weeks": {"sample": 2, "production": 6},
  "oem_odm": true,
  "languages": ["en", "zh"],
}
```
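If you're consuming this schema in Python, mirroring it as typed structures catches shape errors early. A sketch — the class names (`SupplierProfile`, `MOQRange`, `LeadTime`) are my own, only the JSON shape comes from the post:

```python
from typing import TypedDict

# Hypothetical typed mirror of the supplier profile JSON above.
# The post specifies the wire format, not the in-code representation.
class MOQRange(TypedDict):
    min: int
    max: int

class LeadTime(TypedDict):
    sample: int      # weeks to produce a sample
    production: int  # weeks for a full production run

class SupplierProfile(TypedDict):
    supplier_id: str
    categories: list[str]
    moq_range: MOQRange
    certifications: list[str]
    export_experience: list[str]
    lead_time_weeks: LeadTime
    oem_odm: bool
    languages: list[str]
```

`TypedDict` adds no runtime cost — profiles stay plain dicts, but a type checker will flag a missing `moq_range` or a string where a bool belongs.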
### Two-Layer Matching

With structured data on both sides, matching becomes computable:

**Layer 1 — Hard Filters** (fast, deterministic):

- MOQ within buyer's range
- Certifications cover requirements
- Export experience includes target market

**Layer 2 — Semantic Similarity** (ranking):
```python
def match_suppliers(buyer_intent: dict, supplier_pool: list) -> list:
    # Hard filter first
    filtered = [s for s in supplier_pool if passes_hard_filter(buyer_intent, s)]

    # Semantic ranking
    buyer_vec = embed(json.dumps(buyer_intent))
    scored = []
    for supplier in filtered:
        supplier_vec = embed(json.dumps(supplier["profile"]))
        score = cosine_similarity(buyer_vec, supplier_vec)
        scored.append((supplier, score))

    return sorted(scored, key=lambda x: x[1], reverse=True)[:5]
```
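`passes_hard_filter` and `cosine_similarity` are left undefined above. Here's a minimal sketch of both, assuming the intent and profile schemas shown earlier and embeddings returned as plain float lists; note the certification check assumes cert names have already been normalized to a shared vocabulary (e.g. the buyer's `FDA_food_contact` mapped to the supplier's `FDA`):

```python
import math

def passes_hard_filter(buyer_intent: dict, supplier: dict) -> bool:
    """The three Layer-1 checks; field names follow the schemas above.
    Assumes certification names are normalized to a shared vocabulary."""
    profile = supplier["profile"]
    moq = profile["moq_range"]
    # 1. Supplier's MOQ window must overlap the buyer's quantity range
    if buyer_intent["moq_max"] < moq["min"] or buyer_intent["moq_min"] > moq["max"]:
        return False
    # 2. Supplier certifications must cover every required cert
    if not set(buyer_intent["certifications"]) <= set(profile["certifications"]):
        return False
    # 3. Supplier must have export experience in the target market
    return buyer_intent["destination"] in profile["export_experience"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

The filter is pure dict lookups, so it runs over the whole pool cheaply; only the survivors pay for an embedding call.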
### Bilingual Model Routing

One interesting engineering decision: we serve both Chinese suppliers and English-speaking buyers, and route to different models based on language context.
```python
def is_chinese_context(text: str) -> bool:
    if not text:
        return False  # guard against division by zero on empty input
    chinese_chars = sum(1 for c in text if '\u4e00' <= c <= '\u9fff')
    return chinese_chars / len(text) > 0.3

def smart_ai_call(prompt: str) -> str:
    if is_chinese_context(prompt):
        return call_qwen("qwen-plus", prompt)      # Better for CN trade terms
    else:
        return call_openai("gpt-4o-mini", prompt)  # Better for EN compliance terms
```
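To sanity-check the 0.3 threshold, here it is exercised on typical inputs (the heuristic is restated so the snippet runs standalone; the sample strings are illustrative, not from production traffic):

```python
def is_chinese_context(text: str) -> bool:
    # Restated from above; 0.3 is the same ratio threshold the post uses.
    if not text:
        return False
    chinese_chars = sum(1 for c in text if '\u4e00' <= c <= '\u9fff')
    return chinese_chars / len(text) > 0.3

# A mostly-Chinese supplier message routes to QWEN...
assert is_chinese_context("需要打样，含税价，直发FBA仓")
# ...while an English buyer request routes to GPT-4o-mini
assert not is_chinese_context("FDA-compliant silicone kitchen tools, 500 MOQ")
# A mixed message with a couple of Chinese product words stays on the English path
assert not is_chinese_context("Quote for 500 pcs 硅胶 spatulas please")
```

Character-ratio detection is crude but fast; a worthwhile edge case to watch is short mixed messages, where a handful of Chinese characters can tip the ratio either way.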
QWEN handles Chinese trade terminology better ("打样" sampling, "直发FBA仓" ship direct to an FBA warehouse, "含税价" tax-inclusive price). GPT-4o-mini is more reliable on North American compliance language (CPSC, Health Canada, UL listings).

## MapleBridge Open Protocol

We extracted the data schemas into a standalone open protocol — MapleBridge Open — so any platform or AI agent building in the China-NA trade space can reuse the format instead of reinventing it.

The protocol is published at:

- maplebridge.io/llms-full.txt — Full spec in English
- maplebridge.io/llms-zh.txt — Chinese version

It follows the llms.txt convention (like robots.txt, but for LLMs), making the platform's capabilities and data formats machine-readable for AI crawlers.

## Stack

- Backend: FastAPI + SQLite (simple, works for our scale)
- AI: QWEN (qwen-plus) + OpenAI (gpt-4o-mini), smart-routed
- Infrastructure: Docker on Alibaba Cloud ECS, nginx reverse proxy
- Email: Resend API for match notifications

## What We Learned

1. Hard filters matter more than semantic similarity at small scale. If the MOQ doesn't match, no amount of semantic relevance helps.
2. Language routing is worth the complexity. A single model for both languages produces noticeably worse results on domain-specific terms.
3. Structured intent beats keyword search in any domain with implicit constraints (compliance, geography, quantity thresholds).
4. Publishing the protocol openly (llms.txt) makes the platform more discoverable by AI assistants — when someone asks an LLM "how to find Chinese suppliers for North America," the crawler can surface structured platform info directly.

---

The platform is live at maplebridge.io — free for buyers to post sourcing requests, free for suppliers to register. If you're building anything in the B2B matching, supply chain AI, or cross-border trade space, the open protocol might be useful to reference.

Happy to discuss the architecture or matching algorithm in the comments.