We Built an AI That Doesn't Guess: Architecture for Industrial Component Sourcing

Most AI search systems follow a flow that looks something like this:

```
User Input → LLM → Results
```

For a lot of domains, that's fine. Imprecise results in a playlist recommender or a content search tool are annoying. You try again.

But we recently built an AI-powered sourcing system for industrial components — bolts, springs, fasteners — and in that domain, an imprecise result doesn't mean a slightly wrong recommendation. It means:

```
wrong parameter → wrong part → real failure
```

So we threw out the standard architecture and built something different. This post is a technical breakdown of what we built, why, and the design principles behind it.


The Problem: AI Guesses. Always.

When a user types a vague query into a standard AI search system, the LLM fills in the blanks. It has to — that's what it does. It infers, interpolates, and presents a confident output.

Here's the failure mode we kept running into during initial testing:

User input:

"High load spring for a small space"

What an uncontrolled AI might assume:

  • Load range: something that seems high, based on training data patterns
  • Material: steel, probably
  • Dimensions: compact, probably
  • Standard: whatever seems relevant

The result looks technically formatted. The specs look plausible. But they were never validated against engineering constraints. The system guessed — and presented the guess with full confidence.

In industrial procurement, this is the most dangerous type of error: the confident wrong answer.


The Core Design Principle

Early in this project, we had a realization that reframed the entire architecture:

AI is excellent at understanding intent. AI is not reliable at making technical decisions.

These are two different cognitive tasks. We needed to split them — give AI the job it's genuinely good at, and give the deterministic system the job that requires precision.

The architecture we landed on:

```
User Input
    ↓
[AI Layer] — Intent extraction only. No spec decisions.
    ↓
[Parameter Structuring Layer] — Engineering logic maps intent to valid ranges
    ↓
[Controlled Search Layer] — Searches using structured constraints, not raw NL
    ↓
[Validation Layer] — Every candidate checked before surfacing
    ↓
Precise Output
```
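The flow above can be sketched as a thin orchestration function. This is a minimal illustration, not our production code — every layer function here is a hypothetical stub standing in for the real component described below:

```python
# Minimal pipeline sketch. Each layer function is a hypothetical stand-in
# for the real component; only the orchestration shape matters here.

def extract_intent(query: str) -> dict:
    # Layer 2: the LLM extracts structured intent only (stubbed here).
    return {"product_type": "bolt", "use_case": "outdoor",
            "requirement": "corrosion_resistant"}

def structure_parameters(intent: dict) -> dict:
    # Layer 3: deterministic rules map intent to bounded candidate sets.
    return {"material": ["stainless_a2", "stainless_a4"],
            "exclusions": ["carbon_steel_uncoated"]}

def controlled_search(params: dict) -> list:
    # Layer 4: structured query against structured catalog data.
    return [{"id": "B-1042", "material": "stainless_a4"}]

def validate(candidates: list) -> list:
    # Layer 5: every candidate must pass all checks before surfacing.
    return [c for c in candidates if c["material"].startswith("stainless")]

def source_component(query: str) -> list:
    intent = extract_intent(query)
    params = structure_parameters(intent)
    return validate(controlled_search(params))
```

The key property: no layer's output skips the layer after it, and the LLM's output never reaches search or the user directly.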

Layer-by-Layer Breakdown

Layer 1: Natural Language Input

Users express queries as engineers actually think on the floor:

  • "I need a corrosion-resistant bolt for outdoor use"
  • "Spring for high load in a small space"
  • "Fastener for vibration-heavy environment"

No structured input required. No dropdowns or filter forms.


Layer 2: Intent Understanding — The AI's Actual Job

This is the only layer where the LLM has autonomous authority — and it has one job: structured intent extraction.

For the query "I need a corrosion-resistant bolt for outdoor use", the AI extracts:

```json
{
  "product_type": "bolt",
  "use_case": "outdoor",
  "requirement": "corrosion_resistant"
}
```

That's it. The AI does not produce:

  • Material specs
  • ISO/DIN standards
  • Dimensional ranges
  • Final search filters

The AI interprets. Nothing else happens at this layer.
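One way to enforce "interpretation only" mechanically is to whitelist the intent schema and drop anything else the model emits. A minimal sketch (the field names match the example above; the sanitizer itself is illustrative, not our exact implementation):

```python
# Sketch: constrain LLM output to an intent-only schema. Any field outside
# the whitelist (e.g. a material the model "helpfully" guessed) is dropped
# before the intent reaches the parameter structuring layer.

ALLOWED_INTENT_FIELDS = {"product_type", "use_case", "requirement"}

def sanitize_intent(raw_llm_output: dict) -> dict:
    return {k: v for k, v in raw_llm_output.items()
            if k in ALLOWED_INTENT_FIELDS}

raw = {
    "product_type": "bolt",
    "use_case": "outdoor",
    "requirement": "corrosion_resistant",
    "material": "stainless steel",  # the model guessed — discarded
}
intent = sanitize_intent(raw)
```

Even if the model volunteers a spec decision, it structurally cannot survive past this layer.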


Layer 3: Parameter Structuring — Engineering Logic Takes Over

This is the most important layer in the system.

The extracted intent is passed to a structured parameter engine that maps intent to valid engineering possibilities — not final values, but bounded ranges and candidate sets.

For our bolt query:

```
use_case: outdoor + requirement: corrosion_resistant
→ material_candidates: ["Stainless Steel A2", "Stainless Steel A4"]
→ coating_candidates: ["Zinc plated (not recommended outdoor)", "Hot-dip galvanized", "None (A4 self-resistant)"]
→ standard_candidates: ["ISO 4017", "ISO 4018", "DIN 931", "DIN 933"]
→ exclusions: ["Carbon steel without coating", "Aluminum (load check required)"]
```

Two rules this layer enforces absolutely:

  1. No parameter is ever assumed — every value is derived from engineering rules or left as an open range
  2. Invalid combinations are excluded before search — the system doesn't search for carbon steel outdoor bolts and then filter them out; it never includes them in the search space at all
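A rule engine of this shape can be sketched as data-driven rules that fire on intent fields and emit candidate sets plus exclusions — never a single assumed value. The rule set below is illustrative, not our production rules:

```python
# Sketch of a rule-based parameter engine. Rules match on intent fields
# and contribute candidate sets and exclusions; nothing is ever assumed
# as a final value.

RULES = [
    {
        "when": {"use_case": "outdoor", "requirement": "corrosion_resistant"},
        "material_candidates": ["Stainless Steel A2", "Stainless Steel A4"],
        "exclusions": ["Carbon steel without coating"],
    },
]

def structure(intent: dict) -> dict:
    params = {"material_candidates": [], "exclusions": []}
    for rule in RULES:
        # A rule fires only when every condition in its "when" clause holds.
        if all(intent.get(k) == v for k, v in rule["when"].items()):
            params["material_candidates"] += rule["material_candidates"]
            params["exclusions"] += rule["exclusions"]
    return params
```

Because exclusions are emitted here, invalid combinations are removed from the search space before any query runs — rule 2 above falls out of the data flow, not a post-filter.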

Layer 4: Controlled Search

The search layer receives structured parameters — not the original natural language query. This is a deliberate architectural choice.

Searching with raw NL queries allows semantic drift: the retrieval system surfaces items that are linguistically related but technically incompatible. Structured parameters eliminate this class of error entirely.

Example search execution for the outdoor bolt:

```python
search_params = {
    "product_type": "bolt",
    "material": ["stainless_a2", "stainless_a4"],
    "standard": ["iso_4017", "din_933"],
    "application_compatibility": ["outdoor"],
    "exclusions": ["carbon_steel_uncoated"]
}

results = catalog.search(search_params)
```

No fuzzy matching. No semantic retrieval on specs. Structured query against structured data.
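Internally, a search like this reduces to plain predicate filtering over structured catalog records — no embeddings, no fuzzy text matching. A sketch against a toy in-memory catalog (the schema and items are hypothetical; a real implementation would be a SQL query over the catalog database):

```python
# Sketch: structured search as predicate filtering. Exclusions are applied
# alongside the positive constraints, so excluded items never match.

CATALOG = [
    {"id": "B-1001", "product_type": "bolt", "material": "stainless_a4",
     "standard": "iso_4017", "applications": ["outdoor", "marine"]},
    {"id": "B-2001", "product_type": "bolt", "material": "carbon_steel_uncoated",
     "standard": "din_933", "applications": ["indoor"]},
]

def search(params: dict) -> list:
    return [
        item for item in CATALOG
        if item["product_type"] == params["product_type"]
        and item["material"] not in params.get("exclusions", [])
        and item["material"] in params["material"]
        and item["standard"] in params["standard"]
        and any(a in item["applications"]
                for a in params["application_compatibility"])
    ]
```

A linguistically plausible but technically wrong item (the uncoated carbon steel bolt) simply cannot appear in the results — there is no similarity score to drift on.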


Layer 5: Validation — Zero Tolerance

Every candidate returned from Layer 4 goes through a validation pass before the user sees anything. This layer catches edge cases the parameter engine might have missed.

Checks run per candidate:

```
✓ Parameter compatibility (does this combination make engineering sense?)
✓ Standard compliance (does this part actually conform to the claimed standard?)
✓ Inventory / availability check (is this actually sourceable?)
✓ Constraint compatibility (no conflicts between specs)
✗ Reject if any check fails
```

This is the layer that converts "probably right" into "confirmed right." Nothing passes without clearing all checks.
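The zero-tolerance property is simple to express in code: a candidate surfaces only if every check passes. The check functions below are hypothetical stand-ins for the real engineering checks:

```python
# Sketch: all-or-nothing validation. One failed check rejects the candidate.
# The individual checks are illustrative placeholders.

def check_compatibility(c):
    return c.get("material") != "aluminum" or c.get("load_verified", False)

def check_standard(c):
    return bool(c.get("standard"))

def check_availability(c):
    return c.get("stock", 0) > 0

CHECKS = [check_compatibility, check_standard, check_availability]

def validate(candidates):
    # all() short-circuits on the first failing check per candidate.
    return [c for c in candidates if all(check(c) for check in CHECKS)]

candidates = [
    {"id": "B-1001", "material": "stainless_a4", "standard": "iso_4017", "stock": 120},
    {"id": "B-3001", "material": "stainless_a2", "standard": "iso_4017", "stock": 0},
]
```

An in-spec part that isn't actually sourceable (B-3001) is just as rejected as an out-of-spec one.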


Layer 6: Output

The user receives:

  • Exact matching components
  • Full technical specifications
  • Availability + stock data
  • Compatible variations (e.g., same bolt in A4 vs A2)

No "this should work." Only: this is the correct part.


Cross-Cutting Systems

Three systems run continuously across the full pipeline:

Pattern Learning Engine

Stores validated parameter combinations from successful selections. When a query pattern is seen again, the system can reuse pre-validated mappings rather than re-deriving from scratch. Improves both speed and consistency over time.
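The core of such an engine can be as simple as a cache keyed on the normalized intent. A minimal sketch (class name and storage are illustrative; production would persist to the database):

```python
# Sketch: cache validated intent -> parameter mappings, keyed on a
# canonical serialization of the intent so field order doesn't matter.

import json

class PatternCache:
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(intent: dict) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hit the same entry
        return json.dumps(intent, sort_keys=True)

    def get(self, intent: dict):
        return self._store.get(self._key(intent))

    def put(self, intent: dict, validated_params: dict):
        self._store[self._key(intent)] = validated_params
```

On a cache hit, Layers 3–5 can reuse the pre-validated mapping instead of re-deriving it.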

Feedback & Correction Loop

Incorrect matches (flagged by users or caught by monitoring) trigger rule updates in the parameter engine. The system gets more accurate with each correction, rather than repeating the same edge cases indefinitely.

Quality Monitoring

Tracks per-query accuracy, detects edge case clusters, and fires alerts when result quality metrics start to drift. Essential for maintaining reliability as the product catalog grows — an unchecked system will slowly degrade in accuracy without anyone noticing.
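A drift detector of this kind can be sketched as a rolling success-rate window with an alert threshold (the window size and threshold below are illustrative values, not ours):

```python
# Sketch: rolling accuracy monitor. Fires an alert when the validated-match
# rate over the recent window drops below a threshold.

from collections import deque

class QualityMonitor:
    def __init__(self, window: int = 100, threshold: float = 0.95,
                 min_samples: int = 20):
        self._results = deque(maxlen=window)  # oldest outcomes fall off
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, query_succeeded: bool) -> bool:
        """Record one query outcome; return True if an alert should fire."""
        self._results.append(query_succeeded)
        rate = sum(self._results) / len(self._results)
        return len(self._results) >= self.min_samples and rate < self.threshold
```

Because the window is bounded, a slow degradation shows up as a falling recent rate instead of being averaged away by months of healthy history.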


Design Principles We'd Apply to Any Similar System

After shipping this, here are the principles we'd carry forward to any AI system in a precision-critical domain:

1. Never let AI decide technical values autonomously
AI should narrow down possibilities; rules should finalize selections.

2. Structure before search
Never pass raw natural language directly to a search or retrieval layer. Always convert to structured parameters first.

3. Validation is not optional
Every result must be verified before surfacing. This isn't overhead — it's the feature.

4. Separate intent understanding from technical decision-making
These are different cognitive tasks. Model your architecture to reflect that separation.

5. Build for correction from day one
Your system will make mistakes. The question is whether it learns from them. Build feedback + correction loops before you need them.


Tech Stack

| Component | Technology |
| --- | --- |
| Intent understanding | OpenAI GPT-4o / Claude (via API) |
| Backend | Node.js + Python |
| Parameter engine | Custom rule-based system |
| Validation | Rules + AI-assisted checks |
| Database | PostgreSQL + catalog systems |
| Inventory | API integrations (client-specific) |

The stack is conventional. The architecture around it is what matters.


Takeaway

The failure mode for most AI systems in high-stakes domains is the same:

The AI is given too much autonomy over decisions it can't make reliably.

The fix isn't a better model. It's a better architecture — one where AI does what it's genuinely good at (understanding language and intent), and deterministic systems do what they're genuinely good at (enforcing constraints and validating outputs).

```
AI for understanding.
System for control.
```

If you're building AI for healthcare, manufacturing, logistics, fintech, or any domain where errors have real-world consequences — this separation isn't a nice-to-have. It's the thing that makes your system trustworthy in production.


Built by the team at CIZO — we build production-grade AI systems, mobile apps, and IoT solutions. Say hi: hello@cizotech.com
