HTS classification looks simple on paper — but anyone working in trade automation knows it isn’t. It’s a problem full of ambiguity, nested rules, overlapping categories, and legally precise exceptions. And despite being a high-impact step in global trade, most teams still rely on manual lookup and inconsistent human interpretation.
In this post, I’ll walk through how we engineered the Customs Classification Agent at SupplyGraph AI using a hybrid retrieval + constrained-reasoning approach, exposed through our A2A (Agent-to-Agent) protocol. This is a technical deep dive intended for developers, not a marketing overview.
You'll learn:
- what makes HTS classification uniquely difficult
- why embeddings and raw LLM prompts fail
- how our retrieval layer, scoring engine, and reasoning layer work together
- how the A2A protocol structures task execution
- how to run the agent using cURL or the Python SDK
- where to find the SDK and examples on GitHub
Let’s get into it.
Why HTS Classification Is a Hard Engineering Problem
HTS classification has several properties that make it a non-trivial task for search, rule-based, and ML systems:
- legal text doesn’t follow natural-language conventions
- category boundaries are often implied, not explicit
- products may match multiple plausible codes
- correctness depends on combinations of attributes (material + form + use)
- legal notes introduce exceptions, inclusions, and cross-references
- tariff schedules update regularly and must remain version-locked for audits
From an engineering perspective, this resembles hierarchical, legal-style reasoning, rather than simple tagging.
Why common approaches fall short
Embeddings alone
They struggle with:
- hierarchical structure
- exclusion rules
- multi-attribute dependencies
- long-range legal references
Raw LLM classification
Common failure cases:
- hallucinating subheadings
- reasoning without grounding in legal text
- no version control
- no reproducibility
- no audit trail
HTS classification benefits from structured grounding, deterministic protocol behavior, and explainability—not just LLM power.
⚙ Architecture Overview
Our classification pipeline:
Product Description
▼
Pre-processing
▼
Attribute Extraction
▼
Candidate Retrieval
(HTS text + notes + enriched nodes)
▼
Candidate Scoring
▼
Constrained Reasoning
(grounded evaluation, no free-form generation)
▼
Ranked HTS Classification
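The stages above can be sketched as a chain of plain functions. This is a toy illustration under loud assumptions, not the production implementation: every function name, the keyword vocabulary, and the placeholder codes (CODE-A, CODE-B, CODE-C) are hypothetical, and the LLM-based reasoning stage is replaced by a simple sort over the retrieved candidates. What it does show is the shape of the flow, and that nothing outside the retrieved candidate set can ever be returned.

```python
from dataclasses import dataclass

# Hypothetical candidate nodes: placeholder codes mapped to required attributes.
CANDIDATE_NODES = {
    "CODE-A": {"material": "cotton", "construction": "knitted"},
    "CODE-B": {"material": "cotton", "construction": "woven"},
    "CODE-C": {"material": "aluminum", "form": "sheet"},
}

# Hypothetical vocabulary used by the toy attribute extractor.
VOCAB = {
    "material": ["cotton", "aluminum"],
    "construction": ["knitted", "woven"],
    "form": ["sheet"],
}


@dataclass
class Ranked:
    code: str
    score: float


def preprocess(description: str) -> str:
    # Lowercase and collapse whitespace.
    return " ".join(description.lower().split())


def extract_attributes(text: str) -> dict:
    # Naive substring matching stands in for real attribute extraction.
    attrs = {}
    for name, values in VOCAB.items():
        for value in values:
            if value in text:
                attrs[name] = value
    return attrs


def retrieve(attrs: dict) -> list:
    # Keep nodes that share at least one attribute value with the product.
    return [
        code
        for code, required in CANDIDATE_NODES.items()
        if any(attrs.get(key) == value for key, value in required.items())
    ]


def score(code: str, attrs: dict) -> float:
    # Fraction of the node's required attributes that the product satisfies.
    required = CANDIDATE_NODES[code]
    matched = sum(1 for key, value in required.items() if attrs.get(key) == value)
    return matched / len(required)


def classify(description: str) -> list:
    attrs = extract_attributes(preprocess(description))
    ranked = [Ranked(code, score(code, attrs)) for code in retrieve(attrs)]
    # Constrained step: only retrieved candidates are ranked and returned.
    return sorted(ranked, key=lambda r: r.score, reverse=True)
```

Calling `classify("Knitted cotton T-shirt for women")` ranks CODE-A ahead of CODE-B, because both share the material but only CODE-A also matches the construction.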
Engineering clarifications
- The retrieval layer searches a structured dataset containing HTS text + legal notes + derived attribute nodes, not a million HTS entries.
- The reasoning layer is LLM-based but constrained, operating strictly over grounded candidate data.
- The confidence score is a normalized ranking score, not a calibrated statistical probability.
- Every output is tied to a specific tariff dataset version for audit reproducibility.
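To make the last two points concrete, here is a minimal sketch of what "normalized ranking score" and "version-locked output" can mean in practice. It assumes a simple sum normalization and a `dataset_version` field; both are illustrative assumptions, as are the function names and the version string, and the agent's actual scoring may differ.

```python
def normalize_scores(raw: dict) -> dict:
    """Scale raw candidate scores so they sum to 1 (relative ranking only)."""
    total = sum(raw.values())
    if total <= 0:
        return {code: 0.0 for code in raw}
    return {code: value / total for code, value in raw.items()}


def tag_with_version(scores: dict, dataset_version: str) -> dict:
    """Attach the tariff dataset version so a result can be reproduced later."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {
        "dataset_version": dataset_version,  # hypothetical field name
        "results": [{"code": code, "score": round(s, 4)} for code, s in ranked],
    }


# Placeholder codes and version string, for illustration only.
scores = normalize_scores({"CODE-A": 3.0, "CODE-B": 1.0})
record = tag_with_version(scores, "hypothetical-dataset-v1")
```

Under this normalization the same raw score produces a different normalized value depending on which other candidates were retrieved, which is why the number should be read as a ranking signal rather than a probability of being correct.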
🔌 The A2A Protocol: How the Agent Exposes Its Interface
A2A defines:
- stable event types
- deterministic protocol flow
- explicit state transitions (interpreting → executing → completed)
- optional SSE streaming
- the WAITING_USER step when multiple interpretations are possible
Agents expose:
/manifest
/run (mode=run | status | results)
This keeps agent interactions simple but predictable.
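The explicit state transitions can be modeled as a small state machine. The sketch below is an assumption about how a client might track a task locally: the `Stage` and `Task` names are hypothetical, and the transition table (in particular, WAITING_USER returning to interpreting after clarification) is a guess, not something the protocol description above specifies.

```python
from enum import Enum


class Stage(Enum):
    INTERPRETING = "interpreting"
    WAITING_USER = "WAITING_USER"
    EXECUTING = "executing"
    COMPLETED = "completed"


# Assumed transition table; the real protocol may allow other moves.
ALLOWED = {
    Stage.INTERPRETING: {Stage.EXECUTING, Stage.WAITING_USER},
    Stage.WAITING_USER: {Stage.INTERPRETING},
    Stage.EXECUTING: {Stage.COMPLETED},
    Stage.COMPLETED: set(),
}


class Task:
    def __init__(self):
        self.stage = Stage.INTERPRETING
        self.history = [self.stage]

    def advance(self, target: Stage) -> None:
        # Reject any move that is not listed in the transition table.
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"illegal transition {self.stage.value} -> {target.value}"
            )
        self.stage = target
        self.history.append(target)
```

Rejecting unlisted transitions on the client side makes protocol drift visible early, for example if a `completed` task unexpectedly reports `executing` again.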
📄 Manifest Example
GET /api/v1/agents/tariff_classification/manifest
{
  "agent_id": "tariff_classification",
  "name": "Customs Classification Agent",
  "version": "1.0.0",
  "description": "Maps product descriptions to HS/HTS codes.",
  "pricing": { "per_run": 2, "unit": "credits" },
  "input_schema": {...},
  "output_schema": {...}
}
🚀 Running a Classification Task
Non-streaming
curl -X POST https://agent.supplygraph.ai/api/v1/agents/tariff_classification/run \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "run",
    "text": "Knitted cotton T-shirt for women"
  }'
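If you would rather not shell out to cURL or pull in the SDK, the same request can be assembled with Python's standard library. This sketch mirrors the cURL call above (same URL, headers, and body); the helper name `build_run_request` is my own, and the actual send is left commented out so the snippet runs without network access or a real key.

```python
import json
import urllib.request

RUN_URL = "https://agent.supplygraph.ai/api/v1/agents/tariff_classification/run"


def build_run_request(api_key: str, text: str) -> urllib.request.Request:
    """Build the POST request for a non-streaming classification run."""
    body = json.dumps({"mode": "run", "text": text}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sending is commented out on purpose; uncomment with a real API key:
# with urllib.request.urlopen(build_run_request("<API_KEY>", "...")) as resp:
#     result = json.load(resp)
```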
Example Output
{
  "success": true,
  "code": "TASK_COMPLETED",
  "data": {
    "content": {
      "type": "result",
      "data": {
        "classification_results": [
          {
            "hts_code": "6109.10.00.40",
            "confidence_score": 0.90,
            "reasoning": "Identified as knitted cotton T-shirt for women.",
            "description": "T-shirts, singlets, tank tops... of cotton; women's or girls'"
          }
        ],
        "country_of_origin": "Mexico"
      }
    }
  }
}
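✔ Note:
country_of_origin appears only when the origin is stated explicitly in the input text. The sample request above does not mention an origin, so the "Mexico" value in the example output is illustrative.
The agent does not infer origin automatically.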
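Because the result is nested a few levels deep and country_of_origin is optional, client code should unwrap the response defensively. Here is one possible helper, based only on the example output shown above; the function name `top_classification` and the choice to return `None` for anything other than a completed task are my own assumptions.

```python
from typing import Optional


def top_classification(response: dict) -> Optional[dict]:
    """Return the highest-scoring result, or None if the task did not complete."""
    if not response.get("success") or response.get("code") != "TASK_COMPLETED":
        return None
    payload = response.get("data", {}).get("content", {}).get("data", {})
    results = payload.get("classification_results", [])
    if not results:
        return None
    best = max(results, key=lambda r: r.get("confidence_score", 0.0))
    return {
        "hts_code": best.get("hts_code"),
        "confidence_score": best.get("confidence_score"),
        # None when the input text did not state an origin.
        "country_of_origin": payload.get("country_of_origin"),
    }
```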
📡 Streaming Reasoning via SSE
Enable streaming with:
stream=true
Example events:
event: stream
data: { "stage": "interpreting", "reasoning": ["Extracting product attributes..."] }
event: stream
data: { "stage": "executing", "code": "TASK_ACCEPTED" }
event: end
data: [DONE]
Helpful for interactive UIs and debugging.
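On the client side, consuming these events comes down to splitting the stream into event/data pairs and treating `[DONE]` as the terminator. Below is a simplified parser sketch. It assumes standard SSE framing, where events are separated by a blank line, and it ignores `id:`, `retry:`, and comment lines; the function names are hypothetical.

```python
import json


def parse_sse(raw: str):
    """Yield (event, data) pairs from SSE text with blank-line separators."""
    event, data_lines = "message", []
    # The trailing "" flushes the final event if the stream lacks a blank line.
    for line in raw.splitlines() + [""]:
        if line == "":
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())


def decode(event: str, data: str):
    """Return the JSON payload, or None for the [DONE] terminator."""
    if data == "[DONE]":
        return None
    return json.loads(data)


sample_stream = (
    'event: stream\n'
    'data: {"stage": "interpreting", "reasoning": ["Extracting product attributes..."]}\n'
    '\n'
    'event: stream\n'
    'data: {"stage": "executing", "code": "TASK_ACCEPTED"}\n'
    '\n'
    'event: end\n'
    'data: [DONE]\n'
)
events = list(parse_sse(sample_stream))
```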
Python SDK Example
Repo: https://github.com/SupplyGraphAI/supplygraphai_a2a_sdk
from supplygraph.compat import OpenAICompatibleClient

client = OpenAICompatibleClient(api_key="sg-xxx")

response = client.agents.run(
    "tariff_classification",
    text="Machine-cut aluminum sheets, thickness 2.5mm"
)
print(response)
Streaming:
for event in client.agents.stream("tariff_classification", text="..."):
    print(event)
📁 Output Structure
| Field | Meaning |
|---|---|
| hts_code | Suggested classification |
| confidence_score | Ranking score (normalized 0–1) |
| reasoning | Constrained, grounded explanation |
| description | Official HTS text |
| country_of_origin | Included only when present in input |
Outputs are designed for direct integration with:
- duty estimation
- ERP flows
- compliance review
- audit logging
🔍 Why This Hybrid Approach Works
| Problem | Our solution |
|---|---|
| Embeddings miss legal boundaries | Structured retrieval |
| LLMs hallucinate | Constrained reasoning |
| No repeatability | Protocol-level determinism |
| No audit trail | Version-locked datasets |
| Ambiguous inputs | WAITING_USER clarification |
This yields a classifier that’s transparent, explainable, and suitable for production environments.
🔗 GitHub & Resources
Python SDK
https://github.com/SupplyGraphAI/supplygraphai_a2a_sdk
All repositories
https://github.com/SupplyGraphAI
Contributions, issues, and discussions are welcome.
🏁 Closing Thoughts
HTS classification is deceptively difficult because it blends legal-style logic with multi-attribute categorization. Our approach combines retrieval, scoring, and constrained reasoning, then exposes it through a predictable A2A protocol that developers can embed directly into their systems.
As always, final HTS determination depends on complete product details and applicable legal notes.
This agent supports — but does not replace — professional judgment in tariff classification.
If you'd like a deeper dive into the retrieval engine or A2A internals, I’m happy to write a Part 2.