Trying to compute "where is this product cheapest" across Rakuten's and Yahoo!'s APIs, you hit a wall before any price math: how do you decide two listings are the same product? The JAN (the product's barcode number) would make it easy — except the two marketplaces are asymmetric about it.
- Yahoo! Shopping: a `jan_code` parameter looks the JAN up directly, and the response carries `janCode` back.
- Rakuten item search: no JAN parameter. You throw the JAN string into `keyword`, and the JAN itself usually sits inside the item caption rather than a structured field.
So a Yahoo! hit can be trusted as a confirmed JAN match, while a Rakuten hit can't be used until you've shown that a keyword=JAN hit really is that product. Here's how I split matching into three certainty tiers around that asymmetry.
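To make the asymmetry concrete, here is a minimal sketch of the two lookups. Only `jan_code`, `janCode`, and `keyword` come from the description above; the helper names and the `Query` type are hypothetical, and endpoints and credentials are left out on purpose.

```typescript
// Query-string builders only; endpoints and credentials are deliberately omitted.
type Query = Record<string, string>;

// Yahoo! Shopping: the JAN goes into the dedicated jan_code parameter.
function yahooJanQuery(jan: string): Query {
  return { jan_code: jan };
}

// Rakuten item search: no JAN parameter, so the JAN string rides in keyword.
function rakutenJanQuery(jan: string): Query {
  return { keyword: jan };
}

// Serialize a query object as key=value pairs joined by "&".
function toQueryString(q: Query): string {
  return Object.keys(q)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(q[k])}`)
    .join("&");
}

// Yahoo! echoes the JAN back in janCode, so a match can be checked structurally.
function isYahooConfirmed(hit: { janCode?: string }, jan: string): boolean {
  return hit.janCode === jan;
}

const yahooQs = toQueryString(yahooJanQuery("4901234567894"));
const rakutenQs = toQueryString(rakutenJanQuery("4901234567894"));
```

There is no Rakuten counterpart to `isYahooConfirmed`: a `keyword` hit is only a text match, which is why it has to be scored before it counts.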
Split identity into three certainty tiers
Don't force everything through a JAN match. Keep only what you can tie down with confidence at the top; the lower the tier, the weaker the evidence, so the right to enter comparison and to notify narrows with it.
| Tier | Method | Comparison | Notify |
|---|---|---|---|
| ① JAN match | Yahoo! direct lookup + Rakuten keyword=JAN scored for confidence | confirmed only | yes |
| ② Name fuzzy | Score on model no. / size / brand → candidates → user confirms | after confirm | confirmed only |
| ③ No match | single-store (e.g. pasted URL) | no | own drops only |
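One way to keep the table's rules from drifting is to encode them as a small policy function. This is a sketch under assumptions: the `Tier`, `Match`, and `NotifyScope` names are hypothetical, and `confirmed` stands for "JAN score reached confirmed" in tier ① and "a user confirmed the candidate" in tier ②.

```typescript
type Tier = "jan" | "fuzzy" | "none";
type NotifyScope = "cross-store" | "own-drops" | "none";

// Hypothetical record for one matched (or unmatched) listing.
interface Match {
  tier: Tier;
  confirmed: boolean; // JAN-confirmed (tier ①) or user-confirmed (tier ②)
}

// A listing enters the comparison table only once it is confirmed.
function canCompare(m: Match): boolean {
  return m.tier !== "none" && m.confirmed;
}

// Tier ③ only ever alerts on its own price drops; the other tiers
// notify across stores, and only when confirmed.
function notifyScope(m: Match): NotifyScope {
  if (m.tier === "none") return "own-drops";
  return m.confirmed ? "cross-store" : "none";
}

const pendingFuzzy: Match = { tier: "fuzzy", confirmed: false };
const pendingScope = notifyScope(pendingFuzzy); // "none" until a user confirms
```

Keeping comparison and notification as two separate functions leaves room for their bars to diverge later without touching the matching code.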
Minimal implementation: scoring a Rakuten hit
Each Rakuten hit accumulates a score; thresholds split it three ways. ≥0.8 is confirmed, ≥0.5 is needs-review (no notify), below that is discarded.
```typescript
type Grade = "high" | "mid" | "low";

function janMatchScore(
  hit: { itemName: string; itemCaption: string; itemPrice: number },
  jan: string,
  range: { min: number; max: number } | null,
): { grade: Grade; score: number } {
  let s = 0;

  // (1) JAN string appears in the name or caption (most direct evidence)
  if (hit.itemCaption.includes(jan) || hit.itemName.includes(jan)) s += 0.5;

  // (2) price sits inside the product-search API's range
  if (range) {
    const lo = range.min * 0.7; // allow 30% below (old models, parallel imports)
    const hi = range.max * 1.5; // allow 50% above (shipping-included, bundles)
    s += hit.itemPrice >= lo && hit.itemPrice <= hi ? 0.3 : -0.4;
  }

  // (3) a model number or size can be read off the hit
  if (/[a-z]+-?\d+/i.test(hit.itemName) || /\d+\s*(ml|g)/i.test(hit.itemName)) s += 0.2;

  const score = Math.max(0, Math.min(1, s));
  const grade: Grade = score >= 0.8 ? "high" : score >= 0.5 ? "mid" : "low";
  return { grade, score };
}
```
Refusing to treat a weak hit as confirmed — discarding low, and not letting mid notify — was the single most effective guard against merging different products into one.
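The routing that follows from those grades can be sketched as below. The `GradedHit` and `Routed` shapes and both function names are assumptions made for illustration; only the three grades and their consequences come from the text.

```typescript
type Grade = "high" | "mid" | "low";

// Hypothetical shape: a Rakuten hit that has already been graded.
interface GradedHit {
  itemName: string;
  grade: Grade;
}

interface Routed {
  confirmed: GradedHit[]; // high: treated as the same product
  review: GradedHit[];    // mid: kept for review, never notifies
}

// Split graded hits into buckets; low-grade hits are dropped outright.
function routeByGrade(hits: GradedHit[]): Routed {
  const out: Routed = { confirmed: [], review: [] };
  for (const h of hits) {
    if (h.grade === "high") out.confirmed.push(h);
    else if (h.grade === "mid") out.review.push(h);
    // "low" falls through and is discarded
  }
  return out;
}

// Only a confirmed hit is allowed to trigger a notification.
function mayNotify(h: GradedHit): boolean {
  return h.grade === "high";
}

const routed = routeByGrade([
  { itemName: "A", grade: "high" },
  { itemName: "B", grade: "mid" },
  { itemName: "C", grade: "low" },
]);
```

Dropping `low` at this point, rather than carrying it along with a flag, means nothing downstream can pick a discarded hit back up by accident.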
Gotchas
- The product-search API (`productCode=JAN`) returns 404 a lot. The price range is missing for many hits, so you can't lean on term (2). I added a fallback anchored on the cross-marketplace counterpart: if a model-number token pulled from the same-JAN Yahoo! hit (alphanumeric, 5+ chars, effectively unique) appears in the Rakuten hit's name, and the price is within ±40% of the counterpart's, promote to confirmed. The token alone can collide; the price band screens that out.
- The fuzzy tier doesn't auto-confirm. A matching model number, size, and brand still only give you a probability, so stop at surfacing candidates and promote to the product master only the ones a person confirms. A wrong pairing is rejected in one tap (dropped from the candidate pool for good). Notifications fire on JAN-confirmed and user-confirmed only.
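The counterpart fallback described above might look roughly like this. It is a sketch, not the production rule: the hit shapes and function names are hypothetical, and the token rule adds one assumption of my own, that a model number contains at least one letter and one digit.

```typescript
interface YahooHit { name: string; price: number }           // hypothetical shape
interface RakutenHit { itemName: string; itemPrice: number } // fields as in the scorer

// Pull model-number-like tokens: 5+ alphanumeric characters (hyphens ignored
// for the length check) containing at least one letter and one digit.
function modelTokens(name: string): string[] {
  const raw = name.toUpperCase().match(/[A-Z0-9]+(?:-[A-Z0-9]+)*/g) ?? [];
  return raw.filter(
    (t) => t.replace(/-/g, "").length >= 5 && /[A-Z]/.test(t) && /\d/.test(t),
  );
}

// Promote only when a token from the Yahoo! hit appears in the Rakuten name
// AND the Rakuten price is within ±band of the Yahoo! price.
function promoteByCounterpart(
  rakuten: RakutenHit,
  yahoo: YahooHit,
  band = 0.4,
): boolean {
  const name = rakuten.itemName.toUpperCase();
  const tokenHit = modelTokens(yahoo.name).some((t) => name.includes(t));
  const priceOk = Math.abs(rakuten.itemPrice - yahoo.price) <= yahoo.price * band;
  return tokenHit && priceOk;
}

const yahoo: YahooHit = { name: "Sony WH-1000XM5 Wireless Headphones", price: 50000 };
const sameProduct = promoteByCounterpart(
  { itemName: "ソニー wh-1000xm5 ヘッドホン", itemPrice: 46000 },
  yahoo,
);
```

The comparison is case-insensitive but otherwise literal, so a listing that writes the model number in full-width characters or without its hyphen would be missed. Normalizing both sides first would be a natural next step.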
Wrapping up
Even when JANs don't line up, collapsing identity into three certainty tiers makes cross-marketplace comparison hold: confirmed JAN matches go in the table, weak evidence becomes candidates, and unmatchable items fall back to single-store alerts. Not forcing a JAN match on everything is what prevents wrong merges.
The point-inclusive price math, the accumulating mis-match reports, and the notification-trigger rules are written up on the Aulvem site → How Rakuten's API hides the JAN, and the 3-tier cross-marketplace match
Top comments (1)
Okay, the part doing the real work here isn't the JAN scoring, it's the quiet rule that a mid-confidence hit can join the comparison table but never trigger a notification. That split between "good enough to show" and "good enough to alert someone" is the thing most matching code gets wrong, because it treats one threshold as both. Pushing a wrong merge into a price alert is how you lose a user's trust in one tap, so making the bar to notify strictly higher is exactly right. The negative score when the price falls outside the range is a nice touch for the same reason, it actively votes against a bad match instead of just failing to support one.