Cross-matching products when Rakuten's API hides the JAN

#typescript #webdev #api #ecommerce

When you try to compute "where is this product cheapest" across Rakuten's and Yahoo!'s APIs, you hit a wall before doing any price math: how do you decide that two listings are the same product? The JAN (Japanese Article Number, the product's 13- or 8-digit barcode number) would make this easy, except that the two marketplaces treat it asymmetrically.

Yahoo! Shopping: the jan_code parameter looks the JAN up directly, and each hit in the response carries janCode back.
Rakuten item search: there is no JAN parameter. You put the JAN string into keyword, and the JAN itself usually sits inside the free-text item caption rather than in a structured field.
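To make the asymmetry concrete, here is a minimal sketch of the two requests plus the only check a Yahoo! hit needs. The endpoint URLs, API versions, and the appid/applicationId/format parameter names are my assumptions about the current public APIs (verify them against each API's docs); yahooJanSearchUrl, rakutenJanSearchUrl, and confirmedYahooHits are hypothetical helper names.

```typescript
// Assumed endpoints and versions; check the current Yahoo! and Rakuten API docs before use.
const YAHOO_ITEM_SEARCH = "https://shopping.yahooapis.jp/ShoppingWebService/V3/itemSearch";
const RAKUTEN_ITEM_SEARCH = "https://app.rakuten.co.jp/services/api/IchibaItem/Search/20220601";

// Yahoo!: the JAN is a structured request parameter
function yahooJanSearchUrl(appId: string, jan: string): string {
  const url = new URL(YAHOO_ITEM_SEARCH);
  url.searchParams.set("appid", appId);
  url.searchParams.set("jan_code", jan);
  return url.toString();
}

// Rakuten: no JAN parameter, so the JAN is just free-text keyword input
function rakutenJanSearchUrl(applicationId: string, jan: string): string {
  const url = new URL(RAKUTEN_ITEM_SEARCH);
  url.searchParams.set("applicationId", applicationId);
  url.searchParams.set("format", "json");
  url.searchParams.set("keyword", jan);
  return url.toString();
}

// Hypothetical hit shape, trimmed to the fields used here
interface YahooHit {
  name: string;
  price: number;
  janCode: string;
}

// A Yahoo! hit is a JAN match when its janCode echoes the requested JAN
function confirmedYahooHits(hits: YahooHit[], jan: string): YahooHit[] {
  return hits.filter((h) => h.janCode === jan);
}
```

Rakuten has no equivalent of confirmedYahooHits, because nothing structured in a keyword hit echoes the JAN back; that gap is what the scoring below has to cover.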

So a Yahoo! hit can go straight into the confirmed-JAN tier, while a Rakuten keyword=JAN hit is unusable until you prove it really is that product. Here's how I split matching into three certainty tiers around that asymmetry.

Split identity into three certainty tiers

Don't force everything through a JAN match. Reserve the top tier for what you can pin down with confidence; each lower tier rests on weaker evidence, so its right to enter the comparison table and to trigger notifications narrows accordingly.

Tier	Method	In comparison table	Notifications
① JAN match	Yahoo! direct lookup, plus Rakuten keyword=JAN hits scored for confidence	confirmed hits only	yes (confirmed hits)
② Name fuzzy	Score on model no. / size / brand → candidates → user confirms	after user confirms	user-confirmed only
③ No match	Single store only (e.g. a pasted URL)	no	that store's own price drops only
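One way to keep the two permission columns from drifting apart in code is to derive both from a single match value. This is a minimal sketch, assuming a hypothetical Match union: the tier, grade, status, and alert names are illustrative, not from the article's codebase. Mid JAN hits stay out of both comparison and alerts, as the table says.

```typescript
// Hypothetical model of the three tiers
type Match =
  | { tier: "jan"; grade: "high" | "mid" | "low" }
  | { tier: "fuzzy"; status: "candidate" | "confirmed" | "rejected" }
  | { tier: "none" };

// "comparison": alert on cross-store price comparison; "ownDropsOnly": single-store drops
type Alert = "comparison" | "ownDropsOnly" | "none";

function canCompare(m: Match): boolean {
  if (m.tier === "jan") return m.grade === "high"; // confirmed JAN hits only
  if (m.tier === "fuzzy") return m.status === "confirmed"; // only after a person confirms
  return false; // tier "none": nothing to compare against
}

function alertPolicy(m: Match): Alert {
  if (m.tier === "jan") return m.grade === "high" ? "comparison" : "none";
  if (m.tier === "fuzzy") return m.status === "confirmed" ? "comparison" : "none";
  return "ownDropsOnly"; // unmatched items still alert on their own store's drops
}
```

Because both functions read the same value, tightening one tier's rule in one place can't silently leave the other permission behind.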

Minimal implementation: scoring a Rakuten hit

Each Rakuten hit accumulates a score, clamped to [0, 1], and two thresholds split the hits three ways: ≥0.8 is confirmed, ≥0.5 is needs-review (no notification), and anything below is discarded.

```typescript
type Grade = "high" | "mid" | "low";

function janMatchScore(
  hit: { itemName: string; itemCaption: string; itemPrice: number },
  jan: string,
  range: { min: number; max: number } | null,
): { grade: Grade; score: number } {
  let s = 0;
  // (1) JAN string appears in the name or caption (most direct evidence)
  if (hit.itemCaption.includes(jan) || hit.itemName.includes(jan)) s += 0.5;
  // (2) price sits inside the product-search API's range
  if (range) {
    const lo = range.min * 0.7; // allow 30% below (old models, parallel imports)
    const hi = range.max * 1.5; // allow 50% above (shipping-included, bundles)
    s += hit.itemPrice >= lo && hit.itemPrice <= hi ? 0.3 : -0.4;
  }
  // (3) a model number or size can be read off the hit
  if (/[a-z]+-?\d+/i.test(hit.itemName) || /\d+\s*(ml|g)/i.test(hit.itemName)) s += 0.2;

  const score = Math.max(0, Math.min(1, s));
  const grade: Grade = score >= 0.8 ? "high" : score >= 0.5 ? "mid" : "low";
  return { grade, score };
}
```
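Because every term has a fixed weight, the grades can be checked exhaustively. The harness below is hypothetical, not the article's code: scoreFor and PriceEvidence are made-up names and the evidence is passed in as flags, but the weights, their order, the clamp, and the thresholds are copied from janMatchScore.

```typescript
type Grade = "high" | "mid" | "low";
type PriceEvidence = "inRange" | "outOfRange" | "noRange";

// Same weights, order, and clamp as janMatchScore, driven by flags instead of a hit
function scoreFor(janFound: boolean, price: PriceEvidence, modelFound: boolean): number {
  let s = 0;
  if (janFound) s += 0.5;
  if (price === "inRange") s += 0.3;
  else if (price === "outOfRange") s -= 0.4;
  if (modelFound) s += 0.2;
  return Math.max(0, Math.min(1, s));
}

// Same thresholds as janMatchScore
function gradeOf(score: number): Grade {
  return score >= 0.8 ? "high" : score >= 0.5 ? "mid" : "low";
}

interface Row {
  janFound: boolean;
  price: PriceEvidence;
  modelFound: boolean;
  score: number;
  grade: Grade;
}

// Enumerate all 2 x 3 x 2 = 12 evidence combinations
const rows: Row[] = [];
for (const janFound of [true, false]) {
  for (const price of ["inRange", "noRange", "outOfRange"] as PriceEvidence[]) {
    for (const modelFound of [true, false]) {
      const score = scoreFor(janFound, price, modelFound);
      rows.push({ janFound, price, modelFound, score, grade: gradeOf(score) });
    }
  }
}

for (const r of rows) {
  console.log(`jan=${r.janFound} price=${r.price} model=${r.modelFound} → ${r.score.toFixed(1)} ${r.grade}`);
}
```

Only the two rows with the JAN string present and the price in range reach high. Every noRange row stops at mid or below, and an out-of-range price drags even a JAN hit down to low, which is the negative term actively voting against the match.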

Refusing to treat a weak hit as confirmed — discarding low, and not letting mid notify — was the single most effective guard against merging different products into one.
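Term (1) is a plain includes(), so it misses a JAN written in full-width digits and can fire on a JAN embedded in a longer digit run. If your captions show either case, a stricter check is one option. This is a hypothetical helper (containsJan is my name, not the article's), assuming the JAN arrives as an 8- or 13-digit ASCII string.

```typescript
// Hypothetical stricter replacement for term (1)
function containsJan(text: string, jan: string): boolean {
  // Only accept an 8- or 13-digit JAN; this also keeps regex metacharacters out of the pattern
  if (!/^(\d{8}|\d{13})$/.test(jan)) return false;
  // NFKC folds full-width digits such as "４９０１" into ASCII "4901"
  const normalized = text.normalize("NFKC");
  // Digit lookarounds reject a match that is part of a longer number
  const pattern = new RegExp(`(?<!\\d)${jan}(?!\\d)`);
  return pattern.test(normalized);
}
```

Swapping it into janMatchScore would mean replacing the two includes(jan) calls with containsJan(hit.itemCaption, jan) || containsJan(hit.itemName, jan).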

Gotchas

The product-search API (productCode=JAN), which supplies term (2)'s price range, returns 404 a lot, so many hits have no range and can't lean on term (2); without it a hit tops out at 0.7, one step short of confirmed. I added a fallback anchored on the cross-marketplace counterpart: if a model-number token pulled from the same-JAN Yahoo! hit (alphanumeric, 5+ characters, effectively unique) appears in the Rakuten hit's name, and the Rakuten price is within ±40% of the Yahoo! price, the hit is promoted to confirmed. The token alone can collide; the price band screens those collisions out.
The fuzzy tier doesn't auto-confirm. A matching model number, size, and brand still give you only a probability, so stop at surfacing candidates and add to the product master only the pairs a person confirms. A wrong pairing can be rejected in one tap, which drops it from the candidate pool for good. Notifications fire only for JAN-confirmed and user-confirmed matches.
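The counterpart fallback from the first gotcha can be sketched as follows, under stated assumptions: the hit shapes are trimmed to the fields the rule needs, the Yahoo! field names (name, price, janCode) are assumed from the V3 item search response, and "model-number token" is assumed to mean a 5+ character run of letters, digits, and hyphens that contains at least one letter and one digit. modelTokens and promoteViaCounterpart are hypothetical names.

```typescript
// Hypothetical, trimmed hit shapes
interface YahooHit {
  name: string;
  price: number;
  janCode: string;
}
interface RakutenHit {
  itemName: string;
  itemPrice: number;
}

// Assumed token rule: 5+ chars of [A-Z0-9-] mixing letters and digits
function modelTokens(name: string): string[] {
  const text = name.normalize("NFKC").toUpperCase(); // fold full-width and case
  const runs = text.match(/[A-Z0-9-]{5,}/g) ?? [];
  return runs.filter((t) => /[A-Z]/.test(t) && /\d/.test(t));
}

// Promote a Rakuten hit to confirmed when a token from the same-JAN Yahoo! hit
// appears in its name AND its price is within ±40% of the Yahoo! price
function promoteViaCounterpart(rakuten: RakutenHit, yahoo: YahooHit, jan: string): boolean {
  if (yahoo.janCode !== jan) return false; // counterpart must be the same-JAN Yahoo! hit
  const rakutenName = rakuten.itemName.normalize("NFKC").toUpperCase();
  const tokenHit = modelTokens(yahoo.name).some((t) => rakutenName.includes(t));
  const withinBand = Math.abs(rakuten.itemPrice - yahoo.price) <= yahoo.price * 0.4;
  return tokenHit && withinBand; // token alone can collide; the band screens that out
}
```

Requiring both a letter and a digit is what keeps plain words like "DRYER" and bare numbers out of the token set, so only model-number-looking strings can carry a promotion.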

Wrapping up

Even when JANs don't line up across marketplaces, collapsing identity into three certainty tiers keeps cross-marketplace comparison reliable: confirmed JAN matches go into the table, weak evidence becomes candidates, and unmatchable items fall back to single-store alerts. Refusing to force a JAN match on everything is what prevents wrong merges.

The points-inclusive price math, how mismatch reports are accumulated, and the notification-trigger rules are written up on the Aulvem site: How Rakuten's API hides the JAN, and the 3-tier cross-marketplace match

Top comments (1)

Nazar Boyko • Jun 27

Okay, the part doing the real work here isn't the JAN scoring, it's the quiet rule that a mid-confidence hit can join the comparison table but never trigger a notification. That split between "good enough to show" and "good enough to alert someone" is the thing most matching code gets wrong, because it treats one threshold as both. Pushing a wrong merge into a price alert is how you lose a user's trust in one tap, so making the bar to notify strictly higher is exactly right. The negative score when the price falls outside the range is a nice touch for the same reason, it actively votes against a bad match instead of just failing to support one.