
When I tried to compare the same product across Rakuten’s and Yahoo!’s APIs, the first wall wasn’t price; it was identity. Before you can work out which listing is cheaper, a machine has to decide that this listing on one marketplace and that listing on the other are the same product. Without that, there’s no comparison table to build.

The sticking point was the JAN, the 13- or 8-digit barcode number on a product. Yahoo! Shopping can look a product up directly with jan_code; Rakuten’s item search API never returns it. So the obvious plan, pulling both marketplaces by the same JAN and tying them together in one shot, breaks on one side.

I built and shipped a price watcher on my own: yasugoro, which compares point-inclusive prices across marketplaces and pings you when a watched staple gets cheaper. To make the cross-marketplace part hold together, I split matching into three certainty tiers, assuming up front that JANs won’t always line up. Here’s that design.

Rakuten and Yahoo! are asymmetric about the JAN

The two marketplaces simply differ in how easily you can get a JAN, and designing as if they were symmetric guarantees the Rakuten side falls apart.

Yahoo! Shopping’s item search takes a dedicated jan_code parameter, so you can query by JAN directly, and the response sometimes carries janCode back. If you know the JAN, you can resolve the listing.

Rakuten’s item search has no JAN parameter. You end up putting the JAN string into keyword, which is only an indirect match. Worse, the JAN isn’t a structured field; it tends to be buried inside the item description (itemCaption). I saw exactly this on model-numbered goods such as Anker products. So on the Rakuten side you always need a follow-up step: did the keyword=JAN hit actually correspond to that JAN’s product?

That asymmetry is the starting point. A Yahoo! hit can go straight to a confirmed JAN match; a Rakuten hit can’t be used without proof.
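To make the asymmetry concrete, here is a minimal TypeScript sketch of the two lookups for one JAN. The endpoint URLs and versions, and the helper names (`toQuery`, `yahooJanQuery`, `rakutenJanQuery`, `mentionsJan`), are assumptions for illustration rather than yasugoro's code; only `jan_code`, `keyword`, `itemName` and `itemCaption` come from the text above, so check the current API docs before relying on any of it.

```ts
// Hypothetical sketch: the same JAN becomes two very different queries.
// Endpoint paths/versions are assumptions; verify against current API docs.
const YAHOO_ITEM_SEARCH =
  "https://shopping.yahooapis.jp/ShoppingWebService/V3/itemSearch";
const RAKUTEN_ITEM_SEARCH =
  "https://app.rakuten.co.jp/services/api/IchibaItem/Search/20220601";

// Build a query string without relying on DOM/Node URL typings.
function toQuery(params: Record<string, string>): string {
  return Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");
}

// Yahoo!: a dedicated jan_code parameter, so the lookup itself is the evidence.
function yahooJanQuery(appId: string, jan: string): string {
  return `${YAHOO_ITEM_SEARCH}?${toQuery({ appid: appId, jan_code: jan })}`;
}

// Rakuten: no JAN parameter, so the JAN rides in keyword and every hit
// still has to be verified before it counts as that product.
function rakutenJanQuery(applicationId: string, jan: string): string {
  return `${RAKUTEN_ITEM_SEARCH}?${toQuery({ applicationId, keyword: jan, format: "json" })}`;
}

// First follow-up check on a Rakuten hit: does the JAN appear anywhere in
// the text the API does return?
interface RakutenHitText {
  itemName: string;
  itemCaption: string;
}

function mentionsJan(hit: RakutenHitText, jan: string): boolean {
  return hit.itemName.includes(jan) || hit.itemCaption.includes(jan);
}
```

A keyword hit that fails even `mentionsJan` has no direct evidence at all; one that passes still only earns part of the confidence score.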

Collapse it into three certainty tiers

I split product identity into three tiers, ordered by how certain the match is. Instead of forcing every item through a JAN match, only the ones I can tie down with confidence sit at the top.

| Tier | Method | Used for comparison | Sends alerts |
| --- | --- | --- | --- |
| ① JAN match | Yahoo! direct lookup + Rakuten keyword=JAN scored for confidence | Yes (confirmed only) | Yes |
| ② Name fuzzy match | Scored on model no. / size / brand → candidates → user confirms | After user confirms | Confirmed only |
| ③ No match | Treated as single-store (e.g. pasted URL) | No | Own price drops only |

The lower the tier, the weaker the evidence, so the right to enter comparison and to fire alerts narrows with it. Only tier ① lands in the table as “the same product to everyone”; ② and ③ are demoted in proportion to how sure the match is.

```mermaid
flowchart TD
  In[Input: JAN / name / URL]
  Kind{Detect kind}
  Jan[① JAN match]
  Score{Confidence score}
  Fuzzy[② Name fuzzy match]
  Cand{Candidates?}
  Single[③ Single-store]
  Cross[Enters comparison, can notify]
  Pending[Candidates, awaiting user confirm]
  Watch[Track own price drops only]

  In --> Kind
  Kind -->|JAN| Jan
  Kind -->|name| Fuzzy
  Kind -->|URL| Single
  Jan --> Score
  Score -->|high ≥0.8| Cross
  Score -->|mid ≥0.5| Pending
  Score -->|low <0.5| Single
  Fuzzy --> Cand
  Cand -->|yes| Pending
  Cand -->|no| Single
  Pending -->|confirm| Cross
  Single --> Watch
```

A JAN input enters at ①, a name at ②, a URL at ③ — and a weak ① hit falls through to ② or ③. It’s a two-stage funnel, not a single gate.
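The first stage of the funnel, deciding which tier an input enters at, could look roughly like this. It is a sketch under assumptions: the post doesn't say how kinds are detected, so treating an 8- or 13-digit string with a valid JAN check digit as a JAN, an http(s) string as a URL, and everything else as a name is my guess, and `entryTier` / `isValidJan` are hypothetical names.

```ts
// Hypothetical first-stage router: which tier does an input enter at?
// Assumed rules: http(s) URL -> tier 3, 8/13 digits with a valid JAN
// check digit -> tier 1, anything else -> tier 2 (product name).
type Tier = 1 | 2 | 3;

// JAN (EAN-13 / EAN-8) check digit: weights 3,1,3,1,... starting from the
// digit immediately left of the check digit, then (10 - sum % 10) % 10.
function isValidJan(code: string): boolean {
  if (!/^(\d{8}|\d{13})$/.test(code)) return false;
  const digits = code.split("").map(Number);
  const check = digits[digits.length - 1];
  let sum = 0;
  for (let i = 0; i < digits.length - 1; i++) {
    const posFromRight = digits.length - 2 - i; // 0 for the last data digit
    sum += digits[i] * (posFromRight % 2 === 0 ? 3 : 1);
  }
  return (10 - (sum % 10)) % 10 === check;
}

function entryTier(input: string): Tier {
  const s = input.trim();
  if (/^https?:\/\//i.test(s)) return 3; // pasted URL: single-store from the start
  if (isValidJan(s.replace(/[\s-]/g, ""))) return 1; // tolerate spaces/hyphens
  return 2; // treat as a product name
}
```

The check digit keeps a mistyped JAN from being routed into tier ① at all; a digit string that fails it simply falls through to the name path.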

① JAN match: prove Rakuten hits by scoring “JAN-ness”

Even inside the JAN tier, Rakuten needs proof that a keyword=JAN hit is really the same product. I made that an additive confidence score.

Yahoo! comes in via jan_code lookup, so it goes straight to confirmed (alerts allowed). Rakuten is the awkward one — each hit accumulates a score:

  • JAN string appears in the item name or caption … +0.5 (the most direct evidence)
  • price falls inside the product-search API’s price range … +0.3 / outside it … −0.4
  • a model number or size can be read off the hit … +0.2

Thresholds split the score three ways: ≥0.8 is confirmed (jan, can notify), 0.5 to under 0.8 is needs-review (demoted to fuzzy, no notify), and anything under 0.5 is discarded (kept out of comparison). Refusing to treat a weak hit as confirmed turned out to be the single most effective guard against wrong merges.

```ts
// Rakuten single-hit JAN confidence (excerpt)
let s = 0;
if (hit.itemCaption.includes(jan) || hit.itemName.includes(jan)) s += 0.5;
if (range) {
  const lo = range.minPrice * 0.7;   // allow 30% below (old models, parallel imports)
  const hi = range.maxPrice * 1.5;   // allow 50% above (shipping-included, bundles)
  if (hit.itemPrice >= lo && hit.itemPrice <= hi) s += 0.3;
  else s -= 0.4;                      // far outside = likely a bad match
}
if (modelOrCapacityConsistent(hit)) s += 0.2;
```

The price range comes from Rakuten’s product search API (productCode=JAN), which returns a minimum and a maximum price. The accepted band runs from 0.7× the minimum (allowing for old stock and parallel imports) up to 1.5× the maximum (allowing for shipping-included or bundled listings). A price far outside that band gets penalized, since it’s probably a different size, a different color, or another product entirely.
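Here is a self-contained restatement of the scoring plus the threshold split, as a sketch. The weights, the 0.7×/1.5× band and the 0.8/0.5 cut-offs are from this post; the function names and the pre-computed inputs are assumptions, since the post doesn't show how `modelOrCapacityConsistent` works. The test cases include a hit whose price range is missing.

```ts
// Hypothetical restatement: inputs are pre-computed, names are illustrative.
interface PriceRange {
  minPrice: number;
  maxPrice: number;
}
type JanVerdict = "confirmed" | "needs-review" | "discard";

// Accept from 0.7x the minimum (old stock, parallel imports)
// up to 1.5x the maximum (shipping-included, bundles).
function inBand(price: number, range: PriceRange): boolean {
  return price >= range.minPrice * 0.7 && price <= range.maxPrice * 1.5;
}

function rakutenJanScore(
  janInText: boolean,
  price: number,
  range: PriceRange | undefined, // undefined when the product search 404s
  modelOrSizeConsistent: boolean,
): number {
  let s = 0;
  if (janInText) s += 0.5;
  if (range) s += inBand(price, range) ? 0.3 : -0.4;
  if (modelOrSizeConsistent) s += 0.2;
  return Math.round(s * 100) / 100; // strip float noise from the additions
}

function classifyJanScore(score: number): JanVerdict {
  if (score >= 0.8) return "confirmed"; // tier 1, can notify
  if (score >= 0.5) return "needs-review"; // demoted to fuzzy, no notify
  return "discard"; // kept out of comparison
}
```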

The catch: in practice that product search API returned 404 a lot, so the price range is missing for many hits and I can’t lean on the range term. Without it, a Rakuten hit tops out at 0.5 + 0.2 = 0.7, below the 0.8 bar, so it could never be confirmed. To cover that, I added a check anchored on the cross-marketplace counterpart: the representative price of the same-JAN Yahoo! hit. If a model-number token pulled from the Yahoo! side (alphanumeric, 5+ characters, long enough that accidental matches are rare) appears in the Rakuten hit’s name, and the Rakuten price is within ±40% of the counterpart’s price, the hit is promoted to confirmed. The model token alone could still collide by chance; the price band screens that out. Both conditions have to hold.
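The counterpart check can be sketched as follows. The ±40% band and the "alphanumeric, 5+ characters" rule come from the text; reading "alphanumeric" as "contains both a letter and a digit", and NFKC-normalizing names so full-width characters compare as ASCII, are my assumptions, and the function names are hypothetical.

```ts
// Hypothetical sketch of promoting a Rakuten hit via its Yahoo! counterpart.
// Assumed token rule: a run of [A-Z0-9-], 5+ chars, with a letter and a digit.
function modelTokens(name: string): string[] {
  const runs = name.normalize("NFKC").toUpperCase().match(/[A-Z0-9-]+/g) ?? [];
  return runs.filter((t) => t.length >= 5 && /[A-Z]/.test(t) && /\d/.test(t));
}

function confirmByCounterpart(
  yahooName: string,
  yahooPrice: number,
  rakutenName: string,
  rakutenPrice: number,
): boolean {
  // NFKC turns full-width "Ａ１２６３" into "A1263" before comparing.
  const target = rakutenName.normalize("NFKC").toUpperCase();
  const tokenHit = modelTokens(yahooName).some((t) => target.includes(t));
  const priceHit = Math.abs(rakutenPrice - yahooPrice) <= yahooPrice * 0.4;
  return tokenHit && priceHit; // both must hold: a token alone can collide
}
```

Requiring a digit keeps plain words ("PowerCore") and bare capacities ("10000") from acting as model tokens.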

You can see the comparison screen running in yasugoro’s live app — free, no login. When the same product lines up across marketplaces, that’s a tier ① confirmation.

② Name fuzzy match: stop at candidates, don’t auto-confirm

When all you have is a product name, I score it on model number, size, and brand and stop at surfacing candidates. Nothing auto-confirms as the same product.

The fuzzy score breaks down like this — the model number is the strongest signal by far:

  • model number matches … +0.5
  • brand matches … +0.2
  • size matches … +0.2
  • overlap of the remaining tokens … +0.1 × their Jaccard coefficient (0 to 1)
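A minimal sketch of that weighted sum, assuming the model number, brand, total size and leftover tokens have already been extracted (the `NameFeatures` shape and the function names are hypothetical). I read "+0.1 × coefficient" as 0.1 times the Jaccard coefficient, so the token term can add at most 0.1.

```ts
// Hypothetical feature shape; extraction is assumed to have happened already.
interface NameFeatures {
  model?: string;
  brand?: string;
  sizeTotal?: number; // e.g. 54 sheets x 3 packs -> 162
  tokens: string[]; // whatever is left after removing model/brand/size
}

// Jaccard coefficient |A ∩ B| / |A ∪ B|; 0 when both sides are empty.
function jaccard(a: string[], b: string[]): number {
  const setB = new Set(b);
  const union = new Set(a.concat(b));
  if (union.size === 0) return 0;
  let inter = 0;
  new Set(a).forEach((t) => {
    if (setB.has(t)) inter++;
  });
  return inter / union.size;
}

// An attribute only counts when both names have it and the values agree.
function agree<T>(x: T | undefined, y: T | undefined): boolean {
  return x !== undefined && x === y;
}

function fuzzyScore(a: NameFeatures, b: NameFeatures): number {
  let s = 0;
  if (agree(a.model, b.model)) s += 0.5; // strongest signal by far
  if (agree(a.brand, b.brand)) s += 0.2;
  if (agree(a.sizeTotal, b.sizeTotal)) s += 0.2;
  s += 0.1 * jaccard(a.tokens, b.tokens);
  return Math.round(s * 1000) / 1000; // strip float noise
}
```

Without a model-number match, brand + size + identical leftovers reach only 0.5, so a name-only pairing needs the model number to clear the 0.6 bar.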

A model number like EH-NA0J is pulled from the name by regex, and a size like 54 sheets × 3 is decomposed into “54 per pack × 3 packs = 162 total” before comparing. Only candidates scoring ≥0.6 are surfaced; if none clears the bar, the single best-scoring one drops to tier ③ as a single-store item.
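Extraction and the 0.6 cut might be sketched like this. The regexes are assumptions fitted only to the two examples in this paragraph (`EH-NA0J`, `54 sheets × 3`) and would need many more patterns for real listings; `extractModel`, `parseSize` and `surface` are hypothetical names.

```ts
// Hypothetical extraction + surfacing step (regexes are assumptions).
const MODEL_RE = /\b[A-Z]{1,5}-[A-Z0-9]{2,8}\b/; // e.g. "EH-NA0J"
const SIZE_RE = /(\d+)\s*(?:sheets?|rolls?|枚|ロール)?\s*[×xX*]\s*(\d+)/; // e.g. "54 sheets × 3"

function extractModel(name: string): string | undefined {
  const m = name.normalize("NFKC").toUpperCase().match(MODEL_RE);
  return m ? m[0] : undefined;
}

interface PackSize {
  perPack: number;
  packs: number;
  total: number;
}

function parseSize(name: string): PackSize | undefined {
  const m = name.normalize("NFKC").match(SIZE_RE);
  if (!m) return undefined;
  const perPack = Number(m[1]);
  const packs = Number(m[2]);
  return { perPack, packs, total: perPack * packs }; // 54 x 3 = 162 total
}

interface Scored<T> {
  item: T;
  score: number;
}
type Surfaced<T> =
  | { kind: "candidates"; items: Scored<T>[] }
  | { kind: "single"; best: Scored<T> | undefined };

// Surface every candidate at or above the threshold (best first); if none
// clears it, hand back only the best-scoring one as a single-store item.
function surface<T>(scored: Scored<T>[], threshold = 0.6): Surfaced<T> {
  const passing = scored
    .filter((c) => c.score >= threshold)
    .sort((a, b) => b.score - a.score);
  if (passing.length > 0) return { kind: "candidates", items: passing };
  let best: Scored<T> | undefined = undefined;
  for (const c of scored) {
    if (best === undefined || c.score > best.score) best = c;
  }
  return { kind: "single", best };
}
```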

The person confirms the candidate, not the algorithm. Only a pair that a user marks as “same product” is registered in the product master. A wrong pairing can be rejected in one tap, which removes that pair from the candidate pool for good (it won’t resurface). For cross-marketplace comparison of household staples, putting a plain confirm path and a plain undo path in front of the user beats endlessly chasing higher auto-match accuracy.
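The "reject once, never resurface" rule needs little more than a set of rejected pairs keyed independently of order. This is a sketch with a hypothetical class name and listing IDs, not yasugoro's storage layer; in practice both sets would live in a database.

```ts
// Hypothetical sketch: user confirmations and permanent rejections of
// candidate pairs, keyed so that (a, b) and (b, a) are the same pair.
class CandidatePool {
  private readonly rejected = new Set<string>();
  private readonly confirmed = new Set<string>();

  private static key(a: string, b: string): string {
    return a < b ? `${a}|${b}` : `${b}|${a}`;
  }

  // The only path into the master: a user says "same product".
  confirm(a: string, b: string): void {
    this.confirmed.add(CandidatePool.key(a, b));
  }

  // One tap: drop the pair (undoing a confirmation if there was one)
  // and remember it so it never resurfaces as a candidate.
  reject(a: string, b: string): void {
    const k = CandidatePool.key(a, b);
    this.confirmed.delete(k);
    this.rejected.add(k);
  }

  isConfirmed(a: string, b: string): boolean {
    return this.confirmed.has(CandidatePool.key(a, b));
  }

  // Remove already-rejected pairs before showing candidates for `base`.
  filterCandidates(base: string, candidates: string[]): string[] {
    return candidates.filter((c) => !this.rejected.has(CandidatePool.key(base, c)));
  }
}
```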

③ No match: fall back to single-store drop alerts

Anything that couldn’t be tied down at any tier becomes a single-store item. It stays out of the cheapest-across-stores ranking, and only its own page’s price drops are tracked.

A single item added by pasting a URL starts here from the outset. It isn’t on the comparison footing, so it never claims “cheaper than another store” — but it can still say “cheaper than before.” I’d rather do the honest, smaller thing than force an incomparable item into the ranking and emit a wrong “cheapest.”
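Keeping unconfirmed items out of the ranking is a filter applied before picking the minimum. A sketch, assuming each listing carries a match status (`jan` and `user_confirmed` are the post's names; `fuzzy` and `single` are my labels) and that `price` is already the effective price computed elsewhere:

```ts
// Hypothetical ranking filter: only confirmed identities may compete.
type MatchStatus = "jan" | "user_confirmed" | "fuzzy" | "single";

interface Listing {
  store: string;
  price: number; // assumed: effective (point-inclusive) price, computed elsewhere
  status: MatchStatus;
}

function isComparable(status: MatchStatus): boolean {
  return status === "jan" || status === "user_confirmed";
}

function cheapestAcrossStores(listings: Listing[]): Listing | undefined {
  let best: Listing | undefined = undefined;
  for (const l of listings) {
    if (!isComparable(l.status)) continue; // fuzzy / single-store never rank
    if (best === undefined || l.price < best.price) best = l;
  }
  return best;
}
```

A cheaper single-store or needs-review listing is skipped rather than crowned, which is exactly the "wrong cheapest" the ranking must not emit.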

What stops wrong merges is “only confirmed tiers notify”

The real reason for three tiers was to never fire an alert on a bad identity. Notifications can only fire on JAN-confirmed (jan) and user-confirmed (user_confirmed) matches. The fuzzy needs-review state and single-store items are both kept out of the trigger condition.

Locking the right to notify behind identity certainty — before any price math — matters because otherwise a price drop on a different product gets reported as “your watched item got cheaper.” The trust in a cross-marketplace comparison rests, in the end, on that identity footing.
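The ordering, identity first and prices second, is easy to see in a small decision function. This is a hypothetical sketch: the status names `jan` and `user_confirmed` come from the post, while `fuzzy`, `single`, the `Alert` shape and the message strings are invented for illustration. For confirmed items, `previous`/`current` are assumed to be the cross-store cheapest price; for single-store items, the page's own price.

```ts
// Hypothetical notify gate: identity is checked before any price math.
type MatchStatus = "jan" | "user_confirmed" | "fuzzy" | "single";

interface Alert {
  kind: "cheapest" | "own-drop";
  message: string;
}

function decideAlert(status: MatchStatus, previous: number, current: number): Alert | null {
  // 1. Identity gate: a needs-review fuzzy match never triggers anything.
  if (status === "fuzzy") return null;
  // 2. Only then look at prices.
  if (current >= previous) return null;
  if (status === "jan" || status === "user_confirmed") {
    // Confirmed identity: allowed to talk about the cross-store price.
    return { kind: "cheapest", message: `Cheapest across stores is now ¥${current}` };
  }
  // Single-store: may say "cheaper than before", never "cheaper than the other store".
  return { kind: "own-drop", message: `Cheaper than before: ¥${previous} → ¥${current}` };
}
```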

How I compute the point-inclusive real price (splitting price into three certainty layers) is its own post: the effective-price design notes. That one is “how the price is computed,” this one is “how the same product is identified” — you need both for the comparison to hold. The running thing is yasugoro if you want to poke at it.

FAQ

Can you not get the JAN from Rakuten’s API at all?

There’s no JAN field in the item search response and no JAN-specific search parameter. In practice you throw the JAN string into the keyword parameter, and the JAN itself usually sits inside the item caption rather than a structured field. Yahoo! Shopping, by contrast, has a jan_code parameter that looks it up directly and can return janCode in the response; that asymmetry is where the whole cross-marketplace problem starts.

Why not just match everything by JAN?

Force a JAN match on items where the JAN isn’t available and you start merging different products as one — the failure I most wanted to avoid. Rakuten doesn’t return the JAN directly, so a keyword=JAN hit needs separate proof that it’s really that product. Only items I can tie down by JAN go into the confirmed tier; weaker evidence drops to candidate suggestions, and unmatchable items drop to single-store alerts.

Why doesn’t the fuzzy match confirm automatically?

A matching model number, size, and brand still only give you a probability, and auto-confirming a guess means alerts fire on a wrong merge. The fuzzy tier surfaces candidates and registers in the product master only the ones a person confirms; a wrong pairing can be rejected in one tap, which permanently drops that pair from the candidate pool. Notifications fire only on JAN-confirmed and user-confirmed matches.

What happens to items that don’t match?

They become single-store items: dropped from cross-marketplace comparison, with only their own page’s price drops tracked. Anything you add by pasting a URL starts here too. They never enter the cheapest-across-stores ranking — they’re watched as a standalone price, and alerts only say “cheaper than before”, never “cheaper than the other store”.