Why AI price estimates lie (and how to anchor them to real market data)

#ai #javascript #opensource #indiehackers

A user of the resale tool I built sent me a message last week: "Estimates aren't accurate."

He'd scouted a vintage clock at an estate sale. The AI said it would sell for $15-30. He found a comp on eBay selling for $50.

The problem wasn't the AI's reasoning. The problem was where the answer came from.

The naive approach
The first version of my pricing logic was embarrassingly simple. I asked OpenAI's gpt-4o, in the same prompt as the item identification, to return:

{
  "estimated_resale_range_usd": "$25-$45",
  "max_buy_price_usd": "$12",
  "days_to_sell": "1-2 weeks"
}
This works… some of the time. For commodity items in a category the model has lots of training data on (sneakers, popular electronics, common collectibles), the estimates are decent.

For everything else — vintage tools, regional brands, obscure collectibles, anything with limited public market data — the AI is guessing from training data that's 18+ months stale.

That's not estimation. That's hallucination dressed up as a price.

What I actually wanted
Real-time, real market data. Specifically, eBay's sold-listings history — the gold standard of resale pricing because it tells you what items actually transact at, not what sellers hope to get.

eBay has an API for this: the Marketplace Insights API, specifically the getItemSalesHistory endpoint.

I applied for access. eBay reviewed it. They denied it.

"The Marketplace Insights API is protected by specific OAuth scopes and requires approval of business use cases following eBay's internal processes. Access to this API is highly limited and generally reserved for eBay's approved partners only."

Fair enough. Time for plan B.

Anchoring to what eBay DOES give you
The eBay Browse API is publicly accessible (with an OAuth2 token from a registered developer account). It returns currently active listings — asking prices, not sold prices.
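
A minimal sketch of what that request can look like, assuming you already hold an OAuth2 application token. It only builds the URL and headers for the Browse API's item summary search and does not send anything. The helper name `buildBrowseSearchRequest`, the default `limit`, and the `EBAY_US` marketplace are illustrative choices, not something eBay prescribes.

```javascript
// Hypothetical helper: builds (but does not send) a Browse API keyword search.
// Assumes `token` is an OAuth2 application access token obtained elsewhere.
function buildBrowseSearchRequest(query, token, { limit = 50, marketplaceId = 'EBAY_US' } = {}) {
  const url = new URL('https://api.ebay.com/buy/browse/v1/item_summary/search');
  url.searchParams.set('q', query);
  url.searchParams.set('limit', String(limit));

  return {
    url: url.toString(),
    options: {
      method: 'GET',
      headers: {
        Authorization: `Bearer ${token}`,
        'X-EBAY-C-MARKETPLACE-ID': marketplaceId,
      },
    },
  };
}

// Usage (anywhere fetch is global, e.g. Node 18+):
// const { url, options } = buildBrowseSearchRequest('vintage mantel clock', token);
// const data = await fetch(url, options).then((r) => r.json());
// const rawItems = data.itemSummaries || [];
```

Keeping the request builder separate from the network call makes it easy to unit-test the query without touching eBay.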

Asking prices and sold prices are not the same thing. As a rule of thumb:

Asking prices tend to be optimistic. Many sellers list high hoping someone bites.
Sold prices typically run 65-85% of the median asking price for the same item.
The ratio varies by demand: hot categories sell closer to asking, slow categories sell further below.
So while I don't have direct sold-data access, I can infer it from a representative sample of current asking prices, adjusted for category demand.

The math
Here's the actual function from my codebase:

function refineEstimateFromEbay(ebay, demand) {
  if (!ebay) return null;
  const sampled = Number(ebay.sampled || 0);
  const median = Number(ebay.median || 0);
  const min = Number(ebay.min || 0);

  // Need a reasonable sample to override the AI
  if (sampled < 5 || median <= 0) return null;

  // Sold-to-asking multipliers by demand level
  const mult = {
    high: { low: 0.80, high: 0.98 },
    medium: { low: 0.65, high: 0.88 },
    low: { low: 0.50, high: 0.78 },
  };
  const m = mult[demand] || mult.medium;

  let lowEst = Math.round(median * m.low);
  let highEst = Math.round(median * m.high);

  // Floor at 85% of the cheapest asking price, so the low estimate never lands implausibly far below it
  if (min > 0) lowEst = Math.max(lowEst, Math.round(min * 0.85));

  // Ensure low < high after the floor (the +1 covers tiny values where rounding 1.15x changes nothing)
  if (lowEst >= highEst) highEst = Math.max(lowEst + 1, Math.round(lowEst * 1.15));

  return { range: `$${lowEst}-$${highEst}`, lowValue: lowEst, highValue: highEst };
}
A few things worth calling out:

The 5-listing threshold. If fewer than 5 currently-listed items match, I bail out and fall back to the AI's estimate. With too small a sample, outliers wreck the median.
Demand-adjusted multipliers. The AI returns a demand level (low/medium/high) as part of its identification. I use that signal to widen or tighten the multiplier — because a high-demand item moves closer to asking, while a low-demand item often sits and discounts.
The min-price floor. If the cheapest current listing is $40 and my math produces $20, something's off, and the likeliest culprit is my multiplier. I take 85% of min as a safety floor.
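
The fallback in the first point can be sketched roughly as follows. Both `parseUsdRange` and `chooseEstimate` are hypothetical helpers, and the sketch assumes the AI's range arrives as a string shaped like "$25-$45", as in the JSON from the naive approach.

```javascript
// Hypothetical helper: turns a string like "$25-$45" or "$15-30" into numbers.
// Returns null when the string does not contain an ordered pair of amounts.
function parseUsdRange(text) {
  const match = /\$?\s*(\d+(?:\.\d+)?)\s*-\s*\$?\s*(\d+(?:\.\d+)?)/.exec(String(text || ''));
  if (!match) return null;
  const low = Number(match[1]);
  const high = Number(match[2]);
  return low <= high ? { lowValue: low, highValue: high } : null;
}

// Hypothetical helper: prefer the eBay-anchored range, else fall back to the AI's.
// `refined` is whatever refineEstimateFromEbay returned (an object or null).
function chooseEstimate(aiRangeText, refined) {
  if (refined) return { ...refined, source: 'ebay' };
  const parsed = parseUsdRange(aiRangeText);
  if (!parsed) return null;
  return { range: `$${parsed.lowValue}-$${parsed.highValue}`, ...parsed, source: 'ai' };
}
```

Tagging each result with a `source` field (an addition made for this sketch) lets the UI tell users whether a number is market-anchored or a model guess.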
Outlier filtering on the sample
The other half of the fix: don't trust a raw median.

eBay listings have outliers in both directions — the $1 misprice, the $500 listing of a $30 item. I drop the top and bottom 10% before computing the median when I have 10+ samples:

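// Numeric comparator: the default sort compares values as strings
prices.sort((a, b) => a - b);

// Trim 10% from each end once the sample is large enough
if (prices.length >= 10) {
  const cut = Math.floor(prices.length * 0.1);
  prices = prices.slice(cut, prices.length - cut);
}

// Middle element (the upper of the two middles for even-length samples)
const median = prices[Math.floor(prices.length / 2)];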
I also filter "junk listings" where shipping cost is more than 2× the item price (scammer pattern):

const items = rawItems.filter((it) => {
  const itemPrice = parseFloat(it?.price?.value);
  const opts = it?.shippingOptions || [];
  // Cheapest valid shipping option; undefined when none is listed
  const minShipping = opts
    .map((o) => parseFloat(o?.shippingCost?.value))
    .filter((v) => !isNaN(v) && v >= 0)
    .sort((a, b) => a - b)[0];
  // Drop listings whose shipping costs more than twice the item itself
  if (!isNaN(itemPrice) && itemPrice > 0 && !isNaN(minShipping) && minShipping > itemPrice * 2) {
    return false;
  }
  return true;
});
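
Putting the pieces together, one way to produce the `{ sampled, median, min }` object that `refineEstimateFromEbay` reads could look like the sketch below. `summarizeListings` is a hypothetical name, and taking `sampled` and `min` from the trimmed sample (rather than the raw one) is an assumption of this sketch.

```javascript
// Hypothetical glue: reduce already-filtered Browse API items to summary stats.
// Assumption: `sampled` and `min` describe the sample after outlier trimming.
function summarizeListings(items) {
  let prices = items
    .map((it) => parseFloat(it?.price?.value))
    .filter((v) => !isNaN(v) && v > 0)
    .sort((a, b) => a - b);

  if (prices.length === 0) return null;

  // Drop the top and bottom 10% once there are at least 10 prices
  if (prices.length >= 10) {
    const cut = Math.floor(prices.length * 0.1);
    prices = prices.slice(cut, prices.length - cut);
  }

  return {
    sampled: prices.length,
    median: prices[Math.floor(prices.length / 2)],
    min: prices[0],
  };
}
```

Taking `min` after trimming keeps a $1 misprice from dragging the 85% floor down to nothing. If you take it before trimming instead, the floor simply becomes less protective.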
What changed in production
Before this refactor, my user's vintage clock estimate was the AI-hallucinated $15-30 range.

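After: the same item with 10 active eBay listings (median $30, demand medium) produces an eBay-anchored range of $20-26. That is the medium multipliers applied to the median: 30 × 0.65 = 19.5, which rounds to 20, and 30 × 0.88 = 26.4, which rounds to 26. Tighter, more accurate, anchored to today's reality rather than 2024's training cutoff.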

Max-buy is recalculated as the refined low end times 0.40, rounded down to a whole dollar with a $1 minimum (a 50%-after-fees flipper margin):

function calculateMaxBuy(lowResaleValue) {
  if (!lowResaleValue || lowResaleValue <= 0) return null;
  return `$${Math.max(1, Math.floor(lowResaleValue * 0.40))}`;
}
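
One way to read the 0.40 factor is as margin arithmetic, sketched below. The 10% fee rate is an assumption picked so the numbers line up with the 50% figure. It is not eBay's actual fee schedule, and `marginAfterFees` is a hypothetical helper.

```javascript
// Hypothetical helper: profit as a fraction of the sale price.
// `feeRate` is an assumed blended marketplace fee, not a real fee schedule.
function marginAfterFees(salePrice, buyPrice, feeRate = 0.10) {
  if (!(salePrice > 0)) return null;
  const profit = salePrice - buyPrice - salePrice * feeRate;
  return profit / salePrice;
}

// Buying at 40% of a $20 sale: $20 - $8 cost - $2 assumed fees = $10 profit
const margin = marginAfterFees(20, 8);
console.log(margin.toFixed(2));
```

With a higher real-world fee rate the same 0.40 factor leaves a thinner margin, so it is worth rerunning the arithmetic with your own fees and shipping costs.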
The pattern
If you're building anything that uses AI to estimate a market-driven number — prices, demand, salaries, valuations — AI alone is not a market signal. It's a guess about what the market signal might be, made from data that was already old when the model was trained.

The real signal lives in real-time APIs. Use those when you can. Anchor to them when you can. Treat the AI as a categorization layer and a narrative generator — not as a source of truth for time-sensitive market data.

The hybrid works: AI for what the thing is, real-time API for what it's worth.

Live demo: getresalescout.com — snap a photo of any thrift store item to see this logic in action. Free, no signup for the first 5 scouts.

If you've solved this problem differently, or know a workaround for restricted sold-data APIs, I'd love to hear it.