How much should you trust a product's 4.5-star rating on Amazon?
Probably less than you think. Single-source ratings are noisy — influenced by incentivized reviews, review bombing, selection bias, and platform-specific quirks. A product can be 4.8 on Amazon and 3.2 on Reddit.
At SmartReview, we built a trust score that aggregates ratings from 50+ sources into a single, weighted number. Here's how it works and why multi-source aggregation produces more reliable product assessments.
## The Problem with Single-Source Ratings
Every review platform has biases:
| Platform | Typical Bias | Why |
|---|---|---|
| Amazon | Skews high (4.0-4.5) | Incentivized reviews, seller pressure |
| Reddit | Skews negative | Self-selection — people post when frustrated |
| RTINGS | Neutral but narrow | Lab-tested, limited to measurable specs |
| YouTube | Varies by creator | Sponsorship influence, entertainment value |
| G2/Capterra | Skews high (4.0+) | Vendors incentivize reviews with gift cards |
No single source tells the whole story. A product with 4.8 stars on Amazon might have genuine quality issues that only surface in Reddit discussions or RTINGS lab tests.
## Our Trust Score Architecture
The trust score is a weighted average across sources, where each source's weight reflects its reliability for the product category.
### Step 1: Source Collection
We collect reviews and ratings from multiple source types:
```typescript
interface ReviewSource {
  platform: string;       // amazon, reddit, rtings, youtube, etc.
  rating: number | null;  // normalized to 0-5 scale
  reviewCount: number;    // volume of reviews
  sentimentScore: number; // NLP-derived from text, -1 to 1
  recency: Date;          // when reviews were collected
  verified: boolean;      // whether platform verifies purchases
}
```
### Step 2: Source Weighting
Not all sources are equal. We weight them based on three factors:
**Verification weight** — platforms that verify purchases get higher weight:

```typescript
const verificationMultiplier = source.verified ? 1.5 : 1.0;
```
**Volume weight** — more reviews means more statistical confidence:

```typescript
const volumeWeight = Math.min(Math.log10(source.reviewCount + 1) / 4, 1.0);
```
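The log scale saturates quickly, so piling on reviews past a few thousand adds little weight. A self-contained sketch of the same formula, with illustrative counts:

```typescript
// Volume weight: log10 scale capped at 1.0, so the marginal value
// of additional reviews shrinks as counts grow.
const volumeWeight = (reviewCount: number): number =>
  Math.min(Math.log10(reviewCount + 1) / 4, 1.0);

console.log(volumeWeight(9));     // 0.25 (log10(10) / 4)
console.log(volumeWeight(999));   // 0.75 (log10(1000) / 4)
console.log(volumeWeight(99999)); // 1.0  (capped)
```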
**Recency weight** — recent reviews matter more (products change):

```typescript
const daysSinceCollection = differenceInDays(new Date(), source.recency);
const recencyWeight = Math.max(1 - (daysSinceCollection / 365), 0.3);
```
**Category-specific weight** — RTINGS matters more for headphones than for coffee makers:

```typescript
const categoryWeights: Record<string, Record<string, number>> = {
  headphones: { rtings: 1.8, amazon: 1.0, reddit: 1.3, youtube: 1.2 },
  coffee_makers: { amazon: 1.4, reddit: 1.2, youtube: 1.5, rtings: 0.5 },
  mattresses: { reddit: 1.5, sleepfoundation: 1.6, amazon: 0.8 },
};
```
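When a category or platform has no entry, the lookup falls back to a neutral weight of 1.0 (the `?? 1.0` in Step 3). A quick self-contained check, repeating two rows of the table above:

```typescript
const categoryWeights: Record<string, Record<string, number>> = {
  headphones: { rtings: 1.8, amazon: 1.0, reddit: 1.3, youtube: 1.2 },
  coffee_makers: { amazon: 1.4, reddit: 1.2, youtube: 1.5, rtings: 0.5 },
};

// Known category/platform pair uses the configured weight.
const known = categoryWeights["coffee_makers"]?.["rtings"] ?? 1.0; // 0.5

// Unknown category (or platform) falls back to neutral 1.0,
// so unmapped sources are neither boosted nor punished.
const fallback = categoryWeights["laptops"]?.["amazon"] ?? 1.0; // 1.0
```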
### Step 3: Combining Scores
The final trust score combines the numerical rating with sentiment analysis:
```typescript
import { differenceInDays } from "date-fns";

function calculateTrustScore(sources: ReviewSource[], category: string): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const source of sources) {
    // Unmapped category/platform pairs get a neutral weight of 1.0.
    const catWeight = categoryWeights[category]?.[source.platform] ?? 1.0;
    const verWeight = source.verified ? 1.5 : 1.0;
    const volWeight = Math.min(Math.log10(source.reviewCount + 1) / 4, 1.0);
    const recWeight = Math.max(
      1 - differenceInDays(new Date(), source.recency) / 365,
      0.3
    );
    const weight = catWeight * verWeight * volWeight * recWeight;
    // Blend numerical rating (70%) with sentiment (30%)
    const normalizedSentiment = (source.sentimentScore + 1) * 2.5; // -1..1 -> 0..5
    const blendedScore = source.rating !== null
      ? source.rating * 0.7 + normalizedSentiment * 0.3
      : normalizedSentiment; // sentiment-only when a source has no star rating
    weightedSum += blendedScore * weight;
    totalWeight += weight;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}
```
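To make the arithmetic concrete, here's a self-contained sketch of the same blend. It uses day counts directly instead of `Date` objects, and the two sources are made up for illustration, not real data:

```typescript
interface SourceSample {
  platform: string;
  rating: number | null;   // normalized 0-5, null if the source has no stars
  reviewCount: number;
  sentimentScore: number;  // -1..1
  daysSinceCollection: number;
  verified: boolean;
  categoryWeight: number;  // from the category table, 1.0 if absent
}

function sketchTrustScore(sources: SourceSample[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const s of sources) {
    const verWeight = s.verified ? 1.5 : 1.0;
    const volWeight = Math.min(Math.log10(s.reviewCount + 1) / 4, 1.0);
    const recWeight = Math.max(1 - s.daysSinceCollection / 365, 0.3);
    const weight = s.categoryWeight * verWeight * volWeight * recWeight;
    const normalizedSentiment = (s.sentimentScore + 1) * 2.5;
    const blended = s.rating !== null
      ? s.rating * 0.7 + normalizedSentiment * 0.3
      : normalizedSentiment;
    weightedSum += blended * weight;
    totalWeight += weight;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

const score = sketchTrustScore([
  { platform: "amazon", rating: 4.8, reviewCount: 1200, sentimentScore: 0.6,
    daysSinceCollection: 90, verified: true, categoryWeight: 1.0 },
  { platform: "reddit", rating: null, reviewCount: 80, sentimentScore: -0.2,
    daysSinceCollection: 30, verified: false, categoryWeight: 1.3 },
]);
console.log(score.toFixed(2)); // 3.55 — Reddit's negative sentiment pulls
                               // the 4.8-star Amazon rating down sharply
```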
### Step 4: Confidence Level
A trust score without confidence is misleading. We calculate confidence based on source diversity and volume:
```typescript
function calculateConfidence(sources: ReviewSource[]): "high" | "medium" | "low" {
  const uniquePlatforms = new Set(sources.map(s => s.platform)).size;
  const totalReviews = sources.reduce((sum, s) => sum + s.reviewCount, 0);
  if (uniquePlatforms >= 4 && totalReviews >= 500) return "high";
  if (uniquePlatforms >= 2 && totalReviews >= 50) return "medium";
  return "low";
}
```
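A quick check of the thresholds, using stub sources with made-up counts:

```typescript
type Level = "high" | "medium" | "low";

// Same thresholds as above, on a minimal source shape.
function confidence(sources: { platform: string; reviewCount: number }[]): Level {
  const uniquePlatforms = new Set(sources.map(s => s.platform)).size;
  const totalReviews = sources.reduce((sum, s) => sum + s.reviewCount, 0);
  if (uniquePlatforms >= 4 && totalReviews >= 500) return "high";
  if (uniquePlatforms >= 2 && totalReviews >= 50) return "medium";
  return "low";
}

// Four platforms and plenty of volume -> high.
const high = confidence([
  { platform: "amazon", reviewCount: 400 },
  { platform: "reddit", reviewCount: 120 },
  { platform: "rtings", reviewCount: 1 },
  { platform: "youtube", reviewCount: 30 },
]);

// Two platforms, modest volume -> medium.
const medium = confidence([
  { platform: "amazon", reviewCount: 45 },
  { platform: "youtube", reviewCount: 10 },
]);

// A lone early source -> low, no matter the count.
const low = confidence([{ platform: "youtube", reviewCount: 3 }]);
```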
We display this alongside the score: "4.3/5 trust score (high confidence, 12 sources)" vs "4.1/5 trust score (low confidence, 2 sources)".
## Sentiment Analysis: Beyond Star Ratings
Star ratings miss nuance. A 4-star review might say "Great sound but terrible battery." We extract attribute-level sentiment:
```typescript
interface AttributeSentiment {
  attribute: string;      // "battery_life", "sound_quality", etc.
  sentiment: number;      // -1 to 1
  mentions: number;       // how many reviews mention this
  sampleQuotes: string[]; // representative quotes
}
```
This powers our comparison pages — instead of just showing "Product A: 4.3 vs Product B: 4.1", we can show:
- Sound quality: A wins (0.82 vs 0.71 sentiment)
- Battery life: B wins (0.65 vs 0.31 sentiment)
- Comfort: Tie (0.73 vs 0.70 sentiment)
This attribute-level breakdown is what makes comparison content genuinely useful.
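A minimal sketch of how those verdict lines could be derived from per-attribute sentiment — the 0.05 tie threshold is an illustrative choice, not a tuned value:

```typescript
interface AttrScore { attribute: string; a: number; b: number; }

// Declare a winner only when the sentiment gap exceeds a tie threshold,
// so near-identical scores read as "Tie" rather than a hollow win.
function verdicts(rows: AttrScore[], tieThreshold = 0.05): string[] {
  return rows.map(({ attribute, a, b }) => {
    if (Math.abs(a - b) <= tieThreshold) return `${attribute}: Tie`;
    return a > b ? `${attribute}: A wins` : `${attribute}: B wins`;
  });
}

const lines = verdicts([
  { attribute: "Sound quality", a: 0.82, b: 0.71 },
  { attribute: "Battery life", a: 0.31, b: 0.65 },
  { attribute: "Comfort", a: 0.73, b: 0.70 },
]);
// ["Sound quality: A wins", "Battery life: B wins", "Comfort: Tie"]
```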
## Handling Edge Cases
### Products with few reviews
New products may only have YouTube first-look reviews. We lower confidence but still generate a provisional score, clearly labeled.
### Conflicting sources
When Amazon says 4.8 but Reddit says 2.5, we don't just average — we flag the disagreement in the UI. Conflict itself is valuable information.
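One simple way to detect that kind of split — a sketch, and only one of several signals you could use — is the spread between the highest and lowest source ratings:

```typescript
// Flag when rated sources disagree by more than a full star on the
// 0-5 scale. The 1.0-star threshold is illustrative, not tuned.
function hasConflict(ratings: (number | null)[], threshold = 1.0): boolean {
  const rated = ratings.filter((r): r is number => r !== null);
  if (rated.length < 2) return false; // one source can't disagree with itself
  return Math.max(...rated) - Math.min(...rated) > threshold;
}

console.log(hasConflict([4.8, 2.5]));      // true  -> show a disagreement flag
console.log(hasConflict([4.4, 4.1, 4.6])); // false -> sources broadly agree
```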
### Review freshness
Product quality changes over time (firmware updates, manufacturing changes). We decay old reviews and re-collect quarterly for active comparison pages.
### Gaming detection
Sudden spikes in 5-star reviews with similar language patterns trigger a flag. We don't remove them, but we reduce their weight.
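As an illustration of the volume half of that signal only (language-pattern similarity is a separate check), a spike detector can compare today's 5-star count against a baseline mean plus a few standard deviations:

```typescript
// Flag a day whose 5-star count sits far above the recent baseline.
// k = 3 standard deviations is an illustrative cutoff.
function isSpike(baselineCounts: number[], todayCount: number, k = 3): boolean {
  const n = baselineCounts.length;
  const mean = baselineCounts.reduce((s, x) => s + x, 0) / n;
  const variance =
    baselineCounts.reduce((s, x) => s + (x - mean) ** 2, 0) / n;
  return todayCount > mean + k * Math.sqrt(variance);
}

console.log(isSpike([5, 6, 4, 5, 6, 5, 4], 30)); // true  -> down-weight, don't delete
console.log(isSpike([5, 6, 4, 5, 6, 5, 4], 7));  // false -> within normal noise
```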
## Results
After implementing multi-source trust scores across 10,000+ products:
- User engagement increased 40% on pages showing trust scores vs raw ratings
- Time on page increased 25% — users explore attribute breakdowns
- Affiliate CTR improved 15% — confident scores drive purchase decisions
- Trust score diverged from Amazon rating by >0.5 stars in 23% of products — these are the cases where aggregation adds the most value
The biggest insight: the products where our trust score differs most from Amazon's rating are the products where users find our comparisons most valuable. Disagreement between sources is where the signal lives.
## Try It
Every comparison page on aversusb.net shows trust scores with confidence levels. Compare any two products and you'll see attribute-level sentiment breakdowns drawn from real user reviews.
*Part 5 of our "Building SmartReview" series. Previous: Part 4: JSON-LD for Product Comparisons*