DEV Community

Daniel Rozin

Posted on • Originally published at aversusb.net

Building a Product Trust Score from 50+ Review Sources

How much should you trust a product's 4.5-star rating on Amazon?

Probably less than you think. Single-source ratings are noisy — influenced by incentivized reviews, review bombing, selection bias, and platform-specific quirks. A product can be 4.8 on Amazon and 3.2 on Reddit.

At SmartReview, we built a trust score that aggregates ratings from 50+ sources into a single, weighted number. Here's how it works and why multi-source aggregation produces more reliable product assessments.

The Problem with Single-Source Ratings

Every review platform has biases:

| Platform | Typical Bias | Why |
|---|---|---|
| Amazon | Skews high (4.0-4.5) | Incentivized reviews, seller pressure |
| Reddit | Skews negative | Self-selection: people post when frustrated |
| RTINGS | Neutral but narrow | Lab-tested, limited to measurable specs |
| YouTube | Varies by creator | Sponsorship influence, entertainment value |
| G2/Capterra | Skews high (4.0+) | Vendors incentivize reviews with gift cards |

No single source tells the whole story. A product with 4.8 stars on Amazon might have genuine quality issues that only surface in Reddit discussions or RTINGS lab tests.

Our Trust Score Architecture

The trust score is a weighted average across sources, where each source's weight reflects its reliability for the product category.

Step 1: Source Collection

We collect reviews and ratings from multiple source types:

interface ReviewSource {
  platform: string;          // amazon, reddit, rtings, youtube, etc.
  rating: number | null;     // normalized to 0-5 scale
  reviewCount: number;       // volume of reviews
  sentimentScore: number;    // NLP-derived from text, -1 to 1
  recency: Date;            // when reviews were collected
  verified: boolean;         // whether platform verifies purchases
}
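
The `rating` field assumes each platform's score has already been mapped onto the shared 0-5 scale. That step isn't shown here, so the following is only a minimal sketch, assuming a plain linear rescale. The helper name `normalizeRating` and the example scale bounds are hypothetical, not part of our pipeline as described.

```typescript
// Hypothetical helper: linearly rescale a platform score onto 0-5.
// The scale bounds passed in are assumptions about each platform.
function normalizeRating(value: number, scaleMin: number, scaleMax: number): number {
  if (scaleMax <= scaleMin) {
    throw new RangeError("scaleMax must be greater than scaleMin");
  }
  // Clamp first so out-of-range inputs cannot leave the 0-5 range.
  const clamped = Math.min(Math.max(value, scaleMin), scaleMax);
  return ((clamped - scaleMin) / (scaleMax - scaleMin)) * 5;
}

console.log(normalizeRating(8, 0, 10).toFixed(2));   // a score on a 0-10 scale
console.log(normalizeRating(90, 0, 100).toFixed(2)); // a percentage score
console.log(normalizeRating(4.8, 0, 5).toFixed(2));  // already on 0-5, passes through
```

Sources with no numeric score at all (a Reddit thread, for example) would keep `rating: null` and rely on `sentimentScore` alone.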

Step 2: Source Weighting

Not all sources are equal. We weight them based on four factors:

Verification weight — platforms that verify purchases get higher weight:

const verificationMultiplier = source.verified ? 1.5 : 1.0;

Volume weight — more reviews = more statistical confidence:

const volumeWeight = Math.min(Math.log10(source.reviewCount + 1) / 4, 1.0);
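
To get a feel for the curve, here is the same formula wrapped in a small function (the wrapper name `volumeWeight` is just for illustration) and evaluated at a few review counts. The weight grows by 0.25 for every tenfold increase in reviews and stops growing at 9,999 reviews.

```typescript
// Same formula as above, wrapped in a function so it can be called in a loop.
function volumeWeight(reviewCount: number): number {
  return Math.min(Math.log10(reviewCount + 1) / 4, 1.0);
}

for (const count of [0, 9, 99, 999, 9999, 250000]) {
  // Prints 0.00, 0.25, 0.50, 0.75, 1.00 and 1.00 for the counts above.
  console.log(count, volumeWeight(count).toFixed(2));
}
```

The logarithm is the point: going from 10 to 100 reviews matters as much as going from 1,000 to 10,000, and a source with zero reviews contributes nothing.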

Recency weight — recent reviews matter more (products change):

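import { differenceInDays } from "date-fns"; // assumption: the helper is date-fns' differenceInDays
const daysSinceCollection = differenceInDays(new Date(), source.recency);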
const recencyWeight = Math.max(1 - (daysSinceCollection / 365), 0.3);
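
A dependency-free sketch of the same decay, for anyone who wants to try the numbers. `daysBetween` is a rough stand-in for `differenceInDays` (it ignores time zones and daylight saving), and passing `now` as a parameter is an addition here so the result is reproducible.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Rough stand-in for differenceInDays: whole days between two instants.
function daysBetween(later: Date, earlier: Date): number {
  return Math.floor((later.getTime() - earlier.getTime()) / MS_PER_DAY);
}

// Same formula as above: linear decay over a year with a floor of 0.3.
function recencyWeight(collectedAt: Date, now: Date): number {
  return Math.max(1 - daysBetween(now, collectedAt) / 365, 0.3);
}

const now = new Date("2025-01-01T00:00:00Z");
console.log(recencyWeight(now, now).toFixed(2));                              // collected today
console.log(recencyWeight(new Date("2024-10-20T00:00:00Z"), now).toFixed(2)); // 73 days old
console.log(recencyWeight(new Date("2023-01-01T00:00:00Z"), now).toFixed(2)); // two years old
```

The weight falls linearly and reaches the 0.3 floor after 256 days, so stale sources are discounted but never silenced completely.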

Category-specific weight — RTINGS matters more for headphones than for coffee makers:

const categoryWeights: Record<string, Record<string, number>> = {
  headphones: { rtings: 1.8, amazon: 1.0, reddit: 1.3, youtube: 1.2 },
  coffee_makers: { amazon: 1.4, reddit: 1.2, youtube: 1.5, rtings: 0.5 },
  mattresses: { reddit: 1.5, sleepfoundation: 1.6, amazon: 0.8 },
};
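
Most category and platform pairs won't appear in a table like this, so the lookup needs a sensible default. The sketch below wraps the lookup in a hypothetical `categoryWeight` helper that falls back to a neutral 1.0, the same default the scoring function uses.

```typescript
const categoryWeights: Record<string, Record<string, number>> = {
  headphones: { rtings: 1.8, amazon: 1.0, reddit: 1.3, youtube: 1.2 },
  coffee_makers: { amazon: 1.4, reddit: 1.2, youtube: 1.5, rtings: 0.5 },
  mattresses: { reddit: 1.5, sleepfoundation: 1.6, amazon: 0.8 },
};

// Hypothetical wrapper: unknown categories or platforms get a neutral weight of 1.0.
function categoryWeight(category: string, platform: string): number {
  return categoryWeights[category]?.[platform] ?? 1.0;
}

console.log(categoryWeight("headphones", "rtings"));  // listed pair
console.log(categoryWeight("mattresses", "youtube")); // platform not listed for this category
console.log(categoryWeight("laptops", "amazon"));     // category not listed at all
```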

Step 3: Combining Scores

The final trust score combines the numerical rating with sentiment analysis:

function calculateTrustScore(sources: ReviewSource[], category: string): number {
  let weightedSum = 0;
  let totalWeight = 0;

  for (const source of sources) {
    const catWeight = categoryWeights[category]?.[source.platform] ?? 1.0;
    const verWeight = source.verified ? 1.5 : 1.0;
    const volWeight = Math.min(Math.log10(source.reviewCount + 1) / 4, 1.0);
    const recWeight = Math.max(
      1 - differenceInDays(new Date(), source.recency) / 365, 0.3
    );

    const weight = catWeight * verWeight * volWeight * recWeight;

    // Blend numerical rating (70%) with sentiment (30%)
    const normalizedSentiment = (source.sentimentScore + 1) * 2.5; // -1..1 -> 0..5
    const blendedScore = source.rating !== null
      ? source.rating * 0.7 + normalizedSentiment * 0.3
      : normalizedSentiment;

    weightedSum += blendedScore * weight;
    totalWeight += weight;
  }

  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}
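
To make the arithmetic concrete, here is a worked example with two made-up headphone sources. It mirrors the loop body above but is a sketch, not our production code: the sample data is invented, `wholeDaysBetween` is a stand-in for `differenceInDays`, and `now` is fixed so the numbers are reproducible.

```typescript
interface ReviewSource {
  platform: string;
  rating: number | null;
  reviewCount: number;
  sentimentScore: number;
  recency: Date;
  verified: boolean;
}

const headphoneWeights: Record<string, number> = {
  rtings: 1.8, amazon: 1.0, reddit: 1.3, youtube: 1.2,
};

// Rough stand-in for differenceInDays, so the sketch needs no dependencies.
function wholeDaysBetween(later: Date, earlier: Date): number {
  return Math.floor((later.getTime() - earlier.getTime()) / (24 * 60 * 60 * 1000));
}

// Weight and blended score for one source, mirroring the loop body above.
function scoreSource(source: ReviewSource, now: Date): { weight: number; blended: number } {
  const catWeight = headphoneWeights[source.platform] ?? 1.0;
  const verWeight = source.verified ? 1.5 : 1.0;
  const volWeight = Math.min(Math.log10(source.reviewCount + 1) / 4, 1.0);
  const recWeight = Math.max(1 - wholeDaysBetween(now, source.recency) / 365, 0.3);
  const sentiment = (source.sentimentScore + 1) * 2.5; // -1..1 -> 0..5
  const blended = source.rating !== null
    ? source.rating * 0.7 + sentiment * 0.3
    : sentiment;
  return { weight: catWeight * verWeight * volWeight * recWeight, blended };
}

const now = new Date("2025-01-01T00:00:00Z");
const sources: ReviewSource[] = [
  { platform: "amazon", rating: 4.8, reviewCount: 9999, sentimentScore: 0.6,
    recency: now, verified: true },
  { platform: "reddit", rating: null, reviewCount: 99, sentimentScore: -0.2,
    recency: new Date("2024-10-20T00:00:00Z"), verified: false },
];

let weightedSum = 0;
let totalWeight = 0;
for (const source of sources) {
  const { weight, blended } = scoreSource(source, now);
  console.log(source.platform, weight.toFixed(2), blended.toFixed(2));
  weightedSum += blended * weight;
  totalWeight += weight;
}
const trustScore = totalWeight > 0 ? weightedSum / totalWeight : 0;
console.log(trustScore.toFixed(2));
```

Amazon ends up with a weight of 1.50 and a blended score of 4.56. Reddit, with no star rating, fewer posts, and older data, gets a weight of 0.52 and a score of 2.00. The combined trust score is about 3.90: the large verified source dominates, but the Reddit signal still pulls it well below Amazon's 4.8.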

Step 4: Confidence Level

A trust score without confidence is misleading. We calculate confidence based on source diversity and volume:

function calculateConfidence(sources: ReviewSource[]): "high" | "medium" | "low" {
  const uniquePlatforms = new Set(sources.map(s => s.platform)).size;
  const totalReviews = sources.reduce((sum, s) => sum + s.reviewCount, 0);

  if (uniquePlatforms >= 4 && totalReviews >= 500) return "high";
  if (uniquePlatforms >= 2 && totalReviews >= 50) return "medium";
  return "low";
}

We display this alongside the score: "4.3/5 trust score (high confidence, 12 sources)" vs "4.1/5 trust score (low confidence, 2 sources)".
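
Putting the two together might look like the sketch below. `formatTrustLabel` and the trimmed-down `SourceSummary` type are hypothetical names used for illustration, and the review counts are invented to show a boundary case.

```typescript
type Confidence = "high" | "medium" | "low";

interface SourceSummary {
  platform: string;
  reviewCount: number;
}

// Same thresholds as calculateConfidence above.
function calculateConfidence(sources: SourceSummary[]): Confidence {
  const uniquePlatforms = new Set(sources.map(s => s.platform)).size;
  const totalReviews = sources.reduce((sum, s) => sum + s.reviewCount, 0);
  if (uniquePlatforms >= 4 && totalReviews >= 500) return "high";
  if (uniquePlatforms >= 2 && totalReviews >= 50) return "medium";
  return "low";
}

// Hypothetical formatter for the label shown next to the score.
function formatTrustLabel(score: number, confidence: Confidence, sourceCount: number): string {
  const noun = sourceCount === 1 ? "source" : "sources";
  return `${score.toFixed(1)}/5 trust score (${confidence} confidence, ${sourceCount} ${noun})`;
}

// Four platforms but only 499 reviews in total: one review short of "high".
const sources: SourceSummary[] = [
  { platform: "amazon", reviewCount: 420 },
  { platform: "reddit", reviewCount: 35 },
  { platform: "rtings", reviewCount: 1 },
  { platform: "youtube", reviewCount: 43 },
];

console.log(formatTrustLabel(4.3, calculateConfidence(sources), sources.length));
// prints "4.3/5 trust score (medium confidence, 4 sources)"
```

Both conditions must hold for a level, so broad coverage with thin volume still lands on "medium".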

Sentiment Analysis: Beyond Star Ratings

Star ratings miss nuance. A 4-star review might say "Great sound but terrible battery." We extract attribute-level sentiment:

interface AttributeSentiment {
  attribute: string;      // "battery_life", "sound_quality", etc.
  sentiment: number;      // -1 to 1
  mentions: number;       // how many reviews mention this
  sampleQuotes: string[]; // representative quotes
}

This powers our comparison pages — instead of just showing "Product A: 4.3 vs Product B: 4.1", we can show:

  • Sound quality: A wins (0.82 vs 0.71 sentiment)
  • Battery life: B wins (0.65 vs 0.31 sentiment)
  • Comfort: Tie (0.73 vs 0.70 sentiment)

This attribute-level breakdown is what makes comparison content genuinely useful.
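
A minimal sketch of how a per-attribute verdict like the ones above could be derived. The tie margin of 0.05 is an assumed value chosen so that a 0.03 gap reads as a tie, and the example treats the figures above as quoted winner-first. Neither detail is specified by our actual implementation here.

```typescript
type Verdict = "A" | "B" | "tie";

// Hypothetical margin: sentiment gaps smaller than this are reported as a tie.
const TIE_MARGIN = 0.05;

function compareSentiment(a: number, b: number, margin: number = TIE_MARGIN): Verdict {
  const diff = a - b;
  if (Math.abs(diff) < margin) return "tie";
  return diff > 0 ? "A" : "B";
}

console.log("Sound quality:", compareSentiment(0.82, 0.71)); // A
console.log("Battery life:", compareSentiment(0.31, 0.65));  // B
console.log("Comfort:", compareSentiment(0.73, 0.70));       // tie
```

A margin keeps the page from declaring a winner on differences that are smaller than the noise in the sentiment model.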

Handling Edge Cases

Products with few reviews

New products may only have YouTube first-look reviews. We lower confidence but still generate a provisional score, clearly labeled.

Conflicting sources

When Amazon says 4.8 but Reddit says 2.5, we don't just average — we flag the disagreement in the UI. Conflict itself is valuable information.
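
One simple way to detect that kind of disagreement is to look at the spread between the highest and lowest platform rating. This is only a sketch: the 1.5-star threshold and the names `findConflict` and `PlatformRating` are assumptions for illustration.

```typescript
interface PlatformRating {
  platform: string;
  rating: number; // already normalized to 0-5
}

interface Conflict {
  high: PlatformRating;
  low: PlatformRating;
  spread: number;
}

// Hypothetical threshold: a gap of 1.5 stars or more is flagged.
const CONFLICT_GAP = 1.5;

function findConflict(ratings: PlatformRating[], gap: number = CONFLICT_GAP): Conflict | null {
  const sorted = [...ratings].sort((a, b) => a.rating - b.rating);
  const low = sorted[0];
  const high = sorted[sorted.length - 1];
  // Fewer than two ratings means there is nothing to disagree with.
  if (low === undefined || high === undefined || sorted.length < 2) return null;
  const spread = high.rating - low.rating;
  return spread >= gap ? { high, low, spread } : null;
}

const conflict = findConflict([
  { platform: "amazon", rating: 4.8 },
  { platform: "rtings", rating: 4.1 },
  { platform: "reddit", rating: 2.5 },
]);

if (conflict !== null) {
  console.log(`${conflict.high.platform} and ${conflict.low.platform} disagree by ${conflict.spread.toFixed(1)} stars`);
}
```

Returning the two platforms at the extremes, not just a boolean, lets the UI say who disagrees with whom.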

Review freshness

Product quality changes over time (firmware updates, manufacturing changes). We decay old reviews and re-collect quarterly for active comparison pages.

Gaming detection

Sudden spikes in 5-star reviews with similar language patterns trigger a flag. We don't remove them, but we reduce their weight.
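
The "similar language" half of that check can be approximated with word-set overlap. The sketch below is a deliberately simple illustration, not our detection logic: it uses Jaccard similarity, ignores the timing of the spike, and the 0.6 threshold and 0.5 penalty are assumed values.

```typescript
interface Review {
  stars: number;
  text: string;
}

// Lowercased set of word-like tokens in a review.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9']+/g) ?? []);
}

// Jaccard similarity: shared words divided by total distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach(word => {
    if (b.has(word)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Hypothetical tuning values.
const SIMILARITY_THRESHOLD = 0.6;
const SUSPECT_WEIGHT = 0.5;

// Returns one weight per review: 5-star reviews that closely match another
// 5-star review are down-weighted, everything else keeps full weight.
function reviewWeights(reviews: Review[]): number[] {
  const prepared = reviews.map(r => ({ stars: r.stars, words: tokens(r.text) }));
  return prepared.map((current, i) => {
    if (current.stars !== 5) return 1;
    const hasNearDuplicate = prepared.some((other, j) =>
      j !== i && other.stars === 5 &&
      jaccard(current.words, other.words) >= SIMILARITY_THRESHOLD);
    return hasNearDuplicate ? SUSPECT_WEIGHT : 1;
  });
}

const weights = reviewWeights([
  { stars: 5, text: "Amazing product, best purchase ever, highly recommend" },
  { stars: 5, text: "Amazing product, best purchase ever, highly recommend it" },
  { stars: 5, text: "Solid bass and a comfortable fit after two weeks of commuting" },
  { stars: 2, text: "Battery died within a month" },
]);
console.log(weights); // [0.5, 0.5, 1, 1]
```

The pairwise comparison is quadratic in the number of reviews, which is fine for a sketch but would need bucketing or hashing at scale.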

Results

After implementing multi-source trust scores across 10,000+ products:

  • User engagement increased 40% on pages showing trust scores vs raw ratings
  • Time on page increased 25% — users explore attribute breakdowns
  • Affiliate CTR improved 15% — confident scores drive purchase decisions
  • Trust score diverged from Amazon rating by >0.5 stars in 23% of products — these are the cases where aggregation adds the most value

The biggest insight: the products where our trust score differs most from Amazon's rating are the products where users find our comparisons most valuable. Disagreement between sources is where the signal lives.

Try It

Every comparison page on aversusb.net shows trust scores with confidence levels. Compare any two products and you'll see attribute-level sentiment breakdowns drawn from real user reviews.


Part 5 of our "Building SmartReview" series. Previous: Part 4: JSON-LD for Product Comparisons
