How We Rebuilt 500+ Product Pages Using AI, Pipelines, and a Modular Content Backend

Diggi — Sun, 20 Apr 2025 20:25:52 +0000

At DigDep, our goal is to help people find supplements that actually work, judged not just by marketing claims but by scientific research and user-reported outcomes.

The catch? We had more than 30,000 product-condition combinations (e.g. Vitamin A for acne, Omega 3 for ADHD) and needed to generate trustworthy, dynamic, evolving pages without hiring a hundred content writers.

So we did what any backend-leaning team would do:

We built a pipeline-first, AI-assisted content system, structured around research data, user reviews, and intent-based modules.

🧱 Architecture Overview
We split the problem into three systems:

Content Orchestration Layer: a scheduled ETL engine (Airflow + custom workers) that:

Fetches new research data from PubMed, clinical trial APIs, and internal annotations

Pulls structured review data from reputable sellers

Normalizes supplement metadata (dosage, source, purity, etc.)
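As a rough illustration of the normalization step, here is a minimal sketch that converts free-form dosage strings into milligrams. The helper names (`normalize_dosage`, `normalize_record`), the field names, and the unit table are assumptions made for this example, not our production schema.

```python
import re
from typing import Any, Dict, Optional

# Conversion factors to milligrams (an assumed unit set, for illustration only).
_TO_MG = {"mcg": 0.001, "µg": 0.001, "ug": 0.001, "mg": 1.0, "g": 1000.0}

_DOSAGE_RE = re.compile(r"^\s*([\d.,]+)\s*([a-zA-Zµ]+)\s*$")


def normalize_dosage(raw: str) -> Optional[float]:
    """Parse strings like '500 mg' or '1,000mcg' into milligrams.

    Returns None when the value cannot be converted (for example 'IU'),
    so the caller can flag the record instead of guessing.
    """
    match = _DOSAGE_RE.match(raw)
    if not match:
        return None
    number, unit = match.groups()
    factor = _TO_MG.get(unit.lower())
    if factor is None:
        return None
    try:
        return float(number.replace(",", "")) * factor
    except ValueError:
        return None


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with a numeric dosage and a tidy source label."""
    out = dict(record)
    out["dosage_mg"] = normalize_dosage(str(record.get("dosage", "")))
    out["source"] = str(record.get("source", "")).strip().lower() or None
    return out
```

Returning `None` for unknown units keeps bad values out of downstream scoring; a real pipeline would probably route those records to a review queue.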

ML/NLP Layer: this is where the raw data gets its meaning:

Clinical research is chunked, embedded (SBERT), and summarized using a hybrid of GPT-4 + in-house fine-tuned classifiers

Reviews are clustered by condition + sentiment, scored, and tagged (e.g. “2-week results”, “used with zinc”)

FAQ candidates are extracted from natural language queries, Reddit, Quora, and DigDep’s internal search logs
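The chunking step that precedes embedding can be sketched as an overlapping word window. This is a simplified stand-in: the `chunk_words` name and the default sizes are assumptions, and a real setup might split on sentences or tokens instead.

```python
from typing import List


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into word windows of `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

The overlap means a finding that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.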

Headless CMS + API Delivery: the processed content lives in a GraphQL-accessible store (we use Strapi, heavily extended)

Each page is assembled dynamically on the frontend via metadata-driven composition: which sections to show, what order, how they’re prioritized

Content updates are non-destructive and versioned — users get fresh insights without pages losing their SEO/indexing
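One way to picture non-destructive, versioned updates is an append-only history per page section. The sketch below is illustrative only: `VersionedSection` is a hypothetical class, not part of Strapi or of our extensions to it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VersionedSection:
    """Append-only history for one page section; older versions are never overwritten."""

    versions: List[Dict[str, Any]] = field(default_factory=list)

    def publish(self, content: Dict[str, Any]) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.append(dict(content))
        return len(self.versions)

    def current(self) -> Optional[Dict[str, Any]]:
        """Return the newest version, or None if nothing has been published."""
        return self.versions[-1] if self.versions else None

    def rollback(self) -> Optional[Dict[str, Any]]:
        """Re-publish the previous version as the newest one, keeping the full history."""
        if len(self.versions) < 2:
            return self.current()
        self.versions.append(dict(self.versions[-2]))
        return self.current()
```

Because the page URL and section identity never change, only the newest version does, refreshed content does not have to disturb what search engines have already indexed.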

🧠 AI Where It Makes Sense
We were careful not to over-rely on LLMs. Here’s how we actually use them:

Summarization: Input = abstract + result + cohort size; Output = 2-sentence result with risk qualifiers

Semantic clustering: We embed every user review and map it into symptom categories and conditions (some users don’t say “acne” — they say “skin bumps”)

Question synthesis: LLMs turn query logs into human-readable FAQs, then we pass them through filters for duplication, bias, and hallucination
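To make the semantic clustering idea concrete, here is a small sketch that assigns a review embedding to the closest condition by cosine similarity. The two-dimensional vectors in the usage note are toy values; in practice the vectors would come from an embedding model. The function names and the `min_similarity` default are assumptions.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_condition(
    review_vec: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    min_similarity: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Return (condition, similarity); the condition is None when nothing is close enough."""
    best_label: Optional[str] = None
    best_score = -1.0
    for label, centroid in centroids.items():
        score = cosine(review_vec, centroid)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score
```

With toy centroids `{"acne": [1.0, 0.0], "joint pain": [0.0, 1.0]}`, a review vector of `[0.9, 0.1]` lands on "acne" even though the reviewer may only have written "skin bumps". Reviews that fall below the similarity floor stay unlabeled rather than being forced into a category.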

We built a confidence scoring layer to decide when to show or suppress LLM output. If the model’s not sure, it defers to rules or hides the result.
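The show, defer, or hide decision can be sketched as a small gate function. The 0.8 threshold and the names `Decision` and `gate` are assumptions for illustration; how the confidence score itself is computed is out of scope here.

```python
from enum import Enum


class Decision(Enum):
    SHOW = "show"          # render the LLM output
    FALLBACK = "fallback"  # defer to a rule-based result
    HIDE = "hide"          # render nothing for this slot


def gate(confidence: float, has_rule_fallback: bool, threshold: float = 0.8) -> Decision:
    """Decide what to do with one piece of LLM output given its confidence score."""
    if confidence >= threshold:
        return Decision.SHOW
    if has_rule_fallback:
        return Decision.FALLBACK
    return Decision.HIDE
```

Keeping the gate as a pure function makes it easy to unit test and to log alongside each rendered module.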

📦 How Pages Are Built
Each product page is made of composable modules, injected via API:

from the ML pipeline

from review tagging

from research weighting

generated dynamically

based on co-purchase graph

The backend controls what renders, and the frontend just assembles.

We also exposed a JSON manifest for each page so QA/devs can debug pipeline decisions without inspecting raw DB rows.
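A per-page manifest along these lines might be built as follows. This is a sketch under assumptions: the module names, the `priority`, `confidence`, and `min_confidence` fields, and the `build_manifest` helper are hypothetical and only show the shape of the idea.

```python
import json
from typing import Any, Dict, List


def build_manifest(page_id: str, modules: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the render order on the backend and record why each module was kept or dropped."""
    shown = [m for m in modules if m["confidence"] >= m.get("min_confidence", 0.0)]
    shown.sort(key=lambda m: m["priority"], reverse=True)
    return {
        "page_id": page_id,
        "render_order": [m["name"] for m in shown],
        "decisions": [
            {
                "module": m["name"],
                "source": m["source"],
                "confidence": m["confidence"],
                "rendered": m in shown,
            }
            for m in modules
        ],
    }


if __name__ == "__main__":
    # Hypothetical modules for one page; names and scores are made up.
    example = [
        {"name": "research_summary", "source": "ml_pipeline", "priority": 3,
         "confidence": 0.9, "min_confidence": 0.8},
        {"name": "faq", "source": "generated", "priority": 1,
         "confidence": 0.4, "min_confidence": 0.6},
    ]
    print(json.dumps(build_manifest("vitamin-a-acne", example), indent=2))
```

The frontend only needs `render_order`; the `decisions` list is there for QA, so a missing module can be traced to a low confidence score instead of a database query.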

📊 Feedback Loops
This system let us do things we couldn’t before:

Trigger model re-training when new research changes a supplement’s score

Use search and review logs to automatically discover emerging use-cases (e.g. berberine + PCOS suddenly rising)

Log anonymized click paths to see which modules drive trust, then tune the page structure accordingly
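Spotting an emerging use-case can be as simple as comparing how often a supplement and condition pair shows up in two time windows. The sketch below assumes the logs have already been reduced to `(supplement, condition)` tuples; the function name and both thresholds are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (supplement, condition)


def emerging_pairs(
    previous: Iterable[Pair],
    current: Iterable[Pair],
    min_count: int = 5,
    min_growth: float = 2.0,
) -> List[Tuple[Pair, float]]:
    """Return pairs whose frequency grew by at least `min_growth` between two windows."""
    prev_counts = Counter(previous)
    curr_counts = Counter(current)
    rising: List[Tuple[Pair, float]] = []
    for pair, count in curr_counts.items():
        if count < min_count:
            continue  # too rare to trust
        baseline = max(prev_counts.get(pair, 0), 1)  # avoid division by zero for new pairs
        growth = count / baseline
        if growth >= min_growth:
            rising.append((pair, growth))
    return sorted(rising, key=lambda item: item[1], reverse=True)
```

The `min_count` floor keeps one-off queries from triggering page generation; anything that passes both thresholds becomes a candidate for a new product-condition page.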

🚀 Results & Takeaways
We scaled to thousands of pages within 2 weeks without bottlenecks

Pages adapt over time as new data/reviews/research arrives

Everything is traceable, explainable, and testable — no “black box content”

If you’re building content at scale in a high-trust domain (health, legal, finance), structured pipelines + LLM-assisted augmentation is a sweet spot. It’s not sexy, but it’s robust.

💬 Curious how we handle edge cases (e.g. conflicting research, multi-supplement effects), cold-start products, or data validation? Drop a question below — always happy to nerd out.

Building DigDep.com: A Dev’s Quest to Open Source Supplement Science

Diggi — Fri, 11 Apr 2025 00:47:26 +0000

If you’ve ever searched for “best supplements for arthritis” or tried to decode ingredient lists on health blogs, you’ve probably landed on Examine.com or Healthline-style articles. They’re useful—but often limited by slow updates, paywalls, or one-size-fits-all summaries.

That’s exactly the problem I’m trying to solve with DigDep.com — a developer-led project to map supplement products directly to clinical research, using AI pipelines and transparent data logic.

🧪 From Ingredients to Research in One Click
Take this for example:
NOW Supplements, Glucosamine & Chondroitin with MSM – Joint Health & Comfort

On that page, you’ll find:

A list of relevant clinical trials on glucosamine, chondroitin, and MSM

Direct citations to PubMed and other research databases

A breakdown of which studies link the supplement to outcomes like reduced joint pain or improved mobility

User reviews, so you can contrast anecdotal experiences with peer-reviewed findings

It’s not just a product page — it’s a research navigator with structured science behind it.

🤖 The AI Behind It
I use a multi-model LLM pipeline to parse research papers, identify connections between ingredients and outcomes (like “arthritis relief”), and then validate those connections with human-like accuracy.

The Stack (Simplified):
Discovery: Lightweight open models scan abstracts for substance–outcome–dosage signals

Validation: GPT-4 or Claude reviews excerpts to eliminate false positives

Summary Matching: A final model cross-references the claim against the research excerpt
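The three stages above can be sketched as a chain of pluggable functions. The stage signatures and the keyword-based `toy_*` stand-ins are assumptions for illustration; in the real pipeline each stage would be a model call.

```python
from typing import Callable, Dict, Iterable, List

Claim = Dict[str, str]


def run_pipeline(
    abstracts: Iterable[str],
    discover: Callable[[str], List[Claim]],
    validate: Callable[[Claim, str], bool],
    matches: Callable[[Claim, str], bool],
) -> List[Claim]:
    """Keep only the claims that survive discovery, validation, and summary matching."""
    accepted: List[Claim] = []
    for abstract in abstracts:
        for claim in discover(abstract):
            if validate(claim, abstract) and matches(claim, abstract):
                accepted.append(claim)
    return accepted


# Keyword-based stand-ins so the sketch runs without any model.
def toy_discover(abstract: str) -> List[Claim]:
    text = abstract.lower()
    return [
        {"substance": s, "outcome": "joint pain"}
        for s in ("glucosamine", "chondroitin", "msm")
        if s in text and "joint pain" in text
    ]


def toy_validate(claim: Claim, abstract: str) -> bool:
    # Reject excerpts that report a null result.
    return "no significant" not in abstract.lower()


def toy_matches(claim: Claim, abstract: str) -> bool:
    text = abstract.lower()
    return claim["substance"] in text and claim["outcome"] in text
```

Ordering the stages from cheapest to most expensive means the larger models only see the excerpts the lightweight pass has already flagged.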

All this data is normalized across thousands of entries, so users can go from health goal → compound → product, or the other way around.
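Navigating in both directions comes down to keeping forward and reverse indexes in sync. Here is a minimal in-memory sketch; the `SupplementIndex` class and its method names are hypothetical, and the actual store is a database, not Python dicts.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class SupplementIndex:
    """Two-way lookup across health goal, compound, and product."""

    def __init__(self) -> None:
        self._compounds_by_goal: DefaultDict[str, Set[str]] = defaultdict(set)
        self._goals_by_compound: DefaultDict[str, Set[str]] = defaultdict(set)
        self._products_by_compound: DefaultDict[str, Set[str]] = defaultdict(set)
        self._compounds_by_product: DefaultDict[str, Set[str]] = defaultdict(set)

    def link_goal(self, goal: str, compound: str) -> None:
        self._compounds_by_goal[goal].add(compound)
        self._goals_by_compound[compound].add(goal)

    def link_product(self, product: str, compound: str) -> None:
        self._products_by_compound[compound].add(product)
        self._compounds_by_product[product].add(compound)

    def products_for_goal(self, goal: str) -> Set[str]:
        """Health goal -> compounds -> products."""
        found: Set[str] = set()
        for compound in self._compounds_by_goal.get(goal, set()):
            found |= self._products_by_compound.get(compound, set())
        return found

    def goals_for_product(self, product: str) -> Set[str]:
        """Product -> compounds -> health goals."""
        found: Set[str] = set()
        for compound in self._compounds_by_product.get(product, set()):
            found |= self._goals_by_compound.get(compound, set())
        return found
```

Lookups use `.get` so that querying an unknown goal or product does not create empty entries in the index.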

🧠 Why Not Just Use Examine?
Because Examine doesn’t link to actual products, and doesn’t let you filter for clinical evidence per product.
DigDep does.

Also:

Examine is paywalled; DigDep is free

Examine is slow to update; DigDep refreshes regularly via automation

Examine doesn’t map individual supplements to reviews and research; DigDep is built for it

And as developers, we can appreciate when a system is built modularly, using pipelines that evolve as the models get smarter.

🧱 It’s a Work in Progress, but Already Useful
So far, I’ve indexed:

20,000+ research papers

Hundreds of common health outcomes (e.g. arthritis, anxiety, weight loss, ADHD)

5,000+ supplements, matched by ingredients and dose

Each listing gets smarter as new research is added. The ultimate goal?
To make DigDep the most trusted and usable research-backed supplement directory out there.

💬 Try It and Tell Me What’s Missing
Here’s that example again:
NOW Glucosamine & Chondroitin – Arthritis Research & Reviews

If you're into LLM applications, health tech, or just curious about turning messy biomedical data into structured, navigable knowledge — I’d love feedback or ideas.

This is open-source in spirit (and maybe soon in code too). If you'd like to collaborate, critique, or just discuss model design — hit me up.

Thanks for reading 🙌

DEV Community: Diggi
