Most analytics pipelines measure what happened. I wanted to measure why it matters, using LLM-powered semantic enrichment to understand content quality, not just view counts.
Here's the architecture that makes it possible: a medallion-style YouTube analytics pipeline with AWS Bedrock for semantic intelligence.

A layered approach: EventBridge orchestration → AWS Glue processing → Bedrock semantic enrichment → Athena analytics
Key Design Decisions
1. **Medallion Architecture (Bronze → Silver → Gold)**
Bronze: Raw YouTube API snapshots (append-only historical record)
Silver: Cleaned, normalized data with growth metrics
Gold: Behavioral metrics + LLM-enriched semantic attributes
2. **Semantic Enrichment as a Separate Layer**
The critical choice: enrich in Gold, not Silver.
Why? Content attributes (educational depth, emotional tone, clickbait score) are static. View counts change daily. Enriching in Gold means:
Enrich once, not on every daily run
30x cost savings on Bedrock API calls
Can backfill semantic analysis without reprocessing historical data
3. Constrained LLM Outputs
Structured prompts that return JSON with bounded fields:
{
"educational_depth": 7,
"sensationalism": 3,
"emotional_tone": "positive",
"clickbait_coefficient": 2
}
This reduced parsing errors from 35% to under 5%.
What This Enables
Instead of asking "How many views did this video get?"
I can ask:
Do sensational videos grow faster but decay quicker?
Does educational depth predict long-term stability?
Is clickbait sustainable?
This is about measuring content quality vs. growth sustainability—not just counting clicks.
The Stack
Orchestration: EventBridge + Lambda
Storage: S3 (partitioned by date)
Processing: AWS Glue (PySpark)
LLM Enrichment: Amazon Bedrock (Claude)
Analytics: Athena + SQL
Read More
Want the full technical breakdown including:
Schema evolution failures and fixes
IAM debugging for Bedrock
Cost optimization strategies
What I'd do differently (dbt, feature stores)
📝 Full article: Medium
📝 The product thinking behind it: Mood Meets Media
*💻 GitHub: *Live code
Top comments (1)