The Problem with Rigid Schemas for Analytics
Video analytics data is inherently messy. One day you are tracking view counts by region, the next you need device breakdowns, then watch-time percentiles. Adding a column for every new metric leads to a table with 50 nullable columns and constant migrations. At DailyWatch, we solved this with PostgreSQL's JSONB column type.
Schema Design
We keep structured data in normal columns and flexible analytics in JSONB:
CREATE TABLE video_analytics (
id SERIAL PRIMARY KEY,
video_id TEXT NOT NULL REFERENCES videos(video_id),
captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
views INTEGER NOT NULL,
likes INTEGER NOT NULL,
metrics JSONB NOT NULL DEFAULT '{}'
);
The metrics column stores everything that might change shape over time:
{
"regions": {"US": 45000, "GB": 12000, "DE": 8500, "FR": 6200},
"devices": {"mobile": 0.62, "desktop": 0.31, "tablet": 0.07},
"watch_time": {"avg_seconds": 142, "p50": 98, "p95": 340},
"engagement": {"like_rate": 0.034, "comment_rate": 0.008},
"traffic_sources": {"search": 0.35, "suggested": 0.42, "external": 0.23}
}
Indexing JSONB for Fast Queries
A GIN index on the JSONB column lets PostgreSQL search inside the JSON efficiently:
-- General-purpose GIN index
CREATE INDEX idx_analytics_metrics ON video_analytics USING GIN(metrics);
-- Targeted index for a specific path (faster, smaller)
CREATE INDEX idx_analytics_regions ON video_analytics
USING GIN((metrics -> 'regions'));
The general GIN index supports containment (@>) and existence (?) operators. The path-specific index is smaller and faster if you know which keys you query most.
Querying JSONB Data
Here is how we extract analytics for the admin dashboard at DailyWatch:
Get regional view breakdown:
SELECT video_id,
metrics -> 'regions' ->> 'US' AS us_views,
metrics -> 'regions' ->> 'DE' AS de_views,
metrics -> 'watch_time' ->> 'avg_seconds' AS avg_watch
FROM video_analytics
WHERE captured_at > NOW() - INTERVAL '24 hours'
ORDER BY views DESC
LIMIT 20;
Find videos where mobile traffic exceeds 70%:
SELECT v.title, a.metrics -> 'devices' ->> 'mobile' AS mobile_share
FROM video_analytics a
JOIN videos v ON v.video_id = a.video_id
WHERE (a.metrics -> 'devices' ->> 'mobile')::float > 0.70
ORDER BY a.captured_at DESC;
Aggregate regional views across all videos:
SELECT
key AS region,
SUM(value::integer) AS total_views
FROM video_analytics,
jsonb_each_text(metrics -> 'regions')
WHERE captured_at > NOW() - INTERVAL '7 days'
GROUP BY key
ORDER BY total_views DESC;
The jsonb_each_text function unpacks a JSONB object into rows — incredibly useful for aggregation across dynamic keys.
Appending to JSONB Without Full Replacement
PostgreSQL's jsonb_set function updates specific paths without replacing the entire object:
-- Add a new traffic source
UPDATE video_analytics
SET metrics = jsonb_set(metrics, '{traffic_sources,direct}', '0.15')
WHERE video_id = 'abc123';
-- Merge new data into existing object
UPDATE video_analytics
SET metrics = metrics || '{"cdn_cost": 0.0023}'::jsonb
WHERE video_id = 'abc123';
The || operator merges objects. Existing keys are overwritten, new keys are added. This is how we incrementally enrich analytics records as new data arrives.
Building a Time-Series View
For dashboard charts, we need metrics over time:
SELECT
date_trunc('hour', captured_at) AS hour,
AVG(views) AS avg_views,
AVG((metrics -> 'watch_time' ->> 'avg_seconds')::float) AS avg_watch_time,
AVG((metrics -> 'devices' ->> 'mobile')::float) AS mobile_share
FROM video_analytics
WHERE video_id = 'abc123'
AND captured_at > NOW() - INTERVAL '7 days'
GROUP BY hour
ORDER BY hour;
This gives us hourly data points for charting view trends, watch time, and device mix — all from a single JSONB column.
When NOT to Use JSONB
JSONB is not a replacement for proper schema design. We use normal columns for data we always query (video_id, views, likes, captured_at) and JSONB for data that is optional, variable-shape, or infrequently filtered.
Bad use of JSONB: storing video_id or title inside JSON when you need to JOIN on them.
Good use of JSONB: storing analytics breakdowns that vary by source and evolve over time.
Performance Notes
On our dataset of ~50,000 analytics records with GIN indexes:
- Path extraction (
->>) queries: 1-3ms - Containment (
@>) queries: 2-5ms -
jsonb_each_textaggregation: 10-20ms - Full-table JSONB aggregation: 40-80ms
The flexibility of JSONB let us iterate on our analytics dashboard without a single migration after the initial table creation.
This article is part of the Building DailyWatch series. Check out DailyWatch to see these techniques in action.
Top comments (0)