Polliog

Posted on

TimescaleDB Outperforms MongoDB for JSON Logs (100M Document Benchmark)

When building Logtide, we needed to store millions of JSON log documents. The "obvious" choice seemed to be MongoDB — it's a document database designed for JSON, right?

We chose TimescaleDB instead. Here's why.

The "Boring Tech" Philosophy

We follow a simple rule: use proven, stable technology unless you have a specific reason not to.

TimescaleDB is PostgreSQL (battle-tested for 25+ years) with automatic time-series optimizations. It's reliable, well-documented, and deploys anywhere PostgreSQL runs.

Adding MongoDB would mean:

  • Another database to manage
  • Another skill set for the team
  • Another backup strategy
  • Another monitoring system

We wanted to keep our stack simple. TimescaleDB. That's it.

But we needed proof it could handle our workload.

The Benchmark Setup

I loaded 100 million JSON log documents into both databases to compare:

  • Write throughput (logs/second)
  • Query performance (complex filters)
  • Storage efficiency (compression)
  • Resource usage (RAM, CPU)

Test Data Schema

Real production log structure:

{
  "timestamp": "2026-02-16T10:30:45.123Z",
  "level": "error",
  "service": "api-gateway",
  "message": "Database connection timeout",
  "metadata": {
    "user_id": "usr_abc123",
    "request_id": "req_xyz789",
    "duration_ms": 5234,
    "endpoint": "/api/v1/users",
    "status_code": 500
  },
  "tags": ["production", "critical"],
  "context": {
    "environment": "prod-eu-west-1",
    "version": "v2.3.1"
  }
}

TimescaleDB Schema (PostgreSQL Extension)

-- Create hypertable (automatic time-based partitioning)
CREATE TABLE logs (
  id BIGSERIAL,
  time TIMESTAMPTZ NOT NULL,
  level TEXT NOT NULL,
  service TEXT NOT NULL,
  message TEXT,
  metadata JSONB,  -- JSON documents
  tags TEXT[],
  context JSONB,
  project_id UUID NOT NULL
);

-- Convert to hypertable (TimescaleDB magic)
SELECT create_hypertable('logs', 'time', chunk_time_interval => INTERVAL '1 day');

-- GIN indexes for JSONB queries
CREATE INDEX idx_logs_metadata_gin ON logs USING GIN (metadata);
CREATE INDEX idx_logs_context_gin ON logs USING GIN (context);

-- Composite index for common filters
CREATE INDEX idx_logs_project_service_time 
  ON logs (project_id, service, time DESC);

-- Automatic compression (90% space reduction)
ALTER TABLE logs SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'project_id, service',
  timescaledb.compress_orderby = 'time DESC'
);

-- Compress data older than 1 day
SELECT add_compression_policy('logs', INTERVAL '1 day');

Key differences from plain PostgreSQL:

  • Hypertables: Automatic partitioning by time (daily chunks)
  • Compression: Columnar compression (like Parquet) on old data
  • Chunk management: Drop old chunks instantly (vs slow DELETE queries)

MongoDB Schema

db.createCollection("logs", {
  validator: {
    $jsonSchema: {
      required: ["timestamp", "level", "service"],
      properties: {
        timestamp: { bsonType: "date" },
        level: { bsonType: "string" },
        service: { bsonType: "string" },
        message: { bsonType: "string" },
        metadata: { bsonType: "object" },
        tags: { bsonType: "array" },
        context: { bsonType: "object" }
      }
    }
  }
});

// Indexes
db.logs.createIndex({ "timestamp": -1 });
db.logs.createIndex({ "service": 1, "timestamp": -1 });
db.logs.createIndex({ "metadata.user_id": 1 });

Benchmark Results

1. Write Throughput

Test: Insert 100M logs as fast as possible, batched (1000 logs per batch).

| Database | Logs/Second | Time to 100M |
| --- | --- | --- |
| TimescaleDB | 18,500 | 1h 30m |
| MongoDB | 14,200 | 1h 57m |

TimescaleDB wins by 30% on writes.

Why? Hypertable chunking. TimescaleDB automatically partitions data into daily chunks, so inserts hit small, manageable tables instead of one giant table. MongoDB's WiredTiger engine is fast, but TimescaleDB's combination of PostgreSQL's COPY protocol for bulk inserts and its chunking strategy is hard to beat.

Example: Inserting logs for February 16, 2026 only touches the _hyper_1_47_chunk chunk, not the entire 100M-row table.
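The batching side of the write test is simple to picture. Here is a minimal sketch of the 1000-per-batch loop (my own illustration, not the actual benchmark code — the generator and names are hypothetical):

```python
from itertools import islice

def batched(iterable, size=1000):
    """Yield lists of up to `size` items; each list maps to one insert batch."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# In production each batch would go to the driver, e.g. psycopg2's
# execute_values()/COPY for TimescaleDB or insert_many() for MongoDB.
# Here we only demonstrate the batching itself.
logs = ({"level": "info", "message": f"event {i}"} for i in range(2500))
sizes = [len(b) for b in batched(logs, size=1000)]
print(sizes)  # → [1000, 1000, 500]
```

The batch size of 1000 is a throughput/latency trade-off: bigger batches amortize round-trips, smaller ones bound memory and retry cost.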

2. Query Performance

Test 1: Simple filter (last 24 hours, service = "api-gateway")

-- TimescaleDB
SELECT * FROM logs 
WHERE time > NOW() - INTERVAL '24 hours'
  AND service = 'api-gateway'
ORDER BY time DESC 
LIMIT 100;
// MongoDB
db.logs.find({
  timestamp: { $gte: new Date(Date.now() - 86400000) },
  service: "api-gateway"
}).sort({ timestamp: -1 }).limit(100);
| Database | P50 Latency | P95 Latency | P99 Latency |
| --- | --- | --- | --- |
| TimescaleDB | 42ms | 68ms | 95ms |
| MongoDB | 58ms | 103ms | 187ms |

TimescaleDB's chunk exclusion is the secret. The query planner knows "last 24 hours" = only scan yesterday's and today's chunks (2 chunks), not all 100 chunks. MongoDB scans indexes across the entire collection.
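Chunk exclusion is easy to reason about: with daily chunks, a time predicate maps to a small, computable set of chunks. A rough mental model (my own illustration, not TimescaleDB internals):

```python
from datetime import datetime, timedelta, timezone

def chunks_touched(query_start, query_end, chunk_interval=timedelta(days=1)):
    """How many daily chunks a [start, end] time predicate can touch."""
    # Align both ends of the window to chunk boundaries, then count
    # how many whole chunk intervals the window overlaps.
    seconds = chunk_interval.total_seconds()
    start_chunk = query_start.timestamp() // seconds
    end_chunk = query_end.timestamp() // seconds
    return int(end_chunk - start_chunk) + 1

now = datetime(2026, 2, 16, 10, 30, tzinfo=timezone.utc)
print(chunks_touched(now - timedelta(hours=24), now))  # → 2 (yesterday + today)
```

The planner prunes every chunk outside that window before touching any index, which is why the time filter is so cheap.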

Test 2: Complex JSONB query (nested field filter)

-- TimescaleDB
SELECT * FROM logs 
WHERE metadata->>'user_id' = 'usr_abc123'
  AND (metadata->>'duration_ms')::int > 5000
  AND level = 'error'
ORDER BY time DESC 
LIMIT 100;
// MongoDB
db.logs.find({
  "metadata.user_id": "usr_abc123",
  "metadata.duration_ms": { $gt: 5000 },
  level: "error"
}).sort({ timestamp: -1 }).limit(100);
| Database | P50 Latency | P95 Latency | P99 Latency |
| --- | --- | --- | --- |
| TimescaleDB | 78ms | 142ms | 223ms |
| MongoDB | 94ms | 198ms | 341ms |

PostgreSQL's GIN indexes on JSONB (inherited by TimescaleDB) are very fast here. One caveat: a default GIN index accelerates containment queries like metadata @> '{"user_id": "usr_abc123"}' rather than ->> extraction with a cast, so the containment form is usually the one to benchmark. MongoDB's multikey indexes are good, but the PostgreSQL query planner optimizes JSONB queries aggressively.

Test 3: Aggregation (count errors per service, last 7 days)

-- TimescaleDB
SELECT service, COUNT(*) 
FROM logs 
WHERE time > NOW() - INTERVAL '7 days'
  AND level = 'error'
GROUP BY service 
ORDER BY COUNT(*) DESC;
// MongoDB
db.logs.aggregate([
  { $match: { 
    timestamp: { $gte: new Date(Date.now() - 604800000) },
    level: "error" 
  }},
  { $group: { _id: "$service", count: { $sum: 1 } }},
  { $sort: { count: -1 }}
]);
| Database | Query Time |
| --- | --- |
| TimescaleDB | 1.2s |
| MongoDB | 2.8s |

TimescaleDB leverages PostgreSQL's parallel query execution (max_parallel_workers_per_gather=4) + chunk exclusion. Only 7 chunks scanned instead of 100.
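The aggregation itself is the classic group-and-count pattern. For intuition, here is an illustrative Python equivalent of what both queries compute (not the benchmark code):

```python
from collections import Counter

def errors_per_service(logs):
    """Count error-level logs per service, most frequent first —
    the same result as the GROUP BY / $group queries above."""
    counts = Counter(log["service"] for log in logs if log["level"] == "error")
    return counts.most_common()

logs = [
    {"service": "api-gateway", "level": "error"},
    {"service": "api-gateway", "level": "error"},
    {"service": "billing", "level": "error"},
    {"service": "billing", "level": "info"},
]
print(errors_per_service(logs))  # → [('api-gateway', 2), ('billing', 1)]
```

The databases differ not in this logic but in how much data they must touch to run it — chunk exclusion means TimescaleDB streams far fewer rows into the count.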

3. Storage Efficiency

Test: 100M logs stored, with compression enabled.

| Database | Raw Size | Compressed Size | Compression Ratio |
| --- | --- | --- | --- |
| TimescaleDB | 150 GB | 15 GB | 10x |
| MongoDB (WiredTiger) | 148 GB | 42 GB | 3.5x |

TimescaleDB compression is the killer feature. It uses columnar compression on hypertable chunks, similar to Parquet files.

When you compress a chunk, TimescaleDB:

  1. Converts row-based storage → column-based
  2. Applies compression algorithms per column (different algos for text, numbers, JSONB)
  3. Stores compressed chunk separately
  4. Query planner transparently decompresses on-the-fly

MongoDB's WiredTiger compression is good (snappy/zlib), but it compresses row-oriented blocks. TimescaleDB's columnar approach, plus TOAST (The Oversized-Attribute Storage Technique) for large JSONB values, is hard to match.
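The row-to-column conversion in step 1 above is simple to picture — a toy transposition (my sketch, not TimescaleDB's actual storage format):

```python
def to_columnar(rows):
    """Transpose row dicts into per-column lists. Columns compress far
    better because similar values (repeated service names, monotonically
    increasing timestamps) end up adjacent to each other."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

rows = [
    {"service": "api-gateway", "status": 500},
    {"service": "api-gateway", "status": 200},
]
print(to_columnar(rows))
# → {'service': ['api-gateway', 'api-gateway'], 'status': [500, 200]}
```

Once values of one type sit together, the engine can pick a per-column algorithm (run-length, delta, dictionary), which is where the 10x ratio comes from.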

Our compression policy:

SELECT add_compression_policy('logs', INTERVAL '1 day');

Logs older than 1 day are automatically compressed. Query performance on compressed data? Still fast (50-150ms for most queries) because the query planner only decompresses relevant chunks.

4. Resource Usage

Test: 100M logs, 10 concurrent query clients.

| Database | RAM Usage | CPU Usage (avg) |
| --- | --- | --- |
| TimescaleDB | 8 GB | 45% |
| MongoDB | 12 GB | 62% |

TimescaleDB's shared_buffers (2 GB) + OS page cache made it more memory-efficient. MongoDB's WiredTiger cache was set to 6 GB, but it still used more overall RAM.

Chunk-based architecture helps: Only active chunks (recent logs) stay hot in memory. Old compressed chunks are rarely accessed.

Where MongoDB Wins

I'm not saying TimescaleDB is always better. MongoDB has real advantages:

1. Horizontal Scaling (Sharding)

MongoDB's sharding is production-ready out of the box. TimescaleDB has multi-node (distributed hypertables), but it's enterprise-only and more complex than MongoDB's sharding.

If you need to scale beyond a single node (100M+ logs/day), MongoDB's architecture is designed for this. TimescaleDB single-node can handle a lot (we're doing 500k/day on 8GB RAM), but MongoDB wins for massive scale.

2. Schema Flexibility

MongoDB truly doesn't care about schema. TimescaleDB (PostgreSQL) requires a table definition. If your JSON structure is wildly unpredictable, MongoDB might be easier.

That said, JSONB columns give you schema flexibility where you need it. We define fixed columns (time, level, service) and dump everything else into metadata JSONB.
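Our ingestion path reflects that split: known fields go to typed columns, everything else goes to the metadata JSONB column. A minimal sketch (the function name is hypothetical; the field names follow the log schema above):

```python
# Fields that map to fixed, typed columns in the logs table.
FIXED_FIELDS = {"timestamp", "level", "service", "message", "tags"}

def split_log(doc):
    """Route known fields to typed columns, everything else to JSONB."""
    fixed = {k: v for k, v in doc.items() if k in FIXED_FIELDS}
    metadata = {k: v for k, v in doc.items() if k not in FIXED_FIELDS}
    return fixed, metadata

doc = {"timestamp": "2026-02-16T10:30:45Z", "level": "error",
       "service": "api-gateway", "trace_id": "tr_1", "region": "eu-west-1"}
fixed, metadata = split_log(doc)
print(metadata)  # → {'trace_id': 'tr_1', 'region': 'eu-west-1'}
```

This keeps the hot filter columns (time, level, service) cheap to index while unpredictable fields stay schemaless.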

3. Change Streams

MongoDB's change streams (real-time notifications) are built-in. TimescaleDB uses PostgreSQL's LISTEN/NOTIFY, which works but is more limited.

However, TimescaleDB has continuous aggregates — pre-computed materialized views that refresh automatically. For dashboards, this is often better than change streams.

Where TimescaleDB Wins

1. Time-Series Superpowers

Hypertables, compression, continuous aggregates, retention policies — all automatic.

Example: Delete logs older than 90 days:

SELECT add_retention_policy('logs', INTERVAL '90 days');

TimescaleDB drops entire chunks (instant). MongoDB needs to run DELETE queries (slow, index updates required).

Continuous aggregates:

CREATE MATERIALIZED VIEW logs_hourly_stats
WITH (timescaledb.continuous) AS
SELECT 
  time_bucket('1 hour', time) AS bucket,
  service,
  level,
  COUNT(*) as count
FROM logs
GROUP BY bucket, service, level;

Dashboard queries hit the pre-computed aggregate (10-50x faster).
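time_bucket('1 hour', time) simply truncates each timestamp to its bucket boundary. A Python equivalent for intuition (illustrative only):

```python
from datetime import datetime, timezone

def time_bucket_hour(ts):
    """Truncate a timestamp to its hour, like time_bucket('1 hour', time)."""
    return ts.replace(minute=0, second=0, microsecond=0)

ts = datetime(2026, 2, 16, 10, 30, 45, tzinfo=timezone.utc)
print(time_bucket_hour(ts).isoformat())  # → 2026-02-16T10:00:00+00:00
```

All rows sharing a bucket collapse into one pre-aggregated row, so a dashboard query over a week reads ~168 hourly rows per (service, level) instead of millions of raw logs.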

2. Query Flexibility

SQL is more powerful than MongoDB's query language. Complex joins, window functions, CTEs, lateral joins — TimescaleDB can do it all.

Example: "Show me errors with their previous 5 logs for context":

WITH context_logs AS (
  SELECT *, 
    LAG(message, 1) OVER (PARTITION BY service ORDER BY time) AS prev_1,
    LAG(message, 2) OVER (PARTITION BY service ORDER BY time) AS prev_2
  FROM logs
  WHERE time > NOW() - INTERVAL '1 day'
)
SELECT * FROM context_logs WHERE level = 'error';

Window functions like this are possible in MongoDB ($setWindowFields, added in 5.0), but the SQL version is far more direct.
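For intuition, LAG over a partition just means "pair each row with the previous values in its group, in time order". A toy Python equivalent of the query above (illustrative names, not production code):

```python
from collections import defaultdict

def with_prev_messages(logs, depth=2):
    """Attach the previous `depth` messages per service, mimicking
    LAG(message, n) OVER (PARTITION BY service ORDER BY time)."""
    history = defaultdict(list)
    out = []
    for log in sorted(logs, key=lambda l: l["time"]):
        prev = history[log["service"]]
        enriched = dict(log)
        for n in range(1, depth + 1):
            enriched[f"prev_{n}"] = prev[-n] if len(prev) >= n else None
        history[log["service"]].append(log["message"])
        out.append(enriched)
    return out

logs = [
    {"time": 1, "service": "api", "level": "info", "message": "start"},
    {"time": 2, "service": "api", "level": "info", "message": "query"},
    {"time": 3, "service": "api", "level": "error", "message": "timeout"},
]
errors = [l for l in with_prev_messages(logs) if l["level"] == "error"]
print(errors[0]["prev_1"], errors[0]["prev_2"])  # → query start
```

The database does this in one pass over sorted chunks, so the SQL version stays fast even on large time ranges.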

3. ACID Guarantees

TimescaleDB inherits PostgreSQL's full ACID transactions. MongoDB improved this, but PostgreSQL's transaction handling is rock-solid.

4. Operational Simplicity

TimescaleDB is PostgreSQL. One extension, one database.

We use TimescaleDB for:

  • User data (relational tables)
  • Log data (hypertables + JSONB)
  • Session storage (regular tables)
  • Job queues (pg_cron, graphile-worker)
  • Metrics (continuous aggregates)

Everything in one database. One backup strategy. One monitoring system.

5. Cost (Cloud Hosting)

Managed PostgreSQL (RDS, Cloud SQL, Azure Database) or TimescaleDB Cloud is significantly cheaper than MongoDB Atlas at scale.

For 500 GB storage + 8 GB RAM:

  • AWS RDS PostgreSQL: ~$150/month
  • MongoDB Atlas M30: ~$500/month

Real-World Production Numbers (Logtide)

Our current deployment (TimescaleDB):

  • 30 million logs stored (growing)
  • 500k logs/day ingestion rate
  • P95 query latency: 50ms
  • Storage: 15 GB (compressed from 150 GB, 10x ratio)
  • Server: 4 vCPU, 8 GB RAM (single node)
  • Retention: 90 days automatic (drop_chunks policy)
  • Compression: Daily (1 day old chunks auto-compress)

TimescaleDB handles this effortlessly. We haven't needed to think about sharding, scaling, or performance tuning beyond the initial setup.

The Decision Matrix

Choose MongoDB if:

  • You need horizontal sharding NOW (multi-region, 100M+ docs/day)
  • Schema is truly unpredictable (no structure at all)
  • You're already a MongoDB shop with expertise

Choose TimescaleDB if:

  • You're storing time-series data (logs, metrics, events)
  • You want operational simplicity (PostgreSQL ecosystem)
  • You need complex queries (SQL > MongoDB query language)
  • Storage cost matters (10x compression vs 3.5x)
  • You value data consistency (ACID transactions)
  • You want automatic retention/compression (set-and-forget policies)

Conclusion

TimescaleDB (PostgreSQL + time-series superpowers) beats MongoDB for JSON logs.

We chose it for Logtide because:

  1. Time-series native (hypertables, compression, retention)
  2. Query power (SQL + JSONB operators)
  3. Operational simplicity (one database, PostgreSQL ecosystem)
  4. Storage efficiency (10x compression ratio)
  5. Cost (managed hosting is cheaper)

MongoDB is great for general-purpose document storage, but TimescaleDB is purpose-built for time-series JSON.

If your data has timestamps and you're querying by time ranges, TimescaleDB is the obvious choice.


Try it yourself:

# Logtide (TimescaleDB + JSONB)
git clone https://github.com/logtide-dev/logtide
cd logtide/docker
docker compose up -d

TimescaleDB is automatically configured in the Docker setup. Check docker-compose.yml for the PostgreSQL + TimescaleDB image configuration.

Full benchmark scripts: github.com/logtide-dev/logtide/tree/main/benchmarks


What's your experience with TimescaleDB vs MongoDB? I'd love to hear your benchmarks in the comments.

Top comments (4)

Zizaco

I really enjoy your posts, but this one missed the mark. I found several issues. Here are the most important ones:

1) The Benchmark Setup. If you're storing logs in MongoDB you should start by creating a Clustered Collection, not a standard collection. That alone will crush PostgreSQL / TimescaleDB in every aspect you presented.

2) In your benchmark, MongoDB is enforcing the schema with validator: { $jsonSchema: ... }. Although you stated "MongoDB truly doesn't care about schema," your example contradicts this, debunking the misconception that MongoDB lacks schemas.

Additionally, if throughput is the main concern, schema validation within the database is usually not advisable. That's why, for example, Kafka employs clients to validate schemas before writing, rather than relying on the broker to do so. Moving the $jsonSchema validation to the client would further improve MongoDB performance.

3)

"Time-Series Superpowers: Hypertables, compression, continuous aggregates, retention policies — all automatic. Example: Delete logs older than 90 days"

MongoDB also has Clustered Collections and Time Series Collections. Both have built-in TTL (expireAfterSeconds field)
MongoDB also has aggregations, window functions, materialized views, lucene search, etc.

4)

"ACID Guarantees TimescaleDB inherits PostgreSQL's full ACID transactions."

MongoDB has provided ACID guarantees for almost 10 years now. Also, MongoDB ACID transactions work seamlessly in sharded/distributed setups natively. In contrast, PostgreSQL loses ACID properties once data is distributed, which also means you need to rely on third-party solutions and/or juggle multiple connections.

5)

"Operational Simplicity: TimescaleDB is PostgreSQL. One extension, one database."

Using a third-party extension instead of a native feature, along with a fragmented ecosystem for horizontal scaling, does not align with my definition of operational simplicity.

6)

"Choose TimescaleDB if:
You need complex queries (SQL > MongoDB query language)"

MongoDB has made significant progress in handling SQL-like queries. It even has a way to run SQL queries that translate them into MongoDB's query language. Since 2017, I haven't seen a SQL query that cannot be expressed in MongoDB's query language in a similar or simpler way.

7)

"For 500 GB storage + 8 GB RAM:

AWS RDS PostgreSQL: ~$150/month
MongoDB Atlas M30: ~$500/month"

You're comparing 3x MongoDB nodes (99.995% uptime SLA)
vs a single AZ PostgreSQL instance (99.5% uptime SLA)

Not even AWS Aurora PostgreSQL would be as highly available as MongoDB, and it would also be more expensive.

MongoDB is also free, and you can set up a single node if you're comfortable with sacrificing some uptime (which I think is the case, given you're using PostgreSQL).

8)

"MongoDB is great for general-purpose document storage, but TimescaleDB is purpose-built for time-series JSON."

This is objectively false. There's a reason why TimescaleDB and InfluxDB do not update their 10-year-old "vs MongoDB" benchmarks. MongoDB has closed the gap over the years and now outperforms them significantly in time series workloads.


If you require high availability (99.995% uptime), reliable long-term storage, future-proofing scalability with sharding and live-resharding, and cost savings, MongoDB would be a better choice for the very use case you presented, Logtide.

Polliog

Thanks for the detailed feedback, I really appreciate you taking the time. You're right on several fronts, and I want to address them honestly.

On the benchmark setup (Clustered Collections): Fair point. I should have used Clustered Collections for a better comparison with TimescaleDB hypertables, since both are optimized for time-series workloads with similar indexing strategies. I'll publish a follow-up article with this correction; the other corrections (time series, ACID, etc.) will be covered there too.

On schema validation: You caught a contradiction in my writing. I used $jsonSchema in the benchmark but then claimed MongoDB "doesn't care about schema" — as you said, that's inconsistent. For what it's worth, we keep validation server-side because we accept logs from multiple SDKs (Python, PHP, Kotlin) and want a single source of truth, but that's a design choice, not a MongoDB limitation.

But I still believe TimescaleDB fits our use case better for these reasons:

  • Our team already runs PostgreSQL for application data. Using TimescaleDB means one database to operate, one backup strategy, one set of skills.
  • We're processing around 5M logs/day (as I write this) on a single 8 GB node. At this scale, both databases perform excellently.

TimescaleDB's auto-refreshing materialized views (continuous aggregates) are a killer feature for our dashboard queries. MongoDB has materialized views too, but they refresh on demand, while TimescaleDB's implementation is deeply integrated with the chunk architecture — so this remains a design limitation on MongoDB's side for now.

I also want to point out a new feature that aligns with your feedback (currently experimental, but already available in our latest stable version): we are developing @logtide/reservoir, a pluggable storage layer that supports multiple databases (ClickHouse and TimescaleDB at the moment, and I will certainly consider MongoDB).

The idea is: pick the storage engine that matches your scale and existing infrastructure.

Thanks again for keeping me honest, this is the kind of feedback that makes technical writing better.

Zizaco

Thank you for the response! I appreciate the intellectual honesty.

"Our team already runs PostgreSQL for application data. Using TimescaleDB means one database to operate, one backup strategy, one set of skills. We're processing around 5m logs/day (when i'm writing this) on a single 8GB node. At this scale, both databases perform excellently"

This is a good reason to go with TimescaleDB. Thank you for the honesty and for sharing some interesting insights (such as the auto-refreshing materialized views).

Looking forward to your next article! : )

Zizaco

Also, the benchmark script link is broken