<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Team Tiger Data</title>
    <description>The latest articles on DEV Community by Team Tiger Data (@tigerdata_dev).</description>
    <link>https://dev.to/tigerdata_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2547418%2F1859b44f-d7f7-47c9-9ca2-082bae60b949.png</url>
      <title>DEV Community: Team Tiger Data</title>
      <link>https://dev.to/tigerdata_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tigerdata_dev"/>
    <language>en</language>
    <item>
      <title>How TimescaleDB Outperforms ClickHouse and MongoDB for LogTide's Observability Platform</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Wed, 15 Apr 2026 12:24:18 +0000</pubDate>
      <link>https://dev.to/tigerdata_dev/how-timescaledb-outperforms-clickhouse-and-mongodb-for-logtides-observability-platform-29gl</link>
      <guid>https://dev.to/tigerdata_dev/how-timescaledb-outperforms-clickhouse-and-mongodb-for-logtides-observability-platform-29gl</guid>
      <description>&lt;p&gt;Giuseppe “Polliog” Pollio started writing code for LogTide in September 2025. By early 2026, the platform was handling five million logs per day for alpha users, compressing 220GB of production data down to 25GB.&lt;/p&gt;

&lt;h2&gt;LogTide&lt;/h2&gt;

&lt;p&gt;Most log management tools are built for enterprises. Datadog and Splunk pricing far exceeds a small operation's budget, and for developers running a self-hosted stack, there is no clear alternative for affordable log observability.&lt;/p&gt;

&lt;p&gt;LogTide addresses this gap as an open-source log management and SIEM platform built specifically for teams who need serious observability without serious hardware. Sigma rule-based detection, structured log search, alerting, and notifications (the same capabilities that make Datadog and Splunk useful) run in two gigabytes of RAM with LogTide.&lt;/p&gt;

&lt;p&gt;"That's because our target is small agencies and home labs," Giuseppe explains. "I wanted to create an ecosystem with low impact on RAM, something you can host on a really old machine."&lt;/p&gt;

&lt;p&gt;LogTide launched its cloud alpha in early 2026, with around 100 companies stress-testing the platform for free. One of them sends five million logs per day.&lt;/p&gt;

&lt;h2&gt;The Challenge&lt;/h2&gt;

&lt;p&gt;When Giuseppe set out to build LogTide, he targeted home labs and small businesses that cannot afford enterprise infrastructure, let alone enterprise pricing.&lt;/p&gt;

&lt;p&gt;The ELK stack (Elasticsearch, Logstash, Kibana) typically requires multiple nodes and significant RAM. Grafana Loki is lighter, but its indexing and query limitations make full-text log search painful at scale. ClickHouse is fast and compresses well, but it is built for analytics clusters, not Raspberry Pis. Datadog and Splunk simply cost too much.&lt;/p&gt;

&lt;p&gt;LogTide needed a reliable database to underpin its open-source log observability platform, one that could scale to production without a split architecture or excessive budget spend.&lt;/p&gt;

&lt;h2&gt;Why TimescaleDB&lt;/h2&gt;

&lt;p&gt;Giuseppe found TimescaleDB while searching for Postgres with added support for high-volume ingestion of event data.&lt;/p&gt;

&lt;p&gt;"There are lots of alternatives, but most are too resource-intensive," Giuseppe explains. "TimescaleDB was a perfect choice."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;There are lots of alternatives, but most are too resource-intensive. TimescaleDB was a perfect choice. - Giuseppe Pollio, Founder, LogTide&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The appeal was both technical and practical. TimescaleDB is Postgres. It uses the same wire protocol, the same SQL syntax, the same tooling, and the same extension ecosystem. For a solo developer building a platform that has to run on minimal hardware, that meant no operational surprises, no vendor-specific APIs, and no migration work if users already had Postgres running. &lt;/p&gt;

&lt;p&gt;“If Postgres can run on your machine, TimescaleDB can run,” notes Giuseppe, “and you can deploy LogTide for inexpensive observability at scale.”&lt;/p&gt;
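
&lt;p&gt;Because TimescaleDB is a Postgres extension rather than a separate system, that setup is ordinary SQL. As an illustrative sketch (the table schema here is hypothetical, not LogTide's actual schema), turning a plain Postgres table into a compressed hypertable takes a handful of statements:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hypothetical log table; LogTide's real schema is not shown in this post
CREATE TABLE logs (
  time       TIMESTAMPTZ NOT NULL,
  project_id TEXT        NOT NULL,
  level      TEXT,
  message    TEXT
);

-- Convert it into a hypertable partitioned on time
SELECT create_hypertable('logs', 'time');

-- Enable native compression, segmenting by project for better ratios,
-- and compress chunks once they are a week old
ALTER TABLE logs SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'project_id'
);
SELECT add_compression_policy('logs', INTERVAL '7 days');
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Everything else, from connection pooling to backups, works exactly as it does on vanilla Postgres.&lt;/p&gt;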

&lt;h2&gt;The LogTide Stack&lt;/h2&gt;

&lt;p&gt;LogTide's architecture is simple by design. “Simple architecture means it's easier to manage, easier to maintain,” said Giuseppe.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Simple architecture means it’s easier to manage, easier to maintain. - Giuseppe Pollio&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Logs enter the system from one of three client sources: OpenTelemetry-instrumented services, Fluent Bit agents, or one of LogTide's native SDKs. All three routes converge on a single ingest endpoint, which normalizes format variations (including the OTEL format and a handful of special-case adapters) so the ingestion path stays unified regardless of how the log was generated.&lt;/p&gt;

&lt;p&gt;From the ingest endpoint, log payloads enter a job queue backed by Redis. Redis is optional: if it is not available, the ingestion path routes directly to the worker. The worker is where the platform earns its SIEM designation. It evaluates Sigma rules against incoming logs, generates alerts, dispatches notifications, and runs the full analysis pipeline. &lt;/p&gt;

&lt;p&gt;After processing, logs pass through what Giuseppe calls the LogTide Reservoir: a storage abstraction layer that keeps the backend pluggable. In practice, only one backend is truly necessary.&lt;/p&gt;

&lt;p&gt;"TimescaleDB is our unique persistent database," Giuseppe explains. "All the aggregation that populates our dashboards is powered by TimescaleDB."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;TimescaleDB is our unique persistent database. All the aggregation that populates our dashboards is powered by TimescaleDB. - Giuseppe Pollio&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Inside TimescaleDB, LogTide maintains three hypertable families: raw logs, distributed traces (spans), and detection events. Retention policies run automatically with no manual intervention or cron jobs. Continuous aggregates sit on top of the raw log hypertable and are what make the platform fast at scale.&lt;/p&gt;
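
&lt;p&gt;In standard TimescaleDB, a continuous aggregate and a retention policy can each be declared in a few lines of SQL. This is an illustrative sketch (view and column names are hypothetical, not taken from LogTide's codebase); LogTide layers its own per-organization retention logic on top, as the service code below shows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hourly per-project log counts, refreshed incrementally by TimescaleDB
CREATE MATERIALIZED VIEW logs_hourly
WITH (timescaledb.continuous) AS
SELECT project_id,
       time_bucket(INTERVAL '1 hour', time) AS bucket,
       count(*) AS log_count
FROM logs
GROUP BY project_id, bucket;

-- Automatically drop raw chunks older than 30 days
SELECT add_retention_policy('logs', INTERVAL '30 days');
&lt;/code&gt;&lt;/pre&gt;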

&lt;p&gt;From &lt;code&gt;packages/backend/src/modules/retention/service.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/**
 * Execute retention cleanup for all organizations.
 *
 * Strategy (scales with number of distinct retention values, not orgs):
 * 1. drop_chunks for max retention — instant, drops entire files
 * 2. Group orgs by retention_days, collect all project_ids per group
 * 3. For each group with retention &amp;lt; max: batch-delete their logs
 */&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;executeRetentionForAllOrganizations&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RetentionExecutionSummary&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logging&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;isInternalLoggingEnabled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Get all organizations with their retention + projects&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;organizations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;selectFrom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;organizations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;retention_days&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orgProjects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;selectFrom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;projects&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;organization_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Build org -&amp;gt; projectIds map&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;projectsByOrg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;orgProjects&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;projectsByOrg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;organization_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;projectsByOrg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;organization_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Find max retention (used for drop_chunks)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;maxRetention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;organizations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;o&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retention_days&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;maxCutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;maxRetention&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 1: drop_chunks older than max retention (TimescaleDB only — instant, no decompression)&lt;/span&gt;
  &lt;span class="c1"&gt;// For ClickHouse, TTL policies handle this natively or deleteByTimeRange in step 3&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;chunksDropped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reservoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEngineType&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;timescale&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dropResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`
        SELECT drop_chunks('logs', older_than =&amp;gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;maxCutoff&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;::timestamptz)
      `&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;chunksDropped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dropResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="cm"&gt;/* v8 ignore next 6 -- telemetry, disabled in tests */&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunksDropped&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;hub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;captureLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;info&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`Dropped &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;chunksDropped&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; chunks older than &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;maxRetention&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; days`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;maxRetentionDays&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;maxRetention&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;cutoffDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;maxCutoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
          &lt;span class="nx"&gt;chunksDropped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// drop_chunks may fail if no chunks to drop — that's fine&lt;/span&gt;
      &lt;span class="cm"&gt;/* v8 ignore next 4 -- telemetry, disabled in tests */&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;hub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;captureLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;debug&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`drop_chunks: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 2: Group orgs by retention_days (only those with retention &amp;lt; max need per-row deletes)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retentionGroups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;orgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;organizations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;org&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;organizations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retention_days&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRetention&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// already handled by drop_chunks&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;retentionGroups&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retention_days&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;orgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="na"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orgProjectIds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;projectsByOrg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;orgProjectIds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;retentionGroups&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retention_days&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 3: Batch-delete per retention group&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RetentionExecutionResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;totalDeleted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;failedCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;retentionDays&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;retentionGroups&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;org&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orgs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;organizationId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;organizationName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;retentionDays&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;logsDeleted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;executionTimeMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;groupStart&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cutoffDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;retentionDays&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;oldestResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reservoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cutoffDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;sortOrder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;asc&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oldestResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;org&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orgs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;organizationId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;organizationName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nx"&gt;retentionDays&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;logsDeleted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;executionTimeMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;groupStart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deleted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batchDeleteLogs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;cutoffDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oldestResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;totalDeleted&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;deleted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;failedCount&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"The aggregates are necessary," said Giuseppe. "If you have five million, ten million logs every day, and you need to see how many logs you received every hour, you can't run that query on 10 million logs. The aggregates give you query results in milliseconds instead of 30 or 40 seconds."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continuous aggregate definition&lt;/strong&gt;, from &lt;code&gt;packages/backend/migrations/004_performance_optimization.sql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;logs_hourly_stats&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timescaledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;continuous&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;time_bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'1 hour'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;log_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;NO&lt;/span&gt; &lt;span class="k"&gt;DATA&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Refreshes automatically every hour&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;add_continuous_aggregate_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'logs_hourly_stats'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;start_offset&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'3 hours'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;end_offset&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 hour'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;schedule_interval&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 hour'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;if_not_exists&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_logs_hourly_stats_project_bucket&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;logs_hourly_stats&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Hybrid query at runtime&lt;/strong&gt;, from &lt;code&gt;packages/backend/src/modules/dashboard/service.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;todayAggregateStats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;recentTotal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;recentErrors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;recentServices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;yesterdayAggregateStats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prevHourCount&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="c1"&gt;// Today's historical stats from aggregate (today start to 1 hour ago)&lt;/span&gt;
  &lt;span class="nx"&gt;db&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;selectFrom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logs_hourly_stats&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;`COALESCE(SUM(log_count), 0)`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;total&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;`COALESCE(SUM(log_count) FILTER (WHERE level IN ('error', 'critical')), 0)`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;errors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;`COUNT(DISTINCT service)`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;services&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;project_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;in&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;todayStart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastHourStart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeTakeFirst&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

  &lt;span class="c1"&gt;// Recent stats from reservoir (last hour)&lt;/span&gt;
  &lt;span class="nx"&gt;reservoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lastHourStart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;reservoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lastHourStart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;reservoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lastHourStart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;

  &lt;span class="c1"&gt;// Yesterday's stats from aggregate&lt;/span&gt;
  &lt;span class="nx"&gt;db&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;selectFrom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logs_hourly_stats&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;`COALESCE(SUM(log_count), 0)`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;total&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;`COALESCE(SUM(log_count) FILTER (WHERE level IN ('error', 'critical')), 0)`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;errors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;`COUNT(DISTINCT service)`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;services&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;project_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;in&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;yesterdayStart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;todayStart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeTakeFirst&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

  &lt;span class="c1"&gt;// Previous hour from reservoir (for throughput trend)&lt;/span&gt;
  &lt;span class="nx"&gt;reservoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prevHourStart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lastHourStart&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
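&lt;p&gt;The two halves are then combined in application code. As a minimal sketch of that merge step, assuming result shapes like those above (the function and field names are illustrative, not from the LogTide codebase):&lt;/p&gt;

```typescript
// Hypothetical result shape for the aggregate query above.
// Postgres returns SUM(...) over bigint as a string, so parse before adding.
interface AggregateStats { total: string; errors: string }

// Combine historical aggregate sums with the live reservoir counts.
function mergeStats(
  aggregate: AggregateStats | undefined,
  recentTotal: number,
  recentErrors: number,
) {
  return {
    total: Number(aggregate?.total ?? 0) + recentTotal,
    errors: Number(aggregate?.errors ?? 0) + recentErrors,
  };
}
```

&lt;p&gt;Because the aggregate covers today's start up to an hour ago and the reservoir covers the last hour, the two windows are disjoint and a plain sum is safe. Distinct service counts cannot be summed the same way, since a service may appear in both windows.&lt;/p&gt;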



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgarohwfxrg7oerz5xvo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgarohwfxrg7oerz5xvo4.png" alt="LogTide's architecture. Logs flow from client SDKs and agents through a single ingest endpoint, into a processing worker, and into TimescaleDB hypertables via the LogTide Reservoir storage abstraction." width="800" height="485"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;LogTide's architecture. Logs flow from client SDKs and agents through a single ingest endpoint, into a processing worker, and into TimescaleDB hypertables via the LogTide Reservoir storage abstraction.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What We've Seen
&lt;/h2&gt;
&lt;h3&gt;
  
  
  220GB Down to 25GB
&lt;/h3&gt;

&lt;p&gt;In production, LogTide's TimescaleDB deployment compressed 220GB of raw log data, 135GB of row data plus 85GB of indexes, down to 25GB. That is an 88.6% reduction, achieved using TimescaleDB's native columnar compression with a segmentby configuration on project_id, ordered by timestamp descending. Chunks older than seven days compress automatically.&lt;/p&gt;

&lt;p&gt;From &lt;code&gt;packages/backend/migrations/001_initial_schema.sql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Enable compression on logs hypertable&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;timescaledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;timescaledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compress_segmentby&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'project_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;timescaledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compress_orderby&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'time DESC'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Add compression policy for logs (compress chunks older than 7 days)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;add_compression_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'logs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'7 days'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_not_exists&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Global retention safety net&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;add_retention_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'logs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'90 days'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_not_exists&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query performance did not degrade. Time-range filtering got 33% faster after compression. Aggregations got 41% faster. Only full-text search slowed slightly, by about 12%, because columnar storage requires scanning additional columns to reconstruct text fields. For a log management platform where engineers are far more likely to query a time window than to search a raw string, the tradeoff strongly favors compression.&lt;/p&gt;

&lt;p&gt;In practice, 30 million logs fit in 15GB on a single 4-vCPU, 8GB RAM node, with a P95 query latency of 50ms. Learn more in Giuseppe’s &lt;a href="https://dev.to/polliog/timescaledb-compression-from-150gb-to-15gb-90-reduction-real-production-data-bnj"&gt;&lt;u&gt;dev.to post on TimescaleDB compression&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  TimescaleDB Bested MongoDB and ClickHouse in Head-to-Head Performance Benchmarks
&lt;/h3&gt;

&lt;p&gt;Giuseppe built an open benchmark suite and ran it across 1K to 1M records, as outlined in his &lt;a href="https://builder.aws.com/content/3Aoryr85VEVzFKrFjDmzXpwRLkU/i-benchmarked-timescaledb-vs-clickhouse-vs-mongodb-for-observability-data" rel="noopener noreferrer"&gt;&lt;u&gt;AWS Builder Center article benchmarking ClickHouse and MongoDB vs TimescaleDB&lt;/u&gt;&lt;/a&gt;. The ingestion story is straightforward: at batch sizes typical of real-world observability (100 events per call), TimescaleDB handles 14,200 inserts per second. ClickHouse handles 250 at the same batch size. The gap exists because ClickHouse buffers small writes and flushes on a 400ms timer, the right design for bulk analytics, the wrong design when a dozen microservices are logging in real time.&lt;/p&gt;
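&lt;p&gt;The batch-size effect is easy to reproduce on the client side. A minimal sketch of the 100-events-per-call pattern in TypeScript (the event shape and flush callback are illustrative, not from any SDK):&lt;/p&gt;

```typescript
type LogEvent = { time: number; level: string; message: string };

// Accumulate events and flush in fixed-size batches, the pattern that
// keeps TimescaleDB ingestion efficient without server-side write buffering.
class LogBatcher {
  private buffer: LogEvent[] = [];

  constructor(
    private readonly batchSize: number,
    private readonly flush: (batch: LogEvent[]) => void,
  ) {}

  push(event: LogEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.batchSize) {
      this.flush(this.buffer);
      this.buffer = [];
    }
  }
}
```

&lt;p&gt;Pushing 250 events through a batcher of size 100 produces two flushes and leaves 50 events buffered; a production batcher would also flush on a timer so a quiet service never strands its tail.&lt;/p&gt;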

&lt;p&gt;The query results are the main story. At 100,000 log records, TimescaleDB answers a filtered service query in 0.47ms. MongoDB answers the same query in 304ms, a 650x difference. Under 50 concurrent queries, TimescaleDB holds at 6.2ms whether the dataset is 1,000 or 1,000,000 records. The mechanism is hypertable partitioning: queries filter by time range and service, TimescaleDB routes them to the active chunk instead of scanning the full table, and continuous aggregates make count and dashboard queries nearly free because the work is already done at write time.&lt;/p&gt;

&lt;h3&gt;
  
  
  A 2GB RAM Requirement Keeps Operations Lean
&lt;/h3&gt;

&lt;p&gt;The most important number is not the compression ratio or the write throughput. It is the 2GB RAM figure that defines where LogTide can actually run.&lt;/p&gt;

&lt;p&gt;"If you have log management that can work with 2GB of RAM, it's really magic," Giuseppe says. "Because you can't do that with Datadog or Splunk or the other self-hosted programs and containers."&lt;/p&gt;


&lt;p&gt;That 2GB ceiling is what makes LogTide viable for home labs running on a NAS, small businesses on shared hosting, or a developer who wants to know when their Raspberry Pi's services throw errors. The entire LogTide platform, including API, worker, dashboard, and TimescaleDB storage, runs on the same hardware that already runs Postgres.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;The LogTide Cloud Platform alpha prototype is now open to trial users. Meanwhile, LogTide’s open-source project is growing fast. Hundreds of GitHub stars and 1k+ clones per day signal a developer community that has found the project and is actively building with it. The next phase is expanding SDK coverage and continuing to stress-test the storage layer. TimescaleDB runs anywhere Postgres runs. The goal is to make sure LogTide does too.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>timescaledb</category>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>ClickHouse Is Fast. Your Pipeline Isn't.</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:15:43 +0000</pubDate>
      <link>https://dev.to/tigerdata_dev/clickhouse-is-fast-your-pipeline-isnt-27am</link>
      <guid>https://dev.to/tigerdata_dev/clickhouse-is-fast-your-pipeline-isnt-27am</guid>
      <description>&lt;p&gt;ClickHouse is fast. The benchmarks aren't lying. If you've &lt;a href="https://www.tigerdata.com/blog/what-is-clickhouse-how-does-it-compare-to-postgresql-and-timescaledb-and-how-does-it-perform-for-time-series-data" rel="noopener noreferrer"&gt;&lt;u&gt;run a comparison against vanilla Postgres&lt;/u&gt;&lt;/a&gt; on the same dataset, the results aren't close. ClickHouse wins by 10x-100x on typical analytical patterns.&lt;/p&gt;

&lt;p&gt;That benchmark is also only measuring one dimension of your decision. It tells you how fast queries run on static data. It doesn't tell you anything about data freshness, transactional correctness, pipeline reliability, or the operational cost of keeping two systems synchronized.&lt;/p&gt;

&lt;p&gt;This post isn't about whether ClickHouse is fast. It's about the full cost of getting your data there, and keeping it correct once it arrives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ClickHouse is actually good at
&lt;/h2&gt;

&lt;p&gt;ClickHouse is a columnar OLAP database built for analytical scan performance. It's great at aggregations over large datasets, column-oriented scans that skip irrelevant data, compression that keeps big datasets resident in memory, and query parallelism across cores.&lt;/p&gt;

&lt;p&gt;For batch analytics on historical data where "fresh" means "reflects the last ETL run," ClickHouse is a solid choice. Data warehousing, offline reporting, retrospective analysis. These are real ClickHouse strengths and I'm not going to pretend otherwise.&lt;/p&gt;

&lt;p&gt;The question you need to answer for yourself is whether your use case is actually batch analytics, or whether it's operational analytics that needs to be fresh and correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline tax
&lt;/h2&gt;

&lt;p&gt;Here's the thing about ClickHouse: your data doesn't teleport there from Postgres. You need a pipeline.&lt;/p&gt;

&lt;p&gt;You've got options. &lt;a href="https://www.tigerdata.com/blog/implementation-of-change-data-capture-using-timescaledb-for-shoplogix-industrial-monitoring-services" rel="noopener noreferrer"&gt;&lt;u&gt;CDC via Debezium&lt;/u&gt;&lt;/a&gt;, scheduled ETL jobs, Kafka-based streaming, application-level dual-writes. Each one introduces costs that won't show up in any benchmark you've read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lag.&lt;/strong&gt; There's always a gap between a row being committed in Postgres and being queryable in ClickHouse. CDC pipelines typically add 5-30 seconds. Batch ETL adds minutes to hours. Dual-writes add milliseconds, but now you've got a consistency problem: when one write succeeds and the other fails, your two systems are telling different stories about what's true.&lt;/p&gt;
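&lt;p&gt;The dual-write consistency problem is easy to see in miniature. A hedged sketch, with hypothetical writer functions standing in for the two databases:&lt;/p&gt;

```typescript
// Hypothetical writers standing in for the two databases.
type Writer = (row: { id: number }) => Promise<void>;

// A naive dual-write: if the second write fails after the first
// succeeds, the two systems now disagree and nothing rolls back.
async function dualWrite(
  writePostgres: Writer,
  writeClickHouse: Writer,
  row: { id: number },
): Promise<void> {
  await writePostgres(row);   // committed and durable
  await writeClickHouse(row); // may fail, leaving Postgres ahead
}
```

&lt;p&gt;If the second write throws after the first commits, nothing rolls back the Postgres row. There is no transaction spanning both systems, which is exactly the trade dual-writes make to avoid CDC lag.&lt;/p&gt;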

&lt;p&gt;&lt;strong&gt;Drift.&lt;/strong&gt; Every schema change in Postgres needs to propagate to ClickHouse. Column additions, type changes, table restructuring: all of it requires pipeline updates. Every migration is now a coordinated change across two systems. Good luck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure modes.&lt;/strong&gt; Pipelines break. Kafka consumers fall behind. CDC slots get dropped. Backfills happen after outages. Each of these failure modes needs its own monitoring, alerting, and runbook. All of this overhead exists purely because your data lives in two places.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Correctness gaps.&lt;/strong&gt; ClickHouse uses eventual consistency. Rows arrive out of order. Late-arriving data might not appear in already-computed aggregations. Deduplication requires explicit schema decisions (ReplacingMergeTree and friends). When a dashboard query runs during a pipeline hiccup, the results are wrong, with no transaction isolation to tell you that.&lt;/p&gt;
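&lt;p&gt;What ReplacingMergeTree eventually does at merge time amounts to last-write-wins by key. A simplified sketch of those semantics (not ClickHouse code, and the key word is &lt;em&gt;eventually&lt;/em&gt;: until a background merge runs, queries can see both versions of a row):&lt;/p&gt;

```typescript
// Last-write-wins by key: keep the highest-version row per id.
// ReplacingMergeTree applies this only when a background merge runs,
// so duplicates remain visible to queries in the meantime.
function replacingMerge<T extends { id: number; version: number }>(
  rows: T[],
): T[] {
  const latest = new Map<number, T>();
  for (const row of rows) {
    const prev = latest.get(row.id);
    if (!prev || row.version >= prev.version) latest.set(row.id, row);
  }
  return [...latest.values()];
}
```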

&lt;h2&gt;
  
  
  What you actually lose without ACID
&lt;/h2&gt;

&lt;p&gt;Let’s be specific about the &lt;a href="https://www.tigerdata.com/learn/understanding-acid-compliance" rel="noopener noreferrer"&gt;&lt;u&gt;ACID&lt;/u&gt;&lt;/a&gt; trade-off, because it matters more in practice than it sounds in theory.&lt;/p&gt;

&lt;p&gt;ClickHouse doesn't support multi-row transactions. A batch INSERT either succeeds or fails as a batch, but you can't roll back a logical transaction that spans multiple inserts across tables. If your analytics join orders, payments, and inventory, the lack of transactional consistency means your results can reflect different points in time. (Whether that matters depends on your use case, but you should know it before you commit to the architecture.)&lt;/p&gt;

&lt;p&gt;Updates work differently than you expect. ClickHouse mutations are background operations. When source data gets corrected in Postgres (a sensor recalibration, a price adjustment, a retroactive fix), getting that correction into ClickHouse means re-ingesting the affected data or running an async mutation that finishes whenever the system gets around to it. In Postgres, a corrected value is immediately correct. In ClickHouse, it's eventually correct.&lt;/p&gt;

&lt;p&gt;There are no foreign keys, constraints, or triggers. Data integrity is your pipeline's problem now. If bad data gets through, ClickHouse will store it faithfully. Garbage in, garbage queryable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real cost: operating two systems
&lt;/h2&gt;

&lt;p&gt;Two databases means two sets of &lt;em&gt;everything&lt;/em&gt;. Monitoring dashboards, alerting rules, and backup strategies. Capacity planning, version upgrade procedures, and security patching schedules.&lt;/p&gt;

&lt;p&gt;You also need two mental models. When a dashboard shows unexpected numbers, the engineer debugging it has to figure out: is the data wrong in Postgres, or is it right in Postgres but stale in ClickHouse? Is the pipeline behind? Did a schema change not propagate? Is deduplication working correctly? So many questions.&lt;/p&gt;

&lt;p&gt;And the pipeline itself is a third system with its own maintenance burden. Kafka clusters, CDC connectors, ETL orchestrators. None of these are zero-maintenance infrastructure.&lt;/p&gt;

&lt;p&gt;Total cost of ownership isn't "Postgres cost + ClickHouse cost." It's Postgres cost plus ClickHouse cost plus pipeline cost plus coordination overhead plus the debugging time you spend every time the two systems disagree. That last one is harder to budget for.&lt;/p&gt;

&lt;h2&gt;
  
  
  When the split is actually worth it
&lt;/h2&gt;

&lt;p&gt;Here's a useful test before we go further. Have your stakeholders ever asked "why is the dashboard showing old data?" If yes, you have a freshness requirement. If the answer to that question has ever been "because the pipeline was behind," then a faster query engine isn't going to solve your problem.&lt;/p&gt;

&lt;p&gt;I want to be honest here, because this is where a lot of competitive posts fall apart. There are legitimate reasons to run ClickHouse alongside Postgres.&lt;/p&gt;

&lt;p&gt;The split makes sense if your analytics are batch-oriented and hours of lag is acceptable. If your queries are read-only historical scans and you already have Kafka running for other reasons. If analytical query volume would overwhelm your operational database.&lt;/p&gt;

&lt;p&gt;It doesn't make sense if your stakeholders want to see current data. If a correction in Postgres needs to show up immediately in your dashboards. If the only reason you'd build a pipeline is to feed ClickHouse. Or if your team is small enough that the operational burden of running three systems isn't worth the query speed gain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The alternative
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.tigerdata.com/timescaledb" rel="noopener noreferrer"&gt;&lt;u&gt;TimescaleDB&lt;/u&gt;&lt;/a&gt; extends Postgres so analytical queries perform well on the same data, in the same database, with the same transactional guarantees.&lt;/p&gt;

&lt;p&gt;Hypertables with &lt;a href="https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database" rel="noopener noreferrer"&gt;&lt;u&gt;columnar compression&lt;/u&gt;&lt;/a&gt; give you analytical scan performance on time-series data without moving it anywhere. &lt;a href="https://www.tigerdata.com/learn/olap-workloads-on-postgresql-a-guide" rel="noopener noreferrer"&gt;&lt;u&gt;Continuous aggregates&lt;/u&gt;&lt;/a&gt; pre-compute common rollups incrementally, so dashboards stay fast without batch jobs. FlightAware dropped a 6.4-second query to 30 milliseconds using continuous aggregates alone, without changing their data model or moving to a separate system. Real-time aggregates layer the newest raw data on top of those precomputed rollups in a single query, so results stay current without waiting for a refresh cycle.&lt;/p&gt;

&lt;p&gt;Your data is always fresh because nothing moved it. Corrections are immediate because there's no second system to propagate them to. And there's no pipeline paging you at 3am, because there's no pipeline.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"TimescaleDB strikes a phenomenal balance between the simplicity of storing your analytical data under the same roof as your configuration data, while also gaining much of the impressive performance of a specialized OLAP system." — Robert Cepa, Senior Software Engineer, Cloudflare (&lt;a href="https://www.tigerdata.com/blog/how-timescaledb-helped-us-scale-analytics-and-reporting" rel="noopener noreferrer"&gt;&lt;u&gt;How TimescaleDB helped Cloudflare scale analytics — and why they chose it over ClickHouse&lt;/u&gt;&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Worth being straight with you: for pure OLAP workloads on petabyte-scale historical data, a dedicated columnar store like ClickHouse will outperform TimescaleDB on raw scan throughput. That gap is real. For batch analytics on historical data where freshness and correctness aren't the point, ClickHouse is a reasonable choice.&lt;/p&gt;

&lt;p&gt;But for most teams building operational analytics on live data, the architectural cost of moving that data doesn't justify the query speed gain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing the benchmark doesn't tell you
&lt;/h2&gt;

&lt;p&gt;The fastest query engine in the world doesn't help when the data it's querying is stale. And "the pipeline was behind" is a terrible answer to give your stakeholders at 2am.&lt;/p&gt;

&lt;p&gt;ClickHouse is fast. The benchmarks are real. The trade-off is also real: pipelines, lag, drift, eventual consistency, and a second system to operate forever.&lt;/p&gt;

&lt;p&gt;If your analytics can tolerate staleness and your team has the infrastructure to keep two systems in sync, ClickHouse is worth serious consideration. If your analytics need to be fresh, correct, and transactional, the architecture that gets you there matters more than the query speed of any single component.&lt;/p&gt;

&lt;p&gt;The benchmark tells you one thing. The architecture is what you'll live with.&lt;/p&gt;

&lt;p&gt;If you want to see what analytics on your live Postgres data actually looks like, &lt;a href="https://console.cloud.tigerdata.com/signup" rel="noopener noreferrer"&gt;&lt;u&gt;start a free Tiger Cloud database&lt;/u&gt;&lt;/a&gt;. Your existing schema works as-is. No pipeline required.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>webdev</category>
      <category>clickhouse</category>
      <category>database</category>
    </item>
    <item>
      <title>Document Databases: Be Honest</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Wed, 01 Apr 2026 17:22:30 +0000</pubDate>
      <link>https://dev.to/tigerdata_dev/document-databases-be-honest-2l0h</link>
      <guid>https://dev.to/tigerdata_dev/document-databases-be-honest-2l0h</guid>
      <description>&lt;p&gt;MongoDB gets a bad reputation in certain engineering circles that it doesn't entirely deserve. It ships fast. Schema flexibility is real. The developer experience for document-shaped data is good. A lot of teams made a reasonable call when they chose it.&lt;/p&gt;

&lt;p&gt;But there's a version of this story that ends badly, and it follows a recognizable pattern. The team picks MongoDB for a new system. The system works. Then the data starts looking less like documents and more like a stream of timestamped events. Queries start filtering by time range. Write volume climbs. Performance degrades in ways that feel familiar if you've read about this problem, and deeply confusing if you haven't.&lt;/p&gt;

&lt;p&gt;This post isn't here to relitigate the MongoDB decision. It's here to help you figure out whether the pain you're feeling is a MongoDB problem, a document database problem, or a workload problem that would follow you to Postgres.&lt;/p&gt;

&lt;p&gt;The answer matters because the fix is different in each case.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MongoDB is actually good at
&lt;/h2&gt;

&lt;p&gt;Flexible schema for variable data that's actually variable. Product catalogs where every SKU has different attributes. User profiles where fields vary by account type. Content management where article structure differs by category. These are real document shapes, and MongoDB handles them without the ceremony Postgres requires.&lt;/p&gt;

&lt;p&gt;Rapid iteration without migration overhead. Early-stage products change their data model constantly. In Postgres, every schema change is an &lt;code&gt;ALTER TABLE&lt;/code&gt;. In MongoDB, you just write different fields. For teams that are still figuring out the shape of their data, this is a real advantage.&lt;/p&gt;

&lt;p&gt;Nested and hierarchical data. Some data is naturally a tree. A purchase order with line items with sub-components. A configuration object with nested sections. Postgres can model this with JSONB, but MongoDB's native document model fits it more naturally and queries it more cleanly.&lt;/p&gt;

&lt;p&gt;Horizontal scaling for document reads. MongoDB's sharding model was designed for document workloads. For read-heavy document access at scale, it's a mature and well-understood architecture.&lt;/p&gt;

&lt;p&gt;These aren't consolation prizes. They're real reasons MongoDB is the right choice for a lot of workloads.&lt;/p&gt;

&lt;p&gt;The trouble starts when the data changes shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  What time-series data actually looks like
&lt;/h2&gt;

&lt;p&gt;Time-series data has a specific shape, and it's not a document shape. Every row is a measurement. It has a timestamp, a source identifier, and a value or set of values. The schema doesn't vary between rows. There's nothing hierarchical about it. The document model isn't adding anything.&lt;/p&gt;

&lt;p&gt;What time-series data has instead: enormous volume, strict ordering requirements, queries that almost always filter by time range, and retention policies that drop entire time windows at once.&lt;/p&gt;

&lt;p&gt;A wind turbine sensor reporting every five seconds doesn't produce documents. It produces a flat stream of readings: timestamp, sensor ID, RPM, temperature, vibration. A financial trade feed isn't a document store. It's a sequence of immutable events. An APM platform collecting metrics from a distributed system is generating hundreds of thousands of measurements per second, all with the same shape.&lt;/p&gt;

&lt;p&gt;The test is simple. Look at your most-written collection. Does each document have a different structure? Or does every document look essentially the same, with a timestamp and some measurements?&lt;/p&gt;

&lt;p&gt;If it's the latter, you're storing time-series data in a document database, and the document model is providing zero value while the storage engine works against you.&lt;/p&gt;
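&lt;p&gt;The same check can be run mechanically: sample documents from the collection and compare their key sets. A minimal sketch in Python (the heuristic, field names, and sample data are illustrative, not a MongoDB API):&lt;/p&gt;

```python
def looks_like_time_series(docs, timestamp_keys=("ts", "timestamp", "time")):
    """Heuristic: documents that all share one flat key set, including a
    timestamp field, are time-series shaped rather than document shaped."""
    if not docs:
        return False
    key_sets = {frozenset(d.keys()) for d in docs}
    if len(key_sets) != 1:
        return False  # variable schema: genuinely document-shaped
    keys = next(iter(key_sets))
    has_timestamp = any(k in keys for k in timestamp_keys)
    # nested dicts/lists suggest real document structure
    flat = all(not isinstance(v, (dict, list)) for d in docs for v in d.values())
    return has_timestamp and flat

# a sensor stream: identical flat shape, timestamped -> time-series
readings = [
    {"ts": 1, "sensor_id": "a", "rpm": 1200, "temp": 71.5},
    {"ts": 2, "sensor_id": "a", "rpm": 1210, "temp": 71.6},
]
# user profiles: varying, nested shape -> genuinely documents
profiles = [
    {"user": "x", "prefs": {"theme": "dark"}},
    {"user": "y", "billing": {"plan": "pro"}},
]
```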

&lt;h2&gt;
  
  
  Where MongoDB struggles with this workload
&lt;/h2&gt;

&lt;p&gt;WiredTiger (MongoDB's default storage engine) uses a B-tree structure optimized for a workload that includes updates to existing documents. For high-frequency append-only writes, it faces a fundamental mismatch. Consider a single sensor reading: one document insert triggers a write to the primary collection, a write to the oplog, and a separate B-tree update for every index on that collection. Three indexes means five writes for one data point. At 10,000 inserts per second, that's 50,000 storage operations per second before you've run a single query. The engine was designed for mixed read-write workloads with in-place updates, not an endless append stream where no document is ever modified after creation.&lt;/p&gt;
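&lt;p&gt;The arithmetic above is worth making explicit. A toy model (it counts only collection, oplog, and index writes; real WiredTiger traffic also includes journaling and checkpoint overhead):&lt;/p&gt;

```python
def writes_per_insert(num_secondary_indexes):
    # one write to the primary collection, one to the oplog,
    # plus one B-tree update per secondary index
    return 1 + 1 + num_secondary_indexes

inserts_per_second = 10_000
storage_ops = inserts_per_second * writes_per_insert(3)  # 50,000 ops/s
```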

&lt;p&gt;MongoDB has no native time-based partitioning. Postgres has declarative range partitioning. &lt;a href="https://www.tigerdata.com/blog/postgres-optimization-treadmill" rel="noopener noreferrer"&gt;&lt;u&gt;TimescaleDB automates it entirely with hypertables&lt;/u&gt;&lt;/a&gt;. MongoDB has no equivalent primitive. Teams end up implementing time-based collection bucketing manually: separate collections per day or week, application-level routing logic, custom cleanup scripts. It works, but it's the same operational burden as manual Postgres partitioning, without the tooling ecosystem that exists on the Postgres side.&lt;/p&gt;

&lt;p&gt;MongoDB's aggregation pipeline is expressive. But for time-series workloads, the queries that matter are time-range aggregations: hourly averages, daily maximums, week-over-week comparisons. These queries scan large volumes of documents and aggregate across fields. Without columnar storage and purpose-built time-series compression, performance degrades with data volume in the same way it does in vanilla Postgres.&lt;/p&gt;

&lt;p&gt;MongoDB did add a native time-series collection type in 5.0. It's a real improvement for simple append-only use cases. But it doesn't support secondary indexes the same way regular collections do, restricts certain aggregation stages and update operations, and is still relatively new compared to the Postgres ecosystem. Worth knowing about. Not a full answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why moving to vanilla Postgres isn't automatically the fix
&lt;/h2&gt;

&lt;p&gt;This is the section most competitive content skips entirely. If you're evaluating a migration, you deserve the full picture.&lt;/p&gt;

&lt;p&gt;If the workload is continuous high-frequency time-series ingestion with long retention and operational query requirements, vanilla Postgres has its own version of this problem. &lt;a href="https://www.tigerdata.com/blog/postgres-optimization-treadmill" rel="noopener noreferrer"&gt;&lt;u&gt;The MVCC overhead, write amplification, autovacuum contention, and index maintenance costs that create the Optimization Treadmill&lt;/u&gt;&lt;/a&gt; exist in Postgres too. The storage model is different from MongoDB's, but the outcome at scale is the same: performance degrades with data volume, maintenance overhead accumulates, and each optimization cycle buys time without changing the trajectory.&lt;/p&gt;

&lt;p&gt;Moving from MongoDB to vanilla Postgres solves the schema flexibility problem (you probably don't need it for this workload anyway). You get a mature partitioning ecosystem, a better query planner, and a richer extension ecosystem. These are real improvements.&lt;/p&gt;

&lt;p&gt;It doesn't solve the core time-series storage problem, because that problem lives in the storage model, not the database brand.&lt;/p&gt;

&lt;p&gt;The question isn't MongoDB vs. Postgres. It's document store vs. purpose-built time-series storage. That's the actual axis the decision should sit on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision framework
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Your data is actually documents.&lt;/strong&gt; Variable schema, nested structures, hierarchical relationships, read-heavy access patterns. MongoDB is the right tool. The pain you're feeling is probably a schema design or indexing problem, not a fundamental architectural mismatch. Fix the schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your data is time-series but volume is modest.&lt;/strong&gt; Sub-10K inserts per second, retention under 90 days, no hard operational latency requirements on the full retention window. Vanilla Postgres with good partitioning and indexing handles this fine. The Optimization Treadmill exists, but the ceiling is far enough away that standard tuning keeps you ahead of it. Move to Postgres, implement partitioning early, and &lt;a href="https://www.tigerdata.com/blog/six-signs-postgres-tuning-wont-fix-performance-problems" rel="noopener noreferrer"&gt;&lt;u&gt;monitor the warning signs&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your data is time-series at sustained high volume.&lt;/strong&gt; Continuous ingestion, long retention, operational query requirements, growing data volume. This is the workload that breaks both MongoDB and vanilla Postgres through the same class of mechanisms. Purpose-built time-series storage on Postgres (same SQL, same wire protocol, same tooling) is the right answer. &lt;a href="https://www.tigerdata.com/complete-guide-migrating-from-mongodb-to-tiger-data-step-by-step" rel="noopener noreferrer"&gt;&lt;u&gt;Migration from MongoDB to TimescaleDB follows a well-documented path&lt;/u&gt;&lt;/a&gt;: you keep everything Postgres-compatible and gain the storage architecture that matches the workload.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do next
&lt;/h2&gt;

&lt;p&gt;If you're reading this, MongoDB didn't fail you. Your workload evolved past what document storage was designed for. That's a different thing.&lt;/p&gt;

&lt;p&gt;Most database choices are right at the time they're made and wrong eighteen months later when the system looks nothing like it did at launch. Sensor data that started as a feature became the core product. The document store that handled early prototyping became the production system for a time-series pipeline.&lt;/p&gt;

&lt;p&gt;The question now is whether the fix is tuning, migration, or architecture. The framework above gives you a clear read on which one applies. If it's architecture, the good news is that moving from MongoDB to a Postgres-compatible time-series database is less disruptive than it sounds. Your application SQL stays the same. Your tooling stays the same. The storage engine underneath is the thing that changes.&lt;/p&gt;

&lt;p&gt;That's the right scope for the change. Not the whole stack. Just the part that was always wrong for this workload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigerdata.com/blog/postgres-optimization-treadmill" rel="noopener noreferrer"&gt;&lt;u&gt;Read the full technical breakdown of why vanilla Postgres hits these limits&lt;/u&gt;&lt;/a&gt;, or&lt;a href="https://console.cloud.tigerdata.com/signup" rel="noopener noreferrer"&gt;&lt;u&gt;start a Tiger Cloud trial&lt;/u&gt;&lt;/a&gt; and see how TimescaleDB handles your workload directly.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>mongodb</category>
    </item>
    <item>
      <title>pg_textsearch 1.0: How We Built a BM25 Search Engine on Postgres Pages</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Tue, 31 Mar 2026 13:09:03 +0000</pubDate>
      <link>https://dev.to/tigerdata/pgtextsearch-10-how-we-built-a-bm25-search-engine-on-postgres-pages-42cc</link>
      <guid>https://dev.to/tigerdata/pgtextsearch-10-how-we-built-a-bm25-search-engine-on-postgres-pages-42cc</guid>
      <description>&lt;p&gt;&lt;em&gt;Design, implementation, and benchmarks of a native BM25 index for Postgres. Now generally available to all&lt;/em&gt; &lt;a href="https://www.tigerdata.com/cloud" rel="noopener noreferrer"&gt;&lt;em&gt;&lt;u&gt;Tiger Cloud&lt;/u&gt;&lt;/em&gt;&lt;/a&gt; &lt;em&gt;customers and freely available via open source.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you have used Postgres's built-in &lt;code&gt;ts_rank&lt;/code&gt; for full-text search at any meaningful scale, you already know the limitations. Ranking quality degrades as your corpus grows. There is no inverse document frequency, so common words carry the same weight as rare ones. There is no term frequency saturation, so a document that mentions "database" 50 times outranks one that mentions it once. There is no efficient top-k path: scoring requires touching every matching row.&lt;/p&gt;

&lt;p&gt;Most teams work around this by bolting on Elasticsearch or Typesense as a sidecar. That works, but now you are syncing data between two systems, operating two clusters, and debugging consistency issues when they diverge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigerdata.com/docs/use-timescale/latest/extensions/pg-textsearch" rel="noopener noreferrer"&gt;&lt;u&gt;pg_textsearch&lt;/u&gt;&lt;/a&gt; takes a different approach: real BM25 scoring, built from scratch in C on top of Postgres's own storage layer. You create an index, write a query, and get results ranked by relevance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;@&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'database ranking'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;@&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'database ranking'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;lt;@&amp;gt;&lt;/code&gt; operator returns a BM25 relevance score. Scores are negated so that Postgres's default ascending ORDER BY returns the most relevant results first. The index is stored entirely in standard Postgres pages managed by the buffer cache. It participates in WAL, works with pg_dump and streaming replication, and requires no external storage or special backup procedures.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What shipped in 1.0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;From preview to production. In October 2025, we released a preview that held the entire inverted index in shared memory, rebuilt from the heap on restart (preview blog). In the five months and 180+ commits since, the extension has been substantially rewritten:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;• Disk-based segments replaced the memory-only architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;• Block-Max WAND + WAND optimization for fast top-k queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;• Posting list compression with SIMD-accelerated decoding (41% smaller indexes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;• Parallel index builds (138M documents in under 18 minutes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;• 2.4x to 6.5x faster than ParadeDB/Tantivy for 2-4 term queries at 138M scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;• 8.7x higher concurrent throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;This post covers the architecture, query optimization strategy, and benchmark results. We include a candid discussion of where ParadeDB is faster and a full accounting of current limitations.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Background: Why BM25 in Postgres?
&lt;/h2&gt;

&lt;p&gt;Postgres ships &lt;code&gt;tsvector/tsquery&lt;/code&gt; with &lt;code&gt;ts_rank&lt;/code&gt; for full-text ranking. &lt;code&gt;ts_rank&lt;/code&gt; uses an ad-hoc scoring function that lacks the three properties that make BM25 effective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inverse document frequency (IDF):&lt;/strong&gt; downweights common terms so that rarer, more informative terms drive the ranking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Term frequency saturation:&lt;/strong&gt; prevents a document from scoring arbitrarily high by repeating a term many times. A document mentioning "database" 50 times is not 50 times more relevant than one mentioning it once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document length normalization:&lt;/strong&gt; accounts for the fact that a term match in a short document is more informative than the same match in a long one [1].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For applications where ranking quality matters (RAG pipelines, search-driven UIs, hybrid retrieval), this is a material limitation. At scale, &lt;code&gt;ts_rank&lt;/code&gt; also has no top-k optimization path: ranking by relevance requires scoring every matching row.&lt;/p&gt;
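&lt;p&gt;The three properties above are visible directly in the scoring function. Here is a minimal Okapi BM25 sketch with the common &lt;code&gt;k1&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; defaults (an illustration of the formula, not pg_textsearch's actual C implementation):&lt;/p&gt;

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    # IDF: rare terms (low doc_freq) contribute more than common ones
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: matches in short documents count for more
    norm = 1 - b + b * doc_len / avg_doc_len
    # TF saturation: score grows sublinearly with term frequency
    return idf * tf * (k1 + 1) / (tf + k1 * norm)
```
&lt;p&gt;With this function, a document mentioning a term 50 times scores only modestly higher than one mentioning it once (saturation), a rare term outscores a common one at equal frequency (IDF), and a match in a short document outscores the same match in a long one (length normalization).&lt;/p&gt;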

&lt;p&gt;The primary existing BM25 extension for Postgres is ParadeDB/pg_search, which wraps the Tantivy search library written in Rust. Early versions stored the index in auxiliary files outside the WAL; current versions use Postgres pages.&lt;/p&gt;

&lt;p&gt;pg_textsearch takes a different approach: rather than wrapping an external search library, the entire search engine (tokenization, compression, query optimization) is built from scratch in C on top of Postgres's storage layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fex8hr08ubhffvj31eb79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fex8hr08ubhffvj31eb79.png" alt="Fig. 1: pg_textsearch Architecture diagram" width="800" height="1249"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig. 1: pg_textsearch Architecture diagram&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The hybrid memtable + segment design
&lt;/h3&gt;

&lt;p&gt;pg_textsearch uses an LSM-tree-inspired architecture [4]. Incoming writes go to an in-memory inverted index (the memtable), which periodically spills to immutable on-disk segments. Segments compact in levels: when a level accumulates enough segments (default 8), they merge into the next level. Fewer segments means fewer posting lists to consult per query term, which directly reduces query latency. This is the same write-optimized-memtable / read-optimized-segment pattern used in RocksDB [5] and other LSM-based engines, adapted here for Postgres's page-based storage.&lt;/p&gt;
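&lt;p&gt;The leveled compaction scheme can be sketched as a toy model, using the default fan-out of 8 described above (segment contents are stand-in lists; real segments hold compressed posting lists):&lt;/p&gt;

```python
FANOUT = 8  # default: a level compacts once it holds this many segments

def add_segment(levels, segment):
    """Append a spilled segment to level 0, cascading merges upward."""
    levels.setdefault(0, []).append(segment)
    lvl = 0
    while len(levels.get(lvl, [])) >= FANOUT:
        # merge the full level into a single segment one level down
        merged = [entry for seg in levels[lvl] for entry in seg]
        levels[lvl] = []
        levels.setdefault(lvl + 1, []).append(sorted(merged))
        lvl += 1

levels = {}
for i in range(64):
    add_segment(levels, [i])  # 64 level-0 spills
```
&lt;p&gt;Sixty-four level-0 spills cascade into eight level-1 merges and finally a single level-2 segment, which is exactly why queries consult fewer posting lists as compaction proceeds.&lt;/p&gt;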
&lt;h3&gt;
  
  
  &lt;strong&gt;The write path: memtable&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The memtable lives in Postgres shared memory, one per index, accessible to all backends. It contains a string-interning hash table that stores each unique term exactly once; per-term posting lists recording document IDs and term frequencies; and corpus statistics (document count and average document length) maintained incrementally so that BM25 scores can be computed without a separate pass over the index.&lt;/p&gt;

&lt;p&gt;When the memtable exceeds a configurable threshold (default: 32M posting entries), it spills to a Level-0 disk segment at transaction commit. A secondary trigger (default: 100K unique terms per transaction) handles large single-transaction loads like bulk imports.&lt;/p&gt;

&lt;p&gt;The memtable is rebuilt from the heap on startup. Since the heap is WAL-logged, no data is lost if Postgres crashes before a spill completes. This is analogous to how a write-ahead log protects an LSM memtable, except here the WAL is Postgres's own. The rebuild cost is proportional to the amount of data not yet spilled to segments; for indexes where most data has been spilled, startup is fast.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3opgjv8tk3srcg31n64y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3opgjv8tk3srcg31n64y.png" alt="Fig. 2: pg_textsearch memtable write path" width="800" height="923"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig. 2: pg_textsearch memtable write path&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The read path: segments
&lt;/h3&gt;

&lt;p&gt;Segments are immutable and stored in standard Postgres pages. Each segment contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A term dictionary:&lt;/strong&gt; a sorted array of offsets into a string pool, binary-searchable for O(log n) term lookup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Posting blocks&lt;/strong&gt; of up to 128 documents each, containing delta-encoded doc IDs, packed term frequencies, and quantized document lengths (fieldnorms). A separate skip index stores one entry per posting block with upper-bound score metadata used by Block-Max WAND optimization (described below).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A fieldnorm table&lt;/strong&gt; mapping document lengths to 1-byte quantized values using Lucene/Tantivy's SmallFloat encoding [6]. This encoding is exact for lengths 0-39 (covering most short documents); for longer documents, quantization error increases from ~5% to ~11%. In practice, the impact on ranking is smaller than these numbers suggest: BM25 scores depend on the ratio of document length to average document length, which dampens quantization error, and the b parameter (default 0.75) further reduces length's influence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A doc ID to CTID mapping&lt;/strong&gt; that translates internal document IDs to Postgres tuple identifiers for heap fetches.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoua6q56wmqbqx7knt5f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoua6q56wmqbqx7knt5f.png" alt="Fig. 3: pg_textsearch segment internal structure" width="800" height="1304"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig. 3: pg_textsearch segment internal structure&lt;/em&gt;&lt;/p&gt;
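&lt;p&gt;Delta encoding works because doc IDs within a posting block are sorted: storing gaps instead of absolute values keeps the integers small, which is what makes the packed, SIMD-decodable representation compact. A sketch of the idea (the actual on-page bit packing is more involved):&lt;/p&gt;

```python
def delta_encode(doc_ids):
    """Store the first ID plus successive gaps; sorted input keeps gaps small."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def delta_decode(deltas):
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

ids = [1000, 1003, 1004, 1050, 1051]  # sorted doc IDs within one posting block
```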
&lt;h3&gt;
  
  
  Minimizing page access
&lt;/h3&gt;

&lt;p&gt;Storing data in Postgres pages means every access goes through the buffer manager. Even for pages already in cache, each access involves a buffer table lookup, pin acquisition, and lock handling. That overhead adds up in a scoring loop processing millions of postings. This constraint shaped several design decisions.&lt;/p&gt;

&lt;p&gt;Each segment assigns compact 4-byte, segment-local document IDs (0 to N-1), which map to Postgres's 6-byte CTIDs (heap tuple identifiers). After collecting all documents for a segment, doc IDs are reassigned so that doc_id order matches CTID order. Sequential iteration through posting lists then produces sequential access to the CTID mapping, maximizing cache locality. CTIDs themselves are stored as two separate arrays (4-byte page numbers and 2-byte offsets) rather than interleaved 6-byte records, doubling cache line utilization.&lt;/p&gt;

&lt;p&gt;The scoring loop works entirely with doc IDs, term frequencies, and fieldnorms. It never touches the CTID arrays. CTIDs are resolved only for the final top-k results in a single batched pass. A top-10 query that scores thousands of candidates resolves ten CTIDs, not thousands.&lt;/p&gt;
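&lt;p&gt;Both design points, the structure-of-arrays CTID layout and deferred resolution, can be sketched together (the class, names, and toy mapping are illustrative):&lt;/p&gt;

```python
class CtidMap:
    """CTIDs kept as parallel arrays (structure-of-arrays) rather than
    interleaved 6-byte records, so a scan of one field stays cache-dense."""
    def __init__(self):
        self.pages = []    # 4-byte heap page numbers
        self.offsets = []  # 2-byte line-pointer offsets

    def add(self, page, offset):
        self.pages.append(page)
        self.offsets.append(offset)

    def resolve(self, doc_ids):
        # called once, for the final top-k only -- never inside the scoring loop
        return [(self.pages[d], self.offsets[d]) for d in doc_ids]

ctids = CtidMap()
for d in range(1000):
    ctids.add(d // 100, d % 100)  # toy doc_id -> (page, offset) mapping

top_k = [7, 42, 905]  # winners from the scoring loop, as doc IDs only
```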
&lt;h3&gt;
  
  
  Postgres integration
&lt;/h3&gt;

&lt;p&gt;Because the index is stored in standard buffer-managed pages, pg_textsearch participates in Postgres infrastructure without special handling: MVCC visibility, proper rollback on abort, WAL and physical replication, &lt;code&gt;pg_dump / pg_upgrade&lt;/code&gt;, VACUUM with correct dead-entry removal, and planner hooks that detect the &lt;code&gt;&amp;lt;@&amp;gt;&lt;/code&gt; operator and select index scans automatically. Logical replication works in the usual way: row changes are replicated and the index is rebuilt on the subscriber.&lt;/p&gt;
&lt;h2&gt;
  
  
  Query Optimization: Block-Max WAND
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The top-k problem
&lt;/h3&gt;

&lt;p&gt;Naive BM25 evaluation scores every document matching any query term. For a 3-term query on MS-MARCO v2 (138M documents), this means decoding and scoring posting lists with tens of millions of entries. Most applications need only the top 10 or 100 results. The challenge is finding them without scoring everything.&lt;/p&gt;
&lt;h3&gt;
  
  
  Block-Max WAND
&lt;/h3&gt;

&lt;p&gt;pg_textsearch implements Block-Max WAND (BMW) [2], which uses block-level upper bounds to skip non-contributing posting blocks during top-k evaluation. Lucene adopted a similar approach in version 8.0 [7]. The core idea: maintain the score of the k-th best result seen so far as a threshold, and skip any posting block whose upper-bound score cannot exceed it.&lt;/p&gt;

&lt;p&gt;Each 128-document posting block has a corresponding skip entry storing the maximum term frequency in the block and the minimum fieldnorm (the shortest document, which would score highest for a given term frequency). From these two values, BMW can compute a tight upper bound on the block's BM25 contribution without decompressing it. If the upper bound falls below the current threshold, the entire block (all 128 documents) is skipped.&lt;/p&gt;

&lt;p&gt;To illustrate: consider a single-term top-10 query on a large corpus. After scanning a few thousand postings, the algorithm has accumulated 10 results with a minimum score of, say, 12.3. It now encounters a block where the upper-bound BM25 score (computed from the block's stored metadata) is 9.1. Since 9.1 &amp;lt; 12.3, no document in this block can enter the top 10, and the entire block is skipped without decompression. For short queries on large corpora, the vast majority of blocks are skipped this way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjzcaaou8sgoxsmo0q3b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjzcaaou8sgoxsmo0q3b.png" alt="Fig. 4: pg_textsearch Block-Max WAND visualization" width="800" height="591"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig. 4: pg_textsearch Block-Max WAND visualization&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  WAND pivot selection
&lt;/h3&gt;

&lt;p&gt;For multi-term queries, pg_textsearch adds the WAND algorithm [3] for cross-term skipping. Terms are ordered by their current document ID, and the algorithm identifies a pivot term: the first term whose cumulative maximum score exceeds the current threshold. All terms before the pivot advance to at least the pivot's current doc ID, skipping entire ranges of documents across multiple posting lists simultaneously, before block-level BMW bounds are even checked. For multi-term queries, BMW compares the sum of per-term block upper bounds against the threshold, extending the single-term logic described above.&lt;/p&gt;

&lt;p&gt;The combination of WAND (cross-term skipping) and BMW (within-list block skipping) is most effective for short queries (1-4 terms), which account for the majority of real-world search traffic. In the full MS-MARCO v1 query set (1,010,916 queries from Bing), 72.6% have 2-4 lexemes after English stemming and stopword removal, with a mean of 3.7 and a mode of 3. The speedup narrows for longer queries, where more blocks contain at least one term with a potentially high-scoring document. Grand et al. [7] observe the same pattern in Lucene's BMW implementation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Compression and Storage
&lt;/h2&gt;

&lt;p&gt;Posting blocks use a compression scheme designed for fast random-access decoding. Doc IDs are delta-encoded (storing differences between consecutive IDs rather than absolute values), then packed with variable-width bitpacking: the maximum delta in the block determines the bit width, and all deltas use that width. Term frequencies are packed separately with their own bit width. Fieldnorms are the 1-byte SmallFloat values described above.&lt;/p&gt;

&lt;p&gt;The bitpack decode path uses branchless direct-indexed uint64 loads rather than a byte-at-a-time accumulator, eliminating branch misprediction in the inner decode loop. Where available, SIMD intrinsics (SSE2 on x86-64, NEON on ARM64) accelerate the mask-and-store step. A scalar fallback handles other platforms.&lt;/p&gt;

&lt;p&gt;Compression reduces index size by 41% compared to uncompressed storage. Decode overhead is approximately 6% of query time (measured by profiling), which is more than offset by reduced buffer cache pressure. The scheme prioritizes decode speed over compression ratio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A note on index size comparisons:&lt;/strong&gt; pg_textsearch does not store term positions, so it cannot support phrase queries natively (see Limitations). This makes its indexes inherently smaller than engines like Tantivy that store positions by default. The 19-26% size advantage reported in our benchmarks reflects both compression and this feature difference.&lt;/p&gt;
&lt;h2&gt;
  
  
  Parallel Index Build
&lt;/h2&gt;

&lt;p&gt;For large tables, serial index construction can take hours. pg_textsearch uses Postgres's built-in parallel worker infrastructure to distribute the work.&lt;/p&gt;

&lt;p&gt;The leader launches workers and assigns each a range of heap blocks. Workers scan their assigned blocks, tokenize documents via &lt;code&gt;to_tsvector&lt;/code&gt;, build local in-memory indexes, and write intermediate segments to temporary BufFiles. The leader then performs an N-way merge of all worker output, writing a single merged segment directly to index pages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61y9a8j5equ8ngyu0z4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61y9a8j5equ8ngyu0z4d.png" alt="Fig. 5: pg_textsearch Parallel Index Build" width="800" height="994"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig. 5: pg_textsearch Parallel Index Build&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Workers run concurrently in the scan/tokenize/build phase; the leader merges sequentially. The expensive part (heap scanning, tokenization, posting list assembly) is CPU-bound and parallelizes well. The merge/write phase is comparatively cheap, so a serial merge captures most of the speedup with minimal complexity. It also produces a single fully-compacted segment that is optimal for query performance.&lt;/p&gt;

&lt;p&gt;On MS-MARCO v2 (138M passages), 15 workers complete the build in 17 minutes 37 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;max_parallel_maintenance_workers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;maintenance_work_mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'256MB'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;passages&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Methodology
&lt;/h3&gt;

&lt;p&gt;All benchmarks use the MS-MARCO passage ranking dataset [8], a standard information retrieval benchmark drawn from real Bing search queries. We compare pg_textsearch against ParadeDB v0.21.6 (which wraps Tantivy). Both extensions use their default configurations; Postgres tuning is specified per experiment. Both systems configure English stemming and stopword removal.&lt;/p&gt;

&lt;p&gt;Queries are drawn uniformly from 8 token-count buckets (100 queries per bucket on v1; up to 100 per bucket on v2). Weighted-average metrics use the MS-MARCO v1 lexeme distribution as weights, reflecting real search traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache state.&lt;/strong&gt; All query benchmarks are warm-cache: a warmup pass runs before timing begins, and the working set fits in the OS page cache and shared_buffers for all configurations tested. Results reflect CPU and algorithmic efficiency, not I/O. We have not benchmarked memory-constrained configurations where the index exceeds available cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ranking.&lt;/strong&gt; Both systems produce BM25 rankings using the same tokenization (English stemming and stopwords). We have not performed a systematic ranking equivalence comparison; both implement standard BM25 with the same default parameters (k1 = 1.2, b = 0.75), but differences in IDF computation and tokenization edge cases may produce different orderings for some queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  MS-MARCO query length distribution
&lt;/h3&gt;

&lt;p&gt;The following histogram shows the distribution of query lengths in the full MS-MARCO v1 query set (1,010,916 queries), measured in lexemes after English stopword removal and stemming via Postgres &lt;code&gt;to_tsvector('english')&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uhedx6bps3xuzxjkgny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uhedx6bps3xuzxjkgny.png" alt="Fig. 6: MS-MARCO query length histogram" width="800" height="432"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig. 6: MS-MARCO query length histogram&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This distribution is broadly consistent with web search query length studies [9, 10]. The MS-MARCO mean of 3.7 lexemes (after stemming/stopword removal) corresponds to roughly 5–6 raw words, consistent with the corpus statistics reported by Nguyen et al. [8]. We use the v1 distribution for weighting throughout as it provides the largest sample.&lt;/p&gt;
&lt;h3&gt;
  
  
  Results: MS-MARCO v2 (138M passages)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Environment.&lt;/strong&gt; Dedicated c6i.4xlarge EC2 instance: Intel Xeon Platinum 8375C, 8 cores / 16 threads, 123 GB RAM, NVMe SSD. Postgres 17.4 with shared_buffers = 31 GB. Both indexes fit in the buffer cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Index build:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;pg_textsearch&lt;/th&gt;
&lt;th&gt;ParadeDB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Index size&lt;/td&gt;
&lt;td&gt;17 GB&lt;/td&gt;
&lt;td&gt;23 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build time&lt;/td&gt;
&lt;td&gt;17 min 37 sec&lt;/td&gt;
&lt;td&gt;8 min 55 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documents&lt;/td&gt;
&lt;td&gt;138,364,158&lt;/td&gt;
&lt;td&gt;138,364,158&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel workers&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pg_textsearch index is 26% smaller. ParadeDB builds approximately 2x faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single-client query latency (p50 median, top-10 queries):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Lexemes&lt;/th&gt;
&lt;th&gt;pg_textsearch (ms)&lt;/th&gt;
&lt;th&gt;ParadeDB (ms)&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;5.11&lt;/td&gt;
&lt;td&gt;59.83&lt;/td&gt;
&lt;td&gt;11.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;9.14&lt;/td&gt;
&lt;td&gt;59.65&lt;/td&gt;
&lt;td&gt;6.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;20.04&lt;/td&gt;
&lt;td&gt;77.62&lt;/td&gt;
&lt;td&gt;3.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;41.92&lt;/td&gt;
&lt;td&gt;98.89&lt;/td&gt;
&lt;td&gt;2.4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;67.76&lt;/td&gt;
&lt;td&gt;125.38&lt;/td&gt;
&lt;td&gt;1.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;102.82&lt;/td&gt;
&lt;td&gt;148.78&lt;/td&gt;
&lt;td&gt;1.4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;159.37&lt;/td&gt;
&lt;td&gt;169.65&lt;/td&gt;
&lt;td&gt;1.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8+&lt;/td&gt;
&lt;td&gt;177.95&lt;/td&gt;
&lt;td&gt;190.47&lt;/td&gt;
&lt;td&gt;1.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The same pattern holds: pg_textsearch is fastest on short queries and the systems converge at longer lengths. Weighted by the MS-MARCO v1 query length distribution, the overall p50 is 40.6 ms for pg_textsearch vs. 94.4 ms for ParadeDB, a 2.3x advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrent throughput.&lt;/strong&gt; We ran pgbench with 16 parallel clients for 60 seconds (after a 5-second warmup). Each client repeatedly executes a query drawn at random from a weighted pool of 1,000 queries:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;pg_textsearch&lt;/th&gt;
&lt;th&gt;ParadeDB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transactions/sec&lt;/td&gt;
&lt;td&gt;198.7&lt;/td&gt;
&lt;td&gt;22.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average latency&lt;/td&gt;
&lt;td&gt;81 ms&lt;/td&gt;
&lt;td&gt;701 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total transactions (60s)&lt;/td&gt;
&lt;td&gt;11,969&lt;/td&gt;
&lt;td&gt;1,387&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;pg_textsearch sustains 8.7x higher throughput under concurrent load.&lt;/p&gt;
&lt;h3&gt;
  
  
  Results: MS-MARCO v1 (8.8M passages)
&lt;/h3&gt;

&lt;p&gt;On the smaller dataset (GitHub Actions runner, 7 GB RAM, Postgres 17), the advantages are more pronounced: 26x speedup for single-token queries, 14x for 2-token, 7.3x for 4-token. Total sequential execution time for all 800 queries: 6.5 seconds for pg_textsearch vs. 25.2 seconds for ParadeDB. Full results and methodology are available at the &lt;a href="https://timescale.github.io/pg_textsearch/benchmarks/" rel="noopener noreferrer"&gt;&lt;u&gt;benchmarks&lt;/u&gt;&lt;/a&gt; page.&lt;/p&gt;
&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Latency vs. query length
&lt;/h3&gt;

&lt;p&gt;The speedup correlates strongly with query length: 11.7x for single-token queries on v2, narrowing to 1.1x at 8+ tokens. This is the expected behavior of dynamic pruning algorithms like BMW and WAND. Grand et al. [7] observe the same pattern in Lucene's BMW implementation.&lt;/p&gt;

&lt;p&gt;The practical significance depends on the workload's query length distribution. 72.6% of MS-MARCO queries have 2-4 lexemes, the range where pg_textsearch shows its largest advantage (6.5x to 2.4x on v2). Weighted by this distribution, the overall speedup is 2.3x on v2 and 3.9x on v1.&lt;/p&gt;
&lt;h3&gt;
  
  
  Concurrent throughput
&lt;/h3&gt;

&lt;p&gt;The concurrent throughput advantage (8.7x) substantially exceeds the single-client advantage (2.3x weighted p50). pg_textsearch queries execute as C code operating on Postgres buffer pages, with all memory management handled by Postgres's buffer cache. ParadeDB routes queries through Rust/C FFI into Tantivy, which manages its own memory and I/O outside the buffer pool. We have not profiled ParadeDB's internals, so we cannot attribute the concurrency gap to specific causes, but the architectural difference (shared buffer cache vs. separate memory management) is a plausible contributor. ParadeDB's concurrent performance may also improve in future versions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Where ParadeDB is faster
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Index build time.&lt;/strong&gt; ParadeDB builds indexes 1.6-2x faster across both datasets. Tantivy's indexer is highly optimized Rust code with its own I/O management, not constrained by Postgres's page-based storage. Build time is a one-time cost per index (or per REINDEX); it does not affect query performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long queries.&lt;/strong&gt; At 7+ lexemes, the two systems converge. On v2, the 8+ lexeme p50 is 178 ms for pg_textsearch vs. 190 ms for ParadeDB. These long queries represent ~3.7% of the MS-MARCO distribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Index size caveat.&lt;/strong&gt; pg_textsearch indexes are 19-26% smaller, but this comparison is not apples-to-apples: pg_textsearch does not store term positions, while ParadeDB stores positions to support phrase queries.&lt;/p&gt;
&lt;h3&gt;
  
  
  Benchmark limitations
&lt;/h3&gt;

&lt;p&gt;All measurements are warm-cache on datasets that fit in memory. The 100-query sample per bucket provides directional results but limited statistical power for tail latencies. ParadeDB v0.21.6 was current at time of testing; future versions may improve. We compare against ParadeDB because it is the primary Postgres-native BM25 alternative; standalone engines like Elasticsearch operate in a different deployment model. We have not benchmarked write-heavy workloads with concurrent queries.&lt;/p&gt;
&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;We want to be clear about what pg_textsearch does not support in 1.0.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No phrase queries.&lt;/strong&gt; The index stores term frequencies but not term positions, so it cannot natively evaluate queries like "database system" as a phrase. Phrase matching can be done with a post-filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;@&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'database system'&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="c1"&gt;-- over-fetch to compensate for post-filter&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%database system%'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OR-only query semantics.&lt;/strong&gt; All query terms are implicitly OR'd. A query for "database system" matches documents containing either term. We plan to add AND/OR/NOT operators via a dedicated boolean query syntax in a post-1.0 release.&lt;/p&gt;
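
&lt;p&gt;Until those operators land, AND semantics can be approximated the same way as phrase matching: over-fetch by BM25 rank, then post-filter. One sketch (schema illustrative) leans on Postgres's built-in &lt;code&gt;@@&lt;/code&gt; match, since &lt;code&gt;plainto_tsquery&lt;/code&gt; ANDs its terms together:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT * FROM (
  SELECT * FROM documents
  ORDER BY content &amp;lt;@&amp;gt; 'database system'
  LIMIT 100  -- over-fetch, then require every term
) sub
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'database system')
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;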

&lt;p&gt;&lt;strong&gt;No highlighting or snippet generation.&lt;/strong&gt; Use Postgres's &lt;code&gt;ts_headline()&lt;/code&gt; on the result set for highlighting.&lt;/p&gt;
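
&lt;p&gt;For example (schema illustrative), a snippet column can be layered on top of the BM25-ranked results:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- ts_headline runs only on the ten rows the index returns.
SELECT id,
       ts_headline('english', content,
                   plainto_tsquery('english', 'database system')) AS snippet
FROM documents
ORDER BY content &amp;lt;@&amp;gt; 'database system'
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;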

&lt;p&gt;&lt;strong&gt;No expression indexing.&lt;/strong&gt; Each BM25 index covers a single text column. Workaround: create a generated column concatenating multiple fields.&lt;/p&gt;
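
&lt;p&gt;A sketch of that workaround, assuming hypothetical &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;body&lt;/code&gt; columns:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Concatenate the fields into a stored generated column, then index that.
ALTER TABLE documents
  ADD COLUMN search_text text
  GENERATED ALWAYS AS (coalesce(title, '') || ' ' || coalesce(body, '')) STORED;

CREATE INDEX ON documents USING bm25(search_text) WITH (text_config = 'english');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;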

&lt;p&gt;&lt;strong&gt;Partition-local statistics.&lt;/strong&gt; Each partition maintains its own IDF and average document length. Cross-partition queries return scores computed independently per partition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No background compaction.&lt;/strong&gt; Segment compaction runs synchronously during memtable spill. Write-heavy workloads may observe compaction latency. Background compaction is planned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PL/pgSQL requires explicit index names.&lt;/strong&gt; The implicit text &lt;code&gt;&amp;lt;@&amp;gt; 'query'&lt;/code&gt; syntax relies on planner hooks that do not fire inside PL/pgSQL, DO blocks, or stored procedures. Use &lt;code&gt;to_bm25query('query', 'index_name')&lt;/code&gt; explicitly. This is a practical limitation many developers will hit.&lt;/p&gt;
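
&lt;p&gt;A minimal sketch of the explicit form (the function, table, and index names are hypothetical, and the exact call shape should be checked against the docs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE FUNCTION search_docs(q text)
RETURNS SETOF documents AS $$
BEGIN
  -- Inside PL/pgSQL the planner hook does not fire, so name the index explicitly.
  RETURN QUERY
    SELECT * FROM documents
    ORDER BY content &amp;lt;@&amp;gt; to_bm25query(q, 'documents_content_idx')
    LIMIT 10;
END;
$$ LANGUAGE plpgsql;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;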

&lt;p&gt;&lt;strong&gt;shared_preload_libraries required.&lt;/strong&gt; pg_textsearch must be listed in shared_preload_libraries, requiring a server restart to install. On Tiger Cloud, this is handled automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No fuzzy matching or typo tolerance.&lt;/strong&gt; pg_textsearch uses Postgres's standard text search configurations for tokenization and stemming but does not provide built-in fuzzy matching. Typo-tolerant search requires a separate approach (e.g., pg_trgm).&lt;/p&gt;
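
&lt;p&gt;A rough sketch of the pg_trgm route (a complementary trigram index, separate from pg_textsearch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON documents USING gin (content gin_trgm_ops);

-- The trigram index serves substring and typo-tolerant lookups via ILIKE.
SELECT id FROM documents
WHERE content ILIKE '%databse%'
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;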

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Planned work for post-1.0 releases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boolean query operators: AND, OR, NOT via a dedicated query syntax&lt;/li&gt;
&lt;li&gt;Background compaction: decouple compaction from the write path&lt;/li&gt;
&lt;li&gt;Expression index support: index computed expressions, not just bare columns&lt;/li&gt;
&lt;li&gt;Dictionary compression: front-coding for terms, reducing dictionary size&lt;/li&gt;
&lt;li&gt;Improved write concurrency: better throughput for sustained insert-heavy workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;pg_textsearch requires Postgres 17 or 18. The fastest way to try it is on &lt;a href="https://www.tigerdata.com/search" rel="noopener noreferrer"&gt;&lt;u&gt;Tiger Cloud&lt;/u&gt;&lt;/a&gt;, where it is already installed and configured. No setup, no shared_preload_libraries. Create a service and run the example below.&lt;/p&gt;

&lt;p&gt;For self-hosted installations, pre-built binaries for Linux and macOS (amd64, arm64) are available on the &lt;a href="https://github.com/timescale/pg_textsearch/releases" rel="noopener noreferrer"&gt;&lt;u&gt;GitHub Releases page&lt;/u&gt;&lt;/a&gt;. Add it to shared_preload_libraries and restart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;shared_preload_libraries&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'pg_textsearch'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
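
&lt;p&gt;After restarting, a minimal end-to-end check might look like this (the table and data are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE EXTENSION pg_textsearch;

CREATE TABLE documents (id bigint GENERATED ALWAYS AS IDENTITY, content text);
INSERT INTO documents (content) VALUES
  ('Postgres is a relational database system'),
  ('BM25 ranks documents by term relevance');

CREATE INDEX ON documents USING bm25(content) WITH (text_config = 'english');

SELECT id, content FROM documents
ORDER BY content &amp;lt;@&amp;gt; 'database system'
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;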



&lt;p&gt;Source code and full documentation: &lt;a href="https://github.com/timescale/pg_textsearch" rel="noopener noreferrer"&gt;&lt;u&gt;github.com/timescale/pg_textsearch&lt;/u&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Part 2 of this series covers getting started with pg_textsearch, hybrid search with pgvectorscale, and production patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] Robertson et al. "Okapi at TREC-3." 1994. See also: Robertson, Zaragoza. "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in IR, 3(4):333-389, 2009.&lt;/p&gt;

&lt;p&gt;[2] Ding, Suel. "Faster top-k document retrieval using block-max indexes." SIGIR 2011, pp. 993-1002.&lt;/p&gt;

&lt;p&gt;[3] Broder et al. "Efficient query evaluation using a two-level retrieval process." CIKM 2003, pp. 426-434.&lt;/p&gt;

&lt;p&gt;[4] O'Neil et al. "The log-structured merge-tree (LSM-tree)." Acta Informatica, 33(4):351-385, 1996.&lt;/p&gt;

&lt;p&gt;[5] Facebook. "RocksDB: A Persistent Key-Value Store for Fast Storage Environments." &lt;a href="https://rocksdb.org/" rel="noopener noreferrer"&gt;https://rocksdb.org/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] SmallFloat encoding: Apache Lucene SmallFloat.java. Tantivy uses an equivalent implementation.&lt;/p&gt;

&lt;p&gt;[7] Grand et al. "From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance." ECIR 2020.&lt;/p&gt;

&lt;p&gt;[8] Nguyen et al. "MS MARCO: A Human Generated MAchine Reading COmprehension Dataset." 2016.&lt;/p&gt;

&lt;p&gt;[9] Statista. "Distribution of online search queries in the US, February 2020, by number of search terms."&lt;/p&gt;

&lt;p&gt;[10] Dean. "We Analyzed 306M Keywords." Backlinko, 2024.&lt;/p&gt;

</description>
      <category>database</category>
      <category>webdev</category>
      <category>postgres</category>
      <category>searchengine</category>
    </item>
    <item>
      <title>How to Break Your PostgreSQL IIoT Database and Learn Something in the Process</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Mon, 30 Mar 2026 17:42:43 +0000</pubDate>
      <link>https://dev.to/tigerdata/how-to-break-your-postgresql-iiot-database-and-learn-something-in-the-process-n2d</link>
      <guid>https://dev.to/tigerdata/how-to-break-your-postgresql-iiot-database-and-learn-something-in-the-process-n2d</guid>
      <description>&lt;p&gt;As engineers, we're taught to design for reliability. We do design calculations, run simulations, build and test prototypes, and even then we recognize that these are imperfect, so we include safety factors. When it comes to the Industrial Internet of Things (IIoT) though, we rarely give the same level of scrutiny to the components that we rely on.&lt;/p&gt;

&lt;p&gt;What if we treated our IIoT database the same way we treat the physical things we produce? We design and build a prototype database, and then &lt;a href="https://www.tigerdata.com/blog/postgres-optimization-treadmill/" rel="noopener noreferrer"&gt;put it through some serious testing&lt;/a&gt;, even to failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Value (and Perils) of Stress Testing
&lt;/h2&gt;

&lt;p&gt;Think of database stress testing as a destructive materials test for your data storage. You wouldn't trust a bridge made of untested steel, so don’t trust your database until you know its limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Value:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Identify Bottlenecks:&lt;/strong&gt;  Stress testing reveals the weak links—what is likely to fail first? Will you run out of storage? Will your queries get bogged down? Or will you hit the dreaded ingest wall (when data comes in faster than it can be stored)?&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Determine Real-World Behaviour:&lt;/strong&gt;  You'll find out exactly how your database performance changes as the amount of data increases. What issues are future-you going to struggle with?&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.tigerdata.com/blog/postgres-optimization-treadmill/" rel="noopener noreferrer"&gt;&lt;strong&gt;Optimize Configuration&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt;  Just as you might build a few different prototypes and see how each affects failure modes, changing your database configuration, especially when it comes to indices, can dramatically affect how it behaves. Building a rigorous stress testing framework provides a safe way to optimize your design.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope it goes without saying, but please, please don’t run this on your production environment. Even if it’s technically a different database but the same hardware, this test can wreak havoc on your resources and crash your system. You’ve been warned.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Measure?
&lt;/h2&gt;

&lt;p&gt;There’s no point going through all the effort to break your system if you don’t learn anything. Assuming you’re using a PostgreSQL database (&lt;a href="https://www.tigerdata.com/blog/its-2026-just-use-postgres" rel="noopener noreferrer"&gt;It’s 2026, Just Use PostgreSQL&lt;/a&gt;), here is a decent set of metrics to keep track of while you’re putting your database through its paces.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table Size
&lt;/h3&gt;

&lt;p&gt;The size of a PostgreSQL table is generally measured in rows, but the actual space it occupies on disk is the sum of the heap (the main relational table), the indices, and the TOAST (out-of-line storage for oversized values).&lt;/p&gt;

&lt;p&gt;The following query returns the number of rows as well as the size of each component of the table, in bytes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="n"&gt;reltuples&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;row_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'iiot_history'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;heap_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;pg_indexes_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'iiot_history'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;indices_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;pg_table_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'iiot_history'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;
            &lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'iiot_history'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;toast_size&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'iiot_history'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason for the odd row_count is that counting rows the standard way, with COUNT(*), requires scanning the whole table, which is painfully slow when we’re building a table big enough to break things. The reltuples value is the planner’s estimate, kept current by VACUUM and ANALYZE, and it’s plenty accurate for this purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table Performance
&lt;/h3&gt;

&lt;p&gt;The best way to measure table performance is to use the actual queries that your production system will use. At a minimum, this should include your batched INSERT (you always batch, right?) and at least one common SELECT. Keep in mind that for a table with N rows, query timings tend to be constant, O(log N), O(N), or worse, depending on how the indices are structured.&lt;/p&gt;
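
&lt;p&gt;As a sketch of what "batched" means here (the column names are assumed; adapt them to your schema), one multi-row INSERT per batch replaces hundreds of single-row statements:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- One round trip for many readings beats one INSERT per reading.
INSERT INTO iiot_history (tag_id, ts, value) VALUES
  (101, now(), 23.4),
  (102, now(), 19.7),
  (103, now(), 0.82);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;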

&lt;p&gt;You can get very accurate timing info from running your queries with the prefix EXPLAIN ANALYZE, and it’s worth doing this at least once to see what the database is doing under the hood. However, I recommend running the whole test with a scripting language and then just timing the execution of that particular step.&lt;/p&gt;
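
&lt;p&gt;For example (column names are assumed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Shows the chosen plan plus actual timings for each node.
EXPLAIN ANALYZE
SELECT avg(value)
FROM iiot_history
WHERE tag_id = 101
  AND ts &amp;gt; now() - interval '1 hour';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;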

&lt;h3&gt;
  
  
  Server Performance
&lt;/h3&gt;

&lt;p&gt;Don’t forget the engine that’s driving all this machinery. You’ll need to watch the CPU, Memory, Storage, and Network Bandwidth. People in the IT world tend to talk about headroom for a server, and that’s what you’re really looking at: how much spare capacity do you have? Your CPU and Memory usage might spike at times, but the important thing is that it’s not always running at max capacity.&lt;/p&gt;

&lt;p&gt;There are a lot of free and paid tools to monitor these variables. I almost always do this type of test in a VM (easier to clean up the mess when it all breaks) and I like to use &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;, but honestly, Perfmon on Windows or top on Linux gives you all you really need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Limits
&lt;/h3&gt;

&lt;p&gt;It’s helpful to set some limits on these parameters so you know when to stop the test. For database size, it might be some measurement like a year's worth of data, or when the drive is 80% full. For ingest timing, I suggest stopping when inserting takes longer than the desired ingest frequency—this is the ingest bottleneck and something you really want to avoid in production. Scan times can be limited by the time it takes for a specific query. Maybe calculating the average value from one tag over the past hour must be less than 10s.&lt;/p&gt;
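
&lt;p&gt;These stop conditions are easy to encode up front. A sketch, where the 500&amp;nbsp;GB drive, 1&amp;nbsp;s ingest interval, and 10&amp;nbsp;s query ceiling are illustrative assumptions you’d replace with your own limits:&lt;/p&gt;

```python
def first_limit_hit(db_size_bytes, insert_seconds, query_seconds,
                    drive_bytes=500e9,       # assumed 500 GB drive
                    ingest_interval_s=1.0,   # desired ingest frequency
                    max_query_s=10.0):       # e.g. 1-hour average under 10 s
    """Return the name of the first failure condition reached, or None."""
    if db_size_bytes >= 0.8 * drive_bytes:   # drive 80% full
        return "database size"
    if insert_seconds >= ingest_interval_s:  # the ingest bottleneck
        return "ingest bottleneck"
    if query_seconds >= max_query_s:
        return "query too slow"
    return None
```

&lt;p&gt;The check order reflects severity: running out of disk is fatal, while a slow query is merely a design problem.&lt;/p&gt;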

&lt;h2&gt;
  
  
  How to Simulate Data?
&lt;/h2&gt;

&lt;p&gt;There are lots of ways to insert data, but it’s usually a tradeoff between how well the data represents real scenarios and how long it takes to run the test.&lt;/p&gt;

&lt;p&gt;The following is one of my favourite methods for injecting large amounts of data into an IIoT database:&lt;/p&gt;

&lt;p&gt;Say you have a classic IIoT history table like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;iiot_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tag_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="nb"&gt;DOUBLE&lt;/span&gt; &lt;span class="nb"&gt;PRECISION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you expect to ingest 10,000 tags at 1s intervals, you can use the following INSERT query to add a day’s worth of history to the back end of your table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;iiot_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;min_date&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;min_date&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1s'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;LEAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;min_date&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;iiot_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tag_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This generates a random value for every second of a day and for every tag_id from 1 to 10,000. Not as interesting as real data, but enough to fill up your table.&lt;/p&gt;
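
&lt;p&gt;It’s worth doing the arithmetic on what each run adds, because it grows fast. A quick sketch; the bytes-per-row figure is a rough assumption lumping together tuple headers, the three columns, and index entries, so treat the size as order-of-magnitude only:&lt;/p&gt;

```python
def rows_per_day(tag_count, interval_s):
    """Rows added by one day of history at the given sample interval."""
    return tag_count * int(86_400 / interval_s)

def raw_bytes_estimate(rows, bytes_per_row=50):
    """Very rough on-disk estimate; 50 bytes/row is an assumption covering
    tuple overhead, the three columns, and the primary-key index."""
    return rows * bytes_per_row

rows = rows_per_day(10_000, 1.0)   # rows for one simulated day
gb = raw_bytes_estimate(rows) / 1e9
print(rows, gb)
```

&lt;p&gt;At 10,000 tags on 1&amp;nbsp;s intervals, each run of the query adds 864 million rows, so a handful of iterations is enough to stress any reasonable disk.&lt;/p&gt;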

&lt;p&gt;The nice thing about this query is that you should be able to run it in parallel with your real-time data pipeline and it won’t mess with your data (aside from potentially locking your table while it runs). It’s also easy to modify this query to inject more or fewer tags, as well as change the time interval, if you’re playing around with different configurations.&lt;/p&gt;

&lt;p&gt;If you use this query, or whichever one you prefer, in a script (I usually use Python), then you can automate the whole test. Something along the lines of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Get database size&lt;/li&gt;
&lt;li&gt; Run select queries, measure execution time&lt;/li&gt;
&lt;li&gt; Run insert queries several times, measure and average execution time&lt;/li&gt;
&lt;li&gt; Artificially grow database size&lt;/li&gt;
&lt;li&gt; Repeat 1-3 until one of the failure conditions is reached.&lt;/li&gt;
&lt;/ol&gt;
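
&lt;p&gt;Wired together with injected callables, the loop above might look like the sketch below. The callables are hypothetical stand-ins; in a real script each would wrap the SQL from earlier sections via your database driver of choice:&lt;/p&gt;

```python
def run_growth_test(get_db_size, run_selects, run_inserts, grow_db,
                    limit_reached, max_rounds=1000):
    """Steps 1-5: measure, grow, repeat until a failure condition hits."""
    rounds = []
    for _ in range(max_rounds):
        size = get_db_size()        # 1. get database size
        select_s = run_selects()    # 2. time the SELECT queries
        insert_s = run_inserts()    # 3. time (and average) INSERT queries
        rounds.append((size, select_s, insert_s))
        if limit_reached(size, select_s, insert_s):
            break
        grow_db()                   # 4. artificially grow the database
    return rounds                   # 5. repeated until a limit was reached
```

&lt;p&gt;Returning the per-round measurements gives you the raw series to plot query time against table size, which is where the interesting trends show up.&lt;/p&gt;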

&lt;h2&gt;
  
  
  How to Interpret Results and What to Expect in the Real World?
&lt;/h2&gt;

&lt;p&gt;Your test results will give you some clear data points, but you still need to do some interpreting.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Identify the Limiting Component:&lt;/strong&gt;  Where did the database fail? If it’s a query that took too long, you might be able to speed things up with a clever index. If it’s an insert that took too long, you might be able to speed things up by removing that clever index you added earlier.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Optimize:&lt;/strong&gt;  There’s a lot you can do to improve table performance before throwing the whole thing out in frustration:

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Proper Indexing:&lt;/strong&gt;  Choosing an index is almost always a tradeoff, for example: Indexing the tag_id column before the time column will speed up most queries, at the cost of slower inserts as the table grows. Indexing the time column first will avoid the ‘ingest wall’ at the cost of slower queries. Figure out which solution is best.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Plan for the future:&lt;/strong&gt;  Will you need more hardware in a few months or a few years? Being able to estimate the life of your existing architecture means you won’t be caught unawares when it no longer suffices.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Partitioning/Chunking:&lt;/strong&gt;  For very large tables, you may need to partition appropriately (see PostgreSQL extensions like  &lt;a href="https://www.tigerdata.com/timescaledb" rel="noopener noreferrer"&gt;TimescaleDB&lt;/a&gt;). How great would it be to learn you’ll need this before you actually need it?&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Add a Safety Factor:&lt;/strong&gt;  If your test showed a maximum reliable throughput of 15,000 rows/sec, set your operational limit to 10,000 rows/sec. The real world has peaks, unexpected queries, and background maintenance tasks that will steal resources. Like we do with all engineering products, design with margin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you treat your database like a prototype and really put it through its paces, you’ll get a preview of how it’ll behave in the future and make good, proactive design decisions instead of struggling in the future. Now, go break something (and learn).&lt;/p&gt;

</description>
      <category>iot</category>
      <category>postgres</category>
      <category>industrial</category>
      <category>database</category>
    </item>
    <item>
      <title>What Developers Get Wrong About Storing Sensor Data</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Thu, 19 Mar 2026 14:08:03 +0000</pubDate>
      <link>https://dev.to/tigerdata/what-developers-get-wrong-about-storing-sensor-data-4e4m</link>
      <guid>https://dev.to/tigerdata/what-developers-get-wrong-about-storing-sensor-data-4e4m</guid>
      <description>&lt;h2&gt;
  
  
  Sensor Data Looks Simple Until It Isn’t
&lt;/h2&gt;

&lt;p&gt;Sensor data appears straightforward. It just has timestamps, numeric readings, and maybe a device identifier. Compared to transactional application data, sensor data feels uniform and predictable. Teams often assume they can store it using familiar relational database schemas and grow from there.&lt;/p&gt;

&lt;p&gt;That assumption falls apart as scale explodes. Devices multiply, sampling rates rise, and historical data accumulates indefinitely. Queries shift from single-row lookups to time windows and aggregations. Data arrives out of order. Storage costs grow relentlessly. Systems designed around transactional assumptions crack in ways that are difficult to correct once data volume locks the architecture in place.&lt;/p&gt;

&lt;p&gt;The root problem is conceptual. Sensor data looks like rows but behaves like a time-ordered stream whose value declines with age. Engineers must design the database as a time-series log with decay from the outset, rather than adapting it from a transactional model later. The following sections show how relational database approaches are inadequate for handling sensor data, and what a more suitable architecture looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Default Model: Treating Sensor Data Like Rows
&lt;/h2&gt;

&lt;p&gt;Most database developers approach sensor data with a transactional mindset. They design normalized schemas, enforce relational integrity, and add indexes for point queries. Those patterns work well for mutable business entities such as users or orders.&lt;/p&gt;

&lt;p&gt;Sensor data, however, is append-only. New measurements arrive continuously and are rarely updated. Sustained ingestion and time-range retrieval dominate, not row mutation or lookup. When schemas assume row-oriented access, inserts pay join and index overhead that grows with volume, and write throughput falls behind the incoming data flow.&lt;/p&gt;

&lt;p&gt;Treating sensor data as rows creates problems precisely where sensor systems spend most of their effort: writing and scanning time-ordered streams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where That Model Breaks
&lt;/h2&gt;

&lt;p&gt;As the system grows, several problems appear simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, ingestion is continuous and bursty. Devices reconnect and flush buffers, producing spikes rather than steady flows. Row-oriented schemas struggle to absorb these bursts efficiently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, growth compounds across multiple axes: more devices, higher sampling frequency, additional metrics, and longer retention. Storage volume grows quickly, turning early schema choices into long-term constraints because migrating historical time-series data is costly and risky.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third&lt;/strong&gt;, queries shift toward time windows. Monitoring, analytics, and diagnostics rely on ranges, aggregates, and rates over time rather than individual rows. Row-optimized indexing performs poorly for these scans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fourth&lt;/strong&gt;, operational realities inevitably create problems. Timestamps arrive late or out of sequence. Data must be replayed or corrected. Systems designed for ordered inserts encounter fragmentation and duplication under these conditions.&lt;/p&gt;

&lt;p&gt;Each constraint highlights the same reality. Sensor workloads are shaped by time and continuity, not by relational identity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Insight: Sensor Data Is a Log With Decay
&lt;/h2&gt;

&lt;p&gt;Sensor data has two defining properties.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It is a log: append-only, time-indexed, and rarely modified after arrival.&lt;/li&gt;
&lt;li&gt;It decays: its value decreases as it ages, even as its volume accumulates.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Recent data serves high-resolution monitoring and debugging. Older data supports trends and aggregates. Very old data is rarely queried except in a summarized form. Yet without lifecycle awareness, systems retain all data at equal resolution and cost.&lt;/p&gt;
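
&lt;p&gt;Downsampling is what "summarized form" usually means in practice. A minimal sketch of the idea; the window size and the choice of a plain average are assumptions, and real systems typically also keep min/max per window:&lt;/p&gt;

```python
def downsample(samples, window_s):
    """Average (epoch_seconds, value) samples into fixed time windows.

    Returns a sorted list of (window_start_seconds, mean_value) pairs,
    the kind of summary that replaces raw points as data ages.
    """
    buckets = {}
    for ts, value in samples:
        start = int(ts // window_s) * window_s
        buckets.setdefault(start, []).append(value)
    return sorted((start, sum(vals) / len(vals))
                  for start, vals in buckets.items())

raw = [(0, 1.0), (1, 3.0), (60, 10.0), (61, 20.0)]
print(downsample(raw, 60))  # [(0, 2.0), (60, 15.0)]
```

&lt;p&gt;Four raw points collapse to two per-minute averages; applied fleet-wide, this is how storage cost tracks value instead of volume.&lt;/p&gt;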

&lt;p&gt;Once teams understand that sensor data is a &lt;strong&gt;log with decay&lt;/strong&gt; , the correct architecture becomes clear. Storage must optimize for append throughput and time-range access while permitting data to evolve in resolution and tier as it ages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time-Series Architecture
&lt;/h2&gt;

&lt;p&gt;Time-series data that loses value over time requires the database architecture to have a few key properties.&lt;/p&gt;

&lt;h3&gt;
  
  
  Log-optimized ingestion
&lt;/h3&gt;

&lt;p&gt;Writes must be sequential and batched, minimizing per-row overhead. Storage engines and schemas should favor append operations over update operations so ingestion scales with device fleets and burst conditions.&lt;/p&gt;
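
&lt;p&gt;On the client side, this translates into batching writes rather than inserting row by row. A sketch in Python; &lt;code&gt;flush&lt;/code&gt; is a hypothetical callable that would wrap a real multi-row INSERT, and the batch size is an assumption to tune against your ingest rate:&lt;/p&gt;

```python
class BatchWriter:
    """Buffer incoming readings and hand them off in batches, so the
    database sees a few large appends instead of many single-row writes."""

    def __init__(self, flush, batch_size=5000):
        self.flush = flush            # e.g. wraps one multi-row INSERT
        self.batch_size = batch_size  # tune against your ingest interval
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.drain()

    def drain(self):
        """Flush any buffered rows; call on a timer and at shutdown."""
        if self.buffer:
            self.flush(self.buffer)
            self.buffer = []
```

&lt;p&gt;Pairing the size threshold with a periodic &lt;code&gt;drain()&lt;/code&gt; keeps latency bounded even when devices go quiet mid-batch.&lt;/p&gt;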

&lt;h3&gt;
  
  
  Time-partitioned organization
&lt;/h3&gt;

&lt;p&gt;Data should be grouped primarily by time, aligning its physical storage with the dominant query patterns. Time partitioning keeps recent data localized and keeps historical segments compact and independent.&lt;/p&gt;
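
&lt;p&gt;The core mechanism is a pure function from timestamp to partition. A sketch with daily chunks (the chunk size is an assumption; systems like TimescaleDB’s hypertables perform this routing automatically):&lt;/p&gt;

```python
from datetime import datetime, timezone

def chunk_key(epoch_seconds, chunk_hours=24):
    """Map a timestamp to the start of its chunk, so rows that are close
    in time land in the same physical segment."""
    chunk_s = chunk_hours * 3600
    start = int(epoch_seconds // chunk_s) * chunk_s
    return datetime.fromtimestamp(start, tz=timezone.utc)

# Every reading from the same UTC day routes to the same chunk.
print(chunk_key(1_700_000_000))
```

&lt;p&gt;A time-range query then only has to open the chunks its bounds overlap, which is what keeps scans fast as history accumulates.&lt;/p&gt;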

&lt;h3&gt;
  
  
  Lifecycle tiering
&lt;/h3&gt;

&lt;p&gt;Because sensor data’s value declines with age, its resolution and storage cost should decline as well. High-resolution recent data stays hot, while older data is compressed, downsampled, or moved to cheaper storage tiers without losing analytical access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Role separation
&lt;/h3&gt;

&lt;p&gt;Operational monitoring, historical analytics, and archival retention create different latency and throughput challenges. Separating these roles prevents continuous ingestion from degrading analytical performance and allows each layer to evolve independently.&lt;/p&gt;

&lt;p&gt;These properties are not optimizations layered onto transactional storage. Instead, they are intentional design choices needed to handle the key aspects of time-series data: continuous append, time-range access, and aging value.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Enables for Developers
&lt;/h2&gt;

&lt;p&gt;Architectures aligned with time-series data change how systems scale and operate.&lt;/p&gt;

&lt;p&gt;Ingestion stays stable as fleets expand because write operations match append patterns rather than row mutation. Query cost stays predictable because time-range scans align with the storage layout. Storage growth stays bounded relative to insight because data resolution declines with age. Operational corrections and replays become routine rather than disruptive because logs tolerate disorder.&lt;/p&gt;

&lt;p&gt;Developers spend less effort compensating for schema problems and more effort deriving insight from data. Systems stay adaptable as deployments grow from prototypes to global fleets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Time-Series Architecture Becomes Inevitable
&lt;/h2&gt;

&lt;p&gt;Transactional database models are designed for mutable records whose value stays relatively stable over time. Sensor data is the opposite: immutable events whose volume grows continuously while their value declines with age. As ingestion becomes constant, queries become time-range-driven, and history accumulates indefinitely, databases built on transactional assumptions develop write bottlenecks, inefficient scans, and rising storage costs.&lt;/p&gt;

&lt;p&gt;Once teams understand that sensor data is just an append-only data stream with aging value, the architectural solution becomes clear. Systems must ingest sequentially, organize primarily by time, reduce resolution as data ages, and separate operational and historical workloads. These structures stem directly from how sensor data behaves, not from a preference for any particular technology.&lt;/p&gt;

&lt;p&gt;Treating sensor data as rows delays problems but does not fix them. As scale grows, transactional models diverge further from workload reality, while time-series architectures stay matched to it. Database design, therefore, can’t be retrofitted late without cost and disruption. It must start from the correct model: sensor data as a time-series log with decay.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>database</category>
      <category>iot</category>
      <category>performance</category>
    </item>
    <item>
      <title>Your Rails App Isn’t Slow—Your Database Is</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Tue, 06 May 2025 12:23:00 +0000</pubDate>
      <link>https://dev.to/tigerdata/your-rails-app-isnt-slow-your-database-is-o57</link>
      <guid>https://dev.to/tigerdata/your-rails-app-isnt-slow-your-database-is-o57</guid>
      <description>&lt;p&gt;In case you missed the quiet launch of our timescaledb-ruby gem, we’re here to remind you that you can now &lt;a href="https://www.timescale.com/blog/connecting-ruby-and-postgresql-timescale-integrations-expand" rel="noopener noreferrer"&gt;connect PostgreSQL and Ruby when using TimescaleDB&lt;/a&gt;. 🎉 This integration delivers a deeply integrated experience that will feel natural to Ruby and Rails developers.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Scale Your Rails App Analytics with TimescaleDB
&lt;/h2&gt;

&lt;p&gt;If you’ve worked with Rails for any length of time, you’ve probably hit the wall when dealing with time-series data. I know I did. &lt;/p&gt;

&lt;p&gt;Your app starts off smooth—collecting metrics, logging events, tracking usage. But one day, your dashboards start lagging. Page load times creep past 10 seconds. Pagination stops helping. Background jobs queue up as yesterday’s data takes too long to process.&lt;/p&gt;

&lt;p&gt;This isn’t a Rails problem. Or even a PostgreSQL problem. It’s a “using the wrong tool for the job” problem.&lt;/p&gt;

&lt;p&gt;In this post, I’ll show you how we solve these challenges at Timescale—and how you can too. I’ll walk through the real implementation patterns we use in production Rails apps, using practical code examples instead of abstract concepts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Growing Time-Series Data Challenge
&lt;/h2&gt;

&lt;p&gt;A few years ago, I was building analytics for a high-traffic Rails app. Despite adding indexes and optimizing queries, performance kept degrading as our data grew.&lt;/p&gt;

&lt;p&gt;Like most apps, we started with simple timestamp columns and standard ActiveRecord queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:recent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at &amp;gt; ?'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;week&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ago&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:by_day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"DATE_TRUNC('day', created_at)"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works fine at first. But as your table grows to millions (or billions) of rows, a query like &lt;code&gt;Event.where(user_id: 123).by_day&lt;/code&gt; slows to a crawl:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5ms when you have 10K rows&lt;/li&gt;
&lt;li&gt;2,000ms when you have 10M rows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the problems compound when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track high-volume events (like API calls or page views)&lt;/li&gt;
&lt;li&gt;Keep historical data accessible for trends&lt;/li&gt;
&lt;li&gt;Run complex aggregations across time&lt;/li&gt;
&lt;li&gt;Maintain dashboard performance as data scales&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over the years, I tried all the usual tricks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Additional indexes: Helped at first, then hurt insert performance&lt;/li&gt;
&lt;li&gt;Manual partitioning: Fragile and hard to manage&lt;/li&gt;
&lt;li&gt;Pre-aggregation jobs: Complex and often stale&lt;/li&gt;
&lt;li&gt;Custom caching: Difficult to maintain, always a step behind&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It felt like fighting my database instead of working with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why PostgreSQL Falls Short for Time-Series
&lt;/h2&gt;

&lt;p&gt;PostgreSQL is a fantastic general-purpose database. But time-series data introduces new demands that standard Postgres tables aren’t designed for. Let’s break that down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Insertion pattern: Data constantly arrives in time order, but old data rarely changes&lt;/li&gt;
&lt;li&gt;Query pattern: Most queries use time bounds (WHERE created_at BETWEEN x AND y)&lt;/li&gt;
&lt;li&gt;Aggregation pattern: You’re grouping by time (hourly, daily, monthly)&lt;/li&gt;
&lt;li&gt;Storage pattern: The dataset grows linearly—forever&lt;/li&gt;
&lt;li&gt;Access pattern: Recent (hot) data is queried far more than older (cold) data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These characteristics expose several pain points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No automatic time-based partitioning&lt;/li&gt;
&lt;li&gt;Index bloat as tables grow&lt;/li&gt;
&lt;li&gt;Inefficient time-based queries&lt;/li&gt;
&lt;li&gt;Manual rollups and background jobs&lt;/li&gt;
&lt;li&gt;Difficulty managing large historical datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that’s exactly where TimescaleDB comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  TimescaleDB: PostgreSQL, But Built for Time-Series
&lt;/h2&gt;

&lt;p&gt;TimescaleDB is a PostgreSQL extension built to handle time-series and real-time workloads—without giving up the safety and simplicity of Postgres.&lt;/p&gt;

&lt;p&gt;Now with the timescaledb Ruby gem, it integrates cleanly into Rails. You don’t have to leave behind ActiveRecord, or rewrite your models, or learn a whole new stack.&lt;/p&gt;

&lt;p&gt;Here’s what TimescaleDB brings to your Rails app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hypertables: Automatic time-based partitioning, transparent to your queries&lt;/li&gt;
&lt;li&gt;Optimized time indexes: Stay fast even as your data grows&lt;/li&gt;
&lt;li&gt;Built-in compression: Reduce storage by 90–95%&lt;/li&gt;
&lt;li&gt;Continuous aggregates: Pre-computed rollups that stay fresh automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And most importantly? You keep your Rails patterns.&lt;/p&gt;

&lt;p&gt;These work just like before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;created_at: &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ago&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="no"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group_by_day&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:created_at&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;  &lt;span class="c1"&gt;# using the groupdate gem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real Performance Gains Without Rewriting Everything
&lt;/h2&gt;

&lt;p&gt;With Timescale, our analytics workflows went from laggy to fast—without adding new caching layers or complex ETL.&lt;/p&gt;

&lt;p&gt;Across production workloads, teams have seen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sub-second queries on tens of millions of rows&lt;/li&gt;
&lt;li&gt;95%+ compression on time-series datasets&lt;/li&gt;
&lt;li&gt;Fewer background jobs, thanks to continuous aggregates&lt;/li&gt;
&lt;li&gt;Simplified code—no more rollup scripts or cache warmers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels like your app leveled up, without any extra complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Continuous Aggregates in One Line of Ruby
&lt;/h2&gt;

&lt;p&gt;One of TimescaleDB’s most powerful features is continuous aggregates—think materialized views that update automatically in the background.&lt;br&gt;
And with the timescaledb gem, defining them looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Download&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="kp"&gt;extend&lt;/span&gt; &lt;span class="no"&gt;Timescaledb&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ActsAsHypertable&lt;/span&gt;
  &lt;span class="kp"&gt;include&lt;/span&gt; &lt;span class="no"&gt;Timescaledb&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ContinuousAggregatesHelper&lt;/span&gt;

  &lt;span class="n"&gt;acts_as_hypertable&lt;/span&gt; &lt;span class="ss"&gt;time_column: &lt;/span&gt;&lt;span class="s1"&gt;'ts'&lt;/span&gt;

  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:total_downloads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"count(*) as total"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:downloads_by_gem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"gem_name, count(*) as total"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:gem_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;continuous_aggregates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="ss"&gt;timeframes: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:minute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:month&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="ss"&gt;scopes: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:total_downloads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:downloads_by_gem&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single model creates a cascade of continuously updated rollups—from minute to month—all while sticking to the ActiveRecord patterns you know and love.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters
&lt;/h2&gt;

&lt;p&gt;If you're building a Rails app that tracks metrics, logs, events, or any kind of time-based data, TimescaleDB gives you a clear path to scale without duct tape and complexity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce load on your app servers—let the DB do the aggregating&lt;/li&gt;
&lt;li&gt;Eliminate complex background jobs—fewer moving parts to break&lt;/li&gt;
&lt;li&gt;Get predictable performance—even with billions of rows&lt;/li&gt;
&lt;li&gt;Stick with Rails conventions—write less custom SQL&lt;/li&gt;
&lt;li&gt;Continuous aggregates alone can replace dozens of lines of rollup code and hours of maintenance work&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Rails developers deserve a time-series database that just works. TimescaleDB gives you the performance and scale your app needs without giving up the elegance of ActiveRecord.&lt;/p&gt;

&lt;p&gt;If you’re curious, here’s how to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install TimescaleDB (it’s just a Postgres extension)&lt;/li&gt;
&lt;li&gt;Add the timescaledb gem to your Gemfile&lt;/li&gt;
&lt;li&gt;Identify models with time-based data&lt;/li&gt;
&lt;li&gt;Start with hypertables, then add continuous aggregates as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can self-host, or try Timescale Cloud for a fully managed option.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: TimescaleDB for Ruby on Rails Developers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Do I need to change how I use ActiveRecord?
&lt;/h3&gt;

&lt;p&gt;A: Nope! TimescaleDB works with your existing ActiveRecord models. Just add the timescaledb gem and use the acts_as_hypertable macro to enable time-series functionality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How is TimescaleDB different from just using PostgreSQL?
&lt;/h3&gt;

&lt;p&gt;A: TimescaleDB is a PostgreSQL extension. It gives you automatic time-based partitioning (hypertables), faster time-based queries, built-in compression, and continuous aggregates—all while staying 100% SQL- and Rails-compatible.&lt;/p&gt;
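&lt;p&gt;As a rough sketch of what those features look like in plain SQL (the &lt;code&gt;events&lt;/code&gt; table and &lt;code&gt;created_at&lt;/code&gt; column here are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Turn an ordinary table into a hypertable (automatic time partitioning)
SELECT create_hypertable('events', 'created_at');

-- Time-bucketed aggregation, one of TimescaleDB's SQL additions
SELECT time_bucket('1 hour', created_at) AS bucket, count(*)
FROM events
GROUP BY bucket
ORDER BY bucket;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;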

&lt;h3&gt;
  
  
  Q: Can I keep using the gems I already use for date grouping, like groupdate?
&lt;/h3&gt;

&lt;p&gt;A: Yes. TimescaleDB works seamlessly with gems like groupdate. You can continue using .group_by_day, .group_by_hour, etc., and get better performance under the hood.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What kind of performance improvements can I expect?
&lt;/h3&gt;

&lt;p&gt;A: Teams have seen sub-second query times on tens of millions of rows and 95%+ storage savings using TimescaleDB’s compression. The biggest wins are in read-heavy, time-bounded queries (e.g., user activity, logs, metrics).&lt;/p&gt;
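&lt;p&gt;Compression is enabled per hypertable. A minimal sketch, again using a hypothetical &lt;code&gt;events&lt;/code&gt; hypertable (the segment-by column should match your query patterns, and actual savings depend on your data):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Enable columnar compression, segmenting by a commonly filtered column
ALTER TABLE events SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'user_id'
);

-- Automatically compress chunks older than seven days
SELECT add_compression_policy('events', INTERVAL '7 days');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;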

&lt;h3&gt;
  
  
  Q: What’s the learning curve for continuous aggregates?
&lt;/h3&gt;

&lt;p&gt;A: It’s minimal. The timescaledb gem lets you define continuous aggregates using a simple DSL that reuses your existing scopes. You don’t need to learn new SQL or create custom rollup jobs.&lt;/p&gt;
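&lt;p&gt;For comparison, here is roughly the SQL a continuous aggregate boils down to; the DSL generates statements like these for you (the table and view names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Hourly rollup that TimescaleDB keeps up to date incrementally
CREATE MATERIALIZED VIEW downloads_per_hour
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', created_at) AS bucket,
       count(*) AS total_downloads
FROM downloads
GROUP BY bucket;

-- Refresh the rollup on a schedule
SELECT add_continuous_aggregate_policy('downloads_per_hour',
  start_offset =&gt; INTERVAL '3 hours',
  end_offset =&gt; INTERVAL '1 hour',
  schedule_interval =&gt; INTERVAL '1 hour');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;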

&lt;h3&gt;
  
  
  Q: Can I use this in production? Is it stable?
&lt;/h3&gt;

&lt;p&gt;A: Yes. TimescaleDB powers production workloads at companies like NetApp, Linktree, and RubyGems.org. It’s backed by years of performance and reliability improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Do I need to self-host? Or is there a managed option?
&lt;/h3&gt;

&lt;p&gt;A: Both! You can self-host TimescaleDB or use Timescale Cloud, a fully managed PostgreSQL service with built-in TimescaleDB, HA, backups, and usage-based pricing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Where can I learn more?
&lt;/h3&gt;

&lt;p&gt;A:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/timescale/timescaledb-ruby" rel="noopener noreferrer"&gt;Ruby Quickstart&lt;/a&gt; in Timescale Docs&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/timescale/timescaledb-ruby" rel="noopener noreferrer"&gt;timescaledb-ruby&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://console.cloud.timescale.com/signup" rel="noopener noreferrer"&gt;Fully Managed Timescale Cloud&lt;/a&gt; (free for 30 days)&lt;/li&gt;
&lt;li&gt;Install the &lt;a href="https://docs.timescale.com/self-hosted/latest/install/" rel="noopener noreferrer"&gt;open-source TimescaleDB extension&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>postgres</category>
      <category>ruby</category>
      <category>rails</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>We Listened: Pgai Vectorizer Now Works With Any Postgres Database</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Mon, 05 May 2025 15:01:35 +0000</pubDate>
      <link>https://dev.to/tigerdata/we-listened-pgai-vectorizer-now-works-with-any-postgres-database-1e57</link>
      <guid>https://dev.to/tigerdata/we-listened-pgai-vectorizer-now-works-with-any-postgres-database-1e57</guid>
      <description>&lt;p&gt;TL;DR: &lt;br&gt;
We're excited to announce that pgai Vectorizer—the &lt;a href="https://www.timescale.com/blog/pgai-vectorizer-now-works-with-any-postgres-database" rel="noopener noreferrer"&gt;tool for robust embedding creation and management&lt;/a&gt;—is now available as a Python CLI and library, making it compatible with any Postgres database, whether it be self-hosted Postgres or cloud-hosted on Timescale Cloud, Amazon RDS for PostgreSQL, or Supabase. &lt;/p&gt;



&lt;p&gt;This expansion comes directly from developer feedback requesting broader accessibility while maintaining the Postgres integration that makes pgai Vectorizer the ideal solution for production-grade embedding creation, management, and experimentation. &lt;a href="https://github.com/timescale/pgai" rel="noopener noreferrer"&gt;&lt;u&gt;To get started, head over to the pgai GitHub&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why We Built Pgai Vectorizer for Postgres
&lt;/h2&gt;

&lt;p&gt;When we first &lt;a href="https://www.timescale.com/blog/vector-databases-are-the-wrong-abstraction" rel="noopener noreferrer"&gt;&lt;u&gt;launched pgai Vectorizer&lt;/u&gt;&lt;/a&gt;, we aimed to simplify vector embedding management for developers building AI systems with Postgres. We heard the horror stories of developers struggling with complex ETL (extract-transform-load) pipelines, embedding synchronization issues, and the constant battle to keep embeddings up-to-date when source data changes. Teams were spending more time maintaining infrastructure than building useful AI features.&lt;/p&gt;

&lt;p&gt;Many developers found themselves cobbling together custom solutions involving message queues, Lambda functions, and background workers just to handle the embedding creation workflow. Others faced the frustration of stale embeddings that no longer matched their updated content, leading to degraded search quality and hallucinations in their RAG applications.&lt;/p&gt;

&lt;p&gt;Pgai Vectorizer solved these problems with a declarative approach that automated the entire embedding lifecycle with a single SQL command, similar to how you'd create an index in Postgres. The &lt;a href="https://news.ycombinator.com/item?id=41985176" rel="noopener noreferrer"&gt;&lt;u&gt;tool resonated with developers&lt;/u&gt;&lt;/a&gt; and quickly gained traction among AI builders. However, we soon started hearing a consistent piece of feedback that would shape our next steps.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Change: Moving From Extension-Only to Python CLI and Library
&lt;/h2&gt;

&lt;p&gt;After our initial launch, we received consistent feedback from developers who wanted to use pgai Vectorizer with their existing managed Postgres databases. While our extension-based approach worked great for self-hosted Postgres and Timescale Cloud, users on platforms like Amazon RDS for PostgreSQL, Supabase, and other managed database services couldn't use pgai Vectorizer unless their cloud provider chose to make it available.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXee19HYvVr8vTsVmwdlCVLOqHM1_-0VzjQrSk-3ZWQETtAFb8q8CBb9SKPmikQFJCl9ZgdpcrftidajbruKCWvshO8AkVuJbK5tpqlj9PyDrwk6SKrWfbG-KaRXu4KKmQyWrkX6bA%3Fkey%3DBTo9RW9k3V54BU75a7UCXCke" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXee19HYvVr8vTsVmwdlCVLOqHM1_-0VzjQrSk-3ZWQETtAFb8q8CBb9SKPmikQFJCl9ZgdpcrftidajbruKCWvshO8AkVuJbK5tpqlj9PyDrwk6SKrWfbG-KaRXu4KKmQyWrkX6bA%3Fkey%3DBTo9RW9k3V54BU75a7UCXCke" width="1437" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXcau7ZF40A9PuXPL2Zbp60ymU-EK3MLtzeId2XpSittjRCcBxga3dFoBApqChi4cJTwXrD9Hw2lYoAPLv-5A6ehmbIbqU2_Bji1O39jVqSL-iAm5fVyKGiRexcfArnAj9X4KEtOgA%3Fkey%3DBTo9RW9k3V54BU75a7UCXCke" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXcau7ZF40A9PuXPL2Zbp60ymU-EK3MLtzeId2XpSittjRCcBxga3dFoBApqChi4cJTwXrD9Hw2lYoAPLv-5A6ehmbIbqU2_Bji1O39jVqSL-iAm5fVyKGiRexcfArnAj9X4KEtOgA%3Fkey%3DBTo9RW9k3V54BU75a7UCXCke" width="823" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXc7W5YksotaCkwdSfhpzWB1x6DmpkBnX5DQAvP1ahIknUEXFHjwM8ATzNFAoo76_mKKDT6MpvCc_aNjCi3HZ5T9qjkB7dLGvqNh7FifbYv---v9MJZf4fPp3mNEPKKTop4-h7zr4A%3Fkey%3DBTo9RW9k3V54BU75a7UCXCke" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXc7W5YksotaCkwdSfhpzWB1x6DmpkBnX5DQAvP1ahIknUEXFHjwM8ATzNFAoo76_mKKDT6MpvCc_aNjCi3HZ5T9qjkB7dLGvqNh7FifbYv---v9MJZf4fPp3mNEPKKTop4-h7zr4A%3Fkey%3DBTo9RW9k3V54BU75a7UCXCke" width="1435" height="997"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Requests for pgai Vectorizer support on Supabase, Azure PostgreSQL, and Amazon RDS.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We knew we needed to make pgai Vectorizer more accessible without compromising its seamless Postgres integration. The solution? Repackaging our core functionality as a Python CLI (command-line interface) and library that can work with any Postgres database while maintaining the same robustness and "set it and forget it" simplicity.&lt;/p&gt;

&lt;p&gt;This approach gives developers the best of both worlds: the powerful vectorization capabilities of pgai Vectorizer with the flexibility to use their existing database infrastructure, regardless of provider. The Python library handles the creation of database objects that house the pgai Vectorizer internals, and provides a SQL API that handles loading data, creating embeddings, and synchronizing changes, all while writing the results back to your Postgres database.&lt;/p&gt;

&lt;p&gt;The library maintains all the core functionality that made pgai Vectorizer valuable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding creation and management:&lt;/strong&gt; Automatically create and synchronize vector embeddings from &lt;a href="https://www.timescale.com/blog/connecting-s3-and-postgres-automatic-synchronization-without-etl-pipelines" rel="noopener noreferrer"&gt;Postgres data and S3 documents&lt;/a&gt;. Embeddings update automatically as data changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-ready out of the box:&lt;/strong&gt; Supports batch processing for efficient embedding generation, with built-in handling for model failures, rate limits, and latency spikes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experimentation and testing:&lt;/strong&gt; &lt;a href="https://www.timescale.com/blog/open-source-vs-openai-embeddings-for-rag" rel="noopener noreferrer"&gt;&lt;u&gt;Easily switch between embedding models&lt;/u&gt;&lt;/a&gt;, test different models, and compare performance without changing application code or manually reprocessing data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plays well with pgvector and pgvectorscale:&lt;/strong&gt; Once your embeddings are created, use them to power vector and semantic search with &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;&lt;u&gt;pgvector&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://github.com/timescale/pgvectorscale" rel="noopener noreferrer"&gt;&lt;u&gt;pgvectorscale&lt;/u&gt;&lt;/a&gt;. Embeddings are stored in the pgvector data format. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What this means for existing users:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timescale Cloud customers:&lt;/strong&gt; Existing vectorizers running on Timescale Cloud will continue to work as is, so no immediate action is necessary. We encourage you to use the new &lt;a href="https://github.com/timescale/pgai" rel="noopener noreferrer"&gt;&lt;u&gt;pgai Python library&lt;/u&gt;&lt;/a&gt; to create and manage new vectorizers. To do so, upgrade to the latest version of both the pgai extension in Timescale Cloud and the pgai Python library. Upgrading the extension decouples the vectorizer-related database objects from the extension, allowing them to be managed by the Python library. Pgai Vectorizer remains in Early Access on Timescale Cloud. &lt;a href="https://github.com/timescale/pgai/blob/main/docs/vectorizer/migrating-from-extension.md" rel="noopener noreferrer"&gt;&lt;u&gt;See this guide&lt;/u&gt;&lt;/a&gt; for details and instructions on upgrading and migrating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted users:&lt;/strong&gt; Existing self-hosted vectorizers will also continue to work as is, so no immediate action is required. If you already have the pgai extension installed, you’ll need to upgrade to version 0.10.1. Upgrading the extension decouples the vectorizer-related database objects from the extension, allowing them to be created and managed by the Python library. &lt;a href="https://github.com/timescale/pgai/blob/main/docs/vectorizer/migrating-from-extension.md" rel="noopener noreferrer"&gt;&lt;u&gt;See this guide&lt;/u&gt;&lt;/a&gt; for self-hosted upgrade and migration details and instructions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What this means for new users:&lt;/strong&gt; Whether you use pgai Vectorizer on Timescale Cloud or self-hosted, this change means a simplified installation process and more flexibility—you now have tighter integrations between pgai Vectorizer and your search and RAG backends in your AI applications. Self-hosted users no longer need to install the pgai extension to use pgai Vectorizer. Timescale Cloud customers will continue to get the pgai extension auto-installed for them. To try pgai Vectorizer for yourself, &lt;a href="https://github.com/timescale/pgai#quick-start" rel="noopener noreferrer"&gt;&lt;u&gt;here’s how you can get started&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pgai Vectorizer Works With Any Postgres Database
&lt;/h2&gt;

&lt;p&gt;The new Python library implementation of pgai Vectorizer works with virtually any Postgres database, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://console.cloud.timescale.com/signup" rel="noopener noreferrer"&gt;&lt;u&gt;Timescale Cloud&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Self-hosted Postgres&lt;/li&gt;
&lt;li&gt;Amazon RDS for PostgreSQL&lt;/li&gt;
&lt;li&gt;Supabase&lt;/li&gt;
&lt;li&gt;Google Cloud SQL for PostgreSQL&lt;/li&gt;
&lt;li&gt;Azure Database for PostgreSQL&lt;/li&gt;
&lt;li&gt;Neon PostgreSQL&lt;/li&gt;
&lt;li&gt;Render PostgreSQL&lt;/li&gt;
&lt;li&gt;DigitalOcean Managed Databases&lt;/li&gt;
&lt;li&gt;Any other self-hosted or managed Postgres service running PostgreSQL 15 or later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new implementation addresses one of our most requested features from the community. Users were actively building AI applications with these managed services, but couldn't take advantage of pgai Vectorizer's powerful embedding management capabilities.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to Use Pgai Vectorizer: A Quick Refresher
&lt;/h2&gt;

&lt;p&gt;A standout feature of the new Python library is its enhanced support for document processing directly from cloud storage. &lt;/p&gt;

&lt;p&gt;With the expanded Amazon S3 integration, you can now seamlessly load documents and generate embeddings based on file URLs stored in your Postgres table. Pgai Vectorizer automatically loads and parses each document into an LLM-friendly format like Markdown, then generates the required chunks for embedding creation, all according to your specification.&lt;/p&gt;

&lt;p&gt;For document vectorization, we've included support for parsing multiple formats, including PDF, DOCX, XLSX, HTML, images, and more using &lt;a href="https://research.ibm.com/publications/docling-an-efficient-open-source-toolkit-for-ai-driven-document-conversion" rel="noopener noreferrer"&gt;&lt;u&gt;IBM Docling&lt;/u&gt;&lt;/a&gt;, which provides advanced document understanding capabilities. This makes it easy to build powerful document search and retrieval systems without leaving the Postgres ecosystem.&lt;/p&gt;

&lt;p&gt;Getting started with the pgai Vectorizer Python library is straightforward. Install pgai on your database via:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pgai
pgai install -d postgresql://postgres:postgres@localhost:5432/postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterward, your database is enhanced with pgai’s capabilities. Here's a simple example of how to create a vectorizer for processing text data from a database column named ‘text’:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'wiki'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;if_not_exists&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;loading&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loading_column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'text-embedding-ada-002'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'1536'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;destination_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;view_name&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'wiki_embedding'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For document processing, you can use this configuration, which shows a document metadata table in PostgreSQL with references to data in Amazon S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Document source table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content_type&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;owner_id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;access_level&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Example with rich metadata&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owner_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access_level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Product Manual'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'s3://my-bucket/documents/product-manual.pdf'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'application/pdf'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'internal'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'product'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'reference'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'API Reference'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'s3://my-bucket/documents/api-reference.md'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'text/markdown'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'public'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'api'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'developer'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'document'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;loading&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loading_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;column_name&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'uri'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;chunking&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunking_recursive_character_text_splitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;700&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;separators&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;## '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;### '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;#### '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;- '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;1. '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'?'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'!'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'|'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'text-embedding-3-small'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;destination_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'document_embeddings'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the worker via:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;pgai&lt;/span&gt; &lt;span class="n"&gt;vectorizer&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; 
&lt;span class="n"&gt;postgresql&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;localhost&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;postgres&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And watch the magic happen as pgai creates vector embeddings for your source data.&lt;/p&gt;
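&lt;p&gt;Once the worker has populated the embeddings, you can query them with ordinary SQL. A sketch against the &lt;code&gt;wiki_embedding&lt;/code&gt; view from the earlier example (the &lt;code&gt;chunk&lt;/code&gt; and &lt;code&gt;embedding&lt;/code&gt; column names assume pgai's default view layout, and &lt;code&gt;$1&lt;/code&gt; stands for a query embedding computed by your application):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Nearest chunks by cosine distance (pgvector's &amp;lt;=&amp;gt; operator)
SELECT chunk, embedding &amp;lt;=&amp;gt; $1 AS distance
FROM wiki_embedding
ORDER BY distance
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;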

&lt;h2&gt;
  
  
  Get Started With Pgai Vectorizer Today
&lt;/h2&gt;

&lt;p&gt;We're excited to see what you'll build with the new pgai Vectorizer, whether you're creating semantic search, RAG, or next-gen agentic applications.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/timescale/pgai" rel="noopener noreferrer"&gt;&lt;u&gt;GitHub repository&lt;/u&gt;&lt;/a&gt; to explore capabilities and getting started guides.&lt;/p&gt;

&lt;p&gt;As you can tell by this post, we really value community feedback. If you encounter any issues or have suggestions for improvements, please open an &lt;a href="https://github.com/timescale/pgai/issues" rel="noopener noreferrer"&gt;&lt;u&gt;issue on GitHub&lt;/u&gt;&lt;/a&gt; or join our &lt;a href="https://discord.gg/KRdHVXAmkp" rel="noopener noreferrer"&gt;&lt;u&gt;community Discord&lt;/u&gt;&lt;/a&gt;. Your input will help shape the future development of pgai Vectorizer as we continue to enhance its capabilities.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>python</category>
      <category>ai</category>
      <category>news</category>
    </item>
    <item>
      <title>PostgreSQL vs. Qdrant for Vector Search: 50M Embedding Benchmark</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Fri, 02 May 2025 14:35:37 +0000</pubDate>
      <link>https://dev.to/tigerdata/postgresql-vs-qdrant-for-vector-search-50m-embedding-benchmark-3hhe</link>
      <guid>https://dev.to/tigerdata/postgresql-vs-qdrant-for-vector-search-50m-embedding-benchmark-3hhe</guid>
      <description>&lt;p&gt;Vector search is becoming a core workload for AI-driven applications. But do you really need to introduce a new system just to handle it?&lt;/p&gt;

&lt;p&gt;We ran a performance benchmark to find out: &lt;a href="https://www.timescale.com/blog/pgvector-vs-qdrant" rel="noopener noreferrer"&gt;comparing PostgreSQL (using pgvector + pgvectorscale) with Qdrant on 50 million embeddings&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The results at 99% recall:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sub-100ms query latencies&lt;/li&gt;
&lt;li&gt;471 queries per second (QPS) on Postgres—11x higher throughput than Qdrant (41 QPS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Head to the full write-up for a deep dive into our &lt;a href="https://www.timescale.com/blog/pgvector-vs-qdrant" rel="noopener noreferrer"&gt;vector database comparison&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4cujejzv0axs1s27e72.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4cujejzv0axs1s27e72.jpg" alt="Postgres vs Qdrant vector database performance comparison" width="720" height="720"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  For vectors, Postgres is all you need.
&lt;/h2&gt;

&lt;p&gt;At 99% recall, Postgres delivers sub-100ms query latencies and handles 11x more query throughput than Qdrant (471 QPS vs. Qdrant’s 41 QPS).&lt;/p&gt;

&lt;p&gt;The results show that thanks to &lt;code&gt;pgvectorscale&lt;/code&gt;, &lt;a href="https://docs.timescale.com/ai/latest/sql-interface-for-pgvector-and-timescale-vector/" rel="noopener noreferrer"&gt;Postgres can keep up with specialized vector databases&lt;/a&gt; and deliver as good, if not better performance at scale. Learn more about &lt;a href="https://www.timescale.com/blog/why-postgres-wins-for-ai-and-vector-workloads" rel="noopener noreferrer"&gt;why Postgres wins for AI and vector workloads&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Turning PostgreSQL Into a High-Performance Vector Search Engine
&lt;/h2&gt;

&lt;p&gt;How? We built &lt;code&gt;pgvectorscale&lt;/code&gt; to push Postgres to its limits for vector workloads—without compromising recall, latency, or cost-efficiency. It turns your favorite relational database into a high-performance vector search engine.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ No extra systems.&lt;/li&gt;
&lt;li&gt;✅ No new query languages.&lt;/li&gt;
&lt;li&gt;✅ Just Postgres.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We used &lt;a href="https://rtabench.com/" rel="noopener noreferrer"&gt;RTABench&lt;/a&gt; to run a transparent, reproducible evaluation—designed for real-world, high-scale workloads.&lt;/p&gt;

&lt;p&gt;Curious about the architecture behind it all?&lt;/p&gt;

&lt;p&gt;👉 Read our whitepaper on &lt;a href="https://docs.timescale.com/about/latest/whitepaper/" rel="noopener noreferrer"&gt;building Timescale for real-time and AI workloads&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It dives into how we engineered Timescale to handle time-series, vector, and relational data—all in one Postgres-native platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: For many vector workloads, Postgres is all you need.
&lt;/h2&gt;

&lt;p&gt;Have you used Postgres or Qdrant for vector search?&lt;br&gt;
What does your stack look like today—and where do you feel the friction?&lt;/p&gt;

&lt;p&gt;👉 Postgres vs Qdrant: which side are you on? Comment down below!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Connecting S3 and Postgres: Automatic Synchronization Without ETL Pipelines</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Thu, 01 May 2025 12:32:36 +0000</pubDate>
      <link>https://dev.to/tigerdata/connecting-s3-and-postgres-automatic-synchronization-without-etl-pipelines-32kg</link>
      <guid>https://dev.to/tigerdata/connecting-s3-and-postgres-automatic-synchronization-without-etl-pipelines-32kg</guid>
      <description>&lt;p&gt;Modern applications need data that's both accessible and fast. You have data in S3, but transforming it into usable insights requires complex ETL (extract-transform-load) pipelines. With our new &lt;a href="https://www.timescale.com/blog/connecting-s3-and-postgres-automatic-synchronization-without-etl-pipelines" rel="noopener noreferrer"&gt;livesync for S3 and pgai Vectorizer features&lt;/a&gt;, Timescale transforms how you interact with S3 data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Powerful Postgres–S3 Integration Approaches
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06aeidm498g1kyknudmc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06aeidm498g1kyknudmc.png" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our new features offer distinct approaches to working with S3 data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.timescale.com/blog/connecting-s3-and-postgres-automatic-synchronization-without-etl-pipelines#transform-s3-to-analytics-in-seconds-automatic-data-synchronization-with-livesync" rel="noopener noreferrer"&gt;&lt;strong&gt;Livesync for S3&lt;/strong&gt;&lt;/a&gt; brings your structured S3 data directly into Postgres tables, automatically synchronizing files as they change.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.timescale.com/blog/connecting-s3-and-postgres-automatic-synchronization-without-etl-pipelines#simplify-document-embeddings-with-pgai-vectorizer" rel="noopener noreferrer"&gt;&lt;strong&gt;pgai Vectorizer&lt;/strong&gt; leaves documents in S3 but generates searchable embeddings and metadata in Postgres&lt;/a&gt;, connecting unstructured content with structured data for RAG, search, and agentic applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both eliminate complex ETL pipelines, letting you work with S3 data using familiar SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transform S3 to Analytics in Seconds: Automatic Data Synchronization With Livesync
&lt;/h2&gt;

&lt;p&gt;S3 is where countless organizations store their data, but Timescale Cloud is where they unlock insights. Livesync for S3 bridges this gap, eliminating the traditional complexity of moving data between these systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem: Complex ETL pipelines for S3 data
&lt;/h3&gt;

&lt;p&gt;Data management challenges create significant obstacles when bridging S3 storage and analytics environments. Organizations struggle with the manual effort required to transport data between S3 buckets and analytical databases, requiring custom integration code that demands ongoing maintenance. This challenge is compounded by the brittle and resource-intensive nature of maintaining ETL processes.&lt;/p&gt;

&lt;p&gt;Many organizations find themselves caught in a constant battle to ensure data freshness, requiring vigilant monitoring systems to confirm that analytics platforms accurately reflect the most current information in S3 repositories. The culmination of these challenges frequently manifests as performance bottlenecks, where inefficient data transfer mechanisms cause critical delays in delivering up-to-date information to customer-facing applications, leading to poor user experiences and customers making decisions based on stale data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The solution: Automatic data synchronization
&lt;/h3&gt;

&lt;p&gt;We've engineered livesync for S3 to bring stream-like behavior to object storage, effectively turning your S3 bucket into a continuous data feed.&lt;/p&gt;

&lt;p&gt;Our solution delivers speed and simplicity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-ETL experience&lt;/strong&gt;: Eliminate complex pipelines or custom integration code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time data pipeline&lt;/strong&gt;: Turn your S3 bucket into a continuous data feed with automatic synchronization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Familiar tools&lt;/strong&gt;: Use S3 for storage and Timescale Cloud for analytics without compromise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal configuration&lt;/strong&gt;: Connect to your S3 bucket, define mapping, and let livesync handle the rest.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How livesync works
&lt;/h3&gt;

&lt;p&gt;Behind the scenes, we're doing the heavy lifting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema mapping that infers table structure from your CSV or Parquet files and maps it to hypertables&lt;/li&gt;
&lt;li&gt;Managing the initial data load&lt;/li&gt;
&lt;li&gt;Maintaining continuous synchronization&lt;/li&gt;
&lt;li&gt;Intelligent tracking of processed files to prevent duplicates or missed data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enables teams across multiple industries to build robust pipelines. For organizations with production applications on Postgres looking to scale their real-time analytics, livesync for S3 has a sister solution—&lt;a href="https://www.timescale.com/blog/connect-any-postgres-to-real-time-analytics" rel="noopener noreferrer"&gt;&lt;u&gt;livesync for Postgres&lt;/u&gt;&lt;/a&gt;—which lets you keep your Postgres as-is while streaming data in real time to a Timescale Cloud instance optimized for analytical workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  The inner workings of livesync for S3
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Secure cross-account authentication
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbunjw7pte7q1v6jobbwx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbunjw7pte7q1v6jobbwx.png" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Livesync employs a robust security model using AWS role assumption. Our service assumes a specific role in your AWS account with precisely the permissions needed to access your S3 data. To prevent confused deputy attacks, we implement the industry-standard External ID verification using your unique Project ID/Service ID combination.&lt;/p&gt;

&lt;h4&gt;
  
  
  Smart polling and file discovery
&lt;/h4&gt;

&lt;p&gt;Behind the scenes, livesync intelligently scans your S3 bucket using optimized ListObjectsV2 calls. Starting with the prefix from your pattern (like "logs/" from "logs/**/*.csv"), it applies glob matching to find relevant files. The system tracks processed files in lexicographical order, ensuring no file is missed or duplicated.&lt;/p&gt;
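&lt;p&gt;The discovery logic above can be sketched in a few lines. This is an illustrative approximation, not livesync's actual code (note that Python's &lt;code&gt;fnmatch&lt;/code&gt; treats &lt;code&gt;**&lt;/code&gt; like &lt;code&gt;*&lt;/code&gt;, which is close enough for a sketch):&lt;/p&gt;

```python
from fnmatch import fnmatch

def static_prefix(pattern):
    """Literal prefix of a glob pattern, e.g. 'logs/' for 'logs/**/*.csv'.
    Hypothetical helper: a prefix like this narrows the ListObjectsV2 scan."""
    parts = []
    for part in pattern.split("/"):
        if any(ch in part for ch in "*?["):
            break
        parts.append(part)
    return "/".join(parts) + "/" if parts else ""

def discover(keys, pattern, processed):
    """Glob-match keys, skip already-processed files, and return the
    remainder in lexicographical order so nothing is missed or duplicated."""
    return [k for k in sorted(keys) if fnmatch(k, pattern) and k not in processed]
```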

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij6v8264zidawnxzo2ze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij6v8264zidawnxzo2ze.png" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To maintain performance, livesync for S3 manages an orderly queue limited to 100 files per connection. When files are plentiful, polling accelerates to every minute; when caught up, it follows your configured schedule. You can always trigger immediate processing with the "Pull now" button.&lt;/p&gt;
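&lt;p&gt;The queue cap and adaptive polling described above amount to a small piece of logic. The helper names below are illustrative, not livesync internals:&lt;/p&gt;

```python
QUEUE_LIMIT = 100        # per-connection cap on queued files
FAST_POLL_SECONDS = 60   # accelerated polling while a backlog exists

def next_poll_interval(pending_files, configured_interval):
    """Poll every minute while files are waiting; otherwise follow
    the user-configured schedule."""
    return FAST_POLL_SECONDS if pending_files > 0 else configured_interval

def enqueue(queue, discovered):
    """Top the queue up to the 100-file cap, preserving order."""
    room = max(QUEUE_LIMIT - len(queue), 0)
    return queue + discovered[:room]
```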

&lt;h4&gt;
  
  
  Optimized data processing pipeline
&lt;/h4&gt;

&lt;p&gt;Livesync handles different file formats with specialized techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSV files&lt;/strong&gt; are analyzed for encoding and compression (UTF-8, ZIP, GZIP), then processed using high-performance parallel ingestion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parquet files&lt;/strong&gt; undergo efficient conversion before being streamed into TimescaleDB (which lives at the core of your Timescale Cloud service).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire pipeline includes intelligent error handling, which is clearly visible in the dashboard. After three consecutive failures, livesync automatically pauses to prevent resource waste, awaiting your review.&lt;/p&gt;
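&lt;p&gt;The pause-after-three-failures rule is simple enough to sketch (illustrative code, not the production implementation):&lt;/p&gt;

```python
MAX_CONSECUTIVE_FAILURES = 3

def after_attempt(state, succeeded):
    """Reset the failure counter on success; after three consecutive
    failures, pause the sync and await manual review."""
    failures = 0 if succeeded else state.get("failures", 0) + 1
    return {"failures": failures, "paused": failures >= MAX_CONSECUTIVE_FAILURES}
```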

&lt;p&gt;This architecture delivers the perfect balance of reliability, performance, and operational simplicity, bringing your S3 data into Timescale Cloud with minimal configuration and maximum confidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build powerful ingest pipelines with minimal configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IoT telemetry flows:&lt;/strong&gt; Connect devices that log to S3 (like AWS IoT Core) directly to time-series analytics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming data persistence:&lt;/strong&gt; Automatically process data from Kinesis, Kafka, or other streaming platforms that land files in S3 and transform into TimescaleDB hypertables for high-performance querying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crypto/financial data analytics:&lt;/strong&gt; Sync trading data from S3 into TimescaleDB for real-time analytics on recent market movements and long-term historical analysis for backtesting and trend identification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Currently supporting CSV and Parquet file formats, livesync delivers a frictionless way to unlock the value of your data stored in S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcelzm9f7sfzbwblx9bjm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcelzm9f7sfzbwblx9bjm.png" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Simple setup, powerful results
&lt;/h3&gt;

&lt;p&gt;Livesync for S3 continuously monitors your S3 bucket for incoming sensor data, automatically maps schemas, and syncs data into TimescaleDB hypertables in minutes. This enables operators to query millions of readings with millisecond latency, driving real-time dashboards that catch anomalies before equipment fails. Livesync for S3 ensures that syncing from S3 to hypertables remains smooth, dependable, and lightning-fast.&lt;/p&gt;

&lt;p&gt;Setting up &lt;a href="https://docs.timescale.com/migrate/latest/livesync-for-s3/" rel="noopener noreferrer"&gt;&lt;u&gt;livesync for S3&lt;/u&gt;&lt;/a&gt; is surprisingly straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connect to your S3 bucket with your credentials.&lt;/li&gt;
&lt;li&gt;Define how your objects map to TimescaleDB tables.&lt;/li&gt;
&lt;li&gt;Let livesync for S3 handle the rest—monitoring and ingesting new data automatically.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Behind the scenes, we're doing the heavy lifting of schema mapping, managing the initial data load, and maintaining continuous synchronization. The system intelligently tracks what it's processed, so you never have duplicate data or missed files.&lt;/p&gt;

&lt;p&gt;For example, in manufacturing environments where sensors continuously capture critical equipment data through AWS IoT Core and store it in S3, livesync ensures this data becomes immediately queryable in TimescaleDB. This enables operators to identify anomalies before equipment fails, turning static S3 storage into actionable intelligence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero maintenance, maximum performance
&lt;/h3&gt;

&lt;p&gt;Once configured, livesync for S3 delivers ease and performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-maintenance operation&lt;/strong&gt; once configured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema mapping&lt;/strong&gt; that infers table structure from your CSV or Parquet files and maps it to hypertables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic retry mechanisms&lt;/strong&gt; for transient failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-grained control&lt;/strong&gt; over which objects sync and when&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete observability&lt;/strong&gt; with detailed history of file imports and error messages (if any)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Simplify Document Embeddings With Pgai Vectorizer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Searching unstructured document embeddings with pgvector
&lt;/h3&gt;

&lt;p&gt;While livesync brings S3 data into Postgres, pgai Vectorizer takes a different approach for unstructured documents. It creates searchable vector embeddings in Postgres from documents stored in S3 while keeping the original files in place.&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem: Complex pipelines for document search
&lt;/h3&gt;

&lt;p&gt;AI applications using RAG (retrieval-augmented generation) can help businesses unlock insights from mountains of unstructured data. Today, that unstructured data’s natural home is Amazon S3. On the other hand, Postgres has become the default vector database for developers, thanks to extensions like &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;&lt;u&gt;pgvector&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://github.com/timescale/pgvectorscale" rel="noopener noreferrer"&gt;&lt;u&gt;pgvectorscale&lt;/u&gt;&lt;/a&gt;. These extensions enable them to build intelligent applications with vector search capabilities without needing to use a separate database just for vectors.&lt;/p&gt;

&lt;p&gt;We’ve previously written about how &lt;a href="https://www.timescale.com/blog/vector-databases-are-the-wrong-abstraction" rel="noopener noreferrer"&gt;&lt;u&gt;vector databases are the wrong abstraction&lt;/u&gt;&lt;/a&gt; because they divorce the source data from the vector embedding and lose the connection between unstructured data that's being embedded and the embeddings themselves. This problem is especially apparent for documents housed in object storage like Amazon S3.&lt;/p&gt;

&lt;p&gt;Before pgai Vectorizer, developers typically needed to manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex ETL pipelines to chunk, format, and create embeddings from source data&lt;/li&gt;
&lt;li&gt;Multiple systems: a vector database for embeddings, an application database for metadata, and possibly a separate lexical search index&lt;/li&gt;
&lt;li&gt;Data synchronization services to maintain a single source of truth&lt;/li&gt;
&lt;li&gt;Queuing systems for updates and synchronization&lt;/li&gt;
&lt;li&gt;Monitoring tools to catch data drift and handle rate limits from embedding services&lt;/li&gt;
&lt;li&gt;Alert systems for stale search results&lt;/li&gt;
&lt;li&gt;Validation checks across all these systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Processing documents in AI pipelines introduces several challenges, such as managing diverse file formats (PDFs, DOCX, XLSX, HTML, and more), handling complex metadata, keeping embeddings up to date with document changes, and ensuring efficient storage and retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  The solution: Automatic document vectorization
&lt;/h3&gt;

&lt;p&gt;To solve these challenges, Timescale has added document vectorization support to pgai Vectorizer, giving developers an automated way to create embeddings from documents in Amazon S3 and keep those embeddings synchronized as the underlying data changes. This eliminates the need for external ETL pipelines and queuing systems.&lt;/p&gt;

&lt;p&gt;Pgai Vectorizer provides a streamlined approach where developers can reference documents in S3 (or local storage) via URLs stored in a database table. The vectorizer then handles the complete workflow—downloading documents, parsing them to extract content, chunking text appropriately, and generating embeddings for use in semantic search, RAG, or agentic applications.&lt;/p&gt;

&lt;p&gt;This integration supports a wide variety of file formats, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documents: PDF, DOCX, TXT, MD, AsciiDoc&lt;/li&gt;
&lt;li&gt;Spreadsheets: CSV, XLSX&lt;/li&gt;
&lt;li&gt;Presentations: PPTX&lt;/li&gt;
&lt;li&gt;Images: PNG, JPG, TIFF, BMP&lt;/li&gt;
&lt;li&gt;Web content: HTML&lt;/li&gt;
&lt;li&gt;Books: MOBI, EPUB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers, pgai Vectorizer for document vectorization offers three key benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Get started more easily&lt;/strong&gt; → Automatic embedding creation with a simple SQL command manages the entire workflow from document reference to searchable embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spend less time wrangling data infrastructure&lt;/strong&gt; → Automatic updating and synchronization of embeddings means your vector search stays current with your S3 documents without manual intervention. It’s as simple as adding a new row or updating a “modified_at” column in the documents table, and pgai Vectorizer will take care of any (re)processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuously improve your AI systems&lt;/strong&gt; → Testing and experimentation with different embedding models or chunking strategies can be done &lt;a href="https://www.timescale.com/blog/open-source-vs-openai-embeddings-for-rag" rel="noopener noreferrer"&gt;&lt;u&gt;with a single line of SQL&lt;/u&gt;&lt;/a&gt;, allowing you to optimize your application's performance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By keeping your embeddings automatically synchronized to the source documents in S3, pgai Vectorizer ensures that your Postgres database remains the single source of truth for both your structured and vector data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Under the hood: How pgai Vectorizer works with Amazon S3
&lt;/h2&gt;

&lt;p&gt;Pgai Vectorizer simplifies the entire document processing pipeline through a streamlined architecture that connects your Amazon S3 documents with Postgres. Here's how it works:&lt;/p&gt;

&lt;h4&gt;
  
  
  Architecture overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0nxj834n1hvuxevl0oo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0nxj834n1hvuxevl0oo.png" width="800" height="485"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture overview of pgai Vectorizer: The vectorizer system takes in source data from Postgres tables and S3 buckets, creates embeddings via worker processes running in AWS Lambda using user-specified parsing, chunking, and embedding configurations, and stores the final embeddings in Postgres tables using the pgvector data type.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The pgai Vectorizer architecture for document vectorization consists of several key components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data sources: Postgres and Amazon S3&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text and metadata residing in Postgres tables&lt;/li&gt;
&lt;li&gt;Postgres tables containing URLs that reference documents in Amazon S3 (which serves as the data aggregation layer where your documents reside)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vectorization configuration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stored in Postgres, allowing you to manage everything through familiar SQL commands&lt;/li&gt;
&lt;li&gt;Defines chunking strategies, embedding models, and processing parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vectorizer worker (AWS Lambda)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A daemon process that handles the actual work of processing documents&lt;/li&gt;
&lt;li&gt;Responsible for downloading, parsing, chunking, and embedding creation&lt;/li&gt;
&lt;li&gt;Automatically manages synchronization between source documents and embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Destination&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All embeddings are stored in Postgres alongside metadata&lt;/li&gt;
&lt;li&gt;Enables unified queries across both structured data and vector embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Document processing pipeline
&lt;/h4&gt;

&lt;p&gt;The document vectorization process follows these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Documents are referenced via URLs stored in a database column.&lt;/li&gt;
&lt;li&gt;The vectorizer downloads documents using these URLs.&lt;/li&gt;
&lt;li&gt;Documents are parsed to extract text content in an embedding-friendly format. &lt;/li&gt;
&lt;li&gt;The content is chunked using configurable chunking strategies.&lt;/li&gt;
&lt;li&gt;Chunks are processed for embedding generation using your chosen embedding model.&lt;/li&gt;
&lt;li&gt;Embeddings are stored in Postgres with references to the source documents.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1phx1teqqkarcuu6zg0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1phx1teqqkarcuu6zg0u.png" width="800" height="229"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Pgai Vectorizer document processing pipeline showing how files in Amazon S3 get parsed, chunked, formatted, and embedded in order to be used in vector search queries in a Postgres database.&lt;/em&gt;&lt;/p&gt;
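&lt;p&gt;Step 4, chunking, is the easiest of these steps to picture. A minimal fixed-size chunker with overlap (one simple strategy; pgai Vectorizer's configurable chunkers are more sophisticated) looks like this:&lt;/p&gt;

```python
def chunk_text(text, size=400, overlap=50):
    """Split text into fixed-size character chunks that overlap,
    so context is not lost at chunk boundaries."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

&lt;p&gt;Each chunk then moves on to formatting and embedding generation.&lt;/p&gt;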

&lt;h4&gt;
  
  
  Key components
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Loader&lt;/strong&gt;: Loads files from Amazon S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parser&lt;/strong&gt;: Extracts content from retrieved files, handling different document formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking&lt;/strong&gt;: Splits content into appropriate sizes for embedding models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formatting&lt;/strong&gt;: Organizes chunks with metadata from the source files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding generator&lt;/strong&gt;: Processes chunks into vector embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use cases for pgai Vectorizer document vectorization
&lt;/h3&gt;

&lt;p&gt;Pgai Vectorizer's document vectorization capabilities enable several powerful use cases across industries by connecting S3-stored documents with Postgres vector search:&lt;/p&gt;

&lt;h4&gt;
  
  
  Financial analysis
&lt;/h4&gt;

&lt;p&gt;Automatically vectorize financial documents from S3 without custom pipelines. Connect document insights with quantitative metrics for unified queries.&lt;/p&gt;

&lt;h4&gt;
  
  
  Legal document management
&lt;/h4&gt;

&lt;p&gt;Maintain synchronized knowledge bases of legal documents with automatic embedding updates. Test different models for your specific domain.&lt;/p&gt;

&lt;h4&gt;
  
  
  Enhanced customer support
&lt;/h4&gt;

&lt;p&gt;Make knowledge base content immediately searchable as it changes, connecting support documents with customer data.&lt;/p&gt;

&lt;h4&gt;
  
  
  Research systems
&lt;/h4&gt;

&lt;p&gt;Build research AI with continuously updated paper collections, connecting published findings with experimental time-series data.&lt;/p&gt;

&lt;p&gt;In each case, pgai Vectorizer eliminates infrastructure complexity while enabling continuous improvement through its "set it and forget it" synchronization and simple experimentation capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Out the S3 Features in Timescale Cloud Today
&lt;/h2&gt;

&lt;p&gt;Livesync and pgai Vectorizer are just the first steps in our vision to unify Postgres and object storage into a single, powerful lakehouse-style architecture—built for real-time AI and analytics. &lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://docs.timescale.com/migrate/latest/livesync-for-s3/" rel="noopener noreferrer"&gt;&lt;u&gt;Try Livesync for S3.&lt;/u&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/timescale/pgai/blob/main/docs/vectorizer/document-embeddings.md" rel="noopener noreferrer"&gt;&lt;u&gt;Try pgai Vectorizer.&lt;/u&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://console.cloud.timescale.com/signup" rel="noopener noreferrer"&gt;&lt;u&gt;Sign up for Timescale Cloud&lt;/u&gt;&lt;/a&gt; and get started in seconds.&lt;/p&gt;

&lt;p&gt;We can’t wait to see what you build.&lt;/p&gt;

&lt;h3&gt;
  
  
  Day 4 preview: &lt;em&gt;Developer Tools That Speed Up Your Workflow: Introducing SQL Assistant, Recommendation Engine, and Insights&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;Tomorrow, we'll reveal how Timescale delivers high-speed performance without sacrificing simplicity through &lt;strong&gt;SQL assistant with agent mode&lt;/strong&gt;, &lt;strong&gt;recommendation engine&lt;/strong&gt;, and &lt;strong&gt;Insights&lt;/strong&gt;. See how plain-language queries eliminate SQL wrangling, how automated tuning keeps databases optimized with a single click, and why developers finally get both the millisecond response times users demand and the operational simplicity teams need.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>aws</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Postgres vs. Qdrant: Why Postgres Wins for AI and Vector Workloads</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Wed, 30 Apr 2025 16:01:47 +0000</pubDate>
      <link>https://dev.to/tigerdata/postgres-vs-qdrant-why-postgres-wins-for-ai-and-vector-workloads-3d71</link>
      <guid>https://dev.to/tigerdata/postgres-vs-qdrant-why-postgres-wins-for-ai-and-vector-workloads-3d71</guid>
      <description>&lt;p&gt;It's Timescale Launch Week and we’re bringing benchmarks: &lt;a href="https://www.timescale.com/blog/pgvector-vs-qdrant" rel="noopener noreferrer"&gt;Postgres vs. Qdrant on 50M Embeddings&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There’s a belief in the AI infrastructure world that you need to abandon general-purpose databases to get great performance on vector workloads. The logic goes: Postgres is great for transactions, but when you need high-performance vector search, it’s time to bring in a specialized vector database like Qdrant.&lt;/p&gt;

&lt;p&gt;That logic doesn’t hold—just like it didn’t when we benchmarked &lt;a href="https://www.timescale.com/blog/pgvector-vs-pinecone" rel="noopener noreferrer"&gt;pgvector vs. &lt;u&gt;Pinecone&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Like everything in Launch Week, this is about speed without sacrifice. And in this case, Postgres delivers both.&lt;/p&gt;

&lt;p&gt;We’re releasing a new benchmark that challenges the assumption that you can only scale with a specialized vector database. We compared Postgres (with &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;&lt;u&gt;pgvector&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://github.com/timescale/pgvectorscale" rel="noopener noreferrer"&gt;&lt;u&gt;pgvectorscale&lt;/u&gt;&lt;/a&gt;) to Qdrant on a massive dataset of 50 million embeddings. The results show that Postgres not only holds its own but also delivers standout throughput and latency, even at production scale.&lt;/p&gt;

&lt;p&gt;This post summarizes the key takeaways, but it’s just the beginning. &lt;a href="https://www.timescale.com/blog/pgvector-vs-qdrant" rel="noopener noreferrer"&gt;&lt;u&gt;Check out the full benchmark blog post&lt;/u&gt;&lt;/a&gt; on query performance, developer experience, and operational experience.&lt;/p&gt;

&lt;p&gt;Let’s dig into what we found and what it means for teams building production AI applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmark: Postgres vs. Qdrant on 50M Embeddings
&lt;/h2&gt;

&lt;p&gt;We tested Postgres and Qdrant on a level playing field:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50 million embeddings&lt;/strong&gt;, each with 768 dimensions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ANN-benchmarks&lt;/strong&gt;, the industry-standard benchmarking tool&lt;/li&gt;
&lt;li&gt;Focused on &lt;strong&gt;approximate nearest neighbor (ANN) search&lt;/strong&gt;, no filtering&lt;/li&gt;
&lt;li&gt;All benchmarks run on identical AWS hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The takeaway? Postgres with pgvector and pgvectorscale showed significantly higher throughput while maintaining sub-100 ms latencies. Qdrant performed strongly on tail latencies and index build speed, but Postgres pulled ahead where it matters most for teams scaling to production workloads.&lt;/p&gt;
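&lt;p&gt;For readers new to ANN benchmarking: "99% recall" means the approximate index returns 99% of the true nearest neighbors. A sketch of the metric (not ANN-benchmarks' exact harness code):&lt;/p&gt;

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors that the ANN index
    actually returned for one query."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

&lt;p&gt;Throughput and latency are then measured at a fixed recall target so the two systems are compared on equal accuracy.&lt;/p&gt;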

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox73b5q8gbq353qnicwc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox73b5q8gbq353qnicwc.png" alt="Vector search query throughput at 99 % recall (bar graph). Postgres with pgvector and pgvectorscale processes 471.57 queries per second vs. Qdrant's 41.47." width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the complete results, including detailed performance metrics, graphs, and testing configurations, &lt;a href="https://www.timescale.com/blog/pgvector-vs-qdrant" rel="noopener noreferrer"&gt;&lt;u&gt;read the full benchmark blog post&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters: AI Performance Without the Rewrite
&lt;/h2&gt;

&lt;p&gt;These results aren’t just a technical curiosity. They have &lt;strong&gt;real implications&lt;/strong&gt; for how you architect your AI stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production-grade latency:&lt;/strong&gt; Postgres with pgvectorscale delivers the sub-100 ms p99 latencies needed to power real-time or responsive AI applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher concurrency:&lt;/strong&gt; Postgres delivered significantly higher throughput, meaning you can support more simultaneous users without scaling out as aggressively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower complexity:&lt;/strong&gt; You don't need to manage and integrate a separate, specialized vector database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational familiarity:&lt;/strong&gt; You leverage the reliability, tooling, and operational practices you already have with Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL-first development:&lt;/strong&gt; You can filter, join, and integrate vector search naturally with relational data, without learning new APIs or query languages.&lt;/li&gt;
&lt;/ul&gt;
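&lt;p&gt;To make the SQL-first point concrete, here is a sketch of a vector search combined with an ordinary relational join. The &lt;code&gt;documents&lt;/code&gt; and &lt;code&gt;authors&lt;/code&gt; tables are hypothetical, but the &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; cosine-distance operator comes from pgvector itself:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hypothetical schema: relational filters and joins alongside vector similarity.
SELECT d.id, d.title
FROM documents d
JOIN authors a ON a.id = d.author_id
WHERE a.active                       -- ordinary relational predicate
ORDER BY d.embedding &amp;lt;=&amp;gt; $1   -- pgvector cosine distance to the query embedding
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;No separate API or query language: the similarity ranking is just another &lt;code&gt;ORDER BY&lt;/code&gt; clause.&lt;/p&gt;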

&lt;p&gt;Postgres with pgvector and pgvectorscale gives you the performance of a specialized vector database &lt;em&gt;without&lt;/em&gt; giving up the ecosystem, tooling, and developer experience that make Postgres the world’s most popular database.&lt;/p&gt;

&lt;p&gt;You don’t need to split your stack to do vector search.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes It Work: Pgvectorscale and StreamingDiskANN
&lt;/h2&gt;

&lt;p&gt;How can Postgres compete with (and outperform) purpose-built vector databases?&lt;/p&gt;

&lt;p&gt;The answer lies in &lt;a href="https://github.com/timescale/pgvectorscale" rel="noopener noreferrer"&gt;&lt;u&gt;pgvectorscale&lt;/u&gt;&lt;/a&gt; (part of the &lt;a href="https://github.com/timescale/pgai" rel="noopener noreferrer"&gt;&lt;u&gt;pgai&lt;/u&gt;&lt;/a&gt; family), which implements the StreamingDiskANN index (a disk-based ANN algorithm built for scale) for pgvector. Combined with Statistical Binary Quantization (SBQ), &lt;a href="https://www.timescale.com/blog/how-we-made-postgresql-as-fast-as-pinecone-for-vector-data" rel="noopener noreferrer"&gt;&lt;u&gt;it balances memory usage and performance&lt;/u&gt;&lt;/a&gt; better than traditional in-memory HNSW (hierarchical navigable small world) implementations.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can run large-scale vector search on standard cloud hardware.&lt;/li&gt;
&lt;li&gt;You don’t need massive memory footprints or expensive GPU-accelerated nodes.&lt;/li&gt;
&lt;li&gt;Performance holds steady even as your dataset grows to tens or hundreds of millions of vectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All while staying inside Postgres.&lt;/p&gt;
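&lt;p&gt;For a sense of what this looks like in practice, here is a minimal sketch of setting up a StreamingDiskANN index with pgvectorscale (the table and column names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Install pgvectorscale (CASCADE pulls in pgvector as well).
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- Illustrative table holding 768-dimensional embeddings.
CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    embedding VECTOR(768)
);

-- StreamingDiskANN index using cosine distance.
CREATE INDEX ON documents USING diskann (embedding vector_cosine_ops);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Queries then use pgvector's distance operators as usual; the planner picks up the diskann index automatically.&lt;/p&gt;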

&lt;h2&gt;
  
  
  When to Choose Postgres, and When Not To
&lt;/h2&gt;

&lt;p&gt;To be clear: Qdrant is a capable system. It has faster index builds and lower tail latencies. It’s a strong choice if you’re not already using Postgres, or for specific use cases that need native scale-out and purpose-built vector semantics.&lt;/p&gt;

&lt;p&gt;However, for many teams—especially those already invested in Postgres—&lt;strong&gt;it makes no sense to introduce a new database&lt;/strong&gt; just to support vector search.&lt;/p&gt;

&lt;p&gt;If you want high recall, high throughput, and tight integration with your existing stack, Postgres is more than enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Want to Try It?
&lt;/h2&gt;

&lt;p&gt;Pgvector and pgvectorscale are open source and available today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;&lt;u&gt;pgvector GitHub&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/timescale/pgvectorscale" rel="noopener noreferrer"&gt;&lt;u&gt;pgvectorscale GitHub&lt;/u&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Or save time and access both by creating a &lt;a href="https://timescale.com/signup" rel="noopener noreferrer"&gt;&lt;u&gt;free Timescale Cloud account&lt;/u&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vector search in Postgres isn’t a hack or a workaround. It’s fast, it scales, and it works. If you’re building AI applications in 2025, you don’t have to sacrifice your favorite database to move fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Up Next at Timescale Launch Week
&lt;/h2&gt;

&lt;p&gt;Next up, we’re taking Postgres even further: Learn how to stream external S3 data into Postgres with livesync for S3 and work with S3 data in place using the pgai Vectorizer. Two powerful ways to seamlessly integrate external data from S3 directly into your Postgres workflows!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>postgres</category>
      <category>vectordatabase</category>
      <category>news</category>
    </item>
    <item>
      <title>Connect Any Postgres to Real-Time Analytics</title>
      <dc:creator>Team Tiger Data</dc:creator>
      <pubDate>Tue, 29 Apr 2025 18:14:58 +0000</pubDate>
      <link>https://dev.to/tigerdata/connect-any-postgres-to-real-time-analytics-5anm</link>
      <guid>https://dev.to/tigerdata/connect-any-postgres-to-real-time-analytics-5anm</guid>
      <description>&lt;p&gt;TLDR: We built livesync for Postgres to solve the analytics-vs-stability dilemma. Stream data from any Postgres instance directly into Timescale Cloud with zero downtime and no application changes. It performs historical backfills at 150GB/hour while capturing live changes through CDC, automatically converting tables to hypertables. Your production database remains untouched while you gain columnar storage, compression, and time-partitioning capabilities that dramatically accelerate queries. No more complex ETL pipelines or risky migrations, just high-performance analytics without compromising system reliability. &lt;a href="https://www.timescale.com/blog/connect-any-postgres-to-real-time-analytics" rel="noopener noreferrer"&gt;Read the full article&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.timescale.com/blog/scaling-postgresql-to-petabyte-scale" rel="noopener noreferrer"&gt;Scaling real-time analytics on Postgres&lt;/a&gt; has always been a balancing act. Timescale was built to solve this problem, to make Postgres scalable, fast, and analytics-ready without sacrificing reliability. &lt;/p&gt;

&lt;p&gt;But teams with production applications on vanilla Postgres (or locked into other database-as-a-service platforms) often find themselves stuck. They face an impossible choice: risk downtime to migrate, build brittle ETL (extract-transform-load) pipelines, or live with the slow drag of overloaded systems.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://docs.timescale.com/migrate/latest/livesync-for-s3/" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;u&gt;Timescale livesync for Postgres&lt;/u&gt;&lt;/strong&gt;&lt;/a&gt; to break this cycle—and today, we’re excited to kick off a &lt;strong&gt;Timescale Launch Week&lt;/strong&gt; by introducing it.&lt;/p&gt;

&lt;p&gt;Livesync lets you stream data from any Postgres database, whether it's running on Amazon RDS for PostgreSQL, Amazon Aurora, Azure PostgreSQL, self-hosted, or elsewhere, into Timescale Cloud with zero downtime, no application rewrites, and no disruption to your existing systems.&lt;/p&gt;

&lt;p&gt;You get the analytical speed of Timescale’s &lt;a href="https://docs.timescale.com/use-timescale/latest/hypertables/" rel="noopener noreferrer"&gt;&lt;u&gt;hypertables&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://docs.timescale.com/use-timescale/latest/hypercore/" rel="noopener noreferrer"&gt;&lt;u&gt;hypercore&lt;/u&gt;&lt;/a&gt;, providing seamless time-based partitioning, automatic columnar storage, and unbeatable compression without sacrificing the production stability you’ve worked hard to build. If you’re after a refresher on what Timescale can do, then check out our &lt;a href="https://docs.timescale.com/about/latest/whitepaper/" rel="noopener noreferrer"&gt;&lt;u&gt;whitepaper&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You get speed without sacrifice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Livesync for Real-Time Analytics?
&lt;/h2&gt;

&lt;p&gt;When your application needs to deliver real-time insights, your options have traditionally been limited:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Risky full migrations&lt;/strong&gt;, often requiring downtime windows that never feel safe enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heavy, custom-built ETL pipelines&lt;/strong&gt; that introduce complexity, lag, and new points of failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overloaded production systems&lt;/strong&gt;, where even simple dashboards slow down, and you risk impacting your critical transactional workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In an ideal world, every application that faces these problems would migrate directly to Timescale (we are just Postgres after all!), but we understand that’s not always realistic. For many teams, the stakes are too high to move fast and break things. Systems powering financial transactions, IoT networks, and SaaS platforms can’t tolerate disruption, yet they can't ignore the need for analytics.&lt;/p&gt;

&lt;p&gt;Livesync offers a new path forward: &lt;strong&gt;keep your existing Postgres exactly as it is&lt;/strong&gt;, but stream your data in real time to a dedicated Timescale Cloud instance optimized for analytical workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No downtime. No risk. No performance hits to production.&lt;/strong&gt; To get the most out of Timescale Cloud, we’d still advise that you start planning to fully migrate in the future, but for now you have blazing-fast real-time analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Livesync for Postgres Works
&lt;/h2&gt;

&lt;p&gt;Livesync uses &lt;a href="https://www.postgresql.org/docs/current/logical-replication.html" rel="noopener noreferrer"&gt;&lt;u&gt;Postgres’ logical replication&lt;/u&gt;&lt;/a&gt; protocol to stream changes from your production database into Timescale Cloud. But rather than simply duplicating your data, livesync extends logical replication with high-throughput ingestion, automatic hypertable creation, and a cloud-native architecture that prepares your data for real-time analytics at scale.&lt;/p&gt;
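&lt;p&gt;Logical replication does come with one prerequisite worth knowing about: the source database must have &lt;code&gt;wal_level&lt;/code&gt; set to &lt;code&gt;logical&lt;/code&gt;. On a self-managed instance that looks roughly like this (managed services such as RDS expose the same setting through a parameter group):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Requires a server restart to take effect.
ALTER SYSTEM SET wal_level = 'logical';

-- Verify after restart.
SHOW wal_level;
&lt;/code&gt;&lt;/pre&gt;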

&lt;h3&gt;
  
  
  Zero-downtime setup and continuous replication
&lt;/h3&gt;

&lt;p&gt;When you first connect livesync for Postgres, it performs an initial historical backfill, copying your existing data into Timescale at speeds of up to 150 GB per hour. &lt;/p&gt;

&lt;p&gt;At the same time, it begins capturing and streaming live changes through change data capture (CDC), recording every insert, update, and delete from your source Postgres database as they happen. You can choose exactly which tables to replicate, moving only the data you need for analytics, while the rest of your production system continues operating normally, with no maintenance windows, no locks, and no downtime risk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futss8utz8cc0766rt86l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futss8utz8cc0766rt86l.png" alt="Data Flow - Existing data + Real time replication" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Behind the scenes, livesync uses a microservice that connects to your source database as a logical replication subscriber, consuming changes directly from your Postgres publication.&lt;/p&gt;

&lt;p&gt;Unlike pure logical replication, livesync for Postgres automatically prepares your data for high-performance analytics: It can create corresponding tables inside Timescale Cloud and configure them as hypertables, unlocking immediate benefits like native time-based partitioning and faster time-series queries.&lt;/p&gt;

&lt;p&gt;Once your data is live in Timescale Cloud, you can optionally enable columnar storage and compression to further accelerate analytics and optimize storage, without modifying your ingestion or sync setup. This tight integration ensures that livesync for Postgres doesn't just mirror your data—it sets it up to scale.&lt;/p&gt;
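&lt;p&gt;If you wanted to reproduce that setup by hand, the equivalent TimescaleDB calls look roughly like the following (table and column names are illustrative; livesync performs the hypertable step for you):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Convert a replicated table into a time-partitioned hypertable.
SELECT create_hypertable('metrics', 'time');

-- Optionally enable columnar compression on older chunks.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('metrics', INTERVAL '7 days');
&lt;/code&gt;&lt;/pre&gt;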

&lt;h2&gt;
  
  
  Non-Invasive, Read-Only Architecture for Real-Time Analytics
&lt;/h2&gt;

&lt;p&gt;Livesync is designed to be as non-invasive as possible. It creates a lightweight publication and replication slot on your Postgres database but does not alter your existing schemas, application code, or database connections.&lt;/p&gt;
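&lt;p&gt;In plain Postgres terms, the footprint on the source side amounts to something like this (the object names are illustrative, not livesync's actual internals):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- A publication covering only the tables you chose to replicate.
CREATE PUBLICATION livesync_pub FOR TABLE metrics, events;

-- A logical replication slot the subscriber consumes from.
SELECT pg_create_logical_replication_slot('livesync_slot', 'pgoutput');
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Neither object touches your schemas or application code, and dropping them removes the integration entirely.&lt;/p&gt;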

&lt;p&gt;Your production environment remains fully operational at all times. Livesync simply observes and streams changes without interfering, making it especially well-suited for production systems with strict stability, compliance, or uptime requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Performance, Scalable by Design
&lt;/h2&gt;

&lt;p&gt;Livesync for Postgres is engineered for high throughput across both historical and live data.&lt;/p&gt;

&lt;p&gt;During the initial backfill, livesync achieves historical copy speeds of up to 150 GB per hour. During live operations, it can sustain 30,000 to 40,000 DML operations per second.&lt;/p&gt;

&lt;p&gt;Future improvements, including intra-table parallelization and adoption of &lt;a href="https://www.postgresql.org/docs/current/protocol-logical-replication.html" rel="noopener noreferrer"&gt;&lt;u&gt;logical replication protocol v2&lt;/u&gt;&lt;/a&gt; (which allows streaming of in-flight transactions rather than send-on-commit semantics), are already on our roadmap to push these limits even further.&lt;/p&gt;

&lt;h2&gt;
  
  
  Seamless Integration with Existing Architectures
&lt;/h2&gt;

&lt;p&gt;Livesync fits cleanly into your existing infrastructure, whether you're running RDS, Aurora, Azure Database for PostgreSQL, or a self-managed instance.&lt;/p&gt;

&lt;p&gt;Your operational systems continue working exactly as they do today. Livesync simply adds a Timescale-powered analytics layer next to your source database, letting you redirect analytical queries to Timescale by switching to a new connection string when you're ready.&lt;/p&gt;

&lt;p&gt;No replatforming, no rewrites, no downtime. Just faster SQL real-time analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Choose Livesync
&lt;/h2&gt;

&lt;p&gt;Livesync is the right choice when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero downtime is non-negotiable&lt;/strong&gt;, such as in financial systems, healthcare apps, and production SaaS platforms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need real-time insights today&lt;/strong&gt;, but can't risk application rewrites or complex migrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want to incrementally adopt Timescale&lt;/strong&gt;, starting small and expanding over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You’re evaluating Timescale performance&lt;/strong&gt; before committing to full migration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you're monitoring 10,000 IoT sensors, analyzing 50 million transactions, or building dashboards for end users, livesync gets you there—without the "lift and pray" gamble of traditional migration projects.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(That said, if you're open to consolidating on a single database, check out our&lt;/em&gt; &lt;a href="https://docs.timescale.com/migrate/latest/" rel="noopener noreferrer"&gt;&lt;em&gt;&lt;u&gt;migration tooling&lt;/u&gt;&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, which takes the risk out of full migrations to Timescale.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Livesync Applications
&lt;/h2&gt;

&lt;p&gt;Livesync is built for real-world systems where uptime, performance, and gradual adoption matter. Here’s how different verticals might use it to accelerate analytics without disruption:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Financial services&lt;/strong&gt;: Offload heavy historical queries to Timescale without risking downtime or introducing disruptive schema changes. Keep OLTP workloads fast and stable while running complex analytics separately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IoT environments&lt;/strong&gt;: Send millions of time-series events daily directly into hypertables, enabling real-time rollups, faster trend analysis, and storage optimizations like compression—without custom ETL pipelines or manual partitioning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS-native architectures&lt;/strong&gt;: Layer Timescale Cloud analytics on top of existing RDS or Aurora deployments, delivering sub-second analytical performance without replatforming or disrupting operational systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In every case, livesync removes the traditional pain of "faster analytics" projects, delivering immediate real-world results without sacrificing production stability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Livesync
&lt;/h2&gt;

&lt;p&gt;Getting started is simple!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a Timescale Cloud service&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect livesync&lt;/strong&gt; to your existing Postgres—no code changes required (see the Actions tab when you manage your service).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure your source database&lt;/strong&gt;, either by providing your host, port, user, and database, or a Postgres connection string.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select your tables&lt;/strong&gt;, configure hypertables, and hit start.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch your data flow&lt;/strong&gt; into Timescale—with full visibility, full control, and no disruption.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once live, you can redirect your analytical queries to Timescale’s battle-tested storage engine, unlocking sub-second dashboards and deep insights without impacting your production system. &lt;/p&gt;

&lt;h2&gt;
  
  
  And For Our Next Trick...
&lt;/h2&gt;

&lt;p&gt;Connecting any Postgres to real-time analytics is just the start. This week is a Timescale Launch Week, a celebration of new features and new ways to move faster without compromise. Every day, we’re showing how you can deliver speed without sacrifice across your entire stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Day 1&lt;/strong&gt;: Connect Any Postgres to Real-Time Analytics — Start using livesync for Postgres and add real-time analytics to your existing stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 2&lt;/strong&gt;: Compare Pgvector and Qdrant — See a side-by-side breakdown of two popular open-source vector databases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 3&lt;/strong&gt;: Connect S3 Data to Postgres — Use livesync for S3 and the pgai Vectorizer to work with external data directly inside Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 4&lt;/strong&gt;: Supercharge Your Developer Experience — Use our SQL Assistant’s new AI agent mode, get recommendations to tune your instance, and explore new query visibility tools with Timescale Insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 5&lt;/strong&gt;: Strengthen Security and Compliance — Maintain control while scaling your analytics performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try Livesync Today
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://console.cloud.timescale.com/signup" rel="noopener noreferrer"&gt;Start your free Timescale trial&lt;/a&gt; and set up livesync for Postgres today.&lt;/p&gt;

&lt;p&gt;And stay tuned: in the coming weeks, we'll dive deeper into how we built our livesync products.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>datascience</category>
      <category>performance</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
