<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nicolas Hourcard</title>
    <description>The latest articles on DEV Community by Nicolas Hourcard (@nicquestdb).</description>
    <link>https://dev.to/nicquestdb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F284061%2F8910b1f6-099c-40a0-a6e0-df64258c5002.jpg</url>
      <title>DEV Community: Nicolas Hourcard</title>
      <link>https://dev.to/nicquestdb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nicquestdb"/>
    <language>en</language>
    <item>
      <title>QuestDB 6.6 - Dynamic Commits</title>
      <dc:creator>Nicolas Hourcard</dc:creator>
      <pubDate>Mon, 28 Nov 2022 14:07:58 +0000</pubDate>
      <link>https://dev.to/nicquestdb/questdb-66-dynamic-commits-1ch8</link>
      <guid>https://dev.to/nicquestdb/questdb-66-dynamic-commits-1ch8</guid>
      <description>&lt;p&gt;We are excited to announce the release of &lt;a href="https://questdb.io/blog/2022/11/25/questdb-6.6.1-dynamic-commits"&gt;QuestDB 6.6.1&lt;/a&gt;, which brings dynamic commits to optimize ingestion throughput and data freshness for reads. In this blog post, our CTO, Vlad, shares the story driving the creation of the dynamic commits.&lt;/p&gt;

&lt;h2&gt;
  
  
  QuestDB's data structure and out-of-order data ingestion
&lt;/h2&gt;

&lt;p&gt;Many storage systems adopt a Log-Structured Merge tree at their core. QuestDB differs from them: ingested data is always ordered by timestamp once it is committed to disk. &lt;a href="https://questdb.io/blog/2021/05/10/questdb-release-6-0-tsbs-benchmark/"&gt;QuestDB 6.0&lt;/a&gt; enabled out-of-order ingestion, for which we introduced a commit lag to optimize ingestion throughput for unordered data. The commit lag is a time-based buffer that delays the data commit so that out-of-order data can be re-ordered on the fly in memory. QuestDB's in-memory reordering is particularly efficient and avoids the heavy copy-on-merge operations that would otherwise be needed.&lt;/p&gt;

&lt;p&gt;As such, we have an implicit trade-off between the ingestion throughput of unordered data and the availability of data for reads. A longer buffer means a longer delay before data is available for reads, while a shorter buffer can hurt disk write throughput, because already committed out-of-order data needs to be reshuffled.&lt;/p&gt;

&lt;p&gt;To recap, when processing incoming data, the commit lag ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is sorted chronologically.&lt;/li&gt;
&lt;li&gt;New data is merged with the existing one.&lt;/li&gt;
&lt;li&gt;A consistent view of the existing data is maintained for concurrent reads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AVp1HMh9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mbgqms5d3ouk22fpw9e4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AVp1HMh9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mbgqms5d3ouk22fpw9e4.png" alt="QuestDB dynamic commit, re-ordering out-of-order data in memory" width="880" height="440"&gt;&lt;/a&gt;&lt;/p&gt;
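&lt;p&gt;To make the buffering concrete, here is a minimal Python sketch (all names are hypothetical, not QuestDB internals) of a time-based commit lag: incoming rows are reordered in memory, and only rows older than the newest timestamp minus the lag are committed:&lt;/p&gt;

```python
import bisect

# Hypothetical sketch of a time-based commit lag: out-of-order rows
# are reordered in an in-memory buffer, and only rows older than
# (newest timestamp minus the lag) are committed to disk.
def commit_with_lag(buffer, incoming, lag):
    buffer.extend(incoming)
    buffer.sort(key=lambda row: row[0])   # reorder in memory by timestamp
    cutoff = buffer[-1][0] - lag          # newest timestamp minus the lag
    idx = bisect.bisect_right([row[0] for row in buffer], cutoff)
    return buffer[:idx], buffer[idx:]     # (committed, still buffered)

committed, pending = commit_with_lag([], [(100, "a"), (97, "b"), (103, "c")], lag=5)
# cutoff is 98, so only the row at t=97 is committed; the rest stays buffered
```

&lt;p&gt;A consistent view for readers follows from the same split: readers only ever see the committed prefix, never the buffer being reshuffled.&lt;/p&gt;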

&lt;p&gt;For QuestDB 6.5.5 and earlier versions, users needed to understand the "shape" of their data to adjust the commit lag value, either through server configuration or query settings.&lt;/p&gt;

&lt;p&gt;A misconfigured commit lag led to user pain and frustration: some of our users expected data to be available for reads immediately, but the default configuration got in the way. For heavy out-of-order ingestion patterns, the default commit lag would trigger too many copy-on-merge operations and significantly slow down the database. And this was where the story started.&lt;/p&gt;

&lt;h1&gt;
  
  
  Community-driven development
&lt;/h1&gt;

&lt;p&gt;A couple of weeks ago, I was working on a massive WAL (Write Ahead Log) PR, when Nic, our CEO, sent me a thread of messages:&lt;/p&gt;

&lt;p&gt;Nic: FYI I asked Javier and Imre to connect and look at this commit lag together. Our default settings are not good enough…Someone in the community just asked me if the best we can do is ingestion with a 5-minute delay. These default settings give the wrong impression that we cannot process data in real time. Let's brainstorm.&lt;/p&gt;

&lt;p&gt;Me: They can reduce the commit lag, why wouldn't they do that? They can set it to 0?&lt;/p&gt;

&lt;p&gt;Nic: They don't know that the commit lag is configurable, and this leaves a bad impression on new users.&lt;/p&gt;

&lt;p&gt;This feedback was too important to ignore, so I caught up with our Core Engineer, Imre Aranyosi, and DevRel, Javier Ramirez. It turned out that Javier had produced a demo project in Go, sending data to QuestDB Cloud in small batches - approximately 10k rows every 50 ms. When querying data from this project, Javier could not see new data immediately. Data visibility is also one of the most frequently asked questions on QuestDB Slack and Stack Overflow. It wasn't clear to our users which configuration parameters to set, or where to set them. We had to investigate this problem further.&lt;/p&gt;

&lt;p&gt;Imre and I began exploring and soon realized that the commit lag setting was why data was not readily available. To increase data freshness, we had to commit more often and decouple the commit frequency from the commit lag value. We wanted to improve the user experience without complicating the database configuration. So we went back to the whiteboard and considered the commit lag anew. Its optimal size depended on the shape of the incoming data, which was not static. This meant that the commit lag size had to be dynamic! We needed to predict the commit lag size and resize the value on the fly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic commits
&lt;/h2&gt;

&lt;p&gt;This is what we eventually implemented for QuestDB 6.6.1: to predict the correct commit lag value, QuestDB takes the maximum of the latest 4 overlap values with a multiplication factor - over-estimating the commit lag is cheaper than under-estimating it. This prediction is updated every second. The commit lag shrinks down to 0 when data stops overlapping (no out-of-order data) and inflates rapidly when out-of-order data reappears, based on the predicted data shape.&lt;/p&gt;
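&lt;p&gt;The prediction rule can be sketched in a few lines of Python. The window of four overlap values comes from the description above, while the multiplication factor of 2 is an illustrative assumption, not QuestDB's actual value:&lt;/p&gt;

```python
from collections import deque

# Sketch of the dynamic commit lag prediction: keep the latest 4
# observed out-of-order overlap intervals and return their maximum
# times a safety factor (the factor 2 is an assumption).
class LagPredictor:
    def __init__(self, factor=2, window=4):
        self.overlaps = deque(maxlen=window)
        self.factor = factor

    def observe(self, overlap):           # updated roughly every second
        self.overlaps.append(overlap)

    def predicted_lag(self):
        # shrinks to 0 once recent data stops overlapping
        return max(self.overlaps, default=0) * self.factor

predictor = LagPredictor()
for overlap in [3, 0, 7, 1, 2]:
    predictor.observe(overlap)
# the window keeps [0, 7, 1, 2], so the predicted lag is 7 * 2 = 14
```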

&lt;p&gt;Javier has upgraded his demo project to QuestDB 6.6.1 and now data is visible instantly. Our DevRel is happy, and so are our users: whatever the ingestion method and the shape of the data, QuestDB automatically adjusts its settings and delivers optimal data freshness.&lt;/p&gt;

&lt;p&gt;In short, users do not need to do anything to benefit from the optimal ingestion rate and data availability for read operations. Check us out on &lt;a href="https://github.com/questdb/questdb"&gt;GitHub&lt;/a&gt; for more details.&lt;/p&gt;

</description>
      <category>database</category>
      <category>opensource</category>
      <category>java</category>
      <category>analytics</category>
    </item>
    <item>
      <title>How we achieved write speeds of 1.4 million rows per second</title>
      <dc:creator>Nicolas Hourcard</dc:creator>
      <pubDate>Fri, 21 May 2021 12:59:05 +0000</pubDate>
      <link>https://dev.to/questdb/how-we-achieved-write-speeds-of-1-4-million-rows-per-second-1a9l</link>
      <guid>https://dev.to/questdb/how-we-achieved-write-speeds-of-1-4-million-rows-per-second-1a9l</guid>
      <description>&lt;p&gt;At QuestDB, we've built an open-source time series database focused on performance. We started QuestDB so that we could bring our experience in low-latency trading and the technical approaches we developed in this domain to a variety of real-time data processing use cases.&lt;/p&gt;

&lt;p&gt;The journey to today's version of QuestDB began with the original prototype in 2013, and we've described what happened since in a post published &lt;a href="https://news.ycombinator.com/item?id=23975807" rel="noopener noreferrer"&gt;during our HackerNews launch&lt;/a&gt; last year. Our users deploy QuestDB to make time series analysis fast, efficient, and convenient in financial services, IoT, application monitoring, and machine learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's the best way to store time series data?
&lt;/h2&gt;

&lt;p&gt;In the early stages of the project, we were inspired by vector-based append-only systems like kdb+ because of the speed advantages and the simple code path this model brings. QuestDB’s data model uses what we call &lt;em&gt;time-based arrays&lt;/em&gt;, a linear data structure. This allows QuestDB to slice data during ingestion into small chunks and process it all in parallel. Data that arrives out of time order is reordered in memory before being persisted to disk, so data lands in the database already ordered by time. As such, QuestDB does not rely on computationally intensive indices to reorder data for time-series queries.&lt;/p&gt;

&lt;p&gt;This linear model differs from the LSM trees or B-tree-based storage engines found in other open source databases such as InfluxDB or TimescaleDB.&lt;/p&gt;

&lt;p&gt;Beyond ingestion capabilities, QuestDB’s data layout enables CPUs to access data faster. Our codebase leverages modern CPU architecture with SIMD instructions to request that the same operation be performed on multiple data elements in parallel. We store data in columns and partition it by time to lift the minimal amount of data from the disk for a given query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg2aqieemlkz5r5d2dym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg2aqieemlkz5r5d2dym.png" alt="A diagram showing the column-based storage model of QuestDB which allows for parallelizing work in tables"&gt;&lt;/a&gt;&lt;/p&gt;
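&lt;p&gt;The column-oriented layout can be illustrated with a small Python sketch (the field names are made up): because each field lives in its own contiguous array, an aggregate touches only the column it needs, and a tight loop over a flat array is exactly the shape of work that SIMD instructions accelerate:&lt;/p&gt;

```python
# Illustrative sketch of column-oriented storage (not QuestDB internals):
# each field is stored in its own contiguous array.
rows = [(1, 20.5, "eu"), (2, 21.0, "us"), (3, 19.8, "eu")]
ts_col, temp_col, region_col = (list(col) for col in zip(*rows))

# an aggregate over one column never reads the other columns
avg_temp = sum(temp_col) / len(temp_col)
```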

&lt;h2&gt;
  
  
  How does QuestDB compare to ClickHouse, InfluxDB and TimescaleDB?
&lt;/h2&gt;

&lt;p&gt;We saw the &lt;a href="https://github.com/timescale/tsbs" rel="noopener noreferrer"&gt;Time Series Benchmark Suite&lt;/a&gt; (TSBS) regularly coming up in discussions about database performance and decided we should provide the ability to benchmark QuestDB along with other systems. The TSBS is a collection of Go programs to generate datasets and then benchmark read and write performance. The suite is extensible so that different use cases and query types can be included and compared across systems.&lt;/p&gt;

&lt;p&gt;Here are our results of the benchmark with the &lt;code&gt;cpu-only&lt;/code&gt; use case using up to fourteen workers on an AWS EC2 &lt;code&gt;m5.8xlarge&lt;/code&gt; instance with sixteen cores.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb27s5sk6dijrfamln2da.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb27s5sk6dijrfamln2da.png" alt="A chart comparing the maximum throughput of four database systems, showing QuestDB hitting ingestion limits with less resources than other systems"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We reach maximum ingestion performance using four workers, whereas the other systems require more CPU resources to hit maximum throughput. QuestDB achieves 959k rows/sec with 4 threads. We find that InfluxDB needs 14 threads to reach its max ingestion rate (334k rows/sec), while TimescaleDB reaches 145k rows/sec with 4 threads. ClickHouse hits 914k rows/sec with twice as many threads as QuestDB.&lt;/p&gt;

&lt;p&gt;When running on 4 threads, QuestDB is 1.7x faster than ClickHouse, 6.5x faster than InfluxDB and 6.6x faster than TimescaleDB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82rkh366kuio6tqp1e7t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82rkh366kuio6tqp1e7t.png" alt="Time series benchmark suite results showing QuestDB outperforming ClickHouse, TimescaleDB and InfluxDB when using four workers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we ran the suite again using an AMD Ryzen5 processor, we hit a maximum throughput of 1.43 million rows per second using 5 threads, compared to the &lt;a href="https://aws.amazon.com/ec2/instance-types/" rel="noopener noreferrer"&gt;Intel Xeon Platinum&lt;/a&gt; used by our reference benchmark &lt;code&gt;m5.8xlarge&lt;/code&gt; instance on AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frf44e9irxw67aty77jwk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frf44e9irxw67aty77jwk.png" alt="A chart comparing the maximum throughput of QuestDB when utilizing an Intel Xeon Platinum processor versus an AMD Ryzen5 processor."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How should you store out-of-order time series data?
&lt;/h2&gt;

&lt;p&gt;Re-ordering data which is "out-of-order" (O3) during ingestion proved particularly challenging. It is a new approach that we wanted to detail a little more in this article. Our idea for handling out-of-order ingestion was a three-stage approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Keep the append model until records arrive out-of-order &lt;/li&gt;
&lt;li&gt;Sort uncommitted records in a staging area in-memory&lt;/li&gt;
&lt;li&gt;Reconcile and merge the sorted out-of-order data and persisted data at commit time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first two steps are straightforward and easy to implement, and handling append-only data is unchanged. The heavy out-of-order commit kicks in only when there is data in the staging area. The bonus of this design is that the output consists of vectors, so our vector-based readers remain compatible.&lt;/p&gt;

&lt;p&gt;This pre-commit sort-and-merge adds an extra processing phase to ingestion with an accompanying performance penalty. We nevertheless decided to explore this approach and see how far we could reduce the penalty by optimizing the out-of-order commit.&lt;/p&gt;
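&lt;p&gt;The three stages can be sketched in Python. This is a deliberately naive model, with Python's built-in sort standing in for the real implementation:&lt;/p&gt;

```python
# Naive sketch of the three-stage approach: append while data stays
# in order, stage out-of-order rows, then sort and merge at commit.
def ingest(persisted, staging, row):
    in_order = not persisted or row[0] >= persisted[-1][0]
    if in_order and not staging:
        persisted.append(row)             # stage 1: fast append path
    else:
        staging.append(row)               # out-of-order: goes to staging

def commit(persisted, staging):
    staging.sort(key=lambda row: row[0])  # stage 2: sort the staging area
    merged = sorted(persisted + staging, key=lambda row: row[0])  # stage 3
    staging.clear()
    return merged

persisted, staging = [], []
for row in [(1, "a"), (2, "b"), (4, "d"), (3, "c")]:
    ingest(persisted, staging, row)
table = commit(persisted, staging)
# table is fully ordered: [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
```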

&lt;h2&gt;
  
  
  How we sort, merge and commit out-of-order time series data
&lt;/h2&gt;

&lt;p&gt;Processing a staging area gives us a unique opportunity to analyze the data holistically where we can avoid physical &lt;em&gt;merges&lt;/em&gt; altogether and get away with fast and straightforward &lt;code&gt;memcpy&lt;/code&gt; or similar data movement methods. Such methods can be parallelized thanks to our column-based storage. We can employ SIMD and non-temporal data access where it makes a difference.&lt;/p&gt;

&lt;p&gt;We sort the timestamp column from the staging area via an optimized version of radix sort, and the resulting index is used to reshuffle the remaining columns in the staging area in parallel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkg2xjtcc5c7nek6c9b4h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkg2xjtcc5c7nek6c9b4h.png" alt="A diagram illustrating how sorting is applied to unordered database records based on a timestamp column order"&gt;&lt;/a&gt;&lt;/p&gt;
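&lt;p&gt;The reshuffle step can be sketched as follows; Python's built-in sort stands in for the optimized radix sort, and each column reshuffle is independent of the others, which is what makes this step parallelizable:&lt;/p&gt;

```python
# Sketch: sort only the timestamp column, then reuse the resulting
# index to reshuffle every other staged column into the same order.
timestamps = [30, 10, 20]
values = ["c", "a", "b"]
symbols = ["z", "x", "y"]

index = sorted(range(len(timestamps)), key=timestamps.__getitem__)
timestamps = [timestamps[i] for i in index]
values = [values[i] for i in index]       # each column reshuffle is
symbols = [symbols[i] for i in index]     # independent: parallelizable
```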

&lt;p&gt;The now-sorted staging area is mapped relative to the existing partition data. It may not be obvious at first, but we are trying to establish the type of operation needed and the dimensions of each of the three groups below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zeinflbfvd9z2mxih42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zeinflbfvd9z2mxih42.png" alt="A diagram illustrating the combinations of merge operations that can be applied to two data sets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When merging datasets in this way, the prefix and suffix groups can be persisted data, out-of-order data, or none. The merge group is where more cases occur as it can be occupied by persisted data, out-of-order data, both out-of-order and persisted data, or none.&lt;/p&gt;
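&lt;p&gt;A simplified Python sketch of the split, using plain timestamp lists: only the slice of persisted data that overlaps the out-of-order (O3) range needs a real merge, while the prefix and suffix are left untouched:&lt;/p&gt;

```python
import bisect

# Sketch: split persisted data against the sorted O3 range into
# prefix (untouched), merge (overlapping middle), suffix (untouched).
def split_regions(persisted, o3):
    start = bisect.bisect_left(persisted, o3[0])
    end = bisect.bisect_right(persisted, o3[-1])
    prefix = persisted[:start]
    merge = sorted(persisted[start:end] + o3)  # only this slice is merged
    suffix = persisted[end:]
    return prefix, merge, suffix

prefix, merge, suffix = split_regions([1, 2, 5, 6, 9], [4, 7])
# prefix=[1, 2] and suffix=[9] are untouched; merge=[4, 5, 6, 7]
```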

&lt;p&gt;When it's clear how to group and treat data in the staging area, a pool of workers performs the required operations, calling &lt;code&gt;memcpy&lt;/code&gt; in trivial cases and shifting to SIMD-optimized code for everything else. With a prefix, merge, and suffix split, the maximum &lt;code&gt;liveliness&lt;/code&gt; of the commit (how well it scales with additional CPU capacity) is &lt;code&gt;partitions_affected&lt;/code&gt; x &lt;code&gt;number_of_columns&lt;/code&gt; x &lt;code&gt;3&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How often should time series data be sorted and merged?
&lt;/h2&gt;

&lt;p&gt;Being able to copy data fast is a good option, but we think that heavy data copying can be avoided in most time series ingestion scenarios. Assuming that most real-time out-of-order situations are caused by the delivery mechanism and hardware jitter, we can deduce that the timestamp distribution will be contained by some boundary.&lt;/p&gt;

&lt;p&gt;For example, if any new timestamp value has a high probability to fall within 10 seconds of the previously received value, the boundary is then 10 seconds, and we call this boundary &lt;em&gt;lag.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When timestamp values follow this pattern, deferring the commit can render out-of-order commits a normal append operation. The out-of-order system can deal with any degree of lateness, but data that arrives within the time specified by &lt;em&gt;lag&lt;/em&gt; is prioritized for faster processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to compare time series database performance
&lt;/h2&gt;

&lt;p&gt;We have opened a pull request (&lt;a href="https://github.com/timescale/tsbs/issues/157" rel="noopener noreferrer"&gt;QuestDB benchmark support&lt;/a&gt;) in TimescaleDB's TSBS GitHub repository to add the ability to run the benchmark against QuestDB. In the meantime, users may clone &lt;a href="https://github.com/questdb/tsbs" rel="noopener noreferrer"&gt;our fork of the benchmark&lt;/a&gt; and run the suite to see the results for themselves.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tsbs_generate_data &lt;span class="nt"&gt;--use-case&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"cpu-only"&lt;/span&gt; &lt;span class="nt"&gt;--seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;123 &lt;span class="nt"&gt;--scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--timestamp-start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"2016-01-01T00:00:00Z"&lt;/span&gt; &lt;span class="nt"&gt;--timestamp-end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"2016-01-02T00:00:00Z"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10s"&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"influx"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/bigcpu

tsbs_load_questdb &lt;span class="nt"&gt;--file&lt;/span&gt; /tmp/bigcpu &lt;span class="nt"&gt;--workers&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Building an open source database with a permissive license
&lt;/h2&gt;

&lt;p&gt;Pushing database performance further while making it easy for developers to get started with our product motivates us every day. This is why we are focused on building a solid community of developers who can participate and improve the product through our open source distribution model.&lt;/p&gt;

&lt;p&gt;Beyond making QuestDB easy to use, we want to make it easy to audit, review, and make code or general project contributions. All of QuestDB's source code is available on &lt;a href="https://github.com/questdb/questdb" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; under the Apache 2.0 license and we welcome all sorts of contributions from GitHub issues to pull requests.&lt;/p&gt;

</description>
      <category>computerscience</category>
      <category>opensource</category>
      <category>database</category>
      <category>performance</category>
    </item>
    <item>
      <title>What is time-series data, and why are we building a time-series database (TSDB)?</title>
      <dc:creator>Nicolas Hourcard</dc:creator>
      <pubDate>Thu, 19 Nov 2020 18:02:51 +0000</pubDate>
      <link>https://dev.to/questdb/what-is-time-series-data-and-why-are-we-building-a-time-series-database-tsdb-47b9</link>
      <guid>https://dev.to/questdb/what-is-time-series-data-and-why-are-we-building-a-time-series-database-tsdb-47b9</guid>
      <description>&lt;p&gt;This blog post covers the basics of time-series data and why time-series databases have seen such an explosion in popularity since the category emerged a decade ago. Additionally, we will briefly cover the origin story of QuestDB, why we set out to build a new database from scratch and go through the database design choices and trade-offs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time-series data and characteristics of TSDBs
&lt;/h2&gt;

&lt;p&gt;Time-series data is everywhere. Sensors, financial exchanges, servers, and software applications generate streams of events, which need to be analyzed on the fly. Time-series databases (TSDB) emerged as a category to better deal with vast amounts of machine data. These specialized Database Management Systems (DBMS) are now empowering millions of developers to collect, store, process, and analyze data over time. With new time-series forecasting methods and machine learning models, companies are now better equipped to train and refine their models to predict future outcomes more accurately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time-Series data explained
&lt;/h3&gt;

&lt;p&gt;A time series is a succession of data points ordered by time. Time-series data is often plotted on a chart where the x-axis is time and the y-axis is a metric that changes over time. For example, stock prices change every microsecond or even nanosecond, and the trend is best presented as time-series data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--U_gblhsL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ekrtnvdgz11h5j2kwc6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U_gblhsL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ekrtnvdgz11h5j2kwc6b.png" alt="Apple share price over the last five years: time series data" width="880" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Time-series data has always been plentiful in financial services with fast-changing tick price data and in e-commerce/ad-tech to better understand user analytics. With the rise in connected devices, application monitoring, and observability, time-series data is now critical in nearly all fields. We list a couple of examples below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--e4DGk0nj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vzng2axb76vycu8w18so.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--e4DGk0nj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vzng2axb76vycu8w18so.png" alt="The primary use cases for time-series databases such as QuestDB" width="880" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Time-series data has several unique characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The amount of data created and processed is large.&lt;/li&gt;
&lt;li&gt;The amount of data flowing from the source is often uninterrupted.&lt;/li&gt;
&lt;li&gt;The volume is also unpredictable, with bursts of data arriving at irregular intervals. This is very common in financial markets, where spikes of trading volume follow events that are difficult to predict.&lt;/li&gt;
&lt;li&gt;Fresh data needs to be analyzed on the fly. Anomaly detection is a good example.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the rise of time-series data, time series has been the fastest-growing database category for the past five years according to DB-engines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y4x21qf---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/79a910bflbjwk4yj7cm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y4x21qf---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/79a910bflbjwk4yj7cm0.png" alt="Popularity by database category since 2018" width="880" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Time-series databases design
&lt;/h3&gt;

&lt;p&gt;As use cases and the need for time-series analysis increase exponentially, so does the amount of raw data itself. To better cope with this ever-growing amount of data, time-series databases emerged a decade ago. They focus on performance, with fast ingestion to process a higher number of data points. The trade-off is less stringent consistency guarantees, which are typically must-haves for OLTP workloads. It is pretty common for time-series databases not to be ACID compliant. &lt;/p&gt;

&lt;p&gt;Unlike traditional databases, in which older data entries are typically updated with the most recent data point to show the latest state, time-series databases continuously accumulate data points over time. This way, one can draw meaningful insights from the evolution of metrics. This is why TSDBs are optimized for write operations rather than updates. Once the data is stored, most use cases require querying it in real time to uncover insights quickly. DevOps teams will set real-time alerts to detect anomalies in server metrics such as CPU or memory. E-commerce websites need to understand buyers’ behavior to gather new insights and optimize their stock. A fintech company will want to detect fraud as transactions occur.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated partitioning management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Time partitions are created automatically as data arrives. In QuestDB, data is partitioned by time (hourly, daily, weekly or monthly). Slicing the data by time partitions makes time-based queries more efficient. Time-based queries will only lift the relevant time partitions from the disk rather than lifting the entire dataset. Partitioning also allows having multiple tiers of storage, where older partitions can be mounted into cold storage, which is cheaper and slower.&lt;/p&gt;
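&lt;p&gt;A minimal Python sketch of daily partitioning (names illustrative, not QuestDB internals): the partition key is derived from the timestamp, so a time-bounded query only needs to open the partitions that intersect its interval:&lt;/p&gt;

```python
from datetime import datetime

# Sketch: derive a daily partition key from each row's timestamp.
def partition_key(ts):
    return ts.strftime("%Y-%m-%d")

partitions = {}
for ts, value in [
    (datetime(2021, 5, 1, 9), 1.0),
    (datetime(2021, 5, 1, 17), 2.0),
    (datetime(2021, 5, 2, 8), 3.0),
]:
    partitions.setdefault(partition_key(ts), []).append((ts, value))

# a query bounded to 2021-05-01 lifts a single partition from disk
rows = partitions["2021-05-01"]
```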

&lt;ul&gt;
&lt;li&gt;Downsampling and interpolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Downsampling represents the data at a lower frequency, for example shifting from a daily view to a monthly view. To make such queries less verbose in SQL, QuestDB built a native extension to ANSI SQL with the keyword SAMPLE BY. This SQL statement slices the dataset by a time interval (15 minutes in our example below) and runs aggregations for each period. We can optionally fill values for periods with no results (interpolation, fill with null, default, etc.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hGOC5Qwc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t8um9xkcr6nm4ryutcmc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hGOC5Qwc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t8um9xkcr6nm4ryutcmc.png" alt="downsampling with SAMPLE BY SQL query" width="880" height="241"&gt;&lt;/a&gt;&lt;/p&gt;
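&lt;p&gt;Conceptually, sampling by 15 minutes buckets each row into a 15-minute interval and aggregates per bucket. A Python sketch of the idea, using plain seconds for timestamps and averaging as the aggregate:&lt;/p&gt;

```python
# Sketch of what SAMPLE BY does conceptually: floor each timestamp
# to the start of its bucket, then aggregate the values per bucket.
BUCKET = 15 * 60  # 15 minutes, in seconds

def sample_by(rows, bucket=BUCKET):
    buckets = {}
    for ts, value in rows:
        buckets.setdefault(ts - ts % bucket, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in buckets.items()}

averaged = sample_by([(0, 10.0), (60, 20.0), (900, 30.0)])
# two buckets: t=0 and t=60 average to 15.0, t=900 stands alone at 30.0
```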

&lt;ul&gt;
&lt;li&gt;Interval search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interval search means fast retrieval of data over arbitrary intervals, for example zooming into a specific timeframe preceding a monitoring alert to better understand the underlying cause in real time. QuestDB’s WHERE filter and IN time modifier for timestamp search are fast and efficient. The SQL query below retrieves all the data points in June 2018 for the column pickup_datetime:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oG0EN3UG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j89cstwkf2yj7v5ppe8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oG0EN3UG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j89cstwkf2yj7v5ppe8u.png" alt="SQL query for interval search" width="880" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time series joins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Align and join time-series data from two different tables that do not have exactly matching timestamps. These are known as ASOF joins, which we elaborate on in the ASOF JOIN section of our documentation. Below, the two tables, trips and weather, each show values for given timestamps. However, the timestamps for each table are not equal. For each timestamp in trips, ASOF finds the most recent timestamp at or before it in weather and shows the associated weather value in the result table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--whLst5_d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8mma103kbbkjfbjfqj73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--whLst5_d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8mma103kbbkjfbjfqj73.png" alt="ASOF join SQL query" width="880" height="185"&gt;&lt;/a&gt;&lt;/p&gt;
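&lt;p&gt;The matching rule can be sketched with a binary search; the table contents below are illustrative, not the trips and weather data from our documentation:&lt;/p&gt;

```python
import bisect

# Sketch of an ASOF join: for each trip timestamp, pick the most
# recent weather reading at or before it.
weather_ts = [100, 200, 300]          # sorted weather timestamps
weather_val = ["rain", "sun", "fog"]

def asof(ts):
    i = bisect.bisect_right(weather_ts, ts) - 1
    return weather_val[i] if i >= 0 else None

joined = [(t, asof(t)) for t in [150, 200, 310]]
# joined is [(150, "rain"), (200, "sun"), (310, "fog")]
```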

&lt;ul&gt;
&lt;li&gt;Most recent first&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With time-series data, the most recent data is often the most likely to be analyzed. QuestDB’s SQL extension includes LATEST ON to get the most recent view of a record instantly. Since data is ingested in chronological order, QuestDB starts scanning from the bottom of the table and can thus retrieve the data point quickly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RH8BL1fg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0f0bzjfa0petyt5rla7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RH8BL1fg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0f0bzjfa0petyt5rla7m.png" alt="most recent first LATEST ON SQL query" width="880" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming ingestion protocols&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As time-series data is mostly machine data, it is produced and streamed to a database continuously. The ability to sustain a steady stream of data, rather than slow batches, quickly becomes a must. The InfluxDB line protocol is very efficient and offers a lot of flexibility: for example, you can create new columns on the fly without specifying a schema ahead of time.&lt;/p&gt;
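&lt;p&gt;A hypothetical helper shows why no upfront schema is needed: every line protocol message carries its own column names, so the server can create missing columns as lines arrive. The table and column names below are made up for illustration.&lt;/p&gt;

```python
def ilp_line(table, symbols, columns, ts_ns):
    # InfluxDB line protocol shape: table,tag=val field=val,... timestamp
    tags = "".join(f",{k}={v}" for k, v in symbols.items())
    fields = ",".join(f"{k}={v}" for k, v in columns.items())
    return f"{table}{tags} {fields} {ts_ns}"

line = ilp_line("trades", {"symbol": "BTC-USD"},
                {"price": 66200.5, "amount": 0.1}, 1669644478000000000)
```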

&lt;h2&gt;
  
  
  Why we set out to build QuestDB
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Democratizing time-series data performance
&lt;/h3&gt;

&lt;p&gt;Our CTO worked in electronic trading and had built trading infrastructure for more than 10 years. In 2013, his boss would not allow him to use the only high-performance database suited to deal with time-series data because of its proprietary nature and price.&lt;/p&gt;

&lt;p&gt;QuestDB was built to democratize performance that had only been available to high-end enterprise applications, and to put the tooling in the hands of every developer around the world through an open-source distribution model. Rather than writing a new query language from scratch, our CTO chose to facilitate developer adoption via SQL instead of a complex proprietary language.&lt;/p&gt;

&lt;p&gt;And this was the origin of QuestDB.&lt;/p&gt;

&lt;p&gt;We have heard a large number of companies complaining about the performance limitations of open-source time-series databases. Most of those reuse existing libraries or are an extension of a well-known database that was not designed to process time-series data efficiently in the first place.&lt;/p&gt;

&lt;p&gt;Instead, we chose an alternative route, one that took more than 7 years of R&amp;amp;D. Our vision from day 1 was to challenge the norm and build software that uses new approaches and leverages techniques learned on low-latency trading floors. An important aspect was to study and understand the evolution of hardware in order to build database software that could extract more performance from CPUs, memory, and modern hard disks.&lt;/p&gt;

&lt;h3&gt;
  
  
  QuestDB design and performance
&lt;/h3&gt;

&lt;p&gt;QuestDB is built in zero-GC Java and C++, and every single algorithm in the code base has been written from scratch with the goal of maximizing performance.&lt;/p&gt;

&lt;p&gt;QuestDB’s data model (time-based arrays) differs from the LSM-tree or B-tree based storage engines found in InfluxDB or TimescaleDB. It reduces overhead and data duplication while maintaining immediate consistency and persisting data on disk.&lt;/p&gt;

&lt;p&gt;This linear data model structure massively optimizes ingestion as it allows the database to slice data extremely efficiently in small chunks and process it all in parallel. QuestDB also saturates the network cards to process messages from several senders in parallel. Our ingestion is append-only, with constant complexity, i.e. O(1); QuestDB does not rely on computationally intense indices to reorder data as it hits the database. Out-of-order ingests are dealt with and re-ordered in memory before being persisted to disk.&lt;/p&gt;
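&lt;p&gt;The idea can be sketched in a few lines (a drastically simplified model, nothing like the real storage engine): out-of-order rows are buffered, re-ordered in memory, and only then appended to the on-disk column, so the persisted data stays time-ordered without a reordering index.&lt;/p&gt;

```python
def commit(buffer, disk):
    # Re-order the in-memory buffer by timestamp, then do an
    # append-only write to the persisted column.
    buffer.sort(key=lambda row: row[0])
    disk.extend(buffer)
    buffer.clear()

disk = []
buffer = [(3, "c"), (1, "a"), (2, "b")]   # rows arrived out of order
commit(buffer, disk)
```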

&lt;p&gt;QuestDB’s data layout enables CPUs to access data faster. With respect to queries, our codebase leverages modern CPU architecture with SIMD instructions to enable the same operation to be performed on multiple data elements in parallel. We store data in columns and partition it by time in order to lift the minimal amount of data from the disk for a given query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--13o6bc14--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gweyj0w3nzhbifmxk5hs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--13o6bc14--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gweyj0w3nzhbifmxk5hs.png" alt="Data stored in columns and partitioned by time" width="880" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We didn't get everything right from the start! In 2021, we shipped QuestDB 6.0 to support high performance out-of-order data. A few months ago, we shipped dynamic commits to optimize ingestion throughput and data freshness for reads. We also rewrote our ingestion layer to make it more performant — taking advantage of the latest OS kernel innovations — and released official clients in seven programming languages to improve the developer experience. We are in the middle of decoupling data ingestion from table writers to eliminate table locks when using the Postgres wire protocol. We already have some ideas to make downsampling and aggregation queries even faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Additional resources on time-series data and databases
&lt;/h3&gt;

&lt;p&gt;To learn more about time-series data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.g2.com/categories/time-series-databases"&gt;Best Time Series Databases&lt;/a&gt;&lt;br&gt;
&lt;a href="https://db-engines.com/en/ranking/time+series+dbms"&gt;DB-Engines Ranking of Time Series DBMS&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=A8uMF64rbS8&amp;amp;ab_channel=CodetotheMoon"&gt;Code to the Moon featuring QuestDB&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.allthingsdistributed.com/2021/06/amazon-timestream-time-series-is-the-new-black.html"&gt;Amazon Timestream - Time series is the new black&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/Time_series"&gt;Time series on Wikipedia&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To learn more about relevant projects:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html"&gt;Visualizing the stock market structure&lt;/a&gt;&lt;br&gt;
&lt;a href="https://research.facebook.com/blog/2017/2/prophet-forecasting-at-scale/"&gt;Prophet: forecasting at scale&lt;/a&gt;&lt;br&gt;
&lt;a href="https://builtin.com/data-science/time-series-forecasting-python"&gt;A Guide to Time Series Forecasting in Python&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The original post was published on &lt;a href="https://questdb.io/blog/time-series-data"&gt;QuestDB's blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>database</category>
      <category>opensource</category>
      <category>sql</category>
    </item>
    <item>
      <title>QuestDB announces Grafana support</title>
      <dc:creator>Nicolas Hourcard</dc:creator>
      <pubDate>Mon, 26 Oct 2020 15:35:38 +0000</pubDate>
      <link>https://dev.to/questdb/questdb-announces-grafana-support-1157</link>
      <guid>https://dev.to/questdb/questdb-announces-grafana-support-1157</guid>
      <description>&lt;p&gt;Hi all,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/questdb/questdb"&gt;QuestDB&lt;/a&gt; is a fast SQL time-series database. We have done our &lt;a href="https://news.ycombinator.com/item?id=23975807"&gt;HackerNews launch&lt;/a&gt; a few months ago and our &lt;a href="http://try.questdb.io:9000/"&gt;live demo&lt;/a&gt; with 1.6 billion rows from a well known NYC taxi dataset is still up and running!&lt;/p&gt;

&lt;p&gt;We are announcing QuestDB 4.0.4 with Grafana support, better PostgreSQL compatibility, and authentication for the InfluxDB line protocol.&lt;/p&gt;

&lt;p&gt;Hit us with any questions, and bear with us until we release out-of-order inserts.&lt;/p&gt;

&lt;p&gt;The original blog post about building a Grafana dashboard on QuestDB is on &lt;a href="https://dzone.com/articles/build-a-monitoring-dashboard-with-questdb-and-graf"&gt;QuestDB's blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>database</category>
      <category>opensource</category>
      <category>devops</category>
      <category>sql</category>
    </item>
    <item>
      <title>How we made our SQL database QuestDB even faster and more accurate</title>
      <dc:creator>Nicolas Hourcard</dc:creator>
      <pubDate>Fri, 29 May 2020 14:55:14 +0000</pubDate>
      <link>https://dev.to/nicquestdb/how-we-made-our-sql-database-questdb-even-faster-and-more-accurate-4558</link>
      <guid>https://dev.to/nicquestdb/how-we-made-our-sql-database-questdb-even-faster-and-more-accurate-4558</guid>
      <description>&lt;p&gt;See our article &lt;a href="https://questdb.io/blog/2020/05/12/interesting-things-we-learned-about-sums"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;About a month ago, we posted about using SIMD instructions to make aggregation calculations faster.&lt;/p&gt;

&lt;p&gt;Many comments suggested that we implement compensated summation (aka Kahan), as the naive method could produce inaccurate and unreliable results. This is why we spent some time integrating the Kahan and Neumaier summation algorithms. This post summarises a few things we learned along this journey.&lt;/p&gt;
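&lt;p&gt;For reference, the Neumaier variant of compensated summation can be written in a few lines of plain Python (a sketch of the algorithm, not the vectorized code we ship): a running compensation term collects the low-order bits that each naive addition discards.&lt;/p&gt;

```python
def neumaier_sum(values):
    total = 0.0
    compensation = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # Low-order bits of x were rounded away; recover them.
            compensation += (total - t) + x
        else:
            # Low-order bits of total were rounded away instead.
            compensation += (x - t) + total
        total = t
    return total + compensation

data = [1e16, 1.0, -1e16]   # a naive sum loses the 1.0 entirely
```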

&lt;p&gt;We thought Kahan would badly affect the performance since it uses 4x as many operations as the naive approach. However, some comments also suggested we could use prefetch and co-routines to pull the data from RAM to cache in parallel with other CPU instructions. We got phenomenal results thanks to these suggestions, with Kahan sums nearly as fast as the naive approach.&lt;/p&gt;

&lt;p&gt;A lot of you also asked if we could compare this with ClickHouse. As they implement Kahan summation, we ran a quick comparison. Here's what we got for summing 1bn doubles with nulls using the Kahan algorithm. The details of how this was done are in the post.&lt;/p&gt;

&lt;p&gt;QuestDB: 68ms; ClickHouse: 139ms.&lt;/p&gt;

&lt;p&gt;Thanks for reading and please leave us a star if you find the project interesting!&lt;/p&gt;

&lt;p&gt;Nic&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>java</category>
      <category>database</category>
      <category>sql</category>
    </item>
    <item>
      <title>QuestDB - fast relational time-series DB, zero GC java</title>
      <dc:creator>Nicolas Hourcard</dc:creator>
      <pubDate>Mon, 02 Dec 2019 17:06:52 +0000</pubDate>
      <link>https://dev.to/nicquestdb/questdb-fast-relational-time-series-db-zero-gc-java-hhi</link>
      <guid>https://dev.to/nicquestdb/questdb-fast-relational-time-series-db-zero-gc-java-hhi</guid>
      <description>&lt;p&gt;Hi all,&lt;/p&gt;

&lt;p&gt;We have just released QuestDB open source (Apache 2.0), and we would welcome your feedback.&lt;/p&gt;

&lt;p&gt;QuestDB is an open-source NewSQL relational database designed to process time-series data, faster. Our approach comes from low-latency trading; QuestDB’s stack is engineered from scratch, zero-GC Java and dependency-free.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.questdb.io/"&gt;https://www.questdb.io/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/questdb/questdb"&gt;https://github.com/questdb/questdb&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;

&lt;p&gt;Nic&lt;/p&gt;

</description>
      <category>database</category>
      <category>java</category>
      <category>opensource</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
