
How to Downsample Your Data Efficiently


Tired of spending countless minutes downsampling your data? Look no further!

In this article, you’ll learn how to downsample 6.48 billion high-frequency records to 61 million minute-level records in just 41 seconds with DolphinDB.

The basic configuration of the DolphinDB server is:

  • 16 CPU cores
  • 256 GB memory
  • 4 SSDs

A DolphinDB cluster with 4 data nodes is deployed, and each node uses an SSD.
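The exact deployment files are not shown in the article. For reference, a 4-data-node cluster like this is typically described by two configuration files, cluster.nodes and cluster.cfg; every host, port, alias, and path below is a hypothetical placeholder:

# cluster.nodes -- one agent plus four data nodes (hypothetical hosts and ports)
localSite,mode
192.168.1.10:8800:agent1,agent
192.168.1.10:8801:node1,datanode
192.168.1.10:8802:node2,datanode
192.168.1.10:8803:node3,datanode
192.168.1.10:8804:node4,datanode

# cluster.cfg -- point each data node at its own SSD (hypothetical paths)
node1.volumes=/ssd1/dolphindb
node2.volumes=/ssd2/dolphindb
node3.volumes=/ssd3/dolphindb
node4.volumes=/ssd4/dolphindb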


The data we use is:

  • level 1 quotes from the New York Stock Exchange for August 2007
  • around 272 GB, with 6.48 billion records
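The article does not show how the dfs://TAQ database was built. A common layout for TAQ data is a composite partition on date and symbol, which matches the `date`symbol partitioning columns used later in the script. The following is only a sketch; the symbol range boundaries and the CSV path are hypothetical placeholders:

// Hypothetical sketch: compose a VALUE partition on date with a RANGE partition on symbol
dateDomain = database("", VALUE, 2007.08.01..2007.08.31)
symDomain = database("", RANGE, `A`D`G`J`M`P`S`V`ZZZZ)  // hypothetical symbol boundaries
db = database("dfs://TAQ", COMPO, [dateDomain, symDomain])
// Ingest one day's CSV into the partitioned table "quotes" (path is a placeholder)
loadTextEx(db, "quotes", `date`symbol, "/data/taq/quotes_20070801.csv")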

Downsampling can be performed with a SQL statement in DolphinDB.


As the SQL query may involve multiple partitions, DolphinDB breaks down the job into several tasks and assigns the tasks to the corresponding data nodes for parallel execution. When all the tasks are completed, the system merges the intermediate results from the nodes to return the final result.
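If you want to see this map-reduce decomposition explicitly, DolphinDB also exposes it through the sqlDS and mr functions. The following is a minimal sketch, not part of the original benchmark: it counts the month's records by mapping over each partition's data source and summing the partial counts.

// Each data source produced by sqlDS corresponds to one partition of the quotes table
ds = sqlDS(<select count(*) as n from quotes where date between 2007.08.01 : 2007.08.31>)
// The map function runs on the node holding each partition; "+" merges the partial counts
total = mr(ds, def(t){ return t.n[0] }, +)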

The script is as follows:

// Load the distributed database and the raw quotes table
db = database("dfs://TAQ")
quotes = db.loadTable("quotes")

// Count the records to be downsampled
select count(*) from quotes where date between 2007.08.01 : 2007.08.31

// Use a one-row query to derive the schema of the result table
model = select top 1 symbol, date, minute(time) as minute, bid, ofr from quotes where date=2007.08.01, symbol=`EBAY
if(existsTable("dfs://TAQ", "quotes_minute_sql"))
    db.dropTable("quotes_minute_sql")
db.createPartitionedTable(model, "quotes_minute_sql", `date`symbol)

// Downsample to 1-minute bars and time the whole job
timer{
    minuteQuotes = select avg(bid) as bid, avg(ofr) as ofr from quotes where date between 2007.08.01 : 2007.08.31 group by symbol, date, bar(time, 60) as minute
    loadTable("dfs://TAQ", "quotes_minute_sql").append!(minuteQuotes)
}

// Verify the number of minute-level records generated
select count(*) from loadTable("dfs://TAQ", "quotes_minute_sql")

The sampling frequency can be adjusted as needed simply by modifying bar(time, 60). Here 60 means the data is downsampled to 1-minute intervals, since the timestamp values have second precision.
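For example, switching to 5-minute bars only requires changing the interval to 300 seconds; a sketch based on the query above:

// 5-minute bars: 300 seconds per bucket
fiveMinQuotes = select avg(bid) as bid, avg(ofr) as ofr from quotes where date between 2007.08.01 : 2007.08.31 group by symbol, date, bar(time, 300) as barStart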


The table "quotes_minute_sql" is created with createPartitionedTable, and the downsampled result is appended to it.
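After the append completes, a quick spot check of the downsampled table, here for the EBAY symbol used in the schema query above, might look like:

// Inspect a few downsampled rows for one symbol and day
quotesMinute = loadTable("dfs://TAQ", "quotes_minute_sql")
select top 10 * from quotesMinute where date=2007.08.01, symbol=`EBAY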


We can execute the script and open the web-based cluster manager to check resource usage. It shows that all CPU cores participate in the downsampling, and that 15 tasks run concurrently on each data node while data is read from disk.
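The same information can also be pulled from a script. As an aside not shown in the article, the getClusterPerf function (run on the controller) returns a per-node performance snapshot; the exact column names may vary across DolphinDB versions:

// Per-node performance snapshot: CPU usage, memory, running tasks, etc.
perf = getClusterPerf()
select site, cpuUsage, memoryUsed, runningTasks from perf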


Back in VS Code, the execution status shows that the downsampling takes only 41 seconds and generates 61 million minute-level records.


DolphinDB exhibits outstanding performance in data downsampling due to the following reasons:

  1. Jobs are executed in a distributed manner, so the resources of multiple nodes are utilized simultaneously;
  2. Compression reduces disk I/O;
  3. Columnar storage and vectorized computation improve the efficiency of aggregation.

For a detailed walkthrough of these downsampling operations, take a look at this demo:
https://youtu.be/0vRuiz1Lf6Y

