DEV Community

Discussion on: Building ClickHouse Dashboard and crunching WallStreetBets data πŸ’ΈπŸ€‘

Collapse
 
neskodi profile image
Sergio Neskodi • Edited

First, thank you for the tutorial and for cube.js in general.
Second, I'd like to point out a problem in how you are aggregating data for the candlestick chart.

As I see, you are grouping by day so each candlestick represents a day on the chart.

Daily high price for that day is the "max" of all the "high" prices of that day - OK
Daily low price for that day is the "min" of all the "low" prices of that day - OK
Daily "open" price should be the "open" of the first price within that day - but you are showing "average open for the day" - NOT OK
Daily "close" price should be the "close" of the last price within that day - but you are showing "average close for the day" - NOT OK

This will generate a misleading chart, so even if the simplification is deliberate for the sake of demonstrating Cube.js, you should probably leave a note about that for anyone following your footsteps.

A much more valuable information for most readers though (myself included) would be the correct way to set up Cube.js to generate an accurate candlestick chart (this is what I actually came here for). Although the SQL for that case becomes heavier by order of magnitude, maybe Cube.js has the power of making it simple for us?

Collapse
 
igorlukanin profile image
Igor Lukanin

Hey Sergio @neskodi πŸ‘‹

Thanks for taking the time to dig deep into this content piece. Also, thanks for the feedback, that's very helpful. Appreciate it!

Daily "open" price should be the "open" of the first price within that day - but you are showing "average open for the day" - NOT OK
Daily "close" price should be the "close" of the last price within that day - but you are showing "average close for the day" - NOT OK

You're right, that's exactly how these measure are defined. In a general case, it's not okay. No doubt, the open and close prices should refer to the very first and the very last transactions in a given day.

However, there's a catch. The dataset I'm using in this example contains a single entry per ticker per day. It means that the "average open for the day" is exactly equal to the "daily open", right? If the dataset were more granular, it would be definitely an issue.

A much more valuable information for most readers though (myself included) would be the correct way to set up Cube.js to generate an accurate candlestick chart

The trick is to aggregate all open or close values (in Postgres, you can use ARRAY_AGG; in ClickHouse, there's groupArray), sort them by date, and pick the first element.

Here's a working example:

cube(`Stocks`, {
  sql: `
    SELECT 'INTC' AS Ticker, '2021-11-02T10:00:00Z'::timestamp AS Date,  1 AS Open, 10 AS Close, 15 AS High,  5 AS Low, 1000 AS Volume UNION
    SELECT 'INTC' AS Ticker, '2021-11-02T11:00:00Z'::timestamp AS Date, 10 AS Open, 20 AS Close, 25 AS High, 10 AS Low, 1500 AS Volume
  `,

  measures: {
    open: { sql: `(ARRAY_AGG(${CUBE}.Open ORDER BY ${CUBE}.Date ASC))[1]`, type: `number`, format: `currency` },
    close: { sql: `(ARRAY_AGG(${CUBE}.Close ORDER BY ${CUBE}.Date DESC))[1]`, type: `number`, format: `currency` },
    high: { sql: `${CUBE}.High`, type: `max`, format: `currency` },
    low: { sql: `${CUBE}.Low`, type: `min`, format: `currency` },
  },

  dimensions: {
    ticker: { sql: `${CUBE}.Ticker`, type: `string` },
    date: { sql: `${CUBE}.Date`, type: `time` },
  },
});
Enter fullscreen mode Exit fullscreen mode

The query yields correct results:

I hope it helps :–)

Collapse
 
neskodi profile image
Sergio Neskodi

Thanks Igor! Totally makes sense now. I didn't know that your source dataset contained only one candle per day. In this case, taking the average is fully reasonable.

I imagined something like my dataset where I have one-minute candles and I must aggregate them by day.

Also, thank you for taking time to expand your answer with examples of aggregation. Very useful indeed!