When you're working with IOT or OT data you often want to look at a series of state transitions for sensors or devices and work out the amount of time spent in each state. Each device can only be in a single state at a time, so when a new state is started, the old one is closed off.
This is often used for billing or metering (think power meters, devices connected to a network, or even a complex state machine) and can be referred to as dwell time or connection time. In this case, the desired output is ( device_id, state, start_time, dwell_time
).
Let's consider the following table which tracks states for sensors.
CREATE TABLE testdata
(
sid INT NOT NULL,
state VARCHAR(1) NOT NULL,
ts TIMESTAMPTZ NOT NULL
);
We can populate the table with some test data:
-- Create some test data
INSERT INTO testdata (sid, state, ts) VALUES
(1, 'a', '2019-Nov-01 12:01:00'),
(1, 'a', '2019-Nov-01 12:02:00'),
(1, 'a', '2019-Nov-01 12:05:00'),
(1, 'b', '2019-Nov-01 12:10:00'),
(1, 'b', '2019-Nov-01 12:20:00'),
(2, 'a', '2019-Nov-01 12:20:00'),
(2, 'a', '2019-Nov-01 12:22:00'),
(1, 'a', '2019-Nov-01 13:00:00'),
(1, 'a', '2019-Nov-01 13:02:00'),
(1, 'a', '2019-Nov-01 13:06:00'),
(1, 'a', '2019-Nov-01 13:12:00'),
(1, 'a', '2019-Nov-01 14:00:00'),
(1, 'a', '2019-Nov-01 14:12:00'),
(1, 'a', '2019-Nov-01 14:14:00'),
(1, 'a', '2019-Nov-01 14:30:00'),
(1, 'a', '2019-Nov-01 14:35:00');
Now, you can solve this with plain SQL (I'll leave that as a exercise for the reader😉), but if you're using PostgreSQL + TimescaleDB then you can use one of the hyperfunctions which come with the timescaledb-toolkit
extension.
The final solution looks like this:
WITH states AS (
SELECT
(state_timeline(state_agg(ts, state))).*,
sid
FROM testdata
GROUP BY sid
)
SELECT
sid,
state,
start_time,
end_time - start_time dwell_time
FROM states
ORDER BY sid, start_time;
Which gives the following results:
sid state start_time dwell_time
1 a 2019-11-01 12:01:00+00 00:09:00
1 b 2019-11-01 12:10:00+00 00:15:00
1 a 2019-11-01 12:25:00+00 02:10:00
2 a 2019-11-01 12:20:00+00 00:02:00
The magic lies in the state_agg
aggregate, which tracks transitions between states within our grouping set (in this case the state column), and also in the state_timeline
function, which takes the output of state_agg
and creates a timeline of all state changes with the time they were entered and exited.
With that information in hand, it’s easy to work out the dwell_time
using end_time - start_time
. If you're impressed by this check out the other things hyperfunctions can do!
Top comments (0)