DEV Community

Okibaba
Okibaba

Posted on

Zoom camp rising wave workshop

This month I attended the risingwave workshop hosted by datatalks for the 2024 cohort of the zoomcamp data engineering course.

Risingwave is an open-source, cloud-native streaming database system designed for real-time analytics and event-driven applications. With risingwave you can handle streaming data using sql-like queries, making it very powerful for scenarios like; real-time monitoring, anomaly detection, event-driven architectures, and continuous data transformation.

As a quick demo of how sql-like risingwave is, lets create a materialized view to compute the average, min and max trip time between taxi zone as well as find the top 10 maximum trip time for a route pair taking into consideration trip direction (i.e trip from a to b is different from trip from b to a).Data we are working with is the New York openly available yellow taxi data.

CREATE MATERIALIZED VIEW mv_avg_min_max_trip_time AS 
SELECT 
    pu_zone.Zone AS pickup_zone, 
    do_zone.Zone AS dropoff_zone, 
    AVG(EXTRACT(EPOCH FROM (trip_data.tpep_dropoff_datetime - trip_data.tpep_pickup_datetime)) / 60) AS avg_trip_duration, 
    MIN(EXTRACT(EPOCH FROM (trip_data.tpep_dropoff_datetime - trip_data.tpep_pickup_datetime)) / 60) AS min_trip_duration, 
    MAX(EXTRACT(EPOCH FROM (trip_data.tpep_dropoff_datetime - trip_data.tpep_pickup_datetime)) / 60) AS max_trip_duration 
FROM 
    trip_data 
JOIN 
    taxi_zone pu_zone ON trip_data.PULocationID = pu_zone.location_id 
JOIN 
    taxi_zone do_zone ON trip_data.DOLocationID = do_zone.location_id 
GROUP BY 
    pu_zone.Zone, 
    do_zone.Zone; 

SELECT 
    pickup_zone, 
    dropoff_zone, 
    max_trip_duration
FROM 
    mv_avg_min_max_trip_time 
ORDER BY 
    max_trip_duration DESC 
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Sources
datatalks risingwave workshop link
https://github.com/risingwavelabs/risingwave-data-talks-workshop-2024-03-04/blob/main/workshop.md

Data talks data engineering course repo (week 6)
https://github.com/DataTalksClub/data-engineering-zoomcamp

Top comments (0)