What We’ll Build
In this post, we’ll show how to:
Load NYC Taxi data via a REST API
Store it in DuckDB using DLT
Visualize the relationships using Cognee
Step 1 – Ingest Data with DLT
import dlt
import pandas as pd
import requests

@dlt.resource(write_disposition="replace", name="zoomcamp_data")
def zoomcamp_data():
    url = "https://us-central1-dlthub-analytics.cloudfunctions.net/data_engineering_zoomcamp_api"
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP errors
    df = pd.DataFrame(response.json())
    df['Trip_Pickup_DateTime'] = pd.to_datetime(df['Trip_Pickup_DateTime'])
    # Tag each trip by pickup date. Bins are right-closed by default,
    # so pickups outside (2009-06-01, 2009-06-30] get no tag.
    df['tag'] = pd.cut(
        df['Trip_Pickup_DateTime'],
        bins=[
            pd.Timestamp("2009-06-01"),
            pd.Timestamp("2009-06-10"),
            pd.Timestamp("2009-06-20"),
            pd.Timestamp("2009-06-30")
        ],
        labels=["first_10_days", "second_10_days", "last_10_days"]
    )
    # Keep only the rows that landed in one of the three bins
    yield df[df['tag'].notnull()]
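To make the bucketing rule easier to see, here is a minimal plain-Python sketch of what the `pd.cut` call above does with its default right-closed bins. It is not how pandas implements it; `tag_for`, `EDGES`, and `LABELS` are hypothetical names used only for illustration.

```python
from bisect import bisect_left
from datetime import datetime

# Same bin edges and labels as the pd.cut call in the resource
EDGES = [
    datetime(2009, 6, 1),
    datetime(2009, 6, 10),
    datetime(2009, 6, 20),
    datetime(2009, 6, 30),
]
LABELS = ["first_10_days", "second_10_days", "last_10_days"]


def tag_for(pickup):
    """Return the label of the right-closed bin (left, right] containing pickup, or None."""
    idx = bisect_left(EDGES, pickup)
    # idx == 0: pickup is at or before the first edge (the left edge is excluded)
    # idx == len(EDGES): pickup is after the last edge
    if idx == 0 or idx == len(EDGES):
        return None
    return LABELS[idx - 1]


print(tag_for(datetime(2009, 6, 5, 8, 30)))
print(tag_for(datetime(2009, 6, 30, 12, 0)))
```

Note that a pickup at exactly midnight on 2009-06-01, or at any time after midnight on 2009-06-30, falls outside every bin and is dropped by the `notnull()` filter.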
Step 2 – Run Pipeline to DuckDB
pipeline = dlt.pipeline(
    pipeline_name="zoomcamp_pipeline",
    destination="duckdb",
    dataset_name="zoomcamp_tagged_data"
)
pipeline.run(zoomcamp_data())
Step 3 – Enrich and Visualize with Cognee
await cognee.add(df_set1_json, node_set=["first_10_days"])
await cognee.add(df_set2_json, node_set=["second_10_days"])
await cognee.add(df_set3_json, node_set=["last_10_days"])
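The snippet above passes `df_set1_json`, `df_set2_json`, and `df_set3_json` without showing where they come from. One plausible way to build them, assuming the tagged rows are available as a list of dictionaries, is to group by `tag` and serialize each group to a JSON string. The helper name `json_by_tag` and the sample rows are hypothetical.

```python
import json
from collections import defaultdict


def json_by_tag(records):
    """Group row dicts by their 'tag' value and serialize each group to a JSON string."""
    groups = defaultdict(list)
    for row in records:
        groups[row["tag"]].append(row)
    # default=str converts datetimes and other non-JSON types to strings
    return {tag: json.dumps(rows, default=str) for tag, rows in groups.items()}


# Hypothetical sample rows standing in for the tagged taxi data
rows = [
    {"Trip_Pickup_DateTime": "2009-06-05 08:30:00", "tag": "first_10_days"},
    {"Trip_Pickup_DateTime": "2009-06-07 19:10:00", "tag": "first_10_days"},
    {"Trip_Pickup_DateTime": "2009-06-14 11:00:00", "tag": "second_10_days"},
    {"Trip_Pickup_DateTime": "2009-06-25 23:45:00", "tag": "last_10_days"},
]

payloads = json_by_tag(rows)
df_set1_json = payloads["first_10_days"]
df_set2_json = payloads["second_10_days"]
df_set3_json = payloads["last_10_days"]
```

If the rows are still in a pandas DataFrame, `df[df["tag"] == "first_10_days"].to_json(orient="records")` produces an equivalent payload for a single tag.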
Result 🎉
Upload your notebook and see interactive graphs emerge from your dataset.
🧪 DuckDB + 🧵 DLT + 🧠 Cognee = Magic!