DEV Community

Mona Hamid

Build a Data-to-Graph Pipeline with DLT, DuckDB & Cognee 🧠📈

What We’ll Build
In this post, we’ll show how to:

Load NYC Taxi data via a REST API

Store it in DuckDB using DLT

Visualize the relationships using Cognee

Step 1 – Ingest Data with DLT

import dlt
import pandas as pd
import requests

@dlt.resource(write_disposition="replace", name="zoomcamp_data")
def zoomcamp_data():
    # Fetch the taxi records from the REST endpoint and fail fast on HTTP errors.
    url = "https://us-central1-dlthub-analytics.cloudfunctions.net/data_engineering_zoomcamp_api"
    response = requests.get(url)
    response.raise_for_status()
    df = pd.DataFrame(response.json())
    df['Trip_Pickup_DateTime'] = pd.to_datetime(df['Trip_Pickup_DateTime'])

    # Tag each trip by the part of June 2009 in which the pickup falls.
    df['tag'] = pd.cut(
        df['Trip_Pickup_DateTime'],
        bins=[
            pd.Timestamp("2009-06-01"),
            pd.Timestamp("2009-06-10"),
            pd.Timestamp("2009-06-20"),
            pd.Timestamp("2009-06-30")
        ],
        labels=["first_10_days", "second_10_days", "last_10_days"]
    )
    # Keep only the trips that landed in one of the three bins.
    yield df[df['tag'].notnull()]
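To see how the `pd.cut` tagging in this step behaves at the bin edges, here is a minimal sketch on a handful of made-up pickup times. The `sample` frame and the `tag_inclusive` column are illustrative only and are not part of the original pipeline. With the default `right=True`, each bin excludes its left edge, so a pickup at exactly 2009-06-01 00:00:00 gets no tag, and anything after 2009-06-30 00:00:00 falls outside the last bin.

```python
import pandas as pd

# Hypothetical pickup times: one edge case, one per bin, one outside June.
sample = pd.DataFrame({
    "Trip_Pickup_DateTime": pd.to_datetime([
        "2009-06-01 00:00:00",  # exactly on the lowest bin edge
        "2009-06-05 08:30:00",
        "2009-06-15 12:00:00",
        "2009-06-25 23:59:00",
        "2009-07-02 10:00:00",  # outside every bin
    ])
})

bins = [
    pd.Timestamp("2009-06-01"),
    pd.Timestamp("2009-06-10"),
    pd.Timestamp("2009-06-20"),
    pd.Timestamp("2009-06-30"),
]
labels = ["first_10_days", "second_10_days", "last_10_days"]

# Same call as in the resource: bins are open on the left, closed on the right.
sample["tag"] = pd.cut(sample["Trip_Pickup_DateTime"], bins=bins, labels=labels)

# Variant that also keeps a value sitting exactly on the lowest edge.
sample["tag_inclusive"] = pd.cut(
    sample["Trip_Pickup_DateTime"], bins=bins, labels=labels, include_lowest=True
)

print(sample)
```

If those edge rows matter for your analysis, `include_lowest=True` is one way to keep the first instant of June 1, and moving the last edge to 2009-07-01 would cover all of June 30.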

Step 2 – Run Pipeline to DuckDB

pipeline = dlt.pipeline(
    pipeline_name="zoomcamp_pipeline",
    destination="duckdb",
    dataset_name="zoomcamp_tagged_data"
)
pipeline.run(zoomcamp_data())
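Once the pipeline has run, a quick sanity check is to count the loaded rows per tag. The sketch below assumes DLT's usual defaults for the DuckDB destination: a database file named after the pipeline (`zoomcamp_pipeline.duckdb`) in the working directory, the dataset name used as the schema, and the resource name used as the table. The helper names `tag_count_query` and `count_trips_by_tag` are hypothetical, so adjust the path and names if your setup differs.

```python
def tag_count_query(dataset: str = "zoomcamp_tagged_data",
                    table: str = "zoomcamp_data") -> str:
    """Build a SQL query that counts loaded rows per tag."""
    return (
        f'SELECT tag, COUNT(*) AS trips '
        f'FROM "{dataset}"."{table}" '
        f'GROUP BY tag ORDER BY tag'
    )


def count_trips_by_tag(db_path: str = "zoomcamp_pipeline.duckdb"):
    """Run the count against the DuckDB file (assumed default location)."""
    # Imported here so the query builder can be used without DuckDB installed.
    import duckdb

    # Open read-only so the check cannot modify the loaded data.
    with duckdb.connect(db_path, read_only=True) as con:
        return con.execute(tag_count_query()).fetchall()
```

Calling `count_trips_by_tag()` after `pipeline.run(...)` should return one `(tag, trips)` row per bucket if the load worked as expected.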

Step 3 – Enrich and Visualize with Cognee

await cognee.add(df_set1_json, node_set=["first_10_days"])
await cognee.add(df_set2_json, node_set=["second_10_days"])
await cognee.add(df_set3_json, node_set=["last_10_days"])
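The post does not show how `df_set1_json`, `df_set2_json`, and `df_set3_json` are created. One plausible way, sketched here as an assumption, is to split the tagged DataFrame by `tag` and serialize each slice to a JSON string with pandas. The sample rows and the `rows_as_json` helper are hypothetical; in the real pipeline you would read the tagged rows back from DuckDB instead.

```python
import json

import pandas as pd

# Hypothetical stand-in for the tagged taxi data loaded in the earlier steps.
df = pd.DataFrame({
    "vendor_name": ["VTS", "CMT", "VTS"],
    "Trip_Distance": [1.2, 3.4, 0.8],
    "tag": ["first_10_days", "second_10_days", "last_10_days"],
})


def rows_as_json(frame: pd.DataFrame, tag: str) -> str:
    """Serialize the rows carrying `tag` as a JSON array of records."""
    # The tag column is dropped because node_set carries that label already.
    subset = frame[frame["tag"] == tag].drop(columns=["tag"])
    return subset.to_json(orient="records")


df_set1_json = rows_as_json(df, "first_10_days")
df_set2_json = rows_as_json(df, "second_10_days")
df_set3_json = rows_as_json(df, "last_10_days")

# Sanity check: every payload parses as JSON and contains rows.
record_counts = [
    len(json.loads(payload))
    for payload in (df_set1_json, df_set2_json, df_set3_json)
]
print(record_counts)
```

The `await cognee.add(...)` calls work as written in a notebook, which allows top-level `await`; in a plain script they would need to live inside an `async` function run with `asyncio.run`. In typical Cognee usage the `add` calls are followed by `await cognee.cognify()` to build the graph, though the post does not show that step.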

Result 🎉
Upload your notebook and see interactive graphs emerge from your dataset.

🧪 DuckDB + 🧵 DLT + 🧠 Cognee = Magic!
