DEV Community

Francis

Inside DuckDB: Deep Dive into DuckDB MetaPipeline

Recently I became interested in DuckDB's source code and decided to take a look. DuckDB is an open-source project written in C++, and its codebase is a good one for newcomers to study, especially those working in the database field.

This article digs into DuckDB's MetaPipeline, using a join between two tables as the running example.

For example:

SELECT name, score FROM student st INNER JOIN score s ON st.id = s.stu_id;
The final physical plan for this query is:

➜ debug git:(master) ./duckdb stu
v0.8.1-dev416 9d5158ccd2
Enter ".help" for usage hints.
D EXPLAIN SELECT name, score FROM student st INNER JOIN score s ON st.id = s.stu_id;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐                             
│         PROJECTION        │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│            name           │                             
│           score           │                             
└─────────────┬─────────────┘                                                          
┌─────────────┴─────────────┐                             
│         HASH_JOIN         │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│           INNER           │                             
│        stu_id = id        ├──────────────┐              
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │              
│           EC: 4           │              │              
│          Cost: 4          │              │              
└─────────────┬─────────────┘              │                                           
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           score           ││          student          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           stu_id          ││             id            │
│           score           ││            name           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 4           ││           EC: 3           │
└───────────────────────────┘└───────────────────────────┘
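To make the shape of this plan concrete, here is a minimal C++ sketch that models the operator tree from the EXPLAIN output and cuts it into pipelines. It assumes, as DuckDB does, that a hash join builds its hash table from its right child (here `student`) and probes with its left child (here `score`), so the join breaks the pipeline on its build side. `ToyOp`, `SplitIntoPipelines`, and the `RESULT_COLLECTOR` sink name are hypothetical illustrations, not DuckDB's actual classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy operator node; DuckDB's real class is PhysicalOperator,
// which this sketch does not reproduce.
struct ToyOp {
    std::string name;
    bool is_hash_join = false;
    std::vector<std::shared_ptr<ToyOp>> children;
};
using ToyOpPtr = std::shared_ptr<ToyOp>;
using ToyPipeline = std::vector<std::string>;  // operator names, source -> sink

ToyOpPtr MakeOp(std::string name, std::vector<ToyOpPtr> children = {},
                bool is_hash_join = false) {
    auto op = std::make_shared<ToyOp>();
    op->name = std::move(name);
    op->children = std::move(children);
    op->is_hash_join = is_hash_join;
    return op;
}

// Post-order walk that appends operators to `current` in source -> sink order.
// A hash join is a pipeline breaker: its build side (children[1], assumed to be
// the right child) gets its own pipeline ending in the join, while the probe
// side (children[0]) continues the current pipeline through the join.
void Collect(const ToyOpPtr& op, ToyPipeline& current, std::vector<ToyPipeline>& done) {
    if (op->is_hash_join) {
        assert(op->children.size() == 2);
        ToyPipeline build;
        Collect(op->children[1], build, done);
        build.push_back(op->name + " (build)");
        done.push_back(build);
        Collect(op->children[0], current, done);
        current.push_back(op->name + " (probe)");
        return;
    }
    // In this plan, scans have no children and PROJECTION has exactly one.
    assert(op->children.size() <= 1);
    if (!op->children.empty()) Collect(op->children[0], current, done);
    current.push_back(op->name);
}

// Splits a plan into pipelines; the last one ends in a hypothetical result sink.
std::vector<ToyPipeline> SplitIntoPipelines(const ToyOpPtr& root) {
    std::vector<ToyPipeline> done;
    ToyPipeline last;
    Collect(root, last, done);
    last.push_back("RESULT_COLLECTOR");
    done.push_back(last);
    return done;
}

// The physical plan from the EXPLAIN output: PROJECTION over HASH_JOIN(score, student).
ToyOpPtr ExamplePlan() {
    return MakeOp("PROJECTION",
                  {MakeOp("HASH_JOIN",
                          {MakeOp("SEQ_SCAN score"), MakeOp("SEQ_SCAN student")},
                          /*is_hash_join=*/true)});
}
```

Calling `SplitIntoPipelines(ExamplePlan())` yields two pipelines: `SEQ_SCAN student -> HASH_JOIN (build)` first, then `SEQ_SCAN score -> HASH_JOIN (probe) -> PROJECTION -> RESULT_COLLECTOR`. The build pipeline comes first because the probe side cannot start until the hash table is complete.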

Before diving in, let's pose a few questions:

  • What exactly happens inside a MetaPipeline?
  • How many pipelines does the above plan contain?
  • What is a Pipeline?
  • What is a MetaPipeline?
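As a rough mental model for the last two questions: a pipeline is a chain of operators running from a source to a sink, and DuckDB groups the pipelines that share the same sink into a MetaPipeline, creating child MetaPipelines for sinks (such as a hash join's build side) that must be finished first. The C++ sketch below is a simplified, hypothetical model of that idea; `ToyMetaPipeline`, `Schedule`, and `BuildJoinExample` are not DuckDB's API, and real DuckDB tracks dependencies per pipeline and runs them through its task scheduler, which this sketch ignores.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model (not DuckDB's MetaPipeline class): all pipelines in
// one MetaPipeline share a sink; child MetaPipelines must finish first.
struct ToyMetaPipeline {
    std::string sink;                                        // the shared sink
    std::vector<std::string> pipelines;                      // "source -> ... -> sink"
    std::vector<std::unique_ptr<ToyMetaPipeline>> children;  // dependencies

    ToyMetaPipeline& AddChild(std::string child_sink) {
        children.push_back(std::make_unique<ToyMetaPipeline>());
        children.back()->sink = std::move(child_sink);
        return *children.back();
    }
};

// Post-order walk: a child MetaPipeline's pipelines are scheduled before the
// parent's, because the parent reads what the child's sink has built.
void ScheduleInto(const ToyMetaPipeline& meta, std::vector<std::string>& order) {
    assert(!meta.sink.empty());  // every MetaPipeline has a sink
    for (const auto& child : meta.children) ScheduleInto(*child, order);
    for (const auto& pipeline : meta.pipelines) order.push_back(pipeline);
}

std::vector<std::string> Schedule(const ToyMetaPipeline& root) {
    std::vector<std::string> order;
    ScheduleInto(root, order);
    return order;
}

// MetaPipeline tree for the join query in this article (hypothetical sink names).
std::unique_ptr<ToyMetaPipeline> BuildJoinExample() {
    auto root = std::make_unique<ToyMetaPipeline>();
    root->sink = "RESULT_COLLECTOR";
    root->pipelines.push_back(
        "SEQ_SCAN score -> HASH_JOIN (probe) -> PROJECTION -> RESULT_COLLECTOR");
    ToyMetaPipeline& build = root->AddChild("HASH_JOIN");
    build.pipelines.push_back("SEQ_SCAN student -> HASH_JOIN (build)");
    return root;
}
```

For the join query, `Schedule(*BuildJoinExample())` places the `student` build pipeline before the `score` probe pipeline, mirroring the fact that probing cannot begin until the hash table exists.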

Now, let’s get into the main content.
