DEV Community

The GeekNarrator

DuckDB Internals with Mark Raasveldt

Deep Dive into DuckDB with CTO Mark Raasveldt

Decode the insights of databases with the Geek Narrator podcast. In this episode, host Kaivalya Apte talks with Mark Raasveldt, the CTO of DuckDB Labs, about his journey from database enthusiast to creator of DuckDB. They cover how DuckDB, an analytical database, differs from other databases, the design decisions behind it, and its internal mechanisms. The episode also highlights the advantages of DuckDB for analytics, the motivation behind its ACID compliance, and how DuckDB handles ingestion, transaction isolation, mutations, and queries. Join in to learn how your data workloads can benefit from DuckDB.

00:00 Introduction and Guest Introduction
00:44 Guest's Journey into Databases
03:40 The Birth of DuckDB
04:30 Challenges with Existing Databases
05:15 Technical Difficulties
05:16 Why Existing Databases Fall Short for Data Scientists
09:16 The Role of SQLite and Its Limitations
13:59 Defining DuckDB
16:48 Comparing DuckDB with Other Analytical Databases
19:50 Deployment Models for DuckDB
22:47 Data Ingestion into DuckDB
22:51 Data Ingestion in DuckDB
30:24 How DuckDB Handles Updates and Mutations
35:35 Understanding Column Granularity and Rewrites
35:58 Implications of Compression on Data Updates
36:38 Trade-offs in Row Group Size
37:32 Benefits of Column Storage Model
38:15 Row Groups and Parallelism
39:02 Choosing Row Group Size: An Experimental Approach
40:00 Handling Data Type Changes in Columns
41:00 Internal Data Structures in DuckDB
42:21 Reading Data: Point Lookups, Aggregations, and Joins
47:22 Optimization for Full Table Scans
53:49 Understanding ACID Compliance in DuckDB
55:49 Multi-Version Concurrency Control (MVCC) in DuckDB
59:50 Use Cases and Applications of DuckDB
01:01:42 The Story Behind DuckDB's Name
01:02:34 Future Vision for DuckDB

References:
DuckDB: https://duckdb.org/
Mark's blog: https://mytherin.github.io/
===============================================================================
For a discount on the courses below:
AppSync: https://appsyncmasterclass.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Testing serverless: https://testserverlessapps.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Production-Ready Serverless: https://productionreadyserverless.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Use the "Add Discount" button and enter the discount code "geeknarrator" to get a 20% discount.
===============================================================================

Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator

If you like this episode, please hit the like button and share it with your network. Also, please subscribe if you haven't yet.

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!

Cheers,
The GeekNarrator
