DEV Community

The GeekNarrator

Tiered Storage implementation by StarTree (Apache Pinot) with Neha Pawar

In this podcast I have invited Neha Pawar, who is one of the Founding Engineers are StarTree (the company powering Apache Pinot). We talked about how StarTree has implemented Tiered storage and how it differs from other available implementations.  Note: Currently tiered storage is available only in StarTree’s Pinot and not available in the open source version. But its only about time.


Chapters: 00:00 Introduction 03:28 What does Tiered Storage mean? 05:51 How many tiers are typically supported? 07:30 Is it mainly about Cost Optimisation? How do I compare the cost savings vs performance hit? 15:41 What is mmap and how does it help? 16:45 How do I implement/approach Tiered Storage? What are the challenges? 23:00 What is Apache Pinot? When we say low latency, how low it is? 25:00 How is it implemented in StarTree (Apache Pinot)? 36:45 What happens when I query for more number of (or all) columns? How is that optimised? 47:10 What are the failure modes? 50:15 How can we test and validate Tiered Storage as a feature? 54:30 How would bloom filter false positives affect performance and correctness? 56:15 Can I move back my data from Cold storage to Hot Storage? 57:45 What other cloud storage services are supported other than S3? 58:35 What is the future of Tiered Storage?



Episode source