DEV Community

Pranav Aurora

On Andy Pavlo's DB review

Andy Pavlo's yearly review has a massive chokehold on the DB community. It's like the Oscars of databases.

This year's review was pretty special: our project, pg_mooncake, was mentioned.

Here are some thoughts from reading the review, and what we've learnt at Mooncake Labs in our first 121 days of existence.

1. Yes, we're guilty of 'Shoving Ducks everywhere'...

Our first project, pg_mooncake, added a native columnstore table (Iceberg) to Postgres for 1000x faster analytics.

While there are quite a few extensions on the market bringing DuckDB into Postgres, we focussed on making the columnar storage feel like a regular Postgres table: things like transactional writes, triggers, etc. See our architecture.

To us, it feels like the final touch to complete the 'analytics in PG' experience. Almost a decade after early projects like Citus, we're optimistic that analytics in Postgres will be a reality.

2. 2024 felt like the year of the Data Lake.

Snowflake vs Databricks. Elastic's 'search lake' (lol). S3 Tables.

What I mean by the 'lake': serverless workloads on data in object storage.

In 2024, analytics (Databricks SQL, Snowflake Iceberg) and vector search (Turbopuffer, Lance) moved to the lake.

In 2025, I reckon there will be more workloads (lookups, full-text) running in this manner.

3. As for vector search...

Agents are everywhere, and yet vector search wasn't a topic at all... A couple of thoughts:

  1. Just use Postgres
  2. If you have big 'data', LanceDB / Turbopuffer
  3. Vector search workloads are moving toward full-text workloads. It's something we've noticed a lot: hybrid search results are often 95% or more full-text results.
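
To make the last point concrete, here is a minimal sketch of how a hybrid search might merge a full-text ranking with a vector ranking, and how you could measure what share of the final results also came from full-text. It uses reciprocal rank fusion, which is a common fusion technique but not necessarily what any of the products above do. The function names, document ids, and the `k=60` constant are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def share_from(source_ranking, fused, top_n):
    """Fraction of the top_n fused results that also appear in source_ranking."""
    top = fused[:top_n]
    if not top:
        return 0.0
    source = set(source_ranking)
    return sum(1 for doc_id in top if doc_id in source) / len(top)


# Hypothetical result lists for one query (ids are made up).
full_text_hits = ["a", "b", "c", "d"]
vector_hits = ["c", "e", "a", "f"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
full_text_share = share_from(full_text_hits, fused, top_n=4)
```

If `full_text_share` stays close to 1.0 across real queries, the vector side is adding very little, which is one way to read the observation above.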

4. As for AI / Agents

A lot of the AI companies we spend time with are building a 'system of record' for each customer... And they're all storing structured/unstructured data in a 'Lake'. See Rox's architecture.

Another trend we've seen: LLMs being used for data processing and ML tasks (feature extraction, classifiers).

It kind of makes sense too… on small data. Product engineers can use LLMs out of the box, instead of picking/training/deploying ML models for each task.

I am super super curious how Snowflake, Databricks and Redshift AI functions will play out this year.

2025 will be exciting.

Pranav
