Sagara
Personal Picks: Data Product News (July 23, 2025)

※This is an English translation of the following article:
https://dev.classmethod.jp/articles/modern-data-stack-info-summary-20250723/


Hi, this is Sagara.

As a consultant in the Modern Data Stack field, I see a vast amount of information being released every day.

With so much happening, I'd like to use this article to summarize the Modern Data Stack-related news that has caught my eye over the past couple of weeks.

*Disclaimer: This is not an exhaustive list of all product updates. It only includes information that I found interesting based on my personal judgment and bias.

General Modern Data Stack

"Data Engineering Study #30 - Celebrating 30 Sessions! A Look Back and Forward at Data Engineering Tech and Careers with Past Speakers" was held.

Data Engineering Study #30 has taken place. To mark this 30th milestone, we welcomed back 10 past speakers for 5-minute lightning talks, in which they shared follow-ups on the technical initiatives and careers they had previously presented.

https://forkwell.connpass.com/event/357942/

For more details on Data Engineering Study #30, the following event report is very helpful. Please check it out as well.

https://zenn.dev/shinyaa31/articles/aaadbefa457197

Also, Yuzutaso, who has served as the advisor for Data Engineering Study, stepped down after this #30 event, and a new three-member advisor team has been formed. I, Sagara, am one of the members of this new advisor team!

As someone who has learned a lot about the technology and careers in the data engineering world by watching Data Engineering Study, I will do my best to make it even more exciting than before!

The next event, #31, will feature lightning talks and a public planning meeting by the three members of the advisor team. Please join us!

https://forkwell.connpass.com/event/363198/

Summer Data Engineering Roadmap

MotherDuck's blog has published an article outlining a roadmap for how to learn data engineering.

It's broken down into three levels—Foundation, Core, and Advanced—making it an easy-to-understand guide on where to start.

https://motherduck.com/blog/summer-data-engineering-roadmap/

Additionally, MotherDuck's blog has previously published articles summarizing which tools are useful for data engineering. These are also systematically organized and serve as a great reference.

https://motherduck.com/blog/data-engineering-toolkit-essential-tools/

https://motherduck.com/blog/data-engineering-toolkit-infrastructure-devops/

Databricks vs. Snowflake: The Final Chapter

A blog post from Orchestra explains their perspective that the rivalry between Databricks and Snowflake is over, and both companies are moving into new markets with different strategies (developer-focused vs. business-focused).

To briefly summarize the article, it makes the following predictions:

  • Databricks → To a Developer Platform
    • Leveraging its strong developer community, it might aim to provide the best development environment on top of hyperscalers like Azure.
  • Snowflake → To the Business App Market
    • Utilizing its customer base of business users, it might aim to become a "composable" application platform that could replace giant SaaS products like Salesforce.

https://www.getorchestra.io/blog/databricks-vs-snowflake-the-final-chapter


Data Extract/Load

Fivetran

Fivetran's Comparison Article with Airbyte

Fivetran's official blog has published an article comparing their service with Airbyte.

It's important to consider that this article is "written by Fivetran," and I felt some of the descriptions of Airbyte were a bit dated. (For example, there was no mention of Airbyte's Schema Change Management or Connector Builder).

However, it's still a useful reference for a rough understanding of the differences between the two companies.

https://www.fivetran.com/blog/fivetran-vs-airbyte-features-pricing-services-and-more


Data Warehouse/Data Lakehouse

Snowflake

"QUERY_INSIGHTS" View Released, Providing Analysis and Improvement Suggestions for Queries Executed in Snowflake

A new ACCOUNT_USAGE view, QUERY_INSIGHTS, has been released. It automatically analyzes the execution of queries within Snowflake and stores the results, identifying areas that may be impacting performance.

https://docs.snowflake.com/en/release-notes/2025/other/2025-07-03-query-insights

I tried it out myself, and I found it excellent that Snowflake handles the detection automatically. All the user has to do is check the QUERY_INSIGHTS view, review the findings, and take action. It simplifies the process greatly!

https://dev.classmethod.jp/articles/snowflake-query-insights-view/
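As a quick illustration, checking the view could look something like the query below. Note that the column names here (insight_type_id, message, start_time) are my assumptions and not verified against the documentation, so please check the release notes for the actual schema:

```sql
-- Hypothetical sketch: column names are assumptions, not confirmed against the docs
SELECT query_id,
       insight_type_id,
       message
FROM snowflake.account_usage.query_insights
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;
```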

Official Snowflake MCP Server with Cortex AI Functionality Released

Snowflake has officially released an MCP Server that supports Cortex AI features. It currently supports Cortex Search and Cortex Analyst.

https://github.com/Snowflake-Labs/mcp

An official blog post about this release also mentioned that Snowflake-managed MCP servers, where Snowflake manages the infrastructure, are planned for a future release.

https://www.snowflake.com/en/blog/mcp-servers-unify-extend-data-agents/
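For reference, registering an MCP server with a typical MCP client (such as Claude Desktop) uses a config like the one below. The command, package name, and flag shown here are purely my assumptions; the repo README has the actual setup instructions:

```json
{
  "mcpServers": {
    "snowflake": {
      "command": "uvx",
      "args": ["snowflake-labs-mcp", "--service-config-file", "services.yaml"]
    }
  }
}
```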

BigQuery

"source_column_match" and "null_markers" Options Added to CREATE EXTERNAL TABLE and LOAD DATA

BigQuery has added source_column_match and null_markers options to CREATE EXTERNAL TABLE and LOAD DATA, which are used for querying and loading data from external storage.

https://cloud.google.com/bigquery/docs/release-notes#July_22_2025
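For illustration, the new options would sit alongside the existing FILES options in a LOAD DATA statement. The accepted values shown here (matching by 'NAME', and the specific null marker strings) are my assumptions based on the release note, so consult the docs for the exact syntax:

```sql
-- Hypothetical sketch: option values are assumptions from the release note
LOAD DATA INTO mydataset.events
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/events/*.csv'],
  null_markers = ['NA', '\\N'],       -- treat these strings as NULL
  source_column_match = 'NAME'        -- match columns by name instead of position
);
```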

Databricks

RSS Feed for Databricks Release Notes

Databricks has started providing an RSS feed for its release notes, which includes the latest product information and other feature release notes.

https://docs.databricks.com/aws/en/release-notes/#feed

Recursive CTEs are in Public Preview in Databricks

As a new feature in Databricks, Recursive CTEs are now in public preview.

https://www.databricks.com/blog/introducing-recursive-common-table-expressions-databricks
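Recursive CTEs let a query reference its own intermediate result, which is handy for hierarchies and sequences. A minimal standard-SQL sketch of the kind of query this preview enables:

```sql
-- Classic example: generate the numbers 1 through 5 with a recursive CTE
WITH RECURSIVE counter(n) AS (
  SELECT 1                            -- anchor member
  UNION ALL
  SELECT n + 1 FROM counter WHERE n < 5  -- recursive member
)
SELECT n FROM counter;
```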

MotherDuck/DuckDB

MotherDuck Announces New "Mega" and "Giga" Instances for Large, Complex Data Processing

To meet the demand for more intensive data processing that exceeds the capabilities of the existing "Jumbo" ducklings, MotherDuck has announced two new, larger instance sizes: "Mega" and "Giga."

According to the article, Mega is designed for large-scale workloads, while Giga is intended for extremely complex and massive transformation jobs for which there are no other alternatives.

https://motherduck.com/blog/announcing-mega-giga-instance-sizes-huge-scale/

For a list of instance types offered by MotherDuck, please see the official documentation below.

https://motherduck.com/docs/about-motherduck/billing/instances/


Data Transform

dbt

Build Iceberg Tables via BigLake Metastore in dbt-bigquery

Starting with the dbt-bigquery 1.10 release, it is now possible to build Iceberg tables via BigLake Metastore.

https://www.getdbt.com/blog/dbt-supports-apache-iceberg-tables-bigquery

https://docs.getdbt.com/docs/mesh/iceberg/bigquery-iceberg-support

Although it's still a preview feature, BigLake Metastore also provides Apache Iceberg REST Catalog functionality. This means it's now theoretically possible to "build an Iceberg table using dbt with BigLake Metastore as the catalog, and then query that Iceberg table from an external engine." (I'd love to try this out sometime...)

https://cloud.google.com/bigquery/docs/blms-rest-catalog?hl=en
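As a sketch, a dbt model targeting an Iceberg table might be configured like the snippet below. The parameter names (table_format, storage_uri) and the bucket path are my assumptions; the linked dbt documentation has the exact configuration for dbt-bigquery 1.10:

```sql
-- Hypothetical dbt model config; parameter names are assumptions
{{ config(
    materialized = 'table',
    table_format = 'iceberg',
    storage_uri = 'gs://my-bucket/iceberg/orders'
) }}

SELECT * FROM {{ source('raw', 'orders') }}
```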


Business Intelligence

Tableau

Analyzing Pro Wrestling Match Videos with Generative AI on Vertex AI and Tableau

rtama published an article on converting pro wrestling match videos into structured data and analyzing it with Tableau.

This article is an excellent reference for understanding what kind of prompts can be used to convert video into structured data suitable for analysis. (The choice of pro wrestling as a subject is also brilliant!)

https://zenn.dev/cavernaria/articles/ec04775eec5c4b

Lightdash

Lightdash Now Supports dbt Fusion and dbt 1.10

Lightdash has announced support for dbt Fusion and dbt 1.10.

https://changelog.lightdash.com/we-now-support-dbt-fusion-and-dbt-1-10-projects-319440

One key point to note is that from dbt 1.10 onwards, meta: must be defined under config:. For Lightdash users, migrating this part will be a significant hurdle.

Recognizing this challenge, Lightdash has released a Migration Guide and a migration tool called MetaMove.

https://docs.lightdash.com/dbt-guides/dbt-1.10-migration

https://docs.lightdash.com/dbt-guides/dbt-fusion-migration
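Concretely, the change looks like this in a schema YAML file (the model name and meta keys below are illustrative):

```yaml
# dbt <= 1.9: meta defined at the top level of the model
models:
  - name: orders
    meta:
      owner: analytics_team

# dbt >= 1.10: meta must be nested under config
models:
  - name: orders
    config:
      meta:
        owner: analytics_team
```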

Omni

Official Omni MCP Server Released

Omni has released its official MCP Server. This expands Omni's potential: the semantic models built in Omni can now be connected to other tools to obtain more accurate answers.

https://omni.co/blog/introducing-omnis-mcp-server

https://docs.omni.co/docs/ai/mcp


Data Catalog

Select Star

Select Star Releases MCP Server

Select Star has released a new managed MCP Server. It uses a Select Star API token for authentication.

https://www.selectstar.com/resources/data-mcp-for-ai-agents

https://docs.selectstar.com/features/mcp-server

OpenMetadata

Collate Raises $10M in Series A Funding

Collate, the company that supports the OpenMetadata project and offers a SaaS version, has raised $10 million in Series A funding.

https://blog.open-metadata.org/collate-raises-series-a-funding-to-accelerate-openmetadata-growth-4c859f0b9813


Data Activation (Reverse ETL)

Census (A Fivetran Company)

Census Releases "Mesh Datasets" for Cross-Warehouse Queries

Census has released a new feature called "Mesh Datasets," which allows users to execute queries across multiple data warehouses.

It seems you just need to write queries using PostgreSQL syntax.

https://www.getcensus.com/blog/break-down-data-silos-with-cross-warehouse-sql-introducing-mesh-datasets

https://docs.getcensus.com/datasets/overview/mesh-datasets
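For illustration, a cross-warehouse query might look something like the following. The way the sources are referenced (the snowflake_prod / bigquery_prod prefixes) is purely my guess, so please check the linked docs for the real naming and syntax:

```sql
-- Hypothetical: joining a Snowflake table with a BigQuery table in one query
SELECT o.order_id, u.email
FROM snowflake_prod.orders AS o
JOIN bigquery_prod.users AS u
  ON o.user_id = u.id;
```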
