Modern Data Stack Information Summary - October 2025
This article is an English translation of the original Japanese version: https://dev.classmethod.jp/articles/modern-data-stack-info-summary-20251001/
Hello, this is Sagara.
As a consultant specializing in the Modern Data Stack, I see the ecosystem generate a tremendous amount of information every day.
From that wealth of information, this article summarizes the Modern Data Stack updates that caught my attention over the past two weeks.
Disclaimer: This is not an exhaustive list of all the latest updates for the mentioned products. The information included here is based on my personal judgment and interests.
Modern Data Stack General
Launch of "Open Semantic Interchange (OSI)"
Snowflake, Salesforce, dbt Labs, and other companies have announced the launch of an open-source initiative called "Open Semantic Interchange (OSI)" to promote data utilization for AI.
This initiative aims to build a common semantic data framework by standardizing fragmented Semantic Layer definitions that vary across different products through a vendor-neutral open specification.
The following vendors are listed as Launch Partners:
Below are the press releases from Snowflake and Salesforce regarding this announcement:
https://www.salesforce.com/blog/agentic-future-demands-open-semantic-layer/
While other participating products have also published blogs about this announcement, I found Select Star's post particularly interesting. As shown in the figure quoted from their blog, if this can be realized, Select Star could act as a hub to coordinate Semantic Layer definitions with BI tools not participating in the Open Semantic Interchange initiative, which I find exciting.
https://www.selectstar.com/resources/snowflake-ai-ready-semantic-model
"Everyone's Strongest Data Platform Architecture Vol. 5 - All-Star Special!!" Event Held
On September 25th, "Everyone's Strongest Data Platform Architecture Vol. 5 - All-Star Special!!" was held.
https://datatech-jp.connpass.com/event/360596/
The event had over 100 in-person attendees and more than 500 online participants. You can get a sense of the event's excitement by checking the hashtag "みん強" (Min-Kyō) at the following link:
https://x.com/hashtag/%E3%81%BF%E3%82%93%E5%BC%B7?src=hashtag_click
Below are links to the presentation materials from each speaker that I was able to find:
https://speakerdeck.com/tenajima/data-vaultwoyong-itamarutipurodakutonotamenodetaji-pan-kai-fa
https://speakerdeck.com/pei0804/revops-practice-learned
https://speakerdeck.com/foursue/20250924-lt2ben-yaru
https://speakerdeck.com/genshun9/minqiang-nokoremadetokorekara
Data Extract/Load
Airbyte
Airbyte 2.0 Released
Airbyte has released version 2.0, a major version upgrade. (Version 2.0 of the OSS edition has not been released yet.)
https://airbyte.com/blog/airbyte-2-0
According to the link above, the following features have been released:
- Enterprise Flex: An architecture that separates the control plane and data plane, providing a hybrid model where management is done in the cloud while actual data remains within the customer's infrastructure
- Data Activation: A feature that directly syncs insights from data warehouses to business applications like Salesforce and HubSpot. This allows the Reverse ETL process to be completed within the platform
- Speed: Connector architecture has been redesigned to improve data sync speed by 4-10x. For example, MySQL to S3 sync is 4.7x faster, and Postgres to S3 is 12x faster
- New Pricing Plans: A new plan structure tailored to team growth stages. The "Capacity Based Pricing" introduced for Pro plans and above is particularly notable, as it's based on required parallel processing capacity (Data Workers) rather than data transfer volume
  - Core (formerly OSS): Free open-source version
  - Standard (formerly Cloud): Pay-as-you-go managed service
  - Pro (formerly Teams): Capacity-based pricing with governance features like RBAC and SSO
  - Enterprise Flex: All Pro features plus the ability to deploy data planes anywhere: cloud, multi-cloud, or on-premises
  - Self-Managed Enterprise: Fully self-managed enterprise version for organizations with strict security requirements
Data Warehouse/Data Lakehouse
Snowflake
FILE Data Type Generally Available
The FILE data type for handling unstructured data in Snowflake is now generally available.
Now that it is generally available, you can confidently apply generative AI to images and document files through Cortex AI SQL!
https://docs.snowflake.com/en/release-notes/2025/other/2025-09-25-file-data-type-ga
Cortex Analyst Feature Enhancements
Cortex Analyst has gained two new features. Derived metrics are something other Semantic Layers already offered, and since real-world business logic often requires calculations across multiple metrics, this is a welcome addition!
- Private facts and metrics: A feature that defines metrics in the Semantic Model but prevents end users from directly querying these metrics (primarily intended for metrics used only in Derived metrics)
- Derived metrics: A new type of metric that allows defining metrics based on calculations between multiple metrics
https://docs.snowflake.com/en/release-notes/2025/other/2025-09-30-semantic-model-improvements
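To make the relationship between the two features concrete, here is a minimal conceptual sketch in plain Python. It is not Snowflake's Semantic Model syntax or any Cortex Analyst API; the metric names, the sample rows, and the `query` helper are all hypothetical. It only illustrates the idea that a derived metric is calculated from other metrics, and that a private metric can feed a derived metric without being directly queryable.

```python
# Conceptual sketch only: names and data below are hypothetical,
# not Snowflake's Semantic Model syntax.
sample_rows = [
    {"revenue": 120.0, "cost": 80.0},
    {"revenue": 200.0, "cost": 150.0},
]

# Base metrics aggregate raw rows.
BASE_METRICS = {
    "total_revenue": lambda rows: sum(r["revenue"] for r in rows),
    "total_cost": lambda rows: sum(r["cost"] for r in rows),
}

# Derived metrics are calculated from other metrics, not from rows.
DERIVED_METRICS = {
    "profit_margin": lambda m: (m["total_revenue"] - m["total_cost"])
    / m["total_revenue"],
}

# Private metrics exist only to support derived metrics.
PRIVATE_METRICS = {"total_cost"}


def query(metric_name, rows):
    """Return a metric value, refusing direct access to private metrics."""
    if metric_name in PRIVATE_METRICS:
        raise PermissionError(f"{metric_name} is private")
    base = {name: fn(rows) for name, fn in BASE_METRICS.items()}
    derived = {name: fn(base) for name, fn in DERIVED_METRICS.items()}
    return {**base, **derived}[metric_name]


print(query("profit_margin", sample_rows))
```

In this sketch, `total_cost` still contributes to `profit_margin`, but asking for it directly raises an error, which mirrors the intent described for private metrics.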
dbt Projects on Snowflake Now Supports docs generate
dbt Projects on Snowflake received a silent update that now enables docs generate functionality.
While I haven't tested it yet, this should allow the execute dbt project command to perform docs generate when hosting docs with GitHub Actions, eliminating the need to rewrite profiles.yml for dbt Core!
https://x.com/SS_chneider/status/1973154146976145839
Claude Sonnet 4.5 Now Available in Snowflake
Claude Sonnet 4.5 is now available within Snowflake. The official documentation doesn't mention it yet.
Additionally, in regions where the model is not natively supported, it can still be used by enabling cross-region inference.
https://www.snowflake.com/en/blog/cortex-ai-claude-sonnet-4-5/
SELECT's Summary Article on Snowflake Features Released in Summer 2025
SELECT has published a summary article on Snowflake features released in summer 2025.
https://select.dev/posts/snowflake-summer-2025-product-updates
Best Practices Article for Combining Snowflake × Power BI
phData has published a best practices article for combining Snowflake × Power BI.
The article mainly covers the following topics:
- Use Power BI's native Snowflake Connector
- Carefully select connection mode (Import, DirectQuery, or Composite) based on use case
- Properly model data, including adopting star schema
- Configure Microsoft Entra SSO for Snowflake
- Use appropriate Azure VMs for gateways
- Minimize distance between Snowflake and Power BI data centers
- Increase concurrent query limits for data models
- Leverage AI features like Copilot
https://www.phdata.io/blog/how-to-optimize-power-bi-and-snowflake-for-advanced-analyitcs/
BigQuery
Column-Level Lineage Now Available in Dataplex
Column-level lineage viewing is now generally available as a new Dataplex feature.
https://cloud.google.com/dataplex/docs/release-notes#September_29_2025
https://cloud.google.com/dataplex/docs/lineage-views#column-level-lineage
Array Unnesting Feature Using Gemini Released
A Gemini-powered feature that expands each element of an array into its own row has been released.
https://cloud.google.com/bigquery/docs/release-notes#September_29_2025
https://cloud.google.com/bigquery/docs/data-prep-get-suggestions#unnest-arrays
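For readers unfamiliar with unnesting, the transformation itself can be sketched in a few lines of plain Python. This is only an illustration of what "one row per array element" means; it is not how BigQuery data preparation or Gemini implements the feature, and the `unnest` function and sample rows are hypothetical.

```python
def unnest(rows, array_column):
    """Expand each element of rows[i][array_column] into its own row.

    Rows whose array is empty produce no output rows in this sketch.
    """
    result = []
    for row in rows:
        for element in row[array_column]:
            # Copy the other columns and replace the array with one element.
            new_row = {k: v for k, v in row.items() if k != array_column}
            new_row[array_column] = element
            result.append(new_row)
    return result


# Hypothetical sample data: one order with two items, one with a single item.
orders = [
    {"order_id": 1, "items": ["apple", "banana"]},
    {"order_id": 2, "items": ["cherry"]},
]

for row in unnest(orders, "items"):
    print(row)
```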
Summary Article on New BigQuery SQL Features
yu yamada from Google Cloud has published an article summarizing five new features related to BigQuery SQL, including UNION based on column names and simplified array operations.
https://zenn.dev/google_cloud_jp/articles/3b20a94df7624e
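The difference between a positional UNION and a UNION based on column names is easy to miss, so here is a small plain-Python sketch of the two behaviors. It models tables as a column list plus tuples and is purely illustrative; the function names and data are hypothetical, and it does not reproduce BigQuery's actual SQL syntax or its handling of mismatched columns.

```python
def union_all_by_position(left_cols, left_rows, right_cols, right_rows):
    """Stack rows as-is; the right table's column names are ignored."""
    return left_cols, left_rows + right_rows


def union_all_by_name(left_cols, left_rows, right_cols, right_rows):
    """Reorder the right table's values to match the left table's column names."""
    if set(left_cols) != set(right_cols):
        raise ValueError("both tables must have the same column names")
    positions = [right_cols.index(col) for col in left_cols]
    reordered = [tuple(row[i] for i in positions) for row in right_rows]
    return left_cols, left_rows + reordered


# Hypothetical tables with the same columns in a different order.
a_cols, a_rows = ["id", "name"], [(1, "alice")]
b_cols, b_rows = ["name", "id"], [("bob", 2)]

print(union_all_by_position(a_cols, a_rows, b_cols, b_rows))
print(union_all_by_name(a_cols, a_rows, b_cols, b_rows))
```

With positional matching, the second row ends up with the name in the `id` column; matching by name puts each value under the intended column.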
Databricks
Databricks One in Public Preview
"Databricks One," a simple user interface designed for business users, has entered public preview.
https://docs.databricks.com/aws/ja/workspace/databricks-one
As shown in the figure below, it features a UI where you can ask questions about data in natural language and directly link to related dashboards.
Lakeflow Pipelines Editor in Public Preview
Databricks has released "Lakeflow Pipelines Editor," a new IDE for developing and debugging ETL pipelines, as a public preview.
https://docs.databricks.com/aws/en/dlt/dlt-multi-file-editor
As shown in the figure quoted from the link above, it's not just for editing pipeline code but also allows viewing dependencies between tables.
OpenAI GPT-5 and Claude Sonnet 4.5 Now Available in Databricks
While these are separate announcements, both GPT-5 and Sonnet 4.5 are now available within Databricks.
https://www.databricks.com/blog/run-openai-models-directly-databricks
https://www.databricks.com/blog/claude-sonnet-45-here
MotherDuck/DuckDB
DuckDB ducklake Extension and DuckLake 0.3 Released
The DuckDB ducklake extension and DuckLake 0.3 have been released. Using the ducklake extension requires DuckDB v1.4.0.
The main updates appear to be data copying between DuckLake and Iceberg using DuckDB's iceberg extension, and using the MERGE statement released in DuckDB v1.4.0 through the ducklake extension.
https://duckdb.org/2025/09/17/ducklake-03.html
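As a reminder of what a MERGE statement does conceptually, the following plain-Python sketch shows the usual "update when matched, insert when not matched" behavior. It is not DuckDB or DuckLake code, and the `merge` function, the key column, and the sample rows are hypothetical.

```python
def merge(target, source, key):
    """Upsert source rows into target rows, matching on the given key."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            # Matched: update the existing row with the source values.
            merged[row[key]].update(row)
        else:
            # Not matched: insert the source row as a new row.
            merged[row[key]] = dict(row)
    return list(merged.values())


# Hypothetical data: id 2 already exists and is updated, id 3 is new.
target_rows = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
source_rows = [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}]

print(merge(target_rows, source_rows, "id"))
```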
MotherDuck Announces First European Cloud Region in Private Preview
MotherDuck has announced its first European cloud region as a private preview.
This new region runs on AWS eu-central-1, with official release planned for this fall.
https://motherduck.com/blog/motherduck-in-europe/
Business Intelligence
Looker
Looker Accessible from Gemini CLI
An extension for Gemini CLI that provides access to Looker has been released.
It appears you can check available Explores, confirm dimensions and measures available in specified Explores, and even create Looks and dashboards in Looker.
https://cloud.google.com/looker/docs/release-notes#September_23_2025
https://github.com/gemini-cli-extensions/looker
Data Activation (Reverse ETL)
Hightouch
Dashboards Now Available in Hightouch
Hightouch has released a new feature that consolidates multiple charts into dashboards.
This should be useful for cases where you want to check everything in Hightouch, such as dashboards for confirming campaign performance.
https://changelog.hightouch.io/
https://hightouch.com/docs/campaign-intelligence/dashboards
Data Orchestration
Airflow
Airflow 3.1 Released
Airflow's latest version 3.1 has been released.
https://github.com/apache/airflow/releases/tag/3.1.0
Astronomer has published a blog post summarizing the added features.
It appears that the release adds improved support for AI workflows, updates to the React-based UI, and a DAG favorites feature.
https://www.astronomer.io/blog/introducing-apache-airflow-3-1/