※This article is an English translation of the original Japanese article: https://dev.classmethod.jp/articles/modern-data-stack-info-summary-20250528/
Hi, this is Sagara.
As a consultant specializing in the Modern Data Stack, I see a lot of information being shared daily in this field.
With so much news out there, I've decided to summarize some of the Modern Data Stack-related information that caught my eye over the past couple of weeks in this article.
Note: This article doesn't cover all the latest updates for every product mentioned. I'm only including information that I found particularly interesting, based on **my personal perspective and selection**.
General Modern Data Stack News
Salesforce to Acquire Informatica
Salesforce issued a press release announcing its acquisition of Informatica.
By bringing Informatica's extensive data infrastructure capabilities, such as data integration, cataloging, and MDM, onto the Salesforce platform, much more should become achievable directly within the Salesforce ecosystem.
https://www.salesforce.com/news/press-releases/2025/05/27/salesforce-signs-definitive-agreement-to-acquire-informatica/
Below is an article from the CEO of Orchestra in response to this acquisition. The image below, quoted from the article, shows Salesforce's acquisition history over the past nine years or so. They've been acquiring many companies at an incredible pace...
https://dataopsleadership.substack.com/p/breaking-salesforce-buys-informatica
Data Warehouse/Data Lakehouse
Snowflake
Snowflake Openflow Released
Snowflake has released a new feature, Openflow. (As of May 28, 2025, it is in Public Preview.)
Openflow is a service based on Apache NiFi and can be used for ingesting and transforming data from various data sources.
https://docs.snowflake.com/en/release-notes/2025/other/2025-05-20-openflow
MFA Now Supports TOTP and Passkeys
Snowflake now supports TOTP and passkeys as MFA methods. (I was a bit worried when this disappeared from the release notes once, but I'm glad it has been re-released!)
https://docs.snowflake.com/en/release-notes/2025/other/2025-05-23-mfa
We've also written a blog post about this (in Japanese), so please check it out.
https://dev.classmethod.jp/articles/snowflake-snowsight-time-based-one-time-password/
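For readers unfamiliar with TOTP, here is a minimal sketch of how a time-based one-time password is derived under RFC 6238 (HMAC-SHA1 over a 30-second time counter). It only illustrates the general algorithm that authenticator apps use; it is not Snowflake's implementation, and the function name `totp` and its parameters are my own choices for illustration.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Derive a TOTP code (RFC 6238, HMAC-SHA1) for the given Unix time."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = unix_time // step
    # HOTP core (RFC 4226): HMAC over the counter as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the requested number of decimal digits, left-padded with zeros.
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    import time

    # Shared secret taken from the RFC 6238 test vectors (ASCII bytes).
    print(totp(b"12345678901234567890", int(time.time())))
```

Because the server and the authenticator share the secret and the clock, both sides can compute the same code independently, which is why no network round trip is needed to generate it.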
"Cost Anomalies" Feature Released for Automatic Cost Anomaly Detection and Notification at Account/Organization Level
Snowflake has released "Cost anomalies," a new feature that automatically detects and notifies about cost anomalies at the account and organization levels. (As of May 28, 2025, it is in Public Preview.)
https://docs.snowflake.com/en/release-notes/2025/other/2025-05-16-cost
I tried it out myself, and it's a convenient feature: you can review historical costs in a graph, drill into the details of each anomaly, and set up email alerts for detected anomalies.
We've also written a blog post about this (in Japanese), so please check it out.
https://dev.classmethod.jp/articles/snowflake-cost-anomalies-pupr/
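To give a rough feel for what "cost anomaly detection" means, here is a generic sketch that flags days whose cost deviates strongly from the preceding days. This is an assumption-laden illustration only: the release note does not describe Snowflake's detection logic here, and the function name, the 7-day window, and the 3-sigma threshold are all hypothetical choices of mine.

```python
from statistics import mean, stdev


def flag_cost_anomalies(daily_costs, window=7, threshold=3.0):
    """Return indexes of days whose cost is far from the trailing-window mean.

    NOTE: a generic rolling z-score check, not Snowflake's algorithm.
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        # Compare each day only against the `window` days before it.
        history = daily_costs[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip perfectly flat histories to avoid flagging on zero variance.
        if sigma > 0 and abs(daily_costs[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical daily credit consumption with one obvious spike.
    costs = [10, 11, 10, 12, 11, 10, 11, 40, 11, 10]
    print(flag_cost_anomalies(costs))
```

A managed feature like Cost anomalies saves you from maintaining this kind of logic and its thresholds yourself.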
terraform-provider-snowflake Roadmap Updated
The roadmap for terraform-provider-snowflake has been updated for the first time since its GA release.
Key focus areas for the coming months include PAT, SPCS, Listing, Account management features, and a PoC to introduce Snowflake REST API into the Provider.
https://github.com/snowflakedb/terraform-provider-snowflake/blob/main/ROADMAP.md
How to Deploy Streamlit in Snowflake Apps Using dbt
A blog post from phData summarized how to deploy Streamlit in Snowflake apps using dbt.
While I knew it was possible to deploy via SQL and could be done by creating macros, this particular approach was new to me and quite interesting.
https://www.phdata.io/blog/how-to-deploy-snowflake-streamlit-apps-the-easiest-method-explained-using-dbt/
BigQuery
Announcing "GENERATE_TABLE" Function to Write Recognized Image Information Directly to a Table
BigQuery has announced a new feature, the GENERATE_TABLE function, which allows you to write information recognized from images directly into a table.
https://cloud.google.com/blog/products/data-analytics/convert-ai-generated-unstructured-data-to-a-bigquery-table?hl=en
The following is quoted from the blog post mentioned above. By defining an External Table for GCS where images are stored and an LLM Model object beforehand, you can execute a query to record information obtained from images into a table.
Onehouse
Announcing New Query Engine "Quanton"
Onehouse has announced "Quanton," a new query engine available for the Onehouse Compute Runtime.
It supports Apache Spark and SQL, and Onehouse says it is more cost-effective than using compute resources from EMR, Snowflake, or Databricks.
https://www.onehouse.ai/blog/announcing-spark-and-sql-on-the-onehouse-compute-runtime-with-quanton
MotherDuck/DuckDB
Announcing "DuckLake," a New Lakehouse Format Where Metadata Management is Handled by the Database
DuckLake was announced on the official DuckDB blog.
In response to the complexity of file-based metadata management in recent formats like Iceberg and Delta Lake, DuckLake takes the approach of having a SQL database handle the entire metadata management layer, including what would be the catalog layer in Iceberg.
The following four benefits of DuckLake are mentioned in the blog post:
- Simplicity
  - To run DuckLake on a laptop, you just need to install DuckDB and use the DuckLake extension (in this case, DuckDB's local file handles catalog management).
  - No Avro or JSON files; everything is controllable via SQL.
- Scalability
  - An architecture that separates storage, compute, and metadata management.
- Speed
  - Unlike traditional Open Table Formats, file I/O is not required.
  - Reduces the number of files written for small changes and can handle concurrent modifications.
- Features
  - Operable via SQL, supports ACID-compliant transactions, and allows adding/deleting columns and changing data types.
  - Data and delete files written to storage by DuckLake are compatible with Iceberg, allowing for metadata-only migration.
  - DuckLake compute nodes have been simultaneously released as a DuckDB extension (available from DuckDB v1.3.0).
https://duckdb.org/2025/05/27/ducklake
Below are the official DuckLake website and repository:
https://ducklake.select/
https://github.com/duckdb/ducklake
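The core idea, keeping table metadata in SQL tables instead of in metadata files, can be illustrated with a toy catalog. The sketch below uses Python's built-in `sqlite3` as a stand-in database; the table names, columns, and helper functions are hypothetical and are not DuckLake's actual schema or API. It only shows why a transactional SQL database makes snapshots and concurrent-safe commits straightforward.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Create a toy metadata catalog (hypothetical schema, not DuckLake's)."""
    con = sqlite3.connect(path)
    con.executescript(
        """
        CREATE TABLE IF NOT EXISTS snapshot (
            snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
            note TEXT
        );
        CREATE TABLE IF NOT EXISTS data_file (
            path TEXT PRIMARY KEY,
            table_name TEXT NOT NULL,
            added_in INTEGER NOT NULL REFERENCES snapshot(snapshot_id),
            removed_in INTEGER REFERENCES snapshot(snapshot_id)
        );
        """
    )
    return con


def commit(con, table_name, add=(), remove=(), note=""):
    """Record added/removed data files as one atomic snapshot."""
    with con:  # single transaction: everything below commits or rolls back together
        snap = con.execute(
            "INSERT INTO snapshot (note) VALUES (?)", (note,)
        ).lastrowid
        con.executemany(
            "INSERT INTO data_file (path, table_name, added_in) VALUES (?, ?, ?)",
            [(p, table_name, snap) for p in add],
        )
        con.executemany(
            "UPDATE data_file SET removed_in = ? "
            "WHERE path = ? AND removed_in IS NULL",
            [(snap, p) for p in remove],
        )
    return snap


def files_at(con, table_name, snapshot_id):
    """List the data files visible to a reader at the given snapshot."""
    rows = con.execute(
        "SELECT path FROM data_file "
        "WHERE table_name = ? AND added_in <= ? "
        "AND (removed_in IS NULL OR removed_in > ?) ORDER BY path",
        (table_name, snapshot_id, snapshot_id),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    con = open_catalog()
    s1 = commit(con, "events", add=["a.parquet", "b.parquet"], note="initial load")
    s2 = commit(con, "events", add=["c.parquet"], remove=["a.parquet"], note="compaction")
    print(files_at(con, "events", s1))  # time travel to the first snapshot
    print(files_at(con, "events", s2))  # current state
```

In this toy version, a commit is a handful of row changes inside one database transaction, and time travel is a plain `SELECT` filtered by snapshot ID, with no metadata files to write or read.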
DuckDB 1.3.0 Released
The latest version of DuckDB, 1.3.0, has been released.
The caching feature for external file queries and the ability to directly query Parquet, CSV, and JSON files using CLI commands particularly caught my attention.
https://duckdb.org/2025/05/21/announcing-duckdb-130.html
Data Transform
dbt
Documentation for Hybrid Projects Published
Documentation for dbt's new Hybrid projects feature has been published. (As of May 28, 2025, it is available in Private Beta.)
By predefining environment variables related to dbt Cloud, artifacts such as manifest.json can apparently be automatically uploaded to dbt Cloud when running commands like dbt run with dbt Core.
https://docs.getdbt.com/docs/deploy/hybrid-projects
Data Application
Streamlit
Article Summarizing Best Practices for Building Gen AI Apps with Streamlit
An article summarizing best practices for building Gen AI apps with Streamlit was published on the official Streamlit blog.
It covers a wide range of topics, including directory structure, API key storage, context maintenance, and cache utilization.
https://blog.streamlit.io/best-practices-for-building-genai-apps-with-streamlit/
Business Intelligence
Looker
Some Looker Permissions Now Apply to Studio in Looker
Some Looker permissions now apply to Studio in Looker. (Preview)
https://cloud.google.com/looker/docs/release-notes#May_20_2025
As mentioned in the documentation below, permissions like explore and see_user_dashboards will apply, enabling users to access only authorized Explores and dashboards in Studio in Looker.
https://cloud.google.com/looker/docs/overview-of-studio-in-looker-permissions
Looker 25.8 Release Notes Published
The release notes for Looker 25.8 have been published.
The updates that particularly caught my attention were the Code Interpreter in Conversational Analytics and the ability to apply gemini_in_looker permissions to specific models.
https://cloud.google.com/looker/docs/release-notes#May_14_2025
Power BI
Blog Post Summarizing May 2025 Updates
A blog post summarizing the May 2025 updates for Power BI was published on Microsoft's official blog.
Although I'm not very familiar with Power BI myself, the updates seemed to center around Copilot features specific to Power BI and the definition of Semantic Models for AI.
https://powerbi.microsoft.com/en-us/blog/power-bi-may-2025-feature-summary/
The future roadmap for Microsoft Fabric, including Power BI, will apparently be published on the following page:
https://roadmap.fabric.microsoft.com/?product=powerbi
Hex
Announced Acquisition of Hashboard (Information from April 30, 2025)
This news dates back to April 30, 2025, but Hex has announced its acquisition of Hashboard.
https://hex.tech/blog/welcoming-hashboard/
I hadn't heard of Hashboard myself, but it's a BI tool where you define a data model beforehand and then build dashboards.
https://hashboard.com/
Data Catalog
Secoda
Summary of Secoda's April 2025 Updates
The page summarizing Secoda's April 2025 updates has been updated on the official Secoda website.
https://www.secoda.co/product-news/april-2025
Personally, the following updates particularly caught my attention:
- Secoda's native app released on Snowflake Marketplace https://www.secoda.co/blog/secoda-snowflake-native-app-marketplace https://app.snowflake.com/marketplace/listing/GZTSZ113XX0X/secoda-secoda
- Omni integration now available, enabling lineage display https://www.secoda.co/blog/secoda-integration-omni
Data Activation (Reverse ETL)
Hightouch
Journey Feature Now Allows Simulation Through Test Runs
As a new feature in Journeys, you can now simulate outcomes through test runs after creating a Journey.
https://hightouch.com/blog/journey-simulations
The image below, quoted from the link above, shows how you can simulate how many records will be synced in each flow and destination.