DEV Community

Sagara

Personal Picks: Data Product News (May 28, 2025)

※This article is an English translation of the original Japanese article: https://dev.classmethod.jp/articles/modern-data-stack-info-summary-20250528/

Hi, this is Sagara.

As a consultant specializing in the Modern Data Stack, I see a lot of information being shared daily in this field.

With so much news out there, I've decided to summarize some of the Modern Data Stack-related information that caught my eye over the past couple of weeks in this article.

Note: This article doesn't cover all the latest updates for every product mentioned. I'm only including information that I found particularly interesting, based on my personal perspective and selection.

General Modern Data Stack News

Salesforce to Acquire Informatica

Salesforce issued a press release announcing its acquisition of Informatica.
By integrating Informatica's extensive data infrastructure capabilities, such as data integration, cataloging, and master data management (MDM), into the Salesforce platform, it seems that much more will be achievable directly within the Salesforce ecosystem.
https://www.salesforce.com/news/press-releases/2025/05/27/salesforce-signs-definitive-agreement-to-acquire-informatica/

Below is an article from the CEO of Orchestra in response to this acquisition. The image below, quoted from the article, shows Salesforce's acquisition history over the past nine years or so. They've been acquiring many companies at an incredible pace...
https://dataopsleadership.substack.com/p/breaking-salesforce-buys-informatica
[Image: Salesforce acquisition history over the past 9 years]

Data Warehouse/Data Lakehouse

Snowflake

Snowflake Openflow Released

Snowflake has released a new feature, Openflow. (As of May 28, 2025, it is in Public Preview.)
Openflow is a service based on Apache NiFi and can be used for ingesting and transforming data from various data sources.
https://docs.snowflake.com/en/release-notes/2025/other/2025-05-20-openflow

MFA Now Supports TOTP and Passkeys

As a new Snowflake feature, TOTP and passkeys are now available as MFA methods. (I was a bit worried when it disappeared from the release notes once, but I'm glad it has been re-released!)
https://docs.snowflake.com/en/release-notes/2025/other/2025-05-23-mfa
We've also written a blog post about this (in Japanese), so please check it out.
https://dev.classmethod.jp/articles/snowflake-snowsight-time-based-one-time-password/
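For readers unfamiliar with TOTP, here is a minimal sketch of how a time-based one-time password is derived under the generic RFC 6238 scheme. This is not Snowflake's implementation; the function names, the 30-second step, and the default of 6 digits are assumptions chosen for illustration, and real authenticator apps usually receive the shared secret as a base32 string rather than raw bytes.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the number of elapsed time steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)
```

Calling `totp(shared_secret)` on both the server and the authenticator app yields the same code within a given 30-second window, which is why the two sides only need to share the secret once at enrollment.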

"Cost Anomalies" Feature Released for Automatic Cost Anomaly Detection and Notification at Account/Organization Level

Snowflake has released "Cost anomalies," a new feature that automatically detects and notifies about cost anomalies at the account and organization levels. (As of May 28, 2025, it is in Public Preview.)
https://docs.snowflake.com/en/release-notes/2025/other/2025-05-16-cost
I tried it out myself, and it's a convenient feature: you can check the details of each anomaly while viewing past cost trends in a graph, and set up email alerts for detected anomalies.
We've also written a blog post about this (in Japanese), so please check it out.
https://dev.classmethod.jp/articles/snowflake-cost-anomalies-pupr/

terraform-provider-snowflake Roadmap Updated

The roadmap for terraform-provider-snowflake has been updated for the first time since its GA release.
Key focus areas for the coming months include programmatic access tokens (PAT), Snowpark Container Services (SPCS), Listing, account management features, and a PoC for using the Snowflake REST API in the provider.
https://github.com/snowflakedb/terraform-provider-snowflake/blob/main/ROADMAP.md

How to Deploy Streamlit in Snowflake Apps Using dbt

A blog post from phData summarized how to deploy Streamlit in Snowflake apps using dbt.
While I knew that deployment was possible via SQL, and could therefore be done by creating macros, this particular approach was new to me and quite interesting.
https://www.phdata.io/blog/how-to-deploy-snowflake-streamlit-apps-the-easiest-method-explained-using-dbt/

BigQuery

Announcing "GENERATE_TABLE" Function to Write Recognized Image Information Directly to a Table

BigQuery has announced a new feature, the GENERATE_TABLE function, which allows you to write information recognized from images directly into a table.
https://cloud.google.com/blog/products/data-analytics/convert-ai-generated-unstructured-data-to-a-bigquery-table?hl=en
The following is quoted from the blog post mentioned above. By defining an External Table for GCS where images are stored and an LLM Model object beforehand, you can execute a query to record information obtained from images into a table.
[Image: Example of GENERATE_TABLE function usage]

Onehouse

Announcing New Query Engine "Quanton"

Onehouse has announced "Quanton," a new query engine available for the Onehouse Compute Runtime.
It supports Apache Spark and SQL, and Onehouse claims it is more cost-effective than using compute resources from EMR, Snowflake, or Databricks.
https://www.onehouse.ai/blog/announcing-spark-and-sql-on-the-onehouse-compute-runtime-with-quanton

MotherDuck/DuckDB

Announcing "DuckLake," a New Lakehouse Format Where Metadata Management is Handled by the Database

DuckLake was announced on the official DuckDB blog.
DuckLake responds to the complexity of file-based metadata management in recent formats like Iceberg and Delta Lake by having a SQL database handle the entire metadata management layer, including what would be the catalog layer in Iceberg.
The following four benefits of DuckLake are mentioned in the blog post:

  • Simplicity
    • To run DuckLake on a laptop, you just need to install DuckDB and use the DuckLake extension (in this case, a local DuckDB file handles catalog management).
    • No Avro or JSON files; everything is controllable via SQL.
  • Scalability
    • An architecture that separates storage, compute, and metadata management.
  • Speed
    • Unlike traditional Open Table Formats, reading metadata does not require file I/O.
    • Reduces the number of files written for small changes and can handle concurrent modifications.
  • Features
    • Operable via SQL, supports ACID-compliant transactions, and allows adding/deleting columns and changing data types.
    • Data and delete files written to storage by DuckLake are compatible with Iceberg, allowing for metadata-only migration.
    • DuckLake compute nodes have been simultaneously released as a DuckDB extension (available from DuckDB v1.3.0).

https://duckdb.org/2025/05/27/ducklake
Below are the official DuckLake website and repository:
https://ducklake.select/
https://github.com/duckdb/ducklake
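The "Simplicity" point above can be sketched in Python with the `duckdb` package. This is a rough illustration rather than an official recipe: it assumes DuckDB v1.3.0 or later, network access to download the extension, and hypothetical names (`metadata.ducklake`, `data_files/`, `my_ducklake`, `demo`) that I chose for the example. The helper functions are my own and are not part of DuckDB or DuckLake.

```python
def ducklake_setup_statements(
    metadata_path: str, data_path: str, alias: str = "my_ducklake"
) -> list[str]:
    """Build the SQL that installs the extension and attaches a DuckLake catalog."""
    return [
        "INSTALL ducklake",
        "LOAD ducklake",
        # The catalog lives in a local DuckDB file; Parquet data goes under data_path.
        f"ATTACH 'ducklake:{metadata_path}' AS {alias} (DATA_PATH '{data_path}')",
        f"USE {alias}",
    ]


def run_demo(
    metadata_path: str = "metadata.ducklake", data_path: str = "data_files/"
) -> int:
    """Attach a local DuckLake, write two rows, and return the table's row count."""
    import duckdb  # imported lazily so the helper above works without the package

    con = duckdb.connect()
    for stmt in ducklake_setup_statements(metadata_path, data_path):
        con.execute(stmt)
    con.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER, name VARCHAR)")
    con.execute("INSERT INTO demo VALUES (1, 'duck'), (2, 'lake')")
    return con.execute("SELECT count(*) FROM demo").fetchone()[0]
```

Running `run_demo()` should leave a metadata file and a folder of data files in the working directory, which matches the post's description of SQL-managed metadata sitting alongside data files on storage.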

DuckDB 1.3.0 Released

The latest version of DuckDB, 1.3.0, has been released.
The caching feature for external file queries and the ability to directly query Parquet, CSV, and JSON files using CLI commands particularly caught my attention.
https://duckdb.org/2025/05/21/announcing-duckdb-130.html

Data Transform

dbt

Documentation for Hybrid Projects Published

Documentation for dbt's new Hybrid projects feature has been published. (As of May 28, 2025, it is available in Private Beta.)
By predefining environment variables related to dbt Cloud, artifacts such as manifest.json can apparently be automatically uploaded to dbt Cloud when running commands like dbt run with dbt Core.
https://docs.getdbt.com/docs/deploy/hybrid-projects

Data Application

Streamlit

Article Summarizing Best Practices for Building Gen AI Apps with Streamlit

An article summarizing best practices for building Gen AI apps with Streamlit was published on the official Streamlit blog.
It covers a wide range of topics, including directory structure, API key storage, context maintenance, and cache utilization.
https://blog.streamlit.io/best-practices-for-building-genai-apps-with-streamlit/

Business Intelligence

Looker

Some Looker Permissions Now Apply to Studio in Looker

Some Looker permissions now apply to Studio in Looker. (Preview)
https://cloud.google.com/looker/docs/release-notes#May_20_2025
As mentioned in the documentation below, permissions like explore and see_user_dashboards will apply, enabling users to access only authorized Explores and dashboards in Studio in Looker.
https://cloud.google.com/looker/docs/overview-of-studio-in-looker-permissions

Looker 25.8 Release Notes Published

The release notes for Looker 25.8 have been published.
The updates that particularly caught my attention were the Code Interpreter in Conversational Analytics and the ability to apply gemini_in_looker permissions to specific models.
https://cloud.google.com/looker/docs/release-notes#May_14_2025

Power BI

Blog Post Summarizing May 2025 Updates

A blog post summarizing the May 2025 updates for Power BI was published on Microsoft's official blog.
Although I'm not very familiar with Power BI myself, the updates seemed to center around Copilot features specific to Power BI and the definition of Semantic Models for AI.
https://powerbi.microsoft.com/en-us/blog/power-bi-may-2025-feature-summary/
The future roadmap for Microsoft Fabric, including Power BI, will apparently be published on the following page:
https://roadmap.fabric.microsoft.com/?product=powerbi

Hex

Announced Acquisition of Hashboard (Information from April 30, 2025)

This news dates back to April 30, 2025, but Hex announced that it has acquired Hashboard.
https://hex.tech/blog/welcoming-hashboard/
I hadn't heard of Hashboard myself, but it's a BI tool where you define a data model beforehand and then build dashboards.
https://hashboard.com/

Data Catalog

Secoda

Summary of Secoda's April 2025 Updates

A page summarizing Secoda's April 2025 updates has been published on the official Secoda website.
https://www.secoda.co/product-news/april-2025
Personally, the following updates particularly caught my attention:

Data Activation (Reverse ETL)

Hightouch

Journey Feature Now Allows Simulation Through Test Runs

As a new feature in Journeys, you can now simulate outcomes through test runs after creating a Journey.
https://hightouch.com/blog/journey-simulations
The image below, quoted from the link above, shows a simulation of how many records would be synced for each flow and destination.
[Image: Hightouch Journey Simulation showing record counts]
