Augusto Kiniama Rosa

Posted on • Originally published at blog.infostrux.com

The Unofficial Snowflake Monthly Release Notes: September 2024

Monthly Snowflake Unofficial Release Notes #New features #Previews #Clients #Behavior Changes

Welcome to the fantastic Unofficial Release Notes for Snowflake for September 2024! You’ll find all the latest features, drivers, and more in one convenient place.

As an unofficial source, I am excited to share my insights and thoughts. Let’s dive in! You can also find all of Snowflake’s releases here.

This month, we provide coverage up to release 8.37, including General Availability (GA) and Public Preview (Preview) features. I hope to eventually extend this to private preview notices as well.

I would appreciate your suggestions on how to keep improving these monthly release notes. Feel free to comment below or chat with me on LinkedIn.

Behavior change bundle 2024_05 is active by default, 2024_06 is enabled by default but can still be disabled, and 2024_07 is available to be enabled.
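If you want to try these bundles in a sandbox account first, here is a minimal sketch of checking and toggling bundle state from Python with Snowpark. The "default" connection profile is an assumption; run with a role that can alter account-level settings, such as ACCOUNTADMIN.

```python
# Minimal sketch: inspecting and toggling behavior change bundles.
# Assumes a connections.toml profile named "default" (hypothetical) with
# sufficient privileges (e.g., ACCOUNTADMIN).
from snowflake.snowpark import Session

session = Session.builder.config("connection_name", "default").create()

# Check whether a bundle is enabled or disabled in this account.
print(session.sql(
    "SELECT SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS('2024_06') AS status"
).collect())

# 2024_06 is enabled by default but can still be opted out of for now.
session.sql("SELECT SYSTEM$DISABLE_BEHAVIOR_CHANGE_BUNDLE('2024_06')").collect()

# 2024_07 is available to opt in to early.
session.sql("SELECT SYSTEM$ENABLE_BEHAVIOR_CHANGE_BUNDLE('2024_07')").collect()
```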

The Snowflake Connector for ServiceNow v1 has been replaced by the Snowflake Connector for ServiceNow V2 and has been removed as an option.

What’s New in Snowflake

New Features

  • Authentication with AWS IAM from procedures and functions (Preview): Snowpark External Access now supports authentication with AWS services via Identity and Access Management (IAM).
  • Snowpark-optimized warehouse RESOURCE_CONSTRAINT (Preview): The CREATE WAREHOUSE and ALTER WAREHOUSE commands now let you specify the memory and CPU architecture for Snowpark-optimized warehouses through the RESOURCE_CONSTRAINT property.
  • Snowflake Notebooks telemetry temporarily disabled: Notebook cells cannot send logs, spans, or span events to event tables while telemetry data is temporarily disabled; it will be enabled again in a future release. Any logs or traces emitted by other objects called from Notebooks, such as stored procedures and UDFs, continue to send telemetry data to your account's event table. To re-enable Notebook cells to send telemetry to your event tables, contact Snowflake Support or your account representative.
  • Pandas on Snowflake (GA) lets you execute your pandas code in a distributed manner directly on your data in Snowflake. By changing the import statement and making a few code adjustments, you retain the familiar pandas experience while gaining Snowflake's scalability, governance, and security benefits, so you can handle larger datasets without migrating your pandas pipelines to other big data frameworks or investing in large, expensive machines. Workloads run natively in Snowflake through transpilation to SQL, and the feature is delivered through the Snowpark pandas API as part of the Snowpark Python library (see the sketch after this list).
  • Calling stored procedures in the FROM clause of SELECT statements (GA): You can now call a stored procedure that returns tabular data directly from a SELECT statement's FROM clause, which can simplify SQL statements that save results to a table.
  • Snowflake REST APIs (Preview): With this release, Snowflake announces the preview of the Snowflake REST APIs for resource management, a set of endpoints that allow users to programmatically interact with and control various resources in the Snowflake Data Cloud.
  • New Snowflake region: China (Ningxia) (GA), on Amazon Web Services (AWS) in the cn-northwest-1 region. The China region is a separate region operated by Digital China Cloud Technology Limited (DCC), an authorized operating partner of Snowflake, Inc.
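To make the pandas on Snowflake item concrete, here is a minimal sketch using the Snowpark pandas API. The connection profile and the SALES table with REGION and AMOUNT columns are hypothetical.

```python
# Minimal sketch of pandas on Snowflake via the Snowpark pandas API.
# The "default" connection profile and the SALES table are hypothetical.
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 - registers the Snowflake backend
from snowflake.snowpark import Session

session = Session.builder.config("connection_name", "default").create()

# The DataFrame is backed by Snowflake; operations are transpiled to SQL
# and executed in the warehouse rather than in local memory.
df = pd.read_snowflake("SALES")
by_region = df.groupby("REGION")["AMOUNT"].sum()
print(by_region.head())
```

The only pandas-side change is the import: modin.pandas plus the Snowpark plugin replaces the usual import pandas as pd.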

Machine Learning Updates (Cortex, ML, DocumentAI)

  • Snowflake Feature Store (GA) enables data scientists and ML engineers to create, maintain, and use ML features in data science and ML workloads within Snowflake by standardizing commonly used feature transformations in a centralized repository.
  • New models available in Snowflake Cortex AI: The Cortex LLM COMPLETE function now supports the following additional models: jamba-1.5-large, llama3.2-1b, and llama3.2-3b.
  • The DOCUMENT_AI_USAGE_HISTORY view (GA) in the Account Usage schema allows you to query the usage history for Document AI.
  • New multilingual embedding model in Snowflake Cortex AI (Preview): adds support for the voyage-multilingual-2 model.
  • New Cortex LLM function CLASSIFY_TEXT (Preview): This new task-specific function gives you an easy way to label text records, such as emails, call transcripts, and product reviews, into categories that are relevant for your business, with classification accuracy comparable to the most powerful models on the market today. Outputs are structured JSON, so you can integrate the results into your data pipeline without prompt engineering, post-processing, or providing examples (see the sketch after this list).
  • New AI21 model in Snowflake Cortex AI (Preview): The jamba-1.5-mini model, with a context length of 256K, supports use cases such as structured output (JSON) and grounded generation.
  • The Anomaly Detection ML function (Preview) now includes preprocessing features that let you train an anomaly detection model even if your training data contains missing, duplicated, or misaligned timesteps. You can manually specify an event cadence when the model fails to infer it or infers it incorrectly, automatically interpolate missing target values from nearby timesteps, and aggregate dimensional values from events occurring outside the canonical event cadence. Aggregation behavior can be specified per value type or per column, or left at the defaults; a relatively small number of such corrections does not noticeably affect detection accuracy.
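Here is a hedged sketch of calling CLASSIFY_TEXT from Python through a Snowpark session. The SUPPORT_TICKETS table, its BODY column, and the category list are all hypothetical placeholders.

```python
# Minimal sketch: labeling free text with the CLASSIFY_TEXT Cortex function.
# Table, column, and category names are hypothetical.
from snowflake.snowpark import Session

session = Session.builder.config("connection_name", "default").create()

labeled = session.sql("""
    SELECT body,
           SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
               body,
               ['billing', 'technical issue', 'feature request']
           ) AS label  -- returns structured JSON, e.g. {"label": "billing"}
    FROM support_tickets
""")
labeled.show()
```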

Performance Improvements

  • Replication refreshes are now parallelized, which reduces the time it takes to clone objects, especially for databases and schemas with extensive metadata, and reduces the overall refresh time when replicating large volumes of data.

Data Lake Updates

  • Cloning support for Snowflake-managed Apache Iceberg tables (Preview) is now available.
  • Apache Iceberg tables: Automated refresh (Preview): With automated refresh, Snowflake continuously polls your external Iceberg catalog in a serverless fashion to synchronize the metadata with the most recent remote changes (see the sketch after this list).
  • Apache Iceberg tables: Catalog integration for Iceberg REST (Preview) connects Snowflake to Apache Iceberg tables managed in a remote catalog that complies with the open-source Apache Iceberg REST OpenAPI specification.
  • Iceberg tables: Delta table support (Preview) allows you to create read-only Iceberg tables from Delta table files in object storage.
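As a rough illustration of the automated refresh item, here is a sketch of creating an externally managed Iceberg table with AUTO_REFRESH enabled. The external volume, catalog integration, and table names are hypothetical, and preview syntax may change.

```python
# Minimal sketch: an externally managed Iceberg table with automated refresh.
# All object names are hypothetical; AUTO_REFRESH is the preview option that
# tells Snowflake to keep polling the remote catalog for metadata changes.
from snowflake.snowpark import Session

session = Session.builder.config("connection_name", "default").create()

session.sql("""
    CREATE OR REPLACE ICEBERG TABLE orders_iceberg
      EXTERNAL_VOLUME = 'my_external_volume'
      CATALOG = 'my_rest_catalog_integration'
      CATALOG_TABLE_NAME = 'orders'
      AUTO_REFRESH = TRUE
""").collect()
```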

SQL Updates

  • The new function REDUCE (GA) reduces an array to a single value using the logic in a lambda expression.
  • The new function ST_INTERPOLATE (GA) accepts an input GEOGRAPHY object and returns an interpolated object within a specified tolerance. Call this function when you want to see how GEOGRAPHY objects appear in the planar coordinate system.
  • RANGE BETWEEN window frames with explicit offsets are now available for two additional functions: FIRST_VALUE and LAST_VALUE (GA).
  • SHOW commands (GA): Added support for the new WITH PRIVILEGES parameter in SHOW DATABASES, SHOW SCHEMAS, and SHOW WAREHOUSES. The WITH PRIVILEGES parameter lets you limit results to databases, schemas, and warehouses for which the role executing the statement has been granted the privileges specified in the list (see the sketch after this list).
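Below is a quick sketch of the REDUCE lambda syntax and the WITH PRIVILEGES parameter. The privilege list you pass is up to you, and the placement shown follows my reading of the release note.

```python
# Minimal sketch: the new REDUCE function and SHOW ... WITH PRIVILEGES.
from snowflake.snowpark import Session

session = Session.builder.config("connection_name", "default").create()

# Fold an array into a single value with a lambda expression.
row = session.sql(
    "SELECT REDUCE([1, 2, 3, 4], 0, (acc, x) -> acc + x) AS total"
).collect()[0]
print(row["TOTAL"])  # 10

# Only list warehouses on which the executing role holds USAGE.
session.sql("SHOW WAREHOUSES WITH PRIVILEGES USAGE").show()
```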

Data Clean Rooms Updates (GA Updates)

  • Branded clean room tiles: Providers can brand their clean rooms with a logo and company name by configuring the profile of their clean room environment; collaborators see the provider's logo and name on clean room tiles on the Joined and Invited tabs.
  • Consumer direct activation: Consumers can send analysis results directly to their Snowflake account, allowing them to access row-level data after running templates for overlap and data enrichment use cases. A provider can disable this option to prevent consumer direct activation.
  • Activation Hub column policies: Consumers can specify which columns should be used as ID columns for their activation, which might be different from the join columns used while running an analysis.
  • Scheduled analyses: As a consumer, you can schedule analyses to run on an hourly, daily, weekly, or monthly basis, allowing you to keep your data up to date without having to manually rerun an analysis. Scheduled analyses run as a background process; the Audience Overlap & Segmentation template, the SQL Query template, and custom templates created with the developer APIs all support scheduling.
  • Clean room data stats: Providers can view statistics for their own tables in the clean room, including distinct counts for their join policy columns and the top five values in all other columns. When non-join columns contain more than 20 distinct values, no distinct counts are displayed.
  • LiveRamp integration: Users can activate their respective RampID and send it back to their LiveRamp account via Snowflake shares or SFTP upload. Users can then use their LiveRamp Connect account to push these segments downstream to other LiveRamp-supported destinations.
  • The Trade Desk integration: Users can activate against The Trade Desk CRM and return first-party PII to their Trade Desk account, enabling them to integrate CRM data into data segments for audience targeting and conversion measurement on The Trade Desk.
  • Managed account credit limit and monitoring: Set a monthly limit on how many Snowflake credits can be used for clean room activity. Users cannot use the web app to access the clean room environment if their credit consumption is within 10 credits of the limit, and they can monitor how many credits they have consumed for the month.
  • When a user runs an analysis using an Audience Overlap, SQL Query, or custom template, they are prompted to save the analysis and given the option to schedule future runs. Users can continue to use the application while the analysis is running in the background.
  • Integration with Yahoo DSP: When an analysis supports the Activation Hub, consumers can now activate the results of the analysis to their Yahoo DSP account, which lets them buy against audiences they generate within the clean room through Yahoo DSP.
  • Publishers and advertisers can use the Google PAIR protocol to run an audience overlap analysis on encrypted identifiers and then push the results to their Google DV 360 account for activation, all without exposing unencrypted sensitive data.

Data Pipelines/Data Loading/Unloading Updates

  • The Parquet file format option USE_VECTORIZED_SCANNER now supports client-side encryption (GA). With this option, you no longer need to configure the stage to use server-side encryption.
  • The new DYNAMIC_TABLE_REFRESH_HISTORY account usage view (GA) displays information about your dynamic tables' refresh history for up to a year.
  • Serverless tasks now have Python and JVM support (Preview): they can invoke UDFs (user-defined functions) and stored procedures written in Python, Java, or Scala (see the sketch after this list).
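To illustrate the serverless tasks item, here is a sketch of a task that calls a Python stored procedure; because no WAREHOUSE is specified, the task runs on serverless compute. The procedure name and schedule are hypothetical.

```python
# Minimal sketch: a serverless task (no WAREHOUSE clause) invoking a
# hypothetical Python stored procedure on a nightly schedule.
from snowflake.snowpark import Session

session = Session.builder.config("connection_name", "default").create()

session.sql("""
    CREATE OR REPLACE TASK nightly_cleanup
      SCHEDULE = 'USING CRON 0 2 * * * UTC'
      AS CALL my_python_cleanup_proc()
""").collect()

# Tasks are created suspended; resume to start the schedule.
session.sql("ALTER TASK nightly_cleanup RESUME").collect()
```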

Open-Source Updates

  • terraform-snowflake-provider v0.96.0 (V1 redesign of resources and data sources: Row access policy, Resource monitor, and Masking policy, SDK upgrades: External volume and Authentication policy)
  • terraform-snowflake-provider v0.95.0 (V1 redesign resources View, User, and Database role, Add fully_qualified_name to all resources, Add identifier parsers, Add identifier with arguments, Add timeouts block to cortex, identifier with arguments for procedure and external function)
  • Streamlit 1.38.0 (no notable changes)
  • Modin 0.32.0 (Add native query compiler, Interoperability between query compilers, Initial Polars API, Using dynamic partitioning in broadcast_apply, Add more granular lazy flags to query compiler, Add a new environment variable for using dynamic partitioning)
  • Snowflake VS Code Extension 1.10.1 (Parent nodes in the native app pane will be colored to reflect any status changes to their children, Added support for authenticating through OAuth via the connection configuration file)
  • Snowflake VS Code Extension 1.10.0 (added File Format and Git Repositories to the Object Explorer)

Client, Drivers, Libraries and Connectors Updates

New features:

  • Ingest Java SDK 2.2.1 (ExternalVolumeManager to support multiple stages for a new table format, dependency versions, parameters to support a new table format)
  • ODBC Driver 3.4.1 (improved error messages for network errors)
  • Snowflake Connector for Google Analytics Raw Data 2.0.0 (support for identifiers in a worksheet format)
  • Snowflake Connector for Google Analytics Aggregate Data 2.0.0 (connector requires all configured identifiers to be quoted based on the identifier requirements, report tables have change_tracking enabled, reset the connector’s configuration before the configuration is finalized using RESET_CONFIGURATION procedure, recover a connector in states ERROR, PAUSING, or STARTING using the RECOVER_CONNECTOR_STATE procedure)
  • Node.js 1.13.0 (support for the passcode and passcodeInPassword parameters in the MFA authentication process)
  • Snowflake Connector for Kafka 2.4.1 (updated Snowflake Ingest Java SDK to version 2.2.2)
  • Snowpark Library for Python 1.22.0 & 1.22.1 (new functions in snowflake.snowpark.functions: array_remove, ln; improved documentation for Session.write_pandas by making the use_logical_type option more explicit; support for specifying the following to DataFrameWriter.save_as_table: enable_schema_evolution, data_retention_time, max_data_extension_time, change_tracking, copy_grants, and iceberg_config, a dictionary that can hold the iceberg configuration options external_volume, catalog, base_location, catalog_sync, and storage_serialization_policy; support for specifying the same iceberg_config dictionary to DataFrameWriter.copy_into_table; support for specifying the following parameters to DataFrame.create_or_replace_dynamic_table: mode, refresh_mode, initialize, clustering_keys, is_transient, data_retention_time, max_data_extension_time; many local testing updates; and many Snowpark pandas API updates; see the sketch after this list)
  • Snowpark Library for Scala and Java 1.14.0 (support for reading structured types from Snowflake; added the following new functions: Variant.asJsonNode, Functions.round, Functions.hex, Functions.unhex, Functions.shiftleft, Functions.shiftright, Functions.reverse, Functions.isnull, Functions.unix_timestamp, Functions.locate, Functions.ntile, Functions.rand, Functions.randn, Functions.regexp_extract, Functions.signum, Functions.sign, Functions.substring_index, Functions.collect_list, Functions.log10, Functions.log1p, Functions.base64, Functions.unbase64, Functions.expr, Functions.array, Functions.date_format, Functions.last, Functions.desc, Functions.asc, Functions.size)
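As a small illustration of the Snowpark Python 1.22.0 additions, here is a sketch of writing a DataFrame to a Snowflake-managed Iceberg table via the new iceberg_config option. The source table, external volume, and base location are hypothetical.

```python
# Minimal sketch of DataFrameWriter.save_as_table with iceberg_config
# (new in Snowpark Python 1.22.0). Object names are hypothetical.
from snowflake.snowpark import Session

session = Session.builder.config("connection_name", "default").create()

df = session.table("RAW_EVENTS")
df.write.save_as_table(
    "EVENTS_ICEBERG",
    mode="overwrite",
    iceberg_config={
        "external_volume": "my_external_volume",
        "catalog": "SNOWFLAKE",        # Snowflake-managed Iceberg catalog
        "base_location": "events/",
    },
)
```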

Bug fixes:

  • Ingest Java SDK 2.2.2 (fixed a critical issue by updating the location for the file name in metadata)
  • ODBC Driver 3.4.1 (an issue that introduced delays in some cases when running PUT/GET commands, an issue where unsupported usage of SQL_DEFAULT_PARAM was not handled correctly)
  • Node.js 1.13.1 (compilation error with the types file)
  • Node.js 1.13.0 (Deleted query IDs exposed to users on failed requests, axios error and response sanitization, error handling issues in the getResultsFromQueryId method, issue related to re-authentication for JWT and SAML authentication, issue with returned types for async methods in the driver types definition)
  • Snowflake CLI 2.8.1 (an issue where the git execute command did not correctly handle upper case in directory names, an issue where snow git setup did not correctly handle fully qualified repository names, the snow git setup command behavior in cases where an API integration, or a secret with a default name, already exists, an issue where the snow snowpark package create command created empty zip files when a package name contained capital letters)
  • Snowflake Connector for Kafka 2.4.1 (issues with schematization)
  • Snowflake Connector for Python 3.12.2 (improved error handling for asynchronous queries, providing more detailed and informative error messages when an async query fails; improved inference of top-level domains for accounts specifying a region in China, now defaulting to snowflakecomputing.cn; improved implementation of snowflake.connector.util_text.random_string to reduce the likelihood of collisions; lowered the log level for OCSP fail-open warning messages from ERROR to WARNING)
  • Snowpark Library for Python 1.22.0 (numerous bug fixes)
  • Snowpark Library for Python 1.21.1 (a bug where using to_pandas_batches with async jobs caused an error due to improper handling of waiting for asynchronous query completion)
  • Snowpark Library for Scala and Java 1.14.0 (incorrect time info in the Open Telemetry span, duplicated Open Telemetry span in the count action)
  • Snowflake Connector for ServiceNow® V2 5.10.1 (configuration validation in the UPDATE_CONNECTION_CONFIGURATION procedure)

Conclusion

September 2024 was heavily focused on machine learning and genAI, with many other big and small improvements across other areas. It was amazing to see a full REST API come live for Snowflake. I am looking forward to writing about authentication with AWS IAM from procedures and functions.

My name is Augusto Rosa, and I am the Vice President of Engineering for Infostrux Solutions. I am also honored to be a Snowflake Data Super Hero 2024 and Snowflake SME.

Thank you for reading this blog post. You can follow me on LinkedIn.

Subscribe to Infostrux Medium Blogs https://medium.com/infostrux-solutions for the most interesting Data Engineering and Snowflake news.
