
augusto kiniama rosa

Originally published at blog.infostrux.com

The Unofficial Snowflake Monthly Release Notes: August 2024

Monthly Snowflake Unofficial Release Notes #New features #Previews #Clients #Behavior Changes

Welcome to the fantastic Unofficial Release Notes for Snowflake for August 2024! You’ll find all the latest features, drivers, and more in one convenient place.

As an unofficial source, I am excited to share my insights and thoughts. Let’s dive in! You can also find all of Snowflake’s releases here.

This month, we provide coverage up to release 8.32 (General Availability — GA). I hope to extend this eventually to private preview notices as well.

I would appreciate your suggestions on continuing to combine these monthly release notes. Feel free to comment below or chat with me on LinkedIn.

Behavior change bundle 2024_05 is active by default, 2024_06 is enabled by default but can be disabled, and 2024_07 is available to be enabled.

What’s New in Snowflake

New Features

  • Snowpark Container Services is now GA in all commercial AWS regions and in Preview in all commercial Azure regions
  • Python user-defined aggregate functions (GA), use Snowpark Python APIs to create and call user-defined aggregate functions (UDAFs), which take one or more rows as input and produce a single row of output
  • Access to Git repositories from Snowflake (GA), you can now fetch and clone Git repos directly inside Snowflake
  • Data Dictionary Data Preview (Preview), generated when a data dictionary is enabled for a listing, and enables a basic email to someone to review the detection
  • Outbound private connectivity with Azure External Network Access and External Functions (Preview), use outbound private connectivity with two features: External Network Access and External Functions
  • Full-text search (Preview), call a new SEARCH function to find character data (text) in specified columns from one or more tables, including fields in VARIANT, OBJECT, and ARRAY columns
  • Cortex Analyst (Preview), enables you to create applications capable of reliably answering business questions based on your structured data in Snowflake, so business users can ask questions in natural language and receive direct answers without writing SQL
  • Differential Privacy (Preview), a widely recognized standard for data privacy that limits the risk that someone could leak sensitive information from a sensitive dataset, even when carrying out a targeted privacy attack. Snowflake uses rigorous mathematics to ensure that an attacker cannot identify individuals and entities in the dataset to an unacceptable degree of certainty
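
To make the Python UDAF item above more concrete, here is a minimal sketch of what an aggregate handler class can look like. It follows my understanding of the handler shape documented for Python UDAFs (an aggregate_state property plus accumulate, merge, and finish methods). The class name PythonSum and the local driver code at the bottom are illustrative assumptions, and registering the handler in Snowflake is not shown.

```python
# Hypothetical handler for a SUM-like Python UDAF, exercised locally.
# The method names follow my understanding of Snowflake's Python UDAF
# handler interface; the class name and the driver code are assumptions.


class PythonSum:
    def __init__(self):
        # Running total for the rows seen by this handler instance.
        self._partial_sum = 0

    @property
    def aggregate_state(self):
        # Intermediate state that can be handed to another instance's merge().
        return self._partial_sum

    def accumulate(self, input_value):
        # Called once per input row.
        self._partial_sum += input_value

    def merge(self, other_partial_sum):
        # Combines the state produced by another handler instance.
        self._partial_sum += other_partial_sum

    def finish(self):
        # Produces the single output value for the group.
        return self._partial_sum


# Local simulation: two partitions aggregated separately, then merged.
left, right = PythonSum(), PythonSum()
for value in [1, 2, 3]:
    left.accumulate(value)
for value in [10, 20]:
    right.accumulate(value)
left.merge(right.aggregate_state)
print(left.finish())  # prints 36
```

The split into accumulate and merge is what lets many rows collapse into a single output row even when partial results are computed independently.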

Snowsight Updates

  • New stage explorer in Snowsight (GA), when loading a staged file into a table, you can select the files directly from a stage explorer, and even select sub-folders from it
  • Schema detection and visual column mapping for loading files to existing tables in Snowsight (Preview), visualize the column mapping between the source file and the target table and make adjustments as needed before loading

Snowflake Applications

  • Cross-Cloud Auto-Fulfillment in a Snowflake Native App with Snowpark Container Services (Preview), automatically replicate the data product associated with your listing to other Snowflake regions
  • Snowflake Native App Framework: Support for VPS on AWS (GA), support for Virtual Private Snowflake (VPS) on Amazon Web Services
  • Snowflake Native App Framework: Support for government regions on AWS

Data Lake Updates

  • Iceberg tables: Support for government regions (GA)
  • Polaris Catalog: New system function for troubleshooting issues with syncing Snowflake-managed Iceberg tables (Preview), the system function SYSTEM$SEND_NOTIFICATIONS_TO_CATALOG sends a notification to a Polaris catalog and, if the send fails, returns an error message explaining why, which helps diagnose why a Snowflake-managed Iceberg table isn’t syncing to a Polaris catalog

Streamlit Updates

  • Support for Streamlit 1.35.0
  • Custom UI in Streamlit in Snowflake (GA), custom UI enables customization of the look, feel, and front-end behavior of Streamlit in Snowflake apps
  • Streamlit in Snowflake on AWS GovCloud (GA)

SQL Updates

  • RANGE BETWEEN window frames with explicit offsets (GA), a range-based frame consists of a logically computed set of rows, and with explicit offsets, such as RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING, you can easily compute rolling calculations, such as moving sums and averages, over time-series data
  • Setting users as SNOWFLAKE_SUPPORT users no longer supported, you can no longer set a user’s SUPPORT_USER attribute using the CREATE USER or ALTER USER commands
  • RANGE BETWEEN with explicit offsets: Additional window functions supported, STDDEV, STDDEV_SAMP, STDDEV_POP (and aliases), VARIANCE, VARIANCE_SAMP, VARIANCE_POP (and aliases), and COUNT_IF
  • UNDROP command: Support for restoring objects using ID, if you have dropped multiple tables with the same name, you can use this feature to restore a specific table using the table ID
  • Wildcard filtering for functions, when you specify a wildcard (*) as an argument in a call to a function, you can now use the ILIKE and EXCLUDE keywords for filtering in a SELECT list or GROUP BY clause
  • SEARCH_IP Function (Preview), searches for valid IPv4 addresses in specified character-string columns from one or more tables, including fields in VARIANT, OBJECT, and ARRAY columns
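
The RANGE BETWEEN items above are easier to picture with numbers. The sketch below is plain Python, not Snowflake code: it only imitates what a moving sum over a range-based frame with explicit offsets means, namely that the frame for each row is picked by the value of the ordering key rather than by row position. The function name and the sample readings are hypothetical.

```python
# Illustrative only: mimics the *semantics* of
# SUM(value) OVER (ORDER BY key RANGE BETWEEN n PRECEDING AND m FOLLOWING).
# The function name and the sample data are hypothetical.


def range_moving_sum(rows, preceding, following):
    """Return (key, moving_sum) pairs for (key, value) rows."""
    ordered = sorted(rows)
    result = []
    for key, _ in ordered:
        # A range frame is defined by key values, not by row positions.
        low, high = key - preceding, key + following
        total = sum(value for k, value in ordered if low <= k <= high)
        result.append((key, total))
    return result


# Days with gaps: no readings exist for days 3, 4, 6, 7 and 8.
readings = [(1, 10), (2, 20), (5, 30), (9, 40)]
print(range_moving_sum(readings, 3, 3))
# prints [(1, 30), (2, 60), (5, 50), (9, 40)]
```

A row-based frame (ROWS BETWEEN ...) would instead take a fixed number of neighbouring rows regardless of gaps in the ordering key, which is why range frames suit irregular time series.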

Machine Learning Updates (Cortex, ML, DocumentAI)

  • ML Functions: Improved Error Messages in Classification (Preview), improved error messages, with suggestions for fixing the errors
  • Cortex Search Service (Preview), enables low-latency, high-quality “fuzzy” search over your Snowflake data, which allows you to create your own search service through Snowsight without writing any SQL
  • DocumentAI new Arctic-TILT model, doubles the length of the answers the model can provide, up to 256 tokens (about 160 words), and improves training time
  • Cross-region inference for Snowflake AI & ML features (GA), enables processing inference requests in a different region if the request cannot be processed in the region where the inference is originally requested and works for any Snowflake feature supported by cross-region inference, including Cortex LLM Functions and Snowflake Copilot
  • Time Series ML Functions: Error Message Improvements, improved error messages for the Forecasting and Anomaly Detection ML Functions, which now contain only actionable information
  • Easier Training of Forecasting Models from Real-World Data, now includes preprocessing features that allow you to successfully train a forecasting model even when your training data has missing, duplicate, or misaligned time steps
  • Snowflake ML Functions: Top Insights (Preview), updates to the Top Insights ML Function for key driver analysis, which lets you easily identify drivers of a metric’s change over time or explain differences in a metric among various verticals
  • The new Mistral Large 2 model is available in Snowflake Cortex AI for serverless inference, it is significantly more capable than Mistral Large in math, reasoning, and coding, and has an increased context window of 128K
  • Cortex Analyst: New regions AWS ap-northeast-1 (Tokyo) and Azure West Europe (Netherlands)
  • New multilingual embedding models available in Snowflake Cortex AI, multilingual-e5-large (EMBED_TEXT_1024 (SNOWFLAKE.CORTEX))

Data Clean Rooms Updates

  • Support for external tables and Iceberg tables (GA), providers and consumers can now include external tables and Iceberg tables in a clean room
  • Integration with TransUnion TruAudience Identity (GA), providers and consumers can now use TransUnion’s latest TruAudience Identity solution when creating or installing a clean room in the web app, which allows them to use the TransUnion identity graph to match records based on a collaboration ID
  • RSA authentication for the service account user, the service account user that the web app uses to interact with the Snowflake account now authenticates using Snowflake’s key-pair authentication instead of username/password authentication, which is more secure
  • Activation for provider-run analyses, providers can now push the results of a provider-run analysis to their own Snowflake account, where it can be used for activation, and a consumer who has shared data in the clean room can control whether the provider can push the results of the provider-run analysis

Data Pipelines/Data Loading/Unloading Updates

  • Loading unstructured data with Document AI (Preview), supports loading unstructured data, similar to loading structured and semi-structured data. To load unstructured data with this preview feature, you can run the same COPY INTO table command with a new copy option file_processor
  • Tasks: A new option for ALTER TASK, the command supports a new option, REMOVE WHEN, which you can use to easily remove the task’s condition

Security Updates

  • Session policies: Support added for secondary roles (GA), the ALLOWED_SECONDARY_ROLES property of a session policy lets you scope the set of secondary roles available to a user for the duration of the session. You can now allow secondary roles in the session, disallow secondary roles in the session, or allow only specific secondary roles.
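
The three modes described above (allow, disallow, allow only specific roles) can be modeled in a few lines of plain Python. This is only a sketch of the behavior, not Snowflake code: the function and role names are hypothetical, and representing the property as "ALL", an empty list, or an explicit list is my assumption about how the three modes are expressed.

```python
# Illustrative model of the three ALLOWED_SECONDARY_ROLES modes.
# The function and the role names are hypothetical; the value shapes
# ("ALL", empty list, explicit list) are an assumption about the syntax.


def usable_secondary_roles(allowed, granted_roles):
    """Return the secondary roles a session could activate, sorted."""
    if allowed == ["ALL"]:
        # Allow secondary roles: every granted role is usable.
        return sorted(granted_roles)
    # An empty list disallows secondary roles; otherwise only listed ones pass.
    permitted = set(allowed)
    return sorted(role for role in granted_roles if role in permitted)


granted = {"ANALYST", "LOADER", "AUDITOR"}
print(usable_secondary_roles(["ALL"], granted))      # all three roles
print(usable_secondary_roles([], granted))           # prints []
print(usable_secondary_roles(["ANALYST"], granted))  # prints ['ANALYST']
```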

Open-Source Updates

  • terraform-snowflake-provider 0.94.1 (bug fix: use ALTER for managing PUBLIC schemas that already exist)
  • Streamlit 1.38.0 (breaking changes: remove pydantic fix in bootstrap, and remove experimental cached widget replay, new features: Support to_pandas method to return a Pandas Series, support for Kubernetes / directory with mounted file style secrets, data handling support for Polars, and lots of bug fixes)
  • Modin (nothing released)
  • Snowflake VS Code Extension 1.9.1 (added Don’t ask again option to the Native App dialog, the snowflake.yml of a project is opened automatically when a project is created using the extension options, Bug Fixes: Fixed issue where the dialog for the Native App panel was being shown in wrong scenarios, changed description for the flag that enables the debug argument for the Native app operations, added caret to file when navigating to the definition of an object using the Native App panel)

Client, Drivers, Libraries and Connectors Updates

New features:

  • Go Snowflake Driver 1.11.1 (support for downloading files into an in-memory stream when using the GET command, and context propagation to snowflakeFileTransferAgent to support cancel for file transfer process)
  • Ingest Java SDK 2.2.0 (Improved code logic to support different storage volumes)
  • JDBC Driver 3.19.0 (support for disabling connection caching, PRIVATE_KEY_BASE64 connection parameter to support base64-encoded private keys, connection properties to support setting timeouts: HTTP_CLIENT_CONNECTION_TIMEOUT, HTTP_CLIENT_SOCKET_TIMEOUT, BROWSER_RESPONSE_TIMEOUT, updated dependencies: Arrow to version 17.0.0, threeten-bp to version 1.6.9)
  • Node.js 1.12.0 (added SSO and MFA token caching to the Node.js driver, picked a top-level domain for Snowflake hosts, added support for reading the connection information from a file, added the cwd (current working directory) parameter to use for GET/PUT execution when it differs from the connector directory, added support for AES 256 encryption/decryption)
  • PHP PDO Driver for Snowflake 3.0.2 (increased the maximum allowable large object (LOB) size)
  • .NET Driver 4.1.0 (log messages about the domain destination to which the driver is connecting, updated DbCommand.Prepare() to do nothing instead of throwing an exception.)
  • Snowflake CLI 2.7.0 (the snow snowpark init and snow streamlit init commands are marked as deprecated; added the --token-file-path option for the snow connection add command to support passing an OAuth token using a file, which can also be set with the token_file_path parameter for connection definitions in the config.toml file; added support for Python remote execution with snow stage execute and snow git execute, similar to the existing EXECUTE IMMEDIATE support; added support for autocomplete functionality in the snow connection add --connection option; added the snow init command to support initializing projects with external templates; added support for user stages in the stage execute and stage execute copy commands; improved support for quoted identifiers in Snowpark commands; the snow app run command now allows upgrading to an unversioned mode from a versioned or release mode application installation; the snow app teardown command now allows dropping a package with versions when the --force flag is provided; the snow app version create command now allows operating on application packages created outside Snowflake CLI; application.post_deploy SQL scripts now use the application database as the default; regionless hosts are now supported when generating Snowsight URLs; the snow app run and snow app deploy commands now correctly determine the modified status for large files uploaded to AWS S3)
  • Snowflake CLI 2.8.0 (added support for project definition file defaults in templates, added support for native_app.package.post_deploy scripts in project definition files, which execute when a Snowflake Native App package is created or updated; only SQL scripts are currently supported: post_deploy: [{sql_script: script.sql}])
  • Snowflake Connector for Kafka 2.4.0 (upgraded the Snowflake Ingest Java SDK to version 2.2.0, which contains a critical fix for potential issues when change_tracking is enabled for streams and dynamic tables, upgraded the Snowflake JDBC driver to version 3.18.0, improved logging in various components for easier troubleshooting, and improved the channel reopening logic. Note: for all Snowpipe Streaming usage, Snowflake recommends using the Kafka connector version 2.4.0 or later)
  • Snowflake API for Python 0.12.0 (the client now retries requests on retryable error codes, some StageResource methods have been renamed and the old names are kept as deprecated aliases: upload_file is now put and download_file is now get)
  • Snowpark Library for Python 1.21.0 (this is a fairly large update, with big changes to the Snowpark pandas API, read more here)
  • Snowpark Library for Scala and Java 1.13.0 (compatible Snowflake release: 8.28, emit span in Java/Scala stored procedures, supported functions: all action functions, register UDF/UDTF/SProc, enable retrieving cloud provider tokens in the SnowflakeSecrets class. New functions: Session.updateQueryTag, functions.countDistinct, functions.max(String), functions.min(String), functions.mean(String). Improvements: the app name in the session query tag is now in JSON format, upgraded SLF4J to 2.0.4, updated documentation for SnowflakeFile)
  • Snowpark ML 1.6.1 (new modeling features: the set_params method is now available to set the parameters of the underlying scikit-learn estimator if the Snowpark ML model has been fitted, and support for model explainability in XGBoost, LightGBM, CatBoost, and scikit-learn models supported by the shap library)
  • Snowflake Connector for Google Analytics Raw Data 1.8.0 (Internal updates only)
  • Snowflake Connector for Google Analytics Raw Data 1.7.2 (added flattened event_params and user_properties columns in the sink table views, enabled change tracking on sink tables, sink table views are now refreshed with the copy grants statement.)
  • Snowflake Connector for Google Analytics Raw Data 1.6.3 (sink table views are now refreshed automatically, data is now synced sooner for timezones ahead of UTC, improved scalability of scheduling ingestions for large number of properties)
  • Snowflake Connector for PostgreSQL 6.4.0 (behavior changes: the connector now supports all known types of Postgres publications, and adds support for all PostgreSQL DOMAIN types based on native data types)

Bug fixes:

  • .NET Driver 4.1.0 (issue where a cancel exception was lost when canceling an OpenAsync operation)
  • Go Snowflake Driver 1.11.1 (removed context propagation in snowflakeConn, which is used only for dialing purposes, prevent panic in the arrayToString method for Golang slices, and prevent panic in the decodeChunk method when a download is canceled)
  • Ingest Java SDK 2.2.0 (a critical issue that could potentially cause conflicts when change_tracking is enabled for streams and dynamic tables. Note: for all Snowpipe Streaming usage, Snowflake recommends using the Ingest Java SDK version 2.2.0 or later)
  • JDBC Driver 3.19.0 (issue where the getDate method was missing an expected parameter, and a class not found problem related to LoggerFactory)
  • Node.js 1.12.0 (bug related to reusing the jwt token for login retries, fixed azure-storage-blob version compatibility with node version 14, issue that caused enum type errors when the isolatedModule option is set, issue in the type definitions, fixed by adding the missing cancel method and making the complete field in StatementOption optional in driver types, issue with regex expressions in account name validation)
  • Snowflake CLI 2.7.0 (handle NULL md5 values correctly when returned by stage storage backends)
  • Snowflake CLI 2.8.0 (issue with invalid return values for snow snowpark list, snow snowpark describe, and snow snowpark drop commands, and snow app run command now shows warning returned by Snowflake)
  • Snowflake Connector for Kafka 2.4.0 (updated dependencies with known vulnerabilities)
  • Snowflake Connector for Python 3.12.1 (bug that logged the session token when renewing a session, bug where disabling client telemetry did not work, bug where passing login_timeout as a string raised a TypeError during the login retry step, updated the connector to use pathlib instead of os for resolving the default configuration file location, removed the upper cryptography version pin, removed references to the snowflake-export-certs script, as its backing module was removed in a previous version, enhanced the retry mechanism for handling transient network failures during query result polling when no server response is received)
  • Snowflake API for Python 0.12.1 (Fixed multiple issues related to handling large results)
  • Snowflake API for Python 0.12.0 (Fixed multiple issues related to handling large results)
  • Snowpark Library for Python 1.21.0 (Made passing an unsupported aggregation function to pivot_table raise NotImplementedError instead of KeyError, removed axis labels and callable names from error messages and telemetry about unsupported aggregations, fixed AssertionError in Series.drop_duplicates and DataFrame.drop_duplicates when called after sort_values, bug in Index.to_frame where the result frame’s column name may be wrong where name is unspecified, bug where some Index docstrings are ignored, bug in Series.reset_index(drop=True) where the result name may be wrong, bug in Groupby.first/last ordering by the correct columns in the underlying window expression)
  • Snowpark Library for Scala and Java 1.13.1 (When the session parameter ERROR_ON_NONDETERMINISTIC_UPDATE is set to true, calls to session.table(…).update(…) no longer report errors)
  • Snowpark Library for Scala and Java 1.13.0 (variant object can’t handle null value, dataFrame alias doesn’t work in the JOIN condition)
  • Snowpark ML 1.6.1 (Feature Store bug fixes: metadata size is no longer limited when generating a dataset, and fix an error message in the run method of model versions when a function name is not given and the model has multiple target methods)
  • SnowSQL 1.3.2 (issue with the snowsql --version command failing when automatic upgrades are disabled (noup=False))
  • Snowflake Connector for Google Analytics Raw Data 1.7.2 (application upgrade fix for certain customers)
  • Snowflake Connector for Google Analytics Raw Data 1.6.6 (application upgrade fix for certain customers)
  • Snowflake Connector for ServiceNow® V2 5.9.1 (migration script fix for certain users)
  • Snowflake Connector for ServiceNow® V2 5.9.0 (fix RELOAD_TABLE procedure when both row_filter and data_range_start_time are set. Previously row filtering sync states were not cleaned up correctly, improve error handling in the data ingestion process when the connector is not able to overcome errors related to authentication)
  • Snowflake Connector for MySQL 6.4.0 (corrected an issue where the connector could become stuck in a state where commands were not delivered to the agent)
  • Snowflake Connector for MySQL 6.5.0 (in continuous mode, the compute warehouse will now be able to suspend if there is no data to merge into destination tables, fixed agent failure when MySQL server enforces secure connection)
  • Snowflake Connector for PostgreSQL 6.5.0 (in continuous mode, the compute warehouse will now be able to suspend if there is no data to merge into destination tables)
  • Snowflake Connector for PostgreSQL 6.4.0 (corrected an issue where the connector could become stuck in a state where commands were not delivered to the agent)

Conclusion

August 2024 was another month with great releases. Machine Learning and GenAI products continue to improve and mature well. Honestly, one could say it is General Availability season at Snowflake, with many products, such as Container Services, turning GA. I am busy writing articles about some of these features.

Enjoy the reading.

I am Augusto Rosa, VP of Engineering for Infostrux Solutions. I am also a Snowflake Data Super Hero and Snowflake SME. You can follow me on LinkedIn.

Subscribe to Infostrux Medium Blogs https://medium.com/infostrux-solutions for the most interesting Data Engineering and Snowflake news.


