DEV Community

Sagara

Snowflake's New Feature: AI-Powered Data Quality Checks You Can Set Up Directly in Snowsight

This is an English translation of the original Japanese article: https://dev.classmethod.jp/articles/snowflake-setup-data-quality-checks-in-snowsight/

Snowflake has released a new preview feature that lets you set up data quality checks directly from the Snowsight UI. It uses Cortex AI to suggest quality checks automatically and also supports manual definitions, all without leaving the Snowsight GUI.

I tried it out, so let me walk you through how it works.

What Is Data Quality Check Setup in Snowsight?

Previously, to perform data quality checks in Snowflake, you had to define and configure Data Metric Functions (DMFs) using SQL. With this new feature, the following capabilities are now available directly from the Snowsight GUI:

  • Cortex Data Quality (AI-suggested): Cortex AI analyzes metadata characteristics and data usage patterns to automatically suggest quality check definitions (DMFs). Once you accept the suggestions, the checks run periodically to detect data quality issues.
  • Manual definition: You can manually define quality check types and criteria based on your own knowledge of the data.
  • Execution schedule adjustment: You can configure the execution frequency of quality checks using time-based, schedule-based, or DML trigger-based settings.

Cortex Data Quality leverages Snowflake Cortex's AI_COMPLETE function, and all data and metadata remain securely within Snowflake. It also fully respects Snowflake's access control — suggestions are made based only on data the user has access to.

Prerequisites

According to the official documentation, the following edition, role, and privilege requirements apply:

  • Snowflake Edition: Enterprise or higher
    • For this walkthrough, I used an Enterprise edition Snowflake trial account in the AWS Tokyo region.
  • Required privileges for the operating role:
    • OWNERSHIP privilege on the target table
    • EXECUTE DATA METRIC FUNCTION privilege on the account
    • SNOWFLAKE.DATA_METRIC_USER database role
    • SNOWFLAKE.CORTEX_USER database role
  • LLM Models:
    • The mistral-7b and llama3.1-8b models must be allowed in the CORTEX_MODELS_ALLOWLIST account parameter (they are allowed by default).
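If you want to confirm the model allowlist before starting, something like the following should do it. This is a minimal sketch: using ACCOUNTADMIN here is my assumption, and any role that can view account parameters should work as well.

```sql
-- Assumption: ACCOUNTADMIN (or another role allowed to view account parameters).
USE ROLE ACCOUNTADMIN;

-- Show the current value of the Cortex model allowlist for the account.
SHOW PARAMETERS LIKE 'CORTEX_MODELS_ALLOWLIST' IN ACCOUNT;
```

If the returned value is a specific list of models rather than one that permits everything, check that both mistral-7b and llama3.1-8b appear in it.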

Preparing Test Data and Setting Permissions

Run the following SQL to create the objects and sample data for testing.

This data intentionally includes the following quality issues:

  • CUSTOMER_NAME is NULL (ORDER_ID: 5, 13)
  • EMAIL is NULL (ORDER_ID: 8, 13)
  • EMAIL has an invalid format (ORDER_ID: 22 → invalid-email)
  • QUANTITY is negative (ORDER_ID: 17 → -1)
  • QUANTITY is zero (ORDER_ID: 19 → 0)
  • TOTAL_AMOUNT is negative (ORDER_ID: 17 → -25.00)
  • STATUS contains an unexpected value (ORDER_ID: 25 → INVALID_STATUS)

-- Create database and schema for testing
USE ROLE SYSADMIN;
CREATE DATABASE IF NOT EXISTS DATA_QUALITY_DEMO;
CREATE SCHEMA IF NOT EXISTS DATA_QUALITY_DEMO.PUBLIC;

-- Create a warehouse for testing (you can use an existing one if available)
CREATE WAREHOUSE IF NOT EXISTS DATA_QUALITY_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

USE DATABASE DATA_QUALITY_DEMO;
USE SCHEMA PUBLIC;
USE WAREHOUSE DATA_QUALITY_WH;

-- Create a test table (simulating e-commerce order data)
CREATE OR REPLACE TABLE ORDERS (
    ORDER_ID INT,
    CUSTOMER_NAME VARCHAR(100),
    EMAIL VARCHAR(200),
    ORDER_DATE DATE,
    PRODUCT_NAME VARCHAR(200),
    QUANTITY INT,
    UNIT_PRICE DECIMAL(10, 2),
    TOTAL_AMOUNT DECIMAL(10, 2),
    STATUS VARCHAR(50),
    SHIPPING_COUNTRY VARCHAR(100),
    CREATED_AT TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Insert sample data (25 records, intentionally including data quality issues)
INSERT INTO ORDERS (ORDER_ID, CUSTOMER_NAME, EMAIL, ORDER_DATE, PRODUCT_NAME, QUANTITY, UNIT_PRICE, TOTAL_AMOUNT, STATUS, SHIPPING_COUNTRY)
VALUES
    (1, 'Taro Yamada', 'taro.yamada@example.com', '2025-01-05', 'Wireless Mouse', 2, 25.00, 50.00, 'Shipped', 'Japan'),
    (2, 'Hanako Suzuki', 'hanako.suzuki@example.com', '2025-01-06', 'Mechanical Keyboard', 1, 89.99, 89.99, 'Delivered', 'Japan'),
    (3, 'John Smith', 'john.smith@example.com', '2025-01-07', 'USB-C Hub', 3, 35.00, 105.00, 'Shipped', 'USA'),
    (4, 'Emily Johnson', 'emily.j@example.com', '2025-01-08', 'Monitor Stand', 1, 45.00, 45.00, 'Processing', 'USA'),
    (5, NULL, 'unknown@example.com', '2025-01-09', 'Webcam HD', 1, 59.99, 59.99, 'Shipped', 'Canada'),
    (6, 'Kenji Tanaka', 'kenji.tanaka@example.com', '2025-01-10', 'Laptop Sleeve', 2, 29.99, 59.98, 'Delivered', 'Japan'),
    (7, 'Maria Garcia', 'maria.garcia@example.com', '2025-01-11', 'Wireless Earbuds', 1, 79.99, 79.99, 'Shipped', 'Spain'),
    (8, 'Li Wei', NULL, '2025-01-12', 'Phone Case', 5, 12.00, 60.00, 'Delivered', 'China'),
    (9, 'Sakura Ito', 'sakura.ito@example.com', '2025-01-13', 'Desk Lamp', 1, 42.00, 42.00, 'Cancelled', 'Japan'),
    (10, 'David Brown', 'david.brown@example.com', '2025-01-14', 'Portable Charger', 2, 30.00, 60.00, 'Shipped', 'UK'),
    (11, 'Yuki Watanabe', 'yuki.w@example.com', '2025-01-15', 'Bluetooth Speaker', 1, 55.00, 55.00, 'Processing', 'Japan'),
    (12, 'Anna Mueller', 'anna.mueller@example.com', '2025-01-16', 'Tablet Stand', 1, 25.00, 25.00, 'Delivered', 'Germany'),
    (13, NULL, NULL, '2025-01-17', 'HDMI Cable', 10, 8.99, 89.90, 'Shipped', 'France'),
    (14, 'Ryo Nakamura', 'ryo.nakamura@example.com', '2025-01-18', 'Mouse Pad XL', 1, 19.99, 19.99, 'Delivered', 'Japan'),
    (15, 'Sophie Martin', 'sophie.martin@example.com', '2025-01-19', 'USB Flash Drive', 3, 15.00, 45.00, 'Shipped', 'France'),
    (16, 'Takeshi Kobayashi', 'takeshi.k@example.com', '2025-01-20', 'Webcam HD', 1, 59.99, 59.99, 'Delivered', 'Japan'),
    (17, 'Chen Mei', 'chen.mei@example.com', '2025-01-21', 'Wireless Mouse', -1, 25.00, -25.00, 'Returned', 'China'),
    (18, 'James Wilson', 'james.wilson@example.com', '2025-01-22', 'Mechanical Keyboard', 1, 89.99, 89.99, 'Processing', 'Australia'),
    (19, 'Aoi Sato', 'aoi.sato@example.com', '2025-01-23', 'Monitor Stand', 0, 45.00, 0.00, 'Cancelled', 'Japan'),
    (20, 'Lucas Dubois', 'lucas.dubois@example.com', '2025-01-24', 'Laptop Sleeve', 2, 29.99, 59.98, 'Shipped', 'France'),
    (21, 'Mika Yoshida', 'mika.yoshida@example.com', '2025-01-25', 'Desk Lamp', 1, 42.00, 42.00, 'Delivered', 'Japan'),
    (22, 'Robert Taylor', 'invalid-email', '2025-01-26', 'Portable Charger', 1, 30.00, 30.00, 'Shipped', 'USA'),
    (23, 'Haruto Kimura', 'haruto.k@example.com', '2025-01-27', 'Bluetooth Speaker', 2, 55.00, 110.00, 'Delivered', 'Japan'),
    (24, 'Isabella Rossi', 'isabella.rossi@example.com', '2025-01-28', 'Phone Case', 3, 12.00, 36.00, 'Shipped', 'Italy'),
    (25, 'Yuto Hayashi', 'yuto.hayashi@example.com', '2025-01-29', 'USB-C Hub', 1, 35.00, 35.00, 'INVALID_STATUS', 'Japan');
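Before moving on, it can be useful to confirm that the injected issues really are in the table. The query below is a sketch of one way to do that. The '%@%' pattern is only a rough stand-in for email validation, and the list of "expected" status values is my assumption based on the sample rows.

```sql
-- Count the rows affected by each intentionally injected quality issue.
-- Assumption: the five status values listed below are the valid ones for this demo.
SELECT
    COUNT_IF(CUSTOMER_NAME IS NULL) AS null_customer_names,  -- ORDER_ID 5, 13
    COUNT_IF(EMAIL IS NULL)         AS null_emails,          -- ORDER_ID 8, 13
    COUNT_IF(EMAIL NOT LIKE '%@%')  AS malformed_emails,     -- ORDER_ID 22 (NULLs are not counted)
    COUNT_IF(QUANTITY < 0)          AS negative_quantities,  -- ORDER_ID 17
    COUNT_IF(QUANTITY = 0)          AS zero_quantities,      -- ORDER_ID 19
    COUNT_IF(TOTAL_AMOUNT < 0)      AS negative_totals,      -- ORDER_ID 17
    COUNT_IF(STATUS NOT IN ('Shipped', 'Delivered', 'Processing', 'Cancelled', 'Returned'))
                                    AS unexpected_statuses   -- ORDER_ID 25
FROM DATA_QUALITY_DEMO.PUBLIC.ORDERS;
```

With the 25 rows above, the two NULL counts should each be 2 and every other column should be 1.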

Next, grant the necessary privileges to the role you'll be using. In this case, I'm granting them to SYSADMIN, which already owns the ORDERS table because it created it above.

USE ROLE ACCOUNTADMIN;

-- Grant EXECUTE DATA METRIC FUNCTION privilege
GRANT EXECUTE DATA METRIC FUNCTION ON ACCOUNT TO ROLE SYSADMIN;

-- Grant database roles
GRANT DATABASE ROLE SNOWFLAKE.DATA_METRIC_USER TO ROLE SYSADMIN;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE SYSADMIN;
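As an optional sanity check, you can list what SYSADMIN now holds. This is just a sketch; I am not reproducing the exact output here, but the account-level privilege and the two database roles granted above should appear in it.

```sql
-- Switch back to the working role and list its grants.
USE ROLE SYSADMIN;
SHOW GRANTS TO ROLE SYSADMIN;
```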

Setting Up Data Quality Checks Manually

Let's set up a data quality check manually against the data we just created.

Open the target table from Snowsight's Database Explorer and click the Data Quality tab.

2026-02-26_08h35_00

The following screen appears. Click Setup manually.

2026-02-26_08h36_10

A list of available data quality check candidates is displayed. Click the one you want to configure. (In this case, I'll click Nulls.)

2026-02-26_08h36_38

The configuration screen presents the Nulls check as an English sentence with dropdown lists for each setting.

2026-02-26_08h39_18

I configured it as shown below. Clicking Edit SQL lets you see the DMF definition query that will be generated. If everything looks good, click Save in the bottom right.

2026-02-26_08h41_22

2026-02-26_08h41_45
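For comparison, here is roughly what the SQL-only route for a Nulls check looks like. Treat it as a sketch rather than the exact statement Snowsight generates: the EMAIL column and the 60-minute schedule are my assumptions for illustration, and the GUI-generated definition may include additional clauses.

```sql
USE ROLE SYSADMIN;
USE WAREHOUSE DATA_QUALITY_WH;
USE SCHEMA DATA_QUALITY_DEMO.PUBLIC;

-- A schedule must be set on the table for attached DMFs to run.
ALTER TABLE ORDERS SET DATA_METRIC_SCHEDULE = '60 MINUTE';

-- Attach the system NULL_COUNT DMF to one column (EMAIL is an assumed choice).
ALTER TABLE ORDERS
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (EMAIL);

-- Call the same DMF ad hoc, without waiting for the schedule.
-- The sample data contains two NULL emails (ORDER_ID 8 and 13).
SELECT SNOWFLAKE.CORE.NULL_COUNT(SELECT EMAIL FROM ORDERS);
```

If the same DMF is already attached to the same column through the GUI, the ADD statement will fail, so pick a different column or skip that step.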

Looking at the Data Quality Monitoring view, you can see that the quality check has been added under the Accuracy section.

2026-02-26_08h44_15

Also, clicking Settings in the Monitoring section allows you to change the frequency of data quality checks. (The default was set to run every hour.)

2026-02-26_09h02_21
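The three frequency types map onto the table's DATA_METRIC_SCHEDULE parameter, so the same setting can be changed in SQL. The values below are illustrative assumptions (in particular the cron expression), and Snowflake only accepts certain minute intervals, so check the documentation for the allowed ones.

```sql
-- Time-based: run the attached DMFs at a fixed interval.
ALTER TABLE DATA_QUALITY_DEMO.PUBLIC.ORDERS
  SET DATA_METRIC_SCHEDULE = '60 MINUTE';

-- Schedule-based: cron expression plus time zone (here 08:00 UTC every day).
ALTER TABLE DATA_QUALITY_DEMO.PUBLIC.ORDERS
  SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';

-- DML trigger-based: run when the table's data changes.
ALTER TABLE DATA_QUALITY_DEMO.PUBLIC.ORDERS
  SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
```

A table has a single schedule, so each statement overwrites the previous one. Run only the variant you actually want.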

Setting Up AI-Powered Data Quality Checks (Cortex Data Quality)

Next, let's set up AI-powered data quality checks. (This feature is officially called Cortex Data Quality.)

On the Data Quality tab of the target table, click + Add quality check, then select Generate with Cortex Data Quality.

2026-02-26_09h37_02

The following screen appears, where AI scans the data content and generates the necessary data quality checks.

2026-02-26_09h37_32

For my data, after about 10 seconds, 10 data quality checks were generated as shown below.

It's great that the WHY IS THIS RECOMMENDED? column on the far right explains why each data quality check was suggested.

Check the boxes for the quality checks you need, then click Apply in the bottom right.

2026-02-26_09h43_29

By the way, after clicking Apply, the following message appeared for my test data. I believe the permissions were sufficient, but I wasn't able to determine the exact cause...

2026-02-26_09h45_27

After this, the Data Quality Monitoring screen shows that the data quality checks have been added, as shown below. (One check already shows an error because the manually configured check from earlier had already run once its one-hour interval had passed.)

2026-02-26_09h48_48
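To see in SQL which DMFs are now attached to the table, whether they were added manually or through Cortex Data Quality, the DATA_METRIC_FUNCTION_REFERENCES table function can be queried. A minimal sketch, selecting every column so as not to assume the exact output layout:

```sql
-- List every DMF association on the ORDERS table, including schedule information.
SELECT *
FROM TABLE(
    DATA_QUALITY_DEMO.INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
        REF_ENTITY_NAME   => 'DATA_QUALITY_DEMO.PUBLIC.ORDERS',
        REF_ENTITY_DOMAIN => 'TABLE'
    )
);
```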

Running the Configured Data Quality Checks

Finally, let's run the data quality checks we've configured.

From Settings, change the frequency to Every 5 minutes and wait about 5 minutes.

2026-02-26_09h50_19

After about 5 minutes, the Data Quality Monitoring screen displayed the results as shown below. Since you can also review the history of past quality check results, this is more than adequate as a quality monitoring feature. (If I had one wish, it would be nice to have notifications when data quality checks fail — but at least as of now, that doesn't seem to be configurable from Snowsight alone.)

2026-02-26_10h06_52

2026-02-26_10h07_12

2026-02-26_10h08_37
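The measurements shown in the UI can also be read with SQL from the SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS view. The query below is a sketch: the column names are as I understand them from the documentation, and depending on your account the role may need additional privileges to read this view.

```sql
-- Most recent DMF measurements for the ORDERS table, newest first.
SELECT
    measurement_time,
    metric_name,
    value
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_database = 'DATA_QUALITY_DEMO'
  AND table_schema   = 'PUBLIC'
  AND table_name     = 'ORDERS'
ORDER BY measurement_time DESC;
```

A query like this could also serve as the condition of a Snowflake alert, which is one possible way to get notified about failures outside of Snowsight.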

Conclusion

I tried out the new feature that lets you set up data quality checks using AI directly from Snowsight, and summarized my findings in this article.

I had previously tested the Data Quality tab when it was first introduced, and at that time I felt that defining DMFs via SQL queries was time-consuming and a bottleneck. With this new feature, AI can generate the DMF definitions for you, which is truly impressive!

This is a feature you can try right away, so I highly encourage you to give it a shot.
