DEV Community

Tsubasa Kanno
Tsubasa Kanno

Posted on

Auto-Generate Snowflake Semantic Views with AI - A Developer's Fast-Track to Cortex Analyst

Introduction

Cortex Analyst is one of Snowflake's most exciting features, enabling revolutionary natural language data analysis. Business users who don't usually write queries can simply ask "What are the top 10 products by sales last month?" and get instant insights without SQL - true data democratization in action!

However, for Cortex Analyst to reach its full potential, you need a well-structured semantic layer as the foundation. Snowflake offers Semantic Models or Semantic Views to accomplish this, but creating semantic layers requires time and effort: understanding table structures, defining business logic, setting appropriate descriptions and synonyms, and more.

If you're thinking "I want to use Cortex Analyst, but setting up the semantic layer is taking forever..." - you're in the right place! Today I'm introducing a stored procedure that auto-generates semantic views using AI. This tool lets you quickly create a baseline semantic view from existing tables, giving you a fast start toward delivering analytics environments to business users.

Note: This stored procedure is designed to assist with initial semantic view creation. We recommend customizing the generated views according to your business requirements.

Note: At the June 2025 Snowflake Summit, an official semantic layer auto-generation feature was announced. This stored procedure serves as a stopgap solution until the official feature is released. When the official feature launches, I'll write about it as well!

Note: This article represents my personal views and not those of Snowflake.

What is a Semantic View?

A semantic view is a Snowflake schema-level object that adds business context to physical database tables. It maps technical column names (e.g., amt_ttl_pre_dsc) to business-friendly concepts (e.g., "Total Sales", "Sales Amount") with detailed descriptions (e.g., "Total sales value for stores in the metropolitan area"), bridging the gap between data and business.

Even more importantly, semantic views help AI correctly understand database structure and business logic. Cortex Analyst can generate high-precision SQL queries from natural language questions precisely because of the semantic layer. Through synonyms and detailed descriptions, AI learns to understand that "last month's sales", "total sales", and "sales amount" all refer to the same concept, enabling it to select the right tables and columns for accurate query generation.

Components of a Semantic View

Semantic views consist of these elements:

Element Description Examples
Facts Primarily numeric fact data (row-level attributes) Sales quantity, unit price, amount, etc.
Dimensions Category and attribute data (analysis axes) Product name, customer ID, date, region, etc.
Metrics Aggregated indicators (business KPIs) Total sales, average unit price, sales growth rate, etc.

Each element can be defined with:

  • Synonyms: Business aliases and natural phrasings (e.g., "sales" → "revenue", "sales amount")
  • Comment: Business explanations including units, ranges, and other details

Why Semantic Views Matter

Semantic views provide these key values:

  1. Improved Cortex Analyst Accuracy: Serves as the foundation for AI to understand natural language and generate accurate SQL
  2. Consistent Business Definitions: Unified business metrics across the entire organization
  3. Data Democratization: Enables non-technical users to access data

For more details, see the official Snowflake documentation - Semantic View Overview.

Note: For information about Semantic Models (YAML format), see the Cortex Analyst Semantic Model Documentation. This article focuses on the newer Semantic Views feature.

Challenges in Creating Semantic Layers

Creating semantic layers manually involves these tasks:

  1. Understanding table structure (table definitions, column names, data types, sample data review)
  2. Classifying Facts, Dimensions, and Metrics
  3. Identifying business-relevant synonyms actually used in the field
  4. Considering appropriate descriptions (Comments) for each element
  5. Defining the structure in SQL syntax

For large tables or multiple tables, this process takes considerable time. However, spending too much time on preparation delays the delivery of analytics environments to business users - having a quality baseline to start with is extremely valuable!

AI-Powered Auto-Generation Stored Procedure

Overview

The GENERATE_SEMANTIC_VIEW stored procedure I'm introducing auto-generates semantic views through these steps:

  1. Retrieve Table Information: Get column info, data types, and comments using DESCRIBE TABLE
  2. Analyze Sample Data: Infer actual usage and data characteristics from the first 5 rows
  3. AI-Generated SQL: LLM via AI_COMPLETE function (like claude-sonnet-4-5) generates semantic view DDL
  4. Create Semantic View: Execute the generated SQL

Key Features

Feature Description
Fully Automated Automatically classifies Facts, Dimensions, and Metrics by just specifying table name
Rich Synonyms Infers actual usage from sample data to generate natural business-relevant synonyms
Detailed Descriptions Auto-generates Comments with business meaning for each element
Customizable Specify which LLM model to use (default: claude-sonnet-4-5)

Stored Procedure Code

The complete code for the GENERATE_SEMANTIC_VIEW stored procedure is provided below.

CREATE OR REPLACE PROCEDURE GENERATE_SEMANTIC_VIEW(
    SOURCE_DATABASE VARCHAR,      -- Source database name
    SOURCE_SCHEMA VARCHAR,        -- Source schema name
    SOURCE_TABLE VARCHAR,         -- Source table name
    TARGET_DATABASE VARCHAR,      -- Target database name
    TARGET_SCHEMA VARCHAR,        -- Target schema name
    TARGET_VIEW_NAME VARCHAR,     -- Semantic view name to create
    LLM_MODEL VARCHAR DEFAULT 'claude-sonnet-4-5' -- LLM model to use
)
RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$
DECLARE
    -- Table information
    column_info VARCHAR;
    sample_data_json VARCHAR;
    table_comment VARCHAR;

    -- AI generation results
    ai_response VARCHAR;

    -- Final SQL
    create_view_sql VARCHAR;

BEGIN
    -- ================================================================================
    -- STEP 1: Retrieve table information
    -- ================================================================================

    -- Get column information (using DESCRIBE TABLE)
    DECLARE
        describe_sql VARCHAR;
        rs RESULTSET;
    BEGIN
        describe_sql := 'DESCRIBE TABLE ' || :SOURCE_DATABASE || '.' || :SOURCE_SCHEMA || '.' || :SOURCE_TABLE;
        rs := (EXECUTE IMMEDIATE :describe_sql);

        -- Get information from results
        -- $1: Column name, $2: Data type, $10: Comment
        SELECT LISTAGG(
            $1 || ':' || $2 || CASE WHEN $10 IS NOT NULL AND $10 != '' THEN '(' || $10 || ')' ELSE '' END,
            ', '
        ) INTO column_info
        FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

    EXCEPTION
        WHEN OTHER THEN
            RETURN 'Error: Could not retrieve table information - ' || :SOURCE_DATABASE || '.' || :SOURCE_SCHEMA || '.' || :SOURCE_TABLE || '. Error: ' || SQLERRM;
    END;

    -- Error check
    IF (column_info IS NULL OR LENGTH(:column_info) = 0) THEN
        RETURN 'Error: Column information is empty - ' || :SOURCE_DATABASE || '.' || :SOURCE_SCHEMA || '.' || :SOURCE_TABLE;
    END IF;

    -- Get table comment
    SELECT COMMENT INTO table_comment
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_CATALOG = :SOURCE_DATABASE
      AND TABLE_SCHEMA = :SOURCE_SCHEMA
      AND TABLE_NAME = :SOURCE_TABLE;

    -- Get sample data (first 5 rows)
    DECLARE
        sample_sql VARCHAR;
        sample_rs RESULTSET;
    BEGIN
        sample_sql := 'SELECT * FROM ' || :SOURCE_DATABASE || '.' || :SOURCE_SCHEMA || '.' || :SOURCE_TABLE || ' LIMIT 5';
        sample_rs := (EXECUTE IMMEDIATE :sample_sql);

        -- Get sample data in JSON format
        SELECT TO_VARCHAR(ARRAY_AGG(OBJECT_CONSTRUCT(*))) INTO sample_data_json
        FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

    EXCEPTION
        WHEN OTHER THEN
            sample_data_json := 'Sample data unavailable';
    END;

    -- ================================================================================
    -- STEP 2: Generate complete Semantic View definition with AI
    -- ================================================================================

    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        :LLM_MODEL,
        CONCAT(
            '[IMPORTANT] Generate a Snowflake semantic view using the information below.',
            '\n\n★ DO NOT use any column names that do not exist ★',
            '\n\n[TABLE INFORMATION]',
            '\nTable: ', :SOURCE_DATABASE, '.', :SOURCE_SCHEMA, '.', :SOURCE_TABLE,
            '\nAlias: ', :SOURCE_TABLE,
            '\nTable Comment: ', COALESCE(:table_comment, 'None'),
            '\n\n[EXISTING COLUMNS (Use ONLY these)]',
            '\n', :column_info,
            '\n\n[SAMPLE DATA (For understanding data usage)]',
            '\n', SUBSTRING(:sample_data_json, 1, 1000),
            '\n\n[SQL TO GENERATE]',
            '\nCREATE OR REPLACE SEMANTIC VIEW ', :TARGET_DATABASE, '.', :TARGET_SCHEMA, '.', :TARGET_VIEW_NAME,
            '\n  TABLES (',
            '\n    ', :SOURCE_TABLE, ' AS ', :SOURCE_DATABASE, '.', :SOURCE_SCHEMA, '.', :SOURCE_TABLE,
            '\n      WITH SYNONYMS (''alias1'', ''alias2'', ''alias3'', ''alias4'')',
            '\n      COMMENT = ''Detailed table description''',
            '\n  )',
            '\n  FACTS (',
            '\n    ', :SOURCE_TABLE, '.fact_name AS existing_column_name',
            '\n      WITH SYNONYMS (''alias1'', ''alias2'')',
            '\n      COMMENT = ''Detailed column description''',
            '\n  )',
            '\n  DIMENSIONS (',
            '\n    ', :SOURCE_TABLE, '.dim_name AS existing_column_name',
            '\n      WITH SYNONYMS (''alias1'', ''alias2'')',
            '\n      COMMENT = ''Detailed column description''',
            '\n  )',
            '\n  METRICS (',
            '\n    ', :SOURCE_TABLE, '.metric_name AS SUM(existing_column_name)',
            '\n      WITH SYNONYMS (''alias1'', ''alias2'')',
            '\n      COMMENT = ''Detailed column description''',
            '\n  )',
            '\n  COMMENT = ''Detailed semantic view description'';',
            '\n\n[ABSOLUTE RULES]',
            '\n★★★ MOST IMPORTANT ★★★ Use ONLY column names from the above list',
            '\n\n0. Omit Primary Key:',
            '\n   - Do not write PRIMARY KEY clause',
            '\n   - Primary key constraints do not function in Snowflake and are optional',
            '\n\n1. Column Name Usage:',
            '\n   - Column names used on the right side of AS must also exist in the list',
            '\n   - Example OK: YEAR(TRANSACTION_DATE) ← only if TRANSACTION_DATE exists',
            '\n   - Example NG: YEAR(ORDER_DATE) ← if ORDER_DATE does not exist',
            '\n\n2. FACTS (Numeric Data):',
            '\n   - Quantity, amount, price, score, rate - numeric type columns',
            '\n   - Include both simple column references and calculated formulas',
            '\n   - Example (simple): QUANTITY, PRICE, AMOUNT',
            '\n   - Example (calculated): revenue AS QUANTITY * UNIT_PRICE, discount_amount AS TOTAL_PRICE * 0.1',
            '\n\n3. DIMENSIONS (Attribute Data):',
            '\n   - IDs, names, dates, categories, statuses',
            '\n   - Do not overlap with FACTS',
            '\n\n4. No Duplicate Names: All table.name combinations on the left must be unique',
            '\n\n5. Date Derivative Naming: original_column_name_year, original_column_name_month',
            '\n\n6. Rich Synonyms (Critical for Cortex Analyst to understand user requests):',
            '\n   - Tables: 4-5 business-relevant aliases',
            '\n   - Each DIMENSION, FACT, METRIC: 2-3 natural aliases',
            '\n   - Infer appropriate synonyms from sample data actual usage',
            '\n   - Example: sales amount → "total sales", "sales total", "revenue"',
            '\n   - Example: product ID → "product code", "item number", "SKU"',
            '\n\n7. Rich COMMENTS:',
            '\n   - Generate specific descriptions from sample data',
            '\n   - Include business meaning',
            '\n   - Specify units and ranges',
            '\n\n8. Utilize Sample Data:',
            '\n   - Understand actual usage from sample data content',
            '\n   - Generate more concrete, practical synonyms',
            '\n   - Reflect data characteristics (range, units, etc.) in COMMENTS',
            '\n\n9. Generate Abundant FACTS and DIMENSIONS:',
            '\n   - FACTS: Around 5-10 (simple columns + calculated formulas)',
            '\n   - DIMENSIONS: Around 5-15 (original columns + date derivatives)',
            '\n   - METRICS: Around 4-8 (various aggregation patterns)',
            '\n\n10. Output: SQL syntax only (no ```, no explanatory text)',
            '\n\nOutput only the CREATE SEMANTIC VIEW statement:'
        )
    ) INTO ai_response;

    -- Debug: Verify AI response length
    IF (:ai_response IS NULL OR LENGTH(:ai_response) = 0) THEN
        RETURN 'Error: AI_COMPLETE returned an empty response. Column info: ' || SUBSTRING(:column_info, 1, 500);
    END IF;

    -- Remove unnecessary characters
    create_view_sql := :ai_response;
    create_view_sql := REPLACE(:create_view_sql, '```sql', '');
    create_view_sql := REPLACE(:create_view_sql, '```', '');
    create_view_sql := REPLACE(:create_view_sql, '"', '''');
    create_view_sql := REPLACE(:create_view_sql, 'Revised:', '');
    create_view_sql := REPLACE(:create_view_sql, 'Modifications:', '');
    create_view_sql := REPLACE(:create_view_sql, '**', '');
    create_view_sql := TRIM(:create_view_sql);

    -- Final verification: Check SQL statement is not empty
    IF (:create_view_sql IS NULL OR LENGTH(:create_view_sql) < 50) THEN
        RETURN 'Error: Generated SQL is empty or too short. AI response: ' || SUBSTRING(:ai_response, 1, 1000);
    END IF;

    -- ================================================================================
    -- STEP 4: Create semantic view
    -- ================================================================================

    BEGIN
        EXECUTE IMMEDIATE :create_view_sql;
        RETURN 'Semantic view ' || :TARGET_DATABASE || '.' || :TARGET_SCHEMA || '.' || :TARGET_VIEW_NAME || ' created successfully. SQL: ' || :create_view_sql;
    EXCEPTION
        WHEN OTHER THEN
            RETURN 'Error: ' || SQLERRM || ' -- Generated SQL: ' || SUBSTRING(:create_view_sql, 1, 2000);
    END;

END;
$$;

-- ================================================================================
-- Usage Example
-- ================================================================================

/*
-- Basic usage
CALL GENERATE_SEMANTIC_VIEW(
    '<source_database>',
    '<source_schema>',
    '<source_table>',
    '<target_database>',
    '<target_schema>',
    '<semantic_view_name>',
    '<LLM_model_name>'
);

-- Verify created semantic view
DESCRIBE SEMANTIC VIEW <target_database>.<target_schema>.<semantic_view_name>;
*/
Enter fullscreen mode Exit fullscreen mode

Note: You can copy and paste this stored procedure directly into your Snowflake environment for immediate use.

How to Use

Prerequisites

To use this stored procedure, you'll need:

  • A Snowflake account
  • An environment where Cortex LLM is available (cross-region inference makes this available in most regions)
  • Permissions to create semantic views

Basic Usage Example

Here's how to generate a semantic view from the SALES_DATABASE.PUBLIC.ORDERS table:

-- Call the stored procedure
CALL GENERATE_SEMANTIC_VIEW(
    'SALES_DATABASE',           -- Source database
    'PUBLIC',                   -- Source schema
    'ORDERS',                   -- Source table
    'ANALYTICS_DATABASE',       -- Target database
    'SEMANTIC',                 -- Target schema
    'ORDERS_SEMANTIC_VIEW',     -- Semantic view name
    'claude-sonnet-4-5'         -- LLM model (optional)
);

-- Verify the created semantic view
DESCRIBE SEMANTIC VIEW ANALYTICS_DATABASE.SEMANTIC.ORDERS_SEMANTIC_VIEW;
Enter fullscreen mode Exit fullscreen mode

Example Generated Semantic View

For example, given this table structure:

-- Sample table
CREATE TABLE ORDERS (
    ORDER_ID NUMBER,
    CUSTOMER_ID NUMBER,
    ORDER_DATE DATE,
    PRODUCT_NAME VARCHAR,
    QUANTITY NUMBER,
    UNIT_PRICE NUMBER,
    TOTAL_AMOUNT NUMBER
);
Enter fullscreen mode Exit fullscreen mode

The stored procedure auto-generates a semantic view like this:

CREATE OR REPLACE SEMANTIC VIEW ANALYTICS_DATABASE.SEMANTIC.ORDERS_SEMANTIC_VIEW
  TABLES (
    ORDERS AS SALES_DATABASE.PUBLIC.ORDERS
      WITH SYNONYMS ('Order Table', 'Sales Data', 'Purchase History', 'Transactions')
      COMMENT = 'Table managing customer order information'
  )
  FACTS (
    ORDERS.quantity AS QUANTITY
      WITH SYNONYMS ('Qty', 'Amount Ordered', 'Number of Items')
      COMMENT = 'Quantity of products ordered',
    ORDERS.unit_price AS UNIT_PRICE
      WITH SYNONYMS ('Price', 'Item Price', 'Product Price')
      COMMENT = 'Price per product unit',
    ORDERS.total_amount AS TOTAL_AMOUNT
      WITH SYNONYMS ('Total', 'Order Amount', 'Purchase Total')
      COMMENT = 'Total order amount'
  )
  DIMENSIONS (
    ORDERS.order_id AS ORDER_ID
      WITH SYNONYMS ('Order Number', 'Transaction ID', 'Order Ref')
      COMMENT = 'Unique identifier for the order',
    ORDERS.customer_id AS CUSTOMER_ID
      WITH SYNONYMS ('Client ID', 'Customer Number', 'Account ID')
      COMMENT = 'Customer identifier',
    ORDERS.order_date AS ORDER_DATE
      WITH SYNONYMS ('Purchase Date', 'Transaction Date', 'Date Ordered')
      COMMENT = 'Date the order was placed',
    ORDERS.product_name AS PRODUCT_NAME
      WITH SYNONYMS ('Product', 'Item Name', 'Item')
      COMMENT = 'Name of the ordered product'
  )
  METRICS (
    ORDERS.total_sales AS SUM(TOTAL_AMOUNT)
      WITH SYNONYMS ('Total Revenue', 'Sales Total', 'Revenue')
      COMMENT = 'Total sales amount for all orders',
    ORDERS.avg_order_amount AS AVG(TOTAL_AMOUNT)
      WITH SYNONYMS ('Average Order Value', 'AOV', 'Avg Purchase')
      COMMENT = 'Average amount per order',
    ORDERS.total_orders AS COUNT(*)
      WITH SYNONYMS ('Order Count', 'Number of Orders', 'Total Transactions')
      COMMENT = 'Total number of orders'
  )
  COMMENT = 'Semantic view for order data analysis';
Enter fullscreen mode Exit fullscreen mode

Real-World Use Cases

Use Case 1: Fast Start

When introducing Cortex Analyst to a new project, use this stored procedure to quickly generate semantic views for key tables. Then customize only the necessary parts according to business requirements and add relationships between tables - several times faster than building from scratch, helping you set up your semantic layer efficiently.

Use Case 2: Data Catalog

Generated semantic views contain rich synonyms and comments, making them excellent table documentation. They serve as reference material for new team members learning the table structure or can be loaded into AI Agents as an AI-oriented data catalog.

Use Case 3: POC (Proof of Concept)

For POCs validating Cortex Analyst's effectiveness, you often don't want to spend much time creating semantic layers. This stored procedure lets you build a demo environment in minutes and immediately experience Cortex Analyst's value.

Customization Tips

While auto-generated semantic views may work as-is, customizing these points will achieve even more accurate query generation:

Note: You can customize semantic views not just through SQL commands (ALTER SEMANTIC VIEW), but also via GUI in Snowsight's AI/ML Studio. AI/ML Studio lets you visually confirm the semantic view structure while intuitively adding synonyms, editing comments, and adding new Metrics.

1. Adding/Modifying Synonyms

Add industry-specific terms and internal company terminology to help Cortex Analyst better understand users' natural language:

-- Example: Add industry terms
ALTER SEMANTIC VIEW ORDERS_SEMANTIC_VIEW 
MODIFY DIMENSION ORDERS.product_name 
  ADD SYNONYMS ('SKU', 'Item Code', 'Barcode');
Enter fullscreen mode Exit fullscreen mode

2. Adding Complex Calculated Metrics

Add advanced calculation formulas according to business logic:

-- Example: Add profit margin
ALTER SEMANTIC VIEW ORDERS_SEMANTIC_VIEW 
ADD METRIC ORDERS.profit_margin AS (SUM(TOTAL_AMOUNT) - SUM(COST)) / SUM(TOTAL_AMOUNT)
  WITH SYNONYMS ('Margin', 'Profit Rate', 'Return Rate')
  COMMENT = 'Profit ratio relative to sales';
Enter fullscreen mode Exit fullscreen mode

3. Integrating Multiple Tables into a Semantic View

To analyze across multiple tables, integrate individually generated semantic views and define relationships between tables:

-- Example: Create semantic view containing both ORDERS and CUSTOMERS tables
-- First, generate semantic views for each table individually,
-- Then integrate into a new semantic view

CREATE OR REPLACE SEMANTIC VIEW SALES_ANALYSIS_VIEW
  TABLES (
    ORDERS AS SALES_DATABASE.PUBLIC.ORDERS
      WITH SYNONYMS ('Orders', 'Sales Data'),
    CUSTOMERS AS SALES_DATABASE.PUBLIC.CUSTOMERS
      WITH SYNONYMS ('Customers', 'Client Info')
  )
  RELATIONSHIPS (
    ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID
  )
  -- Facts, Dimensions, Metrics omitted
  ;
Enter fullscreen mode Exit fullscreen mode

For detailed customization options, see the official Snowflake documentation - Using SQL commands to create and manage semantic views.

Integration with Cortex Analyst

Generated semantic views are immediately usable with Cortex Analyst.

Verifying in Snowsight

After creating your semantic view, verify it works right away in Snowsight's AI/ML Studio Cortex Analyst playground. For implementation, Snowflake offers various options including calling from Snowflake Intelligence, calling from Cortex Agents, or configuring as a tool in Snowflake MCP Server.

Important Notes

Tool Limitations

This stored procedure has the following constraint:

  • Single Table Only: This tool generates semantic views from a single table only. To create semantic views combining multiple tables, first generate views for each table individually, then manually add relationships between tables.

Generated Semantic View Quality

As this uses AI auto-generation, please note:

  • Not Perfect: Always review and customize generated semantic views according to business requirements
  • Sample Data Dependency: Generated synonyms and descriptions vary based on sample data content
  • Complex Logic: Advanced business logic requires manual additions and modifications

Conclusion

This AI auto-generation stored procedure is a tool that dramatically accelerates the initial stages of semantic view creation. While semantic layer preparation is essential for Cortex Analyst to deliver "natural language data analysis" - that ideal world - it would be a shame if this necessary work becomes a barrier that delays your start!

Using this stored procedure, you can generate baseline semantic views in seconds, then customize necessary parts from there. As a result, you can deliver analytics environments to business departments in a much shorter timeframe.

Take the first step toward realizing data democratization and empowering business users to extract insights from data themselves - I hope this tool helps you get there!

Promotion

Snowflake What's New Updates on X

I share Snowflake What's New updates on X. Follow for the latest insights:

English Version

Snowflake What's New Bot (English Version)

Japanese Version

Snowflake's What's New Bot (Japanese Version)

Change Log

(20251009) Initial post

Original Japanese Article

https://zenn.dev/tsubasa_tech/articles/ce732d628a9bee

Top comments (0)