This article is an English translation of the Japanese version.
Hello, this is Sagara.
One of the new features announced at Snowflake Summit 2025 is "Cortex AISQL," which enables multimodal data analysis of text, images, and other data types using SQL commands.
https://www.snowflake.com/ja/blog/ai-sql-query-language/
I've tried out this Cortex AISQL feature, so I'd like to summarize my experience.
What is Cortex AISQL?
Cortex AISQL is a new feature that directly integrates AI capabilities into Snowflake's SQL engine, enabling analysis of diverse data types (multimodal data) such as text and images using familiar SQL commands.
https://docs.snowflake.com/en/user-guide/snowflake-cortex/aisql
The main characteristics include:
- AI utilization with familiar SQL: Data analysts can perform advanced analysis like AI engineers without complex coding.
- High-speed and low-cost: Deeply integrated with Snowflake's engine, it's over 30% faster than traditional AI implementations and achieves up to 60% cost reduction.
-
Support for diverse data types: The new
FILEdata type enables seamless analysis combining unstructured data like text and images with structured data.
The following Snowflake Summit session report article is also very helpful for understanding Cortex AISQL:
https://zenn.dev/finatext/articles/snowflake-summit-2025-report-day3
Differences from Previously Available LLM Functions
At this point, you might think, "Wait, wasn't AI usage within SQL already provided through a feature set called LLM Functions?"
These LLM Functions have been integrated into the Cortex AISQL feature set with the release of Cortex AISQL. The figure below is from the Cortex AISQL documentation, which states "(including previous LLM functions)."
Cost
The cost structure is as shown in the figure below (credit consumption per million tokens is listed).
※Quoted from the following link:
https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf
What We'll Try
There's an official Quickstart available for trying out Cortex AISQL, so let's give it a try.
Prerequisites
Following the "2.Setup" section from the link below, let's prepare the environment.
- Launch a worksheet in your Snowflake account and execute all queries in setup.sql from top to bottom
- Download all image files from this folder and upload them to the stage
@DASH_DB.DASH_SCHEMA.DASH_IMAGE_FILEScreated in step 1 - Launch a worksheet in your Snowflake account and execute all queries in images.sql from top to bottom
- When I tried it, there was
USE WAREHOUSE DASH_S;which caused an error, so I changed it toUSE WAREHOUSE DASH_WH_S;and executed it. - This query creates a table containing the
IMG_FILEcolumn ofFILEtype using theTO_FILEfunction based on the screenshots uploaded to the stage.
- When I tried it, there was
- Download cortex_aisql.ipynb and create a Notebook by uploading it to your Snowflake account following the steps shown below
After the Notebook launches, execute the top cell that imports libraries.
Trying Cortex AISQL by Executing SQL in the Notebook
AI_COMPLETE Function
First, let's execute the following query in Identify Customer Issues Across Text and Image Data Formats.
This query uses the AI_COMPLETE function to perform the following tasks. It also uses the PROMPT function to dynamically insert column values specified as arguments (using {0} to specify which argument column to insert):
- WITH clause
IMAGE_INSIGHTS: Summarizes the content of screenshot images in theIMAGEStable using AI. - WITH clause
EMAIL_INSIGHTS: Summarizes the content of inquiry emails in theEMAILStable using AI. - UNION and CREATE TABLE: Unions the above two results and finally saves them as a new table called
INSIGHTS.
create table if not exists insights as
with IMAGE_INSIGHTS as (
select created_at,user_id,relative_path as ticket_id,img_file as input_file,file_url as input_file_url,
AI_COMPLETE('pixtral-large', prompt('Summarize this issue shown in this screenshot in one concise sentence: {0}', img_file)) as summary, summary as content
from images
),
EMAIL_INSIGHTS as (
select created_at,user_id,ticket_id::text as ticket_id,null as input_file,'' as input_file_url,content as content,
AI_COMPLETE('claude-3-7-sonnet', prompt('Summarize this issue in one concise sentence.
If the user mentioned anything related to music preference, please keep that information: {0}', content)) as summary
from emails
)
select 'Image' as source, created_at, user_id, ticket_id, input_file, input_file_url, content, summary
from IMAGE_INSIGHTS
union
select 'Email' as source, created_at, user_id, ticket_id, input_file, input_file_url, content, summary
from EMAIL_INSIGHTS;
After creating this table, executing the following SQL in the Notebook's Consolidated Data Across Text, Image, and Audio Data Formats section shows summaries of the target screenshots and emails in the SUMMARY column, and the original email text in the CONTENT column. (Looking at the CTAS statement above, the image-related descriptions in the CONTENT column output exactly the same content as the SUMMARY column.)
AI_FILTER Function in JOIN Clauses
As background, the SOLUTION_CENTER_ARTICLES table summarizes solutions to troubles in the SOLUTION column as shown below.
The query in Semantically JOIN Issues with Solutions Library uses the AI_FILTER function to join and output results when the SOLUTION column in the SOLUTION_CENTER_ARTICLES table seems to match as a solution for the customer problems held in the CONTENT column of the INSIGHTS table created earlier using the AI_COMPLETE function.
Multiple solution columns may match in some cases, such as when the TICKET_ID column is 177 in the figure below.
select
c.ticket_id,
c.content as "CUSTOMER ISSUE",
s.solution,
c.created_at,
from
INSIGHTS c
left join
SOLUTION_CENTER_ARTICLES s
on AI_FILTER(prompt('You are provided a customer issue and a solution center article. Please check if the solution article can address customer concerns. Reminder to check if the error details are matching. Customer issues: {0}; \n\nSolution: {1}', content, s.solution))
order by created_at asc;
Important note: When executing JOIN operations using the AI_FILTER function, the number of rows in each table within the JOIN cannot exceed "500". (From the official docs) This makes sense, as JOINs with tens of thousands of rows would quickly result in exponentially large token counts...
To enable large-scale joins with AI_FILTER function JOIN operations, contact your account manager to enable the adaptive optimization preview.
AI_AGG Function
In Aggregate Top Pain Points By Month, Python code is written, but let's first look at just the SQL using the AI_AGG function contained in this Python code.
The following SQL uses the AI_AGG function to analyze the values in the column specified as the first argument with the instruction: "Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned. Format your response as a bulleted list of issues with their approximate frequency in percentage."
select
monthname(created_at) as month,
count(*) as total_tickets,
count(distinct user_id) as unique_users,
AI_AGG(summary,'Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned.
Format your response as a bulleted list of issues with their approximate frequency in percentage.') as top_issues_reported,
from (select * from insights order by random() limit 200)
group by month
order by total_tickets desc,month desc;
The Python code in the Notebook then visualizes these aggregation results using Streamlit. Although the target records are changed using the RANDOM function, it's concerning that the output format varies slightly between the first and second executions, but this might be unavoidable due to the nature of generative AI...
AGGREGATE_PROMPT = """
Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned.
Format your response as a bulleted list of issues with their approximate frequency in percentage.
"""
sql = f"""
select
monthname(created_at) as month,
count(*) as total_tickets,
count(distinct user_id) as unique_users,
AI_AGG(summary,'{AGGREGATE_PROMPT}') as top_issues_reported,
from (select * from insights order by random() limit 200)
group by month
order by total_tickets desc,month desc
"""
df = session.sql(sql).to_pandas()
rows_to_display = 1
for row in df[:rows_to_display].itertuples():
st.subheader(f"Aggregated Insights for {row.MONTH}")
st.metric("Total Tickets",row.TOTAL_TICKETS)
st.metric("Unique Users", row.UNIQUE_USERS)
st.subheader("Top Issues")
st.markdown(row.TOP_ISSUES_REPORTED)
st.divider()
df_long = df.melt(id_vars='MONTH', value_vars=['TOTAL_TICKETS', 'UNIQUE_USERS'],
var_name='Metric', value_name='Total')
chart = alt.Chart(df_long).mark_bar().encode(
y=alt.Y('MONTH:N', sort='-x'),
x=alt.X('Total:Q'),
color=alt.Color('Metric:N', scale=alt.Scale(scheme='tableau10')),
tooltip=['MONTH', 'Metric', 'Total']
).properties(height=300)
st.altair_chart(chart, use_container_width=True)
- First execution
- Second execution
AI_CLASSIFY Function
In Classification, the following query is described. This query uses both AI_FILTER and AI_CLASSIFY functions to perform the following:
- Use
AI_FILTERfunction to extract only music-related emails from theINSIGHTStable to create aFILTEREDtable - Use
AI_CLASSIFYfunction to classify which music genre from theMUSIC_GENRESspecification theFILTEREDtable corresponds to
SET FILTER_PROMPT = '
I am trying to find if the customer has mentioned any music genre perference in their comment.
Does this comment mentioning specific music genre preference from the customer?: {0}';
SET CLASSIFY_PROMPT = 'Please help me classify the music preference mentioned in this comment: ';
SET MUSIC_GENRES = 'Electronic/Dance Music (EDM), Jazz, Indie/Folk, Rock, Classical, World Music, Blues, Pop';
create table if not exists filtered as
select * from
(select * from insights order by random() limit 500)
where AI_FILTER(prompt($FILTER_PROMPT, summary));
select
source,
summary,
AI_CLASSIFY($CLASSIFY_PROMPT || summary,SPLIT($MUSIC_GENRES, ','))['labels'][0] as classified_label
from filtered;
Additionally, visualization using Streamlit was also performed based on these classification results.
Conclusion
I tried out "Cortex AISQL," one of the new features announced at Snowflake Summit 2025, which enables multimodal data analysis of text, images, and other data types using SQL commands.
As a genuine impression, functions like AI_FILTER and AI_AGG use completely different syntax from traditional SQL! However, I also felt that Cortex AISQL would be very useful in cases where clear join conditions cannot be created for text data or image data.














Top comments (0)