⚠️ Major revision notice (April 2026)
More than 18 months have passed since the initial July 2024 post, and Snowflake's Cortex AI Functions have evolved dramatically. This article has been fully rewritten in April 2026. Key changes:
- `SNOWFLAKE.CORTEX.COUNT_TOKENS` has been superseded by `AI_COUNT_TOKENS`, which is now GA. This article is now built around the new function.
- Replaced `SNOWFLAKE.CORTEX.COMPLETE` with `AI_COMPLETE`, and added support for the new Cortex AI Functions family (`AI_EMBED`, `AI_SENTIMENT`, `AI_TRANSLATE`, `AI_CLASSIFY`, etc.).
- The model list is now fetched dynamically from `SHOW MODELS IN SNOWFLAKE.MODELS`.
- The Streamlit code has been rewritten to use bind variables (parameter binding) to prevent SQL injection.
Introduction
Hello, I'm a Solutions Engineer at Snowflake. I'd like to share some of my experiences and experiments with you through various posts.
Back in 2024, the common need was simply "let me quickly check how many tokens a prompt uses for Cortex LLM." In 2026, with the rapid spread of RAG (Retrieval Augmented Generation) and AI Agent workloads, estimating the total token count — including prompts, context, classification categories, source/target languages, and more — has become a critical concern in real-world operations.
The recently GA'd AI_COUNT_TOKENS lets you estimate token counts more accurately on a per-function basis. In this article, I'll walk you through building a Streamlit in Snowflake (SiS) app that helps you check token counts and costs at a glance using this new function.
Note: This post represents my personal views and not those of Snowflake.
Note (April 2026): Some features covered here are in Preview or subject to change. Models and credit prices are added, removed, and adjusted frequently, so please always refer to the latest official documentation and the Snowflake Credit Consumption Table (PDF).
What is Streamlit in Snowflake (SiS)?
Streamlit is a Python library that lets you build web UIs with simple Python code — no HTML, CSS, or JavaScript required. Take a look at the App Gallery to get a feel for what's possible.
Streamlit in Snowflake (SiS for short) lets you develop and run Streamlit web apps directly on Snowflake. It's easy to get started with just a Snowflake account, and the biggest advantage is how naturally you can embed Snowflake tables and Cortex AI functions into your apps.
About Streamlit in Snowflake (Official Snowflake Documentation)
What are Snowflake Cortex AI and Cortex AI Functions?
Snowflake Cortex AI is the umbrella term for Snowflake's generative AI capabilities. Within that, Cortex AI Functions let you call LLMs, embedding models, and task-specific functions as one-line SQL functions running on Snowflake. They're also available from Python.
As of 2026, the main functions include:
| Function | Purpose |
|---|---|
| `AI_COMPLETE` | General-purpose LLM generation (summarization, Q&A, classification, code generation, anything) |
| `AI_EMBED` | Convert text (or images) into embedding vectors |
| `AI_CLASSIFY` | Classify text against user-defined label candidates |
| `AI_SENTIMENT` | Sentiment analysis (positive / negative / neutral, etc.) |
| `AI_SIMILARITY` | Numeric similarity between two inputs |
| `AI_TRANSLATE` | Multilingual translation |
| `AI_AGG` / `AI_SUMMARIZE_AGG` / `AI_FILTER` / `AI_EXTRACT` / `AI_TRANSCRIBE` / `AI_PARSE_DOCUMENT` / `AI_REDACT`, etc. | Aggregation, summarization, filtering, extraction, audio transcription, PDF parsing, PII masking |
The legacy `SNOWFLAKE.CORTEX.COMPLETE` and `SNOWFLAKE.CORTEX.COUNT_TOKENS` are still available, but Snowflake officially recommends using the new `AI_`-prefixed functions.
Snowflake Cortex AI Functions (Official Documentation)
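To make the "one-line SQL function" point concrete, here is a minimal sketch of issuing such a call from Python with Snowpark bind variables. The model name is just an example, and the helper function is mine, not part of any Snowflake API; inside Streamlit in Snowflake you would already have a `session` from `get_active_session()`.

```python
def ai_complete_call(model: str, prompt: str) -> tuple[str, list[str]]:
    """Build a parameterized AI_COMPLETE statement (bind variables, not f-strings)."""
    return "SELECT AI_COMPLETE(?, ?) AS response", [model, prompt]

sql, params = ai_complete_call(
    "llama4-maverick",
    "Summarize Snowflake Cortex AI in one sentence.",
)
# In SiS you would then run:
#   session.sql(sql, params=params).collect()
```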
Functions supported by AI_COUNT_TOKENS
AI_COUNT_TOKENS supports the following 6 Cortex AI Functions (explicitly listed in the official release notes):
| First argument | Purpose |
|---|---|
| `ai_complete` | Tokens in an LLM prompt (requires a model name) |
| `ai_embed` | Tokens fed into an embedding model (requires a model name) |
| `ai_classify` | Tokens including classification categories, task descriptions, and examples |
| `ai_sentiment` | Tokens including the system prompt used for sentiment analysis |
| `ai_similarity` | Combined tokens of the two inputs to similarity |
| `ai_translate` | Tokens including source/target language specifications |
To keep the UI simple, this app targets AI_COMPLETE, AI_EMBED, AI_SENTIMENT, and AI_TRANSLATE (I've included SQL examples for AI_CLASSIFY and AI_SIMILARITY since their inputs are more complex).
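The reason the UI only exposes a subset is that the positional arguments after the function name differ per function. A small sketch of those shapes (mirroring the table above; the argument labels and the helper are illustrative, not an official API):

```python
# Positional arguments that follow the function name in AI_COUNT_TOKENS.
# "model" means a model name is required; labels here are illustrative only.
COUNT_TOKENS_ARGS = {
    "ai_complete":   ["model", "text"],
    "ai_embed":      ["model", "text"],
    "ai_classify":   ["text", "categories"],
    "ai_sentiment":  ["text"],
    "ai_similarity": ["text", "text2"],
    "ai_translate":  ["text", "source_lang", "target_lang"],
}

def count_tokens_sql(function_name: str) -> str:
    """Build 'SELECT AI_COUNT_TOKENS(?, ...)' with one bind slot per argument."""
    n_args = len(COUNT_TOKENS_ARGS[function_name])
    placeholders = ", ".join(["?"] * (n_args + 1))  # +1 for the function name itself
    return f"SELECT AI_COUNT_TOKENS({placeholders}) AS token_count"
```

Note that `ai_classify` takes an array of labels plus an options object, which does not bind cleanly as a scalar parameter — another reason it stays out of the app UI.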
Supported models keep expanding
AI_COUNT_TOKENS already covers a wide range of models — Llama, Mistral, DeepSeek, Snowflake Arctic families, and increasingly OpenAI GPT-5, Claude 4, 4.5, 4.6, 3.7, and more. The really exciting part is that many Claude models — claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5, claude-4-sonnet, claude-3-7-sonnet, and so on — can now be counted with AI_COUNT_TOKENS. If you'd given up on Claude because "we can't see token counts," this is great news.
That said, some freshly released or newer-generation models haven't caught up on the AI_COUNT_TOKENS side yet, and you may still see unknown model errors or NULL returns. The good news is this coverage is expanding steadily — it's not uncommon for a model that was unsupported a short while ago to quietly start working. When you're curious about a specific model, try a quick SQL like the following in your own environment:
SELECT AI_COUNT_TOKENS('ai_complete', 'claude-opus-4-6', 'Snowflake is awesome!');
SELECT AI_COUNT_TOKENS('ai_complete', 'llama4-maverick', 'Snowflake is awesome!');
SELECT AI_COUNT_TOKENS('ai_embed', 'snowflake-arctic-embed-l-v2.0', 'Snowflake is awesome!');
Since some models are explicitly unsupported and others will gradually be added, the app is designed to surface a helpful message whenever an error comes back (more on that later).
AI_COUNT_TOKENS Official Documentation
Fetching the model list dynamically
The previous version of this article hard-coded supported models, but Cortex AI Functions' supported model list changes on a monthly cadence — information goes stale the moment you publish. This app instead uses Snowflake's built-in mechanism to fetch the latest model list at app startup.
SHOW MODELS IN SNOWFLAKE.MODELS
Snowflake exposes Cortex-supported models as Snowflake objects, so you can list them with:
-- Run once as ACCOUNTADMIN (re-run whenever new models are released)
CALL SNOWFLAKE.MODELS.CORTEX_BASE_MODELS_REFRESH();
-- Then list them
SHOW MODELS IN SNOWFLAKE.MODELS;
Cortex model RBAC (Official Documentation)
Pricing is hard-coded
Unit prices could be derived dynamically from account usage history, but if you're new to Snowflake that history doesn't exist yet, which would silently break the app. To keep the app reliable, I hard-coded a dictionary of representative model prices. For accurate, up-to-date prices, always refer to the Credit Consumption Table (PDF).
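To make the cost formula concrete: the estimate is simply tokens ÷ 1,000,000 × credits-per-1M-tokens × your price per credit. A worked example with illustrative numbers (0.25 credits/1M is just the sample value used in this app's dictionary, not an official price):

```python
tokens = 10_000
credits_per_1m = 0.25   # illustrative unit price from the app's dictionary
usd_per_credit = 2.0    # your contract price per Snowflake credit

credits_used = tokens / 1_000_000 * credits_per_1m  # 0.0025 credits
cost_usd = credits_used * usd_per_credit            # 0.005 USD
```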
Feature Overview
(Screenshot: the token counter app running in Streamlit in Snowflake)
The sample text shown in the screenshot is borrowed from Alice's Adventures in Wonderland by Lewis Carroll, a public-domain work available on Project Gutenberg.
Features
- Choose a Cortex AI Function (`AI_COMPLETE` / `AI_EMBED` / `AI_SENTIMENT` / `AI_TRANSLATE`)
- Model list is fetched dynamically via `SHOW MODELS`
- Additional inputs (model, source/target language, etc.) shown based on the selected function
- Display character and token counts for the input text
- Show the token / character ratio
- Multiply by the model's unit price for an estimated cost
- Gracefully handle cases where `AI_COUNT_TOKENS` doesn't support a model
Prerequisites
- A Snowflake account with Cortex AI Functions enabled (regional availability)
- The executing role has the `SNOWFLAKE.CORTEX_USER` database role
- Permission to run `SHOW MODELS IN SNOWFLAKE.MODELS` (ACCOUNTADMIN has executed `CORTEX_BASE_MODELS_REFRESH()` at least once)
- Python 3.9 or later (the SiS default runtime works fine)
Source Code
import streamlit as st
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.exceptions import SnowparkSQLException
# Page configuration
st.set_page_config(layout="wide", page_title="AI Token Counter")
# Get current session
session = get_active_session()
# Application title
st.title("Snowflake Cortex AI Token Counter")
st.caption("Estimate token counts and costs for Cortex AI Functions using AI_COUNT_TOKENS.")
# -----------------------------
# Fetch model list dynamically from SHOW MODELS (cached)
# -----------------------------
@st.cache_data(ttl=3600)
def get_available_models() -> list[str]:
"""Return the list of Cortex models registered under SNOWFLAKE.MODELS."""
try:
rows = session.sql("SHOW MODELS IN SNOWFLAKE.MODELS").collect()
return sorted({r["name"].lower() for r in rows})
except SnowparkSQLException as e:
st.warning(
"Could not list models from SNOWFLAKE.MODELS. "
"Make sure ACCOUNTADMIN has run "
"`CALL SNOWFLAKE.MODELS.CORTEX_BASE_MODELS_REFRESH();` "
"and that the current role has the appropriate privileges."
f"\n\nError: {e}"
)
return []
# -----------------------------
# Credits per 1M tokens — representative values as of April 2026.
# Always cross-check with the official Credit Consumption Table (PDF):
# https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf
# -----------------------------
MODEL_CREDITS_PER_1M = {
# Claude family
"claude-opus-4-7": 12.75,
"claude-sonnet-4-6": 2.55,
"claude-opus-4-6": 12.75,
"claude-sonnet-4-5": 2.55,
"claude-opus-4-5": 12.75,
"claude-haiku-4-5": 0.68,
"claude-4-opus": 12.75,
"claude-4-sonnet": 2.55,
"claude-3-7-sonnet": 2.55,
"claude-3-5-sonnet": 2.55,
# OpenAI family
"openai-gpt-5.4": 1.90,
"openai-gpt-5.2": 1.90,
"openai-gpt-5.1": 1.90,
"openai-gpt-5": 1.90,
"openai-gpt-5-mini": 0.40,
"openai-gpt-5-nano": 0.10,
"openai-gpt-5-chat": 1.90,
"openai-gpt-4.1": 1.60,
"openai-o4-mini": 0.40,
# Gemini family
"gemini-3.1-pro": 4.84,
"gemini-2.5-flash": 0.30,
"gemini-2.5-flash-lite": 0.10,
# Llama family
"llama4-maverick": 0.25,
"llama4-scout": 0.14,
"llama3.3-70b": 1.21,
"llama3.1-405b": 3.00,
"llama3.1-70b": 1.21,
"llama3.1-8b": 0.19,
# Mistral family
"mistral-large2": 1.95,
"mixtral-8x7b": 0.22,
"mistral-7b": 0.12,
# Others
"deepseek-r1": 1.03,
"snowflake-llama-3.3-70b": 0.29,
"snowflake-llama-3.1-405b": 0.96,
"snowflake-arctic": 0.84,
# Embedding models
"snowflake-arctic-embed-l-v2.0": 0.05,
"snowflake-arctic-embed-l-v2.0-8k": 0.05,
"snowflake-arctic-embed-m-v1.5": 0.03,
"snowflake-arctic-embed-m": 0.03,
"multilingual-e5-large": 0.05,
"voyage-multilingual-2": 0.07,
"nv-embed-qa-4": 0.05,
"e5-base-v2": 0.03,
}
# Embedding models for AI_EMBED
# SNOWFLAKE.MODELS only lists CORTEX_BASE LLMs, so embedding models need to be
# listed explicitly here (they change infrequently, so hard-coding is fine).
EMBED_MODELS = [
"snowflake-arctic-embed-l-v2.0",
"snowflake-arctic-embed-m-v1.5",
"snowflake-arctic-embed-m",
"voyage-multilingual-2",
"nv-embed-qa-4",
"e5-base-v2",
"multilingual-e5-large",
]
# Preferred default models (known to work with AI_COUNT_TOKENS)
PREFERRED_COMPLETE_MODEL = "llama4-maverick"
PREFERRED_EMBED_MODEL = "snowflake-arctic-embed-l-v2.0"
# -----------------------------
# Sidebar: choose a Cortex AI Function and model
# -----------------------------
st.sidebar.title("Cortex AI Functions settings")
ai_function = st.sidebar.selectbox(
"Select the function you want to use",
("ai_complete", "ai_embed", "ai_sentiment", "ai_translate"),
)
all_models = get_available_models()
def filter_models_for_function(function_name: str, models: list[str]) -> list[str]:
"""Return candidate models depending on the function."""
if function_name == "ai_embed":
# Embedding models are not registered in SNOWFLAKE.MODELS, so use our list
return EMBED_MODELS
if function_name == "ai_complete":
return [m for m in models
if ("embed" not in m)
and ("guard" not in m)
and not m.startswith("arctic-")
and not m.startswith("twelvelabs-")]
return []
model = None
source_lang = None
target_lang = None
if ai_function in ("ai_complete", "ai_embed"):
candidates = filter_models_for_function(ai_function, all_models)
preferred = PREFERRED_COMPLETE_MODEL if ai_function == "ai_complete" else PREFERRED_EMBED_MODEL
default_idx = candidates.index(preferred) if preferred in candidates else 0
if candidates:
model = st.sidebar.selectbox("Model", candidates, index=default_idx)
else:
st.sidebar.warning("No models available for this function.")
elif ai_function == "ai_translate":
col1, col2 = st.sidebar.columns(2)
with col1:
source_lang = st.text_input("Source language (e.g. ja)", value="ja")
with col2:
target_lang = st.text_input("Target language (e.g. en)", value="en")
# ai_sentiment does not require a model or extra input
# -----------------------------
# Input area
# -----------------------------
st.header("Token count & cost estimation")
input_text = st.text_area(
"Enter the text you want to check for token count",
height=200,
placeholder="e.g. The quick brown fox jumps over the lazy dog.",
)
credit_price = st.number_input(
"Price per Snowflake credit (USD)",
min_value=0.0,
value=2.0,
step=0.01,
)
# -----------------------------
# Count tokens safely using bind variables
# -----------------------------
def count_tokens(function_name: str, text: str, model_name: str | None = None,
src: str | None = None, tgt: str | None = None):
if function_name in ("ai_complete", "ai_embed"):
sql = "SELECT AI_COUNT_TOKENS(?, ?, ?) AS token_count"
params = [function_name, model_name, text]
elif function_name == "ai_translate":
sql = "SELECT AI_COUNT_TOKENS(?, ?, ?, ?) AS token_count"
params = [function_name, text, src, tgt]
elif function_name == "ai_sentiment":
sql = "SELECT AI_COUNT_TOKENS(?, ?) AS token_count"
params = [function_name, text]
else:
raise ValueError(f"Unsupported function: {function_name}")
result = session.sql(sql, params=params).collect()
value = result[0]["TOKEN_COUNT"]
return int(value) if value is not None else None
def is_unknown_model_error(err: Exception) -> bool:
return "unknown model" in str(err).lower()
def is_not_allowed_error(err: Exception) -> bool:
return "not allowed" in str(err).lower()
if st.button("Calculate token count", type="primary"):
if not input_text.strip():
st.warning("Please enter some text.")
st.stop()
char_count = len(input_text)
st.metric("Character count", f"{char_count:,}")
try:
token_count = count_tokens(
ai_function, input_text, model_name=model,
src=source_lang, tgt=target_lang,
)
except SnowparkSQLException as e:
if is_unknown_model_error(e):
st.error(
"This model's tokenizer is not yet supported by `AI_COUNT_TOKENS`. "
"Coverage is expanding over time — try switching to a different model "
"from the sidebar, or fall back to the vendor's official tokenizer SDK."
)
elif is_not_allowed_error(e):
st.error(
"This account is not allowed to use this model. "
"Ask your ACCOUNTADMIN to check `CORTEX_MODELS_ALLOWLIST` and RBAC grants."
)
else:
st.error(f"AI_COUNT_TOKENS failed: {e}")
st.stop()
if token_count is None:
st.warning(
"This model returned NULL from `AI_COUNT_TOKENS`. "
"Coverage is expanding over time — please try another model, "
"or use the vendor's official tokenizer SDK."
)
st.stop()
st.metric("Token count", f"{token_count:,}")
ratio = token_count / char_count if char_count > 0 else 0
st.metric("Token / Character ratio", f"{ratio:.2f}")
# Cost estimation using hard-coded prices
if model is not None and model in MODEL_CREDITS_PER_1M:
unit_price = MODEL_CREDITS_PER_1M[model]
credits_used = (token_count / 1_000_000) * unit_price
cost_usd = credits_used * credit_price
col1, col2, col3 = st.columns(3)
col1.metric("Unit price per 1M tokens", f"{unit_price:.2f} credits")
col2.metric("Credits used (estimate)", f"{credits_used:.6f}")
col3.metric("Estimated cost (USD)", f"${cost_usd:.6f}")
st.caption(
"Prices shown are representative values as of April 2026, based on input tokens only. "
"Actual charges also include output tokens and system-prompt overhead. "
"Please check the official Credit Consumption Table (PDF) for up-to-date pricing."
)
elif model is not None:
st.info(
f"No unit price is registered in the app dictionary for `{model}`. "
"See the [Credit Consumption Table (PDF)]"
"(https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) "
"for the latest pricing."
)
else:
st.info(
"This function (e.g. AI_SENTIMENT) doesn't require a model selection, "
"so model-level cost estimation is skipped."
)
Verification & Troubleshooting
Success example
With AI_COMPLETE + llama4-maverick, a short Japanese phrase returns 22 tokens, openai-gpt-5 returns 14, and claude-sonnet-4-5 returns 15 — the token count can vary significantly across models, which is particularly noticeable for Japanese.
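Because counts diverge this much, it's worth comparing models side by side before committing to one. A sketch of building the per-model queries with bind variables (model names are illustrative; the final dictionary comprehension only runs inside SiS where `session` exists):

```python
# Illustrative candidate models; swap in whatever your account supports.
CANDIDATES = ["llama4-maverick", "openai-gpt-5", "claude-sonnet-4-5"]
text = "Snowflake is awesome!"

# One parameterized AI_COUNT_TOKENS query per candidate model.
queries = [
    ("SELECT AI_COUNT_TOKENS('ai_complete', ?, ?) AS token_count", [m, text])
    for m in CANDIDATES
]
# In SiS:
#   counts = {m: session.sql(q, params=p).collect()[0]["TOKEN_COUNT"]
#             for m, (q, p) in zip(CANDIDATES, queries)}
```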
When you see unknown model errors or NULL responses
Some models don't yet have a tokenizer implementation on the AI_COUNT_TOKENS side, so calls can return an unknown model error or NULL. Support is expanding over time, so the pragmatic path is to either use a different model for an estimate or fall back to the vendor's official tokenizer SDK. If you try again later, you may find that the model you gave up on has quietly started working.
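Until support lands for a given model, one pragmatic stopgap is a character-based heuristic. The sketch below rests on a deliberately crude assumption — the real characters-per-token ratio varies by model and language, as the examples above show — so treat its output as an order-of-magnitude estimate only:

```python
def rough_token_estimate(text: str, chars_per_token: float = 2.5) -> int:
    """Crude fallback when AI_COUNT_TOKENS rejects a model.

    chars_per_token is an assumed average: English text often runs around 4,
    Japanese considerably lower. Tune it per language if you rely on this.
    """
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))
```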
SQL examples with AI_COUNT_TOKENS
Stepping away from Streamlit, here are SQL patterns you can run directly — including AI_CLASSIFY and AI_SIMILARITY, which we deliberately left out of the app UI:
-- AI_COMPLETE input tokens (Llama 4 Maverick)
SELECT AI_COUNT_TOKENS(
'ai_complete',
'llama4-maverick',
'Please summarize the following passage in 50 words: ...'
) AS token_count;
-- AI_CLASSIFY including classification labels and task description
SELECT AI_COUNT_TOKENS(
'ai_classify',
'One day I will see the world',
[
{'label': 'travel', 'description': 'traveling related content'},
{'label': 'cooking', 'description': 'food preparation related content'}
],
{ 'task_description': 'Classify the topic of the input text' }
) AS token_count;
-- AI_SIMILARITY combined tokens of the two inputs
SELECT AI_COUNT_TOKENS(
'ai_similarity',
'Snowflake is a cloud data platform.',
'Snowflake is a cloud data warehouse and analytics platform.'
) AS token_count;
-- AI_TRANSLATE (ja -> en) input tokens
SELECT AI_COUNT_TOKENS(
'ai_translate',
'The quick brown fox jumps over the lazy dog.',
'ja',
'en'
) AS token_count;
For functions like AI_CLASSIFY that take a lot of auxiliary input (labels, task descriptions, examples), those auxiliary inputs often dominate the token budget rather than the prompt text itself. AI_COUNT_TOKENS lets you capture the real input token count up front, which is invaluable when tuning prompts.
Conclusion
In Japanese workloads in particular, the gap between character count and token count is large, which makes cost and context-window estimation tricky. Keeping this app handy in SiS gives you an instant way to sanity-check prompts and choose RAG chunk sizes.
Because the model list is fetched dynamically, the app stays up to date the day a new model is released. And since AI_COUNT_TOKENS itself only incurs compute cost, not token-based billing, you can experiment freely before rolling out to production workloads.
Announcements
Snowflake What's New Updates on X
I'm sharing Snowflake's What's New updates on X. Please feel free to follow if you're interested!
English Version
Snowflake What's New Bot (English Version)
https://x.com/snow_new_en
Japanese Version
Snowflake What's New Bot (Japanese Version)
https://x.com/snow_new_jp
Change History
(20240709) Initial post
(20240830) Overall formatting improvements, updated Cortex LLM models, updated code
(20240901) Added announcements section
(20240926) Updated announcements section, added Cortex LLM models llama3.2-3b llama3.2-1b jamba-1.5-large jamba-1.5-mini
(20241105) Updated pricing for Cortex LLM models llama3.2-3b llama3.2-1b jamba-1.5-large jamba-1.5-mini as pricing was published
(20250110) Updated announcements section, added Cortex LLM model claude-3-5-sonnet
(20250111) Added notes about Python library dependencies
(20250119) Added Cortex LLM models snowflake-llama-3.3-70b snowflake-llama-3.1-405b
(20250228) Removed unnecessary libraries, added Cortex LLM models llama3.3-70b deepseek-r1, reorganized LLM model list
(20250508) Updated announcements
(20250510) Added Cortex LLM model claude-3-7-sonnet, updated deepseek-r1 pricing due to price reduction, replaced outdated image
(20250524) Added Cortex LLM model claude-4-sonnet (Note: at the time, all Claude models were unsupported by COUNT_TOKENS), added Cortex LLM models llama4-maverick llama4-scout, added Cortex EMBED TEXT models e5-base-v2 snowflake-arctic-embed-m snowflake-arctic-embed-m-v1.5 snowflake-arctic-embed-l-v2.0 nv-embed-qa-4 multilingual-e5-large voyage-multilingual-2, added Cortex task-specific models extract_answer sentiment summarize translate
(20260423) Major revision. Switched to AI_COUNT_TOKENS; expanded coverage to the Cortex AI Functions family (AI_COMPLETE / AI_EMBED / AI_SENTIMENT / AI_TRANSLATE); model list is now fetched dynamically from SHOW MODELS IN SNOWFLAKE.MODELS; reflected that many Claude models are now supported by AI_COUNT_TOKENS; added graceful handling for unknown model, NULL, and permission errors; adopted bind variables; updated role from "Sales Engineer" to "Solutions Engineer".
