⚠️ Major revision notice (April 2026)
More than 18 months have passed since the initial July 2024 post, and Snowflake's Cortex AI Functions have evolved dramatically. This article has been fully rewritten in April 2026. Key changes:
- `SNOWFLAKE.CORTEX.COUNT_TOKENS` has been superseded by `AI_COUNT_TOKENS`, which is now GA. This article is now built around the new function.
- Replaced `SNOWFLAKE.CORTEX.COMPLETE` with `AI_COMPLETE`, and added support for the new Cortex AI Functions family (`AI_EMBED`, `AI_SENTIMENT`, `AI_TRANSLATE`, `AI_CLASSIFY`, etc.).
- The model list is now fetched dynamically from `SHOW MODELS IN SNOWFLAKE.MODELS`.
- The Streamlit code has been rewritten to use bind variables (parameter binding) to prevent SQL injection.
Introduction
Hello, I'm a Solutions Engineer at Snowflake. I'd like to share some of my experiences and experiments with you through various posts.
Back in 2024, the common need was simply "let me quickly check how many tokens a prompt uses for Cortex LLM." In 2026, with the rapid spread of RAG (Retrieval Augmented Generation) and AI Agent workloads, estimating the total token count — including prompts, context, classification categories, source/target languages, and more — has become a critical concern in real-world operations.
The recently GA'd AI_COUNT_TOKENS lets you estimate token counts more accurately on a per-function basis. In this article, I'll walk you through building a Streamlit in Snowflake (SiS) app that helps you check token counts and costs at a glance using this new function.
Note: This post represents my personal views and not those of Snowflake.
Note (April 2026): Some features covered here are in Preview or subject to change. Models and credit prices are added, removed, and adjusted frequently, so please always refer to the latest official documentation and the Snowflake Credit Consumption Table (PDF).
What is Streamlit in Snowflake (SiS)?
Streamlit is a Python library that lets you build web UIs with simple Python code — no HTML, CSS, or JavaScript required. Take a look at the App Gallery to get a feel for what's possible.
Streamlit in Snowflake (SiS for short) lets you develop and run Streamlit web apps directly on Snowflake. It's easy to get started with just a Snowflake account, and the biggest advantage is how naturally you can embed Snowflake tables and Cortex AI functions into your apps.
About Streamlit in Snowflake (Official Snowflake Documentation)
What are Snowflake Cortex AI and Cortex AI Functions?
Snowflake Cortex AI is the umbrella term for Snowflake's generative AI capabilities. Within that, Cortex AI Functions let you call LLMs, embedding models, and task-specific functions as one-line SQL functions running on Snowflake. They're also available from Python.
As of 2026, the main functions include:
| Function | Purpose |
|---|---|
| `AI_COMPLETE` | General-purpose LLM generation (summarization, Q&A, classification, code generation, anything) |
| `AI_EMBED` | Convert text (or images) into embedding vectors |
| `AI_CLASSIFY` | Classify text against user-defined label candidates |
| `AI_SENTIMENT` | Sentiment analysis (positive / negative / neutral, etc.) |
| `AI_SIMILARITY` | Numeric similarity between two inputs |
| `AI_TRANSLATE` | Multilingual translation |
| `AI_AGG` / `AI_SUMMARIZE_AGG` / `AI_FILTER` / `AI_EXTRACT` / `AI_TRANSCRIBE` / `AI_PARSE_DOCUMENT` / `AI_REDACT`, etc. | Aggregation, summarization, filtering, extraction, audio transcription, PDF parsing, PII masking |
The legacy `SNOWFLAKE.CORTEX.COMPLETE` and `SNOWFLAKE.CORTEX.COUNT_TOKENS` are still available, but Snowflake officially recommends using the new `AI_`-prefixed functions.
Snowflake Cortex AI Functions (Official Documentation)
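To make the "one-line SQL function" point concrete, here is a minimal sketch of issuing such a call from Python with Snowpark bind variables. The model name is just an example, and the helper function is mine, not part of any Snowflake API; inside Streamlit in Snowflake you would already have a `session` from `get_active_session()`.

```python
def ai_complete_call(model: str, prompt: str) -> tuple[str, list[str]]:
    """Build a parameterized AI_COMPLETE statement (bind variables, not f-strings)."""
    return "SELECT AI_COMPLETE(?, ?) AS response", [model, prompt]

sql, params = ai_complete_call(
    "llama4-maverick",
    "Summarize Snowflake Cortex AI in one sentence.",
)
# In SiS you would then run:
#   session.sql(sql, params=params).collect()
```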
Functions supported by AI_COUNT_TOKENS
AI_COUNT_TOKENS supports the following 6 Cortex AI Functions (explicitly listed in the official release notes):
| First argument | Purpose |
|---|---|
| `ai_complete` | Tokens in an LLM prompt (requires a model name) |
| `ai_embed` | Tokens fed into an embedding model (requires a model name) |
| `ai_classify` | Tokens including classification categories, task descriptions, and examples |
| `ai_sentiment` | Tokens including the system prompt used for sentiment analysis |
| `ai_similarity` | Combined tokens of the two inputs to similarity |
| `ai_translate` | Tokens including source/target language specifications |
To keep the UI simple, this app targets AI_COMPLETE, AI_EMBED, AI_SENTIMENT, and AI_TRANSLATE (I've included SQL examples for AI_CLASSIFY and AI_SIMILARITY since their inputs are more complex).
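The reason the UI only exposes a subset is that the positional arguments after the function name differ per function. A small sketch of those shapes (mirroring the table above; the argument labels and the helper are illustrative, not an official API):

```python
# Positional arguments that follow the function name in AI_COUNT_TOKENS.
# "model" means a model name is required; labels here are illustrative only.
COUNT_TOKENS_ARGS = {
    "ai_complete":   ["model", "text"],
    "ai_embed":      ["model", "text"],
    "ai_classify":   ["text", "categories"],
    "ai_sentiment":  ["text"],
    "ai_similarity": ["text", "text2"],
    "ai_translate":  ["text", "source_lang", "target_lang"],
}

def count_tokens_sql(function_name: str) -> str:
    """Build 'SELECT AI_COUNT_TOKENS(?, ...)' with one bind slot per argument."""
    n_args = len(COUNT_TOKENS_ARGS[function_name])
    placeholders = ", ".join(["?"] * (n_args + 1))  # +1 for the function name itself
    return f"SELECT AI_COUNT_TOKENS({placeholders}) AS token_count"
```

Note that `ai_classify` takes an array of labels plus an options object, which does not bind cleanly as a scalar parameter — another reason it stays out of the app UI.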
Supported models keep expanding
AI_COUNT_TOKENS already covers a wide range of models — Llama, Mistral, DeepSeek, Snowflake Arctic families, and increasingly OpenAI GPT-5, Claude 4, 4.5, 4.6, 3.7, and more. The really exciting part is that many Claude models — claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5, claude-4-sonnet, claude-3-7-sonnet, and so on — can now be counted with AI_COUNT_TOKENS. If you'd given up on Claude because "we can't see token counts," this is great news.
That said, some freshly released or newer-generation models haven't caught up on the AI_COUNT_TOKENS side yet, and you may still see unknown model errors or NULL returns. The good news is this coverage is expanding steadily — it's not uncommon for a model that was unsupported a short while ago to quietly start working. When you're curious about a specific model, try a quick SQL like the following in your own environment:
SELECT AI_COUNT_TOKENS('ai_complete', 'claude-opus-4-6', 'Snowflake is awesome!');
SELECT AI_COUNT_TOKENS('ai_complete', 'llama4-maverick', 'Snowflake is awesome!');
SELECT AI_COUNT_TOKENS('ai_embed', 'snowflake-arctic-embed-l-v2.0', 'Snowflake is awesome!');
Since some models are explicitly unsupported and others will gradually be added, the app is designed to surface a helpful message whenever an error comes back (more on that later).
AI_COUNT_TOKENS Official Documentation
Fetching the model list dynamically
The previous version of this article hard-coded supported models, but Cortex AI Functions' supported model list changes on a monthly cadence — information goes stale the moment you publish. This app instead uses Snowflake's built-in mechanism to fetch the latest model list at app startup.
SHOW MODELS IN SNOWFLAKE.MODELS
Snowflake exposes Cortex-supported models as Snowflake objects, so you can list them with:
-- Run once as ACCOUNTADMIN (re-run whenever new models are released)
CALL SNOWFLAKE.MODELS.CORTEX_BASE_MODELS_REFRESH();
-- Then list them
SHOW MODELS IN SNOWFLAKE.MODELS;
Cortex model RBAC (Official Documentation)
Pricing is hard-coded
Unit prices could be derived dynamically from account usage history, but if you're new to Snowflake that history doesn't exist yet, which would silently break the app. To keep the app reliable, I hard-coded a dictionary of representative model prices. For accurate, up-to-date prices, always refer to the Credit Consumption Table (PDF).
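To make the cost formula concrete: the estimate is simply tokens ÷ 1,000,000 × credits-per-1M-tokens × your price per credit. A worked example with illustrative numbers (0.25 credits/1M is just the sample value used in this app's dictionary, not an official price):

```python
tokens = 10_000
credits_per_1m = 0.25   # illustrative unit price from the app's dictionary
usd_per_credit = 2.0    # your contract price per Snowflake credit

credits_used = tokens / 1_000_000 * credits_per_1m  # 0.0025 credits
cost_usd = credits_used * usd_per_credit            # 0.005 USD
```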
Feature Overview
(Screenshot: the token counter app running in Streamlit in Snowflake)
The sample text shown in the screenshot is borrowed from Alice's Adventures in Wonderland by Lewis Carroll, a public-domain work available on Project Gutenberg.
Features
- Choose a Cortex AI Function (`AI_COMPLETE` / `AI_EMBED` / `AI_SENTIMENT` / `AI_TRANSLATE`)
- Model list is fetched dynamically via `SHOW MODELS`
- Additional inputs (model, source/target language, etc.) shown based on the selected function
- Display character and token counts for the input text
- Show the token / character ratio
- Multiply by the model's unit price for an estimated cost
- Gracefully handle cases where `AI_COUNT_TOKENS` doesn't support a model
Prerequisites
- A Snowflake account with Cortex AI Functions enabled (regional availability)
- The executing role has the `SNOWFLAKE.CORTEX_USER` database role
- Permission to run `SHOW MODELS IN SNOWFLAKE.MODELS` (ACCOUNTADMIN has executed `CORTEX_BASE_MODELS_REFRESH()` at least once)
- Python 3.9 or later (the SiS default runtime works fine)
Source Code
import streamlit as st
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.exceptions import SnowparkSQLException
# Page configuration
st.set_page_config(layout="wide", page_title="AI Token Counter")
# Get current session
session = get_active_session()
# Application title
st.title("Snowflake Cortex AI Token Counter")
st.caption("Estimate token counts and costs for Cortex AI Functions using AI_COUNT_TOKENS.")
# -----------------------------
# Fetch model list dynamically from SHOW MODELS (cached)
# -----------------------------
@st.cache_data(ttl=3600)
def get_available_models() -> list[str]:
"""Return the list of Cortex models registered under SNOWFLAKE.MODELS."""
try:
rows = session.sql("SHOW MODELS IN SNOWFLAKE.MODELS").collect()
return sorted({r["name"].lower() for r in rows})
except SnowparkSQLException as e:
st.warning(
"Could not list models from SNOWFLAKE.MODELS. "
"Make sure ACCOUNTADMIN has run "
"`CALL SNOWFLAKE.MODELS.CORTEX_BASE_MODELS_REFRESH();` "
"and that the current role has the appropriate privileges."
f"\n\nError: {e}"
)
return []
# -----------------------------
# Credits per 1M tokens — representative values as of April 2026.
# Always cross-check with the official Credit Consumption Table (PDF):
# https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf
# -----------------------------
MODEL_CREDITS_PER_1M = {
# Claude family
"claude-opus-4-7": 12.75,
"claude-sonnet-4-6": 2.55,
"claude-opus-4-6": 12.75,
"claude-sonnet-4-5": 2.55,
"claude-opus-4-5": 12.75,
"claude-haiku-4-5": 0.68,
"claude-4-opus": 12.75,
"claude-4-sonnet": 2.55,
"claude-3-7-sonnet": 2.55,
"claude-3-5-sonnet": 2.55,
# OpenAI family
"openai-gpt-5.4": 1.90,
"openai-gpt-5.2": 1.90,
"openai-gpt-5.1": 1.90,
"openai-gpt-5": 1.90,
"openai-gpt-5-mini": 0.40,
"openai-gpt-5-nano": 0.10,
"openai-gpt-5-chat": 1.90,
"openai-gpt-4.1": 1.60,
"openai-o4-mini": 0.40,
# Gemini family
"gemini-3.1-pro": 4.84,
"gemini-2.5-flash": 0.30,
"gemini-2.5-flash-lite": 0.10,
# Llama family
"llama4-maverick": 0.25,
"llama4-scout": 0.14,
"llama3.3-70b": 1.21,
"llama3.1-405b": 3.00,
"llama3.1-70b": 1.21,
"llama3.1-8b": 0.19,
# Mistral family
"mistral-large2": 1.95,
"mixtral-8x7b": 0.22,
"mistral-7b": 0.12,
# Others
"deepseek-r1": 1.03,
"snowflake-llama-3.3-70b": 0.29,
"snowflake-llama-3.1-405b": 0.96,
"snowflake-arctic": 0.84,
# Embedding models
"snowflake-arctic-embed-l-v2.0": 0.05,
"snowflake-arctic-embed-l-v2.0-8k": 0.05,
"snowflake-arctic-embed-m-v1.5": 0.03,
"snowflake-arctic-embed-m": 0.03,
"multilingual-e5-large": 0.05,
"voyage-multilingual-2": 0.07,
"nv-embed-qa-4": 0.05,
"e5-base-v2": 0.03,
}
# Embedding models for AI_EMBED
# SNOWFLAKE.MODELS only lists CORTEX_BASE LLMs, so embedding models need to be
# listed explicitly here (they change infrequently, so hard-coding is fine).
EMBED_MODELS = [
"snowflake-arctic-embed-l-v2.0",
"snowflake-arctic-embed-m-v1.5",
"snowflake-arctic-embed-m",
"voyage-multilingual-2",
"nv-embed-qa-4",
"e5-base-v2",
"multilingual-e5-large",
]
# Preferred default models (known to work with AI_COUNT_TOKENS)
PREFERRED_COMPLETE_MODEL = "llama4-maverick"
PREFERRED_EMBED_MODEL = "snowflake-arctic-embed-l-v2.0"
# -----------------------------
# Sidebar: choose a Cortex AI Function and model
# -----------------------------
st.sidebar.title("Cortex AI Functions settings")
ai_function = st.sidebar.selectbox(
"Select the function you want to use",
("ai_complete", "ai_embed", "ai_sentiment", "ai_translate"),
)
all_models = get_available_models()
def filter_models_for_function(function_name: str, models: list[str]) -> list[str]:
"""Return candidate models depending on the function."""
if function_name == "ai_embed":
# Embedding models are not registered in SNOWFLAKE.MODELS, so use our list
return EMBED_MODELS
if function_name == "ai_complete":
return [m for m in models
if ("embed" not in m)
and ("guard" not in m)
and not m.startswith("arctic-")
and not m.startswith("twelvelabs-")]
return []
model = None
source_lang = None
target_lang = None
if ai_function in ("ai_complete", "ai_embed"):
candidates = filter_models_for_function(ai_function, all_models)
preferred = PREFERRED_COMPLETE_MODEL if ai_function == "ai_complete" else PREFERRED_EMBED_MODEL
default_idx = candidates.index(preferred) if preferred in candidates else 0
if candidates:
model = st.sidebar.selectbox("Model", candidates, index=default_idx)
else:
st.sidebar.warning("No models available for this function.")
elif ai_function == "ai_translate":
col1, col2 = st.sidebar.columns(2)
with col1:
source_lang = st.text_input("Source language (e.g. ja)", value="ja")
with col2:
target_lang = st.text_input("Target language (e.g. en)", value="en")
# ai_sentiment does not require a model or extra input
# -----------------------------
# Input area
# -----------------------------
st.header("Token count & cost estimation")
input_text = st.text_area(
"Enter the text you want to check for token count",
height=200,
placeholder="e.g. The quick brown fox jumps over the lazy dog.",
)
credit_price = st.number_input(
"Price per Snowflake credit (USD)",
min_value=0.0,
value=2.0,
step=0.01,
)
# -----------------------------
# Count tokens safely using bind variables
# -----------------------------
def count_tokens(function_name: str, text: str, model_name: str | None = None,
src: str | None = None, tgt: str | None = None):
if function_name in ("ai_complete", "ai_embed"):
sql = "SELECT AI_COUNT_TOKENS(?, ?, ?) AS token_count"
params = [function_name, model_name, text]
elif function_name == "ai_translate":
sql = "SELECT AI_COUNT_TOKENS(?, ?, ?, ?) AS token_count"
params = [function_name, text, src, tgt]
elif function_name == "ai_sentiment":
sql = "SELECT AI_COUNT_TOKENS(?, ?) AS token_count"
params = [function_name, text]
else:
raise ValueError(f"Unsupported function: {function_name}")
result = session.sql(sql, params=params).collect()
value = result[0]["TOKEN_COUNT"]
return int(value) if value is not None else None
def is_unknown_model_error(err: Exception) -> bool:
return "unknown model" in str(err).lower()
def is_not_allowed_error(err: Exception) -> bool:
return "not allowed" in str(err).lower()
if st.button("Calculate token count", type="primary"):
if not input_text.strip():
st.warning("Please enter some text.")
st.stop()
char_count = len(input_text)
st.metric("Character count", f"{char_count:,}")
try:
token_count = count_tokens(
ai_function, input_text, model_name=model,
src=source_lang, tgt=target_lang,
)
except SnowparkSQLException as e:
if is_unknown_model_error(e):
st.error(
"This model's tokenizer is not yet supported by `AI_COUNT_TOKENS`. "
"Coverage is expanding over time — try switching to a different model "
"from the sidebar, or fall back to the vendor's official tokenizer SDK."
)
elif is_not_allowed_error(e):
st.error(
"This account is not allowed to use this model. "
"Ask your ACCOUNTADMIN to check `CORTEX_MODELS_ALLOWLIST` and RBAC grants."
)
else:
st.error(f"AI_COUNT_TOKENS failed: {e}")
st.stop()
if token_count is None:
st.warning(
"This model returned NULL from `AI_COUNT_TOKENS`. "
"Coverage is expanding over time — please try another model, "
"or use the vendor's official tokenizer SDK."
)
st.stop()
st.metric("Token count", f"{token_count:,}")
ratio = token_count / char_count if char_count > 0 else 0
st.metric("Token / Character ratio", f"{ratio:.2f}")
# Cost estimation using hard-coded prices
if model is not None and model in MODEL_CREDITS_PER_1M:
unit_price = MODEL_CREDITS_PER_1M[model]
credits_used = (token_count / 1_000_000) * unit_price
cost_usd = credits_used * credit_price
col1, col2, col3 = st.columns(3)
col1.metric("Unit price per 1M tokens", f"{unit_price:.2f} credits")
col2.metric("Credits used (estimate)", f"{credits_used:.6f}")
col3.metric("Estimated cost (USD)", f"${cost_usd:.6f}")
st.caption(
"Prices shown are representative values as of April 2026, based on input tokens only. "
"Actual charges also include output tokens and system-prompt overhead. "
"Please check the official Credit Consumption Table (PDF) for up-to-date pricing."
)
elif model is not None:
st.info(
f"No unit price is registered in the app dictionary for `{model}`. "
"See the [Credit Consumption Table (PDF)]"
"(https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) "
"for the latest pricing."
)
else:
st.info(
"This function (e.g. AI_SENTIMENT) doesn't require a model selection, "
"so model-level cost estimation is skipped."
)
Verification & Troubleshooting
Success example
With AI_COMPLETE + llama4-maverick, a short Japanese phrase returns 22 tokens, openai-gpt-5 returns 14, and claude-sonnet-4-5 returns 15 — the token count can vary significantly across models, which is particularly noticeable for Japanese.
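Because counts diverge this much, it's worth comparing models side by side before committing to one. A sketch of building the per-model queries with bind variables (model names are illustrative; the final dictionary comprehension only runs inside SiS where `session` exists):

```python
# Illustrative candidate models; swap in whatever your account supports.
CANDIDATES = ["llama4-maverick", "openai-gpt-5", "claude-sonnet-4-5"]
text = "Snowflake is awesome!"

# One parameterized AI_COUNT_TOKENS query per candidate model.
queries = [
    ("SELECT AI_COUNT_TOKENS('ai_complete', ?, ?) AS token_count", [m, text])
    for m in CANDIDATES
]
# In SiS:
#   counts = {m: session.sql(q, params=p).collect()[0]["TOKEN_COUNT"]
#             for m, (q, p) in zip(CANDIDATES, queries)}
```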
When you see unknown model errors or NULL responses
Some models don't yet have a tokenizer implementation on the AI_COUNT_TOKENS side, so calls can return an unknown model error or NULL. Support is expanding over time, so the pragmatic path is to either use a different model for an estimate or fall back to the vendor's official tokenizer SDK. If you try again later, you may find that the model you gave up on has quietly started working.
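Until support lands for a given model, one pragmatic stopgap is a character-based heuristic. The sketch below rests on a deliberately crude assumption — the real characters-per-token ratio varies by model and language, as the examples above show — so treat its output as an order-of-magnitude estimate only:

```python
def rough_token_estimate(text: str, chars_per_token: float = 2.5) -> int:
    """Crude fallback when AI_COUNT_TOKENS rejects a model.

    chars_per_token is an assumed average: English text often runs around 4,
    Japanese considerably lower. Tune it per language if you rely on this.
    """
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))
```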
SQL examples with AI_COUNT_TOKENS
Stepping away from Streamlit, here are SQL patterns you can run directly — including AI_CLASSIFY and AI_SIMILARITY, which we deliberately left out of the app UI:
-- AI_COMPLETE input tokens (Llama 4 Maverick)
SELECT AI_COUNT_TOKENS(
'ai_complete',
'llama4-maverick',
'Please summarize the following passage in 50 words: ...'
) AS token_count;
-- AI_CLASSIFY including classification labels and task description
SELECT AI_COUNT_TOKENS(
'ai_classify',
'One day I will see the world',
[
{'label': 'travel', 'description': 'traveling related content'},
{'label': 'cooking', 'description': 'food preparation related content'}
],
{ 'task_description': 'Classify the topic of the input text' }
) AS token_count;
-- AI_SIMILARITY combined tokens of the two inputs
SELECT AI_COUNT_TOKENS(
'ai_similarity',
'Snowflake is a cloud data platform.',
'Snowflake is a cloud data warehouse and analytics platform.'
) AS token_count;
-- AI_TRANSLATE (ja -> en) input tokens
SELECT AI_COUNT_TOKENS(
'ai_translate',
'The quick brown fox jumps over the lazy dog.',
'ja',
'en'
) AS token_count;
For functions like AI_CLASSIFY that take a lot of auxiliary input (labels, task descriptions, examples), those auxiliary inputs often dominate the token budget rather than the prompt text itself. AI_COUNT_TOKENS lets you capture the real input token count up front, which is invaluable when tuning prompts.
Conclusion
In Japanese workloads in particular, the gap between character count and token count is large, which makes cost and context-window estimation tricky. Keeping this app handy in SiS gives you an instant way to sanity-check prompts and choose RAG chunk sizes.
Because the model list is fetched dynamically, the app stays up to date the day a new model is released. And since AI_COUNT_TOKENS itself only incurs compute cost, not token-based billing, you can experiment freely before rolling out to production workloads.
Announcements
Snowflake What's New Updates on X
I'm sharing Snowflake's What's New updates on X. Please feel free to follow if you're interested!
English Version
Snowflake What's New Bot (English Version)
https://x.com/snow_new_en
Japanese Version
Snowflake What's New Bot (Japanese Version)
https://x.com/snow_new_jp
Change History
(20240709) Initial post
(20240830) Overall formatting improvements, updated Cortex LLM models, updated code
(20240901) Added announcements section
(20240926) Updated announcements section, added Cortex LLM models llama3.2-3b llama3.2-1b jamba-1.5-large jamba-1.5-mini
(20241105) Updated pricing for Cortex LLM models llama3.2-3b llama3.2-1b jamba-1.5-large jamba-1.5-mini as pricing was published
(20250110) Updated announcements section, added Cortex LLM model claude-3-5-sonnet
(20250111) Added notes about Python library dependencies
(20250119) Added Cortex LLM models snowflake-llama-3.3-70b snowflake-llama-3.1-405b
(20250228) Removed unnecessary libraries, added Cortex LLM models llama3.3-70b deepseek-r1, reorganized LLM model list
(20250508) Updated announcements
(20250510) Added Cortex LLM model claude-3-7-sonnet, updated deepseek-r1 pricing due to price reduction, replaced outdated image
(20250524) Added Cortex LLM model claude-4-sonnet (Note: at the time, all Claude models were unsupported by COUNT_TOKENS), added Cortex LLM models llama4-maverick llama4-scout, added Cortex EMBED TEXT models e5-base-v2 snowflake-arctic-embed-m snowflake-arctic-embed-m-v1.5 snowflake-arctic-embed-l-v2.0 nv-embed-qa-4 multilingual-e5-large voyage-multilingual-2, added Cortex task-specific models extract_answer sentiment summarize translate
(20260423) Major revision. Switched to AI_COUNT_TOKENS; expanded coverage to the Cortex AI Functions family (AI_COMPLETE / AI_EMBED / AI_SENTIMENT / AI_TRANSLATE); model list is now fetched dynamically from SHOW MODELS IN SNOWFLAKE.MODELS; reflected that many Claude models are now supported by AI_COUNT_TOKENS; added graceful handling for unknown model, NULL, and permission errors; adopted bind variables; updated role from "Sales Engineer" to "Solutions Engineer".
