Introduction
I'm excited to share insights about Snowflake's latest AI_EMBED function, a new addition to Cortex AISQL! As the successor to the traditional EMBED_TEXT_768 and EMBED_TEXT_1024 functions, AI_EMBED introduces a game-changing capability: unified vectorization of both text and images using a single function.
Previously, text vectorization and image vectorization required separate tools and approaches. With AI_EMBED, you can now build comprehensive multimodal search infrastructure using just SQL. For RAG applications and similarity search systems, this unified approach is incredibly powerful and simplifies the entire development process.
If you're building AI applications that need to handle both text and visual content, this feature will transform how you approach multimodal data processing!
Note (2025/7/26): The AI_EMBED function is currently in public preview, so its features may change significantly in the future.
Note: This article represents my personal views and not those of Snowflake.
Understanding Snowflake Cortex AISQL
Snowflake Cortex AISQL provides a comprehensive set of functions that let you call AI functionality directly from SQL. A good example is the AI_COMPLETE function, which demonstrates the unified approach by processing both text and images through the same function interface:
-- Text processing
SELECT AI_COMPLETE('llama4-maverick', 'Explain the key features of Snowflake');
-- Image processing (same function!)
SELECT AI_COMPLETE('llama4-maverick', 'Describe this image', TO_FILE('@image_stage', 'dog.jpeg'));
Image processed by AI_COMPLETE function (Generated by Google Gemini)
-- Text processing result
"Snowflake is a cloud-based data warehouse solution with the following key features..."
-- Image processing result
This image shows a close-up of a dog's face with white fur and large eyes. The dog has its mouth open...
This same unified multimodal processing capability is now available for vectorization through the AI_EMBED function.
Previous Vectorization Approaches
To better understand AI_EMBED's value, let's review traditional vectorization methods in Snowflake. Previously available embedding functions included:
EMBED_TEXT_768
EMBED_TEXT_1024
For detailed analysis of these functions and their performance characteristics, I covered them extensively in my previous article about Snowflake vectorization options.
The key limitation was that text vectorization used these dedicated functions, while image vectorization required external tools or services, creating a fragmented development experience.
AI_EMBED Function Features
Unified Interface
The primary advantage of AI_EMBED is processing both text and images with the same function. This unified approach delivers several benefits:
- Simplified Learning Curve: No need to master multiple functions or methods for different data types
- Consistent Model Interface: Same function works across different embedding models
- Streamlined Data Governance: All vectorization processing happens within Snowflake's secure environment
- Easy Migration: Similar syntax to existing embedding functions enables smooth transitions
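To illustrate the migration point, here is a minimal sketch that compares the two call styles side by side. It assumes the model you already use with EMBED_TEXT_1024 is also offered by AI_EMBED (snowflake-arctic-embed-l-v2.0 is used here because it appears in the AI_EMBED model list); the input string is only a placeholder.

```sql
-- Before: dedicated text-embedding function (model name, then text)
SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
    'snowflake-arctic-embed-l-v2.0',
    'Placeholder text to embed'
) AS embedding_old_style;

-- After: AI_EMBED takes the same two arguments in the same order
SELECT AI_EMBED(
    'snowflake-arctic-embed-l-v2.0',
    'Placeholder text to embed'
) AS embedding_new_style;
```

Because the argument order is the same, a migration of this kind is mostly a rename of the function call. If you switch models at the same time, vectors stored from the old model should be regenerated, since embeddings from different models are not comparable.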
Available Models
AI_EMBED supports the following models:
Text Models
snowflake-arctic-embed-l-v2.0
snowflake-arctic-embed-l-v2.0-8k
nv-embed-qa-4
multilingual-e5-large
voyage-multilingual-2
Image Model
voyage-multimodal-3
Important: Only voyage-multimodal-3 supports image vectorization. Interestingly, this image model can also process text data effectively.
Model Characteristics
Understanding model selection is crucial for optimal results:
- snowflake-arctic-embed-l-v2.0-8k: Supports up to 8,000 tokens, ideal for technical documents and long articles. This extended context can potentially eliminate chunking preprocessing for certain documents
- nv-embed-qa-4: English-only model, not suitable for multilingual environments
- Other models: Multilingual support with excellent performance across various languages
Choose snowflake-arctic-embed-l-v2.0-8k for long-form content and any model except nv-embed-qa-4 for multilingual applications.
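As a rough sketch of what this model choice looks like in practice, the query below embeds whole documents with the long-context model. The table support_articles and its columns article_id and body are hypothetical names used only for illustration.

```sql
-- Hypothetical table: support_articles(article_id, body)
-- Long-form content: embed each full document with the 8k-context model
-- instead of splitting it into chunks first.
SELECT
    article_id,
    AI_EMBED('snowflake-arctic-embed-l-v2.0-8k', body) AS body_embedding
FROM support_articles;
```

Documents that exceed the model's token limit would still need to be chunked beforehand, so this only removes the chunking step for content that fits within the context window.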
Multimodal Vectorization Value
Business Value
Multimodal vectorization delivers substantial business benefits:
- Enhanced Search Accuracy: Unified text and image search reveals related content that traditional keyword searches miss
- Improved Customer Experience: Enables intuitive experiences like image-based product search or text-to-image discovery
- Operational Efficiency: Centralized management and search across documents, diagrams, photos, and notes significantly reduces information access time
- New Business Models: Enables previously impossible multimodal search services and recommendation engines
Technical Value
The technical advantages are equally compelling:
- Data Silo Elimination: Solves the problem of text and image data being managed in separate systems by placing both in a unified vector space
- Reduced Development Costs: A single-platform approach with Snowflake reduces system complexity compared to combining multiple specialized tools
- Scalability: Snowflake's cloud-native architecture efficiently handles large-scale multimodal data processing
- Security & Governance: Complete data and vectorization processing within Snowflake enables centralized governance management
Business Use Cases
AI_EMBED enables powerful business applications:
1. Multimodal Search Systems
Build e-commerce platforms where customers can search for similar products using both product images and text descriptions.
2. Content Management Systems
Create enterprise CMS solutions that enable unified search and classification of documents and visual assets.
3. Customer Support Enhancement
Develop systems that analyze both inquiry text and attached images to provide comprehensive, context-aware responses.
4. RAG Chatbots
Build enterprise chatbots that search across both textual documents and visual content to incorporate domain knowledge into LLM responses.
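The search-oriented use cases above share the same basic shape: embed the assets once, store the vectors, then rank them against an embedded query. Below is a minimal sketch of that pattern for images. It assumes the stage @image_stage has a directory table enabled, and the table name image_embeddings and the query text are hypothetical.

```sql
-- Step 1: embed every file on the stage once and store the vectors.
-- CREATE TABLE AS SELECT lets Snowflake infer the VECTOR column type
-- from AI_EMBED's return value.
CREATE OR REPLACE TABLE image_embeddings AS
SELECT
    relative_path,
    AI_EMBED('voyage-multimodal-3', TO_FILE('@image_stage', relative_path)) AS embedding
FROM DIRECTORY(@image_stage);

-- Step 2: rank stored images against a text query.
-- The query must be embedded with the same model as the stored vectors.
SELECT
    relative_path,
    VECTOR_COSINE_SIMILARITY(
        embedding,
        AI_EMBED('voyage-multimodal-3', 'white dog with large eyes')
    ) AS similarity
FROM image_embeddings
ORDER BY similarity DESC
LIMIT 5;
```

Storing the embeddings avoids re-embedding every image on each search; only the query text is embedded at request time. For a RAG chatbot, the top-ranked rows would then be passed to the LLM as context.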
Practical Implementation Examples
Text Vectorization
Basic text vectorization is straightforward:
-- Text vectorization
SELECT AI_EMBED('snowflake-arctic-embed-l-v2.0-8k', 'Snowflake Summit 2025 introduced many exciting new features');
-- Text vectorization result
[0.001018,0.002565,-0.024353,0.004829, ...]
Image Vectorization
Image vectorization uses the same function with proper file handling:
-- Image vectorization
SELECT AI_EMBED('voyage-multimodal-3', TO_FILE('@image_stage', 'dog.jpeg'));
-- Image vectorization result
[-0.015381,0.008240,-0.012634,-0.024048, ...]
Vector Similarity Calculations
Vectors generated by AI_EMBED work directly with Snowflake's vector similarity functions, such as VECTOR_COSINE_SIMILARITY. Here are three fundamental patterns:
1. Text-to-Text Similarity
-- Text similarity calculation
SELECT VECTOR_COSINE_SIMILARITY(
    AI_EMBED('snowflake-arctic-embed-l-v2.0', 'Beautiful sunny weather today'),
    AI_EMBED('snowflake-arctic-embed-l-v2.0', 'Today is blessed with great climate')
) AS text_similarity;
-- Text similarity result
0.8324767643
2. Image-to-Image Similarity
-- Image similarity calculation
SELECT VECTOR_COSINE_SIMILARITY(
    AI_EMBED('voyage-multimodal-3', TO_FILE('@image_stage', 'cat.jpeg')),
    AI_EMBED('voyage-multimodal-3', TO_FILE('@image_stage', 'dog.jpeg'))
) AS image_similarity;
-- Image similarity result
0.5069280956
3. Cross-Modal Text-to-Image Similarity
-- Cross-modal similarity calculation
SELECT VECTOR_COSINE_SIMILARITY(
    AI_EMBED('voyage-multimodal-3', 'Close-up of a dog face with white fur and large eyes'),
    AI_EMBED('voyage-multimodal-3', TO_FILE('@image_stage', 'dog.jpeg'))
) AS cross_modal_similarity;
-- Cross-modal similarity result
0.6030817788
Cosine similarity ranges from -1 to 1, with values closer to 1 indicating higher similarity.
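To make that range concrete, here is a small sketch using hand-written two-dimensional vectors instead of real embeddings. Cosine similarity is the dot product of the two vectors divided by the product of their lengths, so it depends only on direction, not magnitude. The vectors below are illustrative values, not model output.

```sql
SELECT
    -- Same direction, different length: mathematically 1
    VECTOR_COSINE_SIMILARITY(
        [1, 0]::VECTOR(FLOAT, 2), [2, 0]::VECTOR(FLOAT, 2)
    ) AS same_direction,
    -- Perpendicular vectors: mathematically 0
    VECTOR_COSINE_SIMILARITY(
        [1, 0]::VECTOR(FLOAT, 2), [0, 1]::VECTOR(FLOAT, 2)
    ) AS orthogonal,
    -- Opposite direction: mathematically -1
    VECTOR_COSINE_SIMILARITY(
        [1, 0]::VECTOR(FLOAT, 2), [-1, 0]::VECTOR(FLOAT, 2)
    ) AS opposite_direction;
```

Real embeddings rarely reach these extremes, which is why the results in the examples above fall somewhere in between. Scores are best compared against each other within the same model rather than read as absolute measures.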
Summary
AI_EMBED represents a significant advancement in Snowflake's vectorization capabilities. The unified interface for processing both text and images makes developing multimodal AI applications more accessible and efficient.
Migration from the existing EMBED_TEXT_1024 function is straightforward because the syntax is similar, which allows applications to be upgraded gradually. As data workloads increasingly involve mixed text and image content, AI_EMBED provides the foundation for building next-generation data utilization platforms efficiently.
The future of search and AI applications is multimodal, and AI_EMBED positions Snowflake users to capitalize on these emerging opportunities. I encourage you to explore AI_EMBED and discover new possibilities for your business applications!
What multimodal use cases are you most excited to build with AI_EMBED? Share your thoughts in the comments below!
Change Log
(20250726) Initial post