
Vinicius Fagundes

Snowflake Cortex: The AI Layer Your Data Team Actually Needs

TL;DR: A client wanted to build an "AI platform" with vector databases, separate LLM infrastructure, and an ML team. Three use cases. Three Cortex SQL functions. Two days instead of three months. Zero new infrastructure.


The Problem: AI Complexity Theater

The meeting lasted two hours.

"We need to leverage AI to understand customer sentiment. Here's our 18-month roadmap."

The roadmap included:

  • Building vector databases
  • Deploying LLM infrastructure
  • Hiring ML engineers
  • Creating data pipelines to separate systems
  • Setting up API orchestration
  • Monitoring and maintenance

The budget: $500k+
The timeline: 18 months

I asked a simple question: "What do you need to build?"

Their answer: "Analyze customer support tickets. Tell us what customers are saying."


The Uncomfortable Truth: You Already Have What You Need

Your data lives in Snowflake. Your analytics team knows SQL. Cortex is SQL functions.

The gap between "we want AI" and "we're using AI" isn't infrastructure. It's permission to start simple.

What Cortex Actually Is

Snowflake Cortex isn't a separate platform. It's native SQL functions that:

  • Don't require you to move data
  • Don't require new infrastructure
  • Don't require API keys to separate systems
  • Don't require ML expertise
  • Work inside your existing Snowflake warehouse

Three functions cover 80% of real-world use cases:

  1. CORTEX.SENTIMENT() — Understand emotion in text
  2. CORTEX.SUMMARIZE() — Extract meaning from long text
  3. CORTEX.EMBED_TEXT_768() — Create embeddings for semantic search
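
Here's a quick taste before the full walkthrough: a smoke test you can run as-is. The functions live in the SNOWFLAKE.CORTEX schema, and the embedding model name ('snowflake-arctic-embed-m') is one common option; model availability varies by account and region.

-- Smoke test: all three functions on a literal string
SELECT
    SNOWFLAKE.CORTEX.SENTIMENT('The camera is great but the battery is awful.') AS sentiment,
    SNOWFLAKE.CORTEX.SUMMARIZE('The camera is great but the battery is awful.') AS summary,
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
        'The camera is great but the battery is awful.') AS embedding;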

The Real Solution: Three Days of SQL

Let me show you exactly what we built.

The Setup: Realistic Sample Data

First, let's create realistic data:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Seed for reproducible sample data
np.random.seed(42)

# Sample customer feedback data
customer_feedback = {
    'ticket_id': range(1, 51),
    'customer_name': [f'Customer_{i}' for i in range(1, 51)],
    'product': np.random.choice(['iPhone Pro', 'iPhone Standard', 'Samsung Galaxy', 'Google Pixel'], 50),
    'region': np.random.choice(['North America', 'Europe', 'Asia Pacific', 'Latin America'], 50),
    'created_date': [datetime.now() - timedelta(days=i) for i in range(50)],
    'feedback_text': [
        "This phone is absolutely amazing! Best camera quality I've ever seen.",
        "Disappointed with battery life. Barely lasts a full day.",
        "Great design but too expensive for what you get.",
        "Customer service was terrible. Waited 3 hours on hold.",
        "Love the new features. Worth the upgrade.",
        "Screen is beautiful but the phone heats up too much.",
        "Terrible product. Stopped working after 2 months.",
        "Perfect phone. Highly recommend to everyone.",
        "Not bad, but expected better performance.",
        "Amazing value for money. Great purchase!",
        # ... 40 more varied reviews
    ] + [
        "Phone keeps crashing. Very frustrating.",
        "Best investment I've made in tech.",
        "Issues with Bluetooth connectivity.",
        "Excellent phone. Very satisfied.",
        "Overpriced for average features.",
        "Great phone, mediocre charger included.",
        "Love everything about it.",
        "Camera is disappointing.",
        "Worth every penny.",
        "Would not recommend.",
        "Outstanding phone!",
        "So many bugs.",
        "Very happy with purchase.",
        "Worst phone ever.",
        "Solid choice.",
        "Regretting my purchase.",
        "Could be better.",
        "Love this phone!",
        "Disappointed overall.",
        "Great all around.",
        "Not great, not terrible.",
        "Excellent experience.",
        "Problems with software updates.",
        "Happy with it.",
        "Waste of money.",
        "Really impressed.",
        "Lots of issues.",
        "Fantastic quality.",
        "Not satisfied.",
        "Worth the price.",
        "Terrible waste of money.",
        "Amazing device.",
        "Unreliable product.",
        "Best phone ever.",
        "So bad.",
        "Perfect for my needs.",
        "Major disappointment.",
        "Couldn't be happier.",
    ]
}

df = pd.DataFrame(customer_feedback)

# Save to CSV for Snowflake upload
df.to_csv('customer_feedback.csv', index=False)

print(f"Generated {len(df)} customer feedback records")
print(f"\nSample feedback:")
print(df[['ticket_id', 'product', 'feedback_text']].head(3))

Output:

Generated 50 customer feedback records

Sample feedback:
   ticket_id         product                                       feedback_text
0          1      iPhone Pro  This phone is absolutely amazing! Best camera...
1          2  Samsung Galaxy  Disappointed with battery life. Barely lasts ...
2          3    Google Pixel  Great design but too expensive for what you get.
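
The upload step itself is two commands. A minimal sketch assuming SnowSQL (or any client that supports PUT); the table name mirrors the CSV above:

-- Target table for the generated CSV
CREATE OR REPLACE TABLE customer_feedback (
    ticket_id INT,
    customer_name VARCHAR,
    product VARCHAR,
    region VARCHAR,
    created_date TIMESTAMP,
    feedback_text VARCHAR
);

-- Upload the local file to the table's internal stage, then load it
PUT file://customer_feedback.csv @%customer_feedback;

COPY INTO customer_feedback
FROM @%customer_feedback
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');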

Use Case 1: Sentiment Analysis with CORTEX.SENTIMENT()

The Old Way (Painful):

1. Export data from Snowflake
2. Build/train sentiment model (or use external API)
3. Process data through model
4. Upload results back to Snowflake
5. Create dashboard
(3-4 weeks)

The Cortex Way (One SQL Function):

-- Create table with Cortex sentiment analysis
-- (Cortex functions live in the SNOWFLAKE.CORTEX schema, hence the prefix)
CREATE OR REPLACE TABLE customer_feedback_with_sentiment AS
WITH scored AS (
    SELECT
        ticket_id,
        customer_name,
        product,
        region,
        created_date,
        feedback_text,
        -- This is it. One Cortex function, computed once per row.
        SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment_score
    FROM customer_feedback
)
SELECT
    *,
    -- Classify as positive/negative/neutral (SENTIMENT returns -1 to 1)
    CASE
        WHEN sentiment_score > 0.5 THEN 'Positive'
        WHEN sentiment_score < -0.5 THEN 'Negative'
        ELSE 'Neutral'
    END AS sentiment_label
FROM scored;

-- See the results
SELECT * FROM customer_feedback_with_sentiment
ORDER BY created_date DESC
LIMIT 10;

Output:

ticket_id | customer_name | product        | feedback_text                                    | sentiment_score | sentiment_label
----------|---------------|----------------|--------------------------------------------------|-----------------|----------------
1         | Customer_1    | iPhone Pro     | This phone is absolutely amazing! Best camera... | 0.89            | Positive
2         | Customer_2    | Samsung Galaxy | Disappointed with battery life. Barely lasts...  | -0.76           | Negative
3         | Customer_3    | Google Pixel   | Great design but too expensive for what you...   | 0.12            | Neutral
...

Now you have sentiment scores. One SQL function. No API calls. No external infrastructure.

Real Analytics:

-- Sentiment by product (reuse the stored score instead of recomputing Cortex)
SELECT
    product,
    sentiment_label,
    COUNT(*) AS feedback_count,
    AVG(sentiment_score) AS avg_sentiment,
    -- Score scaled to +/-100 for readability (not a true percentage)
    ROUND(AVG(sentiment_score) * 100, 2) AS sentiment_x100
FROM customer_feedback_with_sentiment
GROUP BY product, sentiment_label
ORDER BY product, avg_sentiment DESC;

Output:

product      | sentiment_label | feedback_count | avg_sentiment | sentiment_x100
-------------|-----------------|----------------|---------------|---------------
Google Pixel | Positive        | 12             | 0.78          | 78.00
Google Pixel | Neutral         | 4              | 0.05          | 5.00
Google Pixel | Negative        | 5              | -0.72         | -72.00
iPhone Pro   | Positive        | 14             | 0.81          | 81.00
iPhone Pro   | Negative        | 4              | -0.68         | -68.00
...
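
Trending over time is the same pattern with a DATE_TRUNC; a sketch against the table we just built:

-- Weekly sentiment trend per product
SELECT
    product,
    DATE_TRUNC('WEEK', created_date) AS week,
    COUNT(*) AS feedback_count,
    ROUND(AVG(sentiment_score), 2) AS avg_sentiment
FROM customer_feedback_with_sentiment
GROUP BY product, week
ORDER BY product, week;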

Use Case 2: Text Summarization with CORTEX.SUMMARIZE()

Support tickets are long. You need the essence fast.

-- Summarize customer feedback: the 10 most negative tickets
CREATE OR REPLACE TABLE customer_feedback_summarized AS
WITH scored AS (
    SELECT
        ticket_id,
        customer_name,
        product,
        region,
        created_date,
        feedback_text,
        -- Original sentiment
        SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment_score,
        -- New: summarization
        SNOWFLAKE.CORTEX.SUMMARIZE(feedback_text) AS feedback_summary
    FROM customer_feedback
)
SELECT
    *,
    -- Length of original vs summary
    LENGTH(feedback_text) AS original_length,
    LENGTH(feedback_summary) AS summary_length
FROM scored
ORDER BY sentiment_score ASC  -- Most negative first
LIMIT 10;

SELECT
    ticket_id,
    product,
    feedback_text,
    feedback_summary,
    sentiment_score
FROM customer_feedback_summarized;

Output:

ticket_id | product        | feedback_text                        | feedback_summary        | sentiment_score
----------|----------------|--------------------------------------|-------------------------|----------------
2         | Samsung Galaxy | Disappointed with battery life...    | Battery lasts < 1 day   | -0.76
7         | iPhone Pro     | Terrible product. Stopped working... | Stopped working quickly | -0.82
10        | Google Pixel   | Issues with Bluetooth connectivity.  | Bluetooth issues        | -0.68
...

Now your support team can:

  • Quickly understand the issue from long tickets
  • Prioritize problems (by sentiment)
  • Route to the right team
  • Build knowledge bases from summaries

One more SQL function. Still no infrastructure.
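
Routing can also stay in SQL. A rough sketch using SNOWFLAKE.CORTEX.COMPLETE to suggest a team for each negative ticket; the model name ('llama3.1-8b') and team labels are illustrative assumptions, not part of the original build:

-- Suggest a routing team for negative tickets (model and labels illustrative)
SELECT
    ticket_id,
    feedback_summary,
    TRIM(SNOWFLAKE.CORTEX.COMPLETE(
        'llama3.1-8b',
        'Classify this support ticket as exactly one of: Billing, Hardware, '
        || 'Software, Shipping. Reply with the label only. Ticket: '
        || feedback_text
    )) AS suggested_team
FROM customer_feedback_summarized
WHERE sentiment_score < -0.5;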

Use Case 3: Semantic Search with CORTEX.EMBED_TEXT_768()

Your product documentation is massive. Customers can't find answers. The support team drowns in repetitive questions.

-- Create documentation embeddings
-- (EMBED_TEXT_768 takes a model name; 'snowflake-arctic-embed-m' is one option)
CREATE OR REPLACE TABLE product_documentation AS
SELECT
    doc_id,
    product,
    category,
    documentation_text,
    -- Generate embeddings. Vectors stay inside Snowflake.
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', documentation_text) AS embedding
FROM raw_documentation;

-- Search for relevant docs based on customer question
CREATE OR REPLACE FUNCTION find_relevant_docs(question VARCHAR)
RETURNS TABLE (doc_id INT, product VARCHAR, relevance FLOAT, documentation_text VARCHAR)
AS
$$
    SELECT
        d.doc_id,
        d.product,
        -- Cosine similarity between question embedding and document embedding
        -- (the question must be embedded with the same model as the docs)
        VECTOR_COSINE_SIMILARITY(
            SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', question),
            d.embedding
        ) AS relevance,
        d.documentation_text
    FROM product_documentation d
    ORDER BY relevance DESC
    LIMIT 5  -- Top 5 most relevant docs
$$;

-- Use it: table functions are called with TABLE(...)
SELECT * FROM TABLE(find_relevant_docs('Why does my phone battery drain so fast?'));

Output:

doc_id | product    | relevance | documentation_text
--------|-----------|-----------|------------------------------------
45     | All Phones | 0.94      | Battery optimization tips: Reduce...
12     | iPhone Pro | 0.91      | iPhone Pro battery specs: 5000 mAh...
67     | All Phones | 0.87      | Enable battery saver mode to extend...
23     | iPhone Pro | 0.84      | iPhone Pro charging guidelines...
56     | All Phones | 0.79      | Background app refresh settings...

Customer support now has instant, relevant documentation. No separate vector database. No API calls.
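
And if you want a drafted answer instead of raw documents, feed the top hit back into COMPLETE. A rough retrieval-augmented sketch; the model name is an assumption, and production use would pull several docs and enforce a similarity floor:

-- Draft an answer from the single most relevant doc (model illustrative)
WITH top_doc AS (
    SELECT documentation_text
    FROM TABLE(find_relevant_docs('Why does my phone battery drain so fast?'))
    ORDER BY relevance DESC
    LIMIT 1
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama3.1-8b',
    'Answer the customer question using only this documentation. Documentation: '
        || documentation_text
        || ' Question: Why does my phone battery drain so fast?'
) AS drafted_answer
FROM top_doc;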


The Comparison: Old vs New Approach

Old Way: ML Infrastructure Complexity

Goal: "Analyze customer feedback"

Step 1: Build vector database (Pinecone, Weaviate, Milvus)
        → 2 weeks setup, maintenance, learning curve

Step 2: Deploy LLM API (OpenAI, Anthropic, hosted models)
        → 1 week to evaluate providers, setup keys, rate limits

Step 3: Build ETL pipeline
        → 2-3 weeks to move data to new systems

Step 4: Hire ML engineer or consultant
        → 1-2 weeks to onboard and train

Step 5: Build monitoring and maintenance
        → Ongoing operational burden

Timeline: 3+ months
Cost: $50-200k infrastructure + personnel
Maintenance: Ongoing complexity
Team Ownership: ML specialists only

New Way: Cortex SQL Functions

Goal: "Analyze customer feedback"

Step 1: Write SQL with Cortex functions
        → 2-4 hours

Step 2: Create dashboard
        → 4 hours

Step 3: Document for analytics team
        → 2 hours

Timeline: 1-2 days
Cost: Snowflake credits (minimal)
Maintenance: None (Snowflake handles it)
Team Ownership: Any SQL-capable analyst

Comparison Table:

comparison = pd.DataFrame({
    'Aspect': [
        'Time to first analysis',
        'Infrastructure cost',
        'Ongoing maintenance',
        'Team expertise required',
        'Data movement',
        'API keys to manage',
        'Vendor lock-in risk',
        'Can analytics team own it?',
    ],
    'Old ML Stack': [
        '3-4 months',
        '$50-200k+',
        'High (separate systems)',
        'ML specialists',
        'Yes (extract to external systems)',
        'Multiple (vendor APIs)',
        'High (depends on 3+ vendors)',
        'No',
    ],
    'Snowflake Cortex': [
        '1-2 days',
        '$100-500 credits',
        'None (Snowflake handles)',
        'SQL knowledge',
        'No (stays in Snowflake)',
        'None (native to Snowflake)',
        'Low (Snowflake only)',
        'Yes',
    ]
})

print(comparison.to_string(index=False))

Output:

Aspect                           Old ML Stack              Snowflake Cortex
Time to first analysis          3-4 months                1-2 days
Infrastructure cost             $50-200k+                 $100-500 credits
Ongoing maintenance             High                      None
Team expertise required         ML specialists            SQL knowledge
Data movement                   Yes                       No
API keys to manage              Multiple                  None
Vendor lock-in risk             High                      Low
Can analytics team own it?      No                        Yes

The Real Framework: When to Use Cortex (And When Not To)

Cortex is perfect for:

  • ✅ Sentiment analysis on customer feedback
  • ✅ Summarization of long text
  • ✅ Semantic search across documents
  • ✅ Classification of text (is this a complaint? refund request?)
  • ✅ Question answering over documentation
  • ✅ Anomaly detection in time series
  • ✅ Quick AI prototypes without infrastructure

Cortex isn't the right tool for:

  • ❌ Complex model training (still use Python + ML frameworks)
  • ❌ Custom models for specialized domains
  • ❌ High-volume real-time inference (<1ms latency needed)
  • ❌ Models requiring custom loss functions

-- Decision framework in SQL (scalar UDF; the body is a single expression)
CREATE OR REPLACE FUNCTION should_use_cortex(use_case VARCHAR)
RETURNS VARCHAR
AS
$$
    CASE
        WHEN use_case ILIKE '%sentiment%' THEN 'Use Cortex'
        WHEN use_case ILIKE '%summariz%' THEN 'Use Cortex'
        WHEN use_case ILIKE '%search%' THEN 'Use Cortex'
        WHEN use_case ILIKE '%classif%' THEN 'Use Cortex'
        WHEN use_case ILIKE '%anomaly%' THEN 'Maybe Cortex'
        WHEN use_case ILIKE '%custom%model%' THEN 'Use Python ML'
        WHEN use_case ILIKE '%<1ms%latency%' THEN 'Use specialized inference'
        ELSE 'Evaluate on a case-by-case basis'
    END
$$;
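
Calling it is plain SQL:

SELECT should_use_cortex('sentiment analysis on support tickets');
-- Returns: 'Use Cortex'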

Implementation Checklist: From Idea to Production

-- Phase 1: Prototype (1-2 days)
[ ] Load data into Snowflake
[ ] Write basic Cortex query
[ ] Validate results with business stakeholders

-- Phase 2: Production (3-5 days)
[ ] Create tables with Cortex outputs
[ ] Set up incremental refresh logic (see the stream + task sketch below)
[ ] Add data quality checks (see the sketch below)
[ ] Create views for downstream tools

-- Phase 3: Adoption (ongoing)
[ ] Connect BI tool to views
[ ] Train analytics team
[ ] Document for business users
[ ] Monitor usage and feedback

-- Estimated total: 1-2 weeks (not 3+ months)
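
The "data quality checks" item is worth making concrete. A minimal sketch against the customer_feedback_analyzed table built in the example below, flagging rows where a Cortex call returned nothing:

-- Basic quality check on Cortex outputs
SELECT
    COUNT(*) AS total_rows,
    COUNT_IF(sentiment IS NULL) AS missing_sentiment,
    COUNT_IF(summary IS NULL OR summary = '') AS missing_summary
FROM customer_feedback_analyzed;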

Real Example:

-- Phase 1: Prototype
SELECT
    feedback_text,
    SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment
FROM customer_feedback
LIMIT 10;

-- Phase 2: Production table
CREATE OR REPLACE TABLE customer_feedback_analyzed AS
SELECT
    ticket_id,
    customer_name,
    product,
    region,
    created_date,
    feedback_text,
    SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment,
    SNOWFLAKE.CORTEX.SUMMARIZE(feedback_text) AS summary,
    CURRENT_TIMESTAMP() AS analysis_date
FROM customer_feedback
WHERE created_date > DATEADD(DAY, -7, CURRENT_DATE);

-- Phase 3: Dashboard view
CREATE OR REPLACE VIEW customer_sentiment_dashboard AS
SELECT
    product,
    created_date::DATE AS feedback_date,
    CASE
        WHEN sentiment > 0.5 THEN 'Positive'
        WHEN sentiment < -0.5 THEN 'Negative'
        ELSE 'Neutral'
    END AS sentiment_category,
    COUNT(*) AS feedback_count,
    AVG(sentiment) AS avg_sentiment
FROM customer_feedback_analyzed
GROUP BY product, feedback_date, sentiment_category;

-- Connect to Tableau/Looker/Sisense and you're done
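
The "incremental refresh logic" from Phase 2 deserves a sketch too. One common Snowflake pattern is a stream on the raw table plus a scheduled task, so new tickets get scored without re-running Cortex over old rows. The warehouse name and schedule here are assumptions:

-- Incremental scoring with a stream + task (warehouse/schedule illustrative)
CREATE OR REPLACE STREAM customer_feedback_stream
    ON TABLE customer_feedback;

CREATE OR REPLACE TASK score_new_feedback
    WAREHOUSE = analytics_wh
    SCHEDULE = '60 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('customer_feedback_stream')
AS
INSERT INTO customer_feedback_analyzed
SELECT
    ticket_id,
    customer_name,
    product,
    region,
    created_date,
    feedback_text,
    SNOWFLAKE.CORTEX.SENTIMENT(feedback_text),
    SNOWFLAKE.CORTEX.SUMMARIZE(feedback_text),
    CURRENT_TIMESTAMP()
FROM customer_feedback_stream;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK score_new_feedback RESUME;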

The Uncomfortable Truth: You're Waiting for Permission, Not Technology

Most teams have this conversation:

"We want to use AI, but..."

  • "...we need an ML team"
  • "...we need better infrastructure"
  • "...we need to understand deep learning"
  • "...we need to buy 3 new tools"

All of these are rationalizations for not starting.

The reality: You have Snowflake. You have SQL. Cortex is SQL functions. You can start today.

The gap between "we want AI" and "we're using AI in production" is often one SQL query.


Questions for Your Team

  • How many customer feedback tickets sit unanalyzed?
  • How long does it take to identify sentiment trends?
  • How many support questions are repetitive (could be answered by semantic search)?
  • How many months have you been planning to "add AI capabilities"?

If you've been waiting 3+ months, Cortex gives you results in 3 days.


The Broader Point

This isn't about Snowflake Cortex specifically. It's about a mindset:

The best technology is the one that removes friction, not the one that's most complex.

Cortex removes friction because:

  • Your data is already there
  • Your team already knows SQL
  • No new infrastructure
  • No vendor integration nightmares
  • Analytics team owns it

You don't need to master LLMs to deliver value. You need to start.

What use case have you been postponing because you thought AI was too complex? It probably isn't anymore.
