If you've ever hardcoded a prompt, deployed it to production, and then needed to tweak it three weeks later, you know the pain: full code deployments, service restarts, zero rollback capability, and no visibility into which version is actually running. After building LLM-powered insurance claim processing pipelines on Snowflake, I've learned that treating prompts like code is fundamentally wrong—they're artifacts that need independent versioning, deployment, and evaluation strategies. This article shares the complete architecture that solved this: using Snowflake's Model Registry as a prompt registry, deploying via Snowpark Container Services for streaming and Stored Procedures for workflows, and implementing dual evaluation with TruLens and Experiment Tracking. The result? Change prompts without touching application code, A/B test in production with confidence, and maintain full observability across your entire LLM stack—all native to Snowflake.
Architecture Breakdown
┌────────────────────────────────────────────────────┐
│  PROMPT TEMPLATES in Model Registry                │
│  - Version control for prompts as artifacts        │
│  - Evaluated with: TruLens + Experiment Tracking   │
└─────────────────────────┬──────────────────────────┘
                          │
                 ┌────────┴────────┐
                 │     Serving     │
       ┌─────────▼────────────┐  ┌─▼───────────────────┐
       │  Container Services  │  │  Stored Procedures  │
       │  - Online inference  │  │  - LLM Workflows    │
       │  - Token streaming   │<─│  - Business logic   │
       │  - FastAPI endpoints │  │                     │
       │  - Real-time apps    │  │                     │
       └─────────┬────────────┘  └─┬───────────────────┘
                 │    Analyzing    │
                 └────────┬────────┘
                          │
       ┌──────────────────▼─────────────────────┐
       │  EVALUATION & OBSERVABILITY            │
       │  - Event tables and OTel for tracing   │
       │  - TruLens for trace evaluation        │
       └────────────────────────────────────────┘
Why This Architecture Works
1. Prompt Templates in Model Registry
- Treat prompts as versioned artifacts
- Reference by semantic version: v1.2.0 instead of hardcoded strings (see the registration sketch below)
- A/B testing different prompt versions becomes trivial
- Rollback capability when a prompt version underperforms
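Concretely, a prompt can be logged like any other registry artifact. The sketch below is one way to do it rather than an official prompt-registry API: it wraps the template in a snowflake-ml CustomModel so it gets a name, a version, and lineage. The model name, version name, and template text are illustrative, and session is assumed to be an existing Snowpark session.
# One way to version a prompt: wrap it in a CustomModel and log it to the Model Registry.
# Names ("claim_extractor_prompt", "v1_2_0") and the template text are illustrative.
import pandas as pd
from snowflake.ml.model import custom_model
from snowflake.ml.registry import Registry

class ClaimExtractorPrompt(custom_model.CustomModel):
    TEMPLATE = (
        "Extract the claimant, incident date, and claimed amount "
        "from the following insurance claim:\n\n{claim}"
    )

    @custom_model.inference_api
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        # Render the template for each input row
        return pd.DataFrame({"prompt": [self.TEMPLATE.format(claim=c) for c in df["claim"]]})

registry = Registry(session=session)  # assumes an existing Snowpark session
registry.log_model(
    ClaimExtractorPrompt(custom_model.ModelContext()),
    model_name="claim_extractor_prompt",
    version_name="v1_2_0",  # underscores keep the version name identifier-safe
    sample_input_data=pd.DataFrame({"claim": ["Rear-end collision, $4,200 in damage."]}),
)

# Later: registry.get_model("claim_extractor_prompt").version("v1_2_0")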
2. Deployment Strategy Based on Use Case
SPCS for Streaming:
# FastAPI in SPCS
from fastapi.responses import StreamingResponse

@app.post("/process-claim/stream")
async def stream_claim_analysis(claim_text: str):
    # Load prompt from registry
    prompt_template = registry.get_model("claim_analyzer").version("v2.1")

    async def token_stream():
        # Stream tokens back to the user as they are generated
        async for token in llm_client.stream(
            prompt=prompt_template.render(claim=claim_text)
        ):
            yield token

    return StreamingResponse(token_stream(), media_type="text/plain")
SPCS for Calling Stored Procedures:
# FastAPI in SPCS
@app.post("/process-claim/llm-workflow")
def workflow_claim_analysis(claim_text: str):
    # Sync handler: session.call() blocks, so let FastAPI run it in a worker thread
    # Call stored procedure containing an LLM workflow
    return session.call("claim_analysis_sproc", claim_text)
Stored Procedures for Workflows:
def process_claims_workflow(session: Session, claim_ids: list):
    # Load multiple prompt versions
    extractor = registry.get_model("extractor_prompt").version("v1.2")
    classifier = registry.get_model("classifier_prompt").version("v2.0")
    # Sequential processing with business logic
    for claim_id in claim_ids:
        extracted = extract_with_prompt(extractor, claim_id)
        # Business logic between LLM calls
        if extracted['amount'] > 100000:
            classification = classify_high_value(classifier, extracted)
        else:
            classification = classify_standard(classifier, extracted)
        # Structured output to table
        save_to_snowflake(claim_id, classification)
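A workflow like this has to be registered as a stored procedure before session.call() or a scheduled task can invoke it. Here is a minimal sketch, assuming a stage named @ml_artifacts exists and the workflow returns a status string:
# Register the workflow as a Snowpark stored procedure (stage name is an assumption)
from snowflake.snowpark.types import ArrayType, StringType

session.sproc.register(
    func=process_claims_workflow,
    name="claim_analysis_sproc",
    input_types=[ArrayType()],
    return_type=StringType(),
    packages=["snowflake-snowpark-python", "snowflake-ml-python"],
    is_permanent=True,
    stage_location="@ml_artifacts",
    replace=True,
)

# From SQL or a scheduled task:
#   CALL claim_analysis_sproc(ARRAY_CONSTRUCT(101, 102, 103));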
3. Dual Evaluation Strategy
Prompt Template Level:
# Experiment Tracking for prompt engineering
with ExperimentTracking(name="claim_extractor_prompts") as exp:
    # Test different prompt versions
    for version in ["v1.0", "v1.1", "v2.0"]:
        prompt = registry.get_model("extractor").version(version)
        results = evaluate_on_test_set(prompt)
        exp.log_metrics({
            f"{version}_accuracy": results['accuracy'],
            f"{version}_latency": results['latency']
        })

# TruLens for prompt quality
tru_prompt = TruCustomApp(prompt_evaluator, app_id="prompt_v2.0")
feedback = tru_prompt.run_feedback_functions(test_cases)
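The TruLens side is driven by feedback functions. TruLens can wrap a plain Python callable that returns a score between 0 and 1, so a simple completeness check might look like the sketch below; the function, field names, and the prompt_evaluator wiring are illustrative, and import paths vary across TruLens releases:
# Custom feedback function sketch; field names are illustrative
from trulens_eval import Feedback

def contains_required_fields(output: str) -> float:
    """Fraction of required claim fields mentioned in the model output."""
    required = ["claimant", "incident_date", "amount"]
    return sum(field in output for field in required) / len(required)

f_completeness = Feedback(contains_required_fields).on_output()

# Attach it when wrapping the evaluator:
# tru_prompt = TruCustomApp(prompt_evaluator, app_id="prompt_v2.0", feedbacks=[f_completeness])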
Workflow Level:
from snowflake import telemetry

def claim_processing_workflow(claim_data):
    telemetry.set_span_attribute("example.proc.do_tracing", "begin")
    extracted = extract_claims(claim_data)
    classified = classify_claims(extracted)
    telemetry.add_event(
        "event_with_attributes",
        {
            "example.extracted": extracted,
            "example.classified": classified
        }
    )
    return summarize_claims(classified)
# Traces go to Snowflake AI Observability
Additional Considerations
Version Pinning Strategy:
# In stored procedure, pin versions explicitly
PROMPT_VERSIONS = {
    'extractor': 'v1.2.0',
    'classifier': 'v2.1.0',
    'summarizer': 'v1.0.1'
}
# This makes your workflow reproducible and testable
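A small helper (hypothetical, not a Snowflake API) can resolve the pinned versions against the registry once per run, so swapping a prompt version is a one-line edit to the dict:
# Hypothetical helper: resolve all pinned prompt versions up front
def load_pinned_prompts(registry):
    return {
        name: registry.get_model(f"{name}_prompt").version(version)
        for name, version in PROMPT_VERSIONS.items()
    }

prompts = load_pinned_prompts(registry)
extractor = prompts["extractor"]  # always the pinned v1.2.0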
Migration Path: You could even start with stored procedures and promote successful workflows to SPCS when you need real-time serving:
- Develop workflow in stored procedure (easier to iterate)
- Test with TruLens
- Once stable, wrap in FastAPI container for SPCS
- Same Model Registry artifacts, different deployment
Cost Optimization:
- SPCS: Pay for compute pool uptime (good for high-frequency, low-latency serving)
- Stored Procedures: Pay for warehouse time per execution (good for batch and scheduled jobs)
This architecture gives you the flexibility to choose the right deployment model without rewriting your prompts or evaluation logic.