If you've ever hardcoded a prompt, deployed it to production, and then needed to tweak it three weeks later, you know the pain: full code deployments, service restarts, zero rollback capability, and no visibility into which version is actually running. After building LLM-powered insurance claim processing pipelines on Snowflake, I've learned that treating prompts like code is fundamentally wrong—they're artifacts that need independent versioning, deployment, and evaluation strategies. This article shares the complete architecture that solved this: using Snowflake's Model Registry as a prompt registry, deploying via Snowpark Container Services for streaming and Stored Procedures for workflows, and implementing dual evaluation with TruLens and Experiment Tracking. The result? Change prompts without touching application code, A/B test in production with confidence, and maintain full observability across your entire LLM stack—all native to Snowflake.
Architecture Breakdown
┌────────────────────────────────────────────────────┐
│  PROMPT TEMPLATES in Model Registry                │
│  - Version control for prompts as artifacts        │
│  - Evaluated with: TruLens + Experiment Tracking   │
└─────────────────────────┬──────────────────────────┘
                          │
                 ┌────────┴────────┐
                 │     Serving     │
       ┌─────────▼────────────┐  ┌─▼───────────────────┐
       │  Container Services  │  │  Stored Procedures  │
       │  - Online inference  │  │  - LLM Workflows    │
       │  - Token streaming   │<─│  - Business logic   │
       │  - FastAPI endpoints │  │                     │
       │  - Real-time apps    │  │                     │
       └─────────┬────────────┘  └─┬───────────────────┘
                 │    Analyzing    │
                 └────────┬────────┘
                          │
       ┌──────────────────▼─────────────────────┐
       │  EVALUATION & OBSERVABILITY            │
       │  - Event tables and OTel for tracing   │
       │  - TruLens for trace evaluation        │
       └────────────────────────────────────────┘
Why This Architecture Works
1. Prompt Templates in Model Registry
- Treat prompts as versioned artifacts
- Reference by semantic version: v1.2.0 instead of hardcoded strings (see the registration sketch below)
- A/B testing different prompt versions becomes trivial
- Rollback capability when a prompt version underperforms
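Concretely, a prompt can be logged like any other registry artifact. The sketch below is one way to do it rather than an official prompt-registry API: it wraps the template in a snowflake-ml CustomModel so it gets a name, a version, and lineage. The model name, version name, and template text are illustrative, and session is assumed to be an existing Snowpark session.
# One way to version a prompt: wrap it in a CustomModel and log it to the Model Registry.
# Names ("claim_extractor_prompt", "v1_2_0") and the template text are illustrative.
import pandas as pd
from snowflake.ml.model import custom_model
from snowflake.ml.registry import Registry

class ClaimExtractorPrompt(custom_model.CustomModel):
    TEMPLATE = (
        "Extract the claimant, incident date, and claimed amount "
        "from the following insurance claim:\n\n{claim}"
    )

    @custom_model.inference_api
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        # Render the template for each input row
        return pd.DataFrame({"prompt": [self.TEMPLATE.format(claim=c) for c in df["claim"]]})

registry = Registry(session=session)  # assumes an existing Snowpark session
registry.log_model(
    ClaimExtractorPrompt(custom_model.ModelContext()),
    model_name="claim_extractor_prompt",
    version_name="v1_2_0",  # underscores keep the version name identifier-safe
    sample_input_data=pd.DataFrame({"claim": ["Rear-end collision, $4,200 in damage."]}),
)

# Later: registry.get_model("claim_extractor_prompt").version("v1_2_0")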
2. Deployment Strategy Based on Use Case
SPCS for Streaming:
# FastAPI in SPCS
from fastapi.responses import StreamingResponse

@app.post("/process-claim/stream")
async def stream_claim_analysis(claim_text: str):
    # Load prompt from registry
    prompt_template = registry.get_model("claim_analyzer").version("v2.1")

    async def token_stream():
        # Stream tokens back to the user as they are generated
        async for token in llm_client.stream(
            prompt=prompt_template.render(claim=claim_text)
        ):
            yield token

    return StreamingResponse(token_stream(), media_type="text/plain")
SPCS for Calling Stored Procedures:
# FastAPI in SPCS
@app.post("/process-claim/llm-workflow")
def workflow_claim_analysis(claim_text: str):
    # Sync handler: session.call() blocks, so let FastAPI run it in a worker thread
    # Call stored procedure containing an LLM workflow
    return session.call("claim_analysis_sproc", claim_text)
Stored Procedures for Workflows:
def process_claims_workflow(session: Session, claim_ids: list):
    # Load multiple prompt versions
    extractor = registry.get_model("extractor_prompt").version("v1.2")
    classifier = registry.get_model("classifier_prompt").version("v2.0")
    # Sequential processing with business logic
    for claim_id in claim_ids:
        extracted = extract_with_prompt(extractor, claim_id)
        # Business logic between LLM calls
        if extracted['amount'] > 100000:
            classification = classify_high_value(classifier, extracted)
        else:
            classification = classify_standard(classifier, extracted)
        # Structured output to table
        save_to_snowflake(claim_id, classification)
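A workflow like this has to be registered as a stored procedure before session.call() or a scheduled task can invoke it. Here is a minimal sketch, assuming a stage named @ml_artifacts exists and the workflow returns a status string:
# Register the workflow as a Snowpark stored procedure (stage name is an assumption)
from snowflake.snowpark.types import ArrayType, StringType

session.sproc.register(
    func=process_claims_workflow,
    name="claim_analysis_sproc",
    input_types=[ArrayType()],
    return_type=StringType(),
    packages=["snowflake-snowpark-python", "snowflake-ml-python"],
    is_permanent=True,
    stage_location="@ml_artifacts",
    replace=True,
)

# From SQL or a scheduled task:
#   CALL claim_analysis_sproc(ARRAY_CONSTRUCT(101, 102, 103));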
3. Dual Evaluation Strategy
Prompt Template Level:
# Experiment Tracking for prompt engineering
with ExperimentTracking(name="claim_extractor_prompts") as exp:
    # Test different prompt versions
    for version in ["v1.0", "v1.1", "v2.0"]:
        prompt = registry.get_model("extractor").version(version)
        results = evaluate_on_test_set(prompt)
        exp.log_metrics({
            f"{version}_accuracy": results['accuracy'],
            f"{version}_latency": results['latency']
        })

# TruLens for prompt quality
tru_prompt = TruCustomApp(prompt_evaluator, app_id="prompt_v2.0")
feedback = tru_prompt.run_feedback_functions(test_cases)
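The TruLens side is driven by feedback functions. TruLens can wrap a plain Python callable that returns a score between 0 and 1, so a simple completeness check might look like the sketch below; the function, field names, and the prompt_evaluator wiring are illustrative, and import paths vary across TruLens releases:
# Custom feedback function sketch; field names are illustrative
from trulens_eval import Feedback

def contains_required_fields(output: str) -> float:
    """Fraction of required claim fields mentioned in the model output."""
    required = ["claimant", "incident_date", "amount"]
    return sum(field in output for field in required) / len(required)

f_completeness = Feedback(contains_required_fields).on_output()

# Attach it when wrapping the evaluator:
# tru_prompt = TruCustomApp(prompt_evaluator, app_id="prompt_v2.0", feedbacks=[f_completeness])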
Workflow Level:
from snowflake import telemetry

def claim_processing_workflow(claim_data):
    telemetry.set_span_attribute("example.proc.do_tracing", "begin")
    extracted = extract_claims(claim_data)
    classified = classify_claims(extracted)
    telemetry.add_event(
        "event_with_attributes",
        {
            "example.extracted": extracted,
            "example.classified": classified
        }
    )
    return summarize_claims(classified)
# Traces go to Snowflake AI Observability
Additional Considerations
Version Pinning Strategy:
# In stored procedure, pin versions explicitly
PROMPT_VERSIONS = {
    'extractor': 'v1.2.0',
    'classifier': 'v2.1.0',
    'summarizer': 'v1.0.1'
}
# This makes your workflow reproducible and testable
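A small helper (hypothetical, not a Snowflake API) can resolve the pinned versions against the registry once per run, so swapping a prompt version is a one-line edit to the dict:
# Hypothetical helper: resolve all pinned prompt versions up front
def load_pinned_prompts(registry):
    return {
        name: registry.get_model(f"{name}_prompt").version(version)
        for name, version in PROMPT_VERSIONS.items()
    }

prompts = load_pinned_prompts(registry)
extractor = prompts["extractor"]  # always the pinned v1.2.0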
Migration Path: You could even start with stored procedures and promote successful workflows to SPCS when you need real-time serving:
- Develop workflow in stored procedure (easier to iterate)
- Test with TruLens
- Once stable, wrap in FastAPI container for SPCS
- Same Model Registry artifacts, different deployment
Cost Optimization:
- SPCS: Pay for compute pool uptime (good for high-frequency, low-latency serving)
- Stored Procedures: Pay for warehouse time per execution (good for batch and scheduled jobs)
This architecture gives you the flexibility to choose the right deployment model without rewriting your prompts or evaluation logic.