DEV Community

Siri Varma Vegiraju
Siri Varma Vegiraju

Posted on

Conversing with Large Language Models using Cloud Native Infra

Why this matters

LLMs like ChatGPT are everywhere now used across industries like healthcare, media, transportation, and so on. But with great power comes great risk: these models can accidentally leak sensitive info (like PII), or be tricked by “divergence attacks” to expose private details. Think email addresses, credit card numbers, etc.


What happens with off-the-shelf models

If you’re using a public LLM (e.g., ChatGPT), you don’t have much control so the safest bet is never send sensitive info to the model in the first place.


If you host your own LLM: two main scenarios

1. Fine-tuning with org data

You train a generic LLM on internal data. If that data isn't anonymized, the model might regurgitate PII.
Fix: Mask or anonymize all PII before training. Example: run a pipeline that strips names, emails, SSNs, etc., before dumping data into your data lake.

2. RAG-based models

You build a chatbot that retrieves documents (like HR files) using Retrieval-Augmented Generation.
Two risks:

  • Sensitive info hiding in your docs.
  • The LLM might spit out PII when answering a query.

Fixes:

  • Anonymize before ingestion (same as above).
  • Apply filters on responses e.g. using AWS Comprehend, Microsoft Presidio, or Dapr’s conversation API to strip emails, phone numbers, SSNs, etc.

Example: Using Dapr to filter PII

Here’s a rough how-to using Dapr (on .NET/Windows):

  1. Install Dapr CLI
  2. Run dapr init
  3. Add a conversation.yml config that points to your OpenAI key and desired model
  4. Run your app via dapr run on ports 3500/4500
  5. In your code, use the Dapr.AI package to hook into the conversation API Dapr will filter out PII like emails, IPs, CCs, SSNs before sending or returning data.

✨ Bottom line

  • With public LLMs = simply don’t feed sensitive data
  • With self-hosted/fine-tuned setups = mask PII before training or ingestion
  • With RAG = combine ingestion filtering + response filtering (via tools like Dapr)

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.