Sean Falconer for AWS Community Builders

Adding a Privacy Layer to AWS PartyRock

AWS recently unveiled PartyRock – an Amazon Bedrock Playground. PartyRock lets users leverage foundation models from Amazon and other leading AI companies in an intuitive and code-free playground to quickly create AI-powered applications that can handle an array of specialized tasks.

Whether you need to orchestrate your re:Invent schedule, optimize marketing strategies, or develop a diabetes-management diet planner, PartyRock is an amazing tool for transforming ideas into applications with minimal effort.

However, while the excitement surrounding PartyRock and the capabilities of generative AI is well-founded, it’s important to be mindful of data privacy. AI models have no “delete” button: once users reveal sensitive data to a model, it can’t be removed the way you can delete a row from a relational database, and that raises substantial privacy and security concerns.

Consider, for example, a contract analysis assistant application operating on PartyRock. While this application proves invaluable in parsing complex contracts and extracting pertinent information, you need to put privacy measures in place to use this application because many contracts inevitably contain confidential data. Sharing such sensitive information with the underlying AI model presents a significant privacy risk.

So, how can you use Personally Identifiable Information (PII) in AI-driven applications?

To navigate the potential privacy limitations of any AI-based application, it’s imperative that we add a data privacy layer to limit PII exposure. To demonstrate this, we built a Chrome Extension that prevents unintended PII sharing with apps built on PartyRock. The data privacy layer leverages Skyflow LLM Privacy Vault. Using Skyflow, the extension detects and de-identifies PII so that PartyRock's models remain fully functional without compromising the privacy of sensitive details. The video below shows the complete functionality.

In this blog post, I’ll show how to create a privacy-preserving Chrome Extension. I’ll also share insights on how you can leverage the functionality offered by PartyRock, or any other AI model, while using a data privacy vault to protect sensitive data and safeguard user data privacy.

What is Skyflow LLM Privacy Vault?

Skyflow LLM Privacy Vault is a technology that’s purpose-built to isolate, protect, and govern sensitive customer data seamlessly throughout the lifecycle of LLMs. It’s not limited to working strictly with Amazon Bedrock – you can use Skyflow LLM Privacy Vault with any LLM, including a public model, a fine-tuned foundation model like those provided by PartyRock, or your own custom model.

Privacy During Model Training

Whether you’re constructing foundation models, fine-tuning models, or developing Retrieval Augmented Generation (RAG) applications, the privacy vault works like a privacy firewall or a data transformation layer. It detects and de-identifies sensitive data during collection and processing, regardless of whether the data originates from a single source or is compiled from multiple sources.

The plaintext sensitive data that’s detected by Skyflow is stored in the vault and replaced by de-identified data. Then, LLM training can proceed as normal, with a de-identified and privacy-safe dataset.
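
As a rough sketch of what this looks like in practice (assuming a Node.js environment and a hypothetical detectAndDeidentify helper that wraps the Skyflow detect call shown later in this post), preparing a privacy-safe training set might look like this:

// Sketch: de-identify every record before it enters the LLM training set.
// detectAndDeidentify() is a hypothetical wrapper around the Skyflow detect API shown later.
async function buildPrivacySafeDataset(records) {
  const privacySafeRecords = [];

  for (const record of records) {
    // Plaintext PII is stored in the vault; the record keeps only vault-generated tokens
    const { processed_text } = await detectAndDeidentify(record);
    privacySafeRecords.push(processed_text);
  }

  return privacySafeRecords; // de-identified data, safe for training or fine-tuning
}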


Using a privacy vault for privacy-preserving model training.

Privacy in Inference

Users interact with AI models in a variety of ways, most commonly through a front-end UI like the ones used by PartyRock applications. Users can also upload files to AI models. In both cases, the data that users provide, including sensitive data, reaches the model during inference – unless that data is first de-identified.

Using a privacy vault, sensitive data isn’t just de-identified; it's securely stored. All sensitive customer data (and even core IP) is kept out of LLMs entirely. This data can only be re-identified by authorized users. This approach preserves data privacy during inference when AI models provide responses because PII is protected by fine-grained access controls. These controls restrict who can see what data, when, where, and for how long.


Using a privacy vault for privacy-preserving inference.
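
At a conceptual level – a sketch only, where deidentify, callModel, and reidentify are placeholders for the vault and model calls described in the rest of this post – the inference flow looks like this:

// Conceptual sketch of privacy-preserving inference.
// deidentify(), callModel(), and reidentify() are placeholders, not real APIs.
async function privateInference(userPrompt, user) {
  const safePrompt = await deidentify(userPrompt);    // PII replaced with vault-generated tokens
  const modelResponse = await callModel(safePrompt);  // the LLM never sees plaintext PII
  return reidentify(modelResponse, user);             // tokens restored only for authorized users
}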

Detect and De-identify PII

So, how does this work, and how can you add a privacy vault to any PartyRock application?

The first step is to detect sensitive data, including PII, from a dataset. The same approach is applied to model training datasets, and to any data supplied by a user during inference.

To detect PII, Skyflow provides a detect API endpoint that can accept text or files. This endpoint automatically identifies hundreds of forms of PII, and returns a privacy-safe version of the input where each piece of detected PII is replaced by vault-generated tokens. Note that vault-generated tokens are distinct from the LLM-generated tokens that are used to chunk and process information within AI models.

In the sample API call below, I’m calling the detect API with a sentence containing a name and phone number. When working with an LLM, either in training or inference, I typically don’t want to share these details or any other PII.

curl -s -X POST "https://manage.skyflowapis.com/v1/detect" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
    "vault_id": "t6dadfbc3f4d4cdfbf12bb38b694b144",
    "data": {
        "blob": "Hi, my name is Sean Falconer and my phone number is 123-456-7890.",
        "send_back_entities": true
     }
}'

This API call returns a response like the following example, where name and phone number are detected and replaced by vault-generated tokens. The context of each of these entities – my name and my phone number – remains intact, which is all the LLM needs to draw context for training and inference.

In this example, the name is replaced by a token formatted as a UUID while the phone number is replaced by a format-preserving token that still resembles a phone number. You can generate tokens in a variety of formats depending on your use case.

{
    "processed_text": "Hi, my name is NAME:576a5b26-5cca-4cdc-b409-ea2c39b53f21 and my phone number is PHONE:(765) 978-2342."
    "entities": [
        {
            "processed_text": "NAME:576a5b26-5cca-4cdc-b409-ea2c39b53f21",
            "text": "Sean Falconer",
            "location": {
                "stt_idx": "16",
                "end_idx": "28",
                "stt_idx_processed": "16",
                "end_idx_processed": "56"
            }
        },
        {
            "processed_text": "PHONE:(765) 978-2342",
            "text": "123-456-7890",
            "location": {
                "stt_idx": "53",
                "end_idx": "64",
                "stt_idx_processed": "81",
                "end_idx_processed": "100"    
            }
        }
    ]
}

For end-to-end LLM data protection when creating or fine-tuning your own AI models, you would use the Skyflow detect API during both training and inference. For PartyRock applications, we can’t control the training process because we don’t have access to the backend service. However, we do have control over what gets shared during inference.

In the following sections, we dive into how to build a Chrome Extension that uses Skyflow LLM Privacy Vault to carefully monitor what’s shared with PartyRock and filter out PII.

Creating a Chrome Extension

Chrome Extensions are custom-built programs that let users customize the Chrome browsing experience, and they’re relatively simple to create.

Every extension includes a manifest.json file that describes its capabilities and configuration. The manifest I created for my Skyflow extension is shown below.

{
 "manifest_version": 3,
 "name": "Skyflow",
 "version": "1.0",
 "description": "Prevent PII sharing with AWS PartyRock Apps",
 "icons": {
   "16": "images/skyflow-16.png",
   "32": "images/skyflow-32.png",
   "48": "images/skyflow-48.png",
   "128": "images/skyflow-128.png"
 },
 "content_scripts": [
   {
     "js": ["scripts/jquery-3.7.1.min.js", "scripts/detect-and-tokenize.js"],
     "run_at": "document_end",
     "matches": [
       "https://partyrock.aws/u/*"
     ]
   }
 ]
}

The extension runs on any page matching the https://partyrock.aws domain and u (i.e., user) route. It imports two JavaScript files:

  • jQuery, which I’m using to help provide shorthand for some of the DOM manipulation and matching I need to monitor input and output from a PartyRock app
  • detect-and-tokenize.js, the main program that integrates with Skyflow to monitor inference data for PII

Monitor, Detect, and De-identify PII

To prevent potential sharing of PII with the model, we need to monitor an app’s input fields, capture the user input, and then use Skyflow to detect and remove PII. The de-identified version is then swapped into the user input fields and passed along to the model for inference.

For example, in the image below, both areas that are boxed in red represent user input fields where PII might be intentionally or accidentally shared.


Example PartyRock app highlighting the areas in red where a user might share PII.

PartyRock apps load dynamically, so the input fields aren’t rendered until after the page loads. This means that in order to monitor user input, we need to wait for the page to load before attaching an input listener to the <textarea> element where users interact with an app.
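
One way to handle this – a sketch that assumes jQuery is already loaded and that the app renders its prompt as a <textarea> – is to poll until the element exists and then attach the listener:

// Poll until PartyRock has rendered the prompt <textarea>, then wire up monitoring.
// attachInputListener() is a placeholder for attaching the debounced handler shown next.
var pollForInput = setInterval(function () {
  var textarea = $('textarea');

  if (textarea.length > 0) {
    clearInterval(pollForInput);
    attachInputListener(textarea);
  }
}, 250);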

Once the input fields are available, we attach an input listener to each <textarea>; the listener executes as the user types. To avoid calling the Skyflow API on every keystroke, the setTimeout function delays each call by 500 milliseconds. If the user types again before the delay elapses, the pending call is cleared and a new one is scheduled.

textarea.on('input', function () {
  if (callback) {
    clearTimeout(callback);
  }

  callback = setTimeout(tokenizePii, 500);
});

The tokenizePii function takes the input field’s text value and calls an AWS Lambda function, which in turn calls the Skyflow detect endpoint, as shown in steps 1 and 2 below:


Using a Chrome extension and Skyflow to provide end-to-end AI data privacy for PII.
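
A simplified version of tokenizePii could look like the following sketch. The Lambda URL and request/response shapes are placeholders (the Lambda simply proxies the Skyflow detect call so that vault credentials never reach the browser), and tokenMap caches each vault-generated token alongside the plaintext it replaced, which we’ll use later for re-identification.

// Sketch of tokenizePii – the Lambda URL and payload shape are placeholders.
// The Lambda proxies the Skyflow detect call so vault credentials stay out of the browser.
var tokenMap = new Map(); // vault-generated token -> original plaintext PII

function tokenizePii() {
  var originalText = textarea.val();

  $.ajax({
    url: 'https://example-detect-proxy.amazonaws.com/detect', // placeholder Lambda endpoint
    method: 'POST',
    contentType: 'application/json',
    data: JSON.stringify({ blob: originalText }),
    success: function (response) {
      // Cache each detected entity so responses can be re-identified client side
      for (var entity of response.entities) {
        tokenMap.set(entity.processed_text, entity.text);
      }

      // Swap the de-identified text into the input before it's sent to the model
      textarea.val(response.processed_text);
    }
  });
}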

When we use an app like Contract Assistant with this Chrome Extension, PII contained within a contract is replaced by de-identified vault-tokenized values, as shown on the right side of the following illustration:


Plaintext PII in a contract is replaced with vault-tokenized values.

Monitor, Detect, and Re-identify PII

Now that the ingress messages to the PartyRock backend are free of PII, responses coming back from the LLM may contain de-identified values, which is ideal for data privacy but could be puzzling for app users. So, the next step is to re-identify these de-identified values to provide authorized users with PII from the vault, subject to fine-grained access controls.

To do this, we need our Chrome Extension to monitor the <div> element where responses are generated and automatically restore the de-identified values to the original values to give the user a readable, truly usable contract analysis application.

I used the MutationObserver interface to look for new child nodes being added to the <div>, indicating the presence of new response data. Similar to the ingress logic shown above, I’m applying a delay of 500ms so that I can avoid excessive processing and only re-identify the response after it fully loads.

var config = { childList: true };

// Callback function to execute when mutations are observed
var mutationCallback = function(mutationsList) {
  for (var mutation of mutationsList) {
    if (mutation.type == 'childList') {
      if (callback) {
        clearTimeout(callback);
      }

      responseText = $(responseArea).html();

      callback = setTimeout(reIdentifyData, 500);
    }
  }
};

// Create an observer instance linked to the callback function
var observer = new MutationObserver(mutationCallback);

To re-identify any vault-tokenized values, we could use the Skyflow API to exchange these tokens for the original plaintext PII values, subject to fine-grained access controls. However, because this is an example application and this particular use case likely doesn’t involve a large amount of PII, I’m caching the tokens and original values in this example Chrome Extension.

This way, re-identification happens entirely client side, as shown below:

function reIdentifyData() {
  let originalString = responseText;
  let referenceObject = responseArea;

  if (originalString !== undefined) {
    for (let [token, pii] of tokenMap) {
      if (originalString.indexOf(token) >= 0) {
        // Escape regex metacharacters – format-preserving tokens like
        // PHONE:(765) 978-2342 contain characters with special meaning in a RegExp
        let escapedToken = token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

        originalString = originalString.replace(new RegExp(escapedToken, 'gi'), pii);

        $(referenceObject).html(originalString);
      }
    }
  }
}

Of course, caching PII in a Chrome Extension wouldn’t work for an industrial-grade version of this application. For that, we’d need to enhance this Chrome Extension to call Skyflow’s detokenize API endpoint, allowing it to de-tokenize vault-generated tokens in contract assistant responses for multiple users – as governed by strict fine-grained access controls.
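
As a rough sketch, a server-side detokenize call could look something like the code below. The vault URL and vault ID are placeholders, and the payload shape should be checked against Skyflow’s API reference rather than taken as-is.

// Rough sketch: server-side detokenization for an authorized user.
// The vault URL, vault ID, and payload shape are illustrative – check Skyflow's API reference.
async function detokenize(tokens, bearerToken) {
  const response = await fetch('https://example-vault.skyflowapis.com/v1/vaults/VAULT_ID/detokenize', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${bearerToken}`
    },
    body: JSON.stringify({
      detokenizationParameters: tokens.map((token) => ({ token }))
    })
  });

  return response.json(); // plaintext values, returned only if the caller is authorized
}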

Final Thoughts

AWS PartyRock provides an exciting set of capabilities for anyone who wants to explore the world of AI application development. It’s exciting to see such a broad range of applications available to run on PartyRock less than two weeks after its release!

But, to move AI applications that handle PII or other sensitive data beyond the proof-of-concept phase, it’s critically important to get a handle on data privacy. Using a Chrome Extension like the one shown here along with Skyflow LLM Privacy Vault enhances the privacy of PartyRock applications so you can harness the potential of Amazon Bedrock, or any LLM, without impacting data privacy.

The best part is, this approach doesn’t impact the usefulness of PartyRock applications because PII de-identification is reversible – so the user experience is unaffected by keeping PII out of AI models.

I hope you have a great time building privacy-preserving applications with PartyRock!
