Introduction
In this blog we build a secure sentiment analysis app using TensorFlow Lite and deploy it on the Cape Privacy secure cloud enclave system. The app is available to use here.
What Is Sentiment Analysis and How Does Cape Make It Secure?
Sentiment analysis is an application of natural language processing (NLP) that classifies the sentiment of text, typically as either positive or negative. Because vast amounts of data exist in textual form, sentiment analysis has many practical applications, including social media monitoring, customer feedback analysis, news analysis, and market research. Automating the processing of this data makes it possible to extract valuable information efficiently.
However, what if the textual input data that we need to analyze is sensitive or needs to stay confidential? This is where the Cape Privacy secure enclave system comes in. Cape Privacy provides a confidential computing platform based on AWS Nitro enclaves for security and privacy-minded developers. Cape allows for running serverless functions on encrypted data and ensures that sensitive data or intellectual property within apps is protected.
Cape provides a command line interface (CLI) as well as Python and JavaScript software development kits (SDKs), called pycape and cape-js, that allow developers to deploy their apps and allow users to interact with them securely.
Three essential commands enable this: cape encrypt, cape deploy, and cape run. The command cape encrypt encrypts inputs so that they can be sent into the Cape enclave for processing, cape deploy performs all the actions needed to deploy a function into the enclave, and cape run invokes the deployed function with an input that was previously encrypted with cape encrypt. Learn more in the Cape docs.
How to Build a Sentiment Analysis App with Cape?
Training a Text Classification Model
For sentiment analysis, the function that we wish to deploy and run is a text classification model, so we first need to define its architecture and train it.
To keep the model lightweight, we use TensorFlow Lite and its Model Maker library. The training data is SST-2 (Stanford Sentiment Treebank), a commonly used dataset published by Socher et al. (2013) that consists of over 60,000 movie reviews labeled as positive or negative. For the model architecture we use average word embedding, which produces a small model that can therefore perform fast inference. The following code snippets define and train the model, then export the trained model and its vocabulary in TensorFlow Lite format.
Install dependencies:
!sudo apt -y install libportaudio2
!pip install -q tflite-model-maker-nightly
Import libraries:
import numpy as np
import os
from tflite_model_maker import model_spec
from tflite_model_maker import text_classifier
from tflite_model_maker.config import ExportFormat
from tflite_model_maker.text_classifier import AverageWordVecSpec
from tflite_model_maker.text_classifier import DataLoader
import tensorflow as tf
import pandas as pd
assert tf.__version__.startswith('2')
tf.get_logger().setLevel('ERROR')
Prepare training data:
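# data_dir is assumed to point to the extracted SST-2 folder that contains
# train.tsv and dev.tsv.
def replace_label(original_file, new_file):
    # Load the tab-separated file into pandas.
    df = pd.read_csv(original_file, sep='\t')
    # Map the numeric labels to names (assumed here: 0 = negative, 1 = positive).
    df.replace({'label': {0: 'negative', 1: 'positive'}}, inplace=True)
    # Write the updated dataset to a new CSV file.
    df.to_csv(new_file)
# Replace the label name for both the training and test dataset. Then write the
# updated CSV dataset to the current folder.
replace_label(os.path.join(data_dir, 'train.tsv'), 'train.csv')
replace_label(os.path.join(data_dir, 'dev.tsv'), 'dev.csv')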
spec = model_spec.get('average_word_vec')
train_data = DataLoader.from_csv(
    filename='train.csv',
    text_column='sentence',
    label_column='label',
    model_spec=spec,
    is_training=True)
test_data = DataLoader.from_csv(
    filename='dev.csv',
    text_column='sentence',
    label_column='label',
    model_spec=spec,
    is_training=False)
Train model:
model = text_classifier.create(train_data, model_spec=spec, epochs=10)
Evaluate model:
loss, acc = model.evaluate(test_data)
Export model as TensorFlow Lite:
model.export(export_dir='model')
model.export(export_dir='model', export_format=[ExportFormat.LABEL, ExportFormat.VOCAB])
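For intuition about the average word embedding architecture, here is a minimal NumPy sketch of the idea: look up an embedding vector for each token ID, average the vectors, and pass the result through a dense layer with softmax. The weights are random and untrained, the sizes are hypothetical, and the function name is our own. This is not the Model Maker implementation, which may contain additional layers.

```python
import numpy as np

def average_word_vec_forward(token_ids, embeddings, weights, bias):
    """Toy forward pass: embed tokens, average them, apply dense + softmax."""
    vectors = embeddings[token_ids]        # (seq_len, embed_dim)
    pooled = vectors.mean(axis=0)          # (embed_dim,)
    logits = pooled @ weights + bias       # (num_classes,)
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_classes = 100, 16, 2   # hypothetical sizes
embeddings = rng.normal(size=(vocab_size, embed_dim))
weights = rng.normal(size=(embed_dim, num_classes))
bias = np.zeros(num_classes)

probs = average_word_vec_forward(np.array([5, 17, 42]), embeddings, weights, bias)
print(probs)  # two class probabilities that sum to 1
```

Because the only per-token work is a table lookup, the model stays small and inference is cheap, which is why this architecture suits a lightweight deployment.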
Create a Function
Any function deployed with Cape must live in a file named app.py, and app.py must define a function called cape_handler() that takes the input to process and returns the result. For the sentiment analysis app, the input is the text that we wish to classify and the output is the predicted sentiment, either negative or positive.
The code snippet below shows our app.py. The cape_handler() function loads the TensorFlow Lite model that we previously trained along with its vocabulary. The handler also vectorizes the text input using the vocabulary, so that the input is encoded as a numeric vector before we run inference on it. The model then predicts the sentiment of this encoded text, and the handler returns that prediction.
Import libraries:
import numpy as np
from tflite_runtime.interpreter import Interpreter
import contractions
Load vocabulary function:
def load_vocab(path):
    vocabulary = {}
    with open(path, "r") as f:
        for line in f:
            item = line.strip().split(" ")
            word = item[0]
            encoding = int(item[1])
            vocabulary[word] = encoding
    return vocabulary
Text vectorization function:
def vectorize_text(text, vocabulary, input_shape):
    encoded_text = []
    # Fix contractions
    expanded_words = []
    for word in text.split():
        expanded_words.append(contractions.fix(word))
    text = " ".join(expanded_words)
    text = text.split(" ")
    for word in text:
        word = word.lower()  # convert to lower case
        # account for words not in vocabulary
        if word in vocabulary:
            word_encoding = vocabulary[word]
        else:
            word_encoding = vocabulary["<UNKNOWN>"]
        encoded_text.append(word_encoding)
    # Truncate to the model's input length so the padding width is never negative
    encoded_text = np.array(encoded_text[: input_shape[1]], dtype=np.int32)
    encoded_text = np.pad(
        encoded_text, (0, input_shape[1] - len(encoded_text)), "constant"
    )
    encoded_text = np.reshape(encoded_text, (input_shape[0], input_shape[1]))
    return encoded_text
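To make the encoding step concrete, here is a small self-contained sketch that mirrors the vectorization logic on a toy vocabulary. The vocabulary contents, the encode() helper, and the sequence length of 8 are all hypothetical, and contraction expansion is left out so the example has no external dependencies beyond NumPy.

```python
import numpy as np

# Hypothetical toy vocabulary; the real one is loaded from vocab.txt.
toy_vocab = {
    "<PAD>": 0, "<START>": 1, "<UNKNOWN>": 2,
    "this": 3, "was": 4, "a": 5, "great": 6, "film": 7,
}

def encode(text, vocabulary, seq_len):
    """Lower-case, map words to IDs, then truncate or zero-pad to seq_len."""
    ids = [vocabulary.get(w.lower(), vocabulary["<UNKNOWN>"]) for w in text.split()]
    ids = ids[:seq_len]                       # truncate long inputs
    ids = ids + [0] * (seq_len - len(ids))    # pad short inputs with zeros
    return np.array([ids], dtype=np.int32)    # shape (1, seq_len)

encoded = encode("This was a GREAT movie", toy_vocab, seq_len=8)
print(encoded)  # [[3 4 5 6 2 0 0 0]]
```

The word "movie" is not in the toy vocabulary, so it maps to the ID of "<UNKNOWN>", and the remaining positions are filled with zeros.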
Cape handler:
def cape_handler(text):
    text = text.decode("utf-8")
    # Load vocabulary
    vocabulary = load_vocab("./vocab.txt")
    # Load the TFLite model and allocate tensors.
    interpreter = Interpreter(model_path="./model.tflite")
    interpreter.allocate_tensors()
    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    # Predict
    input_shape = input_details[0]["shape"]
    input_data = vectorize_text(
        text=text, vocabulary=vocabulary, input_shape=input_shape
    )
    interpreter.set_tensor(input_details[0]["index"], input_data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]["index"])
    # Index 1 is the positive class, index 0 the negative class
    output_result = np.argmax(output_data)
    if output_result == 1:
        result = "positive"
    else:
        result = "negative"
    # Report the probability of the predicted class as a percentage
    prob = output_data[0][output_result] * 100
    return f"{prob:.2f}% {result}"
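The post-processing at the end of the handler can be tried without a model. The sketch below, with a hypothetical format_prediction() helper and a made-up output array, shows how a (1, 2) array of class probabilities turns into the returned string, assuming index 1 is the positive class as in the handler.

```python
import numpy as np

LABELS = ("negative", "positive")  # index 1 is "positive", as in the handler

def format_prediction(output_data):
    """Turn a (1, 2) array of class probabilities into '<pct>% <label>'."""
    idx = int(np.argmax(output_data))
    prob = float(output_data[0][idx]) * 100
    return f"{prob:.2f}% {LABELS[idx]}"

# Made-up model output, for illustration only.
fake_output = np.array([[0.2192, 0.7808]], dtype=np.float32)
print(format_prediction(fake_output))  # 78.08% positive
```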
Deploy with Cape
To deploy our function with Cape, we first need to create a folder that contains everything the function needs. For this sentiment analysis app, the deployment folder must contain the app.py above, the trained TFLite model, and its vocabulary. Because app.py imports external libraries, the deployment folder must contain those as well. We can list those dependencies in a requirements.txt file and use Docker to install them into our deployment folder, called app, as follows:
sudo docker run -v `pwd`:/build -w /build --rm -it python:3.9-slim-bullseye pip install -r requirements.txt --target ./app/
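Before deploying, it can help to confirm that the folder is complete. The following is a small sketch of such a check, not part of the Cape tooling. The file names are taken from the handler above, and the missing_files() helper is our own.

```python
from pathlib import Path

# File names taken from the handler above; adjust them if yours differ.
REQUIRED_FILES = ("app.py", "model.tflite", "vocab.txt")

def missing_files(deploy_dir):
    """Return the required files that are absent from deploy_dir."""
    root = Path(deploy_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_files("./app")
    print("ready to deploy" if not missing else f"missing: {missing}")
```

The check only covers the files that app.py opens directly. The installed Python packages land in the same folder through the pip command above.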
Now that we have everything ready, we can log into Cape:
cape login
Your CLI confirmation code is: GZPN-KHMT
Visit this URL to complete the login process: https://login.capeprivacy.com/activate?user_code=GZPN-KHMT
Congratulations, you're all set!
And after that we can deploy the app:
cape deploy ./app
Deploying function to Cape ...
Success! Deployed function to Cape
Function ID ➜ CzFFUHDyjq6Uqm8MCVfdVc
Checksum ➜ eb989a5ef2fabf377a11ad5464b81b67757fada90a268c8c6d8f2d95013c4681
Invoke with Cape
Now that the app is deployed, we can pass it an input and invoke it with cape run:
cape run CzFFUHDyjq6Uqm8MCVfdVc "This was a great film"
78.08% positive
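If we want to call the function from a Python script, one option is to wrap the CLI. The sketch below shells out to cape run exactly as in the command above and parses the returned string. It assumes the cape binary is on the PATH and that the result is the last line printed to standard output. Both helper names are our own.

```python
import subprocess

def parse_result(line):
    """Split a line such as '78.08% positive' into (78.08, 'positive')."""
    percent, label = line.strip().split("% ", 1)
    return float(percent), label

def run_sentiment(function_id, text):
    """Call `cape run <function_id> <text>` and parse its last output line."""
    completed = subprocess.run(
        ["cape", "run", function_id, text],
        capture_output=True, text=True, check=True,
    )
    # Assumption: the function result is the last line written to stdout.
    return parse_result(completed.stdout.strip().splitlines()[-1])

print(parse_result("78.08% positive"))  # (78.08, 'positive')
```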
JavaScript Front-end with Cape SDK
In addition to the CLI, Cape provides Python and JavaScript SDKs. To let an SDK client call a deployed function, the CLI can generate a token for that function as follows:
cape token <function ID> --expires <number of seconds>
We can then use cape-js to invoke the function deployed in the enclave. First, we need to install cape-js with:
npm install @capeprivacy/cape-sdk
Or:
yarn add @capeprivacy/cape-sdk
Then we can import it to our JavaScript program:
import { Cape } from '@capeprivacy/cape-sdk';
Within the JavaScript program that we used to create the front-end, we can use the function token to connect to the enclave using cape-js as follows:
const client = new Cape({ functionToken: '<FUNCTION_TOKEN>' });
The function ID is then used to run the function that we previously deployed in the enclave with cape deploy.
await client.run({ id: '<FUNCTION_ID>', data: 'input' });
Using JavaScript and cape-js, we have created a front-end for the sentiment analysis application that allows users to go to a website, enter any text, click a button, and see the predicted sentiment. Go ahead and try it yourself here.
Conclusion
In this blog we walked through one example use case for Cape’s confidential computing platform based on AWS Nitro enclaves. Specifically, we built a sentiment analysis application with TensorFlow Lite that classifies the sentiment of any text as positive or negative. We have shown how this app can be deployed with Cape’s CLI so that the text is processed securely. We have also shown how cape-js, Cape’s JavaScript SDK, can be used within JavaScript programs to connect to an enclave and run any deployed function. The front-end that we built gives Cape’s users a graphical interface for interacting with the sentiment analysis app in addition to the CLI.