DEV Community

Yann Dupis


Confidential Optical Character Recognition Service With Cape

Cape has recently deployed a confidential optical character recognition (OCR) service. Anyone can try it through the Cape UI after signing up through Cape’s website. You can also integrate the confidential OCR service with your application using the cape-js and pycape SDKs. In this blog post, we will use this OCR service example to demonstrate how you can benefit from a machine learning service while maintaining users’ data confidentiality.

OCR services are becoming increasingly popular in industries from financial services to healthcare, wherever large numbers of documents need to be processed, for the following reasons:

  • Efficiency: eliminates manual data entry and streamlines document management.
  • Accuracy: reduces the errors that occur when data is extracted from documents by hand.
  • Searchability: converts scanned documents into text, so you can search for specific information within them.
  • Cost savings: reduces the time and effort, and therefore the money, that organizations spend processing documents.

Even though the benefits of this type of machine learning service are clear, it often processes our most sensitive information: credit scores, bank accounts, financial statements, medical records, and so on. As consumers, we can use commercial OCR services such as Google Vision API or Amazon Textract. However, even when these services follow strong security practices, such as encrypting data “in transit” and “at rest”, they often don’t guarantee the confidentiality of the data throughout the entire process. For example, the data is decrypted while the machine learning service processes it, which exposes users’ confidential data. These providers also often use the data to train their machine learning models and improve the accuracy of their service.

As a startup working with businesses, one question your clients will always ask is: “How do you guarantee the security of our data?” If you rely on a commercial OCR service, addressing your clients’ concerns will be a challenge, because you can’t control the security and confidentiality of their data throughout the entire process.

How does Cape protect data confidentiality?

To maintain data confidentiality, Cape combines two technologies: encryption and secure enclaves. When you call the OCR service with Cape, your data is immediately encrypted locally. Once encrypted, it can only be processed within a secure enclave, an isolated VM with no storage, no network, and no interactive user access. No one can see what the enclave is processing; it's a black box. Only the secure enclave has access to the private keys needed to decrypt your data. You have a direct, end-to-end encrypted connection with the enclave to send inputs and receive results. And when the enclave finishes running, it is destroyed forever, leaving no traces in memory.

The OCR model

For its OCR service, Cape uses the excellent Python docTR library. Key benefits of docTR include its ease of use, its flexibility, and performance on par with the state of the art. The OCR model works in two stages: text detection, which locates the words on a page, and text recognition, which reads the text inside each detected region. Cape uses a pre-trained DB ResNet-50 architecture for detection and a MobileNetV3 Small architecture for recognition. To learn what level of OCR accuracy you can expect for your documents, you can consult the benchmarks provided by docTR. As you will see, the model's performance is very competitive with commercial services.
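To make the two-stage design more concrete, here is a minimal, library-free sketch of how a detection stage and a recognition stage compose into a transcript. It is not docTR's implementation: the `Box` type, the stub detector and recognizer, and the `line_tol` line-grouping tolerance are all hypothetical stand-ins for the real DB ResNet-50 and MobileNetV3 Small models, which operate on image pixels rather than on a lookup table.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical word box in relative page coordinates (0..1)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def y_center(self) -> float:
        return (self.ymin + self.ymax) / 2


Word = Tuple[Box, str]
Detector = Callable[[object], List[Box]]   # stage 1: page -> word boxes
Recognizer = Callable[[object, Box], str]  # stage 2: (page, box) -> text


def run_ocr(page: object, detect: Detector, recognize: Recognizer,
            line_tol: float = 0.01) -> str:
    """Detect word boxes, recognize each one, then join the words in
    reading order (top to bottom, left to right)."""
    words: List[Word] = [(box, recognize(page, box)) for box in detect(page)]
    words.sort(key=lambda w: w[0].y_center)

    # Group words whose vertical centers are within line_tol of the
    # first word of the current line.
    lines: List[List[Word]] = []
    for box, text in words:
        if lines and abs(box.y_center - lines[-1][0][0].y_center) <= line_tol:
            lines[-1].append((box, text))
        else:
            lines.append([(box, text)])

    return "\n".join(
        " ".join(text for _, text in sorted(line, key=lambda w: w[0].xmin))
        for line in lines
    )


# Stub "models": the fake page simply maps each box to the word it contains.
fake_page = {
    Box(0.55, 0.10, 0.70, 0.12): "Statement",
    Box(0.10, 0.10, 0.50, 0.12): "Bank",
    Box(0.10, 0.20, 0.30, 0.22): "Total:",
    Box(0.35, 0.205, 0.45, 0.225): "$42",
}
transcript = run_ocr(fake_page, detect=lambda p: list(p),
                     recognize=lambda p, b: p[b])
print(transcript)  # "Bank Statement" on one line, "Total: $42" on the next
```

Keeping the two stages behind separate callables mirrors why the pipeline can pair a detector and a recognizer from different architecture families: each stage only has to agree on the box format passed between them.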

Invoke the OCR service with cape-js

In this blog post, we focus on invoking the OCR service. If you want to learn how to deploy your own machine learning model with Cape, you can check out this example, where we deploy an image classification model with the ONNX runtime.

Alright, let’s invoke the confidential OCR service on a PDF. First, sign up on Cape’s website. Then, to authenticate with Cape from the SDKs, generate a personal access token from your account page, in the “Personal Access Token” section. You can also sign up and generate a personal access token using Cape’s CLI.
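The snippets in this post paste the token directly into the source as `<YOUR TOKEN>`. In practice, you may prefer to keep it out of your code. Here is a minimal sketch, assuming you export the token in an environment variable; the variable name `CAPE_AUTH_TOKEN` and the helper `load_auth_token` are hypothetical and not part of either SDK.

```python
import os


def load_auth_token(var_name: str = "CAPE_AUTH_TOKEN") -> str:
    """Read a Cape personal access token from the environment.

    CAPE_AUTH_TOKEN is a hypothetical variable name chosen for this sketch;
    neither SDK is assumed to read it automatically.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to a personal access token, e.g. one created "
            "with: cape token create --name ocr"
        )
    return token
```

You would then write `auth_token = load_auth_token()` in place of the hardcoded string before passing it to the SDK.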

Once you have your access token, you can invoke the OCR service. Here is a code snippet showing you how to do it with cape-js.

import { Cape } from '@capeprivacy/cape-sdk'
import * as fs from 'fs'
import * as crypto from 'crypto'
import * as pkijs from 'pkijs'

// If you run this script from a node environment
// set the engine to "nodeEngine"
const name = 'nodeEngine'
pkijs.setEngine(
  name,
  new pkijs.CryptoEngine({
    name,
    crypto: crypto.webcrypto,
  })
)

// Load your PDF
const pdf = fs.readFileSync('./path/to/some-file.pdf')

// Get a personal access token from the UI or the CLI with
// cape token create --name ocr
const authToken = '<YOUR TOKEN>'

// Instantiate a Cape object with your auth token and the URL
// "wss://ocr.capeprivacy.com". Setting the URL to "wss://ocr.capeprivacy.com"
// will guarantee the OCR model is deployed to larger instances with required
// dependencies.
const cape = new Cape({ authToken, enclaveUrl: 'wss://ocr.capeprivacy.com' })

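// Invoke the OCR service on your PDF by setting the function ID to
// 'capedocs/ocr-doctr-onnx-1.0'. Top-level await requires running this
// script as an ES module (e.g. a .mjs file or "type": "module" in package.json)
const result = await cape.run({ id: 'capedocs/ocr-doctr-onnx-1.0', data: pdf })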

// Print OCR transcript
console.log(JSON.parse(result).ocr_transcript)

As you can see, you only need to instantiate a Cape object with your token and the Cape URL for machine learning services, then call cape.run on your PDF with the ID set to capedocs/ocr-doctr-onnx-1.0 to select the OCR service. During this call, your PDF is encrypted immediately and processed only within the secure enclave. Nobody, not even Cape, can access your PDF. Once OCR is complete, the service returns a transcript along with the bounding boxes of every word detected in the PDF. You can learn more about the model output in the documentation.
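Because the service returns its output as a JSON string, it can help to parse it once and check for the fields your code relies on. Here is a minimal sketch in Python; it assumes only the `ocr_transcript` and `ocr_records` keys used in this post, treats each record as an opaque JSON value, and the helper name `parse_ocr_result` and the sample payload are hypothetical (the documentation describes the actual record schema).

```python
import json
from typing import Any, List, Tuple, Union


def parse_ocr_result(result: Union[str, bytes]) -> Tuple[str, List[Any]]:
    """Parse the JSON returned by the OCR function into (transcript, records).

    Only the keys "ocr_transcript" and "ocr_records" are assumed; the shape
    of each record is left untouched.
    """
    output = json.loads(result)
    missing = [k for k in ("ocr_transcript", "ocr_records") if k not in output]
    if missing:
        raise KeyError(f"OCR output is missing fields: {missing}")
    return output["ocr_transcript"], output["ocr_records"]


# Hypothetical payload, shaped only to exercise the parser.
sample = json.dumps({"ocr_transcript": "Hello world", "ocr_records": [{}, {}]})
transcript, records = parse_ocr_result(sample)
print(transcript)    # Hello world
print(len(records))  # 2
```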

With Cape, you can also encrypt your data and store the encrypted result in a file for later use. This is useful, for example, when you collect PDFs in the browser on the client side and later run OCR on these encrypted PDFs on the server side. Here is a code snippet showing how to encrypt your data and then invoke the OCR service:

// This snippet reuses the `cape` object and `pdf` buffer from the previous
// snippet and replaces its cape.run call.

// Encrypt your PDF with cape.encrypt. By default, this method retrieves
// the public encryption key associated with your account
const encryptedPdf = await cape.encrypt(pdf)

// Invoke the OCR service on your encrypted PDF by setting the function ID to
// 'capedocs/ocr-doctr-onnx-1.0'
const result = await cape.run({ id: 'capedocs/ocr-doctr-onnx-1.0', data: encryptedPdf })

// Print OCR transcript
console.log(JSON.parse(result).ocr_transcript)

To encrypt your PDF, you just need to call cape.encrypt. Once the PDF is encrypted, the only place it can be decrypted is within the secure enclave. One of the key benefits of Cape is that you don't need to manage keys, select encryption protocols, or spend any time implementing them, so every developer can make their applications secure. If you want to learn more about Cape encryption, you can check out the documentation.
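To keep an encrypted PDF for later processing, you only need to store the ciphertext bytes verbatim: since only the enclave can decrypt them, the file can live in ordinary storage until your server is ready to run OCR. Here is a minimal sketch in Python, assuming `cape.encrypt` returns bytes that you treat as opaque (as `encrypted_pdf` is used in the pycape example later in this post). The helper names are hypothetical; writing through a temporary file plus `os.replace` means a crash never leaves a half-written file at the destination path.

```python
import os
import tempfile


def save_ciphertext(path: str, ciphertext: bytes) -> None:
    """Atomically write opaque ciphertext (e.g. the output of cape.encrypt)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so os.replace stays on
    # one filesystem (atomic on POSIX in that case).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(ciphertext)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_ciphertext(path: str) -> bytes:
    """Read the stored ciphertext back, unchanged, for the OCR function."""
    with open(path, "rb") as f:
        return f.read()
```

On the server, you would then pass `load_ciphertext(path)` to the SDK's run call in place of the freshly encrypted PDF.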

Invoke the OCR service with pycape

You can also call the OCR service from Python using pycape:

import json

from pycape import Cape

# Load your PDF
with open("./path/to/some-file.pdf", "rb") as pdf_file:
    pdf = pdf_file.read()

# Instantiate a Cape object with the URL "wss://ocr.capeprivacy.com".
# Setting the URL to "wss://ocr.capeprivacy.com" will guarantee the OCR model is
# deployed to larger instances with required dependencies.
cape = Cape(url="wss://ocr.capeprivacy.com")

# Get a personal access token from the UI or the CLI with
# cape token create --name ocr
auth_token = "<YOUR TOKEN>"
t = cape.token(auth_token)

# Encrypt your PDF with cape.encrypt. When invoking this method, by default,
# the SDK will retrieve the public encryption key associated with
# your account
encrypted_pdf = cape.encrypt(pdf)

# Select the Cape function you would like to invoke.
# Since we want to invoke the OCR service, set the function ID
# to "capedocs/ocr-doctr-onnx-1.0"
f = cape.function("capedocs/ocr-doctr-onnx-1.0")

# Invoke the OCR service
result = cape.run(f, t, encrypted_pdf)

# Parse the JSON output once
output = json.loads(result)

# Print the transcript
print(f"OCR transcript: {output['ocr_transcript']}")

# Print the bounding boxes
print(f"OCR records: {output['ocr_records']}")

If you don’t need to store the encrypted PDF, you can skip cape.encrypt and pass the loaded PDF directly to cape.run.

The code examples presented in this blog post are available in Cape’s function examples repository, and you can learn more in the documentation.

I hope this OCR service example using Cape demonstrates how you can benefit from machine learning services while keeping your users’ most sensitive data confidential, even if you don’t have experience in cryptography!

Check out the Getting Started Docs to try Cape for free. We'd love to hear what you think.
