
Doug Sillars for Mindee

Posted on • Originally published at mindee.com

Parsing international passports with Mindee's API


 

Many onboarding processes in mobile or web apps require extracting data from ID documents. In this tutorial, you will learn how to automatically extract data from passports so you can offer your users the best onboarding experience.

 

We will walk through the steps to use Mindee’s International Passport Parsing API. Let’s get started!

 

API Prerequisites

  1. You’ll need a free Mindee account. Sign up and confirm your email to log in.
  2. A picture of a passport. You can use yours safely, as data protection is one of our priorities and we won't send your images to any third-party application. You can also download a fake one here.

 

 

 

 

Setting up the API

 

Log into your Mindee account and access your passport API environment by clicking the International Passport card:

 

 


 

To activate the API, click the “Try for Free” button. This gives you free access to the API 50 times a month. There are four sections on the API landing page, listed in the left navigation. You are currently on the dashboard, and there are additional links to “Documentation”, “Credentials” and “Live Interface”. The Documentation tab has all of the technical details you’ll need to build against the passport API endpoint, and the Live Interface is an interactive demo.

We'll start by clicking “Credentials” and creating a new API token, in this case named Tutorial:

 

 

Click “Add New Key” and you’ll be able to see your API token.

 

If you move back to the Documentation tab, you can pick your API token and a language to generate an API call. Here is the API call in cURL:

curl -X POST \
  https://api.mindee.net/products/passport/v1/predict \
  -H 'X-Inferuser-Token: {apiToken}' \
  -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
  -F file=@/path/to/your/file.png

 

Simply replace {apiToken} with your new API token and /path/to/your/file.png with the path to your passport image.


That's it! We're all set to run this command in the terminal, and see the data that Mindee can extract.
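If you would rather make the call from code, the same request can be sketched in Python using only the standard library. The endpoint, the X-Inferuser-Token header and the file form field come from the cURL command above; the function name build_passport_request and the choice to guess the image MIME type from the filename are assumptions made for this illustration, not something taken from Mindee's documentation.

```python
import mimetypes
import urllib.request
import uuid

# Endpoint taken from the cURL command above.
API_URL = "https://api.mindee.net/products/passport/v1/predict"


def build_passport_request(api_token, filename, image_bytes):
    """Build (but do not send) a multipart POST with the image in the 'file' field."""
    boundary = uuid.uuid4().hex  # random separator for the multipart body
    # Assumption: a MIME type guessed from the filename is acceptable to the API.
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + image_bytes + tail,
        headers={
            "X-Inferuser-Token": api_token,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# Sending the request needs a real token and image, so it is left commented out:
# import json
# with open("/path/to/your/file.png", "rb") as image_file:
#     request = build_passport_request("{apiToken}", "file.png", image_file.read())
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the headers and body before spending one of your free monthly calls.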

In this tutorial, I'll use a sample [Cyprus passport](https://commons.wikimedia.org/wiki/File:Cyprus_passport_data_page.jpg) from Wikimedia.

Cyprus Passport used in this example

 

Extracted fields

 

The JSON response can be a bit verbose, so we will break it into sections to describe what we are seeing.  The initial sections describe the API and the API response:

"call": {
        "endpoint": {
            "name": "passport",
            "version": "1.0"
        },
        "finished_at": "2020-09-04T20:57:51+00:00",
        "id": "7d81f5fe-fce1-4b63-b3de-cc5948c39b55",
        "n_documents": 1,
        "n_inputs": 1,
        "processing_time": 1.988,
        "started_at": "2020-09-04T20:57:49+00:00"
    },
    "documents": [
        {
            "id": "57aded44-a216-45be-859c-f949cc5b011d",
            "name": "1024px-Cyprus_passport_data_page.jpg"
        }
    ],

Here, we can see that the passport API was called, and that it processed one document in about 2 seconds. What we are really interested in is the results, which can be found in the predictions section of the JSON response:
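As a quick sanity check on the timing, the two timestamps in the call block can be compared directly. This is a small Python sketch that uses only values copied from the response above; it is an illustration, not part of Mindee's tooling.

```python
from datetime import datetime

# Values copied from the "call" block of the response above.
call = {
    "finished_at": "2020-09-04T20:57:51+00:00",
    "n_documents": 1,
    "processing_time": 1.988,
    "started_at": "2020-09-04T20:57:49+00:00",
}

# The timestamps are ISO 8601 strings, so the standard library can parse them.
started = datetime.fromisoformat(call["started_at"])
finished = datetime.fromisoformat(call["finished_at"])
elapsed = (finished - started).total_seconds()

print(f"{call['n_documents']} document(s), {elapsed} s wall clock, "
      f"{call['processing_time']} s reported")
```

The timestamps only have one-second resolution, so the processing_time field is the more precise of the two figures.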

 

Predictions

 

The API response lists the items extracted from the document in alphabetical order:

 

Birth Date

"birth_date": {
                "probability": 1.0,
                "segmentation": {
                    "bounding_box": [
                        [0.053,0.912],
                        [0.955,0.912],
                        [0.955,0.969],
                        [0.053,0.969]
                    ]
                },
                "value": "1970-01-01"
            },

The API matched the birth date in the MRZ fields (more on that in a minute), so it is 100% confident that the birth date is 1 January 1970. The bounding box gives four (x,y) points in the image where the date of birth can be found.
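To draw or crop one of these regions, the four points need to be turned into pixel coordinates. The sketch below assumes the values are fractions of the image width and height measured from the top-left corner; the response shown here does not state that explicitly, so check the documentation before relying on it. The helper name bounding_box_to_pixels and the 1000 x 1000 image size are made up for the example.

```python
def bounding_box_to_pixels(bounding_box, image_width, image_height):
    """Convert four relative (x, y) corners to a (left, top, right, bottom) pixel box."""
    xs = [x * image_width for x, _ in bounding_box]
    ys = [y * image_height for _, y in bounding_box]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Bounding box copied from the birth_date field above.
birth_date_box = [[0.053, 0.912], [0.955, 0.912], [0.955, 0.969], [0.053, 0.969]]

# 1000 x 1000 is a hypothetical image size chosen to keep the arithmetic easy to follow.
print(bounding_box_to_pixels(birth_date_box, 1000, 1000))  # (53, 912, 955, 969)
```

The resulting tuple uses the (left, top, right, bottom) order that most image libraries expect for a crop rectangle.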

 

Country of issuance

The API then attempts to parse the country that issued the passport. 

"country": {
                "probability": 0.99,
                "segmentation": {
                    "bounding_box": [
                        [0.479,0.199],
                        [0.522,0.199],
                        [0.522,0.234],
                        [0.479,0.234]
                    ]
                },
                "value": "CYP"
            },

 

The API is 99% certain that Cyprus is the issuing country. This is impressive, as the API team tells me that there are no Cyprus passports in the training data set. Again, four (x,y) coordinates mark where CYP appears in the image.

 

Expiration Date

"expiry_date": {
                "probability": 1.0,
                "segmentation": {
                    "bounding_box": [
                        [0.053,0.912],
                        [0.955,0.912],
                        [0.955,0.969],
                        [0.053,0.969]
                    ]
                },
                "value": "2020-12-01"
            },

Many countries do not allow entry with under six months of validity left on the passport. This woman should probably start the renewal process, as the API has identified that this passport will expire on 1 December 2020.

 

Gender

            "gender": {
                "probability": 0.1,
                "segmentation": {
                    "bounding_box": [
                        [0.053,0.912],
                        [0.955,0.912],
                        [0.955,0.969],
                        [0.053,0.969]
                    ]
                },
                "value": "F"
            },

The API's confidence is not high on this one, just 10%, but it does correctly identify the gender as female.

 

Given Name

            "given_names": [
                {
                    "probability": 0.99,
                    "segmentation": {
                        "bounding_box": [
                            [0.048,0.844],
                            [0.952,0.844],
                            [0.952,0.895],
                            [0.048,0.895]
                        ]
                    },
                    "value": "AFRODITI"
                }
            ],

The API is really certain (99% confidence) that the given name on the passport is Afroditi, meaning that our passport holder was named after the Greek goddess of beauty and love.

 

ID Number

            "id_number": {
                "probability": 1.0,
                "segmentation": {
                    "bounding_box": [
                        [0.673,0.197],
                        [0.774,0.197],
                        [0.774,0.233],
                        [0.673,0.233]
                    ]
                },
                "value": "K00000413"
            },

While we are not numbers, in many ways we are identified by them. In this case, it is the passport ID number, which must be used when booking flights, checking into hotels, etc. The API has found the location (marked by the four (x,y) points in the bounding box) of the passport number, which is K00000413.

 

Issuance Date

Continuing the alphabetical trip through the passport fields, we next come to the date the passport was issued:

 

            "issuance_date": {
                "probability": 0.18,
                "segmentation": {
                    "bounding_box": [
                        [0.353,0.586],
                        [0.426,0.586],
                        [0.426,0.62],
                        [0.353,0.62]
                    ]
                },
                "value": "2014-01-21"
            },

In this case, the API missed the issuance date, reporting a confidence of just 18%. While our API does sometimes miss in its predictions, here the issue is due to the fact that the passport is a sample (we'll have a blog post on this soon). The issuance date is missing from the MRZ of the sample passport, so the API is unable to accurately determine the issuance date.
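In an onboarding flow, low-confidence fields like this one are good candidates for a manual check by the user. The Python sketch below flags any field whose probability falls under a cutoff. The function name fields_needing_review and the 0.5 threshold are arbitrary choices for this illustration; the data is copied from the fields shown so far, with bounding boxes omitted.

```python
def fields_needing_review(predictions, threshold=0.5):
    """Return the names of fields whose probability falls below the threshold."""
    flagged = []
    for name, field in predictions.items():
        # given_names is a list of fields; every other field is a single object.
        items = field if isinstance(field, list) else [field]
        if any(item["probability"] < threshold for item in items):
            flagged.append(name)
    return flagged


# Probabilities and values copied from the responses above.
predictions = {
    "birth_date": {"probability": 1.0, "value": "1970-01-01"},
    "country": {"probability": 0.99, "value": "CYP"},
    "expiry_date": {"probability": 1.0, "value": "2020-12-01"},
    "gender": {"probability": 0.1, "value": "F"},
    "given_names": [{"probability": 0.99, "value": "AFRODITI"}],
    "id_number": {"probability": 1.0, "value": "K00000413"},
    "issuance_date": {"probability": 0.18, "value": "2014-01-21"},
}

print(fields_needing_review(predictions))  # ['gender', 'issuance_date']
```

A sensible threshold depends on how costly a wrong value is for your application, so treat 0.5 as a starting point rather than a recommendation.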

 

Surname

"surname": {
                "probability": 0.98,
                "segmentation": {
                    "bounding_box": [
                        [0.33,0.272],
                        [0.444,0.272],
                        [0.444,0.309],
                        [0.33,0.309]
                    ]
                },
                "value": "ANONYMOU"
            }

Our fake passport holder has a surname of Anonymou, which is correctly identified by the API, along with bounding boxes.

 

MRZ1 and MRZ2

The last two regions identified by the API (which I pulled *slightly* out of alphabetical order for readability) are the two Machine Readable Zone (MRZ) lines:

"mrz1": {
                "probability": 0.99,
                "segmentation": {
                    "bounding_box": [
                        [0.048,0.844],
                        [0.952,0.844],
                        [0.952,0.895],
                        [0.048,0.895]
                    ]
                },
                "value": "P<CYPANONYMOU<<AFRODITI<<<<<<<<<<<<<<<<<<<<<"
            },
"mrz2": {
                "probability": 0.1,
                "segmentation": {
                    "bounding_box": [
                        [0.053,0.912],
                        [0.955,0.912],
                        [0.955,0.969],
                        [0.053,0.969]
                    ]
                },
                "value": "K000004134CYP7001017F2012010<<<<<<<<<<<<<<<4"
            },

The MRZ (if you cannot tell from the JSON above) consists of the two lines of machine-readable text at the bottom of the passport's data page. MRZ1 is line one, and MRZ2 is line two. Most of the identifying information can be pulled from this area. The confidence for line one is high, as the fake passport has this area filled out correctly. However, the confidence for line two is low, as the algorithm expected more characters other than "<" to supply the issuance date.
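To see how much can be pulled from the MRZ, here is a Python sketch that slices the second line by position and verifies its check digits. It follows the standard ICAO Doc 9303 layout for passport-sized documents (fixed field positions, check digits computed with repeating weights 7, 3, 1 modulo 10). It illustrates the format only and makes no claim about how Mindee's model works internally; the function names are made up for the example.

```python
def check_digit(data):
    """Compute an ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    total = 0
    for index, char in enumerate(data):
        if char.isdigit():
            value = int(char)
        elif char == "<":
            value = 0  # filler characters count as zero
        else:
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        total += value * (7, 3, 1)[index % 3]
    return str(total % 10)


def parse_mrz2(line):
    """Slice the fixed-position fields out of a passport's second MRZ line."""
    checked_fields = {
        "id_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),    # YYMMDD
        "expiry_date": (line[21:27], line[27]),   # YYMMDD
    }
    parsed = {"nationality": line[10:13], "gender": line[20]}
    for name, (value, digit) in checked_fields.items():
        parsed[name] = value.rstrip("<")
        parsed[name + "_valid"] = check_digit(value) == digit
    return parsed


# The mrz2 value returned by the API above.
mrz2 = "K000004134CYP7001017F2012010<<<<<<<<<<<<<<<4"
for key, value in parse_mrz2(mrz2).items():
    print(key, value)
```

For this sample line, the passport number, birth date and expiry date all pass their check digits, and the sliced values agree with the id_number, birth_date, expiry_date and gender fields reported earlier.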

 

Conclusion

 

And there you have it! Mindee's passport API parsed a passport in under 2 seconds, matching nearly all of the sections of the passport quickly and accurately. Give it a try with our free tier, and let us know in the chat how it works for you!
