Understanding Huawei HMS ML Kit Text Recognition, Bank Card Recognition, and General Card Recognition in One Article

Irene ・ 7 min read

1 About This Document

Take a look at the machine learning service introduction on the Huawei Developer website.

As the introduction shows, Huawei HMS groups its machine learning services into four major categories: text-related services, language-related services, image-related services, and face/body-related services. The text-related services include text recognition, document recognition, bank card recognition, and general card recognition. What are the differences and relationships between these sub-services? This article tries to explain.

2 Application Scenario Differences

First, let's look at the text-related sub-services and how their scenarios differ.
Text service SDKs are divided into device-side APIs and cloud APIs. Device-side APIs process and analyze data only on the device, using its computing resources such as the CPU and GPU. Cloud APIs send data to the cloud and use server resources there for processing and analysis. All of these services provide device-side APIs except document recognition, which must process a large amount of data in the cloud. To keep the scope manageable, this document covers only the device-side APIs.

2.1 Scenario Comparison

As the preceding table shows, each capability targets different application scenarios.

  • 2.1.1 Text recognition: a generalist. It can handle any scenario; as long as the content is text, it can be recognized. [Figure: Text OCR application scenarios] Text OCR does not provide a UI; developers implement the UI themselves.
  • 2.1.2 Bank card recognition: a specialist that excels in only one subject. A default customized alignment box is provided for bank cards, so users can quickly extract the card number simply by aligning the card with the box. [Figure: Bank card identification]
  • 2.1.3 General card recognition: sits between the two, with solid skills in one particular field. It can extract text from any type of card, and it also provides a card alignment box that prompts users to align the card being recognized. [Figure: General card identification]

2.2 How to Choose

Use Bank Card OCR to recognize bank cards, general card recognition for all other types of cards, and text recognition for every other scenario.
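The selection rule above is simple enough to write down directly. The following Java sketch is only an illustration: the `ServiceChooser` class and its `Target` and `Service` enums are hypothetical names, not part of the HMS ML Kit API.

```java
/** Hypothetical helper that encodes the "how to choose" rule; not an HMS ML Kit class. */
final class ServiceChooser {

    /** What the user wants to recognize. */
    enum Target { BANK_CARD, OTHER_CARD, OTHER_TEXT }

    /** The text-related sub-services compared in this article. */
    enum Service { BANK_CARD_OCR, GENERAL_CARD_RECOGNITION, TEXT_RECOGNITION }

    static Service choose(Target target) {
        switch (target) {
            case BANK_CARD:
                return Service.BANK_CARD_OCR;            // dedicated model + alignment box
            case OTHER_CARD:
                return Service.GENERAL_CARD_RECOGNITION; // general model + card plug-in
            default:
                return Service.TEXT_RECOGNITION;         // general model, no built-in UI
        }
    }

    public static void main(String[] args) {
        for (Target t : Target.values()) {
            System.out.println(t + " -> " + choose(t));
        }
    }
}
```

The rule checks the most specific case first and falls back to the most general service, mirroring the specialist/generalist comparison in section 2.1.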

3 Service Integration Differences

Compilation Dependency Differences

To make the comparison easier to follow, let's first explain the following concepts:

  • Basic SDK: the APIs provided to developers. All APIs are exposed through the basic SDK.
  • Plug-in: the calibration box mentioned in the scenario comparison above. It provides an interface that checks the quality of each input image frame and, if the frame does not meet the requirements, prompts the user to reposition the card.
  • Model package: the core of the HMS ML Kit services. It contains the interpreter model files generated by training on a large number of samples on a machine learning platform.

The following table summarizes the compilation dependencies of the different services.

Compilation Dependency Summary

As the dependency summary shows, every service requires its basic SDK and a model package. In addition, bank card recognition and general card recognition each have a plug-in, which provides the calibration box mentioned above. As for models, bank card recognition uses a dedicated model package, while general card recognition and text recognition share the general model package.
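To make this summary concrete, here is a hypothetical Java sketch that encodes the dependencies as data and answers which services share a model package. The enum names are illustrative only; they are not Gradle artifact coordinates or HMS classes.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative encoding of the compilation dependency summary (not real artifact names). */
final class TextServiceDeps {

    enum Model { GENERAL, BANK_CARD }

    enum Service {
        TEXT_RECOGNITION(false, Model.GENERAL),
        GENERAL_CARD(true, Model.GENERAL),
        BANK_CARD(true, Model.BANK_CARD);

        final boolean needsPlugin; // calibration-box plug-in required?
        final Model model;         // model package the service uses

        Service(boolean needsPlugin, Model model) {
            this.needsPlugin = needsPlugin;
            this.model = model;
        }
    }

    /** Other services that use the same model package as the given service. */
    static Set<Service> sharingModelWith(Service s) {
        Set<Service> result = EnumSet.noneOf(Service.class);
        for (Service other : Service.values()) {
            if (other != s && other.model == s.model) {
                result.add(other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Service s : Service.values()) {
            // Every service needs the basic SDK plus a model package; some also need a plug-in.
            System.out.printf("%s: basic SDK + %s model%s%n",
                    s, s.model, s.needsPlugin ? " + plug-in" : "");
        }
        System.out.println(sharingModelWith(Service.TEXT_RECOGNITION)); // [GENERAL_CARD]
    }
}
```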

Development Differences

Next, let's look at how each service is integrated. The detailed steps are not repeated here; see the development steps for each service on the Developer website (https://developer.huawei.com/consumer/en/doc/development/HMS-Guides/ml-introduction-4). The main development procedure for each service is as follows:

Text recognition

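  1. Create an analyzer.
  2. Create an MLFrame object from the image bitmap.
  3. Send the frame to the analyzer for recognition.
  4. Handle the result.

```java
import com.huawei.hmf.tasks.OnFailureListener;
import com.huawei.hmf.tasks.OnSuccessListener;
import com.huawei.hmf.tasks.Task;
import com.huawei.hms.mlsdk.MLAnalyzerFactory;
import com.huawei.hms.mlsdk.common.MLFrame;
import com.huawei.hms.mlsdk.text.MLText;
import com.huawei.hms.mlsdk.text.MLTextAnalyzer;

// 1. Create an on-device analyzer; setting is an MLLocalTextSetting configured for your language.
MLTextAnalyzer analyzer = MLAnalyzerFactory.getInstance().getLocalTextAnalyzer(setting);
// 2. Wrap the image bitmap in an MLFrame.
MLFrame frame = MLFrame.fromBitmap(bitmap);
// 3. Send the frame to the analyzer; recognition runs asynchronously.
Task<MLText> task = analyzer.asyncAnalyseFrame(frame);
// 4. Handle the result.
task.addOnSuccessListener(new OnSuccessListener<MLText>() {
    @Override
    public void onSuccess(MLText text) {
        // Recognition succeeded: text contains the recognized blocks, lines, and words.
    }
}).addOnFailureListener(new OnFailureListener() {
    @Override
    public void onFailure(Exception e) {
        // Recognition failed.
    }
});
```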

Bank Card recognition

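  1. Launch the capture UI to recognize the bank card.
  2. Override the callback methods to process the recognition result.

```java
private void startCaptureActivity(MLBcrCapture.Callback callback) {
    // Configure and launch the bank card capture UI here (see the development guide).
}

private MLBcrCapture.Callback callback = new MLBcrCapture.Callback() {
    @Override
    public void onSuccess(MLBcrCaptureResult bankCardResult) {
        // Recognition succeeded: bankCardResult.getNumber() returns the card number.
    }
    @Override public void onCanceled() { }                          // user canceled
    @Override public void onFailure(int retCode, Bitmap bitmap) { } // recognition failed
    @Override public void onDenied() { }                            // e.g. camera unavailable
};
```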

General Card recognition

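  1. Launch the capture UI to recognize the general card.
  2. Override the callback methods to process the recognition result.

```java
private void startCaptureActivity(Object object, MLGcrCapture.Callback callback) {
    // Configure and launch the general card capture UI here (see the development guide).
}

private MLGcrCapture.Callback callback = new MLGcrCapture.Callback() {
    @Override
    public int onResult(MLGcrCaptureResult cardResult) {
        // Process the recognition result, then return CAPTURE_STOP to exit recognition.
        return MLGcrCaptureResult.CAPTURE_STOP;
    }
    @Override public void onCanceled() { }                          // user canceled
    @Override public void onFailure(int retCode, Bitmap bitmap) { } // recognition failed
    @Override public void onDenied() { }                            // e.g. camera unavailable
};
```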

Development Summary

As the comparison shows, the processing logic is similar for all three services, except that text recognition provides no GUI: the image to be recognized is passed to the SDK, and the recognition result is returned through a callback. The core difference lies in the structured data each service returns. The following tables summarize it:

  • Returned content summary: as the comparison shows, bank card recognition returns already-processed results. You can get the bank card number directly through the API without worrying about how it was extracted. Text recognition and general card recognition, by contrast, return the full recognition information, with the text organized into blocks, lines, and words. To obtain the information you need, you must extract it from this full result yourself, for example by using a regular expression to match a run of a given number of consecutive digits as a card number, or by taking the content that follows a recognized keyword. Based on this analysis, the development difficulty compares as follows:
  • Development difficulty comparison summary
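As a sketch of the extraction step described above, the following Java class pulls a card number out of the full recognized text with a regular expression and reads the value that follows a keyword. It assumes the recognized text is available as a single string with one recognized line per line break; the class name, the 16-to-19-digit card number pattern, and the sample text are all assumptions for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical post-processing for full-text results (text or general card recognition). */
final class CardTextExtractor {

    // Assumed card-number layout: 16 to 19 digits, optionally separated by single spaces or hyphens,
    // and not directly preceded or followed by another digit.
    private static final Pattern CARD_NUMBER =
            Pattern.compile("(?<!\\d)(?:\\d[ -]?){15,18}\\d(?!\\d)");

    /** Returns the first card-number-like run of digits, with separators removed. */
    static Optional<String> findCardNumber(String recognizedText) {
        Matcher m = CARD_NUMBER.matcher(recognizedText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group().replaceAll("[ -]", ""));
    }

    /** Returns the text after the first occurrence of keyword on a line, minus leading spaces/colons. */
    static Optional<String> valueAfterKeyword(String recognizedText, String keyword) {
        for (String line : recognizedText.split("\\R")) {
            int idx = line.indexOf(keyword);
            if (idx >= 0) {
                String value = line.substring(idx + keyword.length())
                        .replaceFirst("^[\\s:]+", "")
                        .trim();
                if (!value.isEmpty()) {
                    return Optional.of(value);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String text = "SAMPLE BANK\nName: ZHANG SAN\n6222 0212 3456 7890 123\nValid Thru 12/25";
        System.out.println(findCardNumber(text).orElse("not found"));            // 6222021234567890123
        System.out.println(valueAfterKeyword(text, "Name").orElse("not found")); // ZHANG SAN
    }
}
```

The lookbehind and lookahead stop the pattern from matching a slice of a longer digit run; in practice the digit count and keywords would be tuned to the card type being processed.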

4 Technical Difference Analysis

The preceding analysis shows that the text-related services differ in scenarios and service integration, but they are also related: for example, text recognition and general card recognition use the same general machine learning model. This section examines the differences from a technical perspective.

As described in the compilation dependency analysis, text services require the basic SDK and a model package, and some services also require a plug-in that generates the calibration box. So what is a model package? If you are familiar with machine learning, you know it usually involves collecting training samples, extracting features, building a model, and making predictions. The model is essentially a "mapping function" learned from the training samples through feature extraction and related steps. In HMS ML Kit, the mapping function alone is not enough: it also needs an interpreter framework to execute it. In addition, some algorithms must pre-process and post-process the image, for example converting an image frame into a feature vector. For simplicity, all of these are collectively referred to here as the model file. To run on a mobile phone, the model files are further optimized, for example to speed up execution on the device and to reduce file size.
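The composition described above (pre-processing, the learned mapping executed by an interpreter, then post-processing) can be pictured as a chain of functions. This Java sketch is purely conceptual: the `ModelPipeline` class and the toy brightness stages are hypothetical and do not reflect how HMS ML Kit actually packages its model files.

```java
import java.util.Arrays;
import java.util.function.Function;

/** Conceptual "model file": pre-processing, then the learned mapping, then post-processing. */
final class ModelPipeline<I, F, O, R> {
    private final Function<I, F> preprocess;  // e.g. image frame -> feature vector
    private final Function<F, O> interpreter; // executes the learned mapping function
    private final Function<O, R> postprocess; // e.g. raw model output -> readable result

    ModelPipeline(Function<I, F> preprocess, Function<F, O> interpreter, Function<O, R> postprocess) {
        this.preprocess = preprocess;
        this.interpreter = interpreter;
        this.postprocess = postprocess;
    }

    /** Runs the three stages in order on one input. */
    R run(I input) {
        return preprocess.andThen(interpreter).andThen(postprocess).apply(input);
    }

    public static void main(String[] args) {
        // Toy stages: "image" = grayscale pixels, feature = mean brightness,
        // "model" = a fixed threshold, result = a text label.
        ModelPipeline<int[], Double, Boolean, String> pipeline = new ModelPipeline<>(
                pixels -> Arrays.stream(pixels).average().orElse(0),
                mean -> mean > 128,
                bright -> bright ? "bright" : "dark");
        System.out.println(pipeline.run(new int[] {200, 220, 180})); // bright
    }
}
```

Keeping the stages separate reflects the point made above: the learned mapping is only one part of what must ship on the device.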

Differences and association analysis

Now let's look at how the text services differ and how they relate to each other, as illustrated in the following figure.
[Figure: Text recognition integration mode]

Text recognition
Its model is trained on a general text data set. Its advantages are a wide range of applications and high flexibility: any text content can be recognized.

General card recognition
It uses the same data set as text recognition, so the model files are identical; the difference is the added general card plug-in. The plug-in ensures that the user keeps the card centered in the camera view, and it also detects reflective and blurred images. If these requirements are not met, the user is prompted to readjust, which improves card recognition accuracy.
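To give a feel for what an alignment check involves, the following Java sketch tests whether a detected card rectangle lines up with the calibration box within a tolerance and picks a hint for the user. It is a hypothetical illustration, not the plug-in's actual algorithm: it assumes a card detector already supplies the card's bounding rectangle, and the 5% tolerance and hint texts are arbitrary. Reflection and blur detection are not shown.

```java
/** Hypothetical alignment check; rectangles are {left, top, right, bottom} in pixels. */
final class AlignmentCheck {

    /** True if every card edge lies within tolerance (fraction of box size) of the matching box edge. */
    static boolean isAligned(int[] box, int[] card, double tolerance) {
        double dx = tolerance * (box[2] - box[0]); // allowed horizontal slack per edge
        double dy = tolerance * (box[3] - box[1]); // allowed vertical slack per edge
        return Math.abs(card[0] - box[0]) <= dx
                && Math.abs(card[2] - box[2]) <= dx
                && Math.abs(card[1] - box[1]) <= dy
                && Math.abs(card[3] - box[3]) <= dy;
    }

    /** Chooses a prompt: a card smaller than the box is probably too far away. */
    static String hint(int[] box, int[] card, double tolerance) {
        if (isAligned(box, card, tolerance)) {
            return "OK";
        }
        long boxArea = (long) (box[2] - box[0]) * (box[3] - box[1]);
        long cardArea = (long) (card[2] - card[0]) * (card[3] - card[1]);
        return cardArea < boxArea
                ? "Move the card closer and fill the box"
                : "Move the card back into the box";
    }

    public static void main(String[] args) {
        int[] box = {100, 200, 980, 755};
        System.out.println(hint(box, new int[] {110, 210, 970, 745}, 0.05)); // OK
        System.out.println(hint(box, new int[] {300, 350, 700, 600}, 0.05)); // Move the card closer and fill the box
    }
}
```

Measuring the tolerance as a fraction of the box size keeps the check independent of screen resolution.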

Bank Card OCR
The bank card recognition service is trained on a dedicated bank card data set. The characters on a bank card differ considerably from common printed text, and they are embossed (raised), so the general model struggles to achieve high accuracy. Training therefore uses dedicated bank card and ID card data sets to improve recognition accuracy for these cards. Bank cards also receive targeted pre-processing: for example, image quality and tilt angle are detected dynamically in real time, and an alignment box restricts where the card can be placed. If the image is blurred, shows reflections, or is not aligned with the calibration box, the user is prompted to realign the card.
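Tilt detection can be illustrated in a similar way. This hypothetical Java sketch estimates a card's rotation from the top-left and top-right corners of its detected outline using `Math.atan2`, and flags the frame when the angle exceeds a threshold. Corner detection and the 10-degree threshold are assumptions; the actual bank card pre-processing is not described in this article.

```java
/** Hypothetical tilt check based on the card's detected top edge. */
final class TiltCheck {

    /**
     * Angle of the top edge (x1, y1) -> (x2, y2) relative to horizontal, in degrees.
     * Image coordinates are assumed, so a positive angle means the right end sits lower.
     */
    static double tiltDegrees(double x1, double y1, double x2, double y2) {
        return Math.toDegrees(Math.atan2(y2 - y1, x2 - x1));
    }

    /** True if the card is rotated further than maxDeg in either direction. */
    static boolean needsRealign(double tiltDeg, double maxDeg) {
        return Math.abs(tiltDeg) > maxDeg;
    }

    public static void main(String[] args) {
        // Top edge drops 80 px over 800 px horizontally.
        double tilt = tiltDegrees(100, 200, 900, 280);
        System.out.printf("tilt = %.1f deg, realign = %b%n", tilt, needsRealign(tilt, 10));
    }
}
```

Using `atan2` rather than a plain slope avoids division by zero and keeps the sign of the rotation, so the app could tell the user which way to turn the card.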

5 Summary

Based on the preceding analysis, the conclusion is as follows:
What did you think of this article? Share your opinion in the comments!

Demo GitHub address:

Questions and discussions:

Posted on May 26 by Irene

