Mehmet Koca

Image Recognition with CoreML

Entry-level information on machine learning

This post covers the basics of image recognition with Core ML. To keep things concrete, we'll build a small sample app along the way.


What is CoreML?

Apple's machine learning framework.

With Core ML, you can integrate trained machine learning models into your app.
A trained model is the result of applying a machine learning algorithm to a set of training data. The model makes predictions based on new input data. For example, a model that's been trained on a region's historical house prices may be able to predict a house's price when given the number of bedrooms and bathrooms.
-Apple Documentation
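
To make that house-price example concrete, here is a minimal sketch of what calling such a model could look like in Swift. HousePricer is a hypothetical .mlmodel name, and the parameter and output names are assumptions; when you add a real model to Xcode, it generates a class whose prediction method mirrors that model's declared inputs and outputs:

import CoreML

// Hypothetical model: "HousePricer", with "bedrooms"/"bathrooms" inputs and a "price" output.
// The generated class, parameter, and property names depend entirely on the actual .mlmodel file.
func estimatePrice(bedrooms: Double, bathrooms: Double) -> Double? {
        guard let output = try? HousePricer().prediction(bedrooms: bedrooms, bathrooms: bathrooms) else {
                return nil
        }
        return output.price
}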

Note: To use the Core ML framework, you need Xcode 9 or later.

Github Repo

What is Machine Learning?

Machine Learning (ML) is a branch of Artificial Intelligence (AI). Its purpose is to take input data (images, text, speech, statistics) and predict features or behaviors found in that data.
ML allows computers, in our case smartphones, to find hidden insights without being explicitly programmed where to look.
Examples:

  • Face detection
  • Facial features detection
  • Image Recognition
  • Age prediction

input --> Trained Model --> output

Why is machine learning important?

ML can perform tasks that we thought only humans could do. It adds a human touch to your apps. It makes apps feel "smart".

Conclusion

  • ML is not new (only native iOS support is new)
  • ML field is growing exponentially
  • ML can make your apps do more

Creating User Interface

For a basic application, an image view, a label, and a button will suffice.
Note: The designed version of the application is available in the GitHub repo.


Choosing Photo with UIImagePickerController

First of all, in ViewController.swift, define the image view outlet, the label outlet, and the button's tap action.

@IBOutlet weak var imageView: UIImageView!
@IBOutlet weak var resultLabel: UILabel!

override func viewDidLoad() {
        super.viewDidLoad()
        // ...
}

@IBAction func selectBtnTapped(_ sender: Any) {
        // ...
} 

Before implementing the selectBtnTapped method, you need to declare UIImagePickerControllerDelegate and UINavigationControllerDelegate alongside UIViewController so the class can act as the picker's delegate.

class ViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
...

Now we're ready to present the picker.

@IBAction func selectBtnTapped(_ sender: Any) {
        let picker = UIImagePickerController()
        picker.delegate = self
        picker.allowsEditing = true
        picker.sourceType = .photoLibrary
        self.present(picker, animated: true, completion: nil)
}

And then, implement the didFinishPickingMediaWithInfo delegate method to handle the selected photo.

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [String : Any]) {
        imageView.image = info[UIImagePickerControllerEditedImage] as? UIImage
        self.dismiss(animated: true, completion: nil)
}
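
Optionally, you can also dismiss the picker when the user cancels without choosing a photo. This is the standard UIImagePickerControllerDelegate cancel callback (not part of the original app, but harmless to add):

// Dismiss the picker if the user taps Cancel without selecting a photo.
func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        self.dismiss(animated: true, completion: nil)
}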

Finally, in the Info.plist file, create a row named "Privacy - Photo Library Usage Description" (the raw key is NSPhotoLibraryUsageDescription). In the Value column, write the message that will be shown when the app asks for photo library access.


Downloading Model and Creating Functions

Download the Places205-GoogLeNet model from Apple's machine learning page.
Then embed GoogLeNetPlaces.mlmodel into our project.


And then, import CoreML and Vision into ViewController.swift.

import UIKit
import CoreML
import Vision

Implement the recognizeImage function below didFinishPickingMediaWithInfo. The function takes a single parameter named image of type CIImage.

func recognizeImage(image: CIImage) {
        // ...
}

Define a selectedImage variable of type CIImage.

@IBOutlet weak var imageView: UIImageView!
@IBOutlet weak var resultLabel: UILabel!

var selectedImage = CIImage()

Inside didFinishPickingMediaWithInfo, convert the selected UIImage to a CIImage and call recognizeImage().

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [String : Any]) {
        imageView.image = info[UIImagePickerControllerEditedImage] as? UIImage
        self.dismiss(animated: true, completion: nil)

        // Convert the selected UIImage to a CIImage before handing it to Vision.
        if let uiImage = imageView.image, let ciImage = CIImage(image: uiImage) {
            self.selectedImage = ciImage
        }

        recognizeImage(image: selectedImage)
}

Creating Request with VNCoreMLRequest

First of all, assign "I'm investigating..." to resultLabel.text.

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."
}

Create a VNCoreMLModel from GoogLeNetPlaces().model and assign it to the model constant.

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."

        if let model = try? VNCoreMLModel(for: GoogLeNetPlaces().model) {

        }
}

Create the request with VNCoreMLRequest, passing the model constant we created above as its model parameter.

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."

        if let model = try? VNCoreMLModel(for: GoogLeNetPlaces().model) {
                let request = VNCoreMLRequest(model: model, completionHandler: { (vnrequest, error) in

                })
        }
}

In the completion handler, cast vnrequest.results to [VNClassificationObservation] and assign them to results.

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."

        if let model = try? VNCoreMLModel(for: GoogLeNetPlaces().model) {
                let request = VNCoreMLRequest(model: model, completionHandler: { (vnrequest, error) in
                        if let results = vnrequest.results as? [VNClassificationObservation] {

                        }
                })
        }
}

Assign the first result to topResult.

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."

        if let model = try? VNCoreMLModel(for: GoogLeNetPlaces().model) {
                let request = VNCoreMLRequest(model: model, completionHandler: { (vnrequest, error) in
                        if let results = vnrequest.results as? [VNClassificationObservation] {
                                let topResult = results.first
                        }
                })
        }
}

Use DispatchQueue.main.async to jump back to the main thread for the label update (UI updates must not run on a background queue).

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."

        if let model = try? VNCoreMLModel(for: GoogLeNetPlaces().model) {
                let request = VNCoreMLRequest(model: model, completionHandler: { (vnrequest, error) in
                        if let results = vnrequest.results as? [VNClassificationObservation] {
                                let topResult = results.first
                                DispatchQueue.main.async {

                                }
                        }
                })
        }
}

Compute topResult's confidence as a percentage, assign it to confidenceRate, and then display it with resultLabel.text.

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."

        if let model = try? VNCoreMLModel(for: GoogLeNetPlaces().model) {
                let request = VNCoreMLRequest(model: model, completionHandler: { (vnrequest, error) in
                        if let results = vnrequest.results as? [VNClassificationObservation] {
                                let topResult = results.first
                                DispatchQueue.main.async {
                                        let confidenceRate = (topResult?.confidence)! * 100
                                        self.resultLabel.text = "\(confidenceRate)% it's \(String(describing: topResult?.identifier))"
                                }
                        }
                })
        }
}

Handler with VNImageRequestHandler

Next, below the request (still inside the if let model block), create a handler constant with VNImageRequestHandler, passing the ciImage.

let handler = VNImageRequestHandler(ciImage: image)

Then wrap the call in DispatchQueue.global(qos: .userInteractive).async so the request runs on a background queue and doesn't block the UI.


let handler = VNImageRequestHandler(ciImage: image)
DispatchQueue.global(qos: .userInteractive).async {

}

Create a do-catch block and process the request with handler.perform().

let handler = VNImageRequestHandler(ciImage: image)
DispatchQueue.global(qos: .userInteractive).async {
    do {
            try handler.perform([request])
    } catch {
            print("Err :(")
    }
}

It's done. The final state of the recognizeImage function:

func recognizeImage(image: CIImage) {

        resultLabel.text = "I'm investigating..."

        if let model = try? VNCoreMLModel(for: GoogLeNetPlaces().model) {
            let request = VNCoreMLRequest(model: model, completionHandler: { (vnrequest, error) in
                // Take the highest-confidence classification, if there is one.
                if let results = vnrequest.results as? [VNClassificationObservation],
                    let topResult = results.first {
                    DispatchQueue.main.async {
                        // confidence is in the range 0...1, so scale it to a whole-number percentage.
                        let confidenceRate = Int(topResult.confidence * 100)
                        self.resultLabel.text = "\(confidenceRate)% it's \(topResult.identifier)"
                    }
                }
            })
            let handler = VNImageRequestHandler(ciImage: image)
            DispatchQueue.global(qos: .userInteractive).async {
                do {
                    try handler.perform([request])
                } catch {
                    print("Err :(")
                }
            }
        }
    }
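
If you want to see more than the single best guess while testing, one option (not part of the original tutorial) is to log the top few classifications inside the same completion handler:

// Inside the completion handler, after casting vnrequest.results to [VNClassificationObservation]:
for observation in results.prefix(3) {
        print("\(observation.identifier): \(Int(observation.confidence * 100))%")
}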

Final Result and Testing

A simple Core ML application is now complete, using the Places205-GoogLeNet model.
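
One extra note for testing: you don't have to resize photos yourself before passing them in, because Vision scales and crops the input image to the size the model expects. If you want to control how that happens, VNCoreMLRequest exposes an imageCropAndScaleOption property, for example:

// Optional: choose how Vision resizes/crops the image to the model's input size.
request.imageCropAndScaleOption = .centerCrop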


Github Repo
Read on Gitbook

Latest comments (1)

bigmit2011

Nice tutorial. It seems you don't resize images? I want to resize images using bilinear filtering (one method of resizing), because that's what my model expects.

How would I go about doing so?