You Should Accept the Occasional Faults by Machine Learning

Deebul Nair — Mon, 14 Aug 2017 12:59:12 +0000

You will never achieve a 100% accurate machine learning algorithm. If you have one, please take the trouble to prove to us that it is not over-fitting the training dataset.

The best performing machine learning algorithms for any particular task achieve accuracies somewhere in the 90s. For example, for the part-of-speech tagging problem in natural language processing, the state of the art is about 97%. Similarly, in the famous ImageNet Large Scale Visual Recognition Challenge, the 2016 results show about 97% top-5 accuracy for image classification.

So the conclusion is that you will never achieve 100% accuracy, which implies that some of the output from a machine learning algorithm will be incorrect.
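To get a feel for how quickly those occasional errors add up, here is a small sketch. It assumes that every prediction is correct with the same probability and independently of the others, which is a simplification; the function name `probability_all_correct` is made up for this example.

```cpp
#include <cmath>

// Probability that n predictions are all correct, assuming each one is
// correct with probability `accuracy` independently of the others.
// The independence assumption is a simplification for illustration.
double probability_all_correct(double accuracy, int n) {
    return std::pow(accuracy, n);
}
```

Under these assumptions, a 97% accurate algorithm gets ten predictions in a row all correct only about 74% of the time (`probability_all_correct(0.97, 10)`).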

From these two facts we can draw one very important conclusion: we should not write deterministic code based on the outputs of machine learning algorithms. Currently all our code is written in a deterministic style; basically, we know whether a bit is 0 or 1.
For example:

int i = 5;

we know that the value of i is 5, fully deterministic. But if the value of i came from a machine learning algorithm, we cannot be sure that the value is 5.
For a real-world example, if the image classifier says the object is a table, we cannot be fully certain that the object is a table, because at 97% accuracy there is still a 3% chance that it is not.

This property of machine learning has a big impact on any software whose architecture relies on the predictions of machine learning algorithms.

We ran into this problem during our participation in the RoboCup@Work competition. The competition is about mobile robots in an industrial environment solving pick-and-place tasks.

Before picking, the robot perceives the location and decides which object to pick.

We use a CNN-based deep learning algorithm, with 99% accuracy on our training dataset, to classify the objects that need to be picked.

The problem is that sometimes the objects are misclassified. As we were not handling this corner case, we ended up picking the wrong objects and losing a lot of points during the competition. Problems like this can be addressed with solutions that accommodate such corner cases.

There are different ways to use machine learning predictions robustly in your software architecture. Some that we have used are:

Using Confidence Parameter of the Machine Learning algorithms

Most state-of-the-art machine learning algorithms (SVM, random forest, CNN, etc.) provide a confidence parameter for their predictions. This should be used to decide which predictions to act on.
But not all algorithms provide reliable confidence measures. For example, a CNN classifier sometimes misclassifies with 100% confidence. This makes it difficult to use the confidence as the only criterion for discarding faulty predictions.
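A minimal sketch of such a confidence check could look like the following. The `Prediction` struct and the `is_usable` function are hypothetical names for this example, and the threshold is an assumption you would have to tune for your own classifier. As noted above, a check like this cannot catch a misclassification that is reported with very high confidence.

```cpp
#include <cassert>
#include <string>

// Hypothetical type for illustration: a predicted label together with
// the confidence reported by the classifier.
struct Prediction {
    std::string label;
    double confidence;  // expected to lie in [0, 1]
};

// Returns true only if the reported confidence reaches the threshold.
// The threshold value is an assumption that must be tuned per application.
bool is_usable(const Prediction& p, double threshold) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    return p.confidence >= threshold;
}
```

The caller can then branch on the result and, for instance, look at the object again instead of picking it when `is_usable` returns false.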

Repeated Predictions

Never rely on a single prediction in the real world. For example, when doing image classification, take multiple images from different angles or different cameras and make a prediction for each. Then use a knowledge fusion method to combine the multiple predictions and determine which one is correct.
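One of the simplest ways to combine repeated predictions is a majority vote over the predicted labels. The sketch below shows only that one option and is not necessarily the fusion method we used on the robot; the function name `majority_vote` is made up for this example.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the label that occurs most often in `labels`.
// Ties go to the alphabetically smallest label (a side effect of the
// std::map ordering), and an empty input yields an empty string.
std::string majority_vote(const std::vector<std::string>& labels) {
    std::map<std::string, int> counts;
    for (const std::string& label : labels) {
        ++counts[label];
    }

    std::string best;
    int best_count = 0;
    for (const auto& entry : counts) {
        if (entry.second > best_count) {
            best = entry.first;
            best_count = entry.second;
        }
    }
    return best;
}
```

For example, three predictions "table", "chair", "table" taken from different camera angles would be fused into "table".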

These are some simple solutions; more complex methods using filters and Bayesian methods can improve the results further.
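As a rough idea of what a Bayesian method can look like, the sketch below keeps a belief (a probability for every class) and updates it with Bayes' rule each time the classifier reports a label. The sensor model is an assumption made purely for illustration: the classifier names the true class with probability `accuracy` and otherwise picks one of the other classes uniformly at random. The function name `bayes_update` is also made up, and the input belief is assumed to be a valid probability distribution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One Bayesian update of a belief over K classes after the classifier
// reports class index `observed`.
// Assumed sensor model (illustrative, not measured): the classifier names
// the true class with probability `accuracy` and otherwise picks one of
// the other K-1 classes uniformly at random.
std::vector<double> bayes_update(std::vector<double> belief,
                                 std::size_t observed,
                                 double accuracy) {
    const std::size_t k = belief.size();
    assert(k >= 2 && observed < k);
    assert(accuracy > 0.0 && accuracy < 1.0);

    const double miss = (1.0 - accuracy) / static_cast<double>(k - 1);
    double total = 0.0;
    for (std::size_t c = 0; c < k; ++c) {
        // P(observed | true class is c) under the assumed sensor model.
        const double likelihood = (c == observed) ? accuracy : miss;
        belief[c] *= likelihood;
        total += belief[c];
    }
    for (double& b : belief) {
        b /= total;  // renormalise so the belief sums to 1
    }
    return belief;
}
```

Starting from a uniform belief over three classes with an assumed accuracy of 0.9, one observation of class 0 raises its belief to 0.9, and a second agreeing observation raises it to about 0.994. A disagreeing observation pulls the belief back down instead of silently overwriting the earlier result.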

Conclusion

The main takeaway from this discussion is that the outputs of machine learning algorithms are not deterministic. So software which uses these outputs (predictions) cannot be based on simple if-else conditions; on the contrary, it should have mechanisms through which it can accommodate faulty predictions.

Please comment on the various methods you have used to accommodate the faults in prediction of your algorithms.

CPP output coloured text in console

Deebul Nair — Thu, 29 Jun 2017 11:43:01 +0000

Coloured Debug output

Print statements are the most widely used debugging method in any programming language. It's always useful to have a bag of tricks for good debug print statements.

One such method is to color the output on the screen. Coloured debug statements are far more effective than plain white statements. For example, you can use green for positive statements and red for error statements.

In C++ there are tons of ways of doing things. I always like the KISS (keep it simple, silly) method. The code below seems to be the simplest way of doing it.

#include <iostream>
#include <string>

int main()
{
    // ANSI escape sequences of the form "\033[<style>;<colour>m", where
    // style 0 is normal and 1 is bold. "\033[0m" resets all attributes.
    const std::string red("\033[0;31m");
    const std::string green("\033[1;32m");
    const std::string yellow("\033[1;33m");
    const std::string cyan("\033[0;36m");
    const std::string magenta("\033[0;35m");
    const std::string reset("\033[0m");

    // Print reset after each message so the colour does not leak into
    // whatever is printed next.
    std::cout << yellow << " Hello color yellow " << reset << std::endl;
    std::cout << red << " Hello color red " << reset << std::endl;
    std::cout << green << " Hello color green " << reset << std::endl;
    std::cout << cyan << " Hello color cyan " << reset << std::endl;
    std::cout << magenta << " Hello color magenta " << reset << std::endl;
    return 0;
}

Obviously it has its drawbacks. I don't know them right now, but feel free to comment ✏️ if it doesn't work for you.
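One drawback that commonly shows up with escape codes is that they appear as raw characters when the output is redirected to a file or shown in a terminal that does not interpret ANSI sequences. A possible workaround, sketched below, is a small helper that only adds the colour when asked to. The name `colorize` and its `enabled` flag are made up for this example; how you set the flag is up to you (on POSIX systems, `isatty(STDOUT_FILENO)` from `<unistd.h>` is one option).

```cpp
#include <string>

// Wraps `text` in an ANSI colour sequence when `enabled` is true and
// returns it unchanged otherwise, so the same call works for terminals
// and for plain log files.
std::string colorize(const std::string& text,
                     const std::string& color_code,
                     bool enabled) {
    if (!enabled) {
        return text;
    }
    const std::string reset("\033[0m");
    return color_code + text + reset;
}
```

A call such as `std::cout << colorize(" Hello color red ", "\033[0;31m", use_color) << std::endl;` then prints coloured text only when `use_color` is true.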

Hi, I'm Deebul Nair

Deebul Nair — Sat, 24 Jun 2017 23:15:45 +0000

I have been coding for [10] years.

You can find me on GitHub as deebuls.

I live in [Bonn, Germany].

I work for [Bonn-Rhine-Sieg University].

I mostly program in these languages: [c++, python].

I am currently learning more about [Bayesian learning, Deep learning and NLP].

Nice to meet you.

DEV Community: Deebul Nair