TIL: Google's Cloud AutoML Vision is live!

#todayilearned #googlecloud #computervision #automl

What I Learned

Today, Dr. Fei-Fei Li [unveiled Google's Cloud AutoML Vision product](https://www.blog.google/topics/google-cloud/cloud-automl-making-ai-accessible-every-business/), one more piece of Google's broader mission to "democratize AI".

At a high level, it lets developers and businesses leverage the scale of Google's Cloud ML infrastructure and its state-of-the-art machine learning knowledge and models, while bringing their own training data. In effect, they teach the machine a more relevant, targeted set of concepts, which can lead to efficiencies and more effective use of that learning later.

I have yet to dive into the new product or find out what its full capabilities and limitations are, but I am excited to explore it. I can already imagine many uses: collections of images with specific (but non-traditional) labels of interest can now be tagged and thus become discoverable in various contexts. And that makes all the difference.

And why exactly is this interesting?

This was interesting to me for multiple reasons.

AutoML as Product. I was among the many at Google I/O 2017 who heard Sundar Pichai talk about AutoML, or "Learning to Learn", as a key component of Google's push to make machine learning more mainstream. Since then, I saw the Cloud AutoML page unveiled, but hadn't seen any concrete developer-friendly products or services launch around it. Today made the AutoML vision more tangible (pun intended).
Custom Training Models for Computer Vision. I was already familiar with the existing Google Cloud Vision API and had explored it a little, mostly the landmark and label detection features, for a passion project on tagging historical images. However, that API used pre-trained models, which were great for generic categories but not always good at recognizing and labeling specific instances, particularly when the dataset was not a publicly visible or sourced one that could be considered a "known entity" to Google. Because the pre-trained models drew on Google's own (and, let's admit, not insignificant) corpus of images, there was still a lot of useful information to be had. But when I applied the API to this particular passion project of mine (i.e., "how can I take a repository of scanned historical photos and use computer vision to automatically detect occurrences of the same person, landmark, text or logo across images?"), I soon realized custom training was key.
Google AutoML vs. Clarifai Custom Training. While I am undoubtedly a Google technology fan and advocate, I am also a huge fan of Clarifai, a thought leader in visual recognition AI that just happens to be based in NYC (yay!). I remember Matt Zeiler showing us their Custom Training demo back in Nov 2016 (at the first DevFest NYC). In the demo, you could get an album of photos auto-annotated to recognize the presence of a "dog" -- but you could also tag a specific instance ("Rufus") and custom train the model to recognize and label that specific dog in those pictures. At the time, I didn't know of another service or API that gave me that capability. In that sense, I think Cloud AutoML Vision is now a direct competitor. And that is interesting because, as they say, a rising tide lifts all boats: I believe both researchers and developers will benefit from the new advances and understanding that such competition brings.
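
For anyone who has not tried the pre-trained Cloud Vision API mentioned above, here is a minimal Python sketch of the kind of request I mean. It only builds the JSON body for the REST `images:annotate` method, asking for label and landmark detection on a single image; it does not send anything. The function name `build_annotate_request`, the `max_results` default, and the placeholder bytes are my own illustrative choices, and nothing here touches Cloud AutoML Vision itself, which I have not explored yet.

```python
import base64
import json

# Public REST endpoint of the Cloud Vision API (v1).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, max_results=10):
    """Build the JSON body for one image, asking for labels and landmarks.

    Hypothetical helper for illustration; only the payload shape follows
    the Vision API's images:annotate method.
    """
    return {
        "requests": [
            {
                # Image bytes travel inline as base64-encoded text.
                "image": {
                    "content": base64.b64encode(image_bytes).decode("ascii")
                },
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results},
                    {"type": "LANDMARK_DETECTION", "maxResults": max_results},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real scanned photo.
    body = build_annotate_request(b"not-a-real-image", max_results=5)
    print(json.dumps(body, indent=2))
    # To actually call the API, POST this body to ANNOTATE_URL with
    # your own API key or OAuth token (not shown here).
```

The labels and landmarks that come back from a request like this are drawn from Google's pre-trained models, which is exactly why a specific person or an obscure local building in a scanned photo may go unrecognized without custom training.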