(Note: these posts are migrated from my previous medium.com blog)
I first learned about Google’s Cloud Vision API at this year’s Google I/O. Though it has been out in beta since 2015, I had neither heard of it nor had a chance to try it out until today, when I came across this blog post and was intrigued by the YouTube demo:
As always, I had an Intel Edison lying around, so I decided to give it a try.
Before you begin:
Make sure your Edison has been updated to the latest firmware and has Wi-Fi set up; use the setup/configuration tool found here to do so.
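Alternatively, if your image includes the configure_edison utility (the stock Yocto image I used does), you can set up Wi-Fi straight from the serial console:

root@edison:~# configure_edison --wifi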
You will also need a Google Cloud account with the Vision API enabled. Follow these instructions here to do so before proceeding.
Things you’ll need:
Intel Edison w/ Arduino Breakout Board (you could also use the mini breakout, but you might need a USB adapter to connect a webcam)
Logitech C270 Webcam (Any other USB webcam supported by Linux UVC drivers would work too)
Power Supply
Here’s how it’s all connected:
Let’s go!
For the USB webcam to work, make sure UVC drivers are installed and enabled; you can find instructions here on how to do that.
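To quickly verify the camera is recognized, plug it in and check that a video device node shows up (on my setup the C270 appears as /dev/video0; yours may differ):

root@edison:~# dmesg | grep -i uvc
root@edison:~# ls /dev/video*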
Install ffmpeg. Git clone the edi-cam repository and run the shell script it ships with to install ffmpeg:
root@edison:~# cd edi-cam/bin
root@edison:~/edi-cam/bin# ./install_ffmpeg.sh
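Before wiring up any code, it’s worth grabbing a single test frame to confirm the build works (this assumes the script installed the binary to /home/root/bin/ffmpeg/, the same path used in the snippet below):

root@edison:~# /home/root/bin/ffmpeg/ffmpeg -y -s 320x240 -f video4linux2 -i /dev/video0 -vframes 1 test.jpeg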
Install gcloud. This is the Google Cloud Node.js module that lets you easily use Google Cloud APIs.
root@edison:~# npm install gcloud
Copy over (via scp/sftp) the service account key JSON you created during setup. You can create a new one here if you’ve lost it.
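For example, from your development machine (the key filename and IP address below are placeholders; substitute your own):

$ scp my-service-account-key.json root@192.168.1.42:/home/root/key.json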
Run the code! Copy and paste this snippet into Vim (save it as capture.js) or transfer the file over:
var childProcess = require('child_process');

var gcloud = require('gcloud')({
  projectId: 'YOUR_PROJECT_ID',
  keyFilename: 'YOUR_KEY_LOCATION'
});
var vision = gcloud.vision();

/***********************************************************
ffmpeg flags:
* -loglevel - set the logging level used by the library
* -y - overwrite output files without asking
* -s - set frame size WxH
* -f - force format
* -i - input filename (here, the webcam device)
* -vframes - set the number of video frames to output
* ./capture.jpeg - output name
***********************************************************/

// take a snapshot using the webcam with ffmpeg;
// exec() is asynchronous, so the Vision call goes in its
// callback to make sure capture.jpeg exists before labeling it
childProcess.exec('/home/root/bin/ffmpeg/ffmpeg -loglevel panic -y -s 320x240 -f video4linux2 -i /dev/video0 -vframes 1 ./capture.jpeg', function(err) {
  if (err) {
    console.log(err);
    return;
  }
  // send the captured image to the Cloud Vision API for labeling
  vision.detectLabels('./capture.jpeg', { verbose: true }, function(err, labels) {
    if (err) {
      console.log(err);
    } else {
      console.log(labels);
    }
  });
});
root@edison:~# node capture.js
Results
Here’s the image that was captured by my webcam:
And here’s the returned JSON:
root@edison:~# node capture.js
[ { desc: 'cartoon', mid: '/m/0215n', score: 85.945672 },
  { desc: 'machine', mid: '/m/0dkw5', score: 74.98506900000001 },
  { desc: 'robot', mid: '/m/06fgw', score: 69.911 },
  { desc: 'gadget', mid: '/m/02mf1n', score: 67.246151 } ]
…I thought that was pretty cool :)
The Google Cloud Vision API actually has a lot of other powerful features, including analyzing emotional facial attributes, detecting and extracting text (OCR), and detecting faces, landmarks, labels, logos, and image properties.
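To give a taste, here’s a minimal sketch of text detection using the same gcloud module as above — I haven’t wired this one to the webcam, and ./capture.jpeg is just the file produced by capture.js:

var gcloud = require('gcloud')({
  projectId: 'YOUR_PROJECT_ID',
  keyFilename: 'YOUR_KEY_LOCATION'
});
var vision = gcloud.vision();

// run OCR on the captured image; the callback receives an
// array of strings found in the image
vision.detectText('./capture.jpeg', function(err, text) {
  if (err) {
    console.log(err);
  } else {
    console.log(text);
  }
});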
Vision capabilities perfectly complement robotic applications (e.g. a drone that tases you if you’re not smiling, a spray-paint bot that corrects graffiti grammar, etc.). I can’t wait to see what kind of cool things people will make with this!