Discussion on: Creating an image recognition solution with Azure IoT Edge and Azure Cognitive Services

View post

Replies for: Hey there, I'm pretty sure that the contents of the container are fine, and they are isolated too. Do you have a Raspberry Pi you can test against?...

Dave Glover Microsoft Azure • Oct 10 '19 • Edited

also try curl to localhost curl -X POST localhost/image -F imageData=@red-apple.jpg or by hostname curl -X POST mydevice.local/image -F imageData=@red-apple.jpg. I've seen issues where name resolution doesnt always work as you'd expect...

Andrea Marson • Oct 17 '19 • Edited

Hi Dave,
unfortunately, neither localhost nor mydevice.local worked :(

So I tried the other approach that doesn't make use of any container.
For convenience, I first tried to make it work on my development PC. I followed this tutorial, but it didn't work either :(

Apart from several warning messages, the simple Python program I wrote crashes because of this error:

2019-10-17 09:53:43.957158: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3092910000 Hz
2019-10-17 09:53:43.957622: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1da3970 executing computations on platform Host. Devices:
2019-10-17 09:53:43.957668: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
  File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/usr/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 143, in <module>
    main()
  File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 138, in main
    predict_image()
  File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 115, in predict_image
    predictions, = sess.run(prob_tensor, {input_node: [augmented_image] })
  File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/glover-image-classifier-venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/glover-image-classifier-venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1149, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1,) for Tensor 'Placeholder:0', which has shape '(?, 224, 224, 3)'
Terminated

I'll try to figure out what's going on, but I don't think I'll be able to solve it quickly, as I'm not an Tensorflow expert ...
That being said, as far as I know, I can't exclude that the docker version of the classifier doesn't work on my embedded device for the same problem ...

Andrea Marson • Oct 17 '19

I had a stupid bug in my code.
I fixed it and now everything works fine. I'm gonna run it on my embedded device.

Dave Glover Microsoft Azure • Oct 19 '19

Yah awesome!

Andrea Marson • Oct 21 '19 • Edited

Hi Dave,

installing tensorflow and all its dependencies wasn't easy on armbian at all!

I tried several TF/Python combinations, but none of them worked :(
This table lists the combinations I tried and the reason why they fail.

I think that the Illegal instruction problem might explain why your container doesn't work either on this device.

By the way, does your container make use of Python 2.x o 3.x?

In the meantime, I think I'm gonna try a different distro.

Andrea Marson • Oct 23 '19 • Edited

Hi Dave
I also tried Armbian Stretch (Debian 9), but nothing changed. I got an Illegal Instruction error as well.

Then I managed to get an RPi 3. I set it up by following this tutorial. On this platform, my simple test program runs correctly:

pi@raspberrypi:~/devel/glover-image-classifier-0.1.0 $ python3 image-classifier.py             
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.COMPILER_VERSION is deprecated. Please use tf.version.COMPILER_VERSION instead.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.CXX11_ABI_FLAG is deprecated. Please use tf.sysconfig.CXX11_ABI_FLAG instead.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.ConditionalAccumulator is deprecated. Please use tf.compat.v1.ConditionalAccumulator instead.

2019-10-22 15:42:53,478 - DEBUG - Starting ...
2019-10-22 15:42:53,479 - DEBUG - Importing the TF graph ...
Classified as: Red Apple
2019-10-22 15:42:58,061 - DEBUG - Prediction time = 1.8572380542755127 s
Avocado 2.246000076411292e-05
Banana 3.769999921132694e-06
Green Apple 0.029635459184646606
Hand 4.4839998736279085e-05
Orange 0.0009084499906748533
Red Apple 0.9693851470947266
2019-10-22 15:42:58,067 - DEBUG - Exiting ...

I used mounted the same raspbian root file system used with RPi from my embedded platform and I got an Illegal Instruction error again.
So it seems there is a structural incompatibility between one of the software layers (maybe TensorFlow) and my platform, which is based on NXP i.MX6Q.

Dave Glover Microsoft Azure • Oct 23 '19

Hey, I had a brief look at armbian and I spotted that it was on a fairly old kernel release - 3.x from memory. I think Stretch on RPi was on 4.3 or something similar. I did wonder if that was where the issue is. There is nothing to stop you from retargeting the Custom Vision model Docker image to different base a image... I think you said you got the CV/Tensorflow running directly on Armbian so that might be a good starting point...

Andrea Marson • Oct 23 '19 • Edited

Actually, I used only the armbian root file system.
Regarding the Linux kernel, I used the one that belongs to the latest official BSP of our platform. It is based on release 4.9.11.
Anyway, I agree with you, in the sense that I can't exclude that the root cause is somehow related to the kernel.

Andrea Marson • Oct 25 '19

Hi Dave,
finally, I managed to solve the problem.
The root cause is related to how the Tensor Flow packages I used were built. Because of the compiler's flags, these packages make use of instructions that are not supported by the i.MX6Q SoC.

So I rebuilt TF with the proper flags ... et voilà:

$ python3 image-classifier.py 
2019-10-25 11:17:15,288 - DEBUG - Starting ...
2019-10-25 11:17:15,289 - DEBUG - Importing the TF graph ...
Classified as: Red Apple
2019-10-25 11:17:21,591 - DEBUG - Prediction time = 2.567471504211426 s
Avocado 2.246000076411292e-05
Banana 3.769999921132694e-06
Green Apple 0.029635440558195114
Hand 4.4839998736279085e-05
Orange 0.0009084499906748533
Red Apple 0.9693851470947266
2019-10-25 11:17:21,594 - DEBUG - Exiting ...

Dave Glover Microsoft Azure • Oct 27 '19

Woohoo, well done!