I tested the custom docker container on my PC first and it worked fine:
$ time curl -X POST http://127.0.0.1:32769/image -F imageData=@red-apple.jpg
{"created":"2019-10-01T12:40:10.052750","id":"","iteration":"","predictions":[{"boundingBox":null,"probability":1.5830000847927295e-05,"tagId":"","tagName":"Avocado"},{"boundingBox":null,"probability":2.420000100755715e-06,"tagId":"","tagName":"Banana"},{"boundingBox":null,"probability":0.026290949434041977,"tagId":"","tagName":"Green Apple"},{"boundingBox":null,"probability":2.8750000637955964e-05,"tagId":"","tagName":"Hand"},{"boundingBox":null,"probability":0.00048392999451607466,"tagId":"","tagName":"Orange"},{"boundingBox":null,"probability":0.9731781482696533,"tagId":"","tagName":"Red Apple"}],"project":""}
real 0m0,285s
user 0m0,005s
sys 0m0,008s
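For reference, the interesting part of that reply is the predictions array; a small stdlib-only sketch (using abridged values copied from the response above) picks the winning tag:

```python
import json

# Classifier response, abridged from the transcript above.
response = json.loads("""
{"predictions": [
  {"probability": 1.583e-05, "tagName": "Avocado"},
  {"probability": 2.42e-06,  "tagName": "Banana"},
  {"probability": 0.02629,   "tagName": "Green Apple"},
  {"probability": 2.875e-05, "tagName": "Hand"},
  {"probability": 0.000484,  "tagName": "Orange"},
  {"probability": 0.97318,   "tagName": "Red Apple"}
]}
""")

# Pick the tag with the highest probability.
best = max(response["predictions"], key=lambda p: p["probability"])
print(best["tagName"], best["probability"])
```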
Then, I built your project for the arm32v7 architecture and pulled the resulting image onto my embedded device (I had to use a registry on Docker Hub because I couldn't pull from the local registry running on my PC).
I tried to run the same test on my embedded device, which runs the Armbian distribution, but it didn't work, although the container seems to be up and running:
root@sbcx:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
dave1am/image-classifier-service 1.1.91-arm32v7 804d48001df8 6 days ago 1.05GB
mcr.microsoft.com/azureiotedge-simulated-temperature-sensor 1.0 a626b1a36236 2 months ago 200MB
mcr.microsoft.com/azureiotedge-hub 1.0 3a84bfb86c7d 2 months ago 252MB
mcr.microsoft.com/azureiotedge-agent 1.0 58276103181c 2 months ago 238MB
mcr.microsoft.com/azureiotedge-diagnostics 1.0.8 a480fa622e2a 2 months ago 7.34MB
root@sbcx:~# docker run -P -d 804d48001df8
9f197d878088d97b33f5ef6338bbd5a1eeaa87fd8890a94e05f78614af1ebdc6
root@sbcx:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f197d878088 804d48001df8 "/usr/bin/entry.sh p…" 34 seconds ago Up 27 seconds 0.0.0.0:32769->80/tcp, 0.0.0.0:32768->5679/tcp sweet_kepler
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# time curl -X POST http://127.0.0.1:32769/image -F imageData=@red-apple.jpg
curl: (52) Empty reply from server
real 0m1.669s
user 0m0.030s
sys 0m0.040s
Hey there, I've not tried Armbian. Those TensorFlow messages are just warnings. You can try the arm image I built, glovebox/image-classifier-service:1.1.111-arm32v7, i.e. docker run -it --rm -p 80:80 glovebox/image-classifier-service:1.1.111-arm32v7, and test with curl -X POST xxx.xxx.xxx.xxx/image -F imageData=@image.jpg. My Pi is running Docker version 19.03.3, build a872fc2. Cheers, Dave
Hi Dave,
I verified the docker version running on my board:
# docker version
Client: Docker Engine - Community
Version: 19.03.3
API version: 1.40
Go version: go1.12.10
Git commit: a872fc2
Built: Tue Oct 8 01:12:57 2019
OS/Arch: linux/arm
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.3
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: a872fc2
Built: Tue Oct 8 01:06:58 2019
OS/Arch: linux/arm
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
Unfortunately, the outcome is the same even with your image:
# curl -X POST 127.0.0.1/image -F imageData=@red-apple.jpg
curl: (52) Empty reply from server
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
71f9b808440d glovebox/image-classifier-service:1.1.111-arm32v7 "/usr/bin/entry.sh p…" 3 minutes ago Up 3 minutes 0.0.0.0:80->80/tcp, 5679/tcp admiring_mccarthy
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# curl -X POST 127.0.0.1/image -F imageData=@red-apple.jpg
curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
I'm afraid I have to debug at a lower level to understand what's going on (that is quite common for embedded devices ...).
I'm not an expert in the Azure-based development approach, so I don't know what the best thing to do is in such a situation.
If there are no better ideas, I'm thinking of:
- writing a simple Python application to exercise the model by following this tutorial
- remote debugging it as described in this article you wrote.
Hey there, I'm pretty sure that the contents of the container are fine, and they are isolated too. Do you have a Raspberry Pi you can test against? There is nothing to stop you running the contents of the docker project exported by Custom Vision directly on the device (i.e. outside of a container). dg
Also try curl to localhost (curl -X POST localhost/image -F imageData=@red-apple.jpg) or by hostname (curl -X POST mydevice.local/image -F imageData=@red-apple.jpg). I've seen issues where name resolution doesn't always work as you'd expect...
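On the name-resolution point, a quick sanity check is to ask the resolver directly what an address resolves to before blaming the container (a minimal sketch; checking a .local hostname would work the same way but depends on mDNS being configured):

```python
import socket

# IPv4 address the resolver returns for "localhost". On most systems this
# is 127.0.0.1, but a misconfigured /etc/hosts or mDNS setup can surprise you.
addr = socket.gethostbyname("localhost")
print("localhost ->", addr)
```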
Hi Dave,
unfortunately, neither localhost nor mydevice.local worked :(
So I tried the other approach that doesn't make use of any container.
For convenience, I first tried to make it work on my development PC. I followed this tutorial, but it didn't work either :(
Apart from several warning messages, the simple Python program I wrote crashes because of this error:
2019-10-17 09:53:43.957158: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3092910000 Hz
2019-10-17 09:53:43.957622: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1da3970 executing computations on platform Host. Devices:
2019-10-17 09:53:43.957668: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
run()
File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
runpy.run_path(target, run_name='__main__')
File "/usr/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 143, in <module>
main()
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 138, in main
predict_image()
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 115, in predict_image
predictions, = sess.run(prob_tensor, {input_node: [augmented_image] })
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/glover-image-classifier-venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/glover-image-classifier-venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1149, in _run
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1,) for Tensor 'Placeholder:0', which has shape '(?, 224, 224, 3)'
Terminated
I'll try to figure out what's going on, but I don't think I'll be able to solve it quickly, as I'm not a TensorFlow expert ...
That being said, I can't exclude that the docker version of the classifier fails on my embedded device because of the same problem ...
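For what it's worth, that ValueError usually means the value fed to Placeholder:0 is a one-element Python list rather than a batch of 224x224 RGB images: if augmented_image is None or an unconverted image object, [augmented_image] collapses to shape (1,). A NumPy sketch of the shape the graph expects (the variable names mirror the traceback but are illustrative, not taken from the actual script):

```python
import numpy as np

# A single 224x224 RGB image, the per-image shape the Custom Vision graph expects.
augmented_image = np.zeros((224, 224, 3), dtype=np.float32)

# Building the batch dimension explicitly makes the shape visible; this is
# what sess.run needs to match a '(?, 224, 224, 3)' placeholder.
batch = np.expand_dims(augmented_image, axis=0)
print(batch.shape)
```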
Hi Dave
I also tried Armbian Stretch (Debian 9), but nothing changed. I got an Illegal Instruction error as well.
Then I managed to get an RPi 3. I set it up by following this tutorial. On this platform, my simple test program runs correctly:
pi@raspberrypi:~/devel/glover-image-classifier-0.1.0 $ python3 image-classifier.py
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.COMPILER_VERSION is deprecated. Please use tf.version.COMPILER_VERSION instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.CXX11_ABI_FLAG is deprecated. Please use tf.sysconfig.CXX11_ABI_FLAG instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.ConditionalAccumulator is deprecated. Please use tf.compat.v1.ConditionalAccumulator instead.
2019-10-22 15:42:53,478 - DEBUG - Starting ...
2019-10-22 15:42:53,479 - DEBUG - Importing the TF graph ...
Classified as: Red Apple
2019-10-22 15:42:58,061 - DEBUG - Prediction time = 1.8572380542755127 s
Avocado 2.246000076411292e-05
Banana 3.769999921132694e-06
Green Apple 0.029635459184646606
Hand 4.4839998736279085e-05
Orange 0.0009084499906748533
Red Apple 0.9693851470947266
2019-10-22 15:42:58,067 - DEBUG - Exiting ...
I then mounted the same Raspbian root file system used with the RPi on my embedded platform, and I got an Illegal Instruction error again.
So it seems there is a structural incompatibility between one of the software layers (maybe TensorFlow) and my platform, which is based on the NXP i.MX6Q.
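One way to narrow down an Illegal Instruction on ARM is to compare the Features line of /proc/cpuinfo on the two boards: the Cortex-A9 in the i.MX6Q lacks some extensions (e.g. vfpv4) that binaries built for the Pi may assume. A small parser, with sample Features lines that are illustrative rather than captured from the actual boards:

```python
def missing_features(required, cpuinfo_text):
    """Return the required CPU features absent from a /proc/cpuinfo dump."""
    features = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            features.update(line.split(":", 1)[1].split())
    return sorted(set(required) - features)

# Illustrative Features lines: Cortex-A9 (i.MX6Q) vs Cortex-A53 (RPi 3).
imx6q = "Features : half thumb fastmult vfp edsp neon vfpv3 tls"
rpi3 = "Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt"

required = ["neon", "vfpv4"]
print(missing_features(required, imx6q))  # the A9 lacks vfpv4
print(missing_features(required, rpi3))   # the A53 has both
```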
Hey, I had a brief look at Armbian and I spotted that it was on a fairly old kernel release, 3.x from memory. I think Stretch on RPi was on 4.3 or something similar. I did wonder if that was where the issue is. There is nothing to stop you from retargeting the Custom Vision model Docker image to a different base image... I think you said you got CV/TensorFlow running directly on Armbian, so that might be a good starting point...
Actually, I used only the armbian root file system.
Regarding the Linux kernel, I used the one that belongs to the latest official BSP of our platform. It is based on release 4.9.11.
Anyway, I agree with you: I can't exclude that the root cause is somehow related to the kernel.
Hi Dave,
finally, I managed to solve the problem.
The root cause is related to how the TensorFlow packages I used were built. Because of the compiler flags, these packages make use of instructions that are not supported by the i.MX6Q SoC.
So I rebuilt TF with the proper flags ... et voilà:
$ python3 image-classifier.py
2019-10-25 11:17:15,288 - DEBUG - Starting ...
2019-10-25 11:17:15,289 - DEBUG - Importing the TF graph ...
Classified as: Red Apple
2019-10-25 11:17:21,591 - DEBUG - Prediction time = 2.567471504211426 s
Avocado 2.246000076411292e-05
Banana 3.769999921132694e-06
Green Apple 0.029635440558195114
Hand 4.4839998736279085e-05
Orange 0.0009084499906748533
Red Apple 0.9693851470947266
2019-10-25 11:17:21,594 - DEBUG - Exiting ...
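As a sanity check on the rebuilt packages, the probabilities from the i.MX6Q run match the RPi 3 run to several significant digits; bit-exact equality is not expected across different builds and instruction sets. For example, comparing the "Green Apple" values copied from the two transcripts above:

```python
import math

# "Green Apple" probability from the RPi 3 run vs the rebuilt-TF i.MX6Q run.
rpi3 = 0.029635459184646606
imx6q = 0.029635440558195114

# Agreement to roughly one part in a million; small differences are normal
# when the same model runs through different float code paths.
print(math.isclose(rpi3, imx6q, rel_tol=1e-6))
```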
Hi Dave
Any advice on how I could analyze this issue?
I just noticed that, after running this test on the embedded device, the container stops and a warning message appears in its log.
I had a stupid bug in my code.
I fixed it and now everything works fine. I'm gonna run it on my embedded device.
Yah awesome!
Hi Dave,
installing TensorFlow and all its dependencies wasn't easy on Armbian at all!
I tried several TF/Python combinations, but none of them worked :(
This table lists the combinations I tried and the reason why they fail.
I think that the Illegal instruction problem might explain why your container doesn't work either on this device.
By the way, does your container make use of Python 2.x or 3.x?
In the meantime, I think I'm gonna try a different distro.
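When a process dies with Illegal Instruction, the Python interpreter never gets the chance to raise an exception: the kernel kills it with SIGILL. You can confirm which signal killed a crashing program from the exit status of a child process. A Linux-only sketch that deliberately sends SIGILL to a child, standing in for a crashing TensorFlow import:

```python
import signal
import subprocess
import sys

# Run a child that raises SIGILL in itself; a negative returncode from
# subprocess means "killed by that signal number".
child = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGILL)"]
)
print(signal.Signals(-child.returncode).name)
```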
Woohoo, well done!