I tested the custom docker container on my PC first and it worked fine:
$ time curl -X POST http://127.0.0.1:32769/image -F imageData=@red-apple.jpg
{"created":"2019-10-01T12:40:10.052750","id":"","iteration":"","predictions":[{"boundingBox":null,"probability":1.5830000847927295e-05,"tagId":"","tagName":"Avocado"},{"boundingBox":null,"probability":2.420000100755715e-06,"tagId":"","tagName":"Banana"},{"boundingBox":null,"probability":0.026290949434041977,"tagId":"","tagName":"Green Apple"},{"boundingBox":null,"probability":2.8750000637955964e-05,"tagId":"","tagName":"Hand"},{"boundingBox":null,"probability":0.00048392999451607466,"tagId":"","tagName":"Orange"},{"boundingBox":null,"probability":0.9731781482696533,"tagId":"","tagName":"Red Apple"}],"project":""}
real 0m0,285s
user 0m0,005s
sys 0m0,008s
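For reference, the interesting part of that reply is the predictions array; a small stdlib-only sketch (using abridged values copied from the response above) picks the winning tag:

```python
import json

# Classifier response, abridged from the transcript above.
response = json.loads("""
{"predictions": [
  {"probability": 1.583e-05, "tagName": "Avocado"},
  {"probability": 2.42e-06,  "tagName": "Banana"},
  {"probability": 0.02629,   "tagName": "Green Apple"},
  {"probability": 2.875e-05, "tagName": "Hand"},
  {"probability": 0.000484,  "tagName": "Orange"},
  {"probability": 0.97318,   "tagName": "Red Apple"}
]}
""")

# Pick the tag with the highest probability.
best = max(response["predictions"], key=lambda p: p["probability"])
print(best["tagName"], best["probability"])
```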
Then, I built your project for the arm32v7 architecture and pulled the resulting image onto my embedded device (I had to use a registry on Docker Hub because I couldn't pull from the local registry running on my PC).
I tried to run the same test on my embedded device, which runs the Armbian distribution, but it didn't work, although the container seems to be up and running:
root@sbcx:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
dave1am/image-classifier-service 1.1.91-arm32v7 804d48001df8 6 days ago 1.05GB
mcr.microsoft.com/azureiotedge-simulated-temperature-sensor 1.0 a626b1a36236 2 months ago 200MB
mcr.microsoft.com/azureiotedge-hub 1.0 3a84bfb86c7d 2 months ago 252MB
mcr.microsoft.com/azureiotedge-agent 1.0 58276103181c 2 months ago 238MB
mcr.microsoft.com/azureiotedge-diagnostics 1.0.8 a480fa622e2a 2 months ago 7.34MB
root@sbcx:~# docker run -P -d 804d48001df8
9f197d878088d97b33f5ef6338bbd5a1eeaa87fd8890a94e05f78614af1ebdc6
root@sbcx:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f197d878088 804d48001df8 "/usr/bin/entry.sh p…" 34 seconds ago Up 27 seconds 0.0.0.0:32769->80/tcp, 0.0.0.0:32768->5679/tcp sweet_kepler
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# time curl -X POST http://127.0.0.1:32769/image -F imageData=@red-apple.jpg
curl: (52) Empty reply from server
real 0m1.669s
user 0m0.030s
sys 0m0.040s
Hey there, I've not tried Armbian. Those TensorFlow messages are just warnings. You can try the arm image I built, glovebox/image-classifier-service:1.1.111-arm32v7, i.e. docker run -it --rm -p 80:80 glovebox/image-classifier-service:1.1.111-arm32v7, and test with curl -X POST xxx.xxx.xxx.xxx/image -F imageData=@image.jpg. My Pi is running Docker version 19.03.3, build a872fc2. Cheers, Dave
Hi Dave,
I verified the docker version running on my board:
# docker version
Client: Docker Engine - Community
Version: 19.03.3
API version: 1.40
Go version: go1.12.10
Git commit: a872fc2
Built: Tue Oct 8 01:12:57 2019
OS/Arch: linux/arm
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.3
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: a872fc2
Built: Tue Oct 8 01:06:58 2019
OS/Arch: linux/arm
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
Unfortunately, the outcome is the same even with your image:
# curl -X POST 127.0.0.1/image -F imageData=@red-apple.jpg
curl: (52) Empty reply from server
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
71f9b808440d glovebox/image-classifier-service:1.1.111-arm32v7 "/usr/bin/entry.sh p…" 3 minutes ago Up 3 minutes 0.0.0.0:80->80/tcp, 5679/tcp admiring_mccarthy
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# curl -X POST 127.0.0.1/image -F imageData=@red-apple.jpg
curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
root@sbcx:/home/armbian/devel/azure-iot-edge/image-classifier# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
I'm afraid I have to debug at a lower level to understand what's going on (that is quite common for embedded devices ...).
I'm not an expert in the Azure-based development approach, so I don't know what the best thing to do is in such a situation.
If there are no better ideas, I'm thinking of:
- writing a simple Python application to exercise the model by following this tutorial
- remote debugging it as described in this article you wrote.
Hey there, I'm pretty sure that the contents of the container are fine, and they are isolated too. Do you have a Raspberry Pi you can test against? There is nothing to stop you running the contents of the docker project exported by Custom Vision directly on the device (i.e. outside of a container). dg
Also try curl to localhost (curl -X POST localhost/image -F imageData=@red-apple.jpg) or by hostname (curl -X POST mydevice.local/image -F imageData=@red-apple.jpg). I've seen issues where name resolution doesn't always work as you'd expect...
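On the name-resolution point, a quick sanity check is to ask the resolver directly what an address resolves to before blaming the container (a minimal sketch; checking a .local hostname would work the same way but depends on mDNS being configured):

```python
import socket

# IPv4 address the resolver returns for "localhost". On most systems this
# is 127.0.0.1, but a misconfigured /etc/hosts or mDNS setup can surprise you.
addr = socket.gethostbyname("localhost")
print("localhost ->", addr)
```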
Hi Dave,
unfortunately, neither localhost nor mydevice.local worked :(
So I tried the other approach that doesn't make use of any container.
For convenience, I first tried to make it work on my development PC. I followed this tutorial, but it didn't work either :(
Apart from several warning messages, the simple Python program I wrote crashes because of this error:
2019-10-17 09:53:43.957158: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3092910000 Hz
2019-10-17 09:53:43.957622: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1da3970 executing computations on platform Host. Devices:
2019-10-17 09:53:43.957668: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
run()
File "/home/sysadmin/.vscode/extensions/ms-python.python-2019.10.41019/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
runpy.run_path(target, run_name='__main__')
File "/usr/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 143, in <module>
main()
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 138, in main
predict_image()
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/image-classifier.py", line 115, in predict_image
predictions, = sess.run(prob_tensor, {input_node: [augmented_image] })
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/glover-image-classifier-venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/sysadmin/devel/azure/custom-vision/glover-image-classifier/glover-image-classifier-venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1149, in _run
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1,) for Tensor 'Placeholder:0', which has shape '(?, 224, 224, 3)'
Terminated
I'll try to figure out what's going on, but I don't think I'll be able to solve it quickly, as I'm not a TensorFlow expert ...
That being said, I can't exclude that the docker version of the classifier fails on my embedded device because of the same problem ...
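For what it's worth, that ValueError usually means the value fed to Placeholder:0 is a one-element Python list rather than a batch of 224x224 RGB images: if augmented_image is None or an unconverted image object, [augmented_image] collapses to shape (1,). A NumPy sketch of the shape the graph expects (the variable names mirror the traceback but are illustrative, not taken from the actual script):

```python
import numpy as np

# A single 224x224 RGB image, the per-image shape the Custom Vision graph expects.
augmented_image = np.zeros((224, 224, 3), dtype=np.float32)

# Building the batch dimension explicitly makes the shape visible; this is
# what sess.run needs to match a '(?, 224, 224, 3)' placeholder.
batch = np.expand_dims(augmented_image, axis=0)
print(batch.shape)
```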
Hi Dave
I also tried Armbian Stretch (Debian 9), but nothing changed. I got an Illegal Instruction error as well.
Then I managed to get an RPi 3. I set it up by following this tutorial. On this platform, my simple test program runs correctly:
pi@raspberrypi:~/devel/glover-image-classifier-0.1.0 $ python3 image-classifier.py
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.COMPILER_VERSION is deprecated. Please use tf.version.COMPILER_VERSION instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.CXX11_ABI_FLAG is deprecated. Please use tf.sysconfig.CXX11_ABI_FLAG instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py:98: The name tf.ConditionalAccumulator is deprecated. Please use tf.compat.v1.ConditionalAccumulator instead.
2019-10-22 15:42:53,478 - DEBUG - Starting ...
2019-10-22 15:42:53,479 - DEBUG - Importing the TF graph ...
Classified as: Red Apple
2019-10-22 15:42:58,061 - DEBUG - Prediction time = 1.8572380542755127 s
Avocado 2.246000076411292e-05
Banana 3.769999921132694e-06
Green Apple 0.029635459184646606
Hand 4.4839998736279085e-05
Orange 0.0009084499906748533
Red Apple 0.9693851470947266
2019-10-22 15:42:58,067 - DEBUG - Exiting ...
I then mounted the same Raspbian root file system used with the RPi on my embedded platform, and I got an Illegal Instruction error again.
So it seems there is a structural incompatibility between one of the software layers (maybe TensorFlow) and my platform, which is based on the NXP i.MX6Q.
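One way to narrow down an Illegal Instruction on ARM is to compare the Features line of /proc/cpuinfo on the two boards: the Cortex-A9 in the i.MX6Q lacks some extensions (e.g. vfpv4) that binaries built for the Pi may assume. A small parser, with sample Features lines that are illustrative rather than captured from the actual boards:

```python
def missing_features(required, cpuinfo_text):
    """Return the required CPU features absent from a /proc/cpuinfo dump."""
    features = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            features.update(line.split(":", 1)[1].split())
    return sorted(set(required) - features)

# Illustrative Features lines: Cortex-A9 (i.MX6Q) vs Cortex-A53 (RPi 3).
imx6q = "Features : half thumb fastmult vfp edsp neon vfpv3 tls"
rpi3 = "Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt"

required = ["neon", "vfpv4"]
print(missing_features(required, imx6q))  # the A9 lacks vfpv4
print(missing_features(required, rpi3))   # the A53 has both
```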
Hey, I had a brief look at Armbian and I spotted that it was on a fairly old kernel release, 3.x from memory. I think Stretch on RPi was on 4.3 or something similar. I did wonder if that was where the issue is. There is nothing to stop you from retargeting the Custom Vision model Docker image to a different base image... I think you said you got CV/TensorFlow running directly on Armbian, so that might be a good starting point...
Actually, I used only the armbian root file system.
Regarding the Linux kernel, I used the one that belongs to the latest official BSP of our platform. It is based on release 4.9.11.
Anyway, I agree with you: I can't exclude that the root cause is somehow related to the kernel.
Hi Dave,
finally, I managed to solve the problem.
The root cause is related to how the TensorFlow packages I used were built. Because of the compiler flags, these packages make use of instructions that are not supported by the i.MX6Q SoC.
So I rebuilt TF with the proper flags ... et voilà:
$ python3 image-classifier.py
2019-10-25 11:17:15,288 - DEBUG - Starting ...
2019-10-25 11:17:15,289 - DEBUG - Importing the TF graph ...
Classified as: Red Apple
2019-10-25 11:17:21,591 - DEBUG - Prediction time = 2.567471504211426 s
Avocado 2.246000076411292e-05
Banana 3.769999921132694e-06
Green Apple 0.029635440558195114
Hand 4.4839998736279085e-05
Orange 0.0009084499906748533
Red Apple 0.9693851470947266
2019-10-25 11:17:21,594 - DEBUG - Exiting ...
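As a sanity check on the rebuilt packages, the probabilities from the i.MX6Q run match the RPi 3 run to several significant digits; bit-exact equality is not expected across different builds and instruction sets. For example, comparing the "Green Apple" values copied from the two transcripts above:

```python
import math

# "Green Apple" probability from the RPi 3 run vs the rebuilt-TF i.MX6Q run.
rpi3 = 0.029635459184646606
imx6q = 0.029635440558195114

# Agreement to roughly one part in a million; small differences are normal
# when the same model runs through different float code paths.
print(math.isclose(rpi3, imx6q, rel_tol=1e-6))
```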
Hi Dave
Any advice on how I could analyze this issue?
I just noticed that, after running this test on the embedded device, the container stops and a warning message appears in its log.
I had a stupid bug in my code.
I fixed it and now everything works fine. I'm gonna run it on my embedded device.
Yah awesome!
Hi Dave,
installing TensorFlow and all its dependencies wasn't easy on Armbian at all!
I tried several TF/Python combinations, but none of them worked :(
This table lists the combinations I tried and the reason why they fail.
I think that the Illegal instruction problem might explain why your container doesn't work either on this device.
By the way, does your container make use of Python 2.x or 3.x?
In the meantime, I think I'm gonna try a different distro.
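When a process dies with Illegal Instruction, the Python interpreter never gets the chance to raise an exception: the kernel kills it with SIGILL. You can confirm which signal killed a crashing program from the exit status of a child process. A Linux-only sketch that deliberately sends SIGILL to a child, standing in for a crashing TensorFlow import:

```python
import signal
import subprocess
import sys

# Run a child that raises SIGILL in itself; a negative returncode from
# subprocess means "killed by that signal number".
child = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGILL)"]
)
print(signal.Signals(-child.returncode).name)
```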
Woohoo, well done!