Adventures with Docker: UID conflicts between the container and the host

#docker #glusterfs #elasticsearch #swarm

There are days when you run into problems you would never have expected.
Our goal was to run an ElasticSearch cluster in our Docker Swarm. After some searching, we found a ready-made image that makes this more or less possible (just as a hint: the tricky part is the discovery of other nodes).

To ensure that all containers of the ElasticSearch cluster have access to the data on the different Docker Swarm nodes, we use GlusterFS as a distributed file system.

The whole construct was running very well until we noticed: hey, NTP is not running on our servers. OK, we quickly added it to the Ansible playbook and ran it on the hosts. BAM! ElasticSearch reports: I do not like you any more, dude…

Now, of course, you wonder what happened here. A look at the hosts showed that all the ElasticSearch files now belonged to the user “systemd-timesync”... eh? Our first guess was, of course, the somewhat worn GlusterFS.

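The actual error is, in principle, not an error. If you mount a host volume into a container, every file the container writes there is created with the numeric UID of the user running inside the container. The host stores only that number and resolves it against its own user database, so the same UID can show up under a different name on the host. Many containers run as root, which causes no problems, but ElasticSearch since version 5.x does not run as root…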

Now chance comes into play: the ElasticSearch image is based on Alpine, where user UIDs begin at 100, which unfortunately is exactly the UID now used on the host by our little friend, the “systemd-timesync” user.
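To make the collision easier to see, here is a minimal Python sketch. The two passwd snippets are made-up examples of what the container and the host might contain (the group IDs, home directories and shells are assumptions, not copied from our systems); only the shared UID 100 matters.

```python
def parse_passwd(text):
    """Map UID -> user name from passwd-format text (name:passwd:uid:gid:...)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        users[int(fields[2])] = fields[0]
    return users


# Hypothetical user database inside the Alpine-based container.
CONTAINER_PASSWD = """\
root:x:0:0:root:/root:/bin/ash
elasticsearch:x:100:101:Linux User,,,:/usr/share/elasticsearch:/sbin/nologin
"""

# Hypothetical user database on the host after the NTP change.
HOST_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
"""


def owner_name(uid, passwd_text):
    """Resolve a numeric UID the way `ls -l` would; fall back to the number."""
    return parse_passwd(passwd_text).get(uid, str(uid))


if __name__ == "__main__":
    file_owner_uid = 100  # the only thing the file system actually stores
    print("container sees:", owner_name(file_owner_uid, CONTAINER_PASSWD))
    print("host sees:     ", owner_name(file_owner_uid, HOST_PASSWD))
```

The file itself never changes; only the lookup table used to turn the number into a name differs between the two sides.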

Unfortunately, we did not find a really good solution, but as a workaround we changed the Dockerfile and assigned the ElasticSearch user a UID of 1200+. Now the files are created with this UID.
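Before settling on such a UID, it may be worth checking that it is really unused on every Swarm node. The following is only a rough sketch of such a check, not what we actually ran; the function names are made up for illustration, and the starting value 1200 is simply the one from our workaround.

```python
import pwd


def first_free_uid(start, taken):
    """Return the smallest UID >= start that is not in `taken`."""
    uid = start
    while uid in taken:
        uid += 1
    return uid


def host_uids():
    """UIDs known to this host's user database (as seen by the pwd module)."""
    return {entry.pw_uid for entry in pwd.getpwall()}


if __name__ == "__main__":
    # Run this on each node; the UID baked into the image has to be free everywhere.
    print(first_free_uid(1200, host_uids()))
```

A free UID today can of course still be claimed by a package installed later, so this reduces the risk of a collision rather than ruling it out.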

If you find a cleverer solution, I’m looking forward to a comment. And if you prefer this text in German, you can find it at the Geek Pub.

Top comments (2)

Christopher McClellan • Jul 22 '17

Funny. I recently ran into an issue where someone ran chown -R 1000:999. Took a while to hunt down. One of our servers had a user and group with those ids. The other, well... did not. It had the same user and group names though. Chowned that bad boy by name and the permission issues just disappeared. Why anyone thought that it was a good idea to use a numeric id is beyond me. I'll second the use of the -u userName flag though. It's a lifesaver.

Stefan Gangefors • Jul 17 '17 • Edited

I usually don't put new users into the Dockerfile. Instead, I try to solve it by starting the container using the "--user/--group" options and chowning everything in the exposed volumes in a startup script. Using this approach you can use any IDs you want (just make sure that containers with shared data use the same IDs).
But when using published images that you have no control over, it's not always possible.

Note:
It seems that the latest official ES image uses uid/gid 1000/1000 now.
elastic.co/guide/en/elasticsearch/...
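
The chown step of such a startup script could look roughly like the Python sketch below. This is an illustration only: the function name `chown_tree` is made up, and the IDs and the volume path are whatever your setup uses. Changing ownership to a different user requires the script to run with sufficient privileges.

```python
import os


def chown_tree(root, uid, gid):
    """Set owner and group of `root` and everything below it.

    Symbolic links are re-owned themselves, not followed.
    Returns the number of entries that were changed.
    """
    os.chown(root, uid, gid)
    changed = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
            changed += 1
    return changed
```

Called once at container start, for example as `chown_tree(data_dir, uid, gid)` with the IDs the container runs as, it brings files left behind by an earlier UID back in line before the service starts.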