This is the third post in my series called "Demystifying Docker". You'll need an understanding of Docker Images and Containers, which I explained in the second post, to follow this one. So if you're new, I recommend you check out the earlier posts first and then start with this.
Okay so with that out of the way let's begin. In this post, I'm going to talk about how to deal with data in Docker. This includes making your application data persist even after you remove a container and making a two-way connection between your local machine and the docker container. A benefit for the latter could be you not needing to rebuild docker images every time you make a change in your code.
So how exactly can we solve these two very important problems?
And what are Docker Volumes?
Docker volumes are simply folders on your local machine which can interact with the container.
That means you can do stuff like store files from your container in the local machine and vice versa.
I'll now be covering these two Docker concepts:
Named Volumes
Bind Mounts (+ Anonymous Volumes)
Let's assume you are running an app in a container that converts the articles users type in and stores them as .txt files. If you're not using Docker Volumes, then when you remove this container and create another one from the same image, you won't have access to the articles you saved using the previous container. This is simply because when you remove the container, all the data it had gets removed too.
But what if you didn't store the data inside the container? What if you stored it "somewhere" on your local machine, and each time you run a container you instruct Docker to look for that folder and use the articles present there?
This is exactly where Named Volumes come into play.
Let me now show you how you would actually go ahead and use these named volumes and guess what, their usage is fairly simple :)
This is because when using volumes (not bind mounts), whether they are anonymous or named, you don't have to specify the path on your local machine where Docker should store these articles (in reference to our example). This will be decided and managed by Docker automatically. You simply need to specify two things while starting the container:
The name of the Named Volume. Docker will pick the files stored previously when using this named volume automatically. If it's the first time you are using this named volume then docker will create it for you.
The path inside the container which you want to link. For example, if the articles were getting stored in ./data/articlesfolder in your container, then this will be the path you need to specify.
And with these two things you're good to go!
So your final command would look something like this:
docker run -v volume-name:/path/in/container IMAGE_ID
Here volume-name is the name of our named volume and /path/in/container is the path inside the container which we want to link.
The above-mentioned command will use the named volume called volume-name (or create it if it doesn't exist). Your container will now have access to all the files already present in this volume (which would have been created by some other container) and will also be able to save files to this volume.
So even if you remove a container and hence its file system, the files you wanted to save will remain preserved. And this was it for the concept of Named Volumes.
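To see the persistence in action, here is a minimal sketch of the remove-and-recreate cycle. The volume name articles, the image name articles-app, and the container path /app/data/articles are all made-up placeholders, so swap in your own. The docker calls are wrapped in functions so that nothing runs until you call them.

```shell
# Hypothetical names, not taken from a real project.
VOLUME=articles
IMAGE=articles-app
CONTAINER_PATH=/app/data/articles

# Start a detached container that mounts the named volume.
run_with_volume() {
  # $1 = name for the new container
  docker run -d --name "$1" -v "$VOLUME:$CONTAINER_PATH" "$IMAGE"
}

# Force-remove one container, then start a fresh one on the same volume.
replace_container() {
  # $1 = container to remove, $2 = name for the new container
  docker rm -f "$1"
  run_with_volume "$2"
}
```

Calling run_with_volume first, saving a few articles, and then calling replace_container first second should leave the second container looking at the same files. Running docker volume ls lists the volume, and it stays around until you remove it yourself with docker volume rm.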
What if the Named Volumes I talked about above could be mapped to a specific location on your machine? That is, instead of Docker managing where on your machine the named volume is storing these files, you get to specify their location. This way you would also be able to edit these files manually from outside the container.
Extending this chain of thought further, what if you add your code like this and connect it to the container? Would this not give you the ability to directly see the changes you are making in your code reflect in the running container instance of your app?
Yes, it would.
If it is still not clear what I'm talking about, then remember how I said in the last post that the Dockerfile has the instructions to copy your code into the container. That copy is a snapshot taken when the image is built, so later edits on your machine don't reach it unless you rebuild. But if you set up a two-way connection (like we are going to do just now) then your container would be able to pick up the latest code.
This is what we are going to do with Bind Mounts. And we're going to face a slight hiccup in doing so which we'll solve using Anonymous Volumes.
Just like Named Volumes, Bind Mounts are set up when you start a container. The only difference is that where you gave a name for a Named Volume, you now give the exact (absolute) path of the folder on your local machine that you want to link with the container.
docker run -v /path/on/our/machine:/path/in/container IMAGE_ID
If we set the path on our host machine to the folder containing the source code of our app and the path in the container to where we copied the code in the Dockerfile, then we would have established a link.
Any changes we now make on our local copy of the code will be reflected in the instance of the app running in the container.
Yup this is it. It's this simple!
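One detail worth a sketch: the host side of a bind mount should be an absolute path, because some Docker versions read anything else as a volume name. The helper below turns a project folder into the value you pass to -v. Note that bind_mount_arg is a name I made up for illustration, and /app is only an assumed container path.

```shell
# Hypothetical helper: build the "host:container" value for -v.
bind_mount_arg() {
  # $1 = folder on your machine (a relative path is fine)
  # $2 = absolute path inside the container
  host_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
  echo "$host_dir:$2"
}

# Prints the current folder's absolute path followed by ":/app".
bind_mount_arg . /app
```

You could then start the container with something like docker run -v "$(bind_mount_arg . /app)" IMAGE_ID from inside your project folder.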
You may face a slight hiccup which I will try to explain with an example.
Let's say in your Dockerfile you copied the code for your Node.js app and ran npm install to get all the dependencies in the node_modules folder. Now if you use a bind mount and the folder you connect to the container has the code but not the node_modules folder (because you may not have run npm install locally), then your containerized version of the app won't work. The entire folder in the container gets hidden by your local folder, and hence the node_modules folder is lost too.
A simple solution is to run npm install locally too, so that the dependencies are present in the mounted folder. But what if we don't want that?
To avoid it, we're going to use anonymous volumes along with a special property of Docker Volumes.
The property is that if two mounts (Bind Mounts included) overlap, the one with the more specific (longer) path inside the container takes precedence for that path. This will be more clear when I show you the command.
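Coming to Anonymous Volumes, they are just like Named Volumes except you do not specify a name, and unlike Bind Mounts you do not specify a path on your file system either. Docker gives them a random ID instead of a name, so you cannot easily point a later container at the same data. They are removed automatically along with the container only if you start the container with the --rm flag (or remove it with docker rm -v); otherwise they stay behind until you delete them. Named Volumes and Bind Mounts are never removed this way. An Anonymous Volume would look something like this: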
docker run -v /path/in/container IMAGE_ID
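The three forms of -v seen so far differ only in what comes before the colon, if there is one. Here is a simplified sketch of that rule in plain shell. It is not Docker's real parser: mount_kind is a hypothetical helper, the example paths are made up, and it ignores mount options and Windows-style paths.

```shell
# Simplified sketch: classify a -v value the way this post describes it.
mount_kind() {
  case "$1" in
    *:*) source_part=${1%%:*} ;;             # text before the first colon
    *)   echo "anonymous volume"; return ;;  # no colon: container path only
  esac
  case "$source_part" in
    /*|.*) echo "bind mount" ;;    # looks like a path on your machine
    *)     echo "named volume" ;;  # anything else is treated as a name
  esac
}

mount_kind "articles:/app/data/articles"   # prints: named volume
mount_kind "/home/me/app:/app"             # prints: bind mount
mount_kind "/app/node_modules"             # prints: anonymous volume
```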
Now, let's use Anonymous Volumes and the above-mentioned property to solve our initial problem. Let's have a look at the command first and then I'll explain what it does:
docker run -v /path/on/our/machine:/path/in/container -v /path/in/container/node_modules IMAGE_ID
Here we used a Bind Mount to establish a connection between the folder having the code on our machine and the location in the container where we copy the code (specified in the Dockerfile). On its own, this would replace everything at that location in the container with our code, hiding the node_modules folder which was created while building the image.
But since we added an Anonymous Volume with a more specific path, that is, to the node_modules folder which gets created during image building, Docker will retain this folder and replace the rest with the code on our local machine.
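Put together as a reusable sketch, it could look like the function below. The name dev_run and the example paths are hypothetical, and --rm is my own addition so that the container and its anonymous volume are cleaned up when the container exits.

```shell
# Hypothetical helper: run an image with live code plus the image's node_modules.
dev_run() {
  # $1 = absolute path of the project folder on your machine
  # $2 = absolute path of the code folder inside the container
  # $3 = image name or ID
  # First -v: bind mount, your local code replaces the copied code.
  # Second -v: anonymous volume, keeps node_modules from the image.
  docker run --rm \
    -v "$1:$2" \
    -v "$2/node_modules" \
    "$3"
}
```

A call might look like dev_run /home/me/my-node-app /app my-node-app, assuming the Dockerfile copied the code to /app.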
And this covers all you need to know about Docker Volumes (Named as well as Anonymous) and Bind Mounts, in order to get started. The best way to get a deeper understanding of them is to start using all that you learned in your actual projects. I would also implore you to go ahead and read the official documentation for a much more detailed and technical explanation which was beyond the scope of this article. With that, I do hope you learned enough to get a pretty decent idea about Docker Volumes!
Thanks for reading :)
If you have any feedback for me or just want to talk feel free to connect with me on Twitter. I'll be more than happy to help you out! :D