One of the most challenging aspects of working with Docker has been figuring out data persistence. Containers are ephemeral by nature, but most real-world applications need to store data. In this post, I'll share what I've learned about managing persistent data with Docker.
The Ephemeral Nature of Containers
First, let's understand the problem. Each Docker container gets its own writable filesystem layer, and that layer is discarded when the container is removed. Here's what happens:
# Start a container and create a file
docker run -it --name temp ubuntu bash
# (Inside container) touch /test.txt
# (Inside container) exit
# Start the container again - file still exists
docker start -i temp
# (Inside container) ls /test.txt
# Output: /test.txt
# Now remove and recreate the container
docker rm temp
docker run -it --name temp ubuntu bash
# (Inside container) ls /test.txt
# Output: ls: cannot access '/test.txt': No such file or directory
When the container is removed, all data inside is lost. This is a big problem for databases, user uploads, or any stateful application.
Docker Volumes
Docker volumes are the solution to this problem. They're specially designed locations outside of the container's filesystem where data can persist.
Creating and Managing Volumes
# Create a volume
docker volume create my-data
# List volumes
docker volume ls
# Inspect a volume
docker volume inspect my-data
# Remove a volume
docker volume rm my-data
# Remove unused volumes (on Docker 23.0 and later this only prunes anonymous volumes unless you add --all)
docker volume prune
Volumes are stored in a location managed by Docker, typically /var/lib/docker/volumes/ on Linux systems.
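If you'd rather check where a particular volume lives than assume the default path, docker volume inspect accepts a Go template through --format. Here is a minimal sketch; volume_mountpoint is a helper name I made up for illustration, not part of Docker:

```shell
# Hypothetical helper: print the host path Docker reports for a volume.
#   $1 = volume name
volume_mountpoint() {
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# Usage (requires a running Docker daemon):
#   volume_mountpoint my-data
```

On Docker Desktop the reported path is inside Docker's VM, so it won't be directly browsable from macOS or Windows.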
Using Volumes with Containers
# Run a container with a volume
docker run -v my-data:/app/data nginx
# Run a container with an anonymous volume
docker run -v /app/data nginx
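In the first example, my-data is the volume name, and /app/data is the mount point inside the container. Any data written to /app/data will persist in the my-data volume. In the second example, only the container path is given, so Docker creates a volume with a randomly generated name.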
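Docker also has a more verbose --mount flag that spells out each part of the mapping as key=value pairs. The sketch below only builds the spec string; volume_mount_spec is a hypothetical helper name used for illustration:

```shell
# Hypothetical helper: build a --mount spec for a named volume.
#   $1 = volume name, $2 = path inside the container
volume_mount_spec() {
  printf 'type=volume,source=%s,target=%s\n' "$1" "$2"
}

# Usage (requires Docker); mounts the same volume as -v my-data:/app/data:
#   docker run --mount "$(volume_mount_spec my-data /app/data)" nginx
```

I find the longer form easier to read in scripts, since nobody has to remember which side of the colon is the container path.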
Types of Docker Storage
There are three main ways to persist data with Docker:
1. Named Volumes
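docker run -e MYSQL_ROOT_PASSWORD=example -v mysql-data:/var/lib/mysql mysql
How it works: Docker creates and manages this volume. The data is stored in /var/lib/docker/volumes/mysql-data/_data on the host, but you typically don't need to access it directly. (The official mysql image keeps its data in /var/lib/mysql and won't start without a root password setting.)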
Use cases:
- Production databases
- Application data that needs to persist
- Data that needs to be shared between containers
Advantages:
- Managed by Docker
- Easy to back up
- Can be shared between containers
- Works across platforms (Windows, Mac, Linux)
2. Bind Mounts
docker run -e MYSQL_ROOT_PASSWORD=example -v /home/host/data:/var/lib/mysql mysql
How it works: Maps a directory on your host machine directly into the container. Any files in /home/host/data will be available inside the container at /var/lib/mysql and vice versa.
Use cases:
- Development environments (for real-time code changes)
- Configuration files
- Sharing files between host and containers
Advantages:
- Direct access to files from host machine
- No need to copy files into the container
- Changes on the host immediately visible in container
3. Anonymous Volumes
docker run -e MYSQL_ROOT_PASSWORD=example -v /var/lib/mysql mysql
How it works: Similar to a named volume, but Docker assigns a randomly generated name. Docker creates the volume on the host for you; you only specify the path inside the container.
Use cases:
- Temporary data that should outlive a specific container instance
- When you don't need to reference the volume later
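Because anonymous volumes have no memorable name, they tend to pile up after their containers are gone. Here is a minimal sketch of two cleanup helpers; the function names are my own, while the Docker flags are standard CLI options:

```shell
# Hypothetical helper: list volumes not referenced by any container.
# Note: this includes unused named volumes as well as anonymous ones.
list_dangling_volumes() {
  docker volume ls --quiet --filter dangling=true
}

# Hypothetical helper: remove a stopped container together with
# the anonymous volumes attached to it.
#   $1 = container name or ID
remove_with_anonymous_volumes() {
  docker rm --volumes "$1"
}

# Usage (requires Docker):
#   list_dangling_volumes
#   remove_with_anonymous_volumes temp
```

Starting a container with docker run --rm has a similar effect: its anonymous volumes are removed when the container exits.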
Bind Mounts vs. Named Volumes: Choosing the Right Option
I spent a long time figuring out when to use which option. Here's what I learned:
| Feature | Bind Mounts | Named Volumes |
|---|---|---|
| Location | Any directory on host | Managed by Docker in /var/lib/docker/volumes |
| Path Specification | Full host path required | Just the volume name |
| Portability | Less portable (host-dependent) | More portable |
| Host Modification | Can be modified directly on host | Requires Docker commands |
| Performance | Native on Linux; can be slower through Docker Desktop file sharing on macOS and Windows | Native filesystem speed inside Docker's storage area |
| Usage | Development, config files | Production data |
I generally use:
- Named volumes for production data
- Bind mounts for development or when I need to edit files directly
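The "Path Specification" row is also how the -v flag tells the two apart: a source that is an absolute path is treated as a bind mount, and a bare name is treated as a volume. The sketch below mirrors that rule in simplified form. mount_kind is a hypothetical helper, and it deliberately ignores relative paths and Windows-style paths:

```shell
# Hypothetical helper: classify the source part of a -v argument.
# Simplified: only absolute Unix paths are recognised as bind mounts.
#   $1 = the part of the -v value before the first colon
mount_kind() {
  case "$1" in
    /*) echo "bind mount" ;;
    *)  echo "named volume" ;;
  esac
}

mount_kind /home/host/data
mount_kind my-data
```

This is worth remembering when a typo in a path silently creates a new named volume instead of mounting the directory you meant.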
Volume Drivers
Docker supports volume drivers that extend storage capabilities:
# Create a volume with a specific driver
docker volume create --driver=local my-volume
# Create a volume with driver options
docker volume create --driver=local \
--opt type=nfs \
--opt o=addr=192.168.1.1,rw \
--opt device=:/path/to/dir \
my-nfs-volume
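Common volume drivers:
- local: The default built-in driver
- NFS: Not a separate driver; NFS shares are mounted through the local driver with type=nfs options, as in the example above
- Cloud storage drivers: Third-party plugins for AWS EBS, Azure Disk, etc.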
Data Management Strategies
Stateless vs. Stateful Containers
Stateless containers don't store persistent data:
- Web servers
- Application servers
- Microservices
- Worker processes
Stateful containers need to store data:
- Databases
- Caching services
- File storage services
- Message queues
I've found it's best to:
- Make as many components stateless as possible
- Use volumes only for truly stateful parts of the application
- Consider using managed services for stateful components (e.g., RDS for databases)
Data Backup and Recovery
Backing up volume data is essential. Here's how I do it:
# Back up a volume to a tar file
docker run --rm -v my-volume:/source -v "$(pwd)":/backup \
alpine tar -czf /backup/my-volume-backup.tar.gz -C /source .
# Restore from a backup
docker run --rm -v my-volume:/target -v "$(pwd)":/backup \
alpine sh -c "tar -xzf /backup/my-volume-backup.tar.gz -C /target"
For automated backups, I put this in a cron job or CI/CD pipeline.
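For scheduled runs it helps to give each archive a unique name so one backup doesn't overwrite the previous one. Here is a minimal sketch of how I'd wrap the backup command. backup_volume and the /srv/backups directory are illustrative assumptions, and the source volume is mounted read-only as a precaution:

```shell
# Hypothetical helper: archive a volume into a timestamped tarball.
#   $1 = volume name
#   $2 = absolute host directory that receives the archive
backup_volume() {
  volume=$1
  dest=$2
  stamp=$(date +%Y%m%d-%H%M%S)
  docker run --rm \
    -v "$volume":/source:ro \
    -v "$dest":/backup \
    alpine tar -czf "/backup/$volume-$stamp.tar.gz" -C /source .
}

# Usage (requires Docker), e.g. from a script called by cron:
#   backup_volume my-volume /srv/backups
```

Old archives still have to be rotated or deleted separately; this sketch only creates them.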
Sharing Data Between Containers
A straightforward way to share data between containers is a shared volume:
# Create a shared volume
docker volume create shared-data
# Use it in multiple containers (container1 and container2 stand in for your image names)
docker run -v shared-data:/app/data container1
docker run -v shared-data:/app/data container2
Real-World Examples
Running a Database with Persistent Storage
# Create a volume for the database
docker volume create postgres-data
# Run PostgreSQL with the volume (the image tag is pinned so the data directory layout stays predictable)
docker run -d \
--name postgres \
-e POSTGRES_PASSWORD=mysecretpassword \
-v postgres-data:/var/lib/postgresql/data \
-p 5432:5432 \
postgres:16
Now, even if the container is removed, the data will persist in the postgres-data volume.
Development Environment with Code Mounting
# Mount current directory for development
docker run -d \
--name node-app \
-v "$(pwd)":/app \
-w /app \
-p 3000:3000 \
node:14 \
npm start
This mounts your current directory into the container at /app. When you change code on your host, it's immediately reflected in the container.
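One side effect is that the bind mount hides whatever the image already had at /app, including any node_modules installed at build time. A common workaround is to layer an anonymous volume over that one subdirectory. The sketch below assumes a custom image (my-node-app is a made-up name) that ran npm install into /app/node_modules; run_dev_container is likewise a hypothetical helper:

```shell
# Hypothetical helper: run a dev container with live-mounted source code
# while keeping the node_modules directory that was built into the image.
#   $1 = image name (assumed to contain /app/node_modules)
run_dev_container() {
  docker run -d \
    --name node-app \
    -v "$(pwd)":/app \
    -v /app/node_modules \
    -w /app \
    -p 3000:3000 \
    "$1" \
    npm start
}

# Usage (requires Docker), from the project directory:
#   run_dev_container my-node-app
```

The anonymous volume at /app/node_modules takes precedence over the bind mount for that path, so host edits show up everywhere else while dependencies stay inside the container.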
Sharing Configuration Files
# Mount a specific config file
docker run -d \
--name nginx \
-v "$(pwd)/nginx.conf":/etc/nginx/nginx.conf:ro \
-p 80:80 \
nginx
The :ro suffix makes the mount read-only, preventing the container from modifying your config file.
Best Practices I've Learned
- Use named volumes for important data
  - They're easier to manage and back up
- Use bind mounts during development
  - For real-time code changes without rebuilding
- Make containers as stateless as possible
  - Easier scaling and recovery
- Be careful with permissions
  - Container users must have proper permissions on mounted volumes
- Always back up volumes
  - Persistence isn't the same as backup
- Consider volume labels for organization
  docker volume create --label project=myapp myapp-data
- Clean up unused volumes regularly
  docker volume prune
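Labels pay off when you want to act on one project's volumes only. Here is a minimal sketch using the project label from the example above; both function names are my own invention, and the --filter options are standard Docker CLI flags:

```shell
# Hypothetical helper: list the names of volumes labelled for one project.
#   $1 = value of the "project" label
volumes_for_project() {
  docker volume ls --quiet --filter "label=project=$1"
}

# Hypothetical helper: prune only the unused volumes carrying that label.
# --force skips the interactive confirmation prompt.
prune_project_volumes() {
  docker volume prune --force --filter "label=project=$1"
}

# Usage (requires Docker):
#   volumes_for_project myapp
#   prune_project_volumes myapp
```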
Managing Volume Permissions
One issue I frequently ran into was permission problems with volumes. Here's how I solved it:
# Set permissions before mounting
docker run --rm -v my-volume:/data alpine chmod 777 /data
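chmod 777 gets things working, but it opens the directory to every user. A narrower alternative is to hand ownership to the specific user the application runs as. The sketch below assumes the application runs as UID and GID 1000, which you'd need to verify for your image; chown_volume is a hypothetical helper name:

```shell
# Hypothetical helper: give one UID:GID ownership of a volume's contents.
#   $1 = volume name, $2 = numeric user ID, $3 = numeric group ID
chown_volume() {
  docker run --rm -v "$1":/data alpine chown -R "$2:$3" /data
}

# Usage (requires Docker); 1000:1000 is an assumed application user:
#   chown_volume my-volume 1000 1000
#   docker run --user 1000:1000 -v my-volume:/data alpine touch /data/ok
```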
Conclusion
Understanding Docker volumes has been essential for my containerized applications. The ephemeral nature of containers makes volumes necessary for any application that needs to store data.
To summarize:
- Use named volumes for persistent data
- Use bind mounts for development
- Choose the right storage strategy for your application's needs
- Remember to back up your volumes
In the next post, I'll cover Docker networking - how containers communicate with each other and the outside world.
Next up: "Docker Networking: Connecting Containers"