Introduction
In this guide, we will walk through the process of manually creating an isolated application environment on a Linux server without using Docker. By leveraging Linux namespaces, cgroups, chroot, and other system-level isolation techniques, we can simulate the core functionality of containers. These isolation mechanisms are essential for understanding how containers work and provide us with the ability to isolate applications from the host system.
By the end of this, you will have created a lightweight, container-like environment on your Linux system, suitable for running applications securely without affecting the host system.
Objectives
By the end of this guide, you will:
- Set up a custom isolated environment for running applications.
- Use Linux namespaces to isolate processes, networking, and file systems.
- Use cgroups to limit CPU, memory, and disk usage.
- Use chroot or pivot_root to create a separate filesystem for applications.
- Ensure networking isolation so applications do not interfere with the host.
- Validate each step with relevant commands and screenshots.
Pre-requisites
- An AWS Ubuntu EC2 instance
Make sure you have two terminals open:
- One for running commands on the host.
- One for running commands inside the container.
Install the required tools inside the container before isolating the network:
apt update
apt install -y python3 python3-seccomp seccomp iproute2 iputils-ping
Step 1: Process Isolation with Namespaces
The first step is to isolate processes within a custom namespace, making sure they don’t interfere with processes running on the host system.
1. Creating an Isolated Namespace
To create an isolated environment for the processes, use the following command:
sudo unshare --pid --fork --mount-proc --mount /bin/bash
What does the above command do?
- The unshare command in Linux is used to isolate a process from the host namespace, effectively creating a separate environment for the process to run.
- When the --pid flag is used, it creates a new PID namespace, meaning that the process will have its own process ID tree, isolated from the host's process IDs.
- The --fork flag ensures that a new shell is forked within this isolated namespace, allowing the user to interact with it.
- Additionally, the --mount-proc flag mounts a new /proc filesystem, which reflects the process information within the new namespace rather than the host, ensuring complete process isolation.
- The --mount flag places the shell in a new mount namespace, so mounts made inside it (like the /proc mount above) do not leak to the host.
Why is this useful?
It ensures that processes inside this new environment will not affect or be affected by processes outside of it on the host system. This is a key concept in containerisation.
Verification
To verify that the new namespace is active, run:
lsns | grep pid
You should see a new PID namespace with a different process ID. This confirms that your new isolated process environment is working as expected.
Step 2: Filesystem Isolation with Chroot
Next, we’ll set up a minimal filesystem within the isolated environment. This filesystem will allow us to run applications as if they are in a completely separate environment.
1. Setting Up the Root Filesystem
Create a directory to serve as the root filesystem:
mkdir -p ~/my_container/rootfs
Next, we use the debootstrap tool to create a minimal Debian-based system inside this directory. This tool allows us to install the most basic set of packages needed to run a system:
sudo apt update && sudo apt install debootstrap -y
sudo debootstrap --variant=minbase stable ~/my_container/rootfs http://deb.debian.org/debian
What do the above commands do?
The debootstrap tool sets up a minimal Debian-based system within a specified directory, providing a lightweight and isolated environment. By using the minbase variant, debootstrap includes only the most essential packages, significantly reducing the size of the setup while maintaining a functional minimal system. This approach is useful when creating isolated environments or containers where minimalism is crucial.
Verification
Check the contents of the root filesystem:
ls ~/my_container/rootfs
You should see essential directories like bin, lib, etc, and usr.
2. Mounting Required System Directories
To make the system fully functional, we need to mount certain directories like /proc, /sys, and /dev inside the container's root filesystem. These directories are required for processes to function correctly.
sudo mkdir -p ~/my_container/rootfs/proc
sudo mkdir -p ~/my_container/rootfs/sys
sudo mkdir -p ~/my_container/rootfs/dev
sudo mkdir -p ~/my_container/rootfs/dev/pts
sudo mount -t proc proc ~/my_container/rootfs/proc
sudo mount --rbind /sys ~/my_container/rootfs/sys
sudo mount --rbind /dev ~/my_container/rootfs/dev
sudo mount --rbind /dev/pts ~/my_container/rootfs/dev/pts
- These commands create the necessary directories and mount them inside the container's root filesystem.
Verification
Check active mounts:
mount | grep ~/my_container/rootfs
This command lists the active mounts. You should see the /proc, /sys, /dev, and /dev/pts directories mounted correctly inside the container's root filesystem.
3. Entering the Chroot Environment
Now, we enter the chroot environment, which allows us to interact with the container as though it were a separate system:
sudo chroot ~/my_container/rootfs /bin/bash
Verification
Run the following command to confirm that you're inside the isolated environment:
df -h
Step 3: Resource Management with Cgroups
Cgroups allow us to limit the resources (like CPU, memory, and disk usage) that processes inside the container can use.
1. Limiting CPU Usage
To limit the CPU usage of our isolated container, we use the following commands:
mkdir -p /sys/fs/cgroup
mkdir -p /sys/fs/cgroup/my_container
echo "50000 100000" | tee /sys/fs/cgroup/my_container/cpu.max
echo $$ | tee /sys/fs/cgroup/my_container/cgroup.procs
What do these commands do?
- mkdir -p /sys/fs/cgroup: Creates the cgroup mount point if it doesn't already exist.
- mkdir -p /sys/fs/cgroup/my_container: Creates a cgroup named my_container for our environment.
- echo "50000 100000" | tee /sys/fs/cgroup/my_container/cpu.max: Limits CPU usage to 50% of one CPU core by setting a quota of 50000 µs per 100000 µs period.
- echo $$ | tee /sys/fs/cgroup/my_container/cgroup.procs: Places the current process (the shell) into the cgroup.
Verification
To check the CPU limits:
cat /sys/fs/cgroup/my_container/cpu.max
You should see the output 50000 100000, which confirms that the CPU limit has been applied.
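The two numbers in cpu.max are the quota and the period, both in microseconds: the cgroup may use at most 50000 µs of CPU time in every 100000 µs window. A quick way to see this as a percentage of one core:

```shell
quota=50000    # allowed CPU time per period, in microseconds
period=100000  # length of the accounting period, in microseconds

# Fraction of a single core the cgroup may consume
echo "$((100 * quota / period))% of one CPU core"
```

This prints 50% of one CPU core; writing "25000 100000" instead would cap the cgroup at 25%.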
2. Setting Memory Limits
Now, let's limit the amount of memory the container can use:
mkdir -p /sys/fs/cgroup/my_container
echo 268435456 > /sys/fs/cgroup/my_container/memory.max
echo $$ > /sys/fs/cgroup/my_container/cgroup.procs
Verification
To check the memory limits:
cat /sys/fs/cgroup/my_container/memory.max
You should see 268435456 as the output, confirming the memory limit is set correctly.
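The value written to memory.max is a plain byte count; 268435456 bytes is 256 MiB, which you can confirm with shell arithmetic:

```shell
# 256 MiB expressed in bytes -- the value written to memory.max
echo $((256 * 1024 * 1024))
```

This prints 268435456; to set a different cap, compute the desired size in bytes the same way.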
3. Restricting Disk I/O
To restrict disk I/O for a specific process, use the following command. (Run this on the host, not in the container: type exit to leave the container first, then run the commands below.)
sudo ionice -c 2 -n 7 -p <PID>
This sets the disk I/O scheduling class of the process identified by <PID> to best-effort (-c 2) with priority 7 (-n 7), the lowest priority in that class.
How to find the PID:
To find the process ID (PID) of the container, open a new terminal, SSH into your instance, run sudo su, and use the following command to find the PID:
ps aux | grep <container-name>
Verification
To verify the disk I/O restrictions:
sudo ionice -p <PID>
You should see output indicating the disk I/O priority for the specified process: best-effort: prio 7
Step 4: Security Hardening
In this step, we will enhance the security of our isolated environment by restricting certain system calls using seccomp. Seccomp (short for Secure Computing Mode) is a Linux kernel feature that allows us to filter and block specific system calls, thereby minimizing the attack surface of the container.
1. Restricting System Calls (Seccomp)
1. Install seccomp
First, we need to install the seccomp package, which provides the necessary tools to create and apply system call filters. Run the following command to install the seccomp package:
apt install -y seccomp python3-seccomp
This command installs the seccomp tooling together with the Python bindings (python3-seccomp) that the script below imports, allowing us to restrict system calls and improve security.
2. Create the Seccomp Profile Script
Next, we will create a Python script that will apply the seccomp profile to our container environment. The script will load the seccomp profile from a file and enforce the rules we define, such as blocking certain system calls like ptrace.
Run the following command to create the script apply_seccomp.py:
cat > apply_seccomp.py << 'EOF'
#!/usr/bin/python3
import seccomp
import sys
import json

# Load the profile
with open('seccomp_profile.json', 'r') as f:
    profile = json.load(f)

# Create a seccomp filter (default action: allow every system call)
f = seccomp.SyscallFilter(seccomp.ALLOW)

# Add the rules from our profile
for syscall in profile.get('syscalls', []):
    if syscall['action'] == 'SCMP_ACT_KILL':
        f.add_rule(seccomp.KILL, syscall['name'])

# Apply the filter
f.load()

# Execute the command provided as arguments
if len(sys.argv) > 1:
    import os
    os.execvp(sys.argv[1], sys.argv[1:])
EOF
What does this script do?
- The script loads a seccomp profile from the seccomp_profile.json file, which contains the system call restrictions.
- It creates a seccomp filter and sets the default action to allow all system calls.
- The script then adds the specific rules from the profile. In this case, any system call marked with SCMP_ACT_KILL will be blocked and cause the process to terminate.
- The script then applies the filter to restrict the system calls.
- Finally, it executes the command provided as arguments to the script (if any), with the system call restrictions applied.
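The guide does not show seccomp_profile.json itself, but apply_seccomp.py expects it in the same directory. A minimal sketch of such a profile — blocking only ptrace, using the JSON layout the script reads — might look like this:

```shell
# Create a minimal seccomp profile that kills any process calling ptrace.
# The file name and field names match what apply_seccomp.py reads.
cat > seccomp_profile.json << 'EOF'
{
  "syscalls": [
    { "name": "ptrace", "action": "SCMP_ACT_KILL" }
  ]
}
EOF
```

You can extend the "syscalls" array with further entries to block additional system calls.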
3. Make the script executable
chmod +x apply_seccomp.py
This command changes the file permissions of the script to allow it to be executed.
4. Run the Seccomp Script
./apply_seccomp.py /bin/bash
What happens here?
Run with a command (here /bin/bash), the script applies the seccomp profile and then executes that command with the restrictions in place, based on the rules defined in the seccomp_profile.json file. If the profile indicates that a specific system call, like ptrace, should be blocked, any attempt to use ptrace inside that shell will result in the process being killed. (Run with no arguments, the script applies the filter and simply exits, so the restrictions would not carry over to your current shell.)
Verification
To test if ptrace is blocked, try to run the following command inside the restricted shell (install strace first with apt install -y strace if it is not present):
strace -e ptrace ls
You should see an error message like: Bad system call (core dumped).
Step 5: Networking Isolation
Networking isolation is an essential part of setting up a secure, isolated application environment. In this section, we will walk through how to create a network namespace, establish virtual network interfaces, and isolate networking between two different environments using Linux networking tools.
NB: Until stated otherwise, the commands in this section should be run on the host machine.
1. Create a Network Namespace
In this first step, we create a network namespace named my_net. A network namespace is a separate network environment where we can manage network interfaces, IP addresses, and routing tables independently of the host system. The command sudo ip netns add my_net ensures that a new isolated network environment is created.
sudo ip netns add my_net
This command lists all the network namespaces present on the system. After creating my_net, it should appear in the list. This helps to verify that the namespace was successfully created.
ip netns list
2. Create Virtual Ethernet Interfaces
We now create a pair of virtual Ethernet interfaces, veth0 and veth1. These interfaces act as a bridge between the host system and the newly created network namespace. The command creates a virtual network interface (veth0) and its peer (veth1). These interfaces can communicate with each other, simulating network communication between the host system and the namespace.
sudo ip link add veth0 type veth peer name veth1
These commands display the details of the virtual interfaces veth0 and veth1. It helps us ensure that the interfaces are created correctly and are available for use. You should see output showing these interfaces and their current states (e.g., UP or DOWN).
ip link show veth0
ip link show veth1
3. Move veth0 into my_net
sudo ip link set veth0 netns my_net
This moves one end of the veth pair (veth0) into the my_net namespace, while its peer (veth1) stays on the host.
4. Assign IP Addresses
In this step, we assign IP addresses to the network interfaces, enabling them to communicate within the isolated network. These interfaces will allow the isolated network namespace to interact with other networks or devices.
Assigning IP Address to veth0 Inside the Network Namespace
First, we assign an IP address (192.168.1.1/24) to the interface veth0 within the my_net network namespace. By using sudo ip netns exec my_net, we execute the command inside the my_net namespace, which ensures that veth0 gets the specified IP address within the isolated environment.
sudo ip netns exec my_net ip addr add 192.168.1.1/24 dev veth0
Next, we bring the veth0 interface up inside the my_net namespace, making it active and able to participate in network communication:
sudo ip netns exec my_net ip link set veth0 up
- sudo ip netns exec my_net: Executes the command within the my_net network namespace.
- ip link set veth0 up: Activates the veth0 interface. Until an interface is brought up, it cannot send or receive packets.
Assigning IP Address to veth1 on the Host System
On the host side, we assign the IP address 192.168.1.2/24 to the peer interface veth1. This allows veth1 to communicate with veth0 (and thus with the my_net namespace) through the veth pair.
sudo ip addr add 192.168.1.2/24 dev veth1
We then activate veth1 by bringing it up, so the host system can use the interface to communicate with the isolated network namespace.
sudo ip link set veth1 up
5. Test the Network Connectivity
sudo ip netns exec my_net ping -c 3 192.168.1.2
Now that the interfaces are set up, we test the connectivity between the my_net namespace and the host system by using the ping command. This sends three ping requests (-c 3) from the my_net namespace to the IP address 192.168.1.2 (which is assigned to veth1). If the setup is correct, you should see successful ping responses.
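For reference, the networking steps so far can be collected into a single sketch script. The file name setup_netns.sh is my choice; it must be run as root and assumes the same interface names and addresses used in this guide:

```shell
# Write a script that reproduces steps 1-5 of the networking setup.
# Running the script itself requires root privileges.
cat > setup_netns.sh << 'EOF'
#!/bin/sh
set -e

# 1. Create the network namespace
ip netns add my_net

# 2. Create the veth pair
ip link add veth0 type veth peer name veth1

# 3. Move one end into the namespace
ip link set veth0 netns my_net

# 4. Assign addresses and bring both ends up
ip netns exec my_net ip addr add 192.168.1.1/24 dev veth0
ip netns exec my_net ip link set veth0 up
ip addr add 192.168.1.2/24 dev veth1
ip link set veth1 up

# 5. Test connectivity from inside the namespace
ip netns exec my_net ping -c 3 192.168.1.2
EOF
chmod +x setup_netns.sh
```

Run it with sudo ./setup_netns.sh on a clean host; set -e makes it stop at the first failing step.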
6. Verify Network Interfaces Inside the Namespace
sudo ip netns exec my_net ip a
This command displays the IP address and other network details for all interfaces inside the my_net network namespace. It should show veth0 with the IP address 192.168.1.1/24, as expected.
7. Linking to a Container Network Namespace
sudo ln -s /proc/<PID>/ns/net /var/run/netns/my_container
In this step, we expose the network namespace of a running process (the container shell, identified by <PID>) under the name my_container. The ln -s command creates a symbolic link to that process's network namespace in the /var/run/netns/ directory, making it accessible to the ip netns tooling for further network configuration.
8. Enter the Container with Network Namespace
sudo nsenter --net=/var/run/netns/my_net -- chroot /root/my_container/rootfs /bin/bash
Here, we use nsenter to enter the network namespace. This command allows us to run a shell (/bin/bash) within the isolated network environment. The --net option tells nsenter to use the my_net namespace we set up earlier, and the chroot command changes the root directory to the container's root filesystem (/root/my_container/rootfs).
9. Verify Network Configuration Inside the Container
ip a
Inside the container, running ip a should show the container's network interfaces, including the virtual interface (veth0) that connects it to the network namespace. The IP address assigned to veth0 (192.168.1.1/24) should also be visible.
Step 6: Deploying an Application
In this step, we'll deploy a simple application within our isolated environment and test its accessibility both from the host and within the isolated network. We will set up a basic Python web server to demonstrate this process.
1. Installing Dependencies
Before running the application, recall that at the beginning of our setup, we installed Python in our container and its required dependencies inside the container. This ensures that Python is available to run our application inside the isolated environment.
2. Running the Web Server
To start the web server inside the isolated environment, run the following command within the container:
python3 -m http.server 8080
What does this command do?
- python3 -m http.server: This command starts a simple HTTP server using Python's built-in library. It listens for incoming HTTP requests.
- 8080: This specifies that the web server should listen on port 8080.
The Python web server will now run inside the isolated container, serving files on port 8080.
Why this is important:
By running the server in this isolated environment, we can observe how the container handles networking and whether the isolation works as expected.
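If you want to rehearse this check outside the container first, the following sketch starts a throwaway server on the host's loopback interface, fetches a page from it, and shuts it down (port 8080 and the loopback bind are my choices for the demonstration):

```shell
# Start a temporary HTTP server in the background on 127.0.0.1:8080
python3 -m http.server 8080 --bind 127.0.0.1 &
server_pid=$!
sleep 1

# A working server answers GET / with HTTP status 200
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/

# Stop the temporary server
kill "$server_pid"
```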
3. Verifying the Web Server from the Host
To verify that the web server is accessible from the host system (i.e., the machine running the container), run the following command on the host:
curl 192.168.1.1:8080
- curl: This command is used to transfer data from or to a server using various protocols, in this case HTTP.
- 192.168.1.1:8080: This is the IP address of veth0 inside the my_net network namespace, and 8080 is the port where our web server is running.
Output:
The command should display the HTML content served by the Python web server, confirming that the server inside the isolated container is accessible from the host system. You should see a webpage or content indicating the server is running.
4. Verifying Network Isolation
Next, to test the network isolation, run the following command on the host:
curl localhost:8080
What does this command do?
- curl localhost:8080: This command tries to access the web server by targeting the localhost address (i.e., the host system itself) on port 8080.
The command should fail to reach the web server. This is because the web server is running inside the isolated network namespace (my_net), which has been configured with network isolation. Therefore, localhost on the host system does not have access to the container's network.
This confirms that our network isolation is working properly and the application is isolated within the container environment.
Conclusion
This guide demonstrated how to manually create an isolated application environment on Linux using namespaces, cgroups, chroot, and network isolation. By understanding these underlying concepts, you gain valuable insights into how containers like Docker work under the hood.
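Once you are done experimenting, it is worth undoing the changes on the host. A hedged cleanup sketch follows — the file name cleanup.sh is my choice, the paths and names match those used in this guide, it must be run as root, and errors from already-removed resources are deliberately ignored:

```shell
# Write a best-effort cleanup script for the resources created in this guide.
cat > cleanup.sh << 'EOF'
#!/bin/sh
# Unmount the bind mounts inside the container rootfs (deepest first)
umount -R "$HOME/my_container/rootfs/dev/pts" 2>/dev/null
umount -R "$HOME/my_container/rootfs/dev" 2>/dev/null
umount -R "$HOME/my_container/rootfs/sys" 2>/dev/null
umount "$HOME/my_container/rootfs/proc" 2>/dev/null

# Remove the cgroup (it must contain no processes first)
rmdir /sys/fs/cgroup/my_container 2>/dev/null

# Delete the veth pair and the network namespace
ip link delete veth1 2>/dev/null
ip netns delete my_net 2>/dev/null
exit 0
EOF
chmod +x cleanup.sh
```

Run it with sudo ./cleanup.sh after exiting the container; deleting either end of a veth pair removes its peer as well.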
Thank You for Reading!
If you found this guide helpful, don’t forget to like, comment, and share! Let me know if you have any questions or need further assistance.
Happy isolating! 🚀