This article is the second installment of the "Orchestrating MicroVMs, from first principles" series, about building an orchestrator for Firecracker microVMs in Go named Ravel. You can find the first article here.
Generally, an orchestrator cluster is made up of two kinds of nodes: worker nodes, which run the computing workload, and supervisor nodes, which orchestrate and schedule that workload across the workers.
Today we'll only talk about the worker node. Our worker will expose a REST API covering the following needs:
- Create microVMs from a container image,
- Manage the microVM lifecycle (start, stop, delete),
- List and inspect microVMs,
- Expose microVM logs (covered in a future article).
For the purposes of this article, all presented code is simplified and stripped of error handling. You can find the full implementation on the Ravel GitHub repository.
From containers to microVMs
A container image (OCI image) holds all binaries required to run a workload and metadata about the intended process at container start, but it lacks a real Linux distro and kernel. Instead, it relies on the host system’s kernel, with the container runtime, like Docker or containerd, managing the container's lifecycle and interactions with the host system.
The metadata within the image, specifying entry points, environment variables, and command arguments, is crucial for initializing the container correctly by the runtime.
So, to run a container image in a microVM, we need to bring our own Linux kernel and an init binary to execute at kernel boot.
For now, our init binary will be mostly a copy of the thi-startup one. It needs to be mounted as the root file system, accompanied by a config file providing the container image's metadata. Then we extract the image content into an ext4 file system and mount it as a second file system.
The init binary is responsible for making the OCI image usable like a kind of Linux distro (setting up some Linux mounts, among other things) and for spawning the intended workload process from the provided config file.
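To make that config file concrete, here is a minimal sketch of the shape it could take, expressed as a Go struct; the type and field names are assumptions for illustration, not the actual format used by Ravel:

// InitConfig is a hypothetical shape for the config file handed to the
// init binary. It carries the OCI image metadata the init needs to
// spawn the workload process.
type InitConfig struct {
    Entrypoint []string `json:"entrypoint"` // process to execute after boot
    Cmd        []string `json:"cmd"`        // default arguments from the image
    WorkingDir string   `json:"working_dir"`
    Env        []string `json:"env"` // e.g. "PATH=/usr/local/bin:..."
}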
That's about all we need to construct our microVM.
An overview of the Go implementation
Here is a simplified version of our drive creation implementation:
func (dm *DrivesManager) CreateDrive(dc RavelDriveSpec) Drive {
    driveId := utils.NewId()

    // Create a file and preallocate it to the desired size
    utils.Fallocate(getDrivePath(driveId), dc.Size)

    // Format it as an ext4 file system
    exec.Command("mkfs.ext4", getDrivePath(driveId)).Run()

    // Record the drive in a local store
    dm.store.StoreRavelDrive(driveId, dc)

    // localDrive is a concrete type satisfying the Drive interface
    // shown below
    return &localDrive{
        Id:   driveId,
        Spec: dc,
    }
}
And our drive object interface:
type Drive interface {
    Mount() error
    Unmount() error
    GetMountPath() string
    GetDrivePath() string
}
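As a rough idea of what a concrete implementation could look like, here is a sketch of a hypothetical localDrive type; the type, its fields, and the mount path layout are assumptions for illustration. Shelling out to mount(8) keeps things simple, since it sets up the loop device for us when given a regular file:

import (
    "os"
    "os/exec"
)

// localDrive is a hypothetical concrete type satisfying the Drive interface.
type localDrive struct {
    Id   string
    Spec RavelDriveSpec
}

func (d *localDrive) GetDrivePath() string { return getDrivePath(d.Id) }
func (d *localDrive) GetMountPath() string { return "/mnt/ravel/" + d.Id }

// Mount loop-mounts the ext4 drive image onto the drive's mount path.
func (d *localDrive) Mount() error {
    if err := os.MkdirAll(d.GetMountPath(), 0o755); err != nil {
        return err
    }
    return exec.Command("mount", "-o", "loop", d.GetDrivePath(), d.GetMountPath()).Run()
}

func (d *localDrive) Unmount() error {
    return exec.Command("umount", d.GetMountPath()).Run()
}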
Next, here is how we build the machine, extracted from the original source file:
func (machineManager *MachineManager) buildMachine(machineSpec types.RavelMachineSpec) (types.RavelMachine, error) {
    machineId := utils.NewId()

    // Get the desired image
    image := machineManager.images.GetImage(machineSpec.Image)

    // Generate the init config from the image metadata
    imageInitConfig := image.GetInitImageConfig()

    // Build the init drive and copy the init binary and config into it
    initDrive := machineManager.buildInitDrive(machineSpec, imageInitConfig)

    // Build the main drive and extract the container image into it
    mainDrive := machineManager.buildMainDrive(machineSpec)

    return types.RavelMachine{
        Id:               machineId,
        RavelMachineSpec: &machineSpec,
        InitDriveId:      initDrive.Id,
        RootDriveId:      mainDrive.Id,
        Status:           types.RavelMachineStatusCreated,
    }, nil
}
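For reference, the RavelMachine type used above could look roughly like this; the exact definitions live in the repository, so treat the field types and status values as inferred assumptions:

type RavelMachineStatus string

const (
    RavelMachineStatusCreated RavelMachineStatus = "created"
    RavelMachineStatusRunning RavelMachineStatus = "running"
    RavelMachineStatusStopped RavelMachineStatus = "stopped"
)

// RavelMachine ties together the machine spec, its two drives, and its
// lifecycle status. RavelMachineSpec itself (image, resources, drive
// sizes, ...) is defined elsewhere in the repository.
type RavelMachine struct {
    Id string
    *RavelMachineSpec
    InitDriveId string
    RootDriveId string
    Status      RavelMachineStatus
}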
Managing the microVM lifecycle
Now that we have a file system and an init, we'll use Firecracker to spawn the microVMs. Each Firecracker microVM is attached to a unique Firecracker VMM (Virtual Machine Monitor) process. So when we call firecracker.NewMachine from the Go SDK, the returned *firecracker.Machine object has methods to interact with this process.
We need to track this object for the entire lifetime of the Firecracker VMM, which is tied to our Go process, so we'll keep it in memory with a VMMManager, a small abstraction over the SDK:
type VMMManager struct {
    machines map[string]*firecracker.Machine
}

type VMMManagerInterface interface {
    CreateMachine(ctx context.Context, machineId string, config firecracker.Config) (*firecracker.Machine, error)
    StartMachine(ctx context.Context, machineId string) error
    StopMachine(ctx context.Context, machineId string) error
    GetMachine(machineId string) (*firecracker.Machine, error)
    DeleteMachine(ctx context.Context, machineId string) error
}
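Here is a minimal sketch of how the create and start methods can wrap the SDK; in the real code the machines map would also need a mutex, which is omitted here:

import (
    "context"
    "fmt"

    firecracker "github.com/firecracker-microvm/firecracker-go-sdk"
)

func (vm *VMMManager) CreateMachine(ctx context.Context, machineId string, config firecracker.Config) (*firecracker.Machine, error) {
    // NewMachine doesn't start anything yet: it prepares the handle
    // we'll later use to drive the Firecracker process.
    machine, err := firecracker.NewMachine(ctx, config)
    if err != nil {
        return nil, err
    }
    vm.machines[machineId] = machine
    return machine, nil
}

func (vm *VMMManager) StartMachine(ctx context.Context, machineId string) error {
    machine, ok := vm.machines[machineId]
    if !ok {
        return fmt.Errorf("machine %s not found", machineId)
    }
    // Start spawns the VMM process and boots the guest kernel
    return machine.Start(ctx)
}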
We'll probably need to extract this code into a pluggable "driver" and communicate with it over a Unix socket, or something similar, to decouple the microVM lifecycle from our worker API. This would allow machines to keep running even when the API is shut down. This interface will also probably become more hypervisor-agnostic in the future.
Let's create our first microVM:
func (machineManager *MachineManager) CreateMachine(ctx context.Context, ravelMachineSpec types.RavelMachineSpec) string {
    ravelMachine, _ := machineManager.buildMachine(ravelMachineSpec)

    firecrackerConfig := GetFirecrackerConfig(&ravelMachine)
    machineManager.machines.CreateMachine(ctx, ravelMachine.Id, firecrackerConfig)

    machineManager.store.StoreRavelMachine(&ravelMachine)
    return ravelMachine.Id
}
We can now start it:
func (machineManager *MachineManager) StartMachine(ctx context.Context, machineId string) error {
    machineManager.machines.StartMachine(ctx, machineId)

    machineManager.store.UpdateRavelMachine(machineId, func(rm *types.RavelMachine) {
        rm.Status = types.RavelMachineStatusRunning
    })
    return nil
}
We then do the same to stop and delete microVMs.
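On the VMMManager side, those operations can lean on the SDK as well: Shutdown sends a CtrlAltDel to the guest for a graceful stop, while StopVMM kills the VMM process outright. A sketch, with the same caveats as before:

func (vm *VMMManager) StopMachine(ctx context.Context, machineId string) error {
    machine, ok := vm.machines[machineId]
    if !ok {
        return fmt.Errorf("machine %s not found", machineId)
    }
    // Ask the guest to shut down gracefully (CtrlAltDel)
    return machine.Shutdown(ctx)
}

func (vm *VMMManager) DeleteMachine(ctx context.Context, machineId string) error {
    machine, ok := vm.machines[machineId]
    if !ok {
        return fmt.Errorf("machine %s not found", machineId)
    }
    // Kill the VMM process and drop the handle
    if err := machine.StopVMM(); err != nil {
        return err
    }
    delete(vm.machines, machineId)
    return nil
}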
Here we are! Now let's expose these features over a REST API:
- GET /machines: list machines
- GET /machines/<id>: inspect a machine
- POST /machines: create a machine
- POST /machines/<id>/start: start a machine
- POST /machines/<id>/stop: stop a machine
- DELETE /machines/<id>: delete a machine
To do that, we use a minimal HTTP router for Go named Flow:
mux.HandleFunc("/api/v1/machines", h.CreateMachineHandler, "POST")
mux.HandleFunc("/api/v1/machines", h.ListMachinesHandler, "GET")
mux.HandleFunc("/api/v1/machines/:id", h.GetMachineHandler, "GET")
mux.HandleFunc("/api/v1/machines/:id/start", h.StartMachineHandler, "POST")
mux.HandleFunc("/api/v1/machines/:id/stop", h.StopMachineHandler, "POST")
mux.HandleFunc("/api/v1/machines/:id", h.DeleteMachineHandler, "DELETE")
And that's it: we have a REST API to manage Firecracker microVMs. But it's still a work in progress...
What's next?
There is a lot of work left to make this worker ready to handle workloads:
- Add log management and an endpoint to expose it
- Add a VSOCK device to enable docker exec-like functionality with the init
- Use the Firecracker Jailer to improve security
- Add network management to bring internet connectivity and, in the future, cluster-wide connectivity
- Add clean OCI image management, decoupled from Docker
- Track workload status and recover from failures
- Add resource management
And many other things...
I will publish more blog posts in the future for some of these points. To stay tuned, you can follow me on X (formerly Twitter).
For Potential Contributors: The world of open source thrives on collaboration. If you're passionate about distributed systems and want to contribute, Ravel welcomes you.
For Criticisms and Feedback: Every system can be refined and improved. Your constructive feedback is invaluable to the development of Ravel.
Thanks for reading.