Introduction
Agentic AI is a type of artificial intelligence system powered by large language models (LLMs) that operates autonomously under minimal human supervision. These AI agents use tool calling to interact with the outside world and perform tasks on the user's behalf. These tasks can include programming (including long-running programming tasks, such as Ralph loops), sending emails, responding to events, executing code, searching the web, using a computer, and generally anything else that can be represented as a tool call. While this gives LLMs great utility, it also gives them destructive power that can result in deleted files, execution of harmful code, heavy resource usage, and so on.
This is where sandboxing is useful: if AI agents run in an ephemeral machine where irreversible changes to the system have few consequences, many of the pitfalls of giving LLMs this much power are alleviated.
Sandboxes are ephemeral machines that are easy and fast to create and destroy, but can also run for long periods of time. They're usually deployed on a cloud provider, since cloud machines are easy to spin up and down and can be created with various hardware configurations, but there are tools that let you create sandboxes on your local machine (you can use Docker for this, for example). This makes sandboxes a great use case for Virtual Private Servers (VPS) such as DigitalOcean Droplets. They can be created and removed with ease while being billed by the second, so you only pay for what you use. Sandboxes are meant to be ephemeral, i.e. when you delete the sandbox, it loses all its data. Sandboxes are created from an image, so they start off with a set configuration (such as installed and configured programs, users, keys, etc.).
In this article, you will learn how you can use DigitalOcean Droplets to create ephemeral sandboxes (which can also have persistent storage attached). You'll be introduced to various DigitalOcean services that make using Droplets as sandboxes easier. Using DigitalOcean over other tools lets you select the location, compute power, and software installed on the sandbox, while taking advantage of DigitalOcean's competitive pricing and connecting other services like Volumes, VPCs, the Container Registry, and more to your sandbox in one place.
Key Takeaways
- Isolated Environments for Agentic AI: Running autonomous AI tools in isolated cloud environments mitigates security risks like accidental file deletion, resource draining, or malicious script execution.
- Cost Efficiency via Per-Second Billing: Services like DigitalOcean Droplets support per-second billing, enabling rapid programmatic provisioning and teardown without incurring flat hourly or monthly minimums.
- Declarative Infrastructure Setup: Utilizing cloud-init configurations provides a robust, code-defined method to inject security rules, users, and environment packages during instance launch without pre-building custom disk images.
- Data State Management: Combining ephemeral compute with network-attached storage volumes allows you to persist structural project histories or git trees across individual sandbox replacements.
Prerequisites
To follow along with this article, you will need the following things:
- A DigitalOcean Account (you can sign up for $200 free credits using this link)
- Some basic Linux command-line knowledge (check these articles for an introduction)
- Basic programming knowledge and familiarity with a coding agent for the Ralph loop use case.
Programmatically Creating DigitalOcean Droplets
A Droplet is a Virtual Private Server (a Linux virtual machine in the cloud) which can be configured with a desired amount of (virtual) CPUs, memory, and storage. Droplets can be provisioned and deprovisioned as needed, making them perfect for creating ephemeral sandbox machines. Additionally, the recently rolled-out per-second Droplet pricing saves costs, since you pay for your Droplet usage down to the exact second rather than at an hourly or monthly rate.
The DigitalOcean API allows you to create and destroy Droplets programmatically. You can do this by calling API endpoints from an HTTP client, using one of the client libraries, or using the doctl CLI. You will need an API token to authenticate to the API, which you can create in the console. Create a token with at least the read, create, and update permissions for Droplets, plus the delete permission if you want to destroy sandboxes with the same token (you may need to add other permissions if you plan on using other features of DigitalOcean like VPCs and volumes).
Save this token to your environment so you can use it in the deployment commands later:
export TOKEN="your_api_token_here"
You can use the doctl CLI or the DigitalOcean API to create a droplet with a provided image and size. You can select one of the base images or a one-click image from the Marketplace, but what if you need something that isn't available in these images, such as specific packages or specific exposed services (like an A2A server)?
DigitalOcean lets you specify a shell script to run when the droplet is created, or write a more sophisticated configuration using cloud-init YAML files. Both are a nice alternative to custom images: they don't cost money to store and don't have to be built in advance, but they result in longer ready-times for the sandbox.
Moving Beyond Standard Images
While you can easily deploy base images (like Debian or Ubuntu) or Marketplace apps, out-of-the-box templates often fall short if your sandbox needs specific pre-installed packages or custom background services right from boot.
You could pass a standard bash script to run on startup, but shell scripts can be brittle. A single missing dependency or typo can leave the system in a half-broken state. A more robust alternative is cloud-init. This industry-standard tool uses structured YAML configurations (called cloud-configs) to handle system setup predictably. They provide a lightweight alternative to building custom machine images in advance, avoiding storage costs while keeping your infrastructure defined as code.
Defining the Sandbox Configuration
Let's define a sandbox that installs a pinned release of Bun, downloads a web app, exposes an HTTP service, and locks down the firewall.
Create a file named cloud-config and add the following YAML. Notice how the configuration is broken down into declarative blocks: managing user access (users), installing system packages (packages), writing a systemd daemon directly to the filesystem (write_files), and finally executing the deployment commands (runcmd).
#cloud-config
# create a non-root user
users:
  - name: user
    # the user's real name
    gecos: User
    # set their shell to bash
    shell: /bin/bash
    # give them sudo privs
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    # remove password login, we can only log in via ssh keys
    lock_passwd: true
    # set one or more SSH keys for this user
    ssh_authorized_keys:
      # put any ssh keys you want on the user here
      - ssh-ed25519 AAAASOME_FINGERPRINT
# run apt-get update (or equivalent on non-Debian systems)
package_update: true
# install curl and unzip (required to install bun)
# ufw is the firewall
packages:
  - curl
  - unzip
  - ufw
# write arbitrary files
write_files:
  # create systemd service
  - path: /etc/systemd/system/app.service
    permissions: '0644'
    content: |
      [Unit]
      Description=A simple server
      After=network.target
      [Service]
      Type=simple
      User=user
      WorkingDirectory=/home/user
      ExecStart=/opt/bun/bun run /home/user/index.ts
      Restart=always
      Environment=PORT=5000
      [Install]
      WantedBy=multi-user.target
# running arbitrary commands
runcmd:
  # download bun
  - mkdir -p /opt/bun
  - curl -fsSLo /opt/bun/bun.zip https://github.com/oven-sh/bun/releases/download/bun-v1.3.10/bun-linux-x64.zip
  - unzip -oqd /opt/bun /opt/bun/bun.zip
  - mv /opt/bun/bun-linux-x64/bun /opt/bun
  - chmod +x /opt/bun/bun
  - rm -r /opt/bun/bun-linux-x64 /opt/bun/bun.zip
  # download the source code
  - curl -fsSLo /home/user/index.ts https://gist.github.com/arnu515/8c639949ee1a5d226312873151ca40f9/raw/49eeeba4344110d297ed8f30d32ba8f307af4db2/index.ts
  - chown user:user /home/user/index.ts
  # configure the firewall
  - ufw allow ssh
  - ufw allow 5000/tcp
  - ufw enable
  # enable and start the systemd service
  - systemctl enable --now app
Do not remove the #cloud-config comment at the top of the file: cloud-init relies on this first line to recognize the payload as a cloud-config. Remember to replace ssh-ed25519 AAAASOME_FINGERPRINT with your actual public key.
Provisioning the Droplet
With the configuration defined, you can deploy the machine. You can do this via the official doctl CLI or by sending the payload directly to the API using curl.
If you have doctl installed and authenticated, run:
doctl compute droplet create \
  --enable-ipv6 \
  --image debian-13-x64 \
  --size s-1vcpu-512mb-10gb \
  --user-data-file cloud-config \
  --region YOUR_REGION_CHOICE \
  sandbox
Alternatively, if you are relying strictly on the API, you can use jq to properly escape the YAML file's quotes and newlines before piping it into curl:
jq -n \
  --arg region "YOUR_REGION_HERE" \
  --arg ud "$(cat cloud-config)" \
  '{
    name: "sandbox",
    region: $region,
    size: "s-1vcpu-512mb-10gb",
    image: "debian-13-x64",
    ipv6: true,
    user_data: $ud
  }' | curl -X POST "https://api.digitalocean.com/v2/droplets" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d @-
To find the exact slug for your preferred data center, run doctl compute regions list.
Monitoring and Testing
The underlying droplet infrastructure takes a few seconds to spin up. You can check the provisioning status by listing your droplets:
doctl compute droplet list
The status will initially show as new. After a brief moment, DigitalOcean will allocate the IP addresses and the status will change to active:
ID Name Public IPv4 ... Status ...
SOME_NUMBER sandbox SOME_IP ... active ...
Once the droplet is active, cloud-init begins executing your configuration in the background. After the system finishes pulling the packages and starting the systemd service, your HTTP server will be live.
Navigate to http://SOME_IP:5000 in your browser to see the view counter app running. You now have a fully reproducible sandbox environment.
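If you are scripting this, the manual status check above can be turned into a small polling helper. The following is a minimal sketch, assuming doctl is installed and authenticated and that `doctl compute droplet get` accepts the droplet name together with the `--format Status --no-header` flags. The function name `wait_for_active` and its retry defaults are illustrative and not part of doctl.

```shell
# Poll a droplet's status until it reports "active".
# Usage: wait_for_active NAME [MAX_TRIES] [SLEEP_SECONDS]
# (hypothetical helper; the defaults of 30 tries / 5 seconds are arbitrary)
wait_for_active() {
  local name="$1" tries="${2:-30}" delay="${3:-5}" i=1 status
  while [ "$i" -le "$tries" ]; do
    # --format/--no-header reduce the output to the bare status word
    status=$(doctl compute droplet get "$name" --format Status --no-header 2>/dev/null)
    if [ "$status" = "active" ]; then
      echo "active"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

Calling `wait_for_active sandbox` blocks until the droplet is up or the retries run out. Keep in mind that "active" only means the machine has booted: cloud-init may still be installing packages, so the HTTP service on port 5000 can take a little longer to respond.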
Cleaning Up
Because Droplets are billed by the second, you should destroy the sandbox as soon as you are done experimenting. You can delete it using either of the following methods:
doctl compute droplet delete sandbox
# or via the API:
curl -X DELETE "https://api.digitalocean.com/v2/droplets/DROPLET_ID" \
  -H "Authorization: Bearer $TOKEN"
Why Not Containers?
While running OCI containers locally provides a reasonably sandboxed and reproducible environment, the containers still run on your local machine and come with downsides because of that: long-running tasks require keeping your machine on, they consume your own compute and storage resources, they require additional software (or possibly hypervisors) to be installed, etc.
Running these sandboxes on a VPS provider like DigitalOcean gives you server-grade hardware and networking while being able to have long-running tasks without keeping your local machine up or using its resources.
Comparison with Other Sandbox Services
There are services like Daytona or sprites.dev which specifically provide preconfigured sandbox instances complete with client libraries and instant configuration. These services usually work by running a container or micro-VM on a server with a preconfigured image, and charging you per-second of compute you use. These services are more convenient, and containers/micro-VMs are faster to start up, but these services tend to cost much more and have higher vendor lock-in compared to using a VPS provider like DigitalOcean. For example, as of the time of writing, running a 2vCPU + 4GiB RAM sandbox with 10GiB of storage for two hours on Daytona costs $0.33, while the same sandbox running for the same time costs $0.036 on DigitalOcean — almost a 10x difference!
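The arithmetic behind this comparison is easy to reproduce for your own workloads. This is a rough sketch, assuming simple per-second proration with no minimum charge; the helper names `prorated_cost` and `cost_ratio` are made up for illustration, and the only prices used are the two-hour totals quoted above.

```shell
# Cost in USD of running for DURATION_S seconds at HOURLY_RATE (USD per hour).
# Assumes plain per-second proration and ignores any minimum charge.
# Usage: prorated_cost HOURLY_RATE DURATION_S
prorated_cost() {
  awk -v rate="$1" -v secs="$2" 'BEGIN { printf "%.4f\n", rate * secs / 3600 }'
}

# How many times more expensive price A is than price B.
# Usage: cost_ratio A B
cost_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Ratio of the two-hour totals quoted above ($0.33 vs $0.036)
cost_ratio 0.33 0.036   # prints 9.2
```

Plug in current prices from each provider's pricing page before relying on the result, since rates change over time.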
If you prioritize spin-up speed over cost, these services may be better than using a cloud provider. There also are plenty of sandbox orchestrating projects on GitHub that you can host on a VPS to create sandboxes within that VPS or a fleet of VPSes, but this article will not be covering that. Let me know if you're interested in such a thing and I'll be sure to cover it!
Unfortunately, there isn't a user-friendly library/command line tool for using DigitalOcean VPSes as sandboxes yet, so you'll have to write scripts that wrap over doctl or use the HTTP API (optionally through a client library) instead. This article will henceforth cover using the doctl command since that's easiest to wrap using something like bash scripts, but these commands are easily translatable to API calls or client library methods.
Using Prebuilt Images
Writing a cloud-config and waiting for it to run every time a sandbox is created is cumbersome. It would be better if a droplet could be created from an image that is already configured according to your needs. DigitalOcean supports uploading custom images (billed at $0.06 per GB per month to store) from which Droplets can be created.
Let's create an image that has a Python interpreter, some JavaScript runtimes, and sane security defaults. This image will be used later in this article as a code execution sandbox. These images must be Unix-like images with a proper filesystem (ext3/ext4), with cloud-init and sshd installed. These images can be raw (.img) or in a virtual machine image format like qcow2 or vdi.
To create such an image, the easiest way is to create a droplet, make your preferred changes to it, and then create a snapshot of it from the console. You can also create images from QEMU/VirtualBox VMs, or use tools like Packer. You could also create a NixOS configuration and make it output a qcow2 image — for maximal reproducibility.
This article will cover the first method. Create a droplet and SSH into it. Then, make any desired changes to the droplet. I chose to update the system, create a new user, give it sudo perms, add an SSH key to it, disable root ssh login, set up ufw, set up fail2ban, and finally, install python3, nodejs, and bun. Feel free to follow these steps, or change them as you wish.
Once your droplet is ready, create a snapshot of it to save it as an image: (these steps can also be done from the DigitalOcean console)
# power off the droplet
doctl compute droplet-action power-off DROPLET_ID
# create a snapshot — note the region
doctl compute droplet-action snapshot --snapshot-name 'sandbox-image' --wait DROPLET_ID
# delete the droplet
doctl compute droplet delete DROPLET_ID
Now, you can use this image to create a droplet. Initially, you can only create the droplet in the region where the snapshot was taken (i.e. the region of the earlier droplet). You can add snapshots (and user-created images) to other regions at no additional cost from the console.
# note the image id of the desired image
doctl compute image list-user
doctl compute droplet create \
  --image IMAGE_ID \
  --size s-1vcpu-1gb \
  --region REGION \
  sandbox
Feel free to browse the DigitalOcean one-click app Marketplace, or search the web for community-created images, to find ready-made images you can use.
Persist Sandbox Contents
There are some situations in which you don't want an ephemeral sandbox, but you need compute that can be spun up and down on demand while keeping data changes. A common example of this is something like GitHub codespaces, where you want to persist local code changes the user has made, but spin down the compute when they're done editing. They should then be able to get back to editing from the point they left off later.
DigitalOcean allows you to attach a (network-attached) storage pool that persists over droplet destruction. They're called block storage volumes. They can be created and attached to a droplet to provide it with storage that doesn't get destroyed with the droplet. Do keep in mind that volumes can only be attached to Droplets in the same region.
If you instead need somewhere to put build artifacts or other output files that are written once and not read back by the sandbox, that must be accessible to tooling outside the sandbox, or that do not need to be shared across sandboxes, I recommend using object storage (like DigitalOcean Spaces). This means your sandbox has to run some additional software to upload these files to the object storage (it could be as simple as a cURL script). Block storage volumes are recommended for use cases requiring constant access to and modification of files, rather than "upload and forget".
The below command creates a 5GiB volume: (be sure to note the volume's ID)
doctl compute volume create --region blr1 --size 5GiB --fs-type ext4 --fs-label storage sandbox-volume
The volume can be attached to a running droplet like so:
doctl compute volume-action attach VOLUME_ID DROPLET_ID
After the action completes, the volume is available on the droplet at /mnt/sandbox_volume. Writing to any files in this volume will ensure that those files are persisted even after the droplet is destroyed. You can remove a volume from a droplet by using the detach subcommand of volume-action in doctl. If you delete this droplet now, the volume will still remain (but it'll be detached). You are still charged for the volume, even if it isn't attached to anything.
When creating a new droplet, you can directly attach the volume to it using the --volumes flag with the droplet create subcommand. The volume will be attached to the same place and the files that were created earlier will be available. Unlike object storage, the files on the volume cannot be accessed from outside Droplets.
Volumes can only be attached to one droplet at a time (to prevent data corruption due to race conditions). If you want multiple Droplets to access data from the same volume at the same time, you'll have to run a network filesystem on the Droplet that has the volume attached.
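Putting the volume steps together, a replacement sandbox can be created with the existing volume attached from the start. The sketch below is one way to wrap this, under a few assumptions: the mount path rule (`/mnt/` plus the volume name with hyphens turned into underscores) is inferred from the `/mnt/sandbox_volume` path mentioned above, the `IMAGE`, `SIZE`, and `REGION` variables are placeholders you set yourself, and both function names are hypothetical.

```shell
# Derive the expected mount path of an auto-mounted volume from its name.
# Assumed rule: /mnt/<volume name with "-" replaced by "_">.
volume_mount_path() {
  printf '/mnt/%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

# Create a fresh sandbox with an existing volume attached at creation time.
# IMAGE, SIZE and REGION must be set by the caller; the volume has to live
# in the same region as the new droplet.
# Usage: create_sandbox_with_volume NAME VOLUME_ID
create_sandbox_with_volume() {
  local name="$1" volume_id="$2"
  doctl compute droplet create "$name" \
    --image "$IMAGE" --size "$SIZE" --region "$REGION" \
    --volumes "$volume_id" --wait
}
```

For example, `volume_mount_path sandbox-volume` yields `/mnt/sandbox_volume`, which your agent or harness can use as the working directory for anything that must survive the sandbox.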
Attaching a Fixed IP to the Sandbox
Up until now, all the Droplets you've created have had different IPs. This isn't really useful since you'd want to reach your sandboxes over the internet to use them. DigitalOcean allows you to reserve IPv4 and IPv6 addresses that you can attach to Droplets to give them a deterministic address that they can be reached at.
IPv4 reserved addresses cost money — even if they're not attached to a droplet, but IPv6 addresses are free to reserve. If your network supports IPv6, you can get away with not reserving IPv4 addresses and only using IPv6 to communicate with your Droplets.
Create a fixed IP through the following commands:
doctl compute reserved-ip create --region REGION
doctl compute reserved-ipv6 create --region REGION
You can then attach a reserved IP to an existing droplet using the following command: (the IP and droplet must be in the same region)
doctl compute reserved-ip-action assign IPV4_ADDR DROPLET_ID
doctl compute reserved-ipv6 assign IPV6_ADDR DROPLET_ID
Unfortunately, there's no way to create a droplet and assign it a reserved IP immediately. You'll have to run the above command to assign it the IP after creation manually. Alternatively, you could fetch the droplet's IP address by using the following commands:
doctl compute droplet get --template '{{.PublicIPv4}}' sandbox
doctl compute droplet get --template '{{.PublicIPv6}}' sandbox
You may want to take a look at the Reserved IP documentation if you want to do some further configuration, like making the droplet send outbound traffic through the reserved IP. Some of these changes may require the ip command, which you can persist across sandboxes either by updating the image, or by writing a specific cloud-config (I recommend the latter since if your reserved IP changes in the future, that would mean updating the image with the former method).
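Since the reserved IP has to be assigned after the droplet exists, it is convenient to chain the two steps in one function. This is a minimal sketch, assuming that `doctl compute droplet create` supports `--wait`, `--format ID`, and `--no-header`, and that the reserved IP subcommand is `reserved-ip-action assign`. `IMAGE`, `SIZE`, and `REGION` are placeholder variables, and `create_with_reserved_ip` is a hypothetical name.

```shell
# Create a droplet, wait for it to become active, then assign a reserved
# IPv4 address to it. IMAGE, SIZE and REGION must be set by the caller, and
# the reserved IP must belong to the same region.
# Usage: create_with_reserved_ip NAME RESERVED_IP
create_with_reserved_ip() {
  local name="$1" ip="$2" id
  # --wait blocks until the droplet is active; the remaining flags reduce
  # the output to just the numeric droplet ID
  id=$(doctl compute droplet create "$name" \
        --image "$IMAGE" --size "$SIZE" --region "$REGION" \
        --wait --format ID --no-header) || return 1
  doctl compute reserved-ip-action assign "$ip" "$id"
}
```

With this in place, every sandbox you create through the function ends up reachable at the same address, so clients and SSH configs do not need to change between sandboxes.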
Use-case: A Ralph loop Agent
If you're familiar with LLM-assisted programming (using large language models to automate and enhance software development tasks), you have probably heard of a Ralph loop, which, in its simplest form, is the following bash loop:
while :; do cat prompt.md | CODING_AGENT --allow-all-perms; done
What is a Ralph loop? It's an agent that is fed a prompt from a file over and over again until it completes some big task. In every iteration, the agent can edit the prompt file to give instructions to the next iteration. Thanks to their many iterations and precise instructions, Ralph loops can produce bug-free, complicated software, but they can also end up in a death spiral after one iteration makes a mistake.
Why sandbox a Ralph loop? This is the perfect use-case for a sandbox. When a Ralph loop goes rogue and starts breaking apart, you should be able to contain the fallout. There are many cases of agents going rogue and deleting files, writing bad code, running arbitrary commands, etc., and stuff like that should not be happening on a machine you care about. A sandbox is ideal for running Ralph loops, especially in unsupervised cases.
How to set up the sandbox?
1. Base Image: You'd start by using a base image with all of your necessary tools included, such as programming languages, build tools, and an LLM coding tool.
2. cloud-config: Then, you'd add a cloud-config that creates the initial prompt.md file, clones the repo, sets up credentials and API keys, etc.
Things to consider
The repo should be in a block volume since if the agent goes rogue, you'll still have the repo that you can restore using git after deleting the sandbox. The original Ralph loop article linked above recommends providing detailed specifications and making the agent only do one task per loop.
Make sure your Ralph loop knows how to exit (a common way is to exit if the previous iteration's prompt is the same as the next — but make this clear in the system prompt of the agent).
You should also tell the agent to periodically commit its work using git. You'll need to upload the results (if not using a volume) and delete the sandbox after the agent is done running. You can check whether the agent is still active by periodically using the ps command to look for your coding tool's process. It is also good practice to have a cron job on your main machine that deletes these Ralph sandboxes if the harness crashes or the agent takes too long, to save on compute and tokens.
You'll need to write a harness that spins up these sandboxes with a cloud-config that creates the initial prompt.md and any other supporting files, waits for the agent to finish executing, and then spins down the sandbox. It could also perform some additional steps, like creating a PR from the resulting code, or running a code review, or something else. This harness should be extremely specific to your work, due to the non-deterministic nature of LLMs. A generic harness may make the LLM go rogue more often, and thus, I leave the harness creation to you. Do tell me about your experience in the comments!
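As a starting point for the exit condition mentioned above, here is one possible sketch of a Ralph loop that stops when an iteration leaves the prompt file unchanged, with an iteration cap as a safety net. It is not a full harness. The `ralph_loop` name, the checksum comparison, and the default cap of 50 are my own assumptions, and the agent is passed in as a single command name (wrap your real coding agent and its flags in a small function or script).

```shell
# Feed PROMPT_FILE to AGENT_CMD repeatedly. Stop when an iteration leaves
# the prompt file unchanged, or after MAX_ITERS iterations as a safety cap.
# Usage: ralph_loop AGENT_CMD PROMPT_FILE [MAX_ITERS]
ralph_loop() {
  local agent_cmd="$1" prompt_file="$2" max_iters="${3:-50}" i=1 before after
  while [ "$i" -le "$max_iters" ]; do
    before=$(cksum < "$prompt_file")
    # the agent reads the prompt on stdin and may rewrite the file itself
    "$agent_cmd" < "$prompt_file"
    after=$(cksum < "$prompt_file")
    if [ "$before" = "$after" ]; then
      echo "stopped after $i iterations: prompt unchanged"
      return 0
    fi
    i=$((i + 1))
  done
  echo "stopped: hit the cap of $max_iters iterations" >&2
  return 1
}
```

The cap matters as much as the exit condition: an agent that keeps rewriting its prompt would otherwise run, and bill tokens and compute, indefinitely.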
FAQs
What happens to data stored on a sandbox Droplet when it is destroyed?
Because Droplets act as completely isolated virtual machine instances, destroying a Droplet permanently deletes its root filesystem and any local ephemeral storage. To preserve logs, source trees, or runtime artifacts, you must connect a persistent network attached component like a DigitalOcean Block Storage Volume or upload output artifacts directly to DigitalOcean Spaces object storage prior to executing the destruction command.
How does DigitalOcean’s per-second billing benefit automated AI workflows?
Traditional cloud VPS platforms bill on a strict hourly minimum window even if an instance only runs for a few minutes. DigitalOcean calculates compute usage down to the exact second. For autonomous AI testing workflows—where a harness might spin up an agent, run a suite of tests in two minutes, and instantly destroy the environment—you only pay for those exact 120 seconds of resource usage.
Can I attach a single DigitalOcean Volume to multiple sandbox Droplets simultaneously?
No. DigitalOcean only lets a volume be attached to one Droplet at a time. Volumes are raw block devices, usually formatted with a single-host filesystem such as ext4, which is not designed to coordinate concurrent writes from multiple independent operating systems, so simultaneous attachment would risk catastrophic filesystem corruption. For multi-agent configurations requiring shared access, run a Network File System (NFS) server on the Droplet that has the volume attached instead.
Why should I choose cloud-init over standard custom snapshots for my sandbox environments?
While custom machine images provide faster initial boot sequences, they require maintenance overhead, incur minor storage costs ($0.06/GB per month), and must be regenerated whenever global dependency targets shift. cloud-init allows you to manage sandbox setups entirely via infrastructure-as-code configurations, modifying system behaviors dynamically on the fly without changing baseline snapshots.
Conclusion
To combat the unpredictability of agentic AI systems, they should be run on disposable machines that can still preserve certain changes and outputs of those agents. Sandboxes meet this criterion quite well, not only for agentic applications but also in use cases where quick ephemeral compute is required. DigitalOcean provides raw compute for cheap, and using the techniques described in this article, you can use these resources as a sandboxing tool. Its products are cost-effective and provide more control than dedicated sandboxing services, but they may not be DX-optimized or the fastest to start up. You can reach a middle ground by running a self-hostable sandbox orchestrator on one or more Droplets that uses either containers or micro-VMs (Firecracker). Let me know if you want to see more articles about this topic!
I can't wait to see all the things you create with the ideas mentioned in this article! Please tell me about your experiences or queries, if you have any, in the comments below!
This article is part of the DigitalOcean Ripple Writers program. I received compensation and platform credits for writing this content, but all technical assessments, code, and opinions are my own based on hands-on testing.