After paying $12/month for a commercial VPN service that I barely used, I decided to build something better. What started as a weekend project became a deep dive into infrastructure automation, SSH scripting, and the peculiarities of cloud provider APIs.
The Problem
Commercial VPN services charge monthly fees regardless of usage. I needed a VPN maybe 10 days per month while traveling, but paid for 30. The math was simple: VPS instances cost $3-6/month, while VPN subscriptions cost $10-15. But there was a catch: I didn't want to keep a server running 24/7 just for occasional use.
The Solution: On-Demand Infrastructure
I built Auto-VPN to deploy WireGuard servers on-demand across multiple cloud providers (Vultr, Linode, DigitalOcean). The system automatically provisions servers when needed and tears them down when inactive. This post covers the technical implementation details.
Architecture Overview
The system has four main layers:
```
Web Interface (Streamlit)
        ↓
Application Layer (orchestration)
        ↓
Provider Layer (cloud APIs + Pulumi)
        ↓
Infrastructure (VPS + WireGuard)
```
Challenge 1: Automating WireGuard Installation
WireGuard doesn't have a clean API for remote installation. Most tutorials assume a manual SSH session with someone answering interactive prompts, and I needed to automate the entire process.
The SSH Automation Problem
The Nyr/wireguard-install script is excellent but highly interactive. Here's what a typical session looks like:
```bash
wget https://git.io/wireguard -O wireguard-install.sh && bash wireguard-install.sh
# IPv4 address [1]:
# Port [51820]:
# Name [client]:
# DNS server [1]:
```
I built a generic command-response system using paramiko that handles any interactive script:
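```python
import re
import time

def execute_command_with_responses(self, command, responses,
                                   completion_indicator=None, timeout=600):
    """Run an interactive command, answering each (regex, reply) prompt once."""
    shell = self.client.invoke_shell()
    shell.send(command + "\n")
    buffer = ""  # Output since the last answered prompt
    output = ""  # Everything received, used for the completion check
    used_responses = set()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not shell.recv_ready():
            if shell.exit_status_ready():
                return output  # Shell exited, e.g. the command ended with "; exit"
            time.sleep(0.1)  # Avoid a busy loop while the script works
            continue
        recv = shell.recv(1024).decode("utf-8", errors="ignore")
        buffer += recv
        output += recv
        # Check for each prompt and send response
        for i, (prompt_pattern, response) in enumerate(responses):
            if i not in used_responses and re.search(
                prompt_pattern, buffer, re.IGNORECASE | re.MULTILINE
            ):
                shell.send(response)
                used_responses.add(i)
                buffer = ""  # Clear to prevent re-matching
                break
        if completion_indicator and completion_indicator in output:
            return output
    raise TimeoutError(f"No completion after {timeout}s: {command}")
```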
The response mapping for WireGuard installation:
```python
responses = [
    (r"IPv4 address \[1\]:\s*$", "1\n"),  # Select first IP
    (r"Port \[51820\]:\s*$", "\n"),  # Default port
    (r"Name \[client\]:\s*$", f"{client_name}\n"),  # Client name
    (r"DNS server \[1\]:\s*$", "3\n"),  # DNS option 3
    (r"Press any key to continue\.\.\.\s*$", "\n"),  # Continue
]
```
Real-World Complications
This took longer than expected. The regex patterns needed extensive testing because:
- Timing issues: SSH output arrives in chunks
- Encoding problems: Some VPS providers have locale issues
- Package manager locks: Ubuntu's unattended-upgrades often blocks apt
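One way to test a prompt table without a live server is to replay a recorded session, split into arbitrary chunks, through the same matching logic. The sketch below is a stdlib-only restatement of the loop above, not the project's code; `replay_prompts` and the sample transcript are hypothetical.

```python
import re

def replay_prompts(chunks, responses):
    """Feed recorded output chunks through prompt matching; return replies sent.

    Mirrors the buffering of the SSH loop without a socket, so the regex
    table can be checked against output that arrives split mid-prompt.
    """
    buffer = ""
    used = set()
    sent = []
    for chunk in chunks:
        buffer += chunk
        for i, (pattern, response) in enumerate(responses):
            if i not in used and re.search(pattern, buffer, re.IGNORECASE | re.MULTILINE):
                sent.append(response)
                used.add(i)
                buffer = ""  # Clear so the same prompt is not matched twice
                break
    return sent

# Hypothetical transcript, split mid-prompt the way SSH reads often arrive
chunks = [
    "Which IPv4 address should be used?\nIPv4 addr",
    "ess [1]: ",
    "What port should WireGuard listen on?\nPort [518",
    "20]: ",
]
responses = [
    (r"IPv4 address \[1\]:\s*$", "1\n"),
    (r"Port \[51820\]:\s*$", "\n"),
]
print(replay_prompts(chunks, responses))  # ['1\n', '\n']
```

A replay like this also makes an ordering assumption visible: because the buffer is cleared after each match, the approach relies on the install script waiting for each answer before printing its next prompt.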
I added a package manager wait function that polls until apt is available:
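```python
def wait_for_package_manager(self, timeout=300):
    """Poll until apt is free; return True on success, False on timeout."""
    wait_command = f"""
timeout_counter=0
while [ $timeout_counter -lt {timeout} ]; do
    if ! pgrep -x apt >/dev/null && ! pgrep -x apt-get >/dev/null; then
        if apt update -y >/dev/null 2>&1; then
            exit 0
        fi
    fi
    sleep 5
    timeout_counter=$((timeout_counter + 5))
done
exit 1  # Timed out: without this the loop's exit status would be 0
"""
    stdin, stdout, stderr = self.client.exec_command(wait_command)
    return stdout.channel.recv_exit_status() == 0
```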
Challenge 2: Infrastructure as Code with Pulumi
Managing infrastructure across multiple providers required abstraction. I chose Pulumi over Terraform for its programmatic Python API.
Provider Abstraction
Each cloud provider has different resource names and parameters. I created a unified interface:
```python
from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def pulumi_program(self):
        """Define the Pulumi program for the specific provider"""

    @abstractmethod
    def set_stack_config(self):
        """Set provider-specific configurations"""

    @abstractmethod
    def required_plugins(self):
        """Define required Pulumi plugins"""
```
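With that interface in place, the application layer only needs a way to pick a concrete manager by provider name. A minimal sketch assuming a plain dictionary registry; `VultrManager`, `PROVIDERS` and `get_manager` are hypothetical stand-ins rather than Auto-VPN's classes, and the interface is redeclared so the snippet runs on its own.

```python
from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def pulumi_program(self): ...

    @abstractmethod
    def set_stack_config(self): ...

    @abstractmethod
    def required_plugins(self): ...

class VultrManager(InfrastructureManager):
    """Hypothetical provider; real method bodies would call Pulumi."""

    def pulumi_program(self):
        pass  # Declare the VPS resources for this provider here

    def set_stack_config(self):
        pass  # Write API key, region and plan into the stack config here

    def required_plugins(self):
        return [("vultr", "2.23.1")]  # Matches the pre-downloaded plugin version

# Registry: adding a provider means adding one entry here
PROVIDERS: dict[str, type[InfrastructureManager]] = {"vultr": VultrManager}

def get_manager(provider: str) -> InfrastructureManager:
    try:
        return PROVIDERS[provider.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider}") from None
```

Because every subclass must implement all three abstract methods, a half-finished provider fails at instantiation instead of midway through a deployment.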
Plugin Management Complexity
Pulumi plugins are platform-specific binaries. I pre-downloaded all combinations:
```
pulumi-resource-vultr-v2.23.1-darwin-amd64.tar.gz
pulumi-resource-vultr-v2.23.1-darwin-arm64.tar.gz
pulumi-resource-vultr-v2.23.1-linux-amd64.tar.gz
pulumi-resource-vultr-v2.23.1-linux-arm64.tar.gz
```
The system detects architecture and extracts the correct plugin:
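```python
import platform

def get_system_arch(self):
    # Map Python's platform names onto the suffixes used in the plugin archives
    system_map = {"darwin": "darwin", "linux": "linux"}
    arch_map = {"x86_64": "amd64", "aarch64": "arm64", "arm64": "arm64"}
    system = platform.system().lower()
    machine = platform.machine().lower()
    return system_map.get(system, system), arch_map.get(machine, machine)
```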
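Combining the detected platform with the archive naming pattern above, choosing and unpacking the right tarball might look like the following. This is a sketch: `detect_platform`, `plugin_archive_name`, `extract_plugin` and the destination directory are assumptions, so check Pulumi's documentation for where your version expects plugins to live.

```python
import platform
import tarfile
from pathlib import Path

# Map platform.machine() values onto the arch suffixes in the archive names
ARCH_MAP = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}

def detect_platform() -> tuple[str, str]:
    system = platform.system().lower()  # "linux" or "darwin" for bundled archives
    machine = platform.machine().lower()
    return system, ARCH_MAP.get(machine, machine)

def plugin_archive_name(plugin: str, version: str, system: str, arch: str) -> str:
    return f"pulumi-resource-{plugin}-v{version}-{system}-{arch}.tar.gz"

def extract_plugin(archive_dir: Path, dest: Path, plugin: str, version: str) -> Path:
    """Unpack the pre-downloaded archive matching this machine into dest."""
    system, arch = detect_platform()
    archive = archive_dir / plugin_archive_name(plugin, version, system, arch)
    if not archive.is_file():
        raise FileNotFoundError(f"No bundled plugin for {system}-{arch}: {archive.name}")
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)  # Archives are our own downloads, so treated as trusted
    return dest
```

Failing loudly when no archive matches surfaces an unsupported platform before Pulumi tries to download the plugin itself.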
State Management
Pulumi state needed to be portable since servers are ephemeral. I serialize the entire stack state to the database:
```python
from typing import Any

def export_stack_state(self) -> dict[str, Any]:
    # Stack.export_stack() returns a Deployment (version + raw deployment JSON)
    export_result = self.stack.export_stack()
    return {
        "deployment": {
            "version": export_result.version,
            "deployment": export_result.deployment,
        },
        "config": self._read_stack_settings().get("config", {}),
        "project_name": self.project_name,
        "stack_name": self.stack_name,
    }
```
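Since this dict ends up in the database as JSON, the reverse path is a JSON round-trip plus a way back into Pulumi. A sketch assuming the Automation API's `Deployment` class and `Stack.import_stack` from `pulumi.automation`; the helper names and `REQUIRED_KEYS` are hypothetical, and Pulumi is only imported inside the restore function.

```python
import json
from typing import Any

REQUIRED_KEYS = {"deployment", "config", "project_name", "stack_name"}

def dump_stack_state(state: dict[str, Any]) -> str:
    """Serialize exported state for a text column."""
    return json.dumps(state)

def load_stack_state(text: str) -> dict[str, Any]:
    """Parse stored state and fail early if it looks truncated or corrupted."""
    state = json.loads(text)
    missing = REQUIRED_KEYS - set(state)
    if missing:
        raise ValueError(f"Stored stack state is missing keys: {sorted(missing)}")
    return state

def restore_stack(stack, state: dict[str, Any]) -> None:
    """Push saved state back into a Pulumi stack before updating or destroying it."""
    from pulumi import automation as auto  # Lazy import: requires the pulumi package

    dep = state["deployment"]
    stack.import_stack(auto.Deployment(version=dep["version"], deployment=dep["deployment"]))
```

Validating on load matters because a destroy run against an incomplete state is exactly how orphaned servers appear.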
Challenge 3: Activity Monitoring and Cleanup
The system needs to detect inactive VPN connections and clean up servers automatically.
WireGuard Handshake Tracking
WireGuard exposes handshake timestamps via wg show:
```bash
wg show all latest-handshakes
# wg0 pub_key_hash 1638360000 # Unix timestamp
# wg0 another_key 0 # Never connected
```
I parse this output to track peer activity:
```python
from datetime import datetime, timezone

def get_latest_handshakes(self) -> dict[str, datetime | None]:
    command = "wg show all latest-handshakes"
    stdin, stdout, stderr = self.client.exec_command(command)
    output = stdout.read().decode("utf-8")
    handshakes = {}
    for line in output.strip().split("\n"):
        # Each line: <interface> <peer public key> <unix timestamp>
        parts = line.split()
        if len(parts) != 3:
            continue
        _, peer_public_key, timestamp_str = parts
        timestamp = int(timestamp_str)
        if timestamp == 0:
            handshake_time = None  # Never connected
        else:
            # Aware UTC datetime, so it compares safely with aware thresholds
            # (utcfromtimestamp returns a naive value and is deprecated in 3.12)
            handshake_time = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        handshakes[peer_public_key] = handshake_time
    return handshakes
```
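Because the parsing doesn't depend on SSH, it can be split into a pure function and unit-tested against captured wg output. A sketch under that assumption; `parse_latest_handshakes` is a hypothetical name and the keys in the sample are placeholders.

```python
from datetime import datetime, timezone
from typing import Optional

def parse_latest_handshakes(output: str) -> dict[str, Optional[datetime]]:
    """Parse `wg show all latest-handshakes` output: interface, key, timestamp."""
    handshakes: dict[str, Optional[datetime]] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # Skip blank or unexpected lines
        _, public_key, ts = parts
        seconds = int(ts)
        # 0 means the peer has never completed a handshake
        handshakes[public_key] = (
            None if seconds == 0 else datetime.fromtimestamp(seconds, tz=timezone.utc)
        )
    return handshakes

sample = "wg0\tkeyA=\t1638360000\nwg0\tkeyB=\t0\n"
print(parse_latest_handshakes(sample)["keyA="])  # 2021-12-01 12:00:00+00:00
```

The SSH method then shrinks to running the command and passing stdout to this function.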
Cleanup Logic
The cleanup algorithm considers both handshake activity and peer creation time:
```python
import pytz

def _should_delete_server(self, peers, handshakes, activity_threshold_time) -> bool:
    # activity_threshold_time must be timezone-aware (UTC) to compare with these values
    if not peers:
        return True  # Delete servers with no peers
    for peer in peers:
        handshake_time = handshakes.get(peer.public_key)
        if handshake_time:
            # Has connected before - check last handshake
            if handshake_time >= activity_threshold_time:
                return False  # Recent activity
        else:
            # Never connected - check creation time
            peer_creation_time = peer.created_at.replace(tzinfo=pytz.UTC)
            if peer_creation_time >= activity_threshold_time:
                return False  # Recently created
    return True  # All peers inactive
```
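To exercise that rule without a database, peers can be stand-in objects with the same two attributes. The function below is a condensed restatement of the logic above (last handshake if the peer ever connected, otherwise creation time). It assumes a hypothetical one-hour idle window, since the post doesn't state the real threshold; `Peer` and `IDLE_WINDOW` are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

IDLE_WINDOW = timedelta(hours=1)  # Hypothetical; tune to taste

@dataclass
class Peer:  # Stand-in for the VPNPeer model
    public_key: str
    created_at: datetime  # Aware UTC

def should_delete_server(peers, handshakes, activity_threshold_time) -> bool:
    if not peers:
        return True  # Nothing to serve
    for peer in peers:
        handshake_time = handshakes.get(peer.public_key)
        # Use the last handshake if the peer ever connected, else its creation time
        last_seen = handshake_time if handshake_time else peer.created_at
        if last_seen >= activity_threshold_time:
            return False  # At least one peer is recent
    return True  # Every peer is idle

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
threshold = now - IDLE_WINDOW
```

Using creation time as the fallback gives a freshly created peer a grace period to connect before its server is reclaimed.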
Challenge 4: Database Persistence
I used Peewee ORM with support for both SQLite (development) and PostgreSQL (production):
```python
from datetime import datetime

import pytz
from peewee import (CharField, DatabaseProxy, DateTimeField, FloatField,
                    ForeignKeyField, Model, TextField)

# Assumed base model: a proxy lets the same models bind to SQLite or PostgreSQL
db = DatabaseProxy()

class BaseModel(Model):
    class Meta:
        database = db

class Server(BaseModel):
    provider = CharField()
    project_name = CharField()
    ip_address = CharField(unique=True)
    ssh_private_key = TextField()  # Serialized RSA key
    stack_state = TextField()  # JSON-encoded Pulumi state
    location = CharField()
    server_type = CharField()
    price_per_month = FloatField(null=True)
    created_at = DateTimeField(default=lambda: datetime.now(pytz.UTC))

class VPNPeer(BaseModel):
    server = ForeignKeyField(Server, backref="peers", on_delete="CASCADE")
    peer_name = CharField()
    public_key = TextField()  # WireGuard public key
    wireguard_config = TextField()  # Complete .conf file
    created_at = DateTimeField(default=lambda: datetime.now(pytz.UTC))
```
Real-World Usage and Costs
After 6 months of usage:
- Average monthly cost: $0.87 (vs $12 for commercial VPN)
- Usage pattern: 8-12 days/month while traveling
- Server uptime: 2-4 hours average per session
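For a rough sense of scale, the figures above can be turned into annual numbers. This is back-of-the-envelope arithmetic using the post's own averages and the $11.95/month NordVPN price quoted at the end; it assumes one session per travel day, and actual bills depend on provider pricing.

```python
def annual_savings(subscription_per_month: float, self_hosted_per_month: float) -> float:
    """Yearly difference between a flat subscription and on-demand servers."""
    return round((subscription_per_month - self_hosted_per_month) * 12, 2)

# 8-12 days/month at 2-4 hours/session, assuming one session per day
low_hours, high_hours = 8 * 2, 12 * 4

print(annual_savings(11.95, 0.87))  # 132.96
print(low_hours, high_hours)        # 16 48
```

In other words, the servers run roughly 16-48 hours a month out of about 730, which is where hourly billing beats a flat subscription.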
The cost savings are real, but this isn't for everyone.
Limitations and Honest Drawbacks
Not suitable if you need:
- Zero-configuration setup
- Mobile app integration
Technical complexity:
- Requires cloud provider API keys
- Database setup (PostgreSQL recommended for production)
- Understanding of infrastructure concepts
Implementation Lessons
SSH automation is harder than it looks. Interactive scripts need extensive testing across different environments.
Cloud provider APIs are inconsistent. What works for Vultr may not work for DigitalOcean. Abstract early.
State management matters. Pulumi state corruption can leave orphaned resources. Always back up state.
Error handling is critical. Failed deployments can leave resources running indefinitely.
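A common guard is to tear down whatever was created whenever a later step fails. A minimal sketch of that pattern, with hypothetical `provision`, `configure` and `destroy` callables standing in for the Pulumi and SSH steps; it is not Auto-VPN's actual error handling.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def deploy_or_rollback(
    provision: Callable[[], T],
    configure: Callable[[T], None],
    destroy: Callable[[T], None],
) -> T:
    """Provision a server and configure it; destroy it if configuration fails."""
    server = provision()  # If this raises, nothing exists yet to clean up
    try:
        configure(server)
    except Exception:
        destroy(server)  # Don't leave a billable VPS running after a failure
        raise  # Re-raise so the caller still sees the original error
    return server
```

Re-raising after cleanup keeps the failure visible to the caller while guaranteeing that a half-configured server doesn't keep billing.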
The Code
The complete implementation is available at g1ibby/auto-vpn. It's not perfect, but it works and saves me money.
Conclusion
Building Auto-VPN taught me more about infrastructure automation than any tutorial. The 90/10 rule applies heavily here: the core functionality took a weekend, but handling edge cases and real-world deployment issues took months.
If you use a VPN occasionally and enjoy technical challenges, this approach can save significant money. If you just want a VPN that works, stick with commercial services.
The sweet spot is developers or technical users who travel occasionally and want full control over their VPN infrastructure without ongoing monthly costs.
Cost comparison based on personal usage: Vultr $0.87/month vs NordVPN $11.95/month. Your mileage may vary.