Sergei Varibrus

Posted on Jun 4

Self-Hosting Your VPN: A Developer's Guide to WireGuard Automation

#vpn #privacy #cicd #programming

After paying $12/month for a commercial VPN service that I barely used, I decided to build something better. What started as a weekend project became a deep dive into infrastructure automation, SSH scripting, and the peculiarities of cloud provider APIs.

The Problem

Commercial VPN services charge monthly fees regardless of usage. I needed a VPN maybe 10 days per month while traveling, but paid for 30. The math was simple: VPS instances cost $3-6/month, while VPN subscriptions cost $10-15. But there was a catch: I didn't want to keep a server running 24/7 just for occasional use.

The Solution: On-Demand Infrastructure

I built Auto-VPN to deploy WireGuard servers on-demand across multiple cloud providers (Vultr, Linode, DigitalOcean). The system automatically provisions servers when needed and tears them down when inactive. This post covers the technical implementation details.

Architecture Overview

The system has four main layers:

Web Interface (Streamlit) 
    ↓
Application Layer (orchestration)
    ↓  
Provider Layer (cloud APIs + Pulumi)
    ↓
Infrastructure (VPS + WireGuard)

Challenge 1: Automating WireGuard Installation

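WireGuard itself has no provisioning API: setting up a server means installing packages, generating keys, and writing configuration files. Most tutorials assume you do this by hand over SSH, answering interactive prompts along the way. I needed to automate the entire installation process.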

The SSH Automation Problem

The Nyr/wireguard-install script is excellent but highly interactive. Here's what a typical session looks like:

wget https://git.io/wireguard -O wireguard-install.sh && bash wireguard-install.sh
# IPv4 address [1]: 
# Port [51820]: 
# Name [client]: 
# DNS server [1]:

I built a generic command-response system using paramiko that handles any interactive script:

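import re
import time

def execute_command_with_responses(self, command, responses,
                                   completion_indicator=None, timeout=600):
    shell = self.client.invoke_shell()
    shell.send(command + "\n")

    buffer = ""       # output not yet matched against a prompt
    full_output = ""  # everything received, kept for the completion check
    used_responses = set()
    deadline = time.monotonic() + timeout

    while time.monotonic() < deadline:
        if shell.recv_ready():
            recv = shell.recv(1024).decode("utf-8", errors="ignore")
            buffer += recv
            full_output += recv

            # Check for each prompt and send response
            for i, (prompt_pattern, response) in enumerate(responses):
                if i not in used_responses and re.search(
                    prompt_pattern, buffer, re.IGNORECASE | re.MULTILINE
                ):
                    shell.send(response)
                    used_responses.add(i)
                    buffer = ""  # Clear to prevent re-matching
                    break
        else:
            time.sleep(0.1)  # Avoid a busy loop while the script is working

        # Done once the indicator appears (or, without one, once the shell exits)
        if completion_indicator:
            if completion_indicator in full_output:
                return full_output
        elif shell.exit_status_ready():
            return full_output

    raise TimeoutError(f"{command!r} did not finish within {timeout}s")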

The response mapping for WireGuard installation:

responses = [
    (r"IPv4 address \[1\]:\s*$", "1\n"),      # Select first IP
    (r"Port \[51820\]:\s*$", "\n"),           # Default port  
    (r"Name \[client\]:\s*$", f"{client_name}\n"),  # Client name
    (r"DNS server \[1\]:\s*$", "3\n"),        # DNS option 3
    (r"Press any key to continue\.\.\.\s*$", "\n"),  # Continue
]
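Because the patterns are plain regexes, they can be checked offline before touching a VPS. The sketch below replays simulated output chunks through the same matching rules as the live loop and records which replies would be sent. `replay_session` is a hypothetical test helper, not part of the project.

```python
import re

def replay_session(chunks, responses):
    """Run simulated output chunks through the prompt matcher; return the replies sent."""
    buffer = ""
    used = set()
    sent = []
    for chunk in chunks:
        buffer += chunk  # output accumulates exactly as it does from recv()
        for i, (pattern, response) in enumerate(responses):
            if i not in used and re.search(pattern, buffer, re.IGNORECASE | re.MULTILINE):
                sent.append(response)
                used.add(i)
                buffer = ""  # same reset as the live loop
                break
    return sent
```

Feeding it `["Port [518", "20]: "]` with the port pattern produces a reply only after the second chunk, which is why the live loop accumulates output rather than matching each `recv()` result on its own.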

Real-World Complications

This took longer than expected. The regex patterns needed extensive testing because:

Timing issues: SSH output arrives in chunks
Encoding problems: Some VPS providers have locale issues
Package manager locks: Ubuntu's unattended-upgrades often blocks apt
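One way to harden the decoding step against the first two problems is sketched below, assuming output arrives as raw bytes from `recv()`. An incremental UTF-8 decoder carries a multi-byte character that straddles two chunks over to the next call instead of discarding it, and stripping ANSI control sequences keeps color codes out of the text the prompt regexes see. `OutputCleaner` is a hypothetical helper, not part of paramiko.

```python
import codecs
import re

# ECMA-48 CSI sequences, e.g. "\x1b[1;32m" (color) or "\x1b[2K" (erase line)
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

class OutputCleaner:
    """Turn raw SSH output chunks into plain text that prompt regexes can match."""

    def __init__(self) -> None:
        # The incremental decoder holds back a partial multi-byte character
        # until the rest of it arrives in the next chunk
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def feed(self, chunk: bytes) -> str:
        text = self._decoder.decode(chunk)
        # Note: an escape sequence split across two chunks is not stripped here
        text = ANSI_CSI.sub("", text)
        return text.replace("\r", "")  # PTYs send \r\n line endings
```

In the live loop, `buffer += cleaner.feed(shell.recv(1024))` would take the place of the per-chunk `.decode(...)` call.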

I added a package manager wait function that polls until apt is available:

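def wait_for_package_manager(self, timeout=300):
    wait_command = f"""
timeout_counter=0
while [ $timeout_counter -lt {timeout} ]; do
    if ! pgrep -x apt >/dev/null && ! pgrep -x apt-get >/dev/null; then
        if apt update -y >/dev/null 2>&1; then
            exit 0
        fi
    fi
    sleep 5
    timeout_counter=$((timeout_counter + 5))
done
exit 1
"""
    # Run the loop remotely: exit status 0 means apt is free, 1 means we timed out
    stdin, stdout, stderr = self.client.exec_command(wait_command)
    if stdout.channel.recv_exit_status() != 0:
        raise TimeoutError(f"apt still locked after {timeout}s")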

Challenge 2: Infrastructure as Code with Pulumi

Managing infrastructure across multiple providers required abstraction. I chose Pulumi over Terraform for its programmatic Python API.

Provider Abstraction

Each cloud provider has different resource names and parameters. I created a unified interface:

from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def pulumi_program(self):
        """Define the Pulumi program for the specific provider"""
        pass

    @abstractmethod  
    def set_stack_config(self):
        """Set provider-specific configurations"""
        pass

    @abstractmethod
    def required_plugins(self):
        """Define required Pulumi plugins"""
        pass
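To show how the application layer might pick an implementation at runtime, here is a minimal registry sketch. `register_provider`, `get_manager`, and the placeholder `VultrManager` are hypothetical names, and the placeholder returns plain values where a real manager would declare Pulumi resources and set stack configuration.

```python
from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def pulumi_program(self): ...

    @abstractmethod
    def set_stack_config(self): ...

    @abstractmethod
    def required_plugins(self): ...

# Hypothetical registry mapping a provider name to its manager class
PROVIDERS: dict[str, type] = {}

def register_provider(name: str):
    """Class decorator that records a manager under a provider name."""
    def decorator(cls):
        PROVIDERS[name] = cls
        return cls
    return decorator

def get_manager(name: str, **kwargs) -> InfrastructureManager:
    """Instantiate the manager for `name`, or fail with the list of known providers."""
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(PROVIDERS)}")
    return PROVIDERS[name](**kwargs)

@register_provider("vultr")
class VultrManager(InfrastructureManager):
    """Placeholder: a real manager would declare Pulumi resources here."""

    def __init__(self, region: str) -> None:
        self.region = region

    def pulumi_program(self):
        return None  # stand-in for the resource declarations

    def set_stack_config(self):
        return {"region": self.region}  # illustrative settings only

    def required_plugins(self):
        return [("vultr", "2.23.1")]  # matches the bundled archive version
```

The design keeps provider-specific details behind one interface, so adding Linode or DigitalOcean means registering another class rather than touching the orchestration code.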

Plugin Management Complexity

Pulumi plugins are platform-specific binaries. I pre-downloaded all combinations:

pulumi-resource-vultr-v2.23.1-darwin-amd64.tar.gz
pulumi-resource-vultr-v2.23.1-darwin-arm64.tar.gz
pulumi-resource-vultr-v2.23.1-linux-amd64.tar.gz
pulumi-resource-vultr-v2.23.1-linux-arm64.tar.gz

The system detects architecture and extracts the correct plugin:

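import platform

def get_system_arch(self):
    system = platform.system().lower()    # e.g. "linux", "darwin"
    machine = platform.machine().lower()  # e.g. "x86_64", "aarch64", "arm64"

    # Translate Python's names into the suffixes used by the plugin archives
    system_map = {"linux": "linux", "darwin": "darwin"}
    arch_map = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}
    return system_map.get(system, system), arch_map.get(machine, machine)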
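Picking and unpacking the matching archive could look like the sketch below. It assumes the bundled archives sit in one directory and follow the naming pattern listed above, and that the destination mirrors Pulumi's `resource-<name>-v<version>` plugin-cache folder naming; `plugin_archive_name` and `extract_plugin` are hypothetical helpers.

```python
import tarfile
from pathlib import Path

def plugin_archive_name(name: str, version: str, system: str, arch: str) -> str:
    """Archive file name, following the pattern of the bundled archives."""
    return f"pulumi-resource-{name}-v{version}-{system}-{arch}.tar.gz"

def extract_plugin(archive_dir: Path, cache_dir: Path, name: str, version: str,
                   system: str, arch: str) -> Path:
    """Unpack the archive for this OS/arch into <cache_dir>/resource-<name>-v<version>."""
    archive = archive_dir / plugin_archive_name(name, version, system, arch)
    if not archive.exists():
        raise FileNotFoundError(f"no bundled plugin for {system}-{arch}: {archive.name}")

    target = cache_dir / f"resource-{name}-v{version}"  # assumed cache layout
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target, filter="data")  # refuse absolute or ../ paths
        else:
            tar.extractall(target)  # older Pythons without extraction filters
    return target
```

Combined with `get_system_arch()`, a Linux ARM server would resolve to `pulumi-resource-vultr-v2.23.1-linux-arm64.tar.gz`.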

State Management

Pulumi state needed to be portable since servers are ephemeral. I serialize the entire stack state to the database:

from typing import Any

def export_stack_state(self) -> dict[str, Any]:
    export_result = self.stack.export_stack()
    return {
        "deployment": {
            "version": export_result.version,
            "deployment": export_result.deployment,
        },
        "config": self._read_stack_settings().get("config", {}),
        "project_name": self.project_name,
        "stack_name": self.stack_name,
    }
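Since the exported snapshot ends up in a `TEXT` column, it has to survive a JSON round trip. A minimal sketch of encoding and validating it, with hypothetical helper names and the key set taken from `export_stack_state` above:

```python
import json
from typing import Any

REQUIRED_KEYS = {"deployment", "config", "project_name", "stack_name"}

def _check_keys(state: Any) -> None:
    if not isinstance(state, dict):
        raise ValueError("stack state must be a JSON object")
    missing = REQUIRED_KEYS - set(state)
    if missing:
        raise ValueError(f"stack state is missing keys: {sorted(missing)}")

def serialize_stack_state(state: dict[str, Any]) -> str:
    """Encode the dict from export_stack_state() for the Server.stack_state column."""
    _check_keys(state)
    return json.dumps(state, sort_keys=True)

def deserialize_stack_state(text: str) -> dict[str, Any]:
    """Decode and sanity-check a stored snapshot before restoring it."""
    state = json.loads(text)
    _check_keys(state)
    if not {"version", "deployment"} <= set(state["deployment"]):
        raise ValueError("deployment block needs 'version' and 'deployment'")
    return state
```

On restore, the `deployment` block would be handed back to Pulumi through the stack's `import_stack` method in the Automation API.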

Challenge 3: Activity Monitoring and Cleanup

The system needs to detect inactive VPN connections and clean up servers automatically.

WireGuard Handshake Tracking

WireGuard exposes handshake timestamps via wg show:

wg show all latest-handshakes
# wg0    <peer-public-key>    1638360000    # Unix timestamp of last handshake
# wg0    <peer-public-key>    0             # Never connected

I parse this output to track peer activity:

from datetime import datetime, timezone

def get_latest_handshakes(self) -> dict[str, datetime | None]:
    command = "wg show all latest-handshakes"
    stdin, stdout, stderr = self.client.exec_command(command)
    output = stdout.read().decode("utf-8")

    handshakes = {}
    for line in output.strip().split("\n"):
        parts = line.split()
        if len(parts) != 3:
            continue  # Skip blank or unexpected lines
        _, peer_public_key, timestamp_str = parts

        timestamp = int(timestamp_str)
        if timestamp == 0:
            handshake_time = None  # Never connected
        else:
            # Timezone-aware UTC, so it compares cleanly with the
            # tz-aware threshold used in the cleanup logic
            handshake_time = datetime.fromtimestamp(timestamp, tz=timezone.utc)

        handshakes[peer_public_key] = handshake_time

    return handshakes
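Splitting the parsing out of the SSH call makes it unit-testable against captured `wg show` output. This standalone version, `parse_latest_handshakes` (a hypothetical name), applies the same rules and returns timezone-aware UTC datetimes:

```python
from datetime import datetime, timezone
from typing import Optional

def parse_latest_handshakes(output: str) -> dict[str, Optional[datetime]]:
    """Parse `wg show all latest-handshakes` output into {public_key: UTC time or None}."""
    handshakes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # blank or unexpected line
        _interface, public_key, ts = parts
        seconds = int(ts)
        # 0 means the peer has never completed a handshake
        handshakes[public_key] = (
            None if seconds == 0 else datetime.fromtimestamp(seconds, tz=timezone.utc)
        )
    return handshakes
```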

Cleanup Logic

The cleanup algorithm considers both handshake activity and peer creation time:

def _should_delete_server(self, peers, handshakes, activity_threshold_time) -> bool:
    if not peers:
        return True  # Delete servers with no peers

    for peer in peers:
        handshake_time = handshakes.get(peer.public_key)

        if handshake_time:
            # Has connected before - check last handshake
            if handshake_time >= activity_threshold_time:
                return False  # Recent activity
        else:
            # Never connected - check creation time  
            peer_creation_time = peer.created_at.replace(tzinfo=pytz.UTC)
            if peer_creation_time >= activity_threshold_time:
                return False  # Recently created

    return True  # All peers inactive
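The same decision rule can be exercised without a database. This sketch swaps `pytz` for the standard library's `timezone.utc`, models a peer as a small dataclass, and treats the idle window (30 minutes here) as an assumed parameter; the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Peer:
    public_key: str
    created_at: datetime  # timezone-aware UTC

def should_delete_server(
    peers: list[Peer],
    handshakes: dict[str, Optional[datetime]],
    now: datetime,
    idle_window: timedelta = timedelta(minutes=30),  # assumed threshold
) -> bool:
    """True when no peer has had a handshake, or been created, within the idle window."""
    threshold = now - idle_window
    if not peers:
        return True  # nothing could be using this server
    for peer in peers:
        last_handshake = handshakes.get(peer.public_key)
        # Peers that never connected are judged by their creation time instead
        reference = last_handshake if last_handshake is not None else peer.created_at
        if reference >= threshold:
            return False  # recent activity, or a freshly created peer
    return True
```

Keeping every timestamp timezone-aware matters here: Python raises `TypeError` when comparing naive and aware datetimes.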

Challenge 4: Database Persistence

I used Peewee ORM with support for both SQLite (development) and PostgreSQL (production):

from datetime import datetime

import pytz
from peewee import (CharField, DatabaseProxy, DateTimeField, FloatField,
                    ForeignKeyField, Model, TextField)

db = DatabaseProxy()  # Assumed setup: bound to SQLite or PostgreSQL at startup

class BaseModel(Model):
    class Meta:
        database = db

class Server(BaseModel):
    provider = CharField()
    project_name = CharField()
    ip_address = CharField(unique=True)
    ssh_private_key = TextField()  # Serialized RSA key
    stack_state = TextField()      # JSON-encoded Pulumi state
    location = CharField()
    server_type = CharField()
    price_per_month = FloatField(null=True)
    created_at = DateTimeField(default=lambda: datetime.now(pytz.UTC))

class VPNPeer(BaseModel):
    server = ForeignKeyField(Server, backref="peers", on_delete="CASCADE")
    peer_name = CharField()
    public_key = TextField()       # WireGuard public key
    wireguard_config = TextField() # Complete .conf file
    created_at = DateTimeField(default=lambda: datetime.now(pytz.UTC))
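One detail worth checking with the `on_delete="CASCADE"` foreign key: SQLite only enforces foreign keys, and therefore cascading deletes, when `PRAGMA foreign_keys = ON` is set on each connection, while PostgreSQL always enforces them. The sketch below uses the standard `sqlite3` module with a simplified version of the two tables to show the difference; in Peewee the pragma is typically passed as `SqliteDatabase(path, pragmas={"foreign_keys": 1})`.

```python
import sqlite3

def cascade_demo(enforce_fks: bool) -> int:
    """Delete a server and return how many of its peer rows survive."""
    conn = sqlite3.connect(":memory:")
    if enforce_fks:
        conn.execute("PRAGMA foreign_keys = ON")  # must run outside a transaction
    conn.execute("CREATE TABLE server (id INTEGER PRIMARY KEY, ip_address TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE vpnpeer (id INTEGER PRIMARY KEY, "
        "server_id INTEGER REFERENCES server(id) ON DELETE CASCADE, peer_name TEXT)"
    )
    conn.execute("INSERT INTO server (id, ip_address) VALUES (1, '203.0.113.10')")
    conn.execute("INSERT INTO vpnpeer (server_id, peer_name) VALUES (1, 'laptop')")
    conn.execute("DELETE FROM server WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM vpnpeer").fetchone()[0]
    conn.close()
    return remaining
```

Without the pragma, deleting a server in SQLite leaves its peer rows orphaned, even though the same schema cascades correctly on PostgreSQL.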

Real-World Usage and Costs

After 6 months of usage:

Average monthly cost: $0.87 (vs $12 for commercial VPN)
Usage pattern: 8-12 days/month while traveling
Server uptime: 2-4 hours average per session
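A quick check of what these figures imply, using the $0.87 versus $11.95 comparison from the end of the post. The hourly figure assumes one session per travel day, which the numbers above do not state explicitly; the helper names are illustrative.

```python
def savings(self_hosted: float, commercial: float) -> dict[str, float]:
    """Monthly and yearly savings, plus percentage saved, from two monthly prices."""
    monthly = commercial - self_hosted
    return {
        "monthly": monthly,
        "yearly": 12 * monthly,
        "percent": 100 * monthly / commercial,
    }

def implied_hourly_cost(monthly_cost: float, sessions: int, hours_per_session: float) -> float:
    """Effective cost per hour of VPN use, assuming `sessions` sessions per month."""
    return monthly_cost / (sessions * hours_per_session)
```

`savings(0.87, 11.95)` works out to about $11.08 a month, roughly $133 a year, or about 92.7% less. Spread over 16 to 48 hours of use (8 days at 2 hours up to 12 days at 4 hours), the effective cost is roughly $0.02 to $0.05 per hour of VPN use.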

The cost savings are real, but this isn't for everyone.

Limitations and Honest Drawbacks

Not suitable if you need:

Zero-configuration setup
Mobile app integration

Technical complexity:

Requires cloud provider API keys
Database setup (PostgreSQL recommended for production)
Understanding of infrastructure concepts

Implementation Lessons

SSH automation is harder than it looks. Interactive scripts need extensive testing across different environments.
Cloud provider APIs are inconsistent. What works for Vultr may not work for DigitalOcean. Abstract early.
State management matters. Pulumi state corruption can leave orphaned resources. Always back up state.
Error handling is critical. Failed deployments can leave resources running indefinitely.
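For the last point, one common pattern is to tear the stack down whenever any step after provisioning starts fails. A minimal sketch with hypothetical callables; with Pulumi's Automation API, `destroy` would typically wrap the stack's `destroy()` call.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def deploy_or_rollback(
    deploy: Callable[[], T],
    configure: Callable[[T], None],
    destroy: Callable[[], None],
) -> T:
    """Provision, then configure; if either step fails, tear everything down and re-raise."""
    try:
        server = deploy()   # may fail halfway, leaving some resources behind
        configure(server)   # e.g. SSH in and run the WireGuard installer
        return server
    except Exception:
        destroy()           # remove whatever exists, then surface the original error
        raise
```

Because `destroy` takes no arguments, it can clean up even when `deploy` fails before returning a handle to the server.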

The Code

The complete implementation is available at g1ibby/auto-vpn. It's not perfect, but it works and saves me money.

Conclusion

Building Auto-VPN taught me more about infrastructure automation than any tutorial. The 90/10 rule applies heavily here: the core functionality took a weekend, but handling edge cases and real-world deployment issues took months.

If you use a VPN occasionally and enjoy technical challenges, this approach can save significant money. If you just want a VPN that works, stick with commercial services.

The sweet spot is developers or technical users who travel occasionally and want full control over their VPN infrastructure without ongoing monthly costs.

Cost comparison based on personal usage: Vultr $0.87/month vs NordVPN $11.95/month. Your mileage may vary.

DEV Community