<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rodolfo Albuquerque</title>
    <description>The latest articles on DEV Community by Rodolfo Albuquerque (@rdalbuquerque).</description>
    <link>https://dev.to/rdalbuquerque</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1874160%2Fa38683cd-87ac-4bcb-aa06-6eadf3fe35ba.jpeg</url>
      <title>DEV Community: Rodolfo Albuquerque</title>
      <link>https://dev.to/rdalbuquerque</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rdalbuquerque"/>
    <language>en</language>
    <item>
      <title>Building a Cost-Effective Valheim Server on Azure with Serverless Discord Bot Integration</title>
      <dc:creator>Rodolfo Albuquerque</dc:creator>
      <pubDate>Sat, 23 Nov 2024 17:58:04 +0000</pubDate>
      <link>https://dev.to/rdalbuquerque/building-a-cost-effective-valheim-server-on-azure-with-serverless-discord-bot-integration-2jco</link>
      <guid>https://dev.to/rdalbuquerque/building-a-cost-effective-valheim-server-on-azure-with-serverless-discord-bot-integration-2jco</guid>
      <description>&lt;p&gt;In this blog post, I'll walk you through how I built a cost-effective Valheim game server on Azure, complete with a Discord bot that lets players start and stop the server using slash commands. The setup leverages Azure's serverless capabilities and spot instances to minimize costs while providing flexibility and scalability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/rdalbuquerque/valheim-server" rel="noopener noreferrer"&gt;This GitHub repository&lt;/a&gt; contains the &lt;a href="https://github.com/rdalbuquerque/valheim-server/tree/main/infra" rel="noopener noreferrer"&gt;infrastructure code&lt;/a&gt; and &lt;a href="https://github.com/rdalbuquerque/valheim-server/tree/main/discordbot" rel="noopener noreferrer"&gt;Discord bot code&lt;/a&gt; for the Valheim game server. The primary goal is to create a server as cost-efficiently as possible by utilizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Virtual Machine Scale Sets (VMSS) with spot instances for compute.&lt;/li&gt;
&lt;li&gt;Azure File Share for persistent game server data.&lt;/li&gt;
&lt;li&gt;Azure Functions for event-driven automation through Discord slash commands.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Azure Functions were chosen for their cost-effectiveness: the Consumption plan includes 1 million free executions per month. However, this choice introduces some complexities, which I'll discuss later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Interactions and Reactions
&lt;/h3&gt;

&lt;p&gt;The system's interaction flow starts with Discord slash commands, which are handled by an HTTP-triggered Azure Function (&lt;code&gt;interactions&lt;/code&gt;). Discord requires an initial response within 3 seconds, so this function's only responsibility is to enqueue the command in an events queue and respond immediately.&lt;/p&gt;

&lt;p&gt;A second, queue-triggered Azure Function (&lt;code&gt;reactions&lt;/code&gt;) picks up the command, performs the requested task (e.g., starting the server), and reports back to Discord.&lt;/p&gt;

&lt;p&gt;Here’s the sequence diagram for the start command:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ote2uc3rjw2uuekfqsv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ote2uc3rjw2uuekfqsv.png" alt="sequence diagram /start command" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;
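
&lt;p&gt;The acknowledge-then-enqueue pattern can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the handler name, the message shape, and the in-memory &lt;code&gt;events_queue&lt;/code&gt; (standing in for the real events queue) are assumptions, and the request signature verification that Discord requires is left out for brevity. The numeric type codes are the ones documented by Discord.&lt;/p&gt;

```python
import json
from collections import deque

# Discord interaction and response type codes (from Discord's API docs).
PING = 1
APPLICATION_COMMAND = 2
PONG = 1
DEFERRED_CHANNEL_MESSAGE = 5

# Hypothetical stand-in for the events queue used by the real project.
events_queue = deque()


def handle_interaction(body: str) -> dict:
    """Acknowledge the interaction right away and leave the work to a worker."""
    interaction = json.loads(body)
    if interaction["type"] == PING:
        # Discord pings the endpoint to check that it is alive.
        return {"type": PONG}
    if interaction["type"] == APPLICATION_COMMAND:
        # Enqueue only what the worker needs to act and to report back later.
        events_queue.append(json.dumps({
            "command": interaction["data"]["name"],
            "token": interaction["token"],
        }))
        # A deferred response tells Discord that the real answer comes later.
        return {"type": DEFERRED_CHANNEL_MESSAGE}
    raise ValueError("unsupported interaction type")
```

&lt;p&gt;Because the handler does nothing slower than a queue write, its response time is dominated by the function's cold start rather than by the command itself.&lt;/p&gt;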

&lt;h2&gt;
  
  
  Game Events
&lt;/h2&gt;

&lt;p&gt;To enhance the experience, the solution monitors Valheim game server logs and reports events such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server availability for connections&lt;/li&gt;
&lt;li&gt;Player connections&lt;/li&gt;
&lt;li&gt;Player disconnections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is achieved with a &lt;a href="https://github.com/rdalbuquerque/valheim-server/blob/main/infra/cloud-init.yml" rel="noopener noreferrer"&gt;script&lt;/a&gt;, installed through &lt;code&gt;cloud-init.yml&lt;/code&gt;, that runs on the VM. The script follows the container logs, extracts the relevant log lines, and enqueues them in the events queue.&lt;/p&gt;

&lt;p&gt;Here's how the flow looks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkud7em6q2v33c9ev8wg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkud7em6q2v33c9ev8wg.png" alt="flowchart game events" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;
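
&lt;p&gt;As a rough illustration of the extraction step, the sketch below maps log lines to event names. The log patterns and event names are assumptions made for this example and should be checked against the server's actual output. The linked script may work differently.&lt;/p&gt;

```python
import re

# Assumed log patterns; verify them against your server's real log output.
PATTERNS = [
    ("server_ready", re.compile(r"Game server connected")),
    ("player_connected", re.compile(r"Got connection SteamID (\d+)")),
    ("player_disconnected", re.compile(r"Closing socket (\d+)")),
]


def classify(line: str):
    """Return an event dict for a relevant log line, or None otherwise."""
    for name, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            event = {"event": name}
            if match.groups():
                # Patterns with a capture group carry the player's ID.
                event["steam_id"] = match.group(1)
            return event
    return None
```

&lt;p&gt;Each non-empty result would then be serialized and enqueued, so the &lt;code&gt;reactions&lt;/code&gt; function can treat game events and slash commands the same way.&lt;/p&gt;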

&lt;h2&gt;
  
  
  Persisting State
&lt;/h2&gt;

&lt;p&gt;To maintain the server state, I chose Azure Table Storage for its simplicity and cost-efficiency. The following attributes are persisted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ip&lt;/code&gt; (server IP address)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;online_players&lt;/code&gt; (number of players currently online)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;status&lt;/code&gt; (e.g., running, stopped)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because Azure Functions can execute in parallel, I implemented optimistic concurrency control using &lt;code&gt;ETags&lt;/code&gt;. If multiple events are processed simultaneously, only the first write succeeds; the others fail and are retried. Retries rely on the queue's built-in message dequeue count (5 attempts max), combined with a visibility timeout that allows time for state reconciliation.&lt;/p&gt;
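
&lt;p&gt;The idea behind the ETag check can be sketched with a small in-memory stand-in for the table. Everything here is hypothetical (the class, the exception, and the function name), and the retry loop only imitates what queue redelivery does in the real setup.&lt;/p&gt;

```python
import itertools


class ETagMismatch(Exception):
    """Raised when the caller's ETag no longer matches the stored entity."""


class InMemoryTable:
    """Tiny stand-in for a single table entity guarded by an ETag."""

    def __init__(self, state: dict):
        self._versions = itertools.count(1)
        self._state = dict(state)
        self._etag = str(next(self._versions))

    def read(self):
        # Return a copy of the state together with its current ETag.
        return dict(self._state), self._etag

    def update(self, new_state: dict, etag: str) -> None:
        # Conditional write: succeeds only if nobody wrote since our read.
        if etag != self._etag:
            raise ETagMismatch(etag)
        self._state = dict(new_state)
        self._etag = str(next(self._versions))


def player_joined(table: InMemoryTable, max_attempts: int = 5) -> int:
    """Increment online_players, re-reading and retrying on conflicts."""
    for _ in range(max_attempts):
        state, etag = table.read()
        state["online_players"] += 1
        try:
            table.update(state, etag)
            return state["online_players"]
        except ETagMismatch:
            continue
    raise RuntimeError("gave up after repeated conflicts")
```

&lt;p&gt;A writer holding a stale ETag is rejected instead of silently overwriting a newer value, which is what keeps a counter such as &lt;code&gt;online_players&lt;/code&gt; consistent under parallel executions.&lt;/p&gt;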

&lt;h2&gt;
  
  
  Azure Function OS and Language Choice
&lt;/h2&gt;

&lt;p&gt;One of the biggest challenges was ensuring the bot responded within Discord's 3-second timeout, even during cold starts.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initial Setup&lt;/strong&gt;: I started with a Python bot on a Linux Azure Function. However, cold starts frequently caused timeouts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch to Go&lt;/strong&gt;: I migrated to Go, known for its fast performance. Surprisingly, deploying the Go bot on a Windows Function App yielded significantly better cold start times than on Linux.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To quantify this, I tested the following setups using a Postman monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python on Linux&lt;/li&gt;
&lt;li&gt;Go on Linux&lt;/li&gt;
&lt;li&gt;Go on Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are the results:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Average (s)&lt;/th&gt;
&lt;th&gt;P95 (s)&lt;/th&gt;
&lt;th&gt;P99 (s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python (Linux)&lt;/td&gt;
&lt;td&gt;10.43&lt;/td&gt;
&lt;td&gt;12.02&lt;/td&gt;
&lt;td&gt;13.35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go (Linux)&lt;/td&gt;
&lt;td&gt;5.84&lt;/td&gt;
&lt;td&gt;7.77&lt;/td&gt;
&lt;td&gt;7.64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go (Windows)&lt;/td&gt;
&lt;td&gt;1.07&lt;/td&gt;
&lt;td&gt;1.70&lt;/td&gt;
&lt;td&gt;1.85&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In this trial, Go on Windows was the fastest setup and the only one whose average, P95, and P99 all stayed under Discord's 3-second limit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Possible Improvements
&lt;/h2&gt;

&lt;p&gt;While the setup works well, there’s room for improvement:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spot Instance Risks&lt;/strong&gt;: Spot instances can be preempted, which risks losing progress if the game server isn’t stopped gracefully. One solution is to monitor Azure's Scheduled Events endpoint. When a preemption event is detected, the system can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop the Valheim server (&lt;code&gt;docker stop valheim-server&lt;/code&gt;) to send a &lt;code&gt;SIGTERM&lt;/code&gt;, triggering a world save.&lt;/li&gt;
&lt;li&gt;Restart the server in a different zone or instance size.&lt;/li&gt;
&lt;/ul&gt;
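
&lt;p&gt;A minimal sketch of the detection step, assuming the JSON document shape returned by Azure's Scheduled Events endpoint (an &lt;code&gt;Events&lt;/code&gt; list whose entries carry &lt;code&gt;EventId&lt;/code&gt; and &lt;code&gt;EventType&lt;/code&gt;). The function name is hypothetical, and the polling loop and the actual shutdown are deliberately left out.&lt;/p&gt;

```python
import json

# Azure Instance Metadata Service endpoint for Scheduled Events;
# a request must carry the header "Metadata: true".
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def preempt_events(document: str) -> list:
    """Return the IDs of pending Preempt events in a Scheduled Events document."""
    events = json.loads(document).get("Events", [])
    return [e["EventId"] for e in events if e.get("EventType") == "Preempt"]
```

&lt;p&gt;A watcher on the VM could fetch &lt;code&gt;SCHEDULED_EVENTS_URL&lt;/code&gt; every few seconds and run &lt;code&gt;docker stop valheim-server&lt;/code&gt; as soon as this function returns a non-empty list.&lt;/p&gt;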

&lt;p&gt;&lt;strong&gt;Additional Features&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automating backups of the game world.&lt;/li&gt;
&lt;li&gt;Adding more granular state persistence, such as player-specific data.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>serverless</category>
      <category>azure</category>
      <category>discord</category>
      <category>terraform</category>
    </item>
  </channel>
</rss>
