<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ethan Vance</title>
    <description>The latest articles on DEV Community by Ethan Vance (@ethan_vance).</description>
    <link>https://dev.to/ethan_vance</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3614663%2Fc8f8351b-195e-43a6-a694-692367589d6e.png</url>
      <title>DEV Community: Ethan Vance</title>
      <link>https://dev.to/ethan_vance</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ethan_vance"/>
    <language>en</language>
    <item>
      <title>How to Install Prometheus and Node Exporter on CentOS Stream 9 (The Upstream Way)</title>
      <dc:creator>Ethan Vance</dc:creator>
      <pubDate>Fri, 10 Apr 2026 09:55:48 +0000</pubDate>
      <link>https://dev.to/ethan_vance/how-to-install-prometheus-and-node-exporter-on-centos-stream-9-the-upstream-way-4l58</link>
      <guid>https://dev.to/ethan_vance/how-to-install-prometheus-and-node-exporter-on-centos-stream-9-the-upstream-way-4l58</guid>
      <description>&lt;p&gt;If you are managing Linux infrastructure, having real-time visibility into your servers is non-negotiable. &lt;strong&gt;Prometheus&lt;/strong&gt; is the industry-standard, open-source monitoring and alerting toolkit. Paired with &lt;strong&gt;Node Exporter&lt;/strong&gt;, it becomes a powerhouse for collecting host metrics like CPU usage, memory consumption, load averages, and network statistics.&lt;/p&gt;

&lt;p&gt;In this guide, we'll look at the SysAdmin-approved way to install Prometheus and Node Exporter on &lt;strong&gt;CentOS Stream 9&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Instead of relying on outdated third-party RPMs, we will use the &lt;strong&gt;official upstream binaries&lt;/strong&gt;. This approach is cleaner, easy to audit, and simple to keep updated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the Upstream Approach?
&lt;/h3&gt;

&lt;p&gt;Relying on old third-party repositories can introduce version mismatches, missing features, and security issues. By downloading directly from the official Prometheus releases and verifying the SHA256 checksums, you guarantee your binaries are authentic and uncorrupted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Best Practices for Deployment
&lt;/h3&gt;

&lt;p&gt;If you are setting this up in a production environment, here are the critical steps you need to follow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Create Dedicated Service Users&lt;/strong&gt;&lt;br&gt;
For security purposes, services should never run as root. Create dedicated system users (&lt;code&gt;prometheus&lt;/code&gt; and &lt;code&gt;node_exporter&lt;/code&gt;) with no login shell to isolate the services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Verify Official Binaries&lt;/strong&gt;&lt;br&gt;
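Always download the &lt;code&gt;sha256sums.txt&lt;/code&gt; alongside your tarballs and verify them using &lt;code&gt;sha256sum -c&lt;/code&gt;. The checksum file typically lists every release artifact, so add &lt;code&gt;--ignore-missing&lt;/code&gt; to skip the files you did not download. Only proceed if the output says &lt;code&gt;OK&lt;/code&gt;.&lt;/p&gt;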
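
&lt;p&gt;If you would rather run this check from a provisioning script, the same idea can be sketched in Python. This is a minimal sketch, not the tooling from the full guide: the helper names are hypothetical, and it assumes the usual "digest, whitespace, filename" layout of a checksum file.&lt;/p&gt;

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sums_text, filename):
    """Look up a filename in sha256sums.txt-style text (one 'HEX  NAME' per line)."""
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def is_verified(path, sums_text, filename):
    """True only when the file's digest matches the published one."""
    expected = expected_digest(sums_text, filename)
    return expected is not None and sha256_of(path) == expected
```

&lt;p&gt;A file that is not listed in the checksum text is treated as unverified, which mirrors the "only proceed on &lt;code&gt;OK&lt;/code&gt;" rule.&lt;/p&gt;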

&lt;p&gt;&lt;strong&gt;3. Configure Systemd Services&lt;/strong&gt;&lt;br&gt;
Create custom &lt;code&gt;systemd&lt;/code&gt; unit files for both Prometheus and Node Exporter. This ensures they run reliably in the background, start automatically on boot, and manage data retention properly (e.g., by setting the &lt;code&gt;--storage.tsdb.retention.time=15d&lt;/code&gt; flag).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Lock Down the Firewall&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;CRITICAL SECURITY WARNING:&lt;/strong&gt; Exposing port &lt;code&gt;9090&lt;/code&gt; (Prometheus UI) or &lt;code&gt;9100&lt;/code&gt; (Node Exporter) directly to the public internet is highly discouraged. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bind Node Exporter to &lt;code&gt;127.0.0.1&lt;/code&gt; for local single-server setups.&lt;/li&gt;
&lt;li&gt;For remote scraping, use strict &lt;code&gt;firewalld&lt;/code&gt; source IP restrictions, VPNs (like WireGuard/Tailscale), or reverse proxies.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Complete Step-by-Step Guide
&lt;/h3&gt;

&lt;p&gt;If you want the complete, copy-paste friendly commands, we have documented the entire process from start to finish in the full SysAdmin guide on our blog.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The exact Bash commands to download, verify, and extract the binaries.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;prometheus.yml&lt;/code&gt; scrape configurations.&lt;/li&gt;
&lt;li&gt;The complete &lt;code&gt;systemd&lt;/code&gt; unit files for both services.&lt;/li&gt;
&lt;li&gt;Initial PromQL queries to test your new monitoring stack.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📖 &lt;strong&gt;&lt;a href="https://www.migservers.com/tutorials/howto/install-prometheus-node-exporter/" rel="noopener noreferrer"&gt;How to Install Prometheus and Node Exporter on CentOS Stream 9&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3 Common Mistakes to Avoid
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Skipping config validation:&lt;/strong&gt; Always run &lt;code&gt;promtool check config /etc/prometheus/prometheus.yml&lt;/code&gt; before restarting the Prometheus service. A simple YAML indentation typo will prevent Prometheus from starting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assuming &lt;code&gt;up == 1&lt;/code&gt; means perfect health:&lt;/strong&gt; This only confirms that Prometheus can reach and scrape the target. It does not guarantee that all expected metrics are actually present.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting time synchronization:&lt;/strong&gt; If your Prometheus server and the monitored nodes are out of sync, your graphs, rate calculations, and alerts will be inaccurate.&lt;/li&gt;
&lt;/ol&gt;
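
&lt;p&gt;To make the second point concrete, here is a rough Python sketch that checks whether a scrape response actually contains the metrics you expect. The function names are hypothetical, and the list of expected metrics is something you would choose for your own hosts; the sketch only assumes the Prometheus text exposition format.&lt;/p&gt;

```python
def exposed_metric_names(exposition_text):
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that the target did not expose."""
    return sorted(set(expected) - exposed_metric_names(exposition_text))
```

&lt;p&gt;You could feed it the body returned by the Node Exporter &lt;code&gt;/metrics&lt;/code&gt; endpoint; an empty result means every expected name was present, which is a stronger signal than &lt;code&gt;up == 1&lt;/code&gt; alone.&lt;/p&gt;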

&lt;p&gt;Happy monitoring! Let me know in the comments if you have any questions about configuring your scrape jobs or writing PromQL queries.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>linux</category>
      <category>sysadmin</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What is CUDA? Understanding the Technology Behind AI and GPU Computing</title>
      <dc:creator>Ethan Vance</dc:creator>
      <pubDate>Fri, 06 Mar 2026 05:01:03 +0000</pubDate>
      <link>https://dev.to/ethan_vance/what-is-cuda-understanding-the-technology-behind-ai-and-gpu-computing-g30</link>
      <guid>https://dev.to/ethan_vance/what-is-cuda-understanding-the-technology-behind-ai-and-gpu-computing-g30</guid>
      <description>&lt;p&gt;If you're building infrastructure for Artificial Intelligence (AI), Machine Learning (ML), or High-Performance Computing (HPC), powerful hardware alone isn't enough. The real performance advantage comes from the software layer that drives the GPU. In NVIDIA's ecosystem, that layer is CUDA.&lt;/p&gt;

&lt;p&gt;In this article, we'll break down what CUDA actually is, how its architecture works, and why it has become the industry standard for accelerating compute-intensive workloads.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Exactly is CUDA?
&lt;/h2&gt;

&lt;p&gt;Many developers assume CUDA is a programming language or even an operating system. That is not accurate.&lt;/p&gt;

&lt;p&gt;CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA. It allows developers to use the massive parallel processing power of GPUs for general-purpose computing.&lt;/p&gt;

&lt;p&gt;Instead of relying only on CPUs for heavy computations, CUDA enables workloads like deep learning, scientific simulations, and matrix operations to run thousands of operations simultaneously on GPU cores.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simple analogy
&lt;/h3&gt;

&lt;p&gt;GPU → Raw compute engine&lt;br&gt;
CUDA → Software layer that unlocks GPU parallelism&lt;/p&gt;

&lt;p&gt;CUDA provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;Compilers&lt;/li&gt;
&lt;li&gt;Development tools&lt;/li&gt;
&lt;li&gt;Optimized libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools allow developers to utilize GPU acceleration without writing low-level assembly code.&lt;/p&gt;

&lt;h2&gt;
  
  
  CPU vs GPU Architecture
&lt;/h2&gt;

&lt;p&gt;Understanding CUDA requires understanding the fundamental difference between CPUs and GPUs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Count&lt;/td&gt;
&lt;td&gt;Dozens of powerful cores&lt;/td&gt;
&lt;td&gt;Thousands of smaller cores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution Model&lt;/td&gt;
&lt;td&gt;Sequential tasks&lt;/td&gt;
&lt;td&gt;Massively parallel execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transistor Focus&lt;/td&gt;
&lt;td&gt;Cache and control logic&lt;/td&gt;
&lt;td&gt;Data processing throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best Use Case&lt;/td&gt;
&lt;td&gt;Complex control logic&lt;/td&gt;
&lt;td&gt;Matrix operations and AI workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GPUs are specifically designed for data-parallel workloads, which is why they are ideal for deep learning and scientific computing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CUDA Software Stack
&lt;/h2&gt;

&lt;p&gt;CUDA is not a single tool. It is a full ecosystem for GPU development.&lt;/p&gt;

&lt;h3&gt;
  
  
  nvcc – CUDA Compiler
&lt;/h3&gt;

&lt;p&gt;The NVIDIA CUDA Compiler Driver (nvcc) separates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Host code (runs on the CPU)&lt;/li&gt;
&lt;li&gt;Device code (runs on the GPU)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows developers to write heterogeneous programs where CPU and GPU work together.&lt;/p&gt;

&lt;h2&gt;
  
  
  CUDA APIs
&lt;/h2&gt;

&lt;p&gt;CUDA provides two major APIs:&lt;/p&gt;

&lt;h3&gt;
  
  
  CUDA Runtime API
&lt;/h3&gt;

&lt;p&gt;High-level interface used in most CUDA applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  CUDA Driver API
&lt;/h3&gt;

&lt;p&gt;Low-level interface for more granular control of GPU execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  CUDA Libraries
&lt;/h2&gt;

&lt;p&gt;CUDA also provides highly optimized libraries used across AI and HPC applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  cuBLAS
&lt;/h3&gt;

&lt;p&gt;Optimized linear algebra operations for GPUs.&lt;/p&gt;

&lt;h3&gt;
  
  
  cuDNN
&lt;/h3&gt;

&lt;p&gt;Deep neural network primitives such as convolution, pooling, softmax, and attention.&lt;/p&gt;

&lt;p&gt;These libraries power frameworks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyTorch&lt;/li&gt;
&lt;li&gt;TensorFlow&lt;/li&gt;
&lt;li&gt;JAX&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CUDA Programming Model
&lt;/h2&gt;

&lt;p&gt;CUDA assumes a heterogeneous system consisting of:&lt;/p&gt;

&lt;h3&gt;
  
  
  Host
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CPU&lt;/li&gt;
&lt;li&gt;Host memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Device
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GPU&lt;/li&gt;
&lt;li&gt;Device memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Execution typically follows this workflow.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data Transfer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Data is copied from host memory (CPU) to device memory (GPU).&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Kernel Execution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A CUDA function called a Kernel is executed on the GPU.&lt;/p&gt;

&lt;p&gt;Execution hierarchy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Threads&lt;/li&gt;
&lt;li&gt;Blocks&lt;/li&gt;
&lt;li&gt;Grids&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Threads are the smallest execution units, while blocks allow threads to cooperate using shared memory.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Result Retrieval&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the computation is complete, results are copied back from GPU memory to CPU memory.&lt;/p&gt;
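
&lt;p&gt;The thread, block, and grid hierarchy is easier to see with a small emulation. The sketch below is plain Python running on the CPU, not real CUDA code: it only mimics how each thread derives a global index from its block and thread position. The vector-add workload and the function names are assumptions chosen for illustration.&lt;/p&gt;

```python
def launch_config(n, block_dim=256):
    """Return (grid_dim, block_dim) so that grid_dim * block_dim covers n elements."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    return grid_dim, block_dim


def vector_add(a, b, block_dim=256):
    """Emulate a CUDA-style vector-add kernel launch, one loop step per thread."""
    n = len(a)
    out = [0.0] * n
    grid_dim, block_dim = launch_config(n, block_dim)
    for block_idx in range(grid_dim):          # blocks in the grid
        for thread_idx in range(block_dim):    # threads in a block
            i = block_idx * block_dim + thread_idx  # global thread index
            if i in range(n):                  # guard: the last block may overshoot
                out[i] = a[i] + b[i]
    return out
```

&lt;p&gt;On a GPU the two loops disappear: every (block, thread) pair runs the loop body concurrently, which is where the speedup comes from.&lt;/p&gt;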

&lt;p&gt;Performance depends heavily on memory access patterns. Efficient CUDA programs maximize the use of registers and shared memory while minimizing slower global memory access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CUDA Dominates AI Infrastructure
&lt;/h2&gt;

&lt;p&gt;NVIDIA’s leadership in AI infrastructure is largely due to the CUDA ecosystem.&lt;/p&gt;

&lt;p&gt;Reasons include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mature development platform&lt;/li&gt;
&lt;li&gt;Highly optimized performance libraries&lt;/li&gt;
&lt;li&gt;Deep integration with AI frameworks&lt;/li&gt;
&lt;li&gt;Strong developer ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Major frameworks like PyTorch and TensorFlow rely heavily on CUDA for GPU acceleration.&lt;/p&gt;

&lt;p&gt;Because CUDA applications are built specifically for NVIDIA GPUs, CUDA has also created a strong ecosystem around NVIDIA hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;CUDA has become a foundational technology for modern GPU computing. By enabling developers to harness massive parallelism inside GPUs, CUDA allows AI systems, machine learning models, and scientific computing workloads to run dramatically faster.&lt;/p&gt;

&lt;p&gt;For developers working with AI, HPC, or GPU-accelerated computing, understanding CUDA is essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Original article:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.migservers.com/blogs/nvidia-cuda-gpu-computing/" rel="noopener noreferrer"&gt;Understanding NVIDIA CUDA: The Core of GPU Parallel Computing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cuda</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>gpu</category>
    </item>
    <item>
      <title>Finally found a way to rent H100s without selling a kidney (MIG Tech)</title>
      <dc:creator>Ethan Vance</dc:creator>
      <pubDate>Tue, 20 Jan 2026 12:17:28 +0000</pubDate>
      <link>https://dev.to/ethan_vance/finally-found-a-way-to-rent-h100s-without-selling-a-kidney-mig-tech-5cma</link>
      <guid>https://dev.to/ethan_vance/finally-found-a-way-to-rent-h100s-without-selling-a-kidney-mig-tech-5cma</guid>
      <description>&lt;p&gt;Is it just me, or is trying to rent a dedicated H100 or A100 right now an absolute nightmare?&lt;/p&gt;

&lt;p&gt;I've been working on some LLM fine-tuning recently, and I kept running into the same problem: &lt;strong&gt;Overkill.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I needed the architecture of the H100 (for the transformer engine), but I didn't need the &lt;em&gt;entire&lt;/em&gt; card 24/7. Paying $4/hr+ for a GPU that sits idle 80% of the time just burns through the budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Aha" Moment: Splitting the Hardware
&lt;/h2&gt;

&lt;p&gt;I did some digging and realized I should be looking for &lt;strong&gt;MIG (Multi-Instance GPU)&lt;/strong&gt; capable servers.&lt;/p&gt;

&lt;p&gt;If you aren't familiar with it, MIG basically lets you slice a physical GPU (like an A100 or H100) into up to 7 completely isolated instances. It’s not just software partitioning; it’s hardware-level isolation. So you get your own dedicated memory and cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Resource: MIG servers
&lt;/h2&gt;

&lt;p&gt;I came across a provider called &lt;strong&gt;&lt;a href="https://www.migservers.com/" rel="noopener noreferrer"&gt;MIG servers&lt;/a&gt;&lt;/strong&gt; that specializes exactly in this. I wanted to share it here because their inventory is actually pretty impressive compared to the "Sold Out" signs I see everywhere else.&lt;/p&gt;

&lt;p&gt;They seem to have bare metal stock in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;USA:&lt;/strong&gt; Dallas, LA, Chicago&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Europe:&lt;/strong&gt; Luxembourg, London, Amsterdam&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asia:&lt;/strong&gt; Incheon, Tokyo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What stood out to me was the flexibility. You can grab a massive 8x H100 cluster if you are training, or just slice up an A100 if you are doing inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it matters
&lt;/h2&gt;

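&lt;p&gt;If you are a DevOps engineer or working in AI, you know that "Time-Slicing" shares one GPU by context-switching between workloads, with no memory or fault isolation between them, so it is usually laggy and insecure. MIG solves that.&lt;/p&gt;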

&lt;p&gt;I wrote a deeper breakdown on my personal blog about the technical specs and pricing comparisons, but I just wanted to drop this here for anyone struggling to find hardware.&lt;/p&gt;

&lt;p&gt;To give you an idea of what MIG-ready hardware looks like, here are the specs we typically deploy for these workloads at &lt;strong&gt;MIG Servers&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;GPU Configuration&lt;/th&gt;
&lt;th&gt;Max MIG Instances&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Luxembourg&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2x Xeon Platinum 8480+&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8x NVIDIA H100 (200Gbps)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;56 Instances&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dallas, USA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2x EPYC 9354&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8x NVIDIA H100 NVLink&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;56 Instances&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;London, UK&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2x Xeon Gold 6210U&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;NVIDIA A30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4 Instances&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
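
&lt;p&gt;The "Max MIG Instances" column is just per-GPU limits multiplied by the number of cards. Here is a quick Python sketch of that arithmetic; the function name is made up for illustration, and the per-GPU limits are NVIDIA's published maximums (7 for A100 and H100, 4 for A30).&lt;/p&gt;

```python
# Maximum MIG instances per physical GPU, per NVIDIA's published limits.
MAX_MIG_PER_GPU = {"A30": 4, "A100": 7, "H100": 7}


def max_mig_instances(gpu_model, gpu_count=1):
    """Upper bound on isolated MIG instances for a server."""
    return MAX_MIG_PER_GPU[gpu_model] * gpu_count
```

&lt;p&gt;So &lt;code&gt;max_mig_instances("H100", 8)&lt;/code&gt; gives the 56 shown for the 8x H100 boxes, and a single A30 tops out at 4.&lt;/p&gt;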

&lt;p&gt;&lt;strong&gt;👉 &lt;a href="https://www.migservers.com/blogs/nvidia-mig-gpu-dedicated-servers/" rel="noopener noreferrer"&gt;Check out the full breakdown and the server list here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me know if you guys have tried partitioning H100s yet!&lt;/p&gt;

</description>
      <category>hardware</category>
      <category>dedicatedservers</category>
      <category>gpu</category>
      <category>nvidia</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Ethan Vance</dc:creator>
      <pubDate>Mon, 17 Nov 2025 06:40:34 +0000</pubDate>
      <link>https://dev.to/ethan_vance/-50i6</link>
      <guid>https://dev.to/ethan_vance/-50i6</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/ethan_vance" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3614663%2Fc8f8351b-195e-43a6-a694-692367589d6e.png" alt="ethan_vance"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/ethan_vance/architecture-for-apac-the-engineering-case-for-singapore-bare-metal-infrastructure-jb" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Architecture for APAC: The Engineering Case for Singapore Bare Metal Infrastructure&lt;/h2&gt;
      &lt;h3&gt;Ethan Vance ・ Nov 17&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#linux&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#webdev&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#dedicatedservers&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>linux</category>
      <category>webdev</category>
      <category>dedicatedservers</category>
    </item>
    <item>
      <title>Architecture for APAC: The Engineering Case for Singapore Bare Metal Infrastructure</title>
      <dc:creator>Ethan Vance</dc:creator>
      <pubDate>Mon, 17 Nov 2025 06:28:16 +0000</pubDate>
      <link>https://dev.to/ethan_vance/architecture-for-apac-the-engineering-case-for-singapore-bare-metal-infrastructure-jb</link>
      <guid>https://dev.to/ethan_vance/architecture-for-apac-the-engineering-case-for-singapore-bare-metal-infrastructure-jb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;For DevOps engineers and solutions architects deploying in the Asia-Pacific (APAC) region, the challenge isn't just distance; it is network topology. With a user base exceeding 3 billion, the difference between a 50ms and a 200ms Round Trip Time (RTT) dictates application viability.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While cloud virtualization offers flexibility, high-performance workloads—specifically gaming, real-time analytics, and LLM training—often hit the "noisy neighbor" wall. &lt;a href="https://www.servers99.com/blog/why-singapore-dedicated-servers-are-your-secret-weapon-for-apac-dominance/" rel="noopener noreferrer"&gt;This article analyzes the technical infrastructure of Singapore as a hosting hub&lt;/a&gt;, examining its connectivity ecosystem, hardware proximity, and the efficiency of bare metal over virtualized environments.&lt;/p&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;1. The Physics of Latency: Why Topology Matters&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Singapore isn't just a geographical location; it is a primary peering exchange point. The island acts as a landing site for over 30 major submarine cable systems (including AAG, SJC2, and SEA-ME-WE 5).&lt;/p&gt;

&lt;p&gt;For a developer, this density translates to fewer hops. When you host in Singapore, you aren't routing through Japan or the US to reach Indonesia or India. You are utilizing direct peering links.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical RTT Metrics from Singapore (SG1):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jakarta/Manila: &amp;lt; 20ms&lt;/li&gt;
&lt;li&gt;Tokyo/Mumbai: &amp;lt; 50ms&lt;/li&gt;
&lt;li&gt;Sydney: &amp;lt; 95ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical Note:&lt;/strong&gt; &lt;em&gt;Achieving these speeds requires a provider using multi-homed BGP (Border Gateway Protocol) sessions. BGP automation ensures that if a specific carrier (e.g., NTT) experiences packet loss, the route automatically fails over to an alternative path (e.g., Tata or Singtel) without manual intervention.&lt;/em&gt;&lt;/p&gt;
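
&lt;p&gt;The physical floor under these numbers can be estimated from path length alone. The following Python sketch is a back-of-the-envelope calculation, assuming light travels through fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum); the function name is hypothetical, and real RTTs will be higher because of routing detours, queuing, and equipment delay.&lt;/p&gt;

```python
# Light in optical fiber travels at roughly two thirds of its vacuum speed.
FIBER_KM_PER_S = 200_000


def min_rtt_ms(path_km):
    """Lower bound on round-trip time over a fiber path of the given length."""
    return 2 * path_km * 1000 / FIBER_KM_PER_S
```

&lt;p&gt;A 1,000 km fiber path cannot do better than about 10 ms round trip, so every extra detour through a distant exchange adds directly to RTT.&lt;/p&gt;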

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Bare Metal vs. Virtualization Overhead&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The convenience of VPS (Virtual Private Servers) comes with a performance tax known as "Hypervisor Overhead." In a virtualized environment, a hypervisor sits between the guest OS and the host hardware: it schedules virtual CPUs onto physical cores and mediates privileged operations and device I/O.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For I/O-heavy applications, this results in:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU Steal Time&lt;/strong&gt;: Waiting for the physical scheduler to allocate cycles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I/O Wait&lt;/strong&gt;: Latency introduced by sharing disk controllers with other tenants.&lt;/li&gt;
&lt;/ul&gt;
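
&lt;p&gt;On Linux, steal time is visible in the aggregate &lt;code&gt;cpu&lt;/code&gt; line of &lt;code&gt;/proc/stat&lt;/code&gt;. The Python sketch below shows one way to turn two samples of that line into a steal percentage; the function names are hypothetical, and it assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal).&lt;/p&gt;

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before_line, after_line):
    """Share of CPU time stolen by the hypervisor between two samples."""
    before, after = cpu_times(before_line), cpu_times(after_line)
    # Only the first eight fields are summed: guest time is already
    # counted inside user/nice, so adding it would double count.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0
```

&lt;p&gt;Sampling the line a few seconds apart on a VPS and on bare metal is a simple way to compare: on dedicated hardware the result should stay at zero.&lt;/p&gt;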

&lt;p&gt;&lt;strong&gt;The Bare Metal Advantage&lt;/strong&gt;: Deploying on dedicated hardware (e.g., AMD EPYC 9754 or Intel Xeon Platinum) gives your kernel direct access to the hardware. There is no hypervisor layer in between.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PCIe 5.0 &amp;amp; NVMe&lt;/strong&gt;: You get full throughput (up to 14 GB/s read speeds) without virtualization throttling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic Performance&lt;/strong&gt;: Unlike a VPS where performance fluctuates based on neighbors, dedicated resources provide a flat-line performance graph essential for predictable SLAs.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. The GPU Sovereignty Factor&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For AI engineers working with Large Language Models (LLMs) or CUDA-accelerated rendering, hardware availability is a critical bottleneck.&lt;/p&gt;

&lt;p&gt;Singapore offers a distinct advantage regarding high-performance compute (HPC) availability. Data centers here frequently stock enterprise-grade clusters (NVIDIA H100, A100, L40S) that are often supply-constrained in Western availability zones. Accessing these via bare metal allows for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct Passthrough&lt;/strong&gt;: No vGPU licensing costs or performance loss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster Scaling&lt;/strong&gt;: Low-latency cross-connects allow for efficient multi-node training setups.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Data Center Efficiency (PUE) and Resilience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Modern infrastructure in Singapore is dictated by land scarcity, driving vertical innovation. The standard for new facilities involves strictly regulated power efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Specs for System Architects&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PUE (Power Usage Effectiveness)&lt;/strong&gt;: &amp;lt; 1.3. This is achieved via Direct-to-Chip liquid cooling, essential for sustaining the thermal design power (TDP) of modern high-density racks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Stack&lt;/strong&gt;: Look for SOC 2, ISO 27001, and TVRA (Threat and Vulnerability Risk Assessment) certification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;N+1 Redundancy&lt;/strong&gt;: Ensure the facility utilizes independent dual power feeds to the rack, backed by redundant UPS and generator systems.&lt;/li&gt;
&lt;/ul&gt;
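
&lt;p&gt;For reference, PUE is simply total facility power divided by the power that reaches the IT equipment. A minimal Python sketch of the ratio, with a hypothetical function name and made-up example figures:&lt;/p&gt;

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power over IT power."""
    if it_equipment_kw == 0:
        raise ValueError("IT load must be non-zero")
    return total_facility_kw / it_equipment_kw
```

&lt;p&gt;A facility drawing 1,300 kW in total to run a 1,000 kW IT load has a PUE of 1.3, meaning 300 kW goes to cooling, power conversion, and other overhead.&lt;/p&gt;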

&lt;h2&gt;
  
  
  &lt;strong&gt;Summary: When to Switch to Dedicated&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;While containerization (Kubernetes) on cloud instances serves microservices well, monolithic databases, high-frequency trading platforms, and game servers require the raw clock speed of dedicated hardware.&lt;/p&gt;

&lt;p&gt;If your traceroute shows excessive hops or your database IOPS are inconsistent, moving the workload to a &lt;a href="https://www.servers99.com/dedicated-server/asia/singapore/" rel="noopener noreferrer"&gt;Singapore-based dedicated environment&lt;/a&gt; is the logical architectural step for stabilizing APAC performance.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>webdev</category>
      <category>dedicatedservers</category>
    </item>
  </channel>
</rss>
