<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mike Smith</title>
    <description>The latest articles on DEV Community by Mike Smith (@mikesmith).</description>
    <link>https://dev.to/mikesmith</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4009348%2Ff3b9f931-f3e5-49fe-8eba-b3e7d84ec245.png</url>
      <title>DEV Community: Mike Smith</title>
      <link>https://dev.to/mikesmith</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mikesmith"/>
    <language>en</language>
    <item>
      <title>Your Dedicated Server Benchmark Looks Great. Your Production Database Disagrees. Here's Why.</title>
      <dc:creator>Mike Smith</dc:creator>
      <pubDate>Tue, 30 Jun 2026 10:21:17 +0000</pubDate>
      <link>https://dev.to/hostrunway/your-dedicated-server-benchmark-looks-great-your-production-database-disagrees-heres-why-2iei</link>
      <guid>https://dev.to/hostrunway/your-dedicated-server-benchmark-looks-great-your-production-database-disagrees-heres-why-2iei</guid>
      <description>&lt;p&gt;&lt;em&gt;A clean fio or dd benchmark on a brand-new dedicated server is not the same thing as real-world I/O performance under concurrent, mixed-pattern production load. The gap between the two trips up more teams than it should.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every team that provisions a new dedicated server runs the same ritual at some point. Spin up the box, SSH in, run a quick storage benchmark — fio, dd, iozone, whatever the team's preferred tool is — and watch the numbers come back looking excellent. Sequential write throughput in the gigabytes per second. Sub-millisecond read latency. Everything looks exactly like the vendor's spec sheet promised.&lt;/p&gt;

&lt;p&gt;Then the database goes live, real traffic hits it, and query latency under load doesn't match what the benchmark implied at all. This gap — between synthetic storage benchmarks and real production I/O behavior — is one of the most consistently underestimated factors in &lt;a href="https://www.hostrunway.com/dedicated-servers.php" rel="noopener noreferrer"&gt;dedicated server&lt;/a&gt; performance planning, and it's worth understanding precisely why it happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Sequential Benchmarks Lie (Without Meaning To)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The default storage benchmark most engineers reach for tests sequential read or write throughput — writing or reading one large, contiguous block of data as fast as possible. This is a genuinely useful number for understanding the theoretical ceiling of your storage hardware. It is also almost never representative of what a production database actually does.&lt;/p&gt;

&lt;p&gt;Real database workloads are dominated by random I/O, not sequential. A transactional database serving concurrent users is constantly reading and writing small, scattered blocks of data across the disk — index lookups, row updates, write-ahead log entries, all interleaved with each other, often from multiple connections simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Also read - &lt;a href="https://www.hostrunway.com/blog/latency-maps-server-location-matters-more-than-you-think/" rel="noopener noreferrer"&gt;Latency Maps: Server Location Matters More Than You Think&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;NVMe storage handles random I/O dramatically better than older spinning disk or even SATA SSD technology, which is exactly why NVMe has become the standard for serious database workloads. But "dramatically better than the alternative" doesn't mean "identical to the sequential benchmark number." A drive that delivers 3.5 GB/s on a sequential write test can show meaningfully different — and more variable — performance under a realistic random I/O pattern with high queue depth and concurrent access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Queue Depth Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's a detail that gets glossed over constantly: most default benchmark configurations test at a queue depth of 1 — meaning one I/O operation in flight at a time. This produces the lowest possible latency numbers because there's no contention for the device's internal resources.&lt;/p&gt;

&lt;p&gt;Production databases under real load operate at much higher effective queue depths, with many operations in flight simultaneously from different connections and threads. As queue depth increases, individual operation latency typically increases as well, even on excellent hardware, simply because operations are now waiting behind each other for the underlying device controller's attention.&lt;/p&gt;

&lt;p&gt;A benchmark run at queue depth 1 and a production workload running at effective queue depth 32 or 64 are testing fundamentally different things, even though they're hitting the same physical drive. Teams that benchmark with default settings and then extrapolate those numbers to predict production performance under concurrent load are comparing two different scenarios without realizing it.&lt;/p&gt;
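
&lt;p&gt;One way to see why the two scenarios differ is Little's law: in a closed loop, mean latency equals the number of I/Os in flight divided by throughput. The sketch below applies that relation to a few (queue depth, IOPS) pairs. The figures are hypothetical placeholders chosen for illustration, not measurements from any real drive.&lt;/p&gt;

```python
def mean_latency_ms(queue_depth, iops):
    """Little's law for a closed loop: latency = I/Os in flight / throughput."""
    return queue_depth / iops * 1000.0


# Hypothetical (queue depth, IOPS) pairs, not real benchmark data.
samples = [(1, 12_000.0), (32, 320_000.0), (64, 400_000.0)]

for depth, iops in samples:
    print(f"QD{depth}: {mean_latency_ms(depth, iops):.3f} ms mean latency")
```

&lt;p&gt;With these assumed numbers, throughput keeps rising with queue depth while per-operation latency rises too, which is why a single queue depth 1 latency figure says little about behavior under concurrency.&lt;/p&gt;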

&lt;p&gt;&lt;strong&gt;What Actually Changes Under Real Load&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Filesystem and database engine overhead. Raw block-device benchmarks bypass much of the filesystem and database engine logic that real queries pass through. Write-ahead logging, journaling, checksumming, and transaction commit semantics all add overhead that a raw dd test never touches. A database configured for strong durability guarantees (synchronous commits, an fsync on every commit) will show higher per-operation I/O latency than a raw storage benchmark, because it's doing genuinely more work per logical operation.&lt;/p&gt;

&lt;p&gt;Resource contention from concurrent processes. A freshly provisioned, otherwise idle dedicated server gives a storage benchmark the entire I/O subsystem to itself. A production server is also running the application layer, background jobs, monitoring agents, log shipping, and often multiple database connections simultaneously — all competing for the same underlying I/O resources. None of this contention exists during a clean benchmark run.&lt;/p&gt;

&lt;p&gt;Thermal and sustained-write behavior. Many NVMe drives deliver excellent burst performance but lower sustained write throughput once the onboard write cache is exhausted or the controller starts throttling to manage heat during extended write-heavy periods. A short benchmark run captures burst performance. A database under sustained heavy write load for hours runs into the drive's actual sustained performance, which can be meaningfully lower than the headline burst numbers.&lt;/p&gt;

&lt;p&gt;RAID configuration overhead. If the dedicated server uses RAID for redundancy — which most production database deployments should — write operations now involve additional overhead for parity calculation or mirroring, depending on the RAID level chosen. A single-disk benchmark doesn't capture this, and the overhead varies significantly between RAID 1, RAID 5, RAID 10, and software versus hardware RAID implementations.&lt;/p&gt;
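
&lt;p&gt;The RAID point can be made concrete with the common write-penalty rule of thumb, where each logical write costs a fixed number of back-end I/Os. The sketch below is an estimate only: the penalty values are the usual textbook figures, the array numbers are hypothetical, and controller caches or full-stripe writes can change the result considerably.&lt;/p&gt;

```python
# Rule-of-thumb back-end I/Os per logical write (assumed textbook values;
# caches and full-stripe writes can reduce them in practice).
WRITE_PENALTY = {"raid1": 2, "raid10": 2, "raid5": 4}


def effective_iops(raw_iops, write_fraction, level):
    """Estimate front-end IOPS once each write costs several back-end I/Os."""
    penalty = WRITE_PENALTY[level]
    read_fraction = 1.0 - write_fraction
    return raw_iops / (read_fraction + write_fraction * penalty)


# Hypothetical array: 400,000 raw IOPS in aggregate, 30% writes.
for level in WRITE_PENALTY:
    print(level, round(effective_iops(400_000, 0.3, level)))
```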

&lt;p&gt;&lt;strong&gt;How to Benchmark in a Way That Actually Predicts Production Behavior&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Test with realistic access patterns, not just sequential. Configure fio (or your benchmarking tool of choice) to use a mixed random read/write pattern with a block size that matches your actual database's typical I/O size — often 4K, 8K, or 16K depending on the database engine — rather than relying on default large sequential block tests.&lt;/p&gt;

&lt;p&gt;Test at realistic queue depths. Benchmark at multiple queue depths, including ones that approximate your expected concurrent connection count, not just queue depth 1. This gives you a latency curve rather than a single optimistic number, and that curve is far more useful for capacity planning.&lt;/p&gt;

&lt;p&gt;Run sustained tests, not quick bursts. A 30-second benchmark captures burst performance. Run tests for 15-30 minutes or longer to surface any sustained throughput degradation that burst tests miss entirely.&lt;/p&gt;

&lt;p&gt;Benchmark with the actual database engine under realistic concurrent load, not just raw storage tools. Database-specific benchmarking tools such as sysbench, configured with a representative schema and query mix, surface engine-level overhead that raw storage benchmarks can't capture. This is a meaningfully better predictor of production behavior than any raw I/O tool alone.&lt;/p&gt;

&lt;p&gt;Validate under contention, not isolation. If possible, run your storage benchmark concurrently with a synthetic CPU and memory load that approximates your actual application's resource footprint, to capture how I/O performance holds up when it's not the only thing happening on the box.&lt;/p&gt;
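
&lt;p&gt;The first three recommendations (random mixed pattern, multiple queue depths, sustained runtime) can be sketched as a small Python helper that builds one fio command line per queue depth. The target path, file size, 70/30 read/write mix, 8k block size, and queue depth list are assumptions to adjust for your own engine and hardware. The helper only builds the command strings and does not run fio.&lt;/p&gt;

```python
import shlex


def fio_commands(target, block_size="8k", read_pct=70,
                 queue_depths=(1, 8, 32, 64), runtime_s=1800):
    """Build one fio command line per queue depth for a mixed random workload."""
    commands = []
    for depth in queue_depths:
        args = [
            "fio",
            f"--name=randrw-qd{depth}",
            f"--filename={target}",
            "--size=10G",               # assumed test file size
            "--rw=randrw",              # mixed random reads and writes
            f"--rwmixread={read_pct}",  # percentage of reads in the mix
            f"--bs={block_size}",
            "--ioengine=libaio",
            "--direct=1",               # bypass the page cache
            f"--iodepth={depth}",
            "--time_based",
            f"--runtime={runtime_s}",   # 1800 s = 30 minutes, not a burst
            "--output-format=json",
        ]
        commands.append(shlex.join(args))
    return commands


# Hypothetical target path on the volume under test.
for command in fio_commands("/mnt/data/fio-test.bin"):
    print(command)
```

&lt;p&gt;Running the resulting commands in sequence and comparing latency percentiles across queue depths gives the latency curve described above rather than a single number.&lt;/p&gt;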

&lt;p&gt;&lt;strong&gt;The Honest Bottom Line&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A clean benchmark number on a freshly provisioned dedicated server tells you the hardware is capable. It does not tell you how that hardware will behave under the specific, messy, concurrent, mixed-pattern reality of your actual production workload. The gap between those two things isn't a sign that the hardware is bad or that the provider misrepresented anything — it's simply a reflection of the fact that synthetic benchmarks and production workloads are testing fundamentally different scenarios.&lt;/p&gt;

&lt;p&gt;Teams that build capacity planning models around synthetic benchmark numbers alone are working from data that systematically overstates real-world performance. The fix isn't distrust of benchmarks — it's running benchmarks that actually resemble what you're going to do with the hardware, and validating those results against real application behavior before you commit to a capacity plan.&lt;/p&gt;

</description>
      <category>dedicatedservers</category>
      <category>sysadmin</category>
      <category>devops</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Cloud Networking Problem Nobody Mentions Until Your Latency Bill Arrives</title>
      <dc:creator>Mike Smith</dc:creator>
      <pubDate>Tue, 30 Jun 2026 09:53:51 +0000</pubDate>
      <link>https://dev.to/mikesmith/the-cloud-networking-problem-nobody-mentions-until-your-latency-bill-arrives-3509</link>
      <guid>https://dev.to/mikesmith/the-cloud-networking-problem-nobody-mentions-until-your-latency-bill-arrives-3509</guid>
      <description>&lt;p&gt;&lt;em&gt;Cross-region cloud architecture looks clean on an architecture diagram. In production, it quietly adds latency that no amount of compute power can fix — and most teams don't notice until users complain.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There's a particular kind of incident report that shows up in engineering postmortems with predictable regularity. The application is fast. The database is fast. The compute is appropriately sized. And yet, users in a specific region are experiencing response times that are inexplicably, persistently slower than everyone else.&lt;/p&gt;

&lt;p&gt;More often than not, the root cause isn't compute. It's the network path the request actually takes, and it's a problem that architecture diagrams almost never capture accurately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why "It's All in the Cloud" Doesn't Mean "It's All Close Together"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud architecture diagrams have a habit of representing services as boxes connected by clean, straight lines. A user hits your application server, which calls your database, which calls your cache layer, which calls a third-party API. On the diagram, these all look adjacent.&lt;/p&gt;

&lt;p&gt;In reality, each of those boxes might be running in a different region, a different availability zone, or worse, a different &lt;a href="https://www.hostrunway.com/cloud-services.php" rel="noopener noreferrer"&gt;cloud provider&lt;/a&gt; entirely — and every hop between them incurs real, physical, unavoidable network latency that's bound by the speed of light and the actual fiber path the data takes.&lt;/p&gt;

&lt;p&gt;A request that bounces from your application server in one region, to a database replica in another, to a caching layer in a third, can easily accumulate 150-300ms of pure network latency before any actual processing happens — even though every individual service is, in isolation, performing perfectly.&lt;/p&gt;
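
&lt;p&gt;The physical floor is easy to estimate. Light in optical fiber covers roughly 200 km per millisecond, so a round trip can never be faster than twice the path length divided by that speed. The sketch below adds up that floor for a three-hop request. The hop names and distances are hypothetical, and real paths are slower because of routing detours, queuing, and protocol handshakes.&lt;/p&gt;

```python
FIBER_KM_PER_MS = 200.0  # light in fiber: roughly 200 km per millisecond


def min_rtt_ms(path_km):
    """Lower bound on round-trip time for a fiber path of the given length."""
    return 2 * path_km / FIBER_KM_PER_MS


# Hypothetical fiber path lengths for each hop of one request.
hops_km = {
    "app server to database replica": 6000,
    "app server to cache layer": 4000,
    "app server to third-party API": 9000,
}

total = 0.0
for name, km in hops_km.items():
    rtt = min_rtt_ms(km)
    total += rtt
    print(f"{name}: at least {rtt:.0f} ms")
print(f"total floor: {total:.0f} ms")
```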

&lt;p&gt;This is the gap between "architecturally correct" and "physically fast." Both can be true about the same system simultaneously, and the second one is the one your users actually experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Specific Patterns That Cause This&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Database reads crossing regions silently. A common pattern: an application running on a &lt;a href="https://www.hostrunway.com/dedicated-servers-usa.php" rel="noopener noreferrer"&gt;dedicated server in US-East&lt;/a&gt; queries a database that has a read replica in US-East, but a misconfigured connection string or load balancer routes some percentage of traffic to a replica in EU-West instead. The application works correctly. The data is accurate. But a subset of requests now incurs a transatlantic round trip that adds 80-100ms to every affected query. Because the failure mode is slowness rather than an error, it can persist undetected for months.&lt;/p&gt;

&lt;p&gt;Microservices chatting across availability zones. Modern microservice architectures often involve a single user request triggering a cascade of internal service-to-service calls — sometimes a dozen or more. If those services aren't deliberately co-located within the same availability zone, each inter-service call incurs cross-zone latency that compounds. A chain of 10 services each adding 2-5ms of cross-zone overhead turns into 20-50ms of pure latency tax that has nothing to do with actual computation.&lt;/p&gt;

&lt;p&gt;CDN and origin server mismatch. Content delivery networks are excellent at caching static assets close to users. But when a cache miss occurs — or when the request requires dynamic, uncacheable content — the request has to travel all the way back to the origin server. If your origin is in a single region and your CDN edge nodes span the globe, users far from your origin experience this round trip on every cache miss, which for highly dynamic applications can be the majority of requests.&lt;/p&gt;

&lt;p&gt;Third-party API dependencies in distant regions. Your application might be perfectly architected, but if it depends synchronously on a third-party payment processor, authentication provider, or data enrichment API hosted in a region far from your own infrastructure, that dependency becomes a latency floor you cannot engineer around without changing the dependency itself or its connection path.&lt;/p&gt;
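
&lt;p&gt;The arithmetic behind the first two patterns is simple enough to write down. The sketch below assumes the service calls happen one after another (parallel calls would overlap) and uses a hypothetical 10% misrouting rate. Only the 2-5ms and 80-100ms figures come from the examples above.&lt;/p&gt;

```python
def chain_overhead_ms(calls, per_call_ms):
    """Total cross-zone overhead for calls made one after another."""
    return calls * per_call_ms


def mean_added_ms(misrouted_fraction, extra_rtt_ms):
    """Average latency added per query when a fraction goes to a far replica."""
    return misrouted_fraction * extra_rtt_ms


# Ten sequential calls at 2 ms and 5 ms of cross-zone overhead each.
print(chain_overhead_ms(10, 2), "to", chain_overhead_ms(10, 5), "ms")

# Hypothetical: 10% of queries misrouted, 90 ms extra round trip each.
print(round(mean_added_ms(0.10, 90), 1), "ms added on average")
```

&lt;p&gt;The second number shows why misrouting hides so well: an average shifts only slightly while the affected requests are each much slower, so tail percentiles are the place to look.&lt;/p&gt;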

&lt;p&gt;&lt;strong&gt;Why This Is Genuinely Hard to Diagnose&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The frustrating part of cross-region latency issues is that they're often invisible in the metrics teams check first.&lt;/p&gt;

&lt;p&gt;CPU utilization looks fine. Memory looks fine. Database query execution time, measured at the database itself, looks fine. The problem only becomes visible when you measure end-to-end latency from the actual user's perspective and trace exactly which network hops contributed to it — which requires distributed tracing instrumentation that many teams don't have in place until after they've already experienced the problem.&lt;/p&gt;

&lt;p&gt;This is precisely why observability practices that capture the full request path, not just individual service performance, have become essential rather than optional for any application with meaningful geographic distribution. Distributed tracing tools (OpenTelemetry, Jaeger, AWS X-Ray) exist specifically to make these invisible cross-region hops visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Actually Fixes This&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Co-locate services that talk to each other frequently. The most direct fix is architectural: services that communicate synchronously and frequently should run in the same region, ideally the same availability zone. This sounds obvious, but as organizations grow and teams deploy services independently, regional drift happens gradually and often goes unnoticed until someone audits it deliberately.&lt;/p&gt;

&lt;p&gt;Use read replicas correctly, and verify routing. If you've deployed regional read replicas specifically to reduce latency, audit your connection routing regularly. A misconfigured load balancer or an application that doesn't properly use region-aware connection strings can silently defeat the entire purpose of having replicas in the first place.&lt;/p&gt;

&lt;p&gt;Push computation to the edge where it makes sense. For workloads that would otherwise round-trip to a centralized origin on every request, edge computing (running logic at the CDN edge rather than the origin) can eliminate much of this latency for the subset of operations that don't require centralized state.&lt;/p&gt;

&lt;p&gt;Choose infrastructure providers with genuine geographic coverage that matches your user base. This sounds like an obvious point, but it's frequently under-prioritized during initial infrastructure selection. If a meaningful portion of your user base is in a region where your provider has thin or absent regional presence, no amount of clever architecture fully compensates for the physical distance between your infrastructure and your users.&lt;/p&gt;

&lt;p&gt;Measure from the user's actual vantage point, not just from your own data center. Synthetic monitoring tools that test from multiple global locations — not just from your own infrastructure's region — are the only reliable way to catch this class of problem before users report it.&lt;/p&gt;
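
&lt;p&gt;Auditing replica routing, as suggested above, can start with something as small as counting which region each database connection actually lands in. The sketch below is one possible shape for that check. The replica names, the region mapping, and the connection list are all hypothetical. In practice the list would come from your own connection logs or pool metrics.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical mapping of replica hostnames to the regions they run in.
REPLICA_REGION = {"db-replica-a": "us-east", "db-replica-b": "eu-west"}


def cross_region_share(app_region, connections):
    """Fraction of connections that landed on a replica outside app_region."""
    counts = Counter(REPLICA_REGION[name] for name in connections)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - counts[app_region] / total


# Hypothetical sample: nine local connections and one transatlantic one.
observed = ["db-replica-a"] * 9 + ["db-replica-b"]
share = cross_region_share("us-east", observed)
print(f"{share:.0%} of connections crossed regions")
```

&lt;p&gt;Any nonzero share for an application that is supposed to read locally is worth investigating, since the symptom is slowness rather than an error.&lt;/p&gt;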

&lt;p&gt;&lt;strong&gt;The Broader Lesson&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Latency problems caused by network topology rarely show up as a single dramatic incident. They show up as a slow, persistent erosion of user experience in specific regions, one that's easy to write off as "the app being a bit slow sometimes" instead of diagnosing it as a structural, fixable architecture issue.&lt;/p&gt;

&lt;p&gt;The fix is rarely more compute. It's understanding the actual physical and logical path your data takes, region by region, hop by hop, and being deliberate about which services genuinely need to be geographically close to each other and which ones can tolerate distance.&lt;/p&gt;

</description>
      <category>cloudcomputing</category>
      <category>distributedsystems</category>
      <category>devops</category>
      <category>networklatency</category>
    </item>
  </channel>
</rss>
