<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aleh Karachun</title>
    <description>The latest articles on DEV Community by Aleh Karachun (@aleh_karachun).</description>
    <link>https://dev.to/aleh_karachun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3763585%2F1b3180e7-f3f9-476d-a692-ba897d9a4687.png</url>
      <title>DEV Community: Aleh Karachun</title>
      <link>https://dev.to/aleh_karachun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aleh_karachun"/>
    <language>en</language>
    <item>
      <title>.NET 10 Performance: The O(n^2) String Trap and the Zero-Allocation Quest</title>
      <dc:creator>Aleh Karachun</dc:creator>
      <pubDate>Sat, 21 Mar 2026 13:08:00 +0000</pubDate>
      <link>https://dev.to/aleh_karachun/net-10-performance-the-on2-string-trap-and-the-zero-allocation-quest-3cjh</link>
      <guid>https://dev.to/aleh_karachun/net-10-performance-the-on2-string-trap-and-the-zero-allocation-quest-3cjh</guid>
      <description>&lt;p&gt;"Premature optimization is the root of all evil." We’ve all heard it. But in the world of high-load cloud systems and serverless environments, there is another truth: "Ignoring scalability is the root of a massive AWS bill."&lt;/p&gt;

&lt;p&gt;Today, we are doing a deep dive into .NET 10 string manipulation. We’ll explore how a simple &lt;code&gt;+=&lt;/code&gt; can turn your performance into a disaster and how to achieve Zero-Allocation using modern C# features.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. The Big Picture: Scaling is a Cliff
&lt;/h3&gt;

&lt;p&gt;In computer science, &lt;em&gt;O(n)&lt;/em&gt; vs &lt;em&gt;O(n^2)&lt;/em&gt; is often treated as academic theory. But when you visualize it, theory becomes a cold, hard reality. We compared three contenders:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classic Concatenation:&lt;/strong&gt; The quadratic &lt;em&gt;O(n^2)&lt;/em&gt; path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StringBuilder:&lt;/strong&gt; The standard heap-allocated buffer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ValueStringBuilder (Optimized):&lt;/strong&gt; A &lt;code&gt;ref struct&lt;/code&gt; living entirely on the stack.&lt;/li&gt;
&lt;/ol&gt;
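&lt;p&gt;To see the trap in isolation, here is a minimal, self-contained sketch (not the article's benchmark harness) contrasting the two heap-based contenders; the names &lt;code&gt;WithConcatenation&lt;/code&gt; and &lt;code&gt;WithStringBuilder&lt;/code&gt; are illustrative:&lt;/p&gt;

```csharp
using System;
using System.Linq;
using System.Text;

public static class ConcatDemo
{
    // O(n^2): each += copies the whole accumulated string into a new allocation.
    public static string WithConcatenation(int n)
    {
        string result = "";
        foreach (int i in Enumerable.Range(0, n))
            result += i; // a fresh string object on every iteration
        return result;
    }

    // O(n) amortized: StringBuilder appends into a growable internal buffer.
    public static string WithStringBuilder(int n)
    {
        var sb = new StringBuilder();
        foreach (int i in Enumerable.Range(0, n))
            sb.Append(i);
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithConcatenation(5)); // 01234
        Console.WriteLine(WithStringBuilder(5)); // 01234
    }
}
```

&lt;p&gt;Both produce identical output; the difference is purely in how many intermediate strings the first version allocates along the way.&lt;/p&gt;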

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9tmyi6iuteavf9iaxh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9tmyi6iuteavf9iaxh1.png" alt="Scaling performance overview" width="800" height="300"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Figure 1.&lt;/strong&gt; Scaling performance overview.&lt;/p&gt;

&lt;p&gt;If the log scale feels too abstract, look at the linear reality at &lt;em&gt;N=10,000&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r4v3fpxtly2id6crd1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r4v3fpxtly2id6crd1s.png" alt="Linear comparison at maximum scale" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Figure 2.&lt;/strong&gt; Linear comparison at maximum scale.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. The Micro-Scale Paradox (&lt;em&gt;N=10&lt;/em&gt;)
&lt;/h3&gt;

&lt;p&gt;Engineering is about choosing the right tool for the right job. On a tiny scale &lt;em&gt;(N=10)&lt;/em&gt;, our "super-optimized" approach actually loses.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UseStringBuilder: 32.30 ns&lt;/li&gt;
&lt;li&gt;UseStringConcatenation: 52.95 ns&lt;/li&gt;
&lt;li&gt;UseValueStringBuilder_Optimized: ~107 ns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Paradox Explained:&lt;/strong&gt;&lt;br&gt;
Why does the "optimized" method lose here? It comes down to the "Setup Tax." Initializing a &lt;code&gt;ref struct&lt;/code&gt; and preparing a &lt;code&gt;stackalloc&lt;/code&gt; buffer takes more time than the actual string processing when &lt;em&gt;N&lt;/em&gt; is small.&lt;/p&gt;

&lt;p&gt;Meanwhile, &lt;strong&gt;StringBuilder&lt;/strong&gt; in .NET 10 has been heavily tuned for small-scale operations. It manages to avoid the heavy allocations of &lt;code&gt;+=&lt;/code&gt; while bypassing the complex initialization required by our manual stack-based approach. At this scale, the runtime's built-in optimizations are simply more efficient than manual memory management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhdyg4qwqd2ea0ro6jgm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhdyg4qwqd2ea0ro6jgm.png" alt="Execution time distribution for N=10" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Figure 3.&lt;/strong&gt; Execution time distribution for &lt;em&gt;N=10&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Don't over-engineer for the small stuff. For small-scale formatting or log messages, standard library tools provide the best balance of performance and maintainability.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. The "GC Fingerprint" (&lt;em&gt;N=10,000&lt;/em&gt;)
&lt;/h3&gt;

&lt;p&gt;When we scale to 10,000 operations, the masks come off. String concatenation at this scale allocates 379.4 MB of garbage. The resulting GC pauses produce the "Camel Effect" on our density plots: a bimodal latency distribution with a second hump of slow, collection-interrupted runs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj5e6zkta6shxlamnsrfe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj5e6zkta6shxlamnsrfe.png" alt="Impact of Garbage Collection on latency" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Figure 4.&lt;/strong&gt; Impact of Garbage Collection on latency.&lt;/p&gt;

&lt;p&gt;Now, compare this to the optimized Zero-Allocation method:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfqdw5p54ybjq9g3btyo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfqdw5p54ybjq9g3btyo.png" alt="Predictability of zero-allocation execution" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Figure 5.&lt;/strong&gt; Predictability of zero-allocation execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on hardware physics:&lt;/strong&gt; Even in Figure 5, where Zero-Allocation is achieved, a microscopic "tail" of jitter is still visible on the right. This isn't the Garbage Collector; it is the "physics of the hardware". OS interrupts, CPU context switching, and cache misses introduce these unavoidable micro-fluctuations. However, compared to the "Camel Effect" of GC pauses, this is just statistical noise, confirming the almost perfect predictability of our approach.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Engineering for Zero-Allocation
&lt;/h3&gt;

&lt;p&gt;How did we achieve this? By staying off the Managed Heap entirely. We combined three pillars of modern .NET:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;ref struct&lt;/code&gt;: Ensures our builder never escapes to the heap.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stackalloc char[256]&lt;/code&gt;: Allocates the initial buffer directly on the stack.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ISpanFormattable&lt;/code&gt;: Writes data directly into memory via &lt;code&gt;TryFormat&lt;/code&gt;, avoiding intermediate &lt;code&gt;ToString()&lt;/code&gt; allocations.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ReadOnlySpan&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Transaction&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Initial buffer on the stack&lt;/span&gt;
    &lt;span class="n"&gt;Span&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;stackalloc&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;512&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;vsb&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ValueStringBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// 2. Zero-allocation formatting&lt;/span&gt;
        &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Amount&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TryFormat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vsb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AppendSpan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;written&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. Final result (the only allocation)&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vsb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; 
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Conclusion: Be Pragmatic
&lt;/h3&gt;

&lt;p&gt;The benchmark results demonstrate that the optimal string manipulation strategy depends entirely on the expected data volume and system requirements.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small scale (&lt;em&gt;N &amp;lt; 50&lt;/em&gt;):&lt;/strong&gt; &lt;strong&gt;StringBuilder&lt;/strong&gt; is technically the winner, offering 40% better performance and 50% fewer allocations than simple concatenation. However, &lt;strong&gt;concatenation&lt;/strong&gt; remains an acceptable choice for one-off tasks where code readability is the top priority.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medium scale (&lt;em&gt;N &amp;lt; 1000&lt;/em&gt;):&lt;/strong&gt; &lt;strong&gt;StringBuilder&lt;/strong&gt; remains the standard efficient approach for general-purpose applications, providing linear scaling with manageable heap pressure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-performance / High-load:&lt;/strong&gt; Implementation of &lt;strong&gt;Zero-Allocation&lt;/strong&gt; patterns (e.g., &lt;code&gt;ValueStringBuilder&lt;/code&gt;) is critical for systems with strict latency requirements. This approach eliminates the bimodal latency distribution caused by Garbage Collection pauses, ensuring deterministic execution times and minimal allocation traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Final decision-making should balance &lt;strong&gt;code complexity&lt;/strong&gt; against &lt;strong&gt;predictability&lt;/strong&gt;. For high-concurrency environments like AWS Lambda, bypassing the managed heap is a primary strategy for cost and latency optimization.&lt;/p&gt;

&lt;p&gt;The full source code and raw BenchmarkDotNet data are available on my GitHub:&lt;br&gt;
👉 &lt;a href="https://github.com/olegKarachun/dotnet-string-optimization-benchmarks" rel="noopener noreferrer"&gt;https://github.com/olegKarachun/dotnet-string-optimization-benchmarks&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>performance</category>
      <category>programming</category>
      <category>aws</category>
    </item>
    <item>
      <title>Battle of the Titans (Part 1): The Ultimate Go Lambda on AWS Graviton</title>
      <dc:creator>Aleh Karachun</dc:creator>
      <pubDate>Thu, 19 Mar 2026 17:04:03 +0000</pubDate>
      <link>https://dev.to/aleh_karachun/battle-of-the-titans-part-1-the-ultimate-go-lambda-on-aws-graviton-2632</link>
      <guid>https://dev.to/aleh_karachun/battle-of-the-titans-part-1-the-ultimate-go-lambda-on-aws-graviton-2632</guid>
      <description>&lt;p&gt;Hi everyone! Welcome to the first part of my series exploring AWS Lambda performance. My goal is to compare Go and .NET Native AOT in a realistic serverless environment.&lt;/p&gt;

&lt;p&gt;To make this a fair benchmark, we aren't just deploying a "Hello World" function. Our Lambda performs a realistic production task: it deserializes a JSON payload of financial transactions, filters them, calculates the total amount, and computes a SHA-256 hash of the IDs to generate a signature (simulating CPU load).&lt;/p&gt;
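&lt;p&gt;That workload can be sketched in plain Go; the &lt;code&gt;Transaction&lt;/code&gt; shape, field names, and &lt;code&gt;processPayload&lt;/code&gt; here are illustrative, not the exact benchmark code:&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Transaction is an illustrative payload shape for the benchmark.
type Transaction struct {
	ID     string  `json:"id"`
	Amount float64 `json:"amount"`
	Status string  `json:"status"`
}

// processPayload deserializes the JSON, filters for completed transactions,
// sums their amounts, and hashes the IDs to simulate CPU-bound signing work.
func processPayload(payload []byte) (float64, string, error) {
	// json.Unmarshal fills the slice through the pointer returned by new.
	txs := new([]Transaction)
	if err := json.Unmarshal(payload, txs); err != nil {
		return 0, "", err
	}
	total := 0.0
	h := sha256.New()
	for _, tx := range *txs {
		if tx.Status != "completed" {
			continue
		}
		total += tx.Amount
		h.Write([]byte(tx.ID))
	}
	return total, hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	payload := []byte(`[{"id":"tx-1","amount":10.5,"status":"completed"},{"id":"tx-2","amount":4.5,"status":"failed"}]`)
	total, sig, err := processPayload(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("total=%.2f signature=%s\n", total, sig)
}
```

&lt;p&gt;The handler shipped to Lambda wraps logic like this behind the &lt;code&gt;aws-lambda-go&lt;/code&gt; runtime interface.&lt;/p&gt;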

&lt;p&gt;Today, we are focusing on setting up and optimizing the Go contender on &lt;strong&gt;ARM64 (Graviton)&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. The Infrastructure (AWS SAM)
&lt;/h3&gt;

&lt;p&gt;We use AWS SAM (Serverless Application Model) to define our infrastructure. It allows us to describe resources declaratively and generates the underlying CloudFormation template.&lt;/p&gt;

&lt;p&gt;Here is the core of our &lt;code&gt;template.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bin/&lt;/span&gt;
&lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bootstrap&lt;/span&gt;
&lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;provided.al2023&lt;/span&gt;
&lt;span class="na"&gt;Architectures&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;arm64&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Key takeaways
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Runtime: provided.al2023&lt;/code&gt;: Amazon Linux 2023 is currently the recommended minimalist OS for compiled languages in AWS. It boots significantly faster than the legacy &lt;code&gt;go1.x&lt;/code&gt; runtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Architectures: arm64&lt;/code&gt;: Targeting AWS Graviton processors. They use a RISC architecture that typically provides around 20% better price/performance for serverless workloads compared to &lt;code&gt;x86_64&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Handler: bootstrap&lt;/code&gt;: When using custom runtimes, AWS Lambda expects the executable binary inside the deployment package to be named exactly &lt;code&gt;bootstrap&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Compiling for Lambda
&lt;/h3&gt;

&lt;p&gt;A standard &lt;code&gt;go build&lt;/code&gt; works, but we can optimize it further for the Lambda environment. Here is the command we use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;arm64 go build &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-tags&lt;/span&gt; lambda.norpc &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ldflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-s -w"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; bin/bootstrap main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Key takeaways
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;GOOS=linux GOARCH=arm64&lt;/code&gt;: This enables cross-compilation, allowing us to build a Linux ARM64 binary directly from our local machine (even if it's x86).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;-tags lambda.norpc&lt;/code&gt;: The &lt;code&gt;al2023&lt;/code&gt; runtime communicates with the Lambda service via an internal HTTP API. This tag tells the compiler to drop the legacy RPC compatibility code from the &lt;code&gt;aws-lambda-go&lt;/code&gt; library, reducing the binary size and initialization time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;-ldflags="-s -w"&lt;/code&gt;: These linker flags strip the symbol table and debug information, resulting in a leaner binary that loads into memory faster.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Local Testing and the "Error 255"
&lt;/h3&gt;

&lt;p&gt;If you develop on an x86 (Intel/AMD) machine and try to test this locally using &lt;code&gt;sam local invoke&lt;/code&gt;, you will likely hit a &lt;strong&gt;Fatal Error 255&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This happens because the Docker container spins up an ARM64 environment, but your host CPU cannot natively execute ARM instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; We need a translator. Running the &lt;code&gt;multiarch/qemu-user-static&lt;/code&gt; Docker image solves this. QEMU intercepts the ARM commands and translates them into x86 instructions for your host CPU on the fly, allowing you to seamlessly test the production binary locally.&lt;/p&gt;
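&lt;p&gt;The setup is a single command; it registers QEMU's binfmt handlers with the host kernel, which is why it needs &lt;code&gt;--privileged&lt;/code&gt;:&lt;/p&gt;

```shell
# Register QEMU emulators for foreign architectures (re-run after a reboot).
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
```

&lt;p&gt;After that, &lt;code&gt;sam local invoke&lt;/code&gt; can run the ARM64 container on an x86 host.&lt;/p&gt;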




&lt;h3&gt;
  
  
  4. Anatomy of a Cold Start
&lt;/h3&gt;

&lt;p&gt;When you run &lt;code&gt;sam deploy --guided&lt;/code&gt;, AWS packages the binary, uploads it to S3, and updates the CloudFormation stack. But the most interesting part happens on the first invocation.&lt;/p&gt;

&lt;p&gt;When we triggered the Lambda, CloudWatch reported an &lt;strong&gt;Init Duration of ~60 ms.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During those ~60 milliseconds, AWS performed the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Allocated a Graviton-based server.&lt;/li&gt;
&lt;li&gt;Provisioned an isolated Firecracker microVM.&lt;/li&gt;
&lt;li&gt;Downloaded the deployment zip from S3 and extracted it.&lt;/li&gt;
&lt;li&gt;Booted the &lt;code&gt;provided.al2023&lt;/code&gt; OS.&lt;/li&gt;
&lt;li&gt;Loaded our &lt;code&gt;bootstrap&lt;/code&gt; binary into memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the environment was warm, subsequent invocations (Warm Starts) took roughly &lt;strong&gt;2 ms&lt;/strong&gt; of compute time with a memory footprint of about &lt;strong&gt;19 MB.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Go on ARM64 with the AL2023 runtime provides an excellent baseline. With extremely low memory consumption and cold starts consistently under 60ms, it is a highly efficient choice for serverless APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s Next?
&lt;/h3&gt;

&lt;p&gt;In &lt;strong&gt;Part 2&lt;/strong&gt;, we will set up our challenger: &lt;strong&gt;.NET 10 Native AOT&lt;/strong&gt;. We will explore how to configure the C# project with Zero-Allocation techniques and Source Generators to see if it can match or beat Go's numbers.&lt;/p&gt;

&lt;p&gt;The full source code for this setup is available in my GitHub repository:&lt;br&gt;
👉 &lt;a href="https://github.com/olegKarachun/aws-lambda-go-graviton" rel="noopener noreferrer"&gt;https://github.com/olegKarachun/aws-lambda-go-graviton&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>aws</category>
      <category>serverless</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
