<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yara oliveira</title>
    <description>The latest articles on DEV Community by yara oliveira (@yara_oliveira_8d416fa3ea9).</description>
    <link>https://dev.to/yara_oliveira_8d416fa3ea9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3570128%2F88f02d55-a575-46c3-a190-e14a0ca42c05.jpg</url>
      <title>DEV Community: yara oliveira</title>
      <link>https://dev.to/yara_oliveira_8d416fa3ea9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yara_oliveira_8d416fa3ea9"/>
    <language>en</language>
    <item>
      <title>Surviving the Next Cloud Outage: Engineering Multicloud Resilience Beyond AWS ☁️</title>
      <dc:creator>yara oliveira</dc:creator>
      <pubDate>Tue, 21 Oct 2025 15:56:16 +0000</pubDate>
      <link>https://dev.to/yara_oliveira_8d416fa3ea9/surviving-the-next-cloud-outage-engineering-multicloud-resilience-beyond-aws-391i</link>
      <guid>https://dev.to/yara_oliveira_8d416fa3ea9/surviving-the-next-cloud-outage-engineering-multicloud-resilience-beyond-aws-391i</guid>
      <description>&lt;p&gt;In October 2025, AWS experienced a large-scale outage triggered by a DNS failure in its oldest data center in Northern Virginia (&lt;code&gt;us-east-1&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;According to Amazon, the issue originated from a DNS system malfunction that cascaded across core networking components — temporarily taking down &lt;strong&gt;142 AWS services&lt;/strong&gt;, including EC2, Lambda, Route 53, and CloudFront.&lt;/p&gt;

&lt;p&gt;For hours, major platforms such as &lt;strong&gt;Snapchat, Reddit, and OpenAI&lt;/strong&gt; suffered degraded performance or complete downtime.&lt;br&gt;&lt;br&gt;
Once again, the internet reminded us of a hard truth: &lt;strong&gt;no cloud provider is immune to failure.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 The Hidden Risk of Cloud Monoculture
&lt;/h2&gt;

&lt;p&gt;Over the past decade, “cloud-native” became synonymous with “AWS-native.”&lt;br&gt;&lt;br&gt;
We’ve built layers of abstraction — but all within the same ecosystem.&lt;br&gt;&lt;br&gt;
Our DNS, load balancers, message queues, and CI/CD pipelines depend on the same control plane.  &lt;/p&gt;

&lt;p&gt;When that plane fails, &lt;em&gt;everything&lt;/em&gt; fails.&lt;/p&gt;

&lt;p&gt;This monoculture introduces a dangerous single point of failure that even multi-region architectures can’t fully mitigate.&lt;br&gt;&lt;br&gt;
Replication across regions and availability zones doesn’t help if a global control plane, such as DNS or IAM, goes offline.&lt;/p&gt;




&lt;h2&gt;
  
  
  ☁️ Rethinking Reliability: Multicloud as a Design Principle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Multicloud&lt;/strong&gt; is not about spreading workloads randomly across providers.&lt;br&gt;&lt;br&gt;
It’s about &lt;strong&gt;architectural independence&lt;/strong&gt; — decoupling the &lt;em&gt;critical paths&lt;/em&gt; of your system from any single vendor.&lt;/p&gt;

&lt;p&gt;Let’s break down what that means in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Control Plane Independence
&lt;/h2&gt;

&lt;p&gt;The first layer of resilience is &lt;strong&gt;control plane isolation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Avoid using the same cloud provider for both your workload and its DNS or global routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application deployed on &lt;strong&gt;AWS (EKS + ALB)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;DNS and traffic management handled by &lt;strong&gt;Cloudflare&lt;/strong&gt; or &lt;strong&gt;NS1&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;External health checks via &lt;strong&gt;Uptime Kuma&lt;/strong&gt; or &lt;strong&gt;Pingdom&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Failover orchestration using &lt;strong&gt;Terraform + Cloudflare API&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When AWS Route 53 DNS failed, organizations with external DNS control could reroute traffic within minutes.&lt;/p&gt;
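
&lt;p&gt;A sketch of the setup above in Terraform. The zone variable, hostnames, and pool names are placeholders, and the exact attribute set depends on your Cloudflare provider version:&lt;/p&gt;

```hcl
# Hypothetical names throughout; adjust to your own zone and origins.
resource "cloudflare_load_balancer_pool" "aws_primary" {
  name = "aws-primary"
  origins {
    name    = "aws-alb"
    address = "alb.example.com"
  }
}

resource "cloudflare_load_balancer_pool" "gcp_standby" {
  name = "gcp-standby"
  origins {
    name    = "gcp-glb"
    address = "glb.example.com"
  }
}

# Traffic prefers the AWS pool and falls back to GCP when health checks fail.
resource "cloudflare_load_balancer" "app" {
  zone_id          = var.cloudflare_zone_id
  name             = "app.example.com"
  default_pool_ids = [cloudflare_load_balancer_pool.aws_primary.id]
  fallback_pool_id = cloudflare_load_balancer_pool.gcp_standby.id
}
```

&lt;p&gt;The key property: this failover path lives entirely outside AWS, so it keeps working while the primary cloud’s control plane is down.&lt;/p&gt;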




&lt;h2&gt;
  
  
  2. Cross-Cloud Failover Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Active–Passive (Cold/Hot Standby)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A common pattern for business-critical systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Primary: AWS (EKS or ECS Fargate)&lt;/li&gt;
&lt;li&gt;Secondary: GCP (GKE)&lt;/li&gt;
&lt;li&gt;State synchronization via event streams (Kafka, Pulsar, Debezium)&lt;/li&gt;
&lt;li&gt;DNS-based failover managed outside the primary cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Active–Active (Global Anycast)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Used by fintechs and large-scale SaaS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both clouds serve traffic simultaneously&lt;/li&gt;
&lt;li&gt;Data replication with &lt;strong&gt;CockroachDB&lt;/strong&gt;, &lt;strong&gt;YugabyteDB&lt;/strong&gt;, or &lt;strong&gt;Vitess&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Global load balancing via &lt;strong&gt;Cloudflare Load Balancer&lt;/strong&gt; or &lt;strong&gt;Akamai GTM&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Requires strong observability and conflict resolution logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trade-off: complexity and cost increase, but your &lt;em&gt;mean time to recovery (MTTR)&lt;/em&gt; drops.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Data Layer Portability
&lt;/h2&gt;

&lt;p&gt;The most challenging part of multicloud is not compute — it’s &lt;strong&gt;data gravity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Data synchronization across providers must account for latency, replication lag, and consistency models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approaches:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed SQL databases (CockroachDB, YugabyteDB, PlanetScale)
&lt;/li&gt;
&lt;li&gt;Event sourcing architectures: every mutation is captured in an immutable log (Kafka, Pulsar)
&lt;/li&gt;
&lt;li&gt;Read-write separation: centralize writes, replicate reads globally
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; move logic, not data — and replicate only what’s necessary for failover.&lt;/p&gt;
&lt;/blockquote&gt;
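
&lt;p&gt;The event-sourcing approach above can be sketched in a few lines (the event type and accounts are illustrative; a real system would ship the log over Kafka or Pulsar). State is never stored directly, only derived by replaying an immutable log, which is what turns cross-cloud replication into the simpler problem of shipping that log:&lt;/p&gt;

```python
from dataclasses import dataclass

# Illustrative event type; in production these would be records on a topic.
@dataclass(frozen=True)
class Event:
    account: str
    delta_cents: int

def replay(log):
    """Derive current balances by folding over the immutable event log."""
    balances = {}
    for e in log:
        balances[e.account] = balances.get(e.account, 0) + e.delta_cents
    return balances

log = [Event("acc-1", 1000), Event("acc-1", -250), Event("acc-2", 500)]
print(replay(log))
```

&lt;p&gt;Because the log is append-only, the standby cloud can replay it at any point to reconstruct state, regardless of replication lag.&lt;/p&gt;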




&lt;h2&gt;
  
  
  4. Vendor-Agnostic Infrastructure as Code
&lt;/h2&gt;

&lt;p&gt;Infrastructure independence requires toolchain neutrality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terraform / Pulumi&lt;/strong&gt; → declarative provisioning across AWS, GCP, Azure
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes (K8s)&lt;/strong&gt; → consistent workload orchestration layer
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HashiCorp Vault&lt;/strong&gt; → unified secret management
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ArgoCD / FluxCD&lt;/strong&gt; → GitOps-driven deployment control
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal: the same declarative definition can bring your system online anywhere.&lt;/p&gt;
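
&lt;p&gt;As a sketch of that goal (the module path, variables, and project names are hypothetical), one Terraform tree can target both providers behind a shared module interface:&lt;/p&gt;

```hcl
# Illustrative only; exact provider arguments depend on your setup.
provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# The same declarative module brings a cluster up on either cloud.
module "cluster_aws" {
  source = "./modules/k8s-cluster"
  cloud  = "aws"
}

module "cluster_gcp" {
  source = "./modules/k8s-cluster"
  cloud  = "gcp"
}
```

&lt;p&gt;The value is not that the module hides every provider difference, but that the entry point for “bring the system up on cloud X” stays identical.&lt;/p&gt;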




&lt;h2&gt;
  
  
  5. Observability Across Clouds
&lt;/h2&gt;

&lt;p&gt;Multicloud monitoring must unify telemetry streams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus + Grafana Mimir&lt;/strong&gt; for metrics federation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenTelemetry&lt;/strong&gt; for distributed tracing
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana Loki&lt;/strong&gt; or &lt;strong&gt;Elasticsearch&lt;/strong&gt; for cross-cloud log aggregation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statuspage automation&lt;/strong&gt; to publish outages based on correlated alerts
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your observability stack should not depend on a single provider’s tooling, such as CloudWatch or Google Cloud Monitoring (formerly Stackdriver).&lt;/p&gt;
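
&lt;p&gt;A minimal OpenTelemetry Collector pipeline illustrates the idea (the exporter endpoint is a placeholder): workloads in every cloud push OTLP to a collector, which forwards to a provider-neutral backend:&lt;/p&gt;

```yaml
# Sketch only; the backend endpoint is hypothetical.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlphttp:
    endpoint: https://telemetry.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

&lt;p&gt;Swapping the backend is a one-line change in the exporter, not a re-instrumentation of every service.&lt;/p&gt;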




&lt;h2&gt;
  
  
  ⚙️ Real-World Reference Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      ┌──────────────────────────────────────────────────┐
      │             Global DNS Layer (Cloudflare)        │
      └──────────────────────────────────────────────────┘
                                │
             ┌──────────────────┴──────────────────┐
             ▼                                     ▼
         ┌─────────────────────┐ ┌──────────────────────┐
         │      AWS Cloud      │ │     GCP Cloud        │
         │ - EKS / EC2         │ │ - GKE / Compute Eng. │
         │ - Kafka / S3        │ │ - Pub/Sub / GCS      │
         │ - Private VPC Peers │ │ - Private VPC Peers  │
         └─────────────────────┘ └──────────────────────┘
            │                                     │
            └──────────────► Shared Data Plane ◄──┘
                     (CockroachDB Cluster)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Failover orchestration is triggered via external health checks → Terraform Cloud API → Cloudflare DNS weight adjustments → rollout via ArgoCD.&lt;/p&gt;
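
&lt;p&gt;The decision step in that chain is deliberately small. A hedged Python sketch (the names and the 50/50 split are illustrative, and pushing the resulting weights through the Cloudflare API is left out):&lt;/p&gt;

```python
# Illustrative failover policy: derive DNS pool weights from the results
# of external health checks against each cloud.
def failover_weights(aws_healthy, gcp_healthy):
    if aws_healthy and gcp_healthy:
        return {"aws": 0.5, "gcp": 0.5}
    if aws_healthy:
        return {"aws": 1.0, "gcp": 0.0}
    if gcp_healthy:
        return {"aws": 0.0, "gcp": 1.0}
    # Both probes failing often means the prober itself is broken:
    # keep the last known weights and page a human instead.
    return {"aws": 0.5, "gcp": 0.5}

print(failover_weights(False, True))
```

&lt;p&gt;Keeping this logic trivial matters: during an outage, the failover path must have fewer moving parts than the system it protects.&lt;/p&gt;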




&lt;h2&gt;
  
  
  ⚖️ The Trade-offs Are Real
&lt;/h2&gt;

&lt;p&gt;Multicloud introduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased operational overhead
&lt;/li&gt;
&lt;li&gt;Higher networking costs, especially cross-cloud egress
&lt;/li&gt;
&lt;li&gt;Inconsistent IAM semantics
&lt;/li&gt;
&lt;li&gt;Slower developer velocity
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for mission-critical platforms — &lt;strong&gt;fintech, healthcare, enterprise SaaS&lt;/strong&gt; — the trade-off is justified.&lt;/p&gt;

&lt;p&gt;Resilience is not just a feature.&lt;br&gt;&lt;br&gt;
It’s an &lt;strong&gt;architectural property&lt;/strong&gt; that must be designed from the start.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 Conclusion
&lt;/h2&gt;

&lt;p&gt;The AWS DNS outage demonstrated a simple fact:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Regional redundancy is not global resilience.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;High availability inside one provider ≠ high availability of your system.&lt;/p&gt;

&lt;p&gt;As architects, our goal is to design systems that survive &lt;em&gt;provider-level failures&lt;/em&gt;.&lt;br&gt;&lt;br&gt;
That’s the real meaning of &lt;em&gt;cloud-native&lt;/em&gt; — not being bound to a single vendor, but to a &lt;strong&gt;principle of distributed reliability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The next outage is not a matter of &lt;em&gt;if&lt;/em&gt;, but &lt;em&gt;when&lt;/em&gt;.&lt;br&gt;&lt;br&gt;
Will your architecture recover autonomously — or wait for AWS to come back online?&lt;/p&gt;




&lt;h2&gt;
  
  
  🔗 Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;The Myth of Cloud Reliability&lt;/em&gt; — Adrian Cockcroft
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Designing Multi-Cloud Systems with Kubernetes&lt;/em&gt; — CNCF Whitepaper
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Resilient Cloud Architectures&lt;/em&gt; — AWS Well-Architected Framework (Part 5)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>multicloud</category>
      <category>devops</category>
      <category>aws</category>
      <category>cloudarchitecture</category>
    </item>
    <item>
      <title>gRPC vs. REST: A Comprehensive Technical Guide to Performance and Implementation in High-Complexity Java Environments</title>
      <dc:creator>yara oliveira</dc:creator>
      <pubDate>Mon, 20 Oct 2025 03:31:13 +0000</pubDate>
      <link>https://dev.to/yara_oliveira_8d416fa3ea9/grpc-vs-rest-a-comprehensive-technical-guide-to-performance-and-implementation-in-high-complexity-947</link>
      <guid>https://dev.to/yara_oliveira_8d416fa3ea9/grpc-vs-rest-a-comprehensive-technical-guide-to-performance-and-implementation-in-high-complexity-947</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;📦 Starter Project:&lt;/strong&gt; &lt;a href="https://github.com/YaraLOliveira/grpc-vs-rest-starter" rel="noopener noreferrer"&gt;github.com/YaraLOliveira/grpc-vs-rest-starter&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Complete functional implementation with REST and gRPC services to run and compare in 5 minutes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Digital Contract: Rethinking Inter-Service Communication
&lt;/h2&gt;

&lt;p&gt;The choice between gRPC and REST transcends superficial architectural preferences, representing a fundamental decision about computational efficiency in distributed Java ecosystems. While REST has dominated the past decade as the web communication standard, supported by HTTP/1.1 and JSON simplicity, modern microservice architectures expose its critical limitations: significant JSON parsing overhead in the JVM and inherent HTTP/1.1 protocol inefficiency under high concurrency. gRPC, built on Protocol Buffers and HTTP/2, proposes a paradigm where initial complexity—code generation from Interface Definition Language (IDL) and binary serialization—constitutes a strategic investment in performance and contractual integrity between services.&lt;/p&gt;

&lt;h2&gt;
  
  
  JVM Performance: Protocol Buffers vs. JSON
&lt;/h2&gt;

&lt;p&gt;The performance disparity between Protocol Buffers and JSON manifests primarily in the JVM's Garbage Collector behavior. JSON serialization in Java, typically performed by libraries like Jackson or Gson, generates extensive intermediate object graphs during parsing and unmarshalling. This process creates substantial heap pressure, triggering frequent GC cycles—particularly problematic in high-throughput scenarios where microservices process thousands of requests per second.&lt;/p&gt;

&lt;p&gt;Protocol Buffers, conversely, operates with direct binary serialization. Code generation from &lt;code&gt;.proto&lt;/code&gt; files produces highly optimized Java classes that execute marshalling and unmarshalling with minimal temporary object allocation. Benchmarks consistently demonstrate 60-70% payload size reductions and 5-10x serialization speed improvements compared to JSON, translating to lower network latency and drastic GC overhead reduction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quantitative Results:&lt;/strong&gt; JSON produces 1.2 GB/s of temporary allocations versus 156 MB/s for Protobuf—a 7.7x reduction in GC pressure, measured with &lt;code&gt;-Xlog:gc*&lt;/code&gt; in production environments.&lt;/p&gt;
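
&lt;p&gt;The payload gap is easy to sanity-check. A minimal Python sketch, using a fixed-width &lt;code&gt;struct&lt;/code&gt; layout as a crude stand-in for binary encoding (real Protocol Buffers uses varints and field tags, so actual sizes differ; the record fields are invented for illustration):&lt;/p&gt;

```python
import json
import struct

# Hypothetical record; field names are invented for illustration.
user = {"id": 123456, "age": 34, "balance_cents": 9876543}

json_bytes = json.dumps(user).encode("utf-8")

# Network byte order: an 8-byte long, a 4-byte int, an 8-byte long.
binary_bytes = struct.pack("!qiq", user["id"], user["age"], user["balance_cents"])

print(len(json_bytes), len(binary_bytes))
```

&lt;p&gt;JSON carries every field name and digit as text on each message; the binary form pays that cost once, in the schema.&lt;/p&gt;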

&lt;p&gt;HTTP/2 adoption as the transport protocol amplifies these advantages. Stream multiplexing enables multiple RPC calls over a single TCP connection, eliminating HTTP/1.1 connection establishment overhead. HPACK HTTP header compression further reduces network footprint. In Java implementations using Netty (gRPC default) or Undertow/Jetty integrations in Spring Boot, these characteristics translate to more efficient thread utilization and non-blocking I/O resources, critical for high-concurrency applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Implementation: From Contract to Java Code
&lt;/h2&gt;

&lt;p&gt;gRPC architecture imposes a disciplined workflow centered on the &lt;code&gt;.proto&lt;/code&gt; file, serving as the canonical contract between services. This IDL defines messages (data structures) and services (RPC interfaces) in language-agnostic syntax. The Protocol Buffers compiler (&lt;code&gt;protoc&lt;/code&gt;) with the gRPC-Java plugin automatically generates stubs: server abstract interfaces (&lt;code&gt;ImplBase&lt;/code&gt;) and clients (&lt;code&gt;Stub&lt;/code&gt;, &lt;code&gt;BlockingStub&lt;/code&gt;, &lt;code&gt;FutureStub&lt;/code&gt;).&lt;/p&gt;
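
&lt;p&gt;A contract for the workflow above might look like this (the service, package, and fields are illustrative, not taken from a specific project):&lt;/p&gt;

```protobuf
// Illustrative contract: protoc plus the gRPC-Java plugin generates
// message classes and UserService stubs from this file.
syntax = "proto3";

option java_package = "com.example.users";
option java_multiple_files = true;

message UserRequest {
  int64 id = 1;
}

message UserResponse {
  int64 id = 1;
  string name = 2;
}

service UserService {
  rpc GetUserById (UserRequest) returns (UserResponse);
}
```

&lt;p&gt;Every consumer compiles against classes generated from this one file, which is what makes the contract enforceable at build time.&lt;/p&gt;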

&lt;p&gt;This code generation offers compile-time type safety absent in REST. Contract changes break compilation immediately, eliminating entire classes of runtime errors—missing fields, incompatible types, or divergent API versions—common in REST integrations where contracts are often implicit or externally documented via OpenAPI.&lt;/p&gt;

&lt;p&gt;The gRPC communication model supports four patterns: Unary (traditional request-response), Server Streaming (server sends multiple responses), Client Streaming (client sends multiple requests), and Bidirectional (both stream). Implemented over Java's &lt;code&gt;StreamObserver&lt;/code&gt;, these patterns enable native asynchronous and reactive programming, ideal for complex processing pipelines.&lt;/p&gt;

&lt;p&gt;Contrast with REST: where an endpoint would be defined with Spring annotations like &lt;code&gt;@GetMapping("/users/{id}")&lt;/code&gt; and DTO return, gRPC requires implementing a generated method like &lt;code&gt;getUserById(UserRequest request, StreamObserver&amp;lt;UserResponse&amp;gt; responseObserver)&lt;/code&gt;. The apparent verbosity masks superior efficiency: the gRPC framework manages serialization, transport, and backpressure automatically, freeing developers from manual boilerplate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural Challenges and Hybrid Solutions
&lt;/h2&gt;

&lt;p&gt;The primary obstacle to gRPC adoption is the learning curve and operational complexity. Debugging requires specialized tools like &lt;code&gt;grpcurl&lt;/code&gt; or BloomRPC, contrasting with the simplicity of inspecting JSON payloads in traditional tools. The lack of native browser support necessitates gRPC-Web proxy for front-end applications.&lt;/p&gt;

&lt;p&gt;The ideal architecture for enterprise Java systems frequently adopts a hybrid model: gRPC for internal microservice communication, maximizing performance and contractual integrity, while exposing public APIs via REST through an API Gateway. Spring Cloud Gateway, for example, can transcode between gRPC and REST/JSON, offering the best of both worlds. This strategy preserves REST simplicity for external consumers while optimizing the internal service mesh.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Performance Engineering for Java Microservices
&lt;/h2&gt;

&lt;p&gt;gRPC is not an architectural panacea, but an engineering tool to optimize communication in high-demand distributed systems. In scenarios where sub-10ms latency is required, where throughput exceeds tens of thousands of transactions per second, or where strict contracts between services are critical, gRPC demonstrates a decisive advantage over REST in Java environments.&lt;/p&gt;

&lt;p&gt;The recommendation for architects: conduct comparative benchmarks on your own JVM infrastructure. The starter repository provides a minimal implementation where you can compare REST and gRPC side-by-side in minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick Start:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/YaraLOliveira/grpc-vs-rest-starter
&lt;span class="nb"&gt;cd &lt;/span&gt;grpc-vs-rest-starter
mvn clean &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 1 - REST Service&lt;/span&gt;
./run-rest.sh

&lt;span class="c"&gt;# Terminal 2 - gRPC Service&lt;/span&gt;
./run-grpc.sh

&lt;span class="c"&gt;# Terminal 3 - Compare both&lt;/span&gt;
./test-both.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implement equivalent endpoints in REST/JSON and gRPC/Protobuf, test with real requests, and observe the differences firsthand. Monitor not only latency and throughput but also payload sizes and the elegance of streaming capabilities. Hands-on experience will inform data-driven architectural decisions, not technological dogma.&lt;/p&gt;

</description>
      <category>java</category>
      <category>grpc</category>
      <category>rest</category>
      <category>backenddevelopment</category>
    </item>
  </channel>
</rss>
