<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: João Vitor Nascimento Mendonca</title>
    <description>The latest articles on DEV Community by João Vitor Nascimento Mendonca (@joaovitorfortuna).</description>
    <link>https://dev.to/joaovitorfortuna</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873770%2F402a4344-569d-4eb5-ada4-aa2f1515e7ef.png</url>
      <title>DEV Community: João Vitor Nascimento Mendonca</title>
      <link>https://dev.to/joaovitorfortuna</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joaovitorfortuna"/>
    <language>en</language>
    <item>
      <title>Beyond Auto-scaling: Engineering Cost-Efficiency into Cloud-Native Architectures</title>
      <dc:creator>João Vitor Nascimento Mendonca</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:46:53 +0000</pubDate>
      <link>https://dev.to/joaovitorfortuna/title-beyond-auto-scaling-engineering-cost-efficiency-into-cloud-native-architectures-3kmi</link>
      <guid>https://dev.to/joaovitorfortuna/title-beyond-auto-scaling-engineering-cost-efficiency-into-cloud-native-architectures-3kmi</guid>
      <description>&lt;p&gt;By: João Vitor Nascimento De Mendonça&lt;br&gt;
Series: Modern Infrastructure Series, Part 2&lt;/p&gt;

&lt;p&gt;How to move from "scaling at all costs" to "resource stewardship" using VPA, Graviton, and Spot Instances.&lt;/p&gt;

&lt;p&gt;1. The Fallacy of "Infinite" Cloud&lt;br&gt;
In the early days of cloud adoption, the mantra was "scale at all costs." In 2026, the industry has shifted: the new gold standard is Resource Stewardship. I recently audited a microservices environment where the cloud bill was growing faster than the user base. The culprit wasn't traffic; it was over-provisioning and a lack of cost-awareness in the development cycle. Here is how we re-engineered the platform for efficiency.&lt;/p&gt;

&lt;p&gt;2. Data-Driven Right-Sizing&lt;br&gt;
We stopped relying on "gut feelings" for Kubernetes resource requests. Instead of guessing how much memory a service needed, we ran the Vertical Pod Autoscaler (VPA) in recommendation mode.&lt;/p&gt;

&lt;p&gt;The Insight: 60% of our services were using less than 20% of their allocated CPU.&lt;/p&gt;

&lt;p&gt;The Action: We automated the adjustment of requests and limits to match real-world P95 usage.&lt;/p&gt;

&lt;p&gt;The Result: A 35% reduction in wasted cluster capacity overnight.&lt;/p&gt;

&lt;p&gt;3. Embracing Spot Instances and ARM64&lt;br&gt;
We re-architected non-critical workloads to run on Amazon EC2 Spot Instances paired with AWS Graviton (ARM64) processors.&lt;/p&gt;

&lt;p&gt;Handling Interruptions&lt;br&gt;
To use Spot Instances safely, we implemented graceful shutdown handlers that catch the two-minute interruption notice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Simplified logic for Spot interruption handling
func handleTermination() {
    termChan := make(chan os.Signal, 1)
    signal.Notify(termChan, syscall.SIGTERM)

    &amp;lt;-termChan // Interruption signal received
    log.Println("Spot instance terminating. Shifting state to Redis...")

    // Logic to drain connections and save state
    cache.SaveWorkerState(currentJobs)
    os.Exit(0)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Outcome: This shift resulted in a 60% cost reduction for our CI/CD pipelines and data-processing workers.&lt;/p&gt;

&lt;p&gt;4. "FinOps as Code"&lt;br&gt;
Engineering excellence is no longer just about uptime; it is also about financial visibility. We integrated cost-estimation tools (such as Infracost) directly into our Terraform pipelines. Every pull request now displays an estimated monthly cost change. If a developer tries to provision a massively oversized RDS instance, the system flags it during code review, not when the bill arrives.&lt;/p&gt;

&lt;p&gt;5. Conclusion&lt;br&gt;
A great architect builds systems that are as lean as they are powerful. By treating cost as a first-class metric, right alongside latency and availability, we build more sustainable technology. How is your team handling cloud costs this year? Let's discuss below!&lt;/p&gt;
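&lt;p&gt;The P95-based right-sizing can be sketched as follows. This is a minimal illustration only: the function names and the 15% headroom are assumptions, and the real VPA recommender aggregates usage into decaying histograms rather than raw sample slices.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// p95 returns the 95th-percentile value of a set of usage samples.
// Illustrative only; the real VPA recommender uses decaying histograms.
func p95(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	idx := int(math.Ceil(0.95*float64(len(s)))) - 1
	if idx > len(s)-1 {
		idx = len(s) - 1
	}
	return s[idx]
}

// recommendRequest adds 15% headroom (an assumed buffer) over observed P95 usage.
func recommendRequest(samples []float64) float64 {
	return p95(samples) * 1.15
}

func main() {
	// 100 samples of CPU usage in millicores: mostly a 50m baseline, one spike.
	usage := make([]float64, 100)
	for i := range usage {
		usage[i] = 50
	}
	usage[99] = 400
	fmt.Printf("recommended request: %.0f millicores\n", recommendRequest(usage))
}
```

&lt;p&gt;The same percentile logic applies to memory; only the unit of the samples changes.&lt;/p&gt;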

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Beyond Logs: Implementing Tracing and Golden Signals for Distributed Systems</title>
      <dc:creator>João Vitor Nascimento Mendonca</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:41:36 +0000</pubDate>
      <link>https://dev.to/joaovitorfortuna/beyond-logs-implementing-tracing-and-golden-signals-for-distributed-systems-2kke</link>
      <guid>https://dev.to/joaovitorfortuna/beyond-logs-implementing-tracing-and-golden-signals-for-distributed-systems-2kke</guid>
      <description>&lt;p&gt;By: João Vitor Nascimento De Mendonça&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The Observability Gap&lt;br&gt;
In a microservices environment, having logs is not enough. When a request fails, you need to know exactly where the bottleneck is. I recently moved a legacy monitoring setup to an OpenTelemetry-based tracing system to solve "hidden" latencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Four Golden Signals&lt;br&gt;
We focused on the Google SRE Golden Signals:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Latency: Time it takes to service a request.&lt;/p&gt;

&lt;p&gt;Traffic: Demand placed on the system.&lt;/p&gt;

&lt;p&gt;Errors: The rate of requests that fail.&lt;/p&gt;

&lt;p&gt;Saturation: How "full" your service is.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Implementation: Distributed Tracing&lt;br&gt;
By injecting Trace IDs across services, we could visualize the entire request lifecycle. We discovered that a specific middleware was adding 120ms of unnecessary overhead to every auth request—a find that logs alone couldn't pinpoint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conclusion&lt;br&gt;
Logs tell you what happened; traces tell you where and why. If you aren't using distributed tracing in 2026, you are flying blind.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Post 3: Data Scalability&lt;br&gt;
Database Sharding vs. Read Replicas: How We Scaled for 1M Concurrent Users&lt;br&gt;
By: João Vitor Nascimento De Mendonça&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The Vertical Scaling Wall&lt;br&gt;
There comes a point where "just get a bigger instance" stops working for your database. We hit that wall when our RDS instance reached 90% CPU utilization even on the largest tier.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strategy 1: CQRS and Read Replicas&lt;br&gt;
The first step was separating concerns using the CQRS (Command Query Responsibility Segregation) pattern.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All writes go to the Primary instance.&lt;/p&gt;

&lt;p&gt;All heavy GET requests are load-balanced across Read Replicas.&lt;/p&gt;

&lt;p&gt;Result: CPU usage dropped by 40% instantly.&lt;/p&gt;
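&lt;p&gt;The read/write split can be sketched as a small router. The connection strings are hypothetical, and a production setup would typically let a proxy or driver handle replica selection.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math/rand"
)

// Router sends writes to the primary and load-balances reads across
// replicas, mirroring the CQRS split described above.
type Router struct {
	primary  string
	replicas []string
}

// Writer always returns the primary: commands mutate state.
func (r *Router) Writer() string {
	return r.primary
}

// Reader picks a random replica: queries are side-effect free.
func (r *Router) Reader() string {
	if len(r.replicas) == 0 {
		return r.primary
	}
	return r.replicas[rand.Intn(len(r.replicas))]
}

func main() {
	r := Router{
		primary:  "db-primary.internal:5432",
		replicas: []string{"db-ro-1.internal:5432", "db-ro-2.internal:5432"},
	}
	fmt.Println("write to:", r.Writer())
	fmt.Println("read from:", r.Reader())
}
```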

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Strategy 2: Application-Level Sharding&lt;br&gt;
For the most critical tables, we implemented Sharding based on user_id. By distributing data across multiple physical shards, we ensured that no single database became a single point of failure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conclusion&lt;br&gt;
Scaling databases is 10% hardware and 90% architecture. Sharding is complex, but for massive scale, it’s the only path forward.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
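&lt;p&gt;The user_id-based shard routing can be sketched in a few lines. FNV-1a is an illustrative choice here; any stable hash works, and real deployments usually add consistent hashing so shard counts can change without a full reshuffle.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a user_id to one of n shards deterministically, so the
// same user always lands on the same physical database.
func shardFor(userID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32() % n
}

func main() {
	for _, id := range []string{"user-1001", "user-1002", "user-1003"} {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id, 4))
	}
}
```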

</description>
      <category>sre</category>
      <category>monitoring</category>
      <category>distributedtracing</category>
      <category>observability</category>
    </item>
    <item>
      <title>Beyond the Perimeter: Implementing Zero Trust and Ephemeral Identities in Multi-Cloud Environments</title>
      <dc:creator>João Vitor Nascimento Mendonca</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:18:50 +0000</pubDate>
      <link>https://dev.to/joaovitorfortuna/alem-do-perimetro-implementando-zero-trust-e-identidades-efemeras-em-ambientes-multi-cloud-49lc</link>
      <guid>https://dev.to/joaovitorfortuna/alem-do-perimetro-implementando-zero-trust-e-identidades-efemeras-em-ambientes-multi-cloud-49lc</guid>
      <description>&lt;p&gt;By: João Vitor Nascimento De Mendonça&lt;/p&gt;

&lt;p&gt;Field: Cybersecurity / Cloud Engineering&lt;/p&gt;

&lt;p&gt;Publication: Cloud Architecture Hub / Independent Technical Series&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Death of the "Castle and Moat" Model
Until recently, network security relied on the idea of a strong perimeter: once you were inside the VPN, you were trusted. In 2026, with the fragmentation of microservices and multi-cloud architectures (AWS, GCP, Azure), this model has failed. The perimeter is no longer the network; the perimeter is now Identity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Implementing a Zero Trust Architecture (ZTA) starts with a simple but rigorous principle: "Never trust, always verify." It doesn't matter if the request comes from inside or outside the network; every access must be authenticated and authorized.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Technical Implementation: Ephemeral Identities
The greatest security risk today is static credentials (API keys that never expire). To mitigate this, I moved our infrastructure to an Ephemeral Identity model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Short-Lived Tokens: Instead of static keys, we use tools like HashiCorp Vault or AWS IAM Roles Anywhere to generate credentials with a Time-to-Live (TTL) of only 15 minutes.&lt;/p&gt;

&lt;p&gt;mTLS (Mutual TLS): We implemented mTLS via a Service Mesh (Istio). This ensures that every microservice has its own digital certificate and that communication is encrypted and verified at both ends.&lt;/p&gt;

&lt;p&gt;Policy Enforcement (Open Policy Agent - OPA)&lt;br&gt;
To ensure, for example, that no S3 bucket is created without encryption or left publicly readable, we use "Security as Code" to block non-compliant infrastructure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# OPA rule to prevent public buckets
package cloud.security

deny[msg] {
    input.resource == "aws_s3_bucket"
    input.attributes.acl == "public-read"
    msg := "ERROR: Public buckets are not allowed by compliance policy."
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;Benefits and Success Metrics
The transition to Zero Trust isn't just about security; it’s about operational efficiency. By automating identity management, we observed:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;98% Reduction in the exposure window in the event of credential leakage (due to short TTL).&lt;/p&gt;

&lt;p&gt;Automatic Compliance: Significantly less time spent on manual audits, as policies are enforced directly within the CI/CD pipeline.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Conclusion
Modern security cannot be a "bottleneck" for development. By transforming security into code and adopting ephemeral identities, we allow engineering teams to move fast, with the certainty that every byte exchanged across clouds is protected and verified.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>security</category>
      <category>cloud</category>
      <category>devops</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Mitigating I/O Bottlenecks in Event-Driven Architectures: A Deep Dive into Backpressure and Resiliency</title>
      <dc:creator>João Vitor Nascimento Mendonca</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:10:35 +0000</pubDate>
      <link>https://dev.to/joaovitorfortuna/mitigando-gargalos-de-io-em-arquiteturas-orientadas-a-eventos-backpressure-retries-e-tuning-de-33ac</link>
      <guid>https://dev.to/joaovitorfortuna/mitigando-gargalos-de-io-em-arquiteturas-orientadas-a-eventos-backpressure-retries-e-tuning-de-33ac</guid>
      <description>&lt;p&gt;By: João Vitor Nascimento De Mendonça&lt;br&gt;
Originally published in Engineering Weekly / Tech Blog&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Scenario: The Chaos of Unmanaged Scale
In modern architectures, using Apache Kafka or RabbitMQ solves decoupling issues but creates a new challenge: throughput disparity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I recently observed a scenario where a producer was injecting 50k msgs/s, while the consumer, limited by a third-party API, could only process 10k msgs/s. The result? Dropped metrics, heap-memory exhaustion, and cascading latency across the entire system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Backpressure and Concurrency Control
To solve this, simply "scaling the pod" isn't enough. I implemented Semaphore-based Concurrency Control. In Go, for instance, we use buffered channels as semaphores to limit active workers:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Example of a concurrency limiter for DB protection
var semaphore = make(chan struct{}, 50) // Limit to 50 active workers

func processEvent(event Event) {
    semaphore &amp;lt;- struct{}{} // Acquire slot
    defer func() { &amp;lt;-semaphore }() // Release slot

    // Processing logic and DB persistence
    db.Save(event)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Additionally, we integrated a Circuit Breaker (using Resilience4j/Hystrix). If the database begins responding above a 500ms threshold, the circuit opens, immediately halting queue consumption. This prevents the application from crashing while attempting to process requests it cannot currently deliver.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Infrastructure Tuning: Optimizing the Garbage Collector (GC)
Latency wasn't solely caused by I/O; millisecond pauses from the Garbage Collector were locking up processing via "Stop-the-World" events.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We migrated from traditional x86 instances to AWS Graviton (ARM64) and fine-tuned the ZGC (on Java 21+). Our goal was to maintain pauses below 1ms, even with large heaps.&lt;/p&gt;

&lt;p&gt;The Result: An 85% reduction in GC pauses, stabilizing throughput during high-traffic peaks.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Resilience with Dead Letter Queues (DLQ)
Errors are inevitable. Our strategy involved implementing Exponential Backoff. If a message fails, it doesn't block the main queue; instead, it is routed to a Retry Topic with increasing delays (1s, 10s, 1min). Once retries are exhausted, the message lands in a DLQ (Dead Letter Queue) for manual inspection.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Field Note: Never allow infinite retries without backoff. Doing so is essentially a self-inflicted Denial of Service (DoS) attack against your own database.&lt;/p&gt;

</description>
      <category>go</category>
      <category>performance</category>
      <category>backend</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
