Articles
Bucket4j + Infinispan: A Deep Dive Into Implementation
A code-level walkthrough of running Bucket4j rate limiting on top of embedded Infinispan. It traces how InfinispanProxyManager wraps a Bucket4j RemoteCommand into a SerializableFunction that runs as an AbstractBinaryTransaction on the primary node, deserializing RemoteBucketState, applying tryConsume, and writing the result back with a MetaLifespan TTL under atomic CAS evaluation. It rounds out with the protostream context initializer and the bytecode/version-homogeneity constraints you need to get it working.
Building a High-Performance API Gateway with Vert.x: Architecture Deep Dive
A production look at building an API gateway on Vert.x and the performance contract that comes with it. The Router handler pipeline chains stages through routingContext.next(), short-circuiting on auth or validation failure, while blockingHandler keeps slow work off the event loop. The piece treats handler ordering as a security property and digs into worker-pool exhaustion tuning and the fail-open vs fail-closed call when a downstream key/auth service times out, all grounded in real profiling.
Deploying a Multi-Cloud API Gateway from Scratch: Architecture, Failure Modes, and Hard-Won Lessons
Building a multi-cloud API gateway from the ground up, with the failure modes spelled out. A Go control plane watches versioned JSON route configs in Redis and serves them to Envoy over xDS, shifting traffic weights when backend error rates cross a threshold. The hard-won lessons are the good part: rate limiting silently fails open when the gRPC limiter is unreachable, a Redis restart can hand Envoy empty clusters (fixed with in-memory plus disk read-through), and OTLP exporters drop spans without retry_on_failure and a sending queue.
Deploying MCP servers in production: the 2026 attack surface and the defense stack
A practical MCP threat model that maps the disclosed 2026 CVE classes onto a six-layer defense stack and a seven-question go/no-go checklist. It walks the mitigations layer by layer: pin and mirror servers against supply-chain attacks, harden the schema contract, validate OAuth 2.1 PKCE tokens via RFC 7662 introspection with agent-scoped delegation (RFC 8693), inspect tool descriptions, args, and responses at the gateway to catch prompt injection and rug-pulls, log every call, and isolate at the OS level (containers, gVisor, Firecracker) rather than trusting JS sandboxes.
Designing the outbound delivery log: what to store, what to expose, what to keep
A field guide to designing a durable outbound delivery-attempt log: what to store, what to expose, and what to throw away. It proposes a four-part field taxonomy (identity, lifecycle, outcome, observability) anchored by status, error_category, and latency_ms, then splits the hot write path from a warm/cold query path on an OLAP store. Customer-facing views are workspace-scoped with header/body sanitization and translated errors, and retention is tiered: full samples hot, metadata-only warm, outcome-only cold, with explicit delete-or-aggregate handling for PII.
Enterprise-grade Authorization for MCP Servers
An end-to-end "OAuth for MCP" authorization design. It maps the MCP client/server/resource-server roles, wires up discovery via RFC 9728/8414 and dynamic client registration (RFC 7591), then mandates PKCE with S256 challenges for loopback redirects. Access tokens stay short-lived (minutes) and narrowly scoped, while refresh tokens rotate with replay detection and immediate revocation. The author is candid about OAuth's limit: a valid token still can't stop payload-driven prompt injection.
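PKCE's S256 method is compact enough to show in full. The following is a minimal Python sketch of the client-side verifier/challenge derivation from RFC 7636, standard library only; the function names are illustrative, and it deliberately leaves out discovery, registration, and token handling:

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a random PKCE code_verifier (RFC 7636 allows 43-128 characters)."""
    # 32 random bytes encode to 43 base64url characters once '=' padding is stripped.
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# The client sends the challenge (with code_challenge_method=S256) on the
# authorization request and the verifier on the token request; the server
# recomputes the challenge from the verifier and compares the two.
verifier = make_code_verifier()
challenge = s256_challenge(verifier)
```

Because only the hash travels on the front channel, an intercepted authorization code is useless without the verifier held by the original client.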
A schema-first way to stop event replay from taking down production. Events are tagged replay-safe, replay-restricted, or replay-toxic via an x-replay-policy field carried in Avro, Protobuf, and JSON Schema. The runtime replay gate fails closed on missing or unknown policies, propagates a replayMode flag to suppress external side effects for restricted events, and routes toxic events to forward-only reconciliation with compensating actions. A nightly drift detector rebuilds projections in a sandbox to catch misclassification before a real replay does.
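A fail-closed replay gate of the kind described above might look like this minimal Python sketch. The `Decision` enum, the `replay_gate` function, and reading `x-replay-policy` from a plain metadata dict are all assumptions for illustration, not the author's code:

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    REPLAY = "replay"                           # replay-safe: re-apply as-is
    REPLAY_WITHOUT_SIDE_EFFECTS = "no_effects"  # replay-restricted: replayMode on
    FORWARD_RECONCILE = "reconcile"             # replay-toxic: compensate, never re-apply
    REJECT = "reject"                           # missing/unknown policy: fail closed


POLICIES = {
    "replay-safe": Decision.REPLAY,
    "replay-restricted": Decision.REPLAY_WITHOUT_SIDE_EFFECTS,
    "replay-toxic": Decision.FORWARD_RECONCILE,
}


@dataclass(frozen=True)
class GateResult:
    decision: Decision
    replay_mode: bool  # propagated downstream so handlers skip external calls


def replay_gate(schema_metadata: dict) -> GateResult:
    """Classify one event for replay based on its schema's x-replay-policy."""
    policy = schema_metadata.get("x-replay-policy")
    # Anything not explicitly classified (absent, misspelled, or a new value) is rejected.
    decision = POLICIES.get(policy, Decision.REJECT)
    return GateResult(decision, replay_mode=decision is Decision.REPLAY_WITHOUT_SIDE_EFFECTS)
```

The default branch is the important design choice: a schema that nobody classified is treated as unsafe rather than silently replayed.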
Event Schema Evolution: 4 Versioning Strategies, 1 That Quietly Breaks Consumers
Four schema-versioning strategies compared over a year-one-to-year-five horizon, and the one that quietly breaks consumers. The silent failure: versioned topics where producers retire v1 before every consumer has migrated. The fixes are concrete: a consumer-topic gap monitor, an expand-contract flow that gates contraction on a schema_version_consumed readiness check plus registry cross-checks, and an upcaster pattern that versions the read path with chained transformers (with notes on error compounding and caching cost).
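The upcaster pattern is easiest to see with a concrete chain. This Python sketch assumes two hypothetical schema changes (v1 to v2 splits `name`, v2 to v3 adds a defaulted `currency`) and walks an event through each registered transformer until it reaches the latest version:

```python
from typing import Callable, Dict

Event = dict
Upcaster = Callable[[Event], Event]


def v1_to_v2(e: Event) -> Event:
    # Hypothetical change: v2 split `name` into first/last name.
    first, _, last = e["name"].partition(" ")
    rest = {k: v for k, v in e.items() if k != "name"}
    return {**rest, "first_name": first, "last_name": last, "schema_version": 2}


def v2_to_v3(e: Event) -> Event:
    # Hypothetical change: v3 added `currency` with a default.
    return {**e, "currency": e.get("currency", "USD"), "schema_version": 3}


UPCASTERS: Dict[int, Upcaster] = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3


def upcast(event: Event) -> Event:
    """Apply chained upcasters on read until the event reaches LATEST."""
    version = event.get("schema_version", 1)
    while version < LATEST:
        step = UPCASTERS.get(version)
        if step is None:
            raise ValueError(f"no upcaster registered for version {version}")
        event = step(event)
        version = event["schema_version"]
    return event
```

Each hop is another place for a transformation bug to compound, and long chains run on every read, which is where the article's caching concern comes in.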
Event-Driven vs Polling Architectures
A clear-eyed comparison of how agent systems get their triggers: webhooks, log-based CDC, message-bus subscriptions, and plain polling, each mapped to its delivery contract and failure modes. It covers provider-specific retry/order/rate-limit quirks, explains CDC as WAL replay with per-partition ordering and WAL-accumulation risk, and shows why agent runtimes need durable state across waits. The recommendation: webhook-plus-reconciliation, with a structural idempotency key (agent_run_id, step_id, tool_name, call_index) at the write boundary to make at-least-once delivery safe.
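A structural idempotency key is derived from where a call sits in the run rather than from its payload or timestamp. Here is a minimal Python sketch; `ToolCall` and `WriteBoundary` are hypothetical names, and the in-memory dict stands in for a durable store with a unique constraint (as written it is not safe across processes):

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToolCall:
    agent_run_id: str
    step_id: str
    tool_name: str
    call_index: int  # position of this call within the step

    def idempotency_key(self) -> str:
        # Built only from the call's structural position, never from payload or
        # wall-clock time, so a redelivered trigger reproduces the same key.
        parts = (self.agent_run_id, self.step_id, self.tool_name, str(self.call_index))
        return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


class WriteBoundary:
    """In-memory stand-in for a durable table with a unique index on the key."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def execute_once(self, call: ToolCall, effect: Callable[[], Any]) -> Any:
        key = call.idempotency_key()
        if key in self._results:
            # At-least-once redelivery: return the recorded result, skip the effect.
            return self._results[key]
        result = effect()
        self._results[key] = result
        return result
```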
How Agoda Simulates Booking Flows to Test Flight Integrations
How Agoda replaced brittle connector end-to-end tests with a supplier-agnostic Workflow Simulator for flight bookings. A Scenario Builder generates deterministic or randomized context, a Workflow Executor models the booking as a DAG, and shared state is carried across calls as nodes are traversed. Endpoint assertions check contract and schema constraints while workflow assertions verify cross-step data propagation against recorded request-response pairs, with an honest note on where it still can't model race and rate-limit effects.
How LI.FI Added Enterprise Auth to Apache Superset's MCP Server
A start-to-finish account of putting enterprise auth in front of Apache Superset 6.1's MCP server with Okta OIDC. It extends FastMCP's OIDCProxy to call Okta /userinfo during the token exchange, folds the upstream email into the FastMCP JWT, and monkey-patches get_user_from_request to set g.user from that claim. For Okta org-AS opaque tokens it swaps JWKS validation for an IntrospectionTokenVerifier via RFC 7662 /introspect, fixing the 401 invalid_token, and closes with RBAC setup and Helm/K8s deployment gotchas.
How we built integration testing for fast-moving AI backend
A full integration-testing setup for a backend whose AI dependency moves fast. It boots a real Llama Stack as a uv subprocess with health-check seeding, injects an X-LlamaStack-Provider-Data test id through a custom Go transport, and runs CI in two phases: replay first, then record to refresh fixtures only on mismatch. A scheduled "Compatibility Sentinel" GitHub Action resolves stable and dev releases, reruns the suite against pinned Makefile versions, and posts structured Slack status so contract drift surfaces weeks ahead.
A close read of two Notion changes that keep the same shape but change meaning, and the integration bugs they cause. Persisted pagination cursors and rate-limit backoff are the casualties: store the Notion-Version next to each cursor and reject cross-version replay (old UUID cursors still work, new base64 ones break on older versions), and read x-ratelimit-reset as a seconds duration with sane min/max bounds and alerts on negative or absurd waits. It ends with a migration test you can run against live 429s and paginated queries.
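Both fixes are small guards at the client boundary. A Python sketch, assuming headers arrive as a dict with lowercase keys and picking arbitrary 1 s / 300 s bounds (the article's actual thresholds aren't given here); `backoff_seconds`, `StoredCursor`, and `resume_cursor` are hypothetical names:

```python
import logging
import math
from dataclasses import dataclass
from typing import Mapping, Optional

log = logging.getLogger("notion-sync")

MIN_WAIT_S = 1.0    # assumed floor
MAX_WAIT_S = 300.0  # assumed ceiling


def backoff_seconds(headers: Mapping[str, str]) -> float:
    """Treat x-ratelimit-reset as a duration in seconds and clamp it."""
    try:
        wait = float(headers.get("x-ratelimit-reset"))
    except (TypeError, ValueError):
        wait = math.nan
    if not math.isfinite(wait):
        log.warning("missing, unparsable, or non-finite x-ratelimit-reset; using floor")
        return MIN_WAIT_S
    if wait < 0 or wait > MAX_WAIT_S:
        # A negative or huge value suggests the header is being misread
        # (for example as an epoch timestamp): alert, then clamp.
        log.error("suspicious x-ratelimit-reset=%s", wait)
    return min(max(wait, MIN_WAIT_S), MAX_WAIT_S)


@dataclass(frozen=True)
class StoredCursor:
    cursor: str
    notion_version: str  # Notion-Version the cursor was issued under


def resume_cursor(stored: StoredCursor, current_version: str) -> Optional[str]:
    """Only replay a persisted cursor under the API version that issued it."""
    if stored.notion_version != current_version:
        log.warning("cursor from %s not replayed under %s", stored.notion_version, current_version)
        return None  # caller restarts pagination from the beginning
    return stored.cursor
```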
OpenAPI 3.1 in Practice: What I Learned Publishing a Real-World Swap API
A hands-on playbook for shipping a real OpenAPI 3.1 spec. It leans on JSON Schema 2020-12-native constructs (type [.., null], examples[]), models webhooks and events directly in the 3.1 document, and publishes from a dedicated git-tagged repo that regenerates /openapi.json on deploy. The client strategy is two-tier: hand-written canonical Python/TS SDKs alongside generated "second-tier" clients, with live API tests to catch spec/implementation drift and semver-mapped URL versioning (/api/v2 vs /api/v3).
Saga Compensation When Undo Is Impossible: 3 Patterns and the Audit Trail
What to do when a saga's compensation step can itself fail. The author pairs three patterns (forward recovery, the authorize/commit pivot, and a reconciliation queue) with a concrete append-only audit trail. It shows how to mark step deviations with explicit Outcome states, implement payment pivots with idempotency and expiry-aware timeouts, and park ambiguous stuck entries with the operator actions available to resolve them, explaining why causation_id, actor, and external_refs are non-negotiable for end-to-end traceability.
What's old is new: A NATS-native protocol for AI agents
A pinned, NATS-native interoperability contract for AI agents. Discovery runs over $SRV.PING.agents/$SRV.INFO.agents; a conversation is a single request whose reply streams typed JSON {type,data} chunks, starting with a mandatory status=ack and ending with a zero-byte terminator; liveness rides fixed agents.hb.{agent}.{owner}.{name} heartbeats plus agents.status.* subjects. The spec also nails down envelope discrimination (UTF-8 text vs JSON), queue grouping, capability metadata, and error-header semantics.
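On the client side, the stream contract reduces to a few ordering checks. This Python sketch validates one reply stream given as a list of message bodies in arrival order; encoding the ack as `{"type": "status", "data": "ack"}` is a guess at the wire shape, not taken from the spec, and `read_reply_stream` is a hypothetical name:

```python
import json
from typing import List


class ProtocolError(Exception):
    pass


def read_reply_stream(bodies: List[bytes]) -> List[dict]:
    """Validate one streamed reply and return its decoded chunks."""
    chunks: List[dict] = []
    for i, body in enumerate(bodies):
        if body == b"":
            # A zero-byte message is the terminator: it must follow the ack
            # and must be the last message in the stream.
            if not chunks:
                raise ProtocolError("terminated before the mandatory ack")
            if i != len(bodies) - 1:
                raise ProtocolError("messages after the terminator")
            return chunks
        try:
            chunk = json.loads(body)
        except ValueError as exc:  # covers JSONDecodeError and bad UTF-8
            raise ProtocolError(f"message {i} is not valid JSON") from exc
        if not isinstance(chunk, dict) or "type" not in chunk or "data" not in chunk:
            raise ProtocolError(f"message {i} is not a {{type, data}} chunk")
        if i == 0 and (chunk["type"], chunk["data"]) != ("status", "ack"):
            raise ProtocolError("first chunk must be status=ack")
        chunks.append(chunk)
    raise ProtocolError("stream ended without a terminator")
```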
Your Integration Logs Say Everything Is Fine. Your Best Customer Can't Check Out.
A production post-mortem on "silent" semantic drift: ghost addresses appearing between Business Central and Adobe Commerce while the integration reported zero errors. The root cause was missing update intent. The fix translates ERP record identifiers into explicit eCommerce commands: a matched ID becomes an update, an unmatched one an insert. Bulk re-sync is rearchitected to update-and-verify only, killing the duplication and cutting checkout latency.
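The core of the fix is that write intent comes from the ID lookup, not from the payload. A minimal Python sketch, where `id_map` is a hypothetical ERP-to-eCommerce cross-reference and `Command` is an illustrative type:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Literal, Optional


@dataclass(frozen=True)
class Command:
    action: Literal["insert", "update"]
    erp_id: str
    commerce_id: Optional[str]  # None for inserts
    fields: dict


def to_commands(erp_records: Iterable[dict], id_map: Dict[str, str]) -> List[Command]:
    """Turn ERP records into explicit commands using an ERP-id -> commerce-id map."""
    commands = []
    for rec in erp_records:
        commerce_id = id_map.get(rec["erp_id"])
        if commerce_id is not None:
            # Known record: update it in place instead of creating a second address.
            commands.append(Command("update", rec["erp_id"], commerce_id, rec))
        else:
            commands.append(Command("insert", rec["erp_id"], None, rec))
    return commands
```

For this to stay duplicate-free, the insert path presumably also has to record the newly created commerce ID in the map; otherwise the next sync sees the same record as unmatched again.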
A version-gated data-integrity trap in Shopify's discount APIs. On Admin API versions before 2026-07, market-scoped discounts vanish from discountNodes and return null when fetched by ID, so reads look like deletions and quietly corrupt reconciliation and bulkOperationRunQuery results. The remedy: upgrade to 2026-07, count the version diff, and map discounts with market-type and inheritance awareness using the market_ids context.
How Retry Storms Crash API-Led Systems: Bounded Reliability Patterns for Distributed Architectures
A bounded-reliability playbook for API-led stacks (Gateway, Experience, Process, System, ERP) showing how well-meaning fault tolerance correlates into cascading failure. It dissects three traps: retry storms, where independent per-layer retries multiply load; synchronous replication fan-out collapse; and autoscaling that feeds on retry-inflated metrics. The remedies are equally concrete: capped, jittered exponential backoff with a load-aware short-circuit; tiered durability scoped by criticality; and autoscaling on organic RPS rather than retry-driven spikes.
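The arithmetic behind a retry storm, plus capped, jittered backoff and a load-aware short-circuit, fits in a few functions. A Python sketch using the common "full jitter" variant; the function names and the 0.1 s base, 5 s cap, and 0.8 load threshold are arbitrary placeholders, not values from the article:

```python
import random
from typing import Callable


def retry_amplification(retries_per_hop: int, hops: int) -> int:
    """Worst-case attempts at the bottom tier when every hop retries independently."""
    return (1 + retries_per_hop) ** hops


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Capped exponential backoff with full jitter: uniform over [0, min(cap, base * 2**attempt))."""
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    return rng() * min(cap, base * 2 ** min(attempt, 32))


def should_retry(attempt: int, max_attempts: int, downstream_load: float,
                 shed_threshold: float = 0.8) -> bool:
    """Load-aware short-circuit: stop retrying once the dependency reports saturation."""
    return attempt < max_attempts and downstream_load < shed_threshold
```

With three retries per hop across the four hops between Gateway and ERP, `retry_amplification(3, 4)` gives 256 worst-case attempts at the ERP for a single user request, which is why the short-circuit matters more than the exact backoff curve.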
Apache Camel
Apache Camel AI: Building an Email Triage Agent with OpenAI, Gmail Transformers, and Camel JBang
An end-to-end Camel pipeline that orchestrates an LLM in YAML and JS. It sanitizes raw Gmail HTML with a custom SimpleFunction chained through Camel 4.18's ~> operator, then calls camel-openai chat completion with a jsonSchema to force a constrained category/needsReply response. A Choice routes to direct:handle-triaged-email, wireTap handles async reply drafting, and 4.19 DataType Transformers (google-mail:update-message-labels, google-mail:draft) build the ModifyMessageRequest and Draft objects without hand-written API models.
Apache Kafka
Architecting Cloud-Native Kafka
Turns Kafka's cloud-native features into concrete FinOps and platform-governance workflows. It covers KIP-405 tiered storage and when object-storage reads actually pay off, then uses KIP-1267-style RemoteFetchBytesPerSec and RemoteFetchRequestsPerSec JMX telemetry to drive Prometheus/Grafana chargeback and quota throttling. It also weighs KIP-848 incremental rebalance for safe HPA/KEDA scaling, KIP-932 share-group tradeoffs, and KIP-1150 diskless-migration risks like orphan segments and EOS/LSO uncertainty.
Benchmarking the Kroxylicious Proxy
A reproducible benchmark for sizing the Kroxylicious proxy. It compares a Kafka baseline, Kroxylicious passthrough, and AES-256-GCM record encryption (Vault KMS) using OpenMessaging Benchmark rate sweeps on a fixed OpenShift/Kafka testbed. Passthrough overhead is negligible; encryption costs roughly 25% throughput and saturates earlier. From the data the author derives a CPU planning formula, CPU(mc) = k × (P + N × C), with measured k coefficients and a requests=limits pod-spec rule that makes the proxy's throughput ceiling predictable.
Designing a High-Throughput Webhook Delivery System at Scale
An implementation-grade design for high-throughput webhook delivery. It uses a per-entity-type transactional outbox (partitioned by tenant), a DB-based distributed mutex with heartbeats to coordinate pollers, and Kafka topics per entity type keyed by entity_id for strict per-entity ordering. Delivery runs on Java 21 virtual threads with per-customer semaphore concurrency and endpoint circuit breakers, and the DLQ deliberately avoids auto-replay, offering explicit resume and rate-limited controlled replays from the current offset instead.
Kafka's Real Compression Problem Is Batch Depth
A causal model for why Kafka compression underperforms: shallow producer batches give the codec less redundancy to work with, which shrinks compressed batches and piles fetch overhead onto consumers across fan-out. The author localizes the fault with batch-fill-rate, producer-compression-rate, and consumer-fetch-size metrics, then fixes it in order: raise linger.ms, grow batch.size within buffer.memory, switch to zstd level 1 for structured data, and align fetch.min.bytes/fetch.max.wait.ms to the new batch shape.
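The batch-depth effect is easy to reproduce without Kafka. This Python sketch compresses the same synthetic JSON records in batches of different sizes; it swaps in the standard library's `zlib` at level 1 for zstd, so absolute ratios will differ from what Kafka sees, but the trend is the point. The record shape is invented for illustration:

```python
import json
import zlib


def make_record(i: int) -> bytes:
    # Structured, repetitive records similar to typical JSON event payloads.
    return json.dumps({"event": "page_view", "user_id": i % 50,
                       "path": f"/products/{i % 20}", "ok": True}).encode()


def compression_ratio(batch_size: int, total: int = 2000) -> float:
    """Compress `total` records in batches of `batch_size`; return raw/compressed bytes."""
    records = [make_record(i) for i in range(total)]
    raw = compressed = 0
    for start in range(0, total, batch_size):
        batch = b"\n".join(records[start:start + batch_size])
        raw += len(batch)
        # Each batch is compressed independently, as a producer does per record batch.
        compressed += len(zlib.compress(batch, 1))
    return raw / compressed
```

Running `for n in (1, 10, 100, 500): print(n, round(compression_ratio(n), 2))` should show the ratio climbing with batch size, which is what raising `linger.ms` and `batch.size` buys.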
Messaging in the Age of AI
A concrete Kafka-plus-Spring blueprint for agent messaging under nondeterminism. Messages carry an envelope with tokenCount, trace lineage, senderType, modelId, and an agent idempotencyKey, and traffic is split into lane topics with their own retention. The consumer side does chunked context assembly, enforces token-aware per-lane quotas with backpressure for cost control, and layers in lineage-focused JSON observability plus messaging-bound safety filtering.
Ursa: a new Diskless Lakestream engine for Kafka
Ursa is a Kafka storage extension that decouples durable log data from strongly consistent metadata (via Oxia), enabling diskless topics with native Iceberg/Delta visibility. Brokers buffer writes (4MB/200ms), sort by topic-partition id, and flush mixed-partition objects to S3 with null offset metadata, then atomically update per-partition offset and data pointers in Oxia. A compaction manager rewrites that mixed data into columnar Parquet and commits batched files to the Iceberg/Delta catalog, making the streams queryable as lake tables.
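The broker-side batching step can be modeled as a buffer with a size trigger and a time trigger that sorts by partition on flush. A Python sketch with hypothetical names; the 4 MB and 200 ms defaults come from the summary, while the S3 upload and the atomic Oxia metadata update that follow a flush are left out:

```python
import time
from typing import Callable, List, Optional, Tuple

Entry = Tuple[int, bytes]  # (topic-partition id, record batch)


class MixedPartitionBuffer:
    """Accumulate writes for many partitions and flush them as one object."""

    def __init__(self, max_bytes: int = 4 * 1024 * 1024, max_age_s: float = 0.2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock
        self._entries: List[Entry] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def append(self, partition_id: int, data: bytes) -> Optional[List[Entry]]:
        if self._opened_at is None:
            self._opened_at = self.clock()
        self._entries.append((partition_id, data))
        self._size += len(data)
        return self.flush() if self._size >= self.max_bytes else None

    def poll(self) -> Optional[List[Entry]]:
        """Call periodically; flushes once the oldest entry is max_age_s old."""
        if self._opened_at is not None and self.clock() - self._opened_at >= self.max_age_s:
            return self.flush()
        return None

    def flush(self) -> List[Entry]:
        # Stable sort by partition id: each partition's data becomes contiguous
        # in the object while keeping its per-partition arrival order.
        obj = sorted(self._entries, key=lambda e: e[0])
        self._entries, self._size, self._opened_at = [], 0, None
        return obj
```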
We Proved Multi-Cluster Kafka Works on Kubernetes… Here's Everything we Learned
An end-to-end stretch-cluster prototype that keeps Kafka running through a full Kubernetes cluster failure. A central Strimzi operator manages the Kafka CRs while remote clusters run constrained reconciliation, and the trick is in the networking: modified Strimzi deterministic .clusterset.local DNS (via stretch-cluster-alias annotations) plus extended TLS certificate SANs make advertised.listeners and controller.quorum.voters work across 2–10 ms links. The writeup includes measured quorum, leader-election, and failover behavior.
Why 80% of Kafka Clusters Would Fail a SOC 2 Audit Tomorrow
Evidence from 50 production-cluster scans on why most Kafka deployments would flunk a SOC 2 Type II audit. It maps specific misconfigurations to concrete control statements (CC6.7 inter-broker PLAINTEXT, CC6.1 wildcard ACLs and missing auth, CC8.1 topic auto-creation) and prescribes the fixes: SSL-only inter-broker protocol, SASL_SSL listeners, disabling auto.create.topics.enable, authenticated and segmented JMX, and an audit-log authorizer with proper retention windows.
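Several of these findings can be caught by a static check on broker properties. A Python sketch covering only a subset, falling back to Kafka's defaults when a key is absent; the CC6.1 check is approximated with `allow.everyone.if.no.acl.found` and a missing `authorizer.class.name`, since wildcard ACLs live in the ACL store rather than in broker properties. The function name and finding text are illustrative:

```python
from typing import List, Mapping, Tuple


def audit_broker_config(props: Mapping[str, str]) -> List[Tuple[str, str]]:
    """Return (control, finding) pairs for a few SOC 2-relevant broker settings.

    Partial check only. Note that security.inter.broker.protocol is ignored by
    Kafka when inter.broker.listener.name is set; this sketch does not handle that.
    """
    findings: List[Tuple[str, str]] = []
    inter = props.get("security.inter.broker.protocol", "PLAINTEXT").upper()
    if inter in ("PLAINTEXT", "SASL_PLAINTEXT"):
        findings.append(("CC6.7", f"inter-broker traffic is not encrypted ({inter})"))
    if not props.get("authorizer.class.name"):
        findings.append(("CC6.1", "no authorizer configured"))
    if props.get("allow.everyone.if.no.acl.found", "false").lower() == "true":
        findings.append(("CC6.1", "resources without ACLs are open to every principal"))
    if props.get("auto.create.topics.enable", "true").lower() == "true":
        findings.append(("CC8.1", "topics can be created implicitly, outside change control"))
    return findings
```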
Azure
MCP Meets Entra ID: Solving the Dynamic Client Registration Problem
A transparent OAuthShim that retrofits RFC 7591 dynamic client registration onto Entra ID, which lacks a /connect/register endpoint. Sitting in front of Claude Code, the shim handles discovery via APIM-injected WWW-Authenticate and a mock /.well-known/oauth-protected-resource, issues ephemeral client_ids, proxies the authorization code plus PKCE to Entra, and returns unmodified JWTs for APIM's validate-azure-ad-token. It includes the APIM inbound policy chain, Redis-backed shared state for multi-instance redirects, and operational hardening (HTTPS, logging, rate limits).
MuleSoft
How I Built an Event-Driven Integration Platform for Healthcare Using MuleSoft
A full enterprise blueprint that pairs MuleSoft's API-led layers (System/Process/Experience) with a custom async bus (LEXI) for healthcare product-data updates. It enforces a canonical model through DataWeave mappings, standardizes event contracts (eventId, correlationId, delta changedFields, SemVer), and implements subscriber idempotency with MANUAL ack on Anypoint MQ. Errors are classified for retry vs DLQ replay, and the design wires correlation-id observability alongside active-passive multi-region DR (RTO<15m, RPO<5m).
Stop using to aggregate arrays: here's why it silently destroys performance in Mule 4
A Mule 4 performance pitfall worth knowing: accumulating results with triggers O(N²) array copies because DataWeave payloads are immutable. The pragmatic fix is to use (optionally capped with maxConcurrency) to return a MessageCollection, then run a single DataWeave transform to extract payloads, dropping the copy work to O(N), with the composite-error handling that approach requires.
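The quadratic cost is plain arithmetic: if every iteration rebuilds an immutable array, iteration k copies k elements. This Python sketch models that with tuple concatenation (an analogy for DataWeave's immutable payloads, not Mule code) and counts element copies for both approaches; the function names are hypothetical:

```python
def copies_accumulating(n: int) -> int:
    """Model: each iteration builds a new immutable array from the old one plus one item."""
    acc: tuple = ()
    copies = 0
    for item in range(n):
        copies += len(acc) + 1  # every existing element is copied, plus the new one
        acc = acc + (item,)
    return copies


def copies_collect_then_transform(n: int) -> int:
    """Model: gather all results once, then map them in a single pass."""
    collected = list(range(n))
    copies = 0
    for _ in collected:  # the single transform touches each element once
        copies += 1
    return copies
```

For N = 1,000 the first model copies N(N+1)/2 = 500,500 elements against 1,000 for the second.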
Redis
Idempotent Consumers: Dedup Key, Dedup Window, or Idempotency by Design. Pick One
A failure-mode-driven "pick one" matrix for idempotent consumers, with implementation guidance for each. Use SETNX dedup keys with a TTL tuned to worst-case redelivery; reach for LRU+Bloom only when restart/eviction gaps cause bursty duplicates; and prefer idempotency-by-design via atomic state-transition SQL (WHERE pending) with side effects guarded by external idempotency keys. It also sketches a layered retrofit (window, then storage, then a design backstop) to survive Redis failover and process restarts.
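Idempotency by design leans on the database to make the state transition atomic. A minimal Python sketch using the standard library's `sqlite3`; the `orders` table, its `status` values, and `mark_paid` are hypothetical:

```python
import sqlite3


def mark_paid(conn: sqlite3.Connection, order_id: str) -> bool:
    """Idempotency by design: the transition only succeeds from 'pending'."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE orders SET status = 'paid' WHERE id = ? AND status = 'pending'",
            (order_id,),
        )
    # rowcount == 1: this delivery performed the transition, so run side effects.
    # rowcount == 0: duplicate delivery (already paid) or unknown order, so skip them.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO orders VALUES ('o1', 'pending')")
conn.commit()
```

For the dedup-key option, the Redis equivalent of the guard is a single `SET key value NX EX ttl`; in redis-py that is `r.set(key, 1, nx=True, ex=ttl_seconds)`, which returns `True` only for the first writer and `None` otherwise.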
SAP
Under the Hood: How SAP Integration Suite Really Works
A look under the hood of SAP Integration Suite. It follows an iFlow from BPMN-like XML versioned in the TMN control plane, through compilation into Camel DSL, packaging as an OSGi bundle, distribution, and hot activation in Karaf without a JVM restart. Along the way it explains the five-stage deploy pipeline and why activation takes 20–40 s, then traces a message through the Camel Exchange (headers/body/properties) and the generated Processor chain, with SAP adapters as the protocol entry and exit points.
WSO2
Governing AI Agent Access to MCP Tools with WSO2 AI Gateway and WSO2 Identity Server 7.3.0
A concrete WSO2 reference architecture for governing MCP tool calls with a gateway and first-class agent identities. WSO2 IS 7.3.0 registers agents and mints signed JWTs carrying aut=AGENT and sub=AgentID, and the WSO2 AI Gateway enforces two policies: mcp-auth for JWKS-backed JWT validation and mcp-acl-list for allow-mode access with per-tool exceptions. An mcp-authz policy then maps MCP tool names to IS scopes for per-agent RBAC, demonstrated with end-to-end curl tests showing 401 and MCP capability errors.
Releases
Apache Kafka 4.3.0
Kafka 4.3.0 lands a broad set of broker and platform knobs across 25 KIPs. Highlights include tiered-storage follower bootstrap (follower.fetch.last.tiered.offset.enable), operational cordoning via cordoned.log.dirs, and controller fetch/snapshot limits (controller.quorum.fetch.max.bytes and .fetch.snapshot.max.bytes). On the client side it adds OAuth client-assertion support for client_credentials, refines group assignment and epoch handling, and expands state and storage metrics; Streams and Connect gain state-store header support and ConnectPlugin unification.
Arazzo 1.1.0 extends declarative workflow specs beyond OpenAPI by adding AsyncAPI-backed steps: sourceDescriptions can now reference asyncapi with send/receive actions, dependsOn, correlationId, and successCriteria, plus chained workflows with fixed parameters. It also introduces a Selector Object (jsonpath/xpath/jsonpointer with version pinning), formalizes source resolution and a runtime-expression ABNF, adds identity-based $self URIs for unambiguous cross-document resolution, and aligns querystring parameter handling with OAS 3.2.
Kroxylicious 0.21.0
The headline of Kroxylicious 0.21.0 is deep integration plumbing. A Kubernetes admission webhook injects a proxy sidecar from a KroxyliciousSidecarConfig, virtual clusters gain graceful draining via drainTimeout with completion/timeout metrics, and the proxy now handles the HAProxy PROXY protocol ahead of TLS. On the security and data path it adds Strimzi CA trust discovery, AWS KMS IRSA/EKS Pod Identity credential providers, a ServerTlsCredentialSupplier for dynamic upstream mTLS, and Avro/Protobuf record validation against Apicurio Registry.