DEV Community

Cláudio Filipe Lima Rapôso


Architecting Multicloud Credential Resilience: Synchronizing AWS Secrets Manager and Azure Key Vault

Hardcoded credentials are a critical security anti-pattern, and they have driven the industry toward centralized vault solutions. However, tying credential management exclusively to a single vendor introduces a single point of failure for multicloud architectures. If an enterprise platform fails over to a Microsoft Azure execution cell during an Amazon Web Services (AWS) outage, the Azure compute layer will fail to boot if it cannot retrieve database connection strings from the degraded AWS Secrets Manager. We remove this dependency by building an automated, asynchronous credential replication mesh. Amazon EventBridge captures secret change events and triggers an Azure Function that authenticates to AWS through identity federation and mirrors the affected credentials into Azure Key Vault. With this topology, each cloud environment keeps autonomous, low-latency access to its credentials, so workloads can keep running during a severe vendor degradation without relaxing zero-trust controls.

Prerequisites

Implementing cross-cloud secret replication requires solid expertise in event-driven security and identity federation. The infrastructure provisioning relies on Terraform 1.7.0 or higher, with the HashiCorp AWS Provider 5.40.0 and the AzureRM Provider 3.90.0. The synchronization logic requires Python 3.12 with boto3 1.34.0, azure-keyvault-secrets 4.8.0, azure-identity 1.15.0, and the azure-functions package for the HTTP trigger bindings. Operators must also establish an OpenID Connect (OIDC) trust between the Microsoft Entra ID (formerly Azure Active Directory) tenant and AWS Identity and Access Management (IAM): the tenant's token issuer is registered as an OIDC identity provider in IAM, which lets the Azure Function call AWS APIs without storing long-lived AWS access keys.

Step-by-Step Implementation

Architecting the Event-Driven Capture Mesh

We initiate the synchronization pipeline by configuring AWS CloudTrail and Amazon EventBridge to capture state changes within the primary secrets vault. The architectural justification for this event-driven approach is the elimination of aggressive polling. Continuously polling AWS Secrets Manager from an external Azure environment wastes network bandwidth, incurs unnecessary API costs, and ties replication latency to the polling interval. Instead, we configure a precise EventBridge rule that listens only for the PutSecretValue, UpdateSecret, and UpdateSecretVersionStage API calls logged by CloudTrail. The last one matters for automated rotation: a rotation Lambda first stores the new password as the AWSPENDING version with PutSecretValue and only afterwards promotes it to AWSCURRENT with UpdateSecretVersionStage, so reacting to PutSecretValue alone would replicate the previous value. EventBridge only receives these "AWS API Call via CloudTrail" events while a trail is logging management events, which is why the trail is declared first; because Secrets Manager write calls are management events, the trail needs no data-event selector. When a developer or a rotation Lambda updates a database password, EventBridge receives the event in near real time and routes it to an external API Destination pointing to an Azure Function webhook. Crucially, we configure the EventBridge Input Transformer to strip away all metadata, forwarding only the identifier of the modified secret (the secretId request parameter, which holds whatever name or ARN the caller supplied) to the Azure endpoint.

resource "aws_cloudtrail" "secrets_audit" {
  name                          = "secrets-manager-trail"
  s3_bucket_name                = aws_s3_bucket.audit_logs.id
  include_global_service_events = false

  # Secrets Manager write APIs are management events, so no data_resource block
  # is needed (basic event selectors only accept S3, Lambda and DynamoDB data resources).
  event_selector {
    read_write_type           = "WriteOnly"
    include_management_events = true
  }
}

resource "aws_cloudwatch_event_rule" "secret_mutation_capture" {
  name        = "capture-secret-updates"
  description = "Routes secret rotations to Azure replica"
  event_pattern = jsonencode({
    source      = ["aws.secretsmanager"]
    detail-type = ["AWS API Call via CloudTrail"]
    detail = {
      eventName = ["PutSecretValue", "UpdateSecret", "UpdateSecretVersionStage"]
      # Scope replication to the multicloud/ namespace; secretId may be a name or an ARN.
      requestParameters = {
        secretId = [{ wildcard = "*multicloud/*" }]
      }
    }
  })
}

resource "aws_cloudwatch_event_target" "azure_function_webhook" {
  rule      = aws_cloudwatch_event_rule.secret_mutation_capture.name
  target_id = "SyncToAzureKeyVault"
  arn       = aws_cloudwatch_event_api_destination.azure_func_dest.arn
  role_arn  = aws_iam_role.eventbridge_invoke.arn

  input_transformer {
    input_paths = {
      secret_arn = "$.detail.requestParameters.secretId"
    }
    # A heredoc is used because jsonencode() escapes "<" and ">" as \u003c and \u003e,
    # which would hide the placeholder from EventBridge. An unquoted <secret_arn>
    # is replaced by the JSON string value, quotes included.
    input_template = <<-EOT
      {"arn": <secret_arn>}
    EOT
  }
}
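Before deploying, it can help to check locally which CloudTrail events would reach Azure and what body the Function would receive. The following is a minimal Python approximation of the rule's matching and Input Transformer, not EventBridge's own matching engine. The function name to_sync_payload and the sample event are illustrative; the event dictionary only mimics the fields of an "AWS API Call via CloudTrail" event that the rule inspects, and the watched event names and the multicloud/ scope are assumptions to keep in step with the deployed rule.

```python
from typing import Any, Optional

# Assumption: keep these in step with the eventName list in the deployed rule.
# UpdateSecretVersionStage is included because rotation promotes AWSPENDING to
# AWSCURRENT through that call.
WATCHED_EVENTS = frozenset({"PutSecretValue", "UpdateSecret", "UpdateSecretVersionStage"})
SCOPE_MARKER = "multicloud/"  # assumed replication namespace


def to_sync_payload(event: dict[str, Any]) -> Optional[dict[str, str]]:
    """Approximate the rule match plus Input Transformer for one event.

    Returns the JSON body the Azure Function would receive, or None when the
    rule would not match.
    """
    if event.get("source") != "aws.secretsmanager":
        return None
    if event.get("detail-type") != "AWS API Call via CloudTrail":
        return None
    detail = event.get("detail") or {}
    if detail.get("eventName") not in WATCHED_EVENTS:
        return None
    secret_id = (detail.get("requestParameters") or {}).get("secretId")
    # secretId may be a friendly name or an ARN, so match the namespace anywhere.
    if not isinstance(secret_id, str) or SCOPE_MARKER not in secret_id:
        return None
    # Input Transformer equivalent: drop everything except the secret identifier.
    return {"arn": secret_id}


if __name__ == "__main__":
    sample = {
        "source": "aws.secretsmanager",
        "detail-type": "AWS API Call via CloudTrail",
        "detail": {
            "eventName": "UpdateSecretVersionStage",
            "requestParameters": {"secretId": "multicloud/prod-db"},
        },
    }
    print(to_sync_payload(sample))  # {'arn': 'multicloud/prod-db'}
```

Running a handful of sample events through a helper like this is a quick way to confirm that rotation promotions are captured and that secrets outside the replicated namespace are filtered out.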


How do we securely transfer the highly sensitive plaintext payload across the multicloud boundary without exposing the cryptographic material to intermediate routing layers or network transit logs?

Executing the Secure Pull via OIDC Federation

We protect the plaintext credential in transit by implementing a secure pull mechanism rather than a push mechanism. The event payload delivered to Azure contains only the secret identifier, entirely devoid of cryptographic value. The architectural imperative here is zero-trust transit. If EventBridge pushed the plaintext password, it would exist temporarily in AWS delivery queues and Azure ingress logs. Instead, the Azure Function receives the identifier and uses its Azure Managed Identity to obtain a Microsoft Entra ID token, which it exchanges for temporary AWS credentials by assuming a federated IAM role through the STS AssumeRoleWithWebIdentity API. Once authenticated, the Python function calls the AWS Secrets Manager API directly, holds the plaintext value only in process memory, and immediately writes it to the local Azure Key Vault through the azure-keyvault-secrets SDK. Both hops are encrypted with TLS, and the sensitive material is never written to intermediate storage such as queues or logs.

[Figure: Sequence Diagram]

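import logging
import os
import re

import azure.functions as func
import boto3
from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient
from botocore.client import BaseClient

# Environment configuration
AZURE_KEY_VAULT_URL = os.environ["AZURE_KEY_VAULT_URL"]
AWS_ROLE_ARN = os.environ["AWS_FEDERATED_ROLE_ARN"]
AWS_REGION = os.environ["AWS_REGION"]
# Assumption: Application ID URI of the App Registration that the AWS IAM OIDC
# identity provider accepts as its audience (for example "api://<client-id>").
AWS_OIDC_AUDIENCE = os.environ["AWS_OIDC_AUDIENCE"]

azure_credential = ManagedIdentityCredential()
kv_client = SecretClient(vault_url=AZURE_KEY_VAULT_URL, credential=azure_credential)


def assume_aws_role_via_oidc() -> BaseClient:
    # Request an Entra ID token whose audience the AWS IAM OIDC provider trusts.
    # The credential caches the token and refreshes it before expiry.
    token = azure_credential.get_token(f"{AWS_OIDC_AUDIENCE}/.default").token
    sts_client = boto3.client("sts", region_name=AWS_REGION)

    # Exchange the token for short-lived AWS credentials; no AWS keys are stored in Azure.
    response = sts_client.assume_role_with_web_identity(
        RoleArn=AWS_ROLE_ARN,
        RoleSessionName="AzureKeyVaultSync",
        WebIdentityToken=token,
    )

    credentials = response["Credentials"]
    return boto3.client(
        "secretsmanager",
        region_name=AWS_REGION,
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )


def to_key_vault_name(aws_secret_name: str) -> str:
    # Key Vault secret names allow only letters, digits and dashes (1-127 characters),
    # so "multicloud/prod_db" becomes "multicloud-prod-db".
    return re.sub(r"[^0-9A-Za-z-]", "-", aws_secret_name)[:127]


def main(req: func.HttpRequest) -> func.HttpResponse:
    try:
        req_body = req.get_json()
    except ValueError:
        return func.HttpResponse("Request body must be JSON", status_code=400)

    # The "arn" field carries CloudTrail's secretId, which may be a name or an ARN.
    secret_arn = req_body.get("arn") if isinstance(req_body, dict) else None
    if not secret_arn:
        return func.HttpResponse("Missing secret ARN", status_code=400)

    try:
        aws_sm_client = assume_aws_role_via_oidc()
        # Without VersionStage, GetSecretValue returns the AWSCURRENT version.
        secret_response = aws_sm_client.get_secret_value(SecretId=secret_arn)
        plaintext_value = secret_response.get("SecretString")
        if plaintext_value is None:
            # Binary secrets (SecretBinary) are out of scope for this function.
            return func.HttpResponse("Binary secrets are not supported", status_code=422)

        # Use the friendly Name from the response so an ARN's random suffix never leaks in.
        normalized_name = to_key_vault_name(secret_response["Name"])
        kv_client.set_secret(normalized_name, plaintext_value)
        return func.HttpResponse(f"Secret {normalized_name} synchronized successfully.", status_code=200)

    except Exception:
        # Log details server-side; do not echo exception text back to the caller.
        logging.exception("Synchronization failed for %s", secret_arn)
        return func.HttpResponse("Synchronization failed", status_code=500)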
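The function above takes the Key Vault name from the Name field of the GetSecretValue response. Some events, such as deletions, concern secrets that can no longer be read (GetSecretValue rejects a secret that is scheduled for deletion), so the name has to be derived from the identifier string itself. A minimal sketch, assuming the identifier is either a friendly name or a full ARN, whose last segment is the secret name followed by a hyphen and six random characters; the helper name key_vault_name_from_secret_id is hypothetical.

```python
import re

# Key Vault secret names: 1-127 characters, letters, digits and dashes only.
_INVALID_KV_CHARS = re.compile(r"[^0-9A-Za-z-]")
# Full Secrets Manager ARNs end with "-" plus six random characters.
_ARN_RANDOM_SUFFIX = re.compile(r"-[A-Za-z0-9]{6}$")


def key_vault_name_from_secret_id(secret_id: str) -> str:
    """Map a Secrets Manager name or ARN to a Key Vault secret name (hypothetical helper)."""
    name = secret_id
    if secret_id.startswith("arn:") and ":secret:" in secret_id:
        name = secret_id.split(":secret:", 1)[1]
        # Assumes a full ARN. A partial ARN whose name itself ends in "-" plus six
        # characters would be truncated here; AWS advises against such names.
        name = _ARN_RANDOM_SUFFIX.sub("", name)
    kv_name = _INVALID_KV_CHARS.sub("-", name)
    if not 1 <= len(kv_name) <= 127:
        raise ValueError(f"cannot map {secret_id!r} to a valid Key Vault name")
    return kv_name


if __name__ == "__main__":
    arn = "arn:aws:secretsmanager:eu-west-1:111122223333:secret:multicloud/prod_db-AbCdEf"
    print(key_vault_name_from_secret_id(arn))  # multicloud-prod-db
```

One consequence of this mapping is that distinct AWS names such as multicloud/prod_db and multicloud-prod-db collapse to the same Key Vault name, so AWS-side naming conventions should avoid such pairs.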


When the application scales horizontally during a vendor degradation, how does the compute layer dynamically select the active vault without requiring hardcoded fallback logic scattered throughout the domain services?

Abstracting Secret Retrieval via Hexagonal Ports

We shield the application execution logic from vault location awareness by implementing a pure Python abstract port for credential management. The domain logic must never import boto3 or azure-keyvault-secrets directly. The architectural justification is strict execution portability: if a microservice is deployed to Azure Kubernetes Service (AKS) during an AWS outage, it must seamlessly pivot to reading from Azure Key Vault. We define a SecretProviderPort and inject a dynamic adapter factory at startup. The factory evaluates the current runtime environment variables. If the application is executing within AWS, it uses the AwsSecretsAdapter. If it detects an AWS connectivity failure or is explicitly booted in the Azure environment, it loads the AzureKeyVaultAdapter. The core domain simply requests a database credential by name, entirely unaware of the provider fulfilling the request.

import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)


class SecretProviderUnavailable(Exception):
    """Raised by an adapter when its backing vault cannot be reached."""


class SecretProviderPort(ABC):
    @abstractmethod
    def get_database_credentials(self, secret_name: str) -> str:
        """Return the plaintext secret or raise SecretProviderUnavailable."""


class MulticloudSecretManager:
    def __init__(self, primary_adapter: SecretProviderPort, fallback_adapter: SecretProviderPort):
        self.primary = primary_adapter
        self.fallback = fallback_adapter

    def retrieve_secret(self, secret_name: str) -> str:
        try:
            return self.primary.get_database_credentials(secret_name)
        except SecretProviderUnavailable as e:
            # Fail over only on availability errors. Adapters translate SDK errors
            # (for example botocore's EndpointConnectionError) into this exception,
            # keeping this module free of SDK imports. Access-denied or not-found
            # errors propagate, so a secret revoked or deleted in AWS is never
            # silently served from the replica.
            logger.warning("Primary vault unreachable. Pivoting to fallback replica. Reason: %s", e)
            return self.fallback.get_database_credentials(secret_name)


# Domain execution
# The domain relies on the manager to handle the multicloud routing invisibly
# db_password = multicloud_manager.retrieve_secret("production-database-key")
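The startup factory described above can be sketched as follows. The adapter class names AwsSecretsAdapter and AzureKeyVaultAdapter come from the prose; their bodies, the build_secret_provider function, and the CLOUD_RUNTIME environment variable are hypothetical. Each adapter receives an already-constructed SDK client (a boto3 secretsmanager client or an azure-keyvault-secrets SecretClient), so the module itself never imports either SDK, and the Azure adapter applies an assumed name mapping that replaces characters Key Vault rejects with dashes.

```python
import re
from abc import ABC, abstractmethod
from typing import Any, Callable, Mapping


class SecretProviderPort(ABC):
    @abstractmethod
    def get_database_credentials(self, secret_name: str) -> str:
        """Return the plaintext secret for a logical secret name."""


class AwsSecretsAdapter(SecretProviderPort):
    """Adapter over an injected boto3 'secretsmanager' client."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def get_database_credentials(self, secret_name: str) -> str:
        return self._client.get_secret_value(SecretId=secret_name)["SecretString"]


class AzureKeyVaultAdapter(SecretProviderPort):
    """Adapter over an injected azure.keyvault.secrets.SecretClient."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def get_database_credentials(self, secret_name: str) -> str:
        # Assumed mapping: the replica stores "multicloud/prod_db" as "multicloud-prod-db".
        kv_name = re.sub(r"[^0-9A-Za-z-]", "-", secret_name)
        return self._client.get_secret(kv_name).value


def build_secret_provider(
    env: Mapping[str, str],
    aws_client_factory: Callable[[], Any],
    azure_client_factory: Callable[[], Any],
) -> SecretProviderPort:
    """Pick the adapter for the current runtime (hypothetical CLOUD_RUNTIME variable)."""
    runtime = env.get("CLOUD_RUNTIME", "aws").strip().lower()
    # Client factories are called lazily, so only the selected SDK client is built.
    if runtime == "aws":
        return AwsSecretsAdapter(aws_client_factory())
    if runtime == "azure":
        return AzureKeyVaultAdapter(azure_client_factory())
    raise ValueError(f"unsupported CLOUD_RUNTIME: {runtime!r}")
```

At startup, a composition root might call build_secret_provider(os.environ, lambda: boto3.client("secretsmanager"), lambda: SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())) and hand the result, together with the other adapter as fallback, to MulticloudSecretManager.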


If the application gracefully falls back to the Azure vault, what structural mechanism prevents a compromised and subsequently deleted AWS secret from remaining dangerously active within the Azure replica?

Common Troubleshooting

Failing to synchronize deletion events leaves orphaned, potentially compromised credentials active in the fallback environment. When an operator deletes a secret in AWS Secrets Manager, CloudTrail logs a DeleteSecret event, so the EventBridge rule must explicitly match this event name alongside the update events. Because a secret scheduled for deletion can no longer be read with GetSecretValue, the Azure Function must branch on the event type, which means the Input Transformer must also forward the CloudTrail eventName. For a deletion payload, the Function derives the Key Vault name from the secret identifier and issues a begin_delete_secret command to Azure Key Vault instead of fetching a value. Furthermore, Azure Key Vault enables soft-delete by default. If the synchronization pipeline attempts to recreate a secret with the same name during a recovery phase while the Azure secret is in a soft-deleted state, the API returns HTTP 409 Conflict, which the Python SDK raises as ResourceExistsError. Your Python Azure Function must catch ResourceExistsError, call begin_recover_deleted_secret, wait for the recovery to complete, and only then apply the updated plaintext value.
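The deletion and soft-delete handling described above might look like the following sketch. It assumes the Input Transformer forwards the CloudTrail eventName together with the secret identifier, and that the Key Vault name has already been derived from that identifier; apply_secret_event is a hypothetical helper, and kv_client is expected to be an azure-keyvault-secrets SecretClient. The stand-in exception classes exist only so the sketch runs where azure-core is not installed.

```python
from typing import Any, Optional

try:
    from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError
except ImportError:  # stand-ins so the sketch also runs without azure-core installed
    class ResourceExistsError(Exception):
        """Stand-in for azure.core.exceptions.ResourceExistsError (HTTP 409)."""

    class ResourceNotFoundError(Exception):
        """Stand-in for azure.core.exceptions.ResourceNotFoundError (HTTP 404)."""


def apply_secret_event(kv_client: Any, event_name: str, kv_name: str,
                       value: Optional[str] = None) -> str:
    """Mirror one AWS secret event into Key Vault and report what happened (hypothetical helper)."""
    if event_name == "DeleteSecret":
        try:
            # Soft-deletes the replica; wait() blocks until Key Vault finishes.
            kv_client.begin_delete_secret(kv_name).wait()
        except ResourceNotFoundError:
            return "absent"  # never replicated, or already deleted
        return "deleted"

    if value is None:
        raise ValueError("a plaintext value is required for non-delete events")
    try:
        kv_client.set_secret(kv_name, value)
        return "set"
    except ResourceExistsError:
        # HTTP 409: the name is held by a soft-deleted secret. Recover it first,
        # then apply the current plaintext value as a new version.
        kv_client.begin_recover_deleted_secret(kv_name).wait()
        kv_client.set_secret(kv_name, value)
        return "recovered"
```

Blocking on each poller before the next call keeps the operations ordered, so the follow-up set_secret does not race the recovery.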

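Another critical failure point involves the OIDC federation handshake. An HTTP 401 Unauthorized recorded for the EventBridge API Destination points to the connection's own authorization (for example, a missing or rotated function key), not to OIDC. OIDC failures surface inside the Azure Function instead, when the STS AssumeRoleWithWebIdentity call is rejected with an error such as InvalidIdentityToken, AccessDenied, or ExpiredTokenException. The most common cause is that the Azure Managed Identity token lacks the audience claim required by the AWS IAM OIDC identity provider. Ensure the IAM role trust policy explicitly validates the aud claim against the Azure App Registration the Function requests its token for (its client ID or Application ID URI, depending on the token version), and call get_token on every invocation, letting the credential's cache handle refresh, rather than storing a raw token string between executions, so short bursts of invocations never present an expired token.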
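For review purposes, the trust policy on the federated IAM role can be generated as data, which makes the aud and sub conditions easy to inspect. This sketch assumes the IAM OIDC provider was registered for the Entra ID v1 issuer https://sts.windows.net/<tenant-id>/; the function name, its parameters, and the provider path are assumptions, so copy the exact provider ARN and condition-key prefix from your IAM configuration. The audience argument must equal the aud claim of the Managed Identity token, and managed_identity_object_id pins the sub claim to a single managed identity.

```python
import json
from typing import Any


def build_trust_policy(account_id: str, tenant_id: str, audience: str,
                       managed_identity_object_id: str) -> dict[str, Any]:
    """Trust policy letting one Azure managed identity assume the role via OIDC (sketch)."""
    # Assumption: the IAM OIDC provider URL is https://sts.windows.net/<tenant-id>/.
    provider = f"sts.windows.net/{tenant_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Must equal the aud claim of the token the Function presents.
                        f"{provider}:aud": audience,
                        # Pins the role to one managed identity (its object ID).
                        f"{provider}:sub": managed_identity_object_id,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy(
        "111122223333",
        "00000000-0000-0000-0000-000000000000",
        "api://aws-secrets-sync",
        "11111111-1111-1111-1111-111111111111",
    )
    print(json.dumps(policy, indent=2))
```

The resulting dictionary serializes directly with json.dumps, which is the form the IAM CreateRole and UpdateAssumeRolePolicy APIs expect for a policy document.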

Conclusion

Synchronizing credential stores across cloud boundaries lets isolated compute environments keep operating autonomously during severe regional or vendor failures. By combining EventBridge routing with OIDC-federated Azure Functions, architects keep plaintext credentials out of event payloads and intermediate logs, while Hexagonal design patterns provide seamless failover at the application layer. As the multicloud infrastructure scales and the number of microservices grows, organizations should consider moving the underlying secrets engine entirely to HashiCorp Vault. Vault Enterprise clusters with cross-cloud performance replication provide native multi-region synchronization and dynamic credential generation, abstracting vault management away from the individual cloud providers.
