Managing long-lived credentials in a multicloud environment is a primary source of architectural fragility and security debt. When an application hosted on Microsoft Azure needs to access a private Amazon Web Services resource, such as an S3 bucket or a DynamoDB table, engineering teams often resort to creating IAM users with static access keys. These keys are frequently hardcoded, inadequately rotated, or leaked through insecure CI/CD pipelines, leading to unauthorized data egress and compromised compliance postures (Humble & Farley, 2010). Workload Identity Federation using OpenID Connect (OIDC) removes this class of vulnerability. By establishing a trust relationship between Azure Active Directory (now Microsoft Entra ID) and the AWS Identity and Access Management (IAM) control plane, we eliminate the need for static secrets between the two clouds. This article details the engineering process of implementing a secretless, short-lived credential exchange mechanism that uses the native identity of the Azure workload to assume narrowly scoped roles within AWS.
Prerequisites
Implementing this federation requires Terraform version 1.7.0 or higher to manage cross-provider identity resources. You must have administrative access to an Azure Subscription and an AWS Account. The implementation uses the AWS provider (version 5.40.0+) and the AzureRM provider (version 3.90.0+). The tenant lookup in Step 1 relies on the azuread_client_config data source, which belongs to the AzureAD provider, so that provider must be configured as well. For the application-side logic, Python 3.12 is required along with the boto3 and azure-identity libraries. You should also have a working knowledge of the JWT (JSON Web Token) structure and the OIDC protocol flow. Ensure that your Azure resources are assigned a System-Assigned or User-Assigned Managed Identity, as this provides the initial cryptographic proof of identity needed for the federation process.
Step-by-Step
Step 1: Establishing the OIDC Trust Anchor in AWS
The first architectural requirement is to configure AWS to recognize Azure as a valid identity provider. We accomplish this by creating an IAM OIDC Provider that points to the unique issuer URL of the Azure tenant. This configuration allows the AWS Security Token Service (STS) to validate tokens signed by Microsoft. We use Terraform to automate this setup, specifying the client ID (the audience) that the Azure tokens will contain. By narrowing the audience to a specific application ID in Azure, we ensure that only tokens intended for this federation are accepted. This step is a critical implementation of the Least Privilege principle at the infrastructure level, as it prevents any arbitrary Azure token from being used as a baseline for identity in the AWS environment (Wardley, 2016).
# identity_federation/aws_side.tf
data "azuread_client_config" "current" {}

# The issuer URL is tenant-specific in Azure
locals {
  azure_issuer_url = "https://sts.windows.net/${data.azuread_client_config.current.tenant_id}/"
}

resource "aws_iam_openid_connect_provider" "azure_provider" {
  url             = local.azure_issuer_url
  client_id_list  = ["api://aws-federation-client"] # Audience in Azure JWT
  thumbprint_list = ["9e99a48a9960b14926bb7f3b02e22da2b0ab7280"] # Microsoft Root CA
}

resource "aws_iam_role" "cross_cloud_access" {
  name = "AzureWorkloadAccessRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.azure_provider.arn
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "${replace(local.azure_issuer_url, "https://", "")}:aud" = "api://aws-federation-client"
          }
        }
      }
    ]
  })
}
This configuration creates a secure gateway for Azure identities. How do we ensure that a specific Azure Managed Identity is mapped to this AWS role while preventing other services in the same Azure tenant from assuming the same permissions?
Step 2: Implementing Attribute-Based Access Control (ABAC) Mapping
We address the risk of lateral movement by implementing Attribute-Based Access Control (ABAC) within the trust policy. Instead of a broad trust relationship with the entire Azure tenant, we refine the Condition block in the AWS IAM role to validate specific claims within the Azure JWT, such as the sub (subject) or custom roles. The Step 1 policy checks only the aud claim, so this refinement means adding a second condition on the issuer-prefixed sub key. In our Python logic, we use the azure-identity library to acquire an access token for the specified audience and pass it to the AWS STS assume_role_with_web_identity method. Because the trust policy requires the sub claim (for a Managed Identity, the object ID of its service principal) to match a pinned value, only the authorized microservice can assume the role. This pattern fits Hexagonal Architecture by treating the identity exchange as an external adapter, keeping the core domain logic free of authentication mechanics.
# identity_service/federation_adapter.py
import boto3
from azure.identity import ManagedIdentityCredential
from botocore.exceptions import ClientError


class MulticloudIdentityAdapter:
    def __init__(self, aws_role_arn: str, azure_client_id: str):
        self.aws_role_arn = aws_role_arn
        # Identifier behind the api:// audience URI, for example
        # "aws-federation-client" for the provider created in Step 1
        self.azure_client_id = azure_client_id
        # Without arguments this targets the system-assigned identity;
        # a user-assigned identity needs its client_id passed here
        self.azure_credential = ManagedIdentityCredential()

    def get_aws_session(self) -> boto3.Session:
        """
        Exchanges Azure Managed Identity token for AWS STS temporary credentials.
        """
        try:
            # 1. Get OIDC token from Azure for the specific AWS audience
            azure_token = self.azure_credential.get_token(
                f"api://{self.azure_client_id}/.default"
            )

            # 2. Initialize AWS STS Client
            sts_client = boto3.client('sts', region_name='us-east-1')

            # 3. Exchange for temporary AWS credentials
            response = sts_client.assume_role_with_web_identity(
                RoleArn=self.aws_role_arn,
                RoleSessionName="AzureWorkloadSession",
                WebIdentityToken=azure_token.token
            )

            # The response also carries creds['Expiration'], which a caching
            # layer can use to schedule the next refresh
            creds = response['Credentials']
            return boto3.Session(
                aws_access_key_id=creds['AccessKeyId'],
                aws_secret_access_key=creds['SecretAccessKey'],
                aws_session_token=creds['SessionToken']
            )
        except ClientError as e:
            print(f"Federation failed: {e.response['Error']['Message']}")
            raise
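The sub check described at the start of this step lives in the role's trust policy, not in the adapter. As a minimal sketch of what that policy document could look like, the helper below builds it as a Python dictionary. The function name build_trust_policy, the account ID, the tenant ID, and the object ID are hypothetical placeholders, not values from the original configuration.

```python
import json


def build_trust_policy(provider_arn: str, issuer_url: str,
                       audience: str, subject: str) -> dict:
    """Build a trust policy that pins both the aud and the sub claim."""
    # Condition keys for an OIDC provider are the issuer URL without the scheme
    issuer_key = issuer_url.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{issuer_key}:aud": audience,
                    # Only the workload whose token carries this subject passes
                    f"{issuer_key}:sub": subject,
                }
            },
        }],
    }


# Hypothetical identifiers, for illustration only
TENANT_ID = "00000000-0000-0000-0000-000000000000"
ISSUER_URL = f"https://sts.windows.net/{TENANT_ID}/"

example_policy = build_trust_policy(
    provider_arn=f"arn:aws:iam::123456789012:oidc-provider/sts.windows.net/{TENANT_ID}/",
    issuer_url=ISSUER_URL,
    audience="api://aws-federation-client",
    subject="11111111-1111-1111-1111-111111111111",
)
print(json.dumps(example_policy, indent=2))
```

The same two-key StringEquals block can be carried over into the Terraform role definition from Step 1, where the key prefix is already derived from the issuer URL.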
This identity exchange provides the necessary credentials for cross-cloud operations. However, if the Azure workload needs to perform frequent operations, how do we optimize the token exchange to avoid hitting AWS STS rate limits or introducing unnecessary latency?
Step 3: Optimizing Credential Caching and Refresh Cycles
High-performance architectures must minimize the overhead of the identity federation process by implementing a credential caching layer. Since the AWS STS credentials have a defined expiration period (one hour by default), requesting a new token for every individual API call is inefficient: it adds a round trip to Azure and to STS on each request and makes every operation depend on both services being reachable at that moment. We implement a wrapper that caches the boto3.Session object and only triggers a refresh when the current credentials are within a five-minute grace period of expiring. This strategy keeps a valid session ready for use between refreshes. By integrating this into the Hexagonal Architecture as a singleton adapter, we provide the domain services with a seamless interface to AWS resources while maintaining the security benefits of short-lived tokens and avoiding local storage of secrets.
# identity_service/session_manager.py
from datetime import datetime, timedelta, timezone

import boto3

# Import path assumes the adapter from Step 2 lives in the same package
from identity_service.federation_adapter import MulticloudIdentityAdapter

REFRESH_MARGIN = timedelta(minutes=5)


class CachedSessionManager:
    def __init__(self, adapter: MulticloudIdentityAdapter):
        self.adapter = adapter
        self._current_session = None
        self._expiry_time = None

    def get_session(self) -> boto3.Session:
        """
        Returns a cached session or refreshes it if near expiration.
        """
        now = datetime.now(timezone.utc)
        # Compare whole timedeltas: timedelta.seconds drops the days component
        # and is never negative, so it cannot detect expired credentials
        if self._current_session is None or self._expiry_time - now < REFRESH_MARGIN:
            print("Refreshing multicloud session...")
            self._current_session = self.adapter.get_aws_session()
            # Simplified for implementation demonstration
            self._expiry_time = self._extract_expiry(self._current_session)
        return self._current_session

    def _extract_expiry(self, session: boto3.Session) -> datetime:
        # In a real implementation, extract the 'Expiration' from the STS response
        # stored during the get_aws_session call. This placeholder assumes the
        # default one-hour lifetime of AssumeRoleWithWebIdentity credentials.
        return datetime.now(timezone.utc) + timedelta(hours=1)
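The placeholder expiry above can be replaced by the real expiration once the fetch step hands it back. The following is a minimal sketch of that idea, assuming a fetch callable that returns a (session, expiration) pair, for example the session built in Step 2 together with the Expiration value from the STS response, which boto3 returns as a timezone-aware datetime. The class name ExpiryAwareCache, the fetch signature, and the injectable clock are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Optional, Tuple

# Hypothetical contract: fetch() returns (session, expiration)
FetchFn = Callable[[], Tuple[Any, datetime]]


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


class ExpiryAwareCache:
    """Caches a session and refreshes it shortly before it expires."""

    def __init__(self, fetch: FetchFn,
                 margin: timedelta = timedelta(minutes=5),
                 clock: Callable[[], datetime] = _utc_now):
        self._fetch = fetch
        self._margin = margin
        self._clock = clock  # injectable so the refresh logic can be tested
        self._session: Optional[Any] = None
        self._expires_at: Optional[datetime] = None

    def get(self) -> Any:
        now = self._clock()
        # Refresh on first use, or when the remaining lifetime is within the margin
        if self._expires_at is None or self._expires_at - now <= self._margin:
            self._session, self._expires_at = self._fetch()
        return self._session
```

This sketch is not thread-safe. If several threads share one cache, wrapping the body of get in a threading.Lock would prevent concurrent refreshes from issuing duplicate STS calls.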
The credential lifecycle management ensures reliable access across cloud providers. What architectural safeguards are required to maintain security when an Azure Managed Identity is decommissioned but its associated AWS IAM role remains active?
Common Troubleshooting
A frequent issue when configuring OIDC federation is a certificate or signature verification error during the AWS STS call. This typically occurs because the thumbprint in the AWS OIDC Provider resource is outdated. The thumbprint is a SHA-1 fingerprint of a CA certificate in the TLS chain that serves the issuer's signing keys, so it stops matching when Microsoft rotates the certificates behind that endpoint. Ensure your Terraform configuration uses a dynamic thumbprint retrieval mechanism (for example, the tls_certificate data source from the hashicorp/tls provider) or includes the current thumbprint for the Microsoft identity platform to prevent service disruption during rotation events.
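When checking a configured thumbprint by hand, it helps to recompute it from a certificate you have exported. The sketch below assumes you already hold the relevant CA certificate in PEM form. Which certificate in the chain AWS expects is governed by the AWS documentation, and the helper name thumbprint_from_pem is hypothetical.

```python
import hashlib
import ssl


def thumbprint_from_pem(pem_certificate: str) -> str:
    """Return the lowercase hex SHA-1 fingerprint of a PEM-encoded certificate."""
    # The fingerprint is computed over the DER encoding, not the PEM text
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_certificate)
    return hashlib.sha1(der_bytes).hexdigest()
```

The result is a 40-character hex string that can be compared directly with the value in thumbprint_list.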
Another common failure is the InvalidIdentityToken exception. This often indicates a mismatch between the aud (audience) claim in the Azure token and the client_id configured in the AWS OIDC Provider. Verify that the scope requested in the Azure Python code precisely matches the client_id_list in Terraform. Note that Azure often prefixes the audience with api://, and this must be consistently reflected across both cloud configurations.
Finally, verify that the Azure Managed Identity has the Sign-in permission for the specific application registration. Without the correct service principal permissions in Azure, the get_token call will return a 403 Forbidden error before the request even reaches the AWS boundary. Use the Azure CLI to validate the token content manually during the initial debugging phase to ensure all required claims are present.
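For the manual token inspection mentioned above, a small decoder is often enough to see which iss, aud, and sub values Azure actually issued. This is a debugging sketch only: it does not verify the signature, and the function names decode_jwt_claims and check_claims are hypothetical helpers, not part of boto3 or azure-identity.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(claims: dict, expected_issuer: str,
                 expected_audience: str) -> list:
    """Return human-readable mismatches between the token and the AWS setup."""
    problems = []
    if claims.get("iss") != expected_issuer:
        problems.append(f"iss is {claims.get('iss')!r}, expected {expected_issuer!r}")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        problems.append(f"aud is {aud!r}, expected {expected_audience!r}")
    if "sub" not in claims:
        problems.append("sub claim is missing")
    return problems
```

Passing the token string from the adapter's get_token call, the issuer URL from Step 1, and the value in client_id_list shows at a glance whether an InvalidIdentityToken error stems from an issuer or audience mismatch. Treat the token as a secret while debugging and avoid writing it to logs.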
Conclusion
Implementing OIDC Workload Identity Federation is a robust method for securing service-to-service communication between AWS and Azure. By eliminating static keys and relying on short-lived, cryptographically verified tokens, you significantly reduce the risk of credential compromise. The use of Hexagonal adapters and credential caching keeps this security posture from coming at the cost of performance or code maintainability. As a next step, consider implementing AWS IAM Access Analyzer to continuously monitor the federation trust policies, so that your multicloud identity perimeter remains hardened against unauthorized changes or overly permissive configurations.
References
Humble, J., & Farley, D. (2010). Continuous delivery: Reliable software releases through build, test, and deployment automation. Pearson Education.
Microsoft. (2024). Workload identity federation. Microsoft Entra Documentation. https://learn.microsoft.com/en-us/entra/workload-id/workload-identity-federation
National Institute of Standards and Technology. (2020). Attribute-based access control. NIST Special Publication 800-162.
Wardley, S. (2016). Wardley maps: Topographical intelligence in business strategy. Medium. https://medium.com/wardleymaps
