Juan Diego David Melo Alarcon

Posted on Feb 17

EKS Cross-Account con Lambda (No Binaries) | Integración Nativa EKS-Lambda entre cuentas

#aws #kubernetes #serverless #python

Spanish version followed by English version.

Recientemente enfrentamos un escenario interesante: necesitábamos que una AWS Lambda interactuara con controladores específicos de Kubernetes en varios clústeres de EKS. Bajo un stack estrictamente limitado a Python, tuvimos que diseñar una solución para gestionar la autenticación IAM y la comunicación con la API de K8s de forma eficiente.

El contexto

Como parte de una de las principales iniciativas de la organización, se debía habilitar la creación y modificación de recursos particulares (CRD) de EKS a demanda desde una función AWS Lambda. La función Lambda y el clúster EKS se encontraban en cuentas de AWS distintas dentro de la organización, lo cual añade una pequeña capa de gestión a la interacción entre cuentas.

La integración con Kubernetes desde Python

El primer desafío a sortear fue sobre la definición de cómo ejecutar operaciones de creación y modificación de Custom Resources en Kubernetes desde Python. Debido a restricciones del área de ciberseguridad de la organización, no estaba permitida la ejecución de comandos bash desde ningún lenguaje de programación (por ejemplo, mediante os o subprocess en Python).

La primera opción que se consideró fue el uso directo de la API REST expuesta por Kubernetes. Sin embargo, si bien esta es una opción viable, el nivel de abstracción de la implementación es bajo, por lo que requeriría un extendido tiempo de implementación para generar el acople con los endpoints requeridos.

Al indagar en la documentación de Kubernetes, se encontró que existe un cliente oficial de Kubernetes para Python. El cliente permite la creación y modificación de Custom Resources mediante los métodos create_namespaced_custom_object, replace_namespaced_custom_object y get_namespaced_custom_object. Estos métodos requieren como parámetros de entrada el group, version y plural que correspondan al Custom Resource Definition en cuestión.

Ej:

from kubernetes import client, config

def manage_custom_resource():
    group = "stable.example.com"
    version = "v1"
    plural = "mycustomresources" # Importante: siempre en plural
    namespace = "default"

    # Inicialización del cliente para Custom Objects
    custom_api = client.CustomObjectsApi()

    # Definición del cuerpo del recurso (manifest)
    resource_body = {
        "apiVersion": f"{group}/{version}",
        "kind": "MyCustomResource",
        "metadata": {"name": "example-instance"},
        "spec": {
            "setting": "value",
            "enabled": True
        }
    }

    try:
        # Ejemplo de creación
        response = custom_api.create_namespaced_custom_object(
            group=group,
            version=version,
            namespace=namespace,
            plural=plural,
            body=resource_body
        )
        print(f"Recurso creado: {response['metadata']['name']}")

    except client.exceptions.ApiException as e:
        print(f"Error interactuando con el API: {e}")

Esta implementación ofrece un nivel de abstracción mucho mayor, que no genera deuda técnica ni incurre en "reinventar la rueda" al momento de integrar la función Lambda con Kubernetes.

La autenticación con IAM

Con la integración con Kubernetes desde Python definida, surgió el siguiente desafío relacionado con la implementación: la autenticación. Dado que el cliente de Kubernetes es agnóstico a cualquier modelo de despliegue de un clúster, no cuenta con la capacidad nativa de tratar con el modelo de autenticación acoplado y propio de AWS IAM.

La implementación de Kubernetes en AWS EKS aplica la autenticación mediante tokens firmados de STS. El cliente nativo de Kubernetes permite establecer tokens de autenticación de la siguiente forma:

from kubernetes import client

def configure_k8s_client(cluster_endpoint, bearer_token):
    # Creamos la instancia de configuración del SDK
    configuration = client.Configuration()

    # URL del API Server obtenida de la consola de EKS o vía Boto3
    configuration.host = cluster_endpoint

    # Inyección del token en el header de Authorization
    configuration.api_key = {'authorization': f'Bearer {bearer_token}'}

    # Retornamos el ApiClient configurado para realizar peticiones
    return client.ApiClient(configuration)

Pero, ¿cómo generamos ese token teniendo en cuenta que es firmado propiamente con AWS STS? Usualmente, para esta obtención se hace uso del comando de AWS CLI get-token:

aws eks get-token --cluster-name <nombre-del-cluster>

Sin embargo, al no poder invocar binarios externos, debemos replicar la lógica de este comando utilizando el SDK de AWS. La ventaja es que AWS CLI está desarrollado en Python de forma nativa. Revisando el repositorio del código fuente de AWS CLI, encontré la implementación correcta para generar el token de forma nativa desde Python mediante la clase TokenGenerator:

from awscli.customizations.eks.get_token import TokenGenerator, STSClientFactory
from botocore import session

def get_eks_token(cluster_name):
    """
    Genera un token de autenticación para un clúster de EKS
    utilizando la implementación nativa de AWS CLI.
    """
    session_aws = session.get_session()
    sts_client = STSClientFactory(session_aws).get_sts_client()

    return TokenGenerator(sts_client).get_token(cluster_name)

Para que esta implementación funcione es fundamental instalar la librería awscli con pip.

Configuración entre cuentas

Habiendo solventado los desafíos de la integración y la autenticación, falta un único aspecto técnico que se debe abordar: la configuración entre cuentas. Dado que la función Lambda y el clúster EKS se encontraban en cuentas diferentes, se debía establecer una configuración entre cuentas. Para que la función Lambda pueda acceder al clúster de EKS, debe contar con el permiso para ejecutar la acción eks:DescribeCluster. Revisé la lista de servicios que soportan las políticas basadas en recurso para validar la posibilidad de hacer la concesión de este permiso sin crear un rol IAM adicional. Sin embargo, EKS no soporta este tipo de políticas, por lo que se debe crear un rol entre cuentas en la cuenta donde se encuentra el clúster EKS para que sea asumido por la función Lambda. Este rol debe conceder la acción eks:DescribeCluster. A continuación se encuentra el detalle de la política de confianza y la política de permisos para este nuevo rol:

Política de confianza:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<ID_CUENTA_LAMBDA>:role/<NOMBRE_ROL_EJECUCION_LAMBDA>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Política de Permisos:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster"
      ],
      "Resource": "arn:aws:eks:<REGION>:<ID_CUENTA_EKS>:cluster/<NOMBRE_DEL_CLUSTER>"
    }
  ]
}

Arquitectura final

El código para la implementación final es el siguiente:

import os
import boto3
from awscli.customizations.eks.get_token import TokenGenerator, STSClientFactory
from botocore import session
from kubernetes import client

# --- 1. Lógica de Autenticación AWS (Cross-Account) ---

def get_assumed_credentials(role_arn):
    """
    Asume el rol en la cuenta destino para obtener credenciales temporales.
    Utiliza el rol de ejecución de la Lambda como entidad de confianza.
    """
    sts = boto3.client('sts')
    response = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName="EKS-Lambda-CrossAccount"
    )
    return response['Credentials']

def generate_eks_token(cluster_name, credentials):
    """
    Genera el token de EKS seteando temporalmente variables de entorno del sistema
    para que la clase TokenGenerator de awscli firme correctamente la petición.
    """
    # Seteo de variables de entorno (Estrategia nativa de AWS CLI)
    os.environ['AWS_ACCESS_KEY_ID'] = credentials['AccessKeyId']
    os.environ['AWS_SECRET_ACCESS_KEY'] = credentials['SecretAccessKey']
    os.environ['AWS_SESSION_TOKEN'] = credentials['SessionToken']

    try:
        # Replicamos el comportamiento de 'aws eks get-token'
        session_aws = session.get_session()
        sts_client = STSClientFactory(session_aws).get_sts_client()
        token = TokenGenerator(sts_client).get_token(cluster_name)
    finally:
        # Higiene de seguridad: Eliminamos las credenciales del entorno
        os.environ.pop('AWS_ACCESS_KEY_ID', None)
        os.environ.pop('AWS_SECRET_ACCESS_KEY', None)
        os.environ.pop('AWS_SESSION_TOKEN', None)

    return token

# --- 2. Lógica de Configuración del Cliente Kubernetes ---

def get_k8s_custom_client(cluster_name, credentials):
    """
    Obtiene los datos del clúster (endpoint) y configura el cliente de K8s 
    usando el CustomObjectsApi para interactuar con CRDs.
    """
    # El cliente de EKS debe usar las credenciales asumidas para describir el clúster
    eks_client = boto3.client(
        'eks',
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'],
        region_name=os.environ.get('AWS_REGION')
    )

    cluster_info = eks_client.describe_cluster(name=cluster_name)
    endpoint = cluster_info['cluster']['endpoint']

    # Generamos el token de acceso seguro
    token = generate_eks_token(cluster_name, credentials)

    # Configuración del SDK de Kubernetes
    k8s_config = client.Configuration()
    k8s_config.host = endpoint
    k8s_config.verify_ssl = False  # Nota: En prod se recomienda usar CA data
    k8s_config.api_key = {'authorization': f'Bearer {token}'}

    api_instance = client.ApiClient(k8s_config)
    return client.CustomObjectsApi(api_instance)

# --- 3. Operación Genérica sobre Custom Resources ---

def get_custom_resource(custom_api, group, version, namespace, plural, name):
    """
    Consulta un Custom Resource genérico. 
    Ideal para gestionar operadores (ej. Strimzi, Argo, etc.)
    """
    try:
        return custom_api.get_namespaced_custom_object(
            group, version, namespace, plural, name
        )
    except client.exceptions.ApiException as e:
        if e.status == 404:
            return None
        raise

# --- 4. Handler de la Lambda ---

def lambda_handler(event, context):
    # Variables de entorno configuradas en la Lambda
    CLUSTER_NAME = os.environ['CLUSTER_NAME']
    ROLE_ARN_DESTINO = os.environ['CROSS_ACCOUNT_ROLE_ARN']

    # Coordenadas del Custom Resource (pueden venir del evento)
    CR_GROUP = event.get('group')    
    CR_VERSION = event.get('version') 
    CR_PLURAL = event.get('plural')   
    NAME = event.get('name')
    NAMESPACE = event.get('namespace', 'default')

    # Flujo de ejecución: Asumir Rol -> Configurar Cliente -> Ejecutar Operación
    creds = get_assumed_credentials(ROLE_ARN_DESTINO)
    k8s_client = get_k8s_custom_client(CLUSTER_NAME, creds)

    resource = get_custom_resource(
        k8s_client, CR_GROUP, CR_VERSION, NAMESPACE, CR_PLURAL, NAME
    )

    return {
        'statusCode': 200 if resource else 404,
        'body': resource if resource else "Resource not found"
    }

Nota final

Con la implementación anterior, la capa de conectividad y autenticación queda totalmente resuelta. El último paso, común a cualquier integración con Kubernetes, es asegurar que el clúster reconozca la identidad que acabamos de crear.

Para ello, basta con mapear el ARN del rol asumido en la Cuenta B dentro de Kubernetes. Esto se gestiona mediante las EKS Access Entries (o el ConfigMap aws-auth), vinculándolo a los Roles de RBAC necesarios para operar los recursos. Es el procedimiento estándar de seguridad para otorgar permisos granulares una vez establecida la conexión.

Puedes consultar los detalles de esta configuración final en la documentación oficial.

English Version

Recently, we faced an interesting scenario: we needed an AWS Lambda function to interact with specific Kubernetes controllers in several EKS clusters. With a stack strictly limited to Python, we had to design a solution that handles IAM authentication and communication with the K8s API efficiently.

The context

As part of one of the organization's main initiatives, the on-demand creation and modification of specific Kubernetes Custom Resources (defined through CRDs) on EKS had to be enabled from an AWS Lambda function. The Lambda function and the EKS cluster were in different AWS accounts within the organization, which adds a small layer of management to the interaction.

Integrating with Kubernetes from Python

The first challenge was deciding how to create and modify Custom Resources in Kubernetes from Python. Due to the organization's cybersecurity restrictions, executing bash commands from any programming language (e.g., using os or subprocess in Python) was not allowed.

The first option considered was calling the REST API exposed by Kubernetes directly. While this is viable, it works at a low level of abstraction, so wiring up each required endpoint by hand would take considerable implementation time.
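
To give a sense of what the raw approach involves: the API server exposes namespaced custom resources under paths of the form /apis/{group}/{version}/namespaces/{namespace}/{plural}/{name}. The helper below is a minimal sketch that only builds such a path. Its name is hypothetical, and a real integration would still have to handle the HTTP calls, TLS, authentication headers, and error mapping for every endpoint.

```python
from urllib.parse import quote


def custom_resource_path(group, version, plural, namespace=None, name=None):
    """Build the API server path for a custom resource collection or object.

    Hypothetical helper for illustration; it performs no HTTP request.
    """
    parts = ["apis", group, version]
    if namespace is not None:
        parts += ["namespaces", namespace]
    parts.append(plural)
    if name is not None:
        parts.append(name)
    # Percent-encode each segment so unusual names cannot alter the path
    return "/" + "/".join(quote(part, safe="") for part in parts)


print(custom_resource_path(
    "stable.example.com", "v1", "mycustomresources",
    namespace="default", name="example-instance",
))
# -> /apis/stable.example.com/v1/namespaces/default/mycustomresources/example-instance
```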

The Kubernetes documentation points to an official Kubernetes client for Python. The client supports creating and modifying Custom Resources through the create_namespaced_custom_object, replace_namespaced_custom_object, and get_namespaced_custom_object methods. These methods take the group, version, and plural that identify the Custom Resource Definition in question.

Ex:

from kubernetes import client, config

def manage_custom_resource():
    group = "stable.example.com"
    version = "v1"
    plural = "mycustomresources" # Important: always plural
    namespace = "default"

    # Custom Objects client initialization
    custom_api = client.CustomObjectsApi()

    # Resource body definition (manifest)
    resource_body = {
        "apiVersion": f"{group}/{version}",
        "kind": "MyCustomResource",
        "metadata": {"name": "example-instance"},
        "spec": {
            "setting": "value",
            "enabled": True
        }
    }

    try:
        # Creation example
        response = custom_api.create_namespaced_custom_object(
            group=group,
            version=version,
            namespace=namespace,
            plural=plural,
            body=resource_body
        )
        print(f"Resource created: {response['metadata']['name']}")

    except client.exceptions.ApiException as e:
        print(f"Error interacting with the API: {e}")

This implementation offers a much higher level of abstraction: it avoids technical debt and "reinventing the wheel" when integrating the Lambda function with Kubernetes.
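
A common follow-up is to create the resource when it is missing and replace it otherwise. The sketch below combines the three client methods named earlier. The function name is hypothetical, and it assumes two things: that the API object raises an exception carrying an HTTP status attribute (as the client's ApiException does), and that replacing a custom object requires the current metadata.resourceVersion.

```python
def upsert_custom_resource(custom_api, group, version, namespace, plural, body):
    """Create the custom resource, or replace it if it already exists.

    `custom_api` is expected to behave like kubernetes.client.CustomObjectsApi.
    """
    name = body["metadata"]["name"]
    try:
        existing = custom_api.get_namespaced_custom_object(
            group, version, namespace, plural, name
        )
    except Exception as exc:
        # Assumption: API errors expose the HTTP status code as `.status`
        if getattr(exc, "status", None) != 404:
            raise
        return custom_api.create_namespaced_custom_object(
            group, version, namespace, plural, body
        )

    # Carry over the resourceVersion so the API server accepts the update
    updated = {
        **body,
        "metadata": {
            **body["metadata"],
            "resourceVersion": existing["metadata"]["resourceVersion"],
        },
    }
    return custom_api.replace_namespaced_custom_object(
        group, version, namespace, plural, name, updated
    )
```

Passing the API object in as a parameter keeps the function easy to exercise with a stub, without a live cluster.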

IAM Authentication

With the Kubernetes integration from Python defined, the next implementation challenge arose: authentication. Because the Kubernetes client is agnostic to how a cluster is deployed, it has no built-in support for the AWS IAM authentication model that EKS relies on.

EKS authenticates API requests with bearer tokens built from a request signed for AWS STS. The Kubernetes client accepts a bearer token as follows:

from kubernetes import client

def configure_k8s_client(cluster_endpoint, bearer_token):
    # Create the SDK configuration instance
    configuration = client.Configuration()

    # API Server URL obtained from the EKS console or via Boto3
    configuration.host = cluster_endpoint

    # Injecting the token into the Authorization header
    configuration.api_key = {'authorization': f'Bearer {bearer_token}'}

    # Return the configured ApiClient to perform requests
    return client.ApiClient(configuration)

But how do we generate that token, given that it must be signed with AWS STS? Usually it is obtained with the AWS CLI get-token command:

aws eks get-token --cluster-name <cluster-name>

However, since we cannot invoke external binaries, we must replicate the logic of this command from Python. Conveniently, AWS CLI is itself written in Python. Reviewing the AWS CLI source code repository, I found the implementation that generates the token, which can be reused directly from Python through the TokenGenerator class:

from awscli.customizations.eks.get_token import TokenGenerator, STSClientFactory
from botocore import session

def get_eks_token(cluster_name):
    """
    Generates an authentication token for an EKS cluster
    using the native AWS CLI implementation.
    """
    session_aws = session.get_session()
    sts_client = STSClientFactory(session_aws).get_sts_client()

    return TokenGenerator(sts_client).get_token(cluster_name)

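For this implementation to work, the awscli package must be installed with pip and bundled with the function (in the deployment package or a layer), since the Lambda Python runtime does not include it.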
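
For debugging, it helps to know what the token contains. In the AWS CLI source, get_token prefixes k8s-aws-v1. to the base64url encoding (with padding stripped) of a presigned STS GetCallerIdentity URL. The helpers below are a sketch of that wire format only, with hypothetical names. They do not sign anything, and the URL in the example is a placeholder rather than a real presigned URL.

```python
import base64

TOKEN_PREFIX = "k8s-aws-v1."


def encode_eks_token(presigned_url):
    """Wrap an already presigned STS URL in the EKS bearer token format."""
    encoded = base64.urlsafe_b64encode(presigned_url.encode("utf-8")).decode("utf-8")
    return TOKEN_PREFIX + encoded.rstrip("=")


def decode_eks_token(token):
    """Recover the presigned URL from a token, e.g. to inspect its query string."""
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("not an EKS bearer token")
    payload = token[len(TOKEN_PREFIX):]
    # Restore the padding that was stripped during encoding
    payload += "=" * (-len(payload) % 4)
    return base64.urlsafe_b64decode(payload).decode("utf-8")


# Placeholder URL for illustration; a real one carries the SigV4 query parameters
example_url = "https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15"
token = encode_eks_token(example_url)
print(decode_eks_token(token) == example_url)
```

Decoding a token produced by TokenGenerator this way shows which STS endpoint and access key signed it, which is useful when the cluster rejects a request.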

Cross-account configuration

With integration and authentication solved, one technical aspect remains: cross-account configuration. The Lambda function and the EKS cluster live in different accounts, and the function needs permission to call eks:DescribeCluster on the cluster in order to look up its endpoint.

I reviewed the list of services that support resource-based policies to see whether this permission could be granted without creating an additional IAM role. EKS does not support resource-based policies, so a cross-account role must be created in the account where the EKS cluster resides, to be assumed by the Lambda function. This role must allow the eks:DescribeCluster action. The trust policy and the permissions policy for this new role are shown below:

Trust Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<LAMBDA_ACCOUNT_ID>:role/<LAMBDA_EXECUTION_ROLE_NAME>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Permissions Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster"
      ],
      "Resource": "arn:aws:eks:<REGION>:<EKS_ACCOUNT_ID>:cluster/<CLUSTER_NAME>"
    }
  ]
}
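
If these documents are generated from code rather than written by hand, a small builder keeps the ARNs consistent. The following is a sketch with a hypothetical function name and placeholder account IDs. It also returns a third document that the text above does not show: cross-account role assumption additionally requires the Lambda execution role to be allowed to call sts:AssumeRole on the target role, and cross_account_role_name is an assumed name for that role.

```python
import json


def build_cross_account_policies(lambda_account_id, lambda_role_name,
                                 eks_account_id, region, cluster_name,
                                 cross_account_role_name):
    """Return (trust_policy, permissions_policy, caller_policy) as dicts."""
    cross_account_role_arn = (
        f"arn:aws:iam::{eks_account_id}:role/{cross_account_role_name}"
    )
    # Attached to the role in the EKS account: who may assume it
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "AWS": f"arn:aws:iam::{lambda_account_id}:role/{lambda_role_name}"
            },
            "Action": "sts:AssumeRole",
        }],
    }
    # Attached to the role in the EKS account: what it may do
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["eks:DescribeCluster"],
            "Resource": f"arn:aws:eks:{region}:{eks_account_id}:cluster/{cluster_name}",
        }],
    }
    # Attached to the Lambda execution role: permission to assume the target role
    caller_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": cross_account_role_arn,
        }],
    }
    return trust_policy, permissions_policy, caller_policy


if __name__ == "__main__":
    # Placeholder values for illustration only
    for document in build_cross_account_policies(
        "111111111111", "lambda-exec-role",
        "222222222222", "us-east-1", "my-cluster",
        "eks-describe-role",
    ):
        print(json.dumps(document, indent=2))
```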

Final architecture

The code for the final implementation is as follows:

import os
import boto3
from awscli.customizations.eks.get_token import TokenGenerator, STSClientFactory
from botocore import session
from kubernetes import client

# --- 1. AWS Authentication Logic (Cross-Account) ---

def get_assumed_credentials(role_arn):
    """
    Assumes the role in the destination account to obtain temporary credentials.
    Uses the Lambda execution role as the trusted entity.
    """
    sts = boto3.client('sts')
    response = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName="EKS-Lambda-CrossAccount"
    )
    return response['Credentials']

def generate_eks_token(cluster_name, credentials):
    """
    Generates the EKS token by temporarily setting system environment variables
    so that the awscli TokenGenerator class correctly signs the request.
    """
    # Environment variable setting (Native AWS CLI strategy)
    assumed_env = {
        'AWS_ACCESS_KEY_ID': credentials['AccessKeyId'],
        'AWS_SECRET_ACCESS_KEY': credentials['SecretAccessKey'],
        'AWS_SESSION_TOKEN': credentials['SessionToken'],
    }
    # Keep the Lambda's own credentials so they can be restored afterwards
    previous_env = {key: os.environ.get(key) for key in assumed_env}
    os.environ.update(assumed_env)

    try:
        # Replicating the 'aws eks get-token' behavior
        session_aws = session.get_session()
        sts_client = STSClientFactory(session_aws).get_sts_client()
        token = TokenGenerator(sts_client).get_token(cluster_name)
    finally:
        # Security hygiene: drop the assumed credentials and restore the originals
        for key, value in previous_env.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value

    return token

# --- 2. Kubernetes Client Configuration Logic ---

def get_k8s_custom_client(cluster_name, credentials):
    """
    Obtains cluster data (endpoint) and configures the K8s client 
    using CustomObjectsApi to interact with CRDs.
    """
    # The EKS client must use the assumed credentials to describe the cluster
    eks_client = boto3.client(
        'eks',
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'],
        region_name=os.environ.get('AWS_REGION')
    )

    cluster_info = eks_client.describe_cluster(name=cluster_name)
    endpoint = cluster_info['cluster']['endpoint']

    # Generating the secure access token
    token = generate_eks_token(cluster_name, credentials)

    # Kubernetes SDK configuration
    k8s_config = client.Configuration()
    k8s_config.host = endpoint
    k8s_config.verify_ssl = False  # Note: In prod, using CA data is recommended
    k8s_config.api_key = {'authorization': f'Bearer {token}'}

    api_instance = client.ApiClient(k8s_config)
    return client.CustomObjectsApi(api_instance)

# --- 3. Generic Operation on Custom Resources ---

def get_custom_resource(custom_api, group, version, namespace, plural, name):
    """
    Queries a generic Custom Resource. 
    Ideal for managing operators (e.g., Strimzi, Argo, etc.)
    """
    try:
        return custom_api.get_namespaced_custom_object(
            group, version, namespace, plural, name
        )
    except client.exceptions.ApiException as e:
        if e.status == 404:
            return None
        raise

# --- 4. Lambda Handler ---

def lambda_handler(event, context):
    # Environment variables configured in the Lambda
    CLUSTER_NAME = os.environ['CLUSTER_NAME']
    ROLE_ARN_DESTINO = os.environ['CROSS_ACCOUNT_ROLE_ARN']

    # Custom Resource coordinates (may come from the event)
    CR_GROUP = event.get('group')    
    CR_VERSION = event.get('version') 
    CR_PLURAL = event.get('plural')   
    NAME = event.get('name')
    NAMESPACE = event.get('namespace', 'default')

    # Execution flow: Assume Role -> Configure Client -> Execute Operation
    creds = get_assumed_credentials(ROLE_ARN_DESTINO)
    k8s_client = get_k8s_custom_client(CLUSTER_NAME, creds)

    resource = get_custom_resource(
        k8s_client, CR_GROUP, CR_VERSION, NAMESPACE, CR_PLURAL, NAME
    )

    return {
        'statusCode': 200 if resource else 404,
        'body': resource if resource else "Resource not found"
    }
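
The code above sets verify_ssl to False and notes that CA data is recommended in production. One way to do that, sketched here, relies on the describe_cluster response, which carries the cluster CA certificate as base64 text under cluster.certificateAuthority.data. The helper name is hypothetical; it writes the decoded certificate to a temporary file so the Kubernetes client can be pointed at it.

```python
import base64
import tempfile


def write_cluster_ca(cluster_info):
    """Decode the cluster CA from a describe_cluster response into a temp file.

    Returns the path of the written file.
    """
    ca_b64 = cluster_info["cluster"]["certificateAuthority"]["data"]
    # delete=False keeps the file on disk after the handle is closed
    with tempfile.NamedTemporaryFile(mode="wb", suffix=".crt", delete=False) as ca_file:
        ca_file.write(base64.b64decode(ca_b64))
        return ca_file.name


# Intended wiring inside get_k8s_custom_client (not executed here):
#   k8s_config.verify_ssl = True
#   k8s_config.ssl_ca_cert = write_cluster_ca(cluster_info)
```

In Lambda the temporary file lands in the function's writable temporary storage, so the path can be reused across warm invocations if it is cached at module level.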

Final note

With the previous implementation, the connectivity and authentication layer is fully resolved. The last step, common to any Kubernetes integration, is to ensure the cluster recognizes the identity we just created.

To do this, map the ARN of the cross-account role (the one created in the EKS account and assumed by the Lambda function) to an identity inside Kubernetes. This is managed through EKS Access Entries (or the aws-auth ConfigMap), which link the role to the RBAC Roles required to operate the resources. This is the standard security procedure for granting granular permissions once the connection is established.
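
On the RBAC side, the mapping usually ends in a Role and a RoleBinding scoped to the custom resource. The sketch below builds both manifests as plain dictionaries. The function name, the default role name, and the Kubernetes group name are hypothetical; the group is assumed to be the one assigned to the IAM role in the access entry (or in aws-auth).

```python
RBAC_API_GROUP = "rbac.authorization.k8s.io"


def build_rbac_manifests(namespace, group, plural, k8s_group,
                         role_name="lambda-cr-editor"):
    """Return (role, role_binding) manifests granting access to one custom resource."""
    role = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [group],
            "resources": [plural],
            # get/create/update cover the get, create and replace client calls
            "verbs": ["get", "create", "update"],
        }],
    }
    role_binding = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{
            "kind": "Group",
            "name": k8s_group,
            "apiGroup": RBAC_API_GROUP,
        }],
        "roleRef": {
            "kind": "Role",
            "name": role_name,
            "apiGroup": RBAC_API_GROUP,
        },
    }
    return role, role_binding


role, role_binding = build_rbac_manifests(
    "default", "stable.example.com", "mycustomresources", "lambda-cr-operators"
)
print(role["rules"][0]["verbs"])
# -> ['get', 'create', 'update']
```

Limiting the rule to a single API group and plural keeps the Lambda's permissions as narrow as the operations it actually performs.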

You can check the details of this final configuration in the official documentation.
