Dr. Malte Polley for AWS Community Builders

Posted on Mar 15

Implementing Automated End-to-End Testing: Leveraging Your CI/CD with Your Cloud Development Kit (CDK) App

#cdk #serverless #automation #testing

I recently embarked on a new project focused on our input management. I work for a German insurance broker that aims to modernize the handling of physical letters. Unfortunately, in the German insurance sector, we cannot rely on APIs or similar technical interfaces. Instead, our account managers handle letters by reading them, extracting information, and finally entering all relevant details into our CRM.

The volume of physical letters I’m referring to is approximately 8,500 pages per month. Therefore, ensuring the quality of our event-based architecture is crucial. In this article, I will demonstrate how we implemented an end-to-end (E2E) testing approach within our CI/CD pipeline using the Cloud Development Kit (CDK) to create the infrastructure.

The Architecture

I won’t spend too much time on the architecture itself, as I might write another blog post about this use case in the future. However, I will provide the minimum needed to follow along. Essentially, Figure 1 illustrates everything. We have an on-premises scanner that forwards all scans to Amazon Simple Email Service (SES), so each letter arrives as an email. SES stores every email as a MIME object in S3, and a Lambda function retrieves the PDF attachment and then triggers AWS Step Functions to process the PDF. Using Amazon Textract, we obtain the OCR results, and we use Amazon Bedrock for intelligent document processing (IDP). From the IDP results, we extract specific information to create a new filename (I know, old school!). Finally, a Lambda function pushes the processed PDF to an SMB file share.
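
To make the "retrieve the PDF attachment from the MIME object" step more concrete, here is a minimal sketch using only the Python standard library. It is not the Lambda function from this project: the function names are hypothetical, and reading the raw object from S3 (for example with boto3) is left out on purpose.

```python
from email import message_from_bytes, policy
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_pdf_attachments(raw_email: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, content) pairs for every PDF attachment in a raw MIME message."""
    message = message_from_bytes(raw_email, policy=policy.default)
    pdfs = []
    # iter_attachments() skips the body part and yields the remaining sub-parts.
    for part in message.iter_attachments():
        if part.get_content_type() == "application/pdf":
            filename = part.get_filename() or "attachment.pdf"
            pdfs.append((filename, part.get_payload(decode=True)))
    return pdfs


def build_sample_email(pdf_bytes: bytes) -> bytes:
    """Build a small multipart message, only to exercise the function above."""
    msg = MIMEMultipart()
    msg["Subject"] = "Scan"
    msg.attach(MIMEText("Scanned letter attached."))
    attachment = MIMEApplication(pdf_bytes, _subtype="pdf")
    attachment.add_header("Content-Disposition", "attachment", filename="scan.pdf")
    msg.attach(attachment)
    return msg.as_bytes()
```

In a Lambda handler, `raw_email` would be the body of the S3 object that SES wrote; the returned bytes could then be stored or handed to the state machine.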

Implementing a Test Strategy: Demystifying E2E Tests

I won’t reiterate all the blog posts about designing test cases, but I will highlight two fundamental aspects. First, we need to establish a test hierarchy. Unit tests form the base layer, integration tests are the mid-layer, while E2E tests represent the top layer in every test implementation case. Unit tests evaluate whether a specific input produces the expected output from a function. Integration tests assess whether a component can create and handle the results from a second component. E2E tests evaluate whether all components of a system work together as intended. While hopefully everyone is using unit tests for quality assurance of Lambdas and other code components, integration tests and E2E tests are considerably more complex and costly.
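
As an illustration of the base layer, a unit test for a small pure function might look like the sketch below. The function `build_filename` and its naming scheme are hypothetical and only stand in for the "create a new filename" step; the real scheme in this project may differ.

```python
import re


def build_filename(sender: str, contract_id: str, scan_date: str) -> str:
    """Build a file name from extracted fields (hypothetical naming scheme)."""
    # Replace anything that is not a letter or digit with a single dash.
    safe_sender = re.sub(r"[^A-Za-z0-9]+", "-", sender).strip("-")
    return f"{scan_date}_{safe_sender}__{contract_id}.pdf"


def test_build_filename_contains_contract_id() -> None:
    """A specific input must produce exactly the expected output."""
    name = build_filename("Example Insurance AG", "102865946013", "2025-03-15")
    assert name == "2025-03-15_Example-Insurance-AG__102865946013.pdf"
```

A test runner such as pytest would pick up the `test_` function; no AWS resources are involved, which is what keeps this layer cheap.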

Let’s take a step back and analyze what the described architecture is doing to clarify the complexity of E2E testing. First, we need to process an email. Next, we process the PDF, and finally, we push the processed PDF file into a specific folder within our cloud environment. From this abstraction, our E2E test first needs to send an email to mock the scanner. Second, we need to track the events generated by SES. This can be tricky if you want to capture the specific event and the Lambda function that processes the MIME object. Instead, I used the Step Functions state machine: listing its executions seemed like a good idea. Finally, I can look into the SMB file share and search for my test case PDF.
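
The "list the executions, then wait for the result" idea can be sketched with two small polling helpers. This is an illustration under assumptions, not the script used in this project: the helper names, timeouts, and the injected `sleep` parameter are hypothetical. `client` is expected to behave like `boto3.client("stepfunctions")`, whose `list_executions` and `describe_execution` calls are used here.

```python
import time
from datetime import datetime

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_new_execution(client, state_machine_arn: str, started_after: datetime,
                           timeout_s: int = 120, poll_s: int = 5,
                           sleep=time.sleep) -> str:
    """Poll until an execution started at or after `started_after` shows up."""
    waited = 0
    while True:
        response = client.list_executions(
            stateMachineArn=state_machine_arn, maxResults=10
        )
        for execution in response["executions"]:
            # boto3 returns timezone-aware datetimes, so `started_after`
            # must be timezone-aware as well.
            if execution["startDate"] >= started_after:
                return execution["executionArn"]
        if waited >= timeout_s:
            raise TimeoutError("No new execution appeared in time.")
        sleep(poll_s)
        waited += poll_s


def wait_for_terminal_status(client, execution_arn: str, timeout_s: int = 600,
                             poll_s: int = 20, sleep=time.sleep) -> str:
    """Poll describe_execution until the execution leaves the RUNNING state."""
    waited = 0
    while True:
        status = client.describe_execution(executionArn=execution_arn)["status"]
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError("Execution did not finish in time.")
        sleep(poll_s)
        waited += poll_s
```

Recording a timestamp just before sending the test email and passing it as `started_after` avoids picking up an unrelated execution. Clock skew between the CI agent and AWS is an assumption worth checking before relying on this comparison.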

Prerequisites for This Approach

You might already be considering the assumptions we need to meet, right? If your CI/CD pipeline is not running within a network segment that can route to the SMB file share, you’re out of luck. Therefore, network integration can be crucial. Additionally, it’s essential to consider IAM permissions. We use the CDK bootstrap command to create all relevant components for the deployments. However, this collection of resources does not permit listing and describing resources unrelated to the CDK. Thus, part of your application needs to be an IAM role that can be assumed by your CI/CD server. Consequently, you will need to modify the CDK bootstrap collection to allow you to assume your new IAM role for testing purposes.
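
The two IAM pieces described above can be written down as plain policy documents. The following sketch only builds the JSON structures; the helper names are hypothetical, and the role prefix `cdk-hnb659fds-deploy-role` is the default name created by CDK bootstrap, which may differ in a customized bootstrap setup.

```python
import json


def deploy_role_arn(account_id: str, region: str, role_prefix: str) -> str:
    """Build the ARN of the CDK deployment role from its naming pattern."""
    return f"arn:aws:iam::{account_id}:role/{role_prefix}-{account_id}-{region}"


def trust_policy_for_test_role(ci_principal_arn: str) -> dict:
    """Trust policy for the test role: who is allowed to assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": ci_principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_permission_for_ci_role(test_role_arn: str) -> dict:
    """Identity policy for the CI/CD role: permission to assume the test role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": test_role_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and region, for illustration only.
    ci_role = deploy_role_arn("111122223333", "eu-central-1", "cdk-hnb659fds-deploy-role")
    print(json.dumps(trust_policy_for_test_role(ci_role), indent=2))
```

Both sides are needed: the test role has to trust the CI/CD principal, and the CI/CD principal has to be allowed to call `sts:AssumeRole` on the test role.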

Furthermore, since the CDK supports dynamic resource naming, which you should use, we cannot rely on deterministic naming schemas. The solution is to expose the email address, the Step Functions state machine, and the IAM role for testing through CloudFormation Outputs. Each output has a human-readable and constant export name, allowing us to retrieve the dynamic resource names.
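
Looking up a dynamic resource name by its constant export name can be sketched in a few lines of Python. The helper below is hypothetical; it works on the dictionary returned by the boto3 CloudFormation call `describe_stacks`, so it can be tried without AWS access.

```python
def output_by_export_name(describe_stacks_response: dict, export_name: str) -> str:
    """Return the OutputValue of the stack output with the given export name."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    for output in outputs:
        if output.get("ExportName") == export_name:
            return output["OutputValue"]
    raise KeyError(f"No output with export name {export_name!r}")


# Usage with boto3 (requires credentials, therefore only shown as a comment):
#   import boto3
#   response = boto3.client("cloudformation").describe_stacks(StackName="my-stack")
#   role_arn = output_by_export_name(response, "E2eTestIAMRole")
```

Raising an error for a missing export makes the pipeline fail early, instead of passing an empty value on to the test script.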

Lastly, we need to verify our email and domain entities within SES to utilize both. For emails, you need to click a link, and for domains, you must add records to your domain service, whether it’s Route 53 or another provider.

Discussion Around This Approach

Let’s discuss the previous points. First, the scanner depicted in the figure is not used in this E2E testing scenario. Fair point. My question is: is the effort of including the scanner itself really justified when the goal is to evaluate the AWS architecture? The scanner might be located separately from your environment. Ultimately, you are correct; this is not entirely E2E, but we are quite close.

Another point I would like to mention is that this E2E test uses ideal test cases. This means we use PDFs that can be reliably evaluated and processed by Textract and Bedrock, so that I know how the final file will look. From my perspective, you could criticize this as well, but the essence of E2E testing is not about quality assessment in terms of high volume. It’s about configuring the system correctly to achieve predictable results. The overall performance of the system is monitored in production.

I'm also hearing you say that certain parts of the architecture are not fully automated in the initial deployment, such as the SES entities. You are entirely correct. Within SES, we create independent resources like the rule to forward incoming emails to S3 and Lambda, as well as dependent resources that need manual steps: domain and email entities, which must be verified, and SMTP credentials, which must be created and retrieved. So there is manual effort involved, but it is a one-time task. As I mentioned earlier, moving towards E2E testing requires more time and manual intervention compared to unit testing.

Implementing the Script

Let’s dive into the scripts. First, here’s the IAM Role and a sample for the CloudFormation output:

from aws_cdk import CfnOutput, Stack, aws_iam as iam
from constructs import Construct

class IAMRoleStack(Stack):
    """Create the IAM deployment."""

    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        **kwargs,
    ) -> None:
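        """Create the IAM deployment.

        Args:
            scope (Construct): CDK App scope
            construct_id (str): Logical ID of this stack
        """
        # Initialize the Stack base class before adding any constructs.
        super().__init__(scope, construct_id, **kwargs)
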
        role = iam.Role(
            self,
            id="TestIAMRole",
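            # The trust principal must match whoever assumes this role. If the
            # pipeline calls sts:AssumeRole from another role (as in the bash
            # script below), an iam.ArnPrincipal for that role is needed instead.
            assumed_by=iam.ServicePrincipal("ec2.amazonaws.com"),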
            description="IAM Role for E2E testing",
        )

        role.add_to_policy(
            iam.PolicyStatement(
                actions=[
                    "states:ListExecutions",
                    "states:DescribeExecution",
                ],
                resources=["*"],
            )
        )

        CfnOutput(
            self,
            id="RoleArn",
            value=role.role_arn,
            export_name="E2eTestIAMRole",
            description="ARN of the created IAM Role",
        )

Next, here’s the bash implementation for our CI/CD pipeline, in my case using Azure DevOps but it can be any CI/CD tool:

export AWS_DEFAULT_REGION=${{ parameters.AwsRegion }}
KST=$( aws sts assume-role --endpoint-url https://sts.${{ parameters.AwsRegion }}.amazonaws.com --role-arn arn:aws:iam::${{ parameters.AccountID }}:role/${{ parameters.CDKDeploymentRole }}-${{ parameters.AccountID }}-${{ parameters.AwsRegion }} --role-session-name $(Build.SourceVersion) --duration-seconds 3600)
export AWS_ACCESS_KEY_ID=$(echo $KST | jq -r .Credentials.AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo $KST | jq -r .Credentials.SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo $KST | jq -r .Credentials.SessionToken)
export EMAIL_PROCESSOR=$(aws cloudformation describe-stacks --stack-name mrht-developer-posteingang-stack --query "Stacks[0].Outputs[?ExportName=='ExportEmailProcessor'].OutputValue" --output text)
export STATE_MACHINE_ARN=$(aws cloudformation describe-stacks --stack-name mrht-developer-posteingang-stack --query "Stacks[0].Outputs[?ExportName=='ExportStateMachine'].OutputValue" --output text)
export IAM_ROLE_ARN=$(aws cloudformation describe-stacks --stack-name mrht-developer-posteingang-stack --query "Stacks[0].Outputs[?ExportName=='E2eTestIAMRole'].OutputValue" --output text)

KST=$( aws sts assume-role --endpoint-url https://sts.${{ parameters.AwsRegion }}.amazonaws.com --role-arn $IAM_ROLE_ARN --role-session-name $(Build.SourceVersion) --duration-seconds 3600)
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
export AWS_ACCESS_KEY_ID=$(echo $KST | jq -r .Credentials.AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo $KST | jq -r .Credentials.SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo $KST | jq -r .Credentials.SessionToken)
python3 src/e2e_test/e2e_test.py

The first aws sts assume-role command helps us to retrieve the CloudFormation outputs. The required permissions for this are part of the CDK bootstrap collection. The second aws sts assume-role command enables us to initiate the testing.

Now, let’s take a look at the e2e_test.py file:

import os
import time
import logging
import boto3
from botocore.exceptions import ClientError, ParamValidationError
import smbclient
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

class SMBClient:
    """Manage an SMB connection to access shared files."""

    def __init__(self, username: str, password: str, servername: str):
        """Initialize the SMBClient with connection details.

        Args:
            username (str): The username for the SMB connection.
            password (str): The password for the SMB connection.
            servername (str): The hostname or IP address of the SMB server.
        """
        self.username = username
        self.password = password
        self.port = 445
        self.host_name = servername

    def connect_smb(self, smb_client: smbclient) -> bool:
        """Connect to the SMB server.

        Args:
            smb_client (smbclient): The SMB client module.

        Returns:
            bool: True once the session is registered; errors are re-raised.
        """
        logger.info(f"Trying to connect to {self.host_name} as {self.username}.")
        try:
            smb_client.register_session(
                server=self.host_name, username=self.username, password=self.password
            )
            logger.info("Session established successfully.")
        except Exception as e:
            logger.exception(f"Failed to connect to SMB share: {e}")
            raise
        return True

    def list_directory(self, path: str, smb_client: smbclient) -> list[str]:
        """List the contents of an SMB directory.

        Args:
            path (str): The path to the directory.
            smb_client (smbclient): The SMB client instance.

        Returns:
            List[str]: A list of the directory contents, or an empty list if an error occurs.
        """
        entries = []
        try:
            for entry in smb_client.listdir(path):
                logger.info(f"Found entry: {entry}")
                entries.append(entry)
            return entries
        except Exception as e:
            logger.exception(f"Error listing directory: {e}")
            return []

def send_email(
    sender: str,
    recipient: str,
    subject: str,
    body_path: str,
    attachment_path: str,
    smtp_server: str,
    smtp_port: int,
    smtp_username: str,
    smtp_password: str,
):
    """Send an email with a PDF attachment using SMTP.

    Args:
        sender (str): The email address of the sender.
        recipient (str): The email address of the recipient.
        subject (str): The subject of the email.
        body_path (str): The path to the text file containing the body of the email.
        attachment_path (str): The path to the PDF file to be attached.
        smtp_server (str): The SMTP server address.
        smtp_port (int): The SMTP server port.
        smtp_username (str): The SMTP username.
        smtp_password (str): The SMTP password.
    """
    msg = MIMEMultipart()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient

    with open(body_path, "r", encoding="utf-8") as text_file:
        text = MIMEText(text_file.read())
        msg.attach(text)

    with open(attachment_path, "rb") as pdf_file:
        attachment = MIMEApplication(pdf_file.read(), _subtype="pdf")
        attachment.add_header(
            "Content-Disposition", "attachment", filename="attachment.pdf"
        )
        msg.attach(attachment)

    try:
        with smtplib.SMTP(smtp_server, smtp_port) as s:
            s.starttls()
            s.login(smtp_username, smtp_password)
            s.send_message(msg)
            logger.info("Email sent successfully.")
    except Exception as e:
        logger.error(f"Error sending email: {str(e)}")
        raise


def main():
    """Run E2E test."""
    try:
        # os.environ[...] raises KeyError for missing variables; os.getenv would
        # silently return None and never reach the except branch.
        state_machine_arn = os.environ["STATE_MACHINE_ARN"]
        stage = os.getenv("STAGE")  # optional, not used below
        smb_user = os.environ["SMB_USERNAME"]
        smb_password = os.environ["SMB_PASSWORD"]
        smb_host_ip = os.environ["SMB_SERVER"]
        smb_path = f"{smb_host_ip}\\folder$\\folder\\"
        smtp_server = os.environ["SMTP_SERVER"]
        smtp_port = int(os.environ["SMTP_PORT"])
        smtp_username = os.environ["SMTP_USERNAME"]
        smtp_password = os.environ["SMTP_PASSWORD"]
    except KeyError as e:
        logger.exception(e)
        raise

    try:
        sender = "scan2mail@example.com"
        recipient = "scan2mail@example.com"
        subject = "Test Email with Attachment"
        body_path = "./src/e2e_test/email_body.txt"
        attachment_path = "./src/e2e_test/test.pdf"

        send_email(
            sender=sender,
            recipient=recipient,
            subject=subject,
            body_path=body_path,
            attachment_path=attachment_path,
            smtp_server=smtp_server,
            smtp_port=smtp_port,
            smtp_username=smtp_username,
            smtp_password=smtp_password,
        )

        time.sleep(15)

        stepfunctions_client = boto3.client("stepfunctions")

        list_executions_response = stepfunctions_client.list_executions(
            stateMachineArn=state_machine_arn, maxResults=1, statusFilter="RUNNING"
        )

        if list_executions_response["executions"]:
            execution_arn = list_executions_response["executions"][0]["executionArn"]
            logger.info(f"Found execution: {execution_arn}")

            while True:
                response = stepfunctions_client.describe_execution(
                    executionArn=execution_arn
                )
                status = response["status"]

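                if status != "RUNNING":
                    logger.info(f"Step Function execution status: {status}")
                    break

                logger.info(
                    "Step Function is still running. Checking again in 20 seconds..."
                )
                time.sleep(20)

            # Fail the test unless the execution finished successfully.
            if status != "SUCCEEDED":
                raise RuntimeError(
                    f"Step Function execution ended with status {status}."
                )
        else:
            logger.warning("No running execution found.")
            # A bare `raise` has no active exception here, so raise explicitly.
            raise RuntimeError("No running execution found.")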

    except (ClientError, ParamValidationError) as e:
        logger.error(f"An error occurred: {str(e)}")
        raise

    counter = 0
    try:
        smb = SMBClient(
            username=smb_user,
            password=smb_password,
            servername=smb_host_ip,
        )
        smb.connect_smb(smbclient)
        response = smb.list_directory(smb_path, smbclient)
        desired_contract_id = "__102865946013"
        for file_name in response:
            if desired_contract_id in file_name:
                logger.info(f"Found the ID '{desired_contract_id}' in the directory contents.")
                counter += 1
    except Exception as e:
        logger.exception(e)
        raise

    if counter == 0:
        raise ValueError("Target ID not found")

    logger.info("Finished E2E test")

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        logger.error(f"An error occurred: {str(e)}")
        raise

This script is executed as the last step of my CI/CD pipeline after a successful CDK deployment. First, we gather all relevant variables from the CI/CD pipeline environment. You could also use AWS services like AWS Systems Manager (SSM) Parameter Store or AWS Secrets Manager for this. Next, we grab a test.pdf from the repository, as well as a test body for the email. If you use multiple test cases, it’s probably a better idea to implement a test case store in S3.
Using the credentials and the test case, we create an email that is sent to our desired email address. In this case, the sender and recipient are the same. After the email has been sent successfully, we wait for a while and then list and describe the executions of the Step Functions state machine shown in Figure 1. If the execution completes successfully, the script lists the SMB target directory and searches for the desired contract ID from the test case PDF. Finally, we clean up the test file from the SMB directory; this step is not part of the listing above.
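
The cleanup step mentioned above could be sketched as follows. The helper is hypothetical and takes the directory functions as parameters so it can be tried without an SMB server; with the smbclient module from the script, `smbclient.listdir` and `smbclient.remove` would be passed in.

```python
from typing import Callable, Iterable


def remove_test_files(
    directory: str,
    marker: str,
    listdir: Callable[[str], Iterable[str]],
    remove: Callable[[str], None],
) -> list[str]:
    """Delete every file in `directory` whose name contains `marker`.

    Returns the names of the removed files so the caller can log them.
    """
    removed = []
    base = directory.rstrip("\\")
    for name in listdir(directory):
        if marker in name:
            # Join with a backslash because SMB paths use Windows separators.
            remove(base + "\\" + name)
            removed.append(name)
    return removed


# Usage with the objects from the script above (requires a registered session):
#   remove_test_files(smb_path, "__102865946013", smbclient.listdir, smbclient.remove)
```

Running the cleanup even when the contract ID check fails (for example in a `finally` block) would keep leftovers from one run from producing a false positive in the next.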

Final Thoughts

The conception and implementation of E2E tests undoubtedly require significant effort, and I completely agree that this is a relatively simple example. However, automation helps ensure quality and saves a substantial amount of time.

The manual process would look something like this: sending an email using your favorite client, hoping you don’t overlook the correct test case, waiting for the file to appear in the SMB file store, and checking whether the contract ID matches your test case. This manual approach would certainly take longer than the discussed automated method, which takes about 1 minute and 30 seconds.

What do you think? Did I get things right? What is your opinion? I’m happy to discuss further and answer any questions you may have.

Happy Coding :-)!
