Table of Contents
Introduction
Intro to DevOps
Infrastructure As Code (IaC) in DevOps
- The Hidden Cost of Manual Processes
- IaC Benefits
- AWS IaC Tools Overview
Code, Build And Test Phases
- Code Phase
- Build Phase
- Test Phase
AI Capabilities in DevOps Workflows
Testing GenAI Apps
Tests in the CI Flow
Continuous Integration (CI)
Hands-on Labs: Set Up a CI/CD Pipeline
- Create the CodePipeline
- Create the CodeDeploy project
- Adding CodeDeploy to CodePipeline
Serverless Deployment Strategies
AWS CodeDeploy in Your Pipeline
CI/CD for Infrastructure
Automate Infra Deployment With CDK in CI/CD Pipeline
Monitoring Your Infrastructure
- Monitor, Log, and Audit With CloudTrail
- AWS CloudWatch
- A Powerful Mix: CloudWatch + CloudTrail
- Monitoring with AWS X-Ray
Operating with Confidence
- Configuration Change Detection with AWS Config
- AWS Systems Manager
Wrapping Up Part Two: The Journey Continues
Introduction
This is the second blog post in a series of three, where I share my experience studying for and earning the DevOps and AI on AWS Specialization from Coursera. If you didn't read the first blog post, here's the link in case you want to have a look :)
Intro To DevOps
The heart of the first module is the problem DevOps methodologies solve: getting software updates to production as quickly as possible while keeping quality high. DevOps mainly focuses on two things: collaboration and automation.
When talking about the steps involved in creating and deploying software, we refer to sharing our work in source code repositories, building (creating an artifact to deploy the application), and then testing the application to ensure everything functions as expected. We usually automate this process with continuous integration to automatically build and run tests, so developers integrate their changes frequently.
After the artifact is built and tested, we are ready to release and deploy; the software is ready to go. We take this artifact and deliver it to the servers that host the application. We can use tools for continuous deployment.
Another important aspect is automating the creation and updating of infrastructure and, ideally, we aim to include a step for proactive anomaly detection to quickly identify any unusual activity that might degrade our application's performance.
In summary, the process involves continuous integration, automated testing, continuous deployment, and infrastructure as code.
Infrastructure As Code (IaC) in DevOps
What is Infrastructure as Code? Let's look at the definition from the AWS website:
Infrastructure as code (IaC) is the ability to provision and support your computing infrastructure using code instead of manual processes and settings. Any application environment requires many infrastructure components like operating systems, database connections, and storage. Developers have to regularly set up, update, and maintain the infrastructure to develop, test, and deploy applications.
Manual infrastructure management is time-consuming and prone to error—especially when you manage applications at scale. Infrastructure as code lets you define your infrastructure's desired state without including all the steps to get to that state. It automates infrastructure management so developers can focus on building and improving applications instead of managing environments. Organizations use infrastructure as code to control costs, reduce risks, and respond with speed to new business opportunities.
The ultimate goal is to automate as many of these tasks as possible. We move away from the manual approach and describe the tasks as files.
Going back to the TravelGuide App in Course 1 of this DevOps and AI Specialization on AWS, we understand that the application's code requires a platform to operate, such as compute resources (in this case an EC2 instance).
Our application upgrade also requires the creation of certain cloud resources. We are utilizing a Bedrock knowledge base that needs an S3 bucket as a data source. The EC2 instances running the application will need a VPC network. Additionally, an IAM role is required for authentication, along with IAM policies to manage permissions.
Infrastructure also includes the configuration of these resources. This configuration is not static; eventually we'll need to modify and maintain the infrastructure, and we must be agile when making these changes.
In AWS, this is essential for scaling and automating tasks, particularly in AI-driven applications such as the generative AI feature in the TravelGuideApp.
Adopting IaC eliminates repetitive manual tasks, ensuring infrastructure changes are documented, consistent, and easily auditable.
The Hidden Cost of Manual Processes
💸 Manually stopping and starting EC2 instances or modifying configurations is time-consuming.
💸 Inconsistent changes across instances increase the likelihood of mistakes.
💸 Managing large numbers of resources is impractical without automation (hard to scale).
💸 Limited traceability: Manual changes lack documentation and a change history (limited traceability).
IaC Benefits
👉 Documentation: The code can act as a form of documentation. We can check the changes in source control (full change history).
👉 Pull requests reviews: Other members of the team can comment on the infrastructure changes.
👉 Scale: Manual work consumes time that could be better spent on other tasks. Automation is better at scale, when you need to make hundreds of changes.
👉 No human error: Automating tasks removes the human error inherent in manual processes.
AWS IaC Tools Overview
AWS CloudFormation is a service that helps you model and set up your AWS resources. You create a template (a JSON or YAML file) that describes all the AWS resources you want. It uses a declarative language to define the desired state.
The AWS Cloud Development Kit or CDK is an open-source framework we use to model and provision cloud-based applications with familiar programming languages. CDK uses CloudFormation to create resources.
More on these tools in the "CI/CD for Infrastructure" section later.
Code, Build And Test Phases
Code Phase
Without developer and operations teams being integrated and collaborating regularly, it can take a while to figure things out when something goes wrong. With DevOps this has evolved: these two teams can work together as a unified team. This impacts how developers write and manage code from the beginning through deployment.
Both teams work together more closely, and collaboration becomes a key player here. They help each other and understand more about the whole application lifecycle. Developers need to be mindful of how the code will be deployed, how it can be monitored and maintained, and how it performs in different environments.
Operations can provide feedback on how the application is performing and give insights to developers so they can optimize the application or fix bugs. They may also assess code for security, observability, monitoring, and performance.
Occasionally, code changes can affect operations in ways developers might not immediately consider. Involving individuals with different perspectives can help address potential issues early on. Proactive communication enables both teams to anticipate issues earlier in the development cycle.
Without DevOps, much of the development process might be manual, which is why automation is crucial. You want processes that automate the application and infrastructure deployment.
Infrastructure-as-code tools allow you to create templates for infrastructure, write scripts for testing, and use automated deployment tools. This ensures that your code not only functions correctly but also includes the necessary assets for building, testing, and deploying through an automated pipeline.
How developers commit or deploy code can change with continuous integration and continuous deployment or delivery or CI/CD. This enables your code to be tested more frequently with automated tools, allowing for incremental deployment of changes. By making numerous small updates instead of infrequent, larger ones, changes reach end-users more quickly, accelerating the feedback loop.
Developers should adopt the best practice of committing smaller, more frequent changes. Additionally, each commit must be production-ready, as it could be deployed to users at any moment.
Finally, monitoring and feedback loops become integral to the development process. Developers become more aware in the operation of the application, integrating tools for logs and metrics to quickly identify issues.
Build Phase
This phase comes right after the code phase. The build phase takes the code and compiles it if needed, depending on the programming language being used. This phase also includes automated testing and linting (linters help ensure higher code quality).
The end result of this phase would be an artifact to be deployed.
If any of these steps (retrieving dependencies, compiling, packaging, or testing) fails, it results in a broken build. A broken build indicates that the code in the deployment branch is not in a functional state. In this case, we want to get our code back to a good state as quickly as possible.
In modern DevOps practices, it's typical to run a build with every commit. AWS CodeBuild is a fully managed service that compiles your source code, runs unit tests, and generates artifacts ready for deployment. It integrates with other AWS services like AWS CodePipeline and AWS CodeDeploy.
AWS CodeBuild is the tool we use for continuous integration; it can scale up and run multiple builds in parallel while multiple developers work on the application code.
To configure builds for your application, include a buildspec.yml file with your source code. This file outlines your desired build process, which AWS CodeBuild reads and executes.
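To make this concrete, here's a minimal sketch of what a buildspec.yml could look like, assuming a Python project with pytest and flake8. The commands, runtime version, and report paths are illustrative assumptions, not from the course:

```yaml
version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.12          # assumed runtime; depends on the build image
    commands:
      - pip install -r requirements.txt
  build:
    commands:
      - flake8 .            # linting
      - pytest --junitxml=reports/tests.xml --cov=. --cov-report=xml

reports:
  tests:
    files:
      - reports/tests.xml
    file-format: JUNITXML
  coverage:
    files:
      - coverage.xml
    file-format: COBERTURAXML
```

The reports section is what makes the Test and Code Coverage reports show up on the CodeBuild Reports tab mentioned in the lab below.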
Test Phase
Testing can also save money. The earlier we catch errors in the development process, the less expensive they are to fix.
We conduct various types of testing, both functional and non-functional, to prevent issues from occurring in production.
As discussed with continuous integration, we can add automated tests to every build. Functional testing begins with unit testing; these tests run quickly and provide fast feedback. We can also include a linter to assess code quality. The goal of all tests is the same: to prevent any defects from reaching production.
By incorporating DevOps principles, we can automate much of our testing, ensuring rapid feedback if any changes introduce defects.
AI Capabilities in DevOps Workflows
Generative AI applications, such as those using AWS Bedrock, present distinct challenges and opportunities within a DevOps workflow.
Returning to the TravelGuide App from Course 1, we aim to follow good DevOps practices, and you might wonder whether working with GenAI changes anything in the process. The app calls the AWS API in the same way it calls other services. What sets GenAI apart are the Bedrock features that allow us to customize our responses, and that the behavior can vary based on customizations like prompt engineering and model fine-tuning.
We can customize by fine-tuning or pre-training a model, and it is relatively straightforward to operationalize because we access these features through an API rather than building our own model. If we customize something, we want to measure the benefits of that customization: have these changes improved my app? Bedrock creates metrics per model, so we can run tests and check those metrics.
Useful metrics:
InvocationLatency (Measure response time changes due to prompt/model updates)
Input/Output Token Count (Track token usage to optimize cost and performance)
Number of Invocations (Monitor service usage patterns)
Custom Metrics (CloudWatch, capture user feedback directly through the application).
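As a sketch of that last idea, here's how an application might record user feedback as a custom CloudWatch metric. The namespace, metric name, and dimension are made-up examples; the dict matches the payload shape boto3's put_metric_data accepts, but the actual call is left commented so the snippet stays self-contained:

```python
def build_feedback_metric(rating: int, app_name: str = "TravelGuideApp") -> dict:
    """Build a CloudWatch put_metric_data payload for a user feedback rating (1-5)."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return {
        "Namespace": f"{app_name}/GenAI",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "UserFeedbackRating",
                "Dimensions": [{"Name": "Feature", "Value": "TravelGuide"}],
                "Value": float(rating),
                "Unit": "None",
            }
        ],
    }

# With boto3 available and credentials configured, publishing would look like:
# import boto3
# boto3.client("cloudwatch").put_metric_data(**build_feedback_metric(5))
```

Graphing this metric over time is one way to check whether a prompt or model change actually improved user satisfaction.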
Testing GenAI Apps
The responses from Bedrock are non-deterministic: the same input can yield different outputs or behaviors across runs, making the behavior unpredictable and non-repeatable.
We can test the code in isolation because the unit tests never talk to Bedrock. We write our own simulated responses instead of doing setup to reproduce a specific edge case. You can see this isn't too different from writing a regular unit test that expects a response from a database: we mock the response so we control exactly what we get back from the Bedrock service.
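As a sketch of this idea, here's a unit test that replaces the Bedrock client with a mock from the standard library, so the assertion is fully deterministic. The function under test and the response contents are hypothetical; only the response keys follow the RetrieveAndGenerate API shape:

```python
from unittest.mock import MagicMock

def get_travel_answer(client, question: str) -> str:
    """App code under test: ask the Bedrock knowledge base a question."""
    response = client.retrieve_and_generate(
        input={"text": question},
        # Real calls need a full retrieveAndGenerateConfiguration; simplified here
        retrieveAndGenerateConfiguration={"type": "KNOWLEDGE_BASE"},
    )
    return response["output"]["text"]

# The Bedrock client is replaced by a mock with a canned, deterministic response.
mock_client = MagicMock()
mock_client.retrieve_and_generate.return_value = {
    "citations": [],
    "output": {"text": "Kyoto is lovely in spring."},
    "sessionId": "test-session",
}

assert get_travel_answer(mock_client, "Where should I go in April?") == \
    "Kyoto is lovely in spring."
mock_client.retrieve_and_generate.assert_called_once()
```

The same pattern lets us simulate edge cases (empty citations, guardrail interventions) that would be hard to reproduce against the live service.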
Let's explore how to perform API calls to retrieve and generate data from the Bedrock knowledge base. First, we need to examine the response from the retrieve and generate API.
HTTP/1.1 200
Content-type: application/json

{
  "citations": [
    {
      "generatedResponsePart": {
        "textResponsePart": {
          "span": {
            "end": number,
            "start": number
          },
          "text": "string"
        }
      },
      "retrievedReferences": [
        {
          "content": {
            "audio": {
              "s3Uri": "string",
              "transcription": "string"
            },
            "byteContent": "string",
            "row": [
              {
                "columnName": "string",
                "columnValue": "string",
                "type": "string"
              }
            ],
            "text": "string",
            "type": "string",
            "video": {
              "s3Uri": "string",
              "summary": "string"
            }
          },
          "location": {
            "confluenceLocation": {
              "url": "string"
            },
            "customDocumentLocation": {
              "id": "string"
            },
            "kendraDocumentLocation": {
              "uri": "string"
            },
            "s3Location": {
              "uri": "string"
            },
            "salesforceLocation": {
              "url": "string"
            },
            "sharePointLocation": {
              "url": "string"
            },
            "sqlLocation": {
              "query": "string"
            },
            "type": "string",
            "webLocation": {
              "url": "string"
            }
          },
          "metadata": {
            "string": JSON value
          }
        }
      ]
    }
  ],
  "guardrailAction": "string",
  "output": {
    "text": "string"
  },
  "sessionId": "string"
}
Response Elements (from official docs):
citations: A list of segments of the generated response that are based on sources in the knowledge base, alongside information about the sources.
guardrailAction: Indicates whether a guardrail intervention is present in the response.
output: Contains the response generated from querying the knowledge base.
sessionId: The unique identifier of the session. When you first make a RetrieveAndGenerate request, Amazon Bedrock automatically generates this value. You must reuse this value for all subsequent requests in the same conversational session. This value allows Amazon Bedrock to maintain context and knowledge from previous interactions. You can't explicitly set the sessionId yourself.
Each citation has a retrievedReferences property containing one or more knowledge base references and their metadata. The unit test can loop through the citations array and build a response that contains the generated text.
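That loop might look like the following sketch. The parse_citations helper is a hypothetical function, and the fake response below only fills in the fields the helper reads, following the documented shape:

```python
def parse_citations(response: dict) -> dict:
    """Collect the generated text parts and the S3 source URIs they cite."""
    parts = []
    sources = []
    for citation in response.get("citations", []):
        parts.append(citation["generatedResponsePart"]["textResponsePart"]["text"])
        for ref in citation.get("retrievedReferences", []):
            s3 = ref.get("location", {}).get("s3Location")
            if s3:
                sources.append(s3["uri"])
    return {"text": " ".join(parts), "sources": sources}

# A simulated Bedrock response, matching the documented response shape:
fake_response = {
    "citations": [
        {
            "generatedResponsePart": {"textResponsePart": {"text": "Visit Kyoto."}},
            "retrievedReferences": [
                {"location": {"s3Location": {"uri": "s3://guides/kyoto.md"}}}
            ],
        }
    ]
}

result = parse_citations(fake_response)
# result["text"] == "Visit Kyoto.", result["sources"] == ["s3://guides/kyoto.md"]
```

Because the input is fully under our control, the test can assert the exact text and sources despite the live service being non-deterministic.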
Tests in the CI Flow
At this point, the unit tests are running locally in my environment, but what if I'd like to add them to the CI pipeline? AWS CodeBuild is great for running our tests.
We may have a local script for running the tests, and we need to configure the steps that CodeBuild will execute in a buildspec file. This file specifies what we want AWS CodeBuild to do, i.e., which commands to run in a specific build.
Here are the official docs in case you want to have a look.
Continuous Integration (CI)
We would like to run the tests every time a developer pushes a commit, and we need something that decides which step goes before the other, like an orchestrator. For this, we can use a service called AWS CodePipeline. We set up a pipeline once, and it will detect when a new commit is pushed and start a new execution whenever changes are detected.
The pipelines are built with stages (logical pieces that describe a phase in the pipeline, e.g., Source, Build, Test). We can add different stages later on, like a deploy stage. Approval and invoke actions are also available to control what gets deployed to production. Waiting on this final check is an extra layer of safety.
We can also define actions (tasks executed within each stage) to run custom scripts, trigger other systems or perform checks. The customization is pretty good.
When we release changes in a pipeline execution, by pushing a new commit or merging a pull request, the source stage detects the change in the repository, and each execution receives its own ID. While the pipeline runs, we can view the progress and the status of each stage in real time. If an action fails, we can retry it and view the build logs, which is very useful for debugging.
Hands-on Labs: Set Up a CI/CD Pipeline
Create the CodePipeline
We'll set up the pipeline for CI/CD, starting with a CodeBuild project to perform linting, unit testing, and code coverage reporting.
At the top of the AWS Management Console, in the search bar, search for CodePipeline, then choose Build Custom Pipeline.
On the Choose pipeline settings page, give it a name and configure the service role.
Continue to Add source stage page and configure the source.
On the Add build stage page, configure the build provider.
Select a name for your project and choose Create project. This launches a new browser window to create the CodeBuild project.
For Service role, you can either select an existing service role or create a new one.
In the Buildspec section, select Use a buildspec file.
Continue to CodePipeline. At this point, you could add a test stage, but this time I chose Skip test stage and Skip deploy stage.
On the Review page, choose Create pipeline.
Once the pipeline is created, it automatically starts. Wait for both stages to display a status of Succeeded.
In the Build stage, click the AWS CodeBuild link and review the build logs. You can inspect both the Code Coverage and Test reports on the Reports tab.
Create the CodeDeploy project
In this task, we'll create the CodeDeploy app and deployment group to deploy the application to an EC2 instance.
Head over to the AWS console and search for a service called CodeDeploy.
Choose Applications and Create application. Select EC2 / On-premises and Create application.
Next, choose Create deployment group and select the service role.
Configure the environment, choose Amazon EC2 instances.
In the Agent configuration with AWS Systems Manager section, for Install AWS CodeDeploy Agent, select Never.
For Load balancer, clear Enable load balancing and choose Create deployment group.
Adding CodeDeploy to CodePipeline
In this task, we'll update the pipeline to add a new stage for deploying application updates to the EC2 instance. On the AWS Management Console search for CodePipeline.
Choose your pipeline, in my case travelapp-pipeline, then choose Edit. Under Edit: Build, choose + Add stage.
For Stage name, enter Deploy, then choose Add Stage.
Under Edit: Deploy, choose + Add action group.
Then add the action configuration: select CodeDeploy as the action provider, pointing at the application and deployment group created earlier.
Finally, Save.
Go back to CodePipeline, scroll to the top of the pipeline, and select Release change. To confirm, click Release. Wait for the new Deploy stage to show Succeeded. The application has been successfully deployed by CodeDeploy.
Serverless Deployment Strategies
Various strategies exist for deploying serverless applications. To deploy them effectively, it's important to understand these strategies and consider factors such as rollback, scaling, and monitoring.
Here are the definitions of each deployment strategy from the official documentation:
Blue/Green Deployment
Blue/green deployments provide releases with near zero-downtime and rollback capabilities. The fundamental idea behind blue/green deployment is to shift traffic between two identical environments that are running different versions of your application. The blue environment represents the current application version serving production traffic. In parallel, the green environment is staged running a different version of your application. After the green environment is ready and tested, production traffic is redirected from blue to green. If any problems are identified, you can roll back by reverting traffic back to the blue environment.
Linear Deployment
Linear deployment means traffic is shifted in equal increments with an equal number of minutes between each increment. You can choose from predefined linear options that specify the percentage of traffic shifted in each increment and the number of minutes between each increment.
Canary Deployment
The purpose of a canary deployment is to reduce the risk of deploying a new version that impacts the workload. The method will incrementally deploy the new version, making it visible to new users in a slow fashion. As you gain confidence in the deployment, you will deploy it to replace the current version in its entirety.
All-at-Once Deployment
All-at-once deployment means all traffic is shifted from the original environment to the replacement environment all at once.
Selecting the appropriate deployment strategy depends on the application and its specific requirements.
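To make the difference between linear and all-at-once concrete, here's a small sketch that computes how much traffic a linear shift sends to the new version over time, in the spirit of a predefined option like Linear10PercentEvery1Minute. The function is a simplified illustration, not an AWS API:

```python
def linear_traffic_at(minute: float, percent_per_step: int, minutes_per_step: int) -> int:
    """Percent of traffic on the new version after `minute` minutes of a linear shift."""
    steps_completed = int(minute // minutes_per_step)
    return min(100, steps_completed * percent_per_step)

# Shift 10% more traffic each minute: fully shifted by minute 10.
schedule = [linear_traffic_at(m, 10, 1) for m in range(0, 11)]
# schedule == [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

An all-at-once deployment would be the degenerate case where the first step already shifts 100%, which is exactly what makes it faster but riskier to roll back from.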
AWS CodeDeploy in Your Pipeline
After the continuous integration phase completes all tests and receives approval, AWS CodeDeploy is now ready to deploy the application to the production environment.
To incorporate the continuous deployment phase, we need to update the pipeline to include CodeDeploy in the CI/CD process. We can add a new stage, call it Deploy, and select CodeDeploy as the action provider.
CI/CD for Infrastructure
A good practice is to maintain two different pipelines: one for the application code and one for infrastructure changes.
Let's discuss elevating your automation by using infrastructure as code with AWS CloudFormation and its role in a CI/CD workflow.
AWS CloudFormation
AWS CloudFormation is a service that allows us to define our infrastructure in a file (JSON or YAML) and have CloudFormation handle the provisioning. The file is the template (something you define), and the set of infrastructure components it creates is called a stack.
Let's check the following example; in this case, the AWS::S3::Bucket resource creates an Amazon S3 bucket:
{
  "Type" : "AWS::S3::Bucket",
  "Properties" : {
    "AbacStatus" : String,
    "AccelerateConfiguration" : AccelerateConfiguration,
    "AccessControl" : String,
    "AnalyticsConfigurations" : [ AnalyticsConfiguration, ... ],
    "BucketEncryption" : BucketEncryption,
    "BucketName" : String,
    "CorsConfiguration" : CorsConfiguration,
    "IntelligentTieringConfigurations" : [ IntelligentTieringConfiguration, ... ],
    "InventoryConfigurations" : [ InventoryConfiguration, ... ],
    "LifecycleConfiguration" : LifecycleConfiguration,
    "LoggingConfiguration" : LoggingConfiguration,
    "MetadataConfiguration" : MetadataConfiguration,
    "MetadataTableConfiguration" : MetadataTableConfiguration,
    "MetricsConfigurations" : [ MetricsConfiguration, ... ],
    "NotificationConfiguration" : NotificationConfiguration,
    "ObjectLockConfiguration" : ObjectLockConfiguration,
    "ObjectLockEnabled" : Boolean,
    "OwnershipControls" : OwnershipControls,
    "PublicAccessBlockConfiguration" : PublicAccessBlockConfiguration,
    "ReplicationConfiguration" : ReplicationConfiguration,
    "Tags" : [ Tag, ... ],
    "VersioningConfiguration" : VersioningConfiguration,
    "WebsiteConfiguration" : WebsiteConfiguration
  }
}
This code snippet is from the official AWS documentation. It lists all possible properties.
Here's how it might look in real life:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "S3Bucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "PublicAccessBlockConfiguration": {
          "BlockPublicAcls": false,
          "BlockPublicPolicy": false,
          "IgnorePublicAcls": false,
          "RestrictPublicBuckets": false
        },
        "WebsiteConfiguration": {
          "IndexDocument": "index.html",
          "ErrorDocument": "error.html"
        }
      },
      "DeletionPolicy": "Retain",
      "UpdateReplacePolicy": "Retain"
    },
    "BucketPolicy": {
      "Type": "AWS::S3::BucketPolicy",
      "Properties": {
        "PolicyDocument": {
          "Id": "MyPolicy",
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "PublicReadForGetBucketObjects",
              "Effect": "Allow",
              "Principal": "*",
              "Action": "s3:GetObject",
              "Resource": {
                "Fn::Join": [
                  "",
                  [
                    "arn:aws:s3:::",
                    { "Ref": "S3Bucket" },
                    "/*"
                  ]
                ]
              }
            }
          ]
        },
        "Bucket": { "Ref": "S3Bucket" }
      }
    }
  },
  "Outputs": {
    "WebsiteURL": {
      "Value": { "Fn::GetAtt": [ "S3Bucket", "WebsiteURL" ] },
      "Description": "URL for website hosted on S3"
    },
    "S3BucketSecureURL": {
      "Value": {
        "Fn::Join": [
          "",
          [
            "https://",
            { "Fn::GetAtt": [ "S3Bucket", "DomainName" ] }
          ]
        ]
      },
      "Description": "Name of S3 bucket to hold website content"
    }
  }
}
Or YAML version:
AWSTemplateFormatVersion: 2010-09-09
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      PublicAccessBlockConfiguration:
        BlockPublicAcls: false
        BlockPublicPolicy: false
        IgnorePublicAcls: false
        RestrictPublicBuckets: false
      WebsiteConfiguration:
        IndexDocument: index.html
        ErrorDocument: error.html
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
  BucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      PolicyDocument:
        Id: MyPolicy
        Version: 2012-10-17
        Statement:
          - Sid: PublicReadForGetBucketObjects
            Effect: Allow
            Principal: '*'
            Action: 's3:GetObject'
            Resource: !Join
              - ''
              - - 'arn:aws:s3:::'
                - !Ref S3Bucket
                - /*
      Bucket: !Ref S3Bucket
Outputs:
  WebsiteURL:
    Value: !GetAtt
      - S3Bucket
      - WebsiteURL
    Description: URL for website hosted on S3
  S3BucketSecureURL:
    Value: !Join
      - ''
      - - 'https://'
        - !GetAtt
          - S3Bucket
          - DomainName
    Description: Name of S3 bucket to hold website content
When you update a stack, CloudFormation reviews the template for changes and executes the update. Similarly, for deletions, it identifies the differences and proceeds accordingly. If you wish to keep certain resources from being deleted, you can use the DeletionPolicy attribute.
Variables are not supported in CloudFormation, but similar functionality can be achieved using parameters, mappings (for configuration lookups), and dynamic references (to securely retrieve values from AWS Secrets Manager or Systems Manager Parameter Store).
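For example, a template might combine a parameter with a dynamic reference; the parameter name, bucket naming scheme, and secret name below are made up for illustration:

```yaml
Parameters:
  EnvName:
    Type: String
    Default: dev
    AllowedValues: [dev, prod]

Resources:
  AppBucket:
    Type: AWS::S3::Bucket
    Properties:
      # Parameter substitution: the bucket name varies per environment
      BucketName: !Sub 'travelapp-${EnvName}-assets'
  AppUser:
    Type: AWS::IAM::User
    Properties:
      LoginProfile:
        # Dynamic reference: resolved from Secrets Manager at deploy time,
        # so the secret never appears in the template itself
        Password: '{{resolve:secretsmanager:TravelAppSecret:SecretString:password}}'
```

The dynamic reference keeps sensitive values out of source control while the parameter lets the same template serve multiple environments.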
We can also use nested stacks to break down complex templates into smaller, reusable stacks. CloudFormation modules are reusable, self-contained configurations that can be shared across teams and projects (just like libraries).
To prevent errors in CloudFormation, we can use a powerful feature called change sets: it predicts the results of a stack update operation so we can check whether those are the changes we want before proceeding. You can create a change set like this:
aws cloudformation create-change-set --stack-name MyStack \
  --change-set-name SampleChangeSet --use-previous-template \
  --parameters \
    ParameterKey="InstanceType",UsePreviousValue=true \
    ParameterKey="KeyPairName",UsePreviousValue=true \
    ParameterKey="Purpose",ParameterValue="production"
This way, we have a chance to review changes more carefully.
AWS CDK
The AWS Cloud Development Kit (CDK) is an open-source software development framework that allows developers to define cloud infrastructure using familiar programming languages.
CDK provides pre-built components called constructs which are abstractions of AWS resources.
This is a collection of pre-written modular and reusable pieces of code, called constructs, that you can use, modify, and integrate to develop your infrastructure quickly.
There are different construct levels; you can refer to the official docs for details.
AWS CDK works out of the box with AWS CloudFormation to deploy and provision infrastructure on AWS.
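As a sketch of what a construct looks like, assuming aws-cdk-lib (CDK v2) for Python is installed, a stack with a single website bucket, roughly mirroring the CloudFormation template above, might be defined like this:

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class WebsiteStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # An L2 construct: a few lines with sensible defaults instead of
        # hand-written CloudFormation resource definitions
        s3.Bucket(
            self,
            "WebsiteBucket",
            website_index_document="index.html",
            website_error_document="error.html",
            removal_policy=RemovalPolicy.RETAIN,  # mirrors DeletionPolicy: Retain
        )

app = App()
WebsiteStack(app, "WebsiteStack")
app.synth()  # emits a CloudFormation template under cdk.out/
```

Running `cdk deploy` synthesizes this into a CloudFormation template and deploys it, which is the "works out of the box with CloudFormation" part in practice.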
Choosing between AWS CDK and CloudFormation depends on the specific scenario, team expertise, and project complexity.
Automate Infra Deployment With CDK in CI/CD Pipeline
To automate deployment with CDK in a CI/CD pipeline, add a Deploy stage to the previously built CodePipeline and release the change.
In this case, the application will authenticate with the AWS services DynamoDB and Bedrock using the IAM role and instance profile created by the CDK stack.
Monitoring Your Infrastructure
Monitor, Log, and Audit With CloudTrail
AWS CloudTrail is a service that enables you to track API actions within an AWS account. It allows you to view the activity of resources and users based on the API calls made in the account. For example, if someone creates an S3 bucket, CloudTrail will log that.
This service is essential for tracking activity, as it helps identify misconfigurations or unauthorized access. It's also valuable for forensic analysis after a security event, providing historical data on all users and activities, and supports centralized management across multiple accounts.
This service offers several benefits, including enhanced cross-account visibility, simplified compliance auditing, and easier troubleshooting through aggregated logs.
AWS CloudWatch
When issues arise with your application, you'll want to address them as soon as possible, ideally even before your customers notice. A fantastic monitoring and observability service is AWS CloudWatch.
AWS CloudWatch comes with some metrics out of the box, but the cool thing is that you can create custom metrics. You can also set up alarms based on those metrics, allowing the team to receive notifications when a metric enters the alarm state. This service includes more features like dashboards, logs, events, network monitoring, and more.
For deployment and CI/CD, Amazon CloudWatch is invaluable because it can immediately collect metrics and logs each time a new version of your application is deployed.
The CloudWatch Logs agent is a software component that collects logs from your servers. It allows you to monitor your infrastructure and applications more thoroughly than the default basic monitoring.
Finally, I wanted to mention another cool feature called CloudWatch anomaly detection: you can enable anomaly detection for a metric, and it will use machine learning to automatically identify unusual patterns in your data. Isn't that amazing?
A Powerful Mix: CloudWatch + CloudTrail
Did you know that you can create CloudWatch metrics and alarms from CloudTrail data? You can set up a new trail and perform some actions so the new trail logs the events. Simply create a new trail, follow the on-screen steps, enable CloudWatch Logs, and select the log events.
Review and create the trail. Generate data by navigating the console and exploring various services to ensure the new trail logs the activity. Regardless of whether actions are performed through the management console or programmatically, CloudTrail will capture them.
Finally, we can view the CloudTrail logs from the Trails menu option; however, we'll notice that these logs are somewhat difficult to scan when searching for a specific pattern or detail. If you enabled CloudWatch Logs, you'll be able to see the events as searchable logs rather than compressed text files, which makes it easier to search and navigate the data. We can create filters, and from these filters, we can create alarms.
Here's more about this in the official documentation.
Monitoring with AWS X-Ray
AWS X-Ray is a service that gathers data on requests handled by your application and offers tools to view, filter, and analyze this data, helping you identify issues and opportunities for optimization. It can track requests across various AWS resources and microservices applications, allowing you to identify delays and errors.
To capture data from our applications, we need AWS Distro for OpenTelemetry (ADOT). We can use an OpenTelemetry SDK to instrument our application and an ADOT collector to receive and export traces to the AWS X-Ray service. OpenTelemetry SDKs are an industry standard for tracing instrumentation: they support AWS X-Ray, offer library instrumentations for many different languages, and are vendor-agnostic. Alternatively, you can use the AWS X-Ray SDK, AWS's proprietary distributed tracing solution, which is tightly integrated with the AWS ecosystem.
After installing OpenTelemetry to instrument the application and beginning to receive traces in the AWS X-Ray service, we can now view the application's activity. This activity can be accessed on the X-Ray console, where you can view the service map generated from the traces, trace requests through various components, identify errors, and examine a visual map that illustrates the flow of requests.
Operating with Confidence
Configuration Change Detection with AWS Config
AWS Config is a service that offers a comprehensive view of the resources linked to our AWS account: their configurations, their interrelationships, and any changes in these over time. AWS Config can display a dashboard of noncompliant resources, helping us understand the state of our AWS resources and how they evolve.
Even though AWS Config provides AWS managed rules, you can also create custom rules to check whatever you need and report back to AWS Config.
While AWS CloudTrail acts as a record, providing evidence of activities within the infrastructure, AWS Config highlights the changes that occur. Each service has a distinct log format, so CloudTrail details every aspect of the API call, whereas Config shows the state of the resource before and after a change.
🙌 AWS Config highlights the changes on the resource side, while CloudTrail provides the evidence.
Some benefits:
👉 Integrates effectively with other AWS services
👉 It monitors configuration changes for supported AWS services, logs the details, and maintains a history for analysis.
👉 The service can automatically evaluate resource configurations against compliance rules and identify any deviations.
AWS Systems Manager
We can gain centralized operational insights with AWS Systems Manager. It's a managed service that we can use to view and control the infrastructure on AWS. It simplifies operations such as patch management, configuration updates, and instance monitoring through a single interface, instead of managing servers manually.
It's designed to assist in mitigating and recovering from incidents that impact our applications hosted on AWS.
Key features include executing scripts or predefined workflows to simplify repetitive tasks, applying security updates across instances, securely storing and retrieving credentials, obtaining instance metadata, and managing instances without the need for SSH, which enhances security.
There's an interesting tool in AWS Systems Manager called Parameter Store. It provides secure storage for secrets and other data, which helps enforce security, and it can be referenced from scripts, code, or CloudFormation resources.
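For instance, a deployment script can store a secret once and read it back at runtime; the parameter name and value below are hypothetical examples:

```shell
# Store a secret once (SecureString encrypts the value with KMS)
aws ssm put-parameter --name /travelapp/db-password \
  --type SecureString --value 'example-password'

# Retrieve and decrypt it from a script at deploy or run time
aws ssm get-parameter --name /travelapp/db-password \
  --with-decryption --query Parameter.Value --output text
```

This keeps credentials out of source code and buildspec files while remaining easy to consume from automation.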
Wrapping Up Part Two: The Journey Continues
If you made it this far, THANK YOU so much 🥹🙏, it really means a lot. I started writing this series simply to document what I've been learning, and if even one person finds it useful or feels a little more inspired to explore AWS and cloud, then it was all worth it.
I especially hope this reaches other women who are curious about cloud but maybe haven't made the jump yet, trust me, you belong here just as much as anyone else, and the community is bigger and more welcoming than you might think.
Part 3 is on its way, so stay tuned, there's still more to cover and more to share.
See you in the next one 🚀