<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harsh Vardhan Singh</title>
    <description>The latest articles on DEV Community by Harsh Vardhan Singh (@harsh_vardhansingh_69340).</description>
    <link>https://dev.to/harsh_vardhansingh_69340</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1597633%2F279ecbfe-005d-4343-a858-30dfd7b23105.jpg</url>
      <title>DEV Community: Harsh Vardhan Singh</title>
      <link>https://dev.to/harsh_vardhansingh_69340</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harsh_vardhansingh_69340"/>
    <language>en</language>
    <item>
      <title>Create a cross-account glue Job using AWS CDK</title>
      <dc:creator>Harsh Vardhan Singh</dc:creator>
      <pubDate>Tue, 31 Dec 2024 10:11:29 +0000</pubDate>
      <link>https://dev.to/harsh_vardhansingh_69340/create-a-cross-account-glue-job-using-aws-cdk-16ek</link>
      <guid>https://dev.to/harsh_vardhansingh_69340/create-a-cross-account-glue-job-using-aws-cdk-16ek</guid>
      <description>&lt;p&gt;AWS Glue is a powerful service for data integration and ETL (Extract, Transform, Load) workloads, making it easier to prepare and transform data for analytics. If you’re looking to automate the creation of Glue jobs using Infrastructure as Code (IaC), AWS CDK (Cloud Development Kit) is a great choice. In this post, we’ll walk through the process of defining and deploying an AWS Glue job using AWS CDK. We will be creating a job that can connect to cross-account RDS cluster and execute an etl scripts.&lt;/p&gt;

&lt;p&gt;Note: We will not cover AWS CLI and CDK package setup in this article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppch87rfqlraf43532mw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppch87rfqlraf43532mw.png" alt="Image description" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Define the VPC stack for Glue Job
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export class GlueVpcStack extends DeploymentStack {
  public readonly vpc: Vpc;
  public readonly vpcDefaultSecurityGroupId: string;
  constructor(scope: Construct, id: string, props: DeploymentStackProps) {
    super(scope, id, props);

    const vpc = new Vpc(this, 'VPCForGlue', {
      ipAddresses: IpAddresses.cidr(Vpc.DEFAULT_CIDR_RANGE),
      subnetConfiguration: [
        {
          cidrMask: 24,
          name: 'Public',
          subnetType: SubnetType.PUBLIC,
        },
        {
          cidrMask: 24,
          name: 'Private',
          subnetType: SubnetType.PRIVATE_WITH_EGRESS,
        },
      ],
      natGateways: 1,
    });

    // Gateway endpoint so the Glue job can reach S3 without leaving the VPC
    vpc.addGatewayEndpoint('S3GatewayEndpoint', {
      service: GatewayVpcEndpointAwsService.S3,
    });

    // Interface endpoint so the Glue job can read DB credentials from Secrets Manager privately
    vpc.addInterfaceEndpoint('SecretsManagerEndpoint', {
      service: InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
    });

    // Import the VPC's default security group so rules can be added to it
    const vpcDefaultSecurityGroup = SecurityGroup.fromSecurityGroupId(
      this,
      'SecurityGroup',
      vpc.vpcDefaultSecurityGroup,
      {
        allowAllOutbound: false,
        mutable: true,
      },
    );
    this.vpc = vpc;
    this.vpcDefaultSecurityGroupId = vpc.vpcDefaultSecurityGroup;

    // Allow all outbound traffic and add the self-referencing inbound rule Glue requires
    vpcDefaultSecurityGroup.addEgressRule(Peer.anyIpv4(), Port.allTraffic());
    vpcDefaultSecurityGroup.addIngressRule(Peer.securityGroupId(this.vpcDefaultSecurityGroupId), Port.allTraffic());
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: We must update the VPC's default security group to include a self-referencing inbound rule and an outbound rule that allows all traffic on all ports. Later, we attach this security group to an AWS Glue connection so that the network interfaces AWS Glue creates can communicate with each other within the private subnet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Define stack for Glue Job and Glue connection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export class InfraStack extends DeploymentStack {
  constructor(scope: Construct, id: string, props: DeploymentStackProps, stageName: string) {
    super(scope, id, props);

    const vpcStack = new GlueVpcStack(this, id + '-VPC', props);

// Create an IAM role that lets AWS Glue access the required services
    const glueRole = new Role(this, id + '-GlueJobsRole', {
      roleName: 'GlueJobsRole-' + stageName,
      assumedBy: new ServicePrincipal('glue.amazonaws.com'),
    });
    glueRole.addManagedPolicy(ManagedPolicy.fromAwsManagedPolicyName('AmazonS3FullAccess'));
    glueRole.addManagedPolicy(ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole'));
    glueRole.addManagedPolicy(ManagedPolicy.fromAwsManagedPolicyName('SecretsManagerReadWrite'));

// Create an AWS Glue JDBC connection
    const glueConnection = new CfnConnection(this, 'GlueConnection', {
      catalogId: this.account,
      connectionInput: {
        connectionType: 'JDBC',
        connectionProperties: {
          JDBC_CONNECTION_URL: "jdbcUrl",
          SECRET_ID: "secretId", // Secret holding the DB credentials; create it manually if it does not already exist
          JDBC_ENFORCE_SSL: true,
        },
        physicalConnectionRequirements: {
          securityGroupIdList: [vpcStack.vpcDefaultSecurityGroupId],
          subnetId: vpcStack.vpc.privateSubnets[0].subnetId,
          availabilityZone: vpcStack.vpc.privateSubnets[0].availabilityZone,
        },
        name: 'GlueConnection',
        description: 'GlueConnection',
      },
    });

    // Create a bucket to hold the scripts the job runs
    const testBucket = this.createBucket(id.toLowerCase() + '-testgluejobscripts', id + '-testgluejobscripts');
    new BucketDeployment(this, 'DeployTestScripts', {
      sources: [Source.asset('test_glue_job_scripts')], // This folder must exist at the root of the CDK package
      destinationBucket: testBucket,
    });

    const job = new CfnJob(this, 'TestGlueJob', {
      name: 'TestGlueJob',
      role: glueRole.roleArn,
      command: {
        name: 'pythonshell',
        pythonVersion: '3.9',
        scriptLocation: `s3://&amp;lt;bucket&amp;gt;/script_name.py`,
      },
      glueVersion: '4.0',
      executionProperty: {
        maxConcurrentRuns: 1,
      },
      connections: {
        connections: ['GlueConnection'], // Must match the name set in the CfnConnection above
      },
    });
  }

  private createBucket(name: string, id: string) {
    return new Bucket(this, id, {
      enforceSSL: true,
      bucketName: name,
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
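&lt;p&gt;The &lt;code&gt;BucketDeployment&lt;/code&gt; above expects a &lt;code&gt;test_glue_job_scripts&lt;/code&gt; directory at the root of the CDK package. If you don't have one yet, a placeholder can be scaffolded like this (the script contents below are illustrative, not part of the original setup):&lt;/p&gt;

```shell
# Scaffold the scripts directory that BucketDeployment reads from.
# script_name.py matches the scriptLocation used in the CfnJob above;
# its body here is only a placeholder.
mkdir -p test_glue_job_scripts
cat > test_glue_job_scripts/script_name.py <<'EOF'
# Placeholder ETL script; replace with the real Glue job logic.
print("glue job placeholder")
EOF
ls test_glue_job_scripts
```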



&lt;h3&gt;
  
  
  Step 3: Allow Amazon RDS to accept network traffic from AWS Glue
&lt;/h3&gt;

&lt;p&gt;For this, we update the security group attached to the Amazon RDS cluster to allow inbound traffic from the Elastic IP address of the NAT gateway in the AWS Glue VPC.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Deploy the CDK Stack
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Verify the Glue Job
&lt;/h3&gt;

&lt;p&gt;Once the deployment is complete, navigate to the AWS Glue console to verify that the job has been created. The job should appear with the specified configuration and script. You can run the job from the console and check the expected outcome.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/big-data/create-cross-account-and-cross-region-aws-glue-connections/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/big-data/create-cross-account-and-cross-region-aws-glue-connections/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_glue-readme.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_glue-readme.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>cdk</category>
      <category>etl</category>
    </item>
    <item>
      <title>Running a python script as a standalone task in ECS</title>
      <dc:creator>Harsh Vardhan Singh</dc:creator>
      <pubDate>Sun, 09 Jun 2024 08:58:56 +0000</pubDate>
      <link>https://dev.to/harsh_vardhansingh_69340/running-a-python-script-as-a-standalone-task-in-ecs-317l</link>
      <guid>https://dev.to/harsh_vardhansingh_69340/running-a-python-script-as-a-standalone-task-in-ecs-317l</guid>
      <description>&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve23dwcm1e2w6drwir4l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve23dwcm1e2w6drwir4l.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prepare Docker Image
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcq59frem2g1y6ad8p7ll.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcq59frem2g1y6ad8p7ll.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dockerfile - A Dockerfile is a text file that contains a series of instructions and commands used to build a Docker image. It specifies the operating system, application code, dependencies, environment variables, and other necessary configurations to create a Docker image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docker Image - A Docker image is a lightweight, standalone, executable package that includes everything needed to run a piece of software, including the code, runtime, libraries, environment variables, and configurations. It serves as a template for creating Docker containers. An image is built from a Dockerfile.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docker Container - A Docker container is a runtime instance of a Docker image. It is a lightweight, standalone, and executable unit that runs the software defined in the Docker image. Containers are used to run applications in isolation from the host system and other containers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Create a Python script
&lt;/h3&gt;

&lt;p&gt;Create a file &lt;code&gt;demo.py&lt;/code&gt; in a new directory &lt;code&gt;LightningTalk&lt;/code&gt; and add your script's code to it.&lt;/p&gt;
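&lt;p&gt;The article doesn't show the script itself; any Python program works. A minimal &lt;code&gt;demo.py&lt;/code&gt; (its contents below are illustrative) could be created like this:&lt;/p&gt;

```shell
mkdir -p LightningTalk
# demo.py is the script the Dockerfile's CMD runs later;
# this body is just an illustrative stand-in.
cat > LightningTalk/demo.py <<'EOF'
import platform

# Print a marker so the ECS task logs show the script ran.
print(f"Hello from demo.py on Python {platform.python_version()}")
EOF
python3 LightningTalk/demo.py
```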

&lt;h3&gt;
  
  
  Create a Dockerfile
&lt;/h3&gt;

&lt;p&gt;Create a Dockerfile in the same directory:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

touch Dockerfile


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Add the following to the Dockerfile:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

FROM public.ecr.aws/docker/library/python:3

WORKDIR /home/harsh/workplace/LightningTalk

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD [ "python", "./demo.py" ]


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We are using the Python base image from the ECR Public Gallery: &lt;a href="https://gallery.ecr.aws/docker/library/python/?page=1" rel="noopener noreferrer"&gt;https://gallery.ecr.aws/docker/library/python/?page=1&lt;/a&gt;&lt;/p&gt;
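&lt;p&gt;The Dockerfile also copies a &lt;code&gt;requirements.txt&lt;/code&gt; that we haven't created yet. If the script has no third-party dependencies, an empty file keeps the &lt;code&gt;pip install&lt;/code&gt; step working; otherwise list one package per line:&lt;/p&gt;

```shell
# LightningTalk is the project directory from the earlier step.
mkdir -p LightningTalk
# An empty requirements.txt is valid; pip simply installs nothing.
touch LightningTalk/requirements.txt
# To add third-party packages, list one per line, e.g. (illustrative):
# printf 'requests\n' >> LightningTalk/requirements.txt
cat LightningTalk/requirements.txt
```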

&lt;h3&gt;
  
  
  Build Docker Image
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

docker build --platform linux/amd64 -t lightning-talk-image:test .


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Run container for testing
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

docker run -it --rm --name lightning-talk-task lightning-talk-image:test



&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Create ECR Registry and push image to ECR
&lt;/h2&gt;

&lt;p&gt;Run the get-login-password command to authenticate the Docker CLI to your Amazon ECR registry.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

&lt;/span&gt;&lt;span class="gp"&gt;aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $&lt;/span&gt;AWSACCOUNTID.dkr.ecr.us-east-1.amazonaws.com
&lt;span class="go"&gt;


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After you have authenticated to an Amazon ECR registry with this command, you can use the client to push and pull images from that registry.&lt;/p&gt;

&lt;p&gt;Create a repository in Amazon ECR using the create-repository command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

aws ecr create-repository --repository-name python-images --region us-east-1 --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run the docker tag command to tag your local image into your Amazon ECR repository as the latest version. Copy the repositoryUri from the output in the previous step.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

&lt;/span&gt;&lt;span class="gp"&gt;docker tag lightning-talk-image:test $&lt;/span&gt;AWSACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/python-images:latest
&lt;span class="go"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run the docker push command to deploy your local image to the Amazon ECR repository. Make sure to include :latest at the end of the repository URI.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

&lt;/span&gt;&lt;span class="gp"&gt;docker push $&lt;/span&gt;AWSACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/python-images:latest
&lt;span class="go"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Create ECS Resources and run the task
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Create a new cluster
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

aws ecs create-cluster --cluster-name lightning-talk-cluster


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Register a Task Definition
&lt;/h3&gt;

&lt;p&gt;Before you can run a task on your ECS cluster, you must register a task definition. Task definitions are lists of containers grouped together.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

aws ecs register-task-definition --cli-input-json file://./fargate-task.json


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Before running the above command, save the following task definition JSON as &lt;code&gt;fargate-task.json&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{
    "family": "python-tasks",
    "containerDefinitions": [
        {
            "name": "lightning-talk",
            "image": "$AWSACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/python-images:latest",
            "cpu": 256,
            "memory": 2048,
            "portMappings": [
                {
                    "containerPort": 80,
                    "hostPort": 80,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/python-tasks",
                    "awslogs-create-group": "true",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        }
    ],
    "executionRoleArn": "arn:aws:iam::$AWSACCOUNTID:role/ecsTaskExecutionRole",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "2048"
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
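&lt;p&gt;Note that &lt;code&gt;$AWSACCOUNTID&lt;/code&gt; inside the JSON is a placeholder, not something the AWS CLI expands, so substitute your account ID before registering the task definition. One way to do the substitution (the file and account ID below are illustrative):&lt;/p&gt;

```shell
AWSACCOUNTID=123456789012   # illustrative account ID
# A tiny sample file standing in for fargate-task.json, to demonstrate the substitution.
printf '{"image": "$AWSACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/python-images:latest"}\n' > sample-task.json
# Replace the literal $AWSACCOUNTID placeholder with the real value.
sed "s/\$AWSACCOUNTID/$AWSACCOUNTID/g" sample-task.json > sample-task.resolved.json
cat sample-task.resolved.json
```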
&lt;h3&gt;
  
  
  Running a task
&lt;/h3&gt;

&lt;p&gt;Run the task:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;

aws ecs run-task --cluster lightning-talk-cluster --task-definition python-tasks --launch-type FARGATE --network-configuration 'awsvpcConfiguration={subnets=["subnet-xxxxx","subnet-xxxxx"],securityGroups=["sg-xxxxxx"],assignPublicIp="ENABLED"}'


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the subnet and security group values, you can use subnets from the default VPC and its default security group.&lt;/p&gt;

</description>
      <category>ecs</category>
      <category>docker</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
