<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yen Trinh</title>
    <description>The latest articles on DEV Community by Yen Trinh (@yentrinh).</description>
    <link>https://dev.to/yentrinh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F270179%2Fc50b415f-a815-47c5-85a8-84e50f81a5ac.png</url>
      <title>DEV Community: Yen Trinh</title>
      <link>https://dev.to/yentrinh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yentrinh"/>
    <language>en</language>
    <item>
      <title>Mount multiple EFS file systems to AWS ECS (Fargate)</title>
      <dc:creator>Yen Trinh</dc:creator>
      <pubDate>Mon, 19 Sep 2022 12:07:39 +0000</pubDate>
      <link>https://dev.to/yentrinh/mount-multiple-efs-file-system-to-aws-ecs-fargate-1g5a</link>
      <guid>https://dev.to/yentrinh/mount-multiple-efs-file-system-to-aws-ecs-fargate-1g5a</guid>
      <description>&lt;p&gt;For example, suppose you want to mount EFS file systems to ECS Fargate as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fargate_1: mount EFS_A, EFS_B
Fargate_2: mount EFS_C, EFS_D
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Solution:&lt;/p&gt;

&lt;p&gt;You can mount the EFS file systems one at a time by following this post:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/ecs-fargate-mount-efs-containers-tasks/"&gt;https://aws.amazon.com/premiumsupport/knowledge-center/ecs-fargate-mount-efs-containers-tasks/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that you will have to repeat steps 5 through 12 to mount the second EFS file system.&lt;/p&gt;

&lt;p&gt;Demo: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CfSrsQD3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y8hyhmn8vp67zuicf3uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CfSrsQD3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y8hyhmn8vp67zuicf3uh.png" alt="Image description" width="880" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x-ZDNOZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzh985ucmfu1eqv2np3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x-ZDNOZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzh985ucmfu1eqv2np3u.png" alt="Image description" width="880" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The equivalent task definition via JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "ipcMode": null,
    "executionRoleArn": "arn:aws:iam::&amp;lt;account-id&amp;gt;:role/ecsTaskExecutionRole",
    "containerDefinitions": [
        {
            "dnsSearchDomains": null,
            "environmentFiles": null,
            "logConfiguration": {
                "logDriver": "awslogs",
                "secretOptions": null,
                "options": {
                    "awslogs-group": "/ecs/task2",
                    "awslogs-region": "ap-northeast-1",
                    "awslogs-stream-prefix": "ecs"
                }
            },
            "entryPoint": null,
            "portMappings": [],
            "command": null,
            "linuxParameters": null,
            "cpu": 0,
            "environment": [],
            "resourceRequirements": null,
            "ulimits": null,
            "dnsServers": null,
            "mountPoints": [
                {
                    "readOnly": null,
                    "containerPath": "/efs1",
                    "sourceVolume": "efs1"
                },
                {
                    "readOnly": null,
                    "containerPath": "/efs2",
                    "sourceVolume": "efs2"
                }
            ],
            "workingDirectory": null,
            "secrets": null,
            "dockerSecurityOptions": null,
            "memory": null,
            "memoryReservation": null,
            "volumesFrom": [],
            "stopTimeout": null,
            "image": "nginx:latest",
            "startTimeout": null,
            "firelensConfiguration": null,
            "dependsOn": null,
            "disableNetworking": null,
            "interactive": null,
            "healthCheck": null,
            "essential": true,
            "links": null,
            "hostname": null,
            "extraHosts": null,
            "pseudoTerminal": null,
            "user": null,
            "readonlyRootFilesystem": null,
            "dockerLabels": null,
            "systemControls": null,
            "privileged": null,
            "name": "nginx"
        }
    ],
    "memory": "512",
    "taskRoleArn": "arn:aws:iam::&amp;lt;account_id&amp;gt;:role/ecsTaskExecutionRole",
    "family": "task2",
    "pidMode": null,
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "networkMode": "awsvpc",
    "runtimePlatform": {
        "operatingSystemFamily": "LINUX",
        "cpuArchitecture": null
    },
    "cpu": "256",
    "inferenceAccelerators": [],
    "proxyConfiguration": null,
    "volumes": [
        {
            "fsxWindowsFileServerVolumeConfiguration": null,
            "efsVolumeConfiguration": {
                "transitEncryptionPort": null,
                "fileSystemId": "fs-06f7aa4c25718fxxx",
                "authorizationConfig": {
                    "iam": "DISABLED",
                    "accessPointId": null
                },
                "transitEncryption": "DISABLED",
                "rootDirectory": "/"
            },
            "name": "efs1",
            "host": null,
            "dockerVolumeConfiguration": null
        },
        {
            "fsxWindowsFileServerVolumeConfiguration": null,
            "efsVolumeConfiguration": {
                "transitEncryptionPort": null,
                "fileSystemId": "fs-0a04ba637cxxx0e6b",
                "authorizationConfig": {
                    "iam": "DISABLED",
                    "accessPointId": null
                },
                "transitEncryption": "DISABLED",
                "rootDirectory": "/"
            },
            "name": "efs2",
            "host": null,
            "dockerVolumeConfiguration": null
        }
    ],
    "tags": []
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
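If you script the task definition, the repetitive volume and mount-point entries above can be generated with a small helper. This is a sketch only: `efs_volume` and `mount_point` are hypothetical functions that mirror the `volumes` and `mountPoints` fields in the JSON above.

```python
def efs_volume(name, file_system_id):
    """Build one ECS task-definition volume entry backed by EFS."""
    return {
        "name": name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,
            "rootDirectory": "/",
            "transitEncryption": "DISABLED",
            "authorizationConfig": {"iam": "DISABLED"},
        },
    }

def mount_point(name):
    """Mount the named volume at /<name> inside the container."""
    return {"sourceVolume": name, "containerPath": f"/{name}"}

# One entry per EFS file system the task should mount.
filesystems = {"efs1": "fs-06f7aa4c25718fxxx", "efs2": "fs-0a04ba637cxxx0e6b"}
volumes = [efs_volume(name, fs_id) for name, fs_id in filesystems.items()]
mount_points = [mount_point(name) for name in filesystems]
```

Adding a third or fourth file system is then just one more entry in `filesystems`, instead of repeating the console steps.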



&lt;p&gt;Result: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QaE6sq18--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nb1cl1b7bx970stx2gai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QaE6sq18--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nb1cl1b7bx970stx2gai.png" alt="Image description" width="880" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>Glue – Athena: output a fixed number of files</title>
      <dc:creator>Yen Trinh</dc:creator>
      <pubDate>Mon, 19 Sep 2022 12:04:41 +0000</pubDate>
      <link>https://dev.to/yentrinh/glue-athena-custom-output-fixed-number-of-files-2alb</link>
      <guid>https://dev.to/yentrinh/glue-athena-custom-output-fixed-number-of-files-2alb</guid>
      <description>&lt;h2&gt;Situation:&lt;/h2&gt;

&lt;p&gt;When I only use a partition clause, the S3 bucket ends up with many files smaller than 1 MB, which slows down queries, so I want to combine them into bigger files. &lt;/p&gt;

&lt;h2&gt;Solution:&lt;/h2&gt;

&lt;h4&gt;Solution 1: Use Athena's "bucketing" method to customize the number of output files&lt;/h4&gt;

&lt;p&gt;You can see this AWS blog for more information: &lt;br&gt;
&lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/set-file-number-size-ctas-athena/?nc1=h_ls"&gt;How can I set the number or size of files when I run a CTAS query in Athena?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, bucketing has one drawback: bucketed tables do not support INSERT INTO queries. That brings us to solution 2. &lt;/p&gt;
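For reference, a bucketed CTAS statement might look roughly like the sketch below. The table name, bucketing column, and S3 path are hypothetical; the essential parts are the `bucketed_by` and `bucket_count` table properties.

```python
# Sketch: an Athena CTAS query that writes a fixed number of output
# files by bucketing. Table/column names and the S3 path are placeholders.
bucket_count = 10
ctas_query = f"""
CREATE TABLE demo_query_bucketed
WITH (
    format = 'PARQUET',
    external_location = 's3://your-bucket/bucketed/',
    bucketed_by = ARRAY['dispatching_base_num'],
    bucket_count = {bucket_count}
) AS
SELECT * FROM demo_query
"""
print(ctas_query)
```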
&lt;h4&gt;Solution 2: Use Glue repartition&lt;/h4&gt;

&lt;p&gt;The context is the same, but now I want to use INSERT INTO queries.&lt;/p&gt;

&lt;p&gt;You can refer to this AWS blog for the procedure: &lt;br&gt;
&lt;a href="https://aws.amazon.com/blogs/big-data/build-a-data-lake-foundation-with-aws-glue-and-amazon-s3/"&gt;Build a Data Lake Foundation with AWS Glue and Amazon S3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that in step "13. View the job", we add the following line to the job script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datasource_df = dropnullfields3.repartition(&amp;lt;number of output file you want here&amp;gt;)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;right after the line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and change this line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "&amp;lt;your_s3_path&amp;gt;"}, format = "parquet", transformation_ctx = "datasink4")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datasink4 = glueContext.write_dynamic_frame.from_options(frame = datasource_df, connection_type = "s3", connection_options = {"path": "&amp;lt;your_s3_path&amp;gt;"}, format = "parquet", transformation_ctx = "datasink4")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
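The argument to `repartition()` is the number of output files you will get, so one way to choose it is to divide the total input size by a target file size. A minimal sketch, assuming you know the input size (`target_partitions` is a hypothetical helper, and 128 MiB is an assumed target):

```python
import math

def target_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Choose a repartition count so each output file is ~target_file_bytes."""
    return max(1, math.ceil(total_bytes / target_file_bytes))

# e.g. ~5 GiB of input aimed at ~128 MiB Parquet files
print(target_partitions(5 * 1024**3))  # 40
```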



&lt;p&gt;If you want to know more about Glue repartition: &lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html"&gt;https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try querying with Athena&lt;/strong&gt;&lt;br&gt;
Create table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTERNAL TABLE IF NOT EXISTS demo_query (
  dispatching_base_num string,
  pickup_date string,
  locationid bigint)
STORED AS PARQUET
LOCATION 's3://athena-examples/parquet/'
tblproperties ("parquet.compress"="SNAPPY");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try to insert:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert into demo_query ("dispatching_base_num", "pickup_date", "locationid") values ('aa23dtgt', '2020-12-03', 1234);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The insert query should now work. Success!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>glue</category>
    </item>
    <item>
      <title>How to query AWS load balancer logs if there are terabytes of them?</title>
      <dc:creator>Yen Trinh</dc:creator>
      <pubDate>Wed, 07 Apr 2021 10:32:43 +0000</pubDate>
      <link>https://dev.to/yentrinh/how-to-query-aws-load-balancer-log-if-there-are-terabytes-of-logs-465a</link>
      <guid>https://dev.to/yentrinh/how-to-query-aws-load-balancer-log-if-there-are-terabytes-of-logs-465a</guid>
      <description>&lt;p&gt;I want to query AWS load balancer logs to automatically send reports to me on a schedule.&lt;/p&gt;

&lt;p&gt;I am using Amazon Athena, with AWS Lambda to trigger Athena. I created the data table based on the guide here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html"&gt;https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, I have encountered the following issues:&lt;/p&gt;

&lt;p&gt;1) The logs bucket grows day by day, and I notice that if an Athena query needs more than 5 minutes to return a result, it sometimes produces an "unknown error".&lt;/p&gt;

&lt;p&gt;2) The maximum timeout for an AWS Lambda function is only 15 minutes, so I cannot keep increasing the Lambda timeout to wait for Athena (for example, in the case that Athena needs more than 15 minutes to return a result).&lt;/p&gt;

&lt;p&gt;Can you suggest a better solution to my problem? I am thinking of using the ELK stack, but I have no experience working with ELK. Could you show me the advantages and disadvantages of ELK compared to the AWS Lambda + Amazon Athena combination? Thank you!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
