<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jacob A. Hudson</title>
    <description>The latest articles on DEV Community by Jacob A. Hudson (@jacob_hudson).</description>
    <link>https://dev.to/jacob_hudson</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F56946%2F94b990fc-2a49-4339-b673-568c406dbbae.JPG</url>
      <title>DEV Community: Jacob A. Hudson</title>
      <link>https://dev.to/jacob_hudson</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jacob_hudson"/>
    <language>en</language>
    <item>
      <title>Bulk DynamoDB Item Upload with Terraform</title>
      <dc:creator>Jacob A. Hudson</dc:creator>
      <pubDate>Tue, 28 Apr 2020 23:02:43 +0000</pubDate>
      <link>https://dev.to/jacob_hudson/bulk-dynamodb-item-upload-with-terraform-1inp</link>
      <guid>https://dev.to/jacob_hudson/bulk-dynamodb-item-upload-with-terraform-1inp</guid>
      <description>&lt;h1&gt;
  
  
  Overview
&lt;/h1&gt;

&lt;p&gt;DynamoDB is great!  It can be used for routing and metadata tables, to lock Terraform state files, to track application state, and much more!  This post offers a solution for populating multiple items (rows) in a DynamoDB table at create time, entirely within Terraform.&lt;/p&gt;

&lt;p&gt;The problem I want to solve is provisioning a DynamoDB lookup table entirely from Terraform, without extra steps that Terraform cannot invoke, without a ton of extra work, and in a way that is easily reproducible and scalable.&lt;/p&gt;

&lt;p&gt;For those who just want the solution, the code is in the GitHub repo linked at the bottom of the post.  I am using Terraform 0.12.24, but anything 0.12+ should work without issue.  The AWS user or role that runs this configuration also needs to be allowed to use &lt;code&gt;dynamodb:CreateTable&lt;/code&gt; and &lt;code&gt;dynamodb:BatchWriteItem&lt;/code&gt;.&lt;/p&gt;
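
&lt;p&gt;As a rough sketch of those permissions, the policy document below grants only the two actions named above, scoped to the table built later in this post.  The data source name &lt;code&gt;bulk_load&lt;/code&gt; is made up for illustration, and in practice Terraform will likely need further actions (for example, to describe and delete the table), so treat this as a starting point rather than a complete policy.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical policy document; the name "bulk_load" is an assumption.
data "aws_iam_policy_document" "bulk_load" {
  statement {
    effect = "Allow"

    # The two actions called out in this post.
    actions = [
      "dynamodb:CreateTable",
      "dynamodb:BatchWriteItem",
    ]

    # Region and table name match the example configuration in the Solution section.
    resources = ["arn:aws:dynamodb:us-east-2:*:table/ExternallyManagedTable"]
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;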

&lt;p&gt;Requirements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; It has to be able to be invoked from Terraform&lt;/li&gt;
&lt;li&gt; It has to be able to be committed in Version Control&lt;/li&gt;
&lt;li&gt; It has to work on any number of items for DynamoDB&lt;/li&gt;
&lt;li&gt; Ideally, no other dependencies should be needed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Knowledge Needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DynamoDB - Items&lt;/li&gt;
&lt;li&gt;Terraform - Basics of Resources and Provisioners&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Current Problem
&lt;/h1&gt;

&lt;p&gt;Provisioning an empty DynamoDB table in Terraform is quite easy; an example Terraform configuration is below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_dynamodb_table" "basic-dynamodb-table" {
  name           = "GameScores"
  billing_mode   = "PROVISIONED"
  read_capacity  = 20
  write_capacity = 20
  hash_key       = "UserId"

  attribute {
    name = "UserId"
    type = "S"
  }


  tags = {
    Name        = "dynamodb-table-1"
    Environment = "production"
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.terraform.io/docs/providers/aws/r/dynamodb_table.html"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This resource declaration results in an empty table that has to have data loaded later.  HashiCorp does offer a resource for managing DynamoDB items, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_dynamodb_table_item" "example" {
  table_name = aws_dynamodb_table.example.name
  hash_key   = aws_dynamodb_table.example.hash_key

  item = &amp;lt;&amp;lt;ITEM
{
  "exampleHashKey": {"S": "something"},
  "one": {"N": "11111"},
  "two": {"N": "22222"},
  "three": {"N": "33333"},
  "four": {"N": "44444"}
}
ITEM
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This works quite well but is limited to one item per Terraform resource.  That does not scale well and produces massive Terraform Configuration files.  In fact, the Terraform Documentation itself gives the same warning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Note: This resource is not meant to be used for managing large amounts of data in your table, it is not designed to scale. You should perform regular backups of all data in the table, see AWS docs for more.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
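
&lt;p&gt;To make the limitation concrete, here is a minimal sketch that generates one &lt;code&gt;aws_dynamodb_table_item&lt;/code&gt; per entry with &lt;code&gt;for_each&lt;/code&gt; (Terraform 0.12.6+).  The file &lt;code&gt;team.json&lt;/code&gt;, the local value &lt;code&gt;team&lt;/code&gt;, the &lt;code&gt;Title&lt;/code&gt; attribute, and the resource name are assumptions for illustration; the file is assumed to be a flat JSON object mapping each &lt;code&gt;UserId&lt;/code&gt; to a title.  Terraform still tracks every item as its own resource in state, so this inherits the scaling warning above.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locals {
  # Assumed file layout: {"A": "Principal Engineer", "E": "Senior Engineer"}
  team = jsondecode(file("${path.module}/team.json"))
}

resource "aws_dynamodb_table_item" "team" {
  # One resource instance per key in the decoded object.
  for_each = local.team

  table_name = aws_dynamodb_table.basic-dynamodb-table.name
  hash_key   = aws_dynamodb_table.basic-dynamodb-table.hash_key

  # Build the DynamoDB-typed JSON for a single item.
  item = jsonencode({
    UserId = { S = each.key }
    Title  = { S = each.value }
  })
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;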



&lt;p&gt;This is clearly not an optimal solution, so what can be done?  Let's see what AWS has to offer, since DynamoDB is an AWS Product.  Scanning through the documentation reveals two possible methods, &lt;code&gt;PutItem&lt;/code&gt; and &lt;code&gt;BatchWriteItem&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;code&gt;PutItem&lt;/code&gt; vs &lt;code&gt;BatchWriteItem&lt;/code&gt;
&lt;/h1&gt;

&lt;p&gt;Two of the operations DynamoDB offers for writing data to tables are &lt;code&gt;PutItem&lt;/code&gt; and &lt;code&gt;BatchWriteItem&lt;/code&gt;.  Some key details of each are below:&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;PutItem&lt;/code&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Is used to upload a single item&lt;/li&gt;
&lt;li&gt;Can check whether the item already exists before writing, using a condition expression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;PutItem&lt;/code&gt; looks like what the &lt;code&gt;aws_dynamodb_table_item&lt;/code&gt; resource from the previous section is using.  An example is below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws dynamodb put-item \
    --table-name MusicCollection \
    --item '{"Artist": {"S": "Obscure Indie Band"}}' \
    --condition-expression "attribute_not_exists(Artist)"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/reference/dynamodb/put-item.html"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will add the item above to the MusicCollection table, on the condition the artist does not already exist.  It could be possible to use a loop and define some standard to store multiple items, iterate over each item, and add it to the table; but that seems like a lot of work.  Let's check out alternatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;BatchWriteItem&lt;/code&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Can upload many items at once (up to 25 put or delete requests per call)&lt;/li&gt;
&lt;li&gt;Will simply overwrite all items that have matching Primary Keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/reference/dynamodb/batch-write-item.html"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An example is below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws dynamodb batch-write-item \
    --request-items file://request-items.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Here is a snippet of &lt;code&gt;request-items.json&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "MusicCollection": [
        {
            "PutRequest": {
                "Item": {
                    "Artist": {"S": "No One You Know"},
                    "SongTitle": {"S": "Call Me Today"},
                    "AlbumTitle": {"S": "Somewhat Famous"}
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "Artist": {"S": "Acme Band"},
                    "SongTitle": {"S": "Happy Day"},
                    "AlbumTitle": {"S": "Songs About Life"}
                }
            }
        },
        ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;(Notice, the table to upload the items to is declared in the JSON itself). &lt;/p&gt;

&lt;p&gt;The JSON file can hold many items (a single &lt;code&gt;BatchWriteItem&lt;/code&gt; call accepts up to 25 put or delete requests) and can be kept in a version control system.  The documentation gives the following warning about updating items:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BatchWriteItem cannot update items. To update items, use the UpdateItem action.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;That is not an issue for this case as all data will live inside of one JSON file in Version Control.&lt;/p&gt;

&lt;p&gt;It looks as though we have a working solution: &lt;code&gt;BatchWriteItem&lt;/code&gt; loads many items into a table in a single call, and we can centralize data management of the table in a JSON file.&lt;/p&gt;

&lt;p&gt;Now, how can we get this to be invoked solely from Terraform?&lt;/p&gt;

&lt;h1&gt;
  
  
  Terraform Provisioners
&lt;/h1&gt;

&lt;p&gt;A provisioner in Terraform executes a command or script on either the local machine running Terraform or the machine Terraform just provisioned.  Provisioners are typically used to configure infrastructure such as virtual machines.  In this case, we will use &lt;code&gt;local-exec&lt;/code&gt;, which runs a command on the machine that Terraform is running on.  For more information, check out the &lt;a href="https://www.terraform.io/docs/provisioners/index.html"&gt;docs&lt;/a&gt; on Provisioners.  &lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;code&gt;Local-Exec&lt;/code&gt; Provisioner
&lt;/h1&gt;

&lt;p&gt;An example of &lt;code&gt;local-exec&lt;/code&gt;, with EC2 in this case, is below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_instance" "web" {
  # ...

  provisioner "local-exec" {
    command = "echo The server's IP address is ${self.private_ip}"
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The example above is for EC2; however, &lt;code&gt;local-exec&lt;/code&gt; can be attached to any resource, including a DynamoDB table!&lt;/p&gt;

&lt;h1&gt;
  
  
  Solution
&lt;/h1&gt;

&lt;p&gt;Alright, so we now have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Terraform Configuration to Build a DynamoDB Table&lt;/li&gt;
&lt;li&gt;A Method for uploading multiple items to said table&lt;/li&gt;
&lt;li&gt;A Solution for executing the data load from Terraform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only thing left now is to put everything together!&lt;/p&gt;

&lt;p&gt;I will provide a very simple DynamoDB table, with 1 unit of read and write capacity, no encryption, no streams, and no autoscaling.  Within the DynamoDB resource, I invoke the &lt;code&gt;local-exec&lt;/code&gt; provisioner to kick off a shell script on the same machine that is running Terraform (which also needs the AWS CLI installed).  The script runs &lt;code&gt;BatchWriteItem&lt;/code&gt; against the table I just created and loads all of the sample data.&lt;/p&gt;

&lt;p&gt;Here is the Terraform configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region  = "us-east-2"
}

resource "aws_dynamodb_table" "basic-dynamodb-table" {
  name           = "ExternallyManagedTable"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  hash_key       = "UserId"

  attribute {
    name = "UserId"
    type = "S"
  }

  provisioner "local-exec" {
    command = "bash populate_db.sh"
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I have instructed Terraform to use the &lt;code&gt;aws&lt;/code&gt; provider to build a &lt;code&gt;dynamodb&lt;/code&gt; resource, then use the &lt;code&gt;local-exec&lt;/code&gt; provisioner to invoke the shell script below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env bash

aws dynamodb batch-write-item --request-items file://items.json
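&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The shell script references the following JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{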
    "ExternallyManagedTable": [
        {
            "PutRequest": {
                "Item": {
                    "UserId": {
                        "S": "A"
                    },
                    "Title": {
                        "S": "Principal Engineer"
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "UserId": {
                        "S": "E"
                    },
                    "Title": {
                        "S": "Senior Engineer"
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "UserId": {
                        "S": "I"
                    },
                    "Title": {
                        "S": "Mid Level Engineer"
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "UserId": {
                        "S": "O"
                    },
                    "Title": {
                        "S": "Lead Engineer"
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "UserId": {
                        "S": "U"
                    },
                    "Title": {
                        "S": "Associate Engineer"
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "UserId": {
                        "S": "Y"
                    },
                    "Title": {
                        "S": "Half-Present Intern"
                    }
                }
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This will construct our table, which is a simple metadata lookup table for a headless engineering team comprised entirely of vowels!&lt;/p&gt;
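
&lt;p&gt;One thing the configuration above leaves implicit is where the script runs and which region the AWS CLI targets.  A hedged variation of the same resource, using the &lt;code&gt;working_dir&lt;/code&gt; and &lt;code&gt;environment&lt;/code&gt; arguments of &lt;code&gt;local-exec&lt;/code&gt;, could look like the sketch below.  It assumes &lt;code&gt;populate_db.sh&lt;/code&gt; and &lt;code&gt;items.json&lt;/code&gt; sit next to the configuration, and it sets &lt;code&gt;AWS_DEFAULT_REGION&lt;/code&gt; so the CLI writes to the same region as the provider.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_dynamodb_table" "basic-dynamodb-table" {
  name           = "ExternallyManagedTable"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  hash_key       = "UserId"

  attribute {
    name = "UserId"
    type = "S"
  }

  provisioner "local-exec" {
    command = "bash populate_db.sh"

    # Run from the module directory so file://items.json resolves (assumed layout).
    working_dir = path.module

    # Keep the AWS CLI in the same region as the provider block.
    environment = {
      AWS_DEFAULT_REGION = "us-east-2"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;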

&lt;p&gt;To begin, let's Initialize the Terraform configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform init

Initializing the backend...

Initializing provider plugins...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.aws: version = "~&amp;gt; 2.59"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
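
&lt;p&gt;Following the suggestion in that output, the provider block can be pinned to the reported version series.  This is a small sketch for Terraform 0.12, where the &lt;code&gt;version&lt;/code&gt; argument lives in the provider block; the constraint string is copied from the &lt;code&gt;terraform init&lt;/code&gt; output above.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  # Constraint suggested by terraform init.
  version = "~&amp;gt; 2.59"
  region  = "us-east-2"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;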



&lt;p&gt;On to the plan stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.


------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_dynamodb_table.basic-dynamodb-table will be created
  + resource "aws_dynamodb_table" "basic-dynamodb-table" {
      + arn              = (known after apply)
      + billing_mode     = "PROVISIONED"
      + hash_key         = "UserId"
      + id               = (known after apply)
      + name             = "ExternallyManagedTable"
      + read_capacity    = 1
      + stream_arn       = (known after apply)
      + stream_label     = (known after apply)
      + stream_view_type = (known after apply)
      + write_capacity   = 1

      + attribute {
          + name = "UserId"
          + type = "S"
        }

      + point_in_time_recovery {
          + enabled = (known after apply)
        }

      + server_side_encryption {
          + enabled     = (known after apply)
          + kms_key_arn = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Finally, time to Apply our configuration and create the table!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform apply -auto-approve
aws_dynamodb_table.basic-dynamodb-table: Creating...
aws_dynamodb_table.basic-dynamodb-table: Provisioning with 'local-exec'...
aws_dynamodb_table.basic-dynamodb-table (local-exec): Executing: ["/bin/sh" "-c" "bash populate_db.sh"]
aws_dynamodb_table.basic-dynamodb-table (local-exec): {
aws_dynamodb_table.basic-dynamodb-table (local-exec):     "UnprocessedItems": {}
aws_dynamodb_table.basic-dynamodb-table (local-exec): }
aws_dynamodb_table.basic-dynamodb-table: Creation complete after 7s [id=ExternallyManagedTable]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
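
&lt;p&gt;Note that a provisioner on the table only runs when the table is created, so later edits to &lt;code&gt;items.json&lt;/code&gt; are not loaded by a subsequent apply.  One possible way around that, sketched here as an assumption rather than as part of the original repo, is to move the provisioner into a &lt;code&gt;null_resource&lt;/code&gt; whose &lt;code&gt;triggers&lt;/code&gt; include a hash of the file.  The resource name &lt;code&gt;load_items&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical companion resource; requires the "null" provider.
resource "null_resource" "load_items" {
  triggers = {
    # Re-run the load whenever the JSON file changes.
    items_hash = filemd5("${path.module}/items.json")

    # Referencing the table makes Terraform create it first.
    table_name = aws_dynamodb_table.basic-dynamodb-table.name
  }

  provisioner "local-exec" {
    command     = "bash populate_db.sh"
    working_dir = path.module
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because put requests in &lt;code&gt;BatchWriteItem&lt;/code&gt; overwrite items with matching keys, re-running the load is safe for changed rows, but items removed from the file are not deleted from the table.&lt;/p&gt;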



&lt;p&gt;If you go to the console or scan the table, you will see all data is present!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws dynamodb scan --table-name ExternallyManagedTable
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The result should be something like below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Items": [
        {
            "Title": {
                "S": "Mid Level Engineer"
            },
            "UserId": {
                "S": "I"
            }
        },
        {
            "Title": {
                "S": "Principal Engineer"
            },
            "UserId": {
                "S": "A"
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;For those new to DynamoDB, the &lt;code&gt;S&lt;/code&gt; is the datatype, String in this case.  &lt;/p&gt;
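
&lt;p&gt;&lt;code&gt;BatchWriteItem&lt;/code&gt; accepts at most 25 put or delete requests per call, so a larger &lt;code&gt;items.json&lt;/code&gt; needs to be split.  A rough sketch using Terraform's &lt;code&gt;chunklist&lt;/code&gt; function follows.  The local names and the &lt;code&gt;load_batch&lt;/code&gt; resource are assumptions, and the sketch presumes every item in the file has the same shape so that the decoded JSON converts cleanly to a list.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locals {
  # All put requests for the table, read from the same items.json.
  put_requests = jsondecode(file("${path.module}/items.json"))["ExternallyManagedTable"]

  # Split into groups of at most 25, the BatchWriteItem per-call limit.
  batches = chunklist(local.put_requests, 25)
}

# Hypothetical resource: one batch-write-item call per chunk.
resource "null_resource" "load_batch" {
  count = length(local.batches)

  provisioner "local-exec" {
    # The shell, not Terraform, expands REQUEST_ITEMS at run time.
    command = "aws dynamodb batch-write-item --request-items \"$REQUEST_ITEMS\""

    environment = {
      # Using the table's name as the key also orders this after table creation.
      REQUEST_ITEMS = jsonencode({
        (aws_dynamodb_table.basic-dynamodb-table.name) = local.batches[count.index]
      })
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;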

&lt;p&gt;If you have any questions, comments, concerns, or requests for Cloud/DevOps/Automation/Monitoring/SRE topics you would like me to cover next, let me know (I am working out a master list of future posts and will announce it later).  Thanks again for reading, and hopefully this helps improve your Terraform/DynamoDB workflows!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/jacob-hudson/terraform-bulk-upload"&gt;GitHub Repo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>dynamodb</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
