Thomas H Jones II

Posted on • Originally published at thjones2.blogspot.com on

Crib-Notes: EC2 UserData Audit

Sometimes, I find that I'll return to a customer/project and forget what's "normal" for them in how they deploy their EC2s. If I know a given customer/project tends to deploy EC2s that include UserData, but they don't keep good records of what they tend to do for said UserData, I find the following BASH scriptlet to be useful for getting myself back into the swing of things:

for INSTANCE in $( aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId' --output text )
do
   printf "###############\n# %s\n###############\n" "${INSTANCE}"
   aws ec2 describe-instance-attribute --instance-id "${INSTANCE}" --attribute userData --query 'UserData.Value' --output text | base64 -d
   echo
done | tee /tmp/EC2-UserData.log

To explain, what the above does is:

  1. Initiates a for-loop using ${INSTANCE} as the iterated-value
  2. With each iteration, the value injected into ${INSTANCE} is taken from the output of the aws ec2 describe-instances command. Normally, this command outputs a JSON document containing a bunch of information about each instance in the account-region. Using the --query option, the output is constrained to each EC2 instance's InstanceId value, and the text output format emits those values as a plain, whitespace-separated list rather than JSON. The shell splits that list on whitespace, resulting in a clean list of EC2 instance-IDs with no extra cleanup needed.
  3. The initial printf line creates a bit of an output-header. This will make it easier to pore through the output and keep each iterated instance's individual UserData content separate
  4. Instance UserData is considered to be an attribute of a given EC2 instance. The aws ec2 describe-instance-attribute command is what actually pulls this content for the target EC2. The --query 'UserData.Value' filter constrains the output to the encoded payload, and asking for text output drops the JSON quoting, so no sed or jq cleanup is needed (piping the raw JSON through jq -r '.UserData.Value' is an equivalent alternative where jq is available). Because the UserData output is stored as a BASE64-encoded string, I have to pipe the cleaned-up output through the base64 utility to get my plain-text data back.
  5. I inject a closing blank line into my output stream (via the echo command) to make the captured output slightly easier to scan.
  6. I like to watch my scriptlet's progress, but still like to capture that output into a file for subsequent perusal, thus I pipe the entire loop's output through tee so I can capture as I view.
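One wrinkle the steps above don't cover: as far as I know, the AWS CLI's text output prints the literal string None when a queried value is null, which is what an instance with no UserData returns. Decoding that string yields a few garbage bytes. A minimal sketch of a guard follows; the decode_userdata name and the "(no UserData set)" message are my own placeholders, not anything the AWS CLI provides:

```shell
# decode_userdata: read a base64 string on stdin and print the decoded text.
# Assumption: the AWS CLI's text output prints "None" for a null value,
# so that case (and empty input) is reported instead of being decoded.
decode_userdata() {
   encoded="$(cat)"
   case "${encoded}" in
      ""|None) echo "(no UserData set)" ;;
      *)       printf '%s' "${encoded}" | base64 -d ;;
   esac
}

# Hypothetical usage inside the loop, replacing the bare `base64 -d`:
#   aws ec2 describe-instance-attribute --instance-id "${INSTANCE}" \
#      --attribute userData --query 'UserData.Value' --output text | decode_userdata
```

The helper only touches stdin and stdout, so it can be tried against a hand-made base64 string without any AWS access.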

I could have set it up so that each instance's data was dumped to an individual output-file. This would have saved the need for the printf and echo lines. However, I like having one big file to peruse (rather than having to hunt through scads of individual files) ...and a single file-open/close action is marginally faster than scads of open/closes.
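For completeness, the one-file-per-instance variant might look something like the sketch below. The function name, the output-directory argument and the .userdata suffix are all illustrative choices of mine rather than part of the original scriptlet:

```shell
# Write each instance's decoded UserData to its own file under OUTDIR.
# Usage: dump_userdata_per_file OUTDIR INSTANCE_ID [INSTANCE_ID ...]
# (function name, OUTDIR layout and file suffix are hypothetical)
dump_userdata_per_file() {
   outdir="$1"
   shift
   mkdir -p "${outdir}" || return 1
   for instance in "$@"
   do
      # One file per instance, named after the instance-ID
      aws ec2 describe-instance-attribute --instance-id "${instance}" \
         --attribute userData --query 'UserData.Value' --output text |
         base64 -d > "${outdir}/${instance}.userdata"
   done
}

# Hypothetical usage:
#   dump_userdata_per_file /tmp/EC2-UserData $( aws ec2 describe-instances \
#      --query 'Reservations[].Instances[].InstanceId' --output text )
```

Since each file is already named for its instance, the printf header and trailing echo drop out, at the cost of having many small files to grep through.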

In an account-region that had hundreds of EC2s, I'd probably have been more selective about which instance-IDs I fed to my loop. I would have used a --filters statement in my aws ec2 describe-instances command - likely filtering by VPC-ID and one or two other selectors.
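A rough sketch of that narrower loop-initialization, assuming the selectors are the VPC-ID plus the instance's running state (vpc-id and instance-state-name are real describe-instances filter names; the function name and the VPC-ID in the usage comment are placeholders):

```shell
# List the IDs of running instances in a single VPC.
# Usage: list_instances_in_vpc VPC_ID
# (function name is hypothetical; swap in whatever selectors suit the account)
list_instances_in_vpc() {
   aws ec2 describe-instances \
      --filters "Name=vpc-id,Values=$1" "Name=instance-state-name,Values=running" \
      --query 'Reservations[].Instances[].InstanceId' --output text
}

# Hypothetical usage as the loop's initializer:
#   for INSTANCE in $( list_instances_in_vpc vpc-0123456789abcdef0 )
#   do
#      ...
#   done
```

Multiple filters are ANDed together, so each additional Name=...,Values=... pair narrows the list further.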
