An earlier post described how to get DynamoDB data into a form where changes in the data can be analyzed, and how to do that manually. This post describes how to (semi-)automate the export of the DynamoDB table data and how to analyze it with Athena.
One approach uses a Lambda function, the other uses Step Functions. Both approaches implement the same steps: trigger the export to an S3 bucket, create an Athena table for the exported data, and prepare a named query for the analysis.
The data for this example looks like this.
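For illustration, an item in this table has roughly the following shape, written here as a TypeScript literal in DynamoDB JSON notation (the values are made up):

// hypothetical item with the same shape as the example data
const exampleItem = {
  pk: { S: '1' },
  person: {
    M: {
      firstname: { S: 'Jane' },
      lastname: { S: 'Doe' },
      gender: { S: 'female' },
      jobArea: { S: 'Accounting' },
      jobType: { S: 'Manager' },
      jobDescriptor: { S: 'Senior' },
    },
  },
};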
With Lambda
This Lambda function triggers the export via the SDK and creates or updates a named query.
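A minimal sketch of such a handler, assuming environment variables for the table, bucket, and Athena database (the actual Lambda in the repo also deletes the old Athena table, which is omitted here):

import { DynamoDBClient, ExportTableToPointInTimeCommand } from '@aws-sdk/client-dynamodb';
import { AthenaClient, CreateNamedQueryCommand } from '@aws-sdk/client-athena';

const ddb = new DynamoDBClient({});
const athena = new AthenaClient({});

// the CREATE EXTERNAL TABLE statement shown below, with 's3Location' as a placeholder
const createTableQuery = "CREATE EXTERNAL TABLE ddb_exported_table (...) LOCATION 's3Location' ...";

export const handler = async (): Promise<void> => {
  // 1. trigger the export of the table to S3
  const exportResult = await ddb.send(
    new ExportTableToPointInTimeCommand({
      TableArn: process.env.TABLE_ARN!,
      S3Bucket: process.env.EXPORT_BUCKET!,
      S3Prefix: 'ddb-exports',
      ExportFormat: 'DYNAMODB_JSON',
    }),
  );

  // 2. derive the data location of this export from the export ARN
  const exportId = exportResult.ExportDescription?.ExportArn?.split('/').pop();
  const s3Location = `s3://${process.env.EXPORT_BUCKET}/ddb-exports/AWSDynamoDB/${exportId}/data/`;

  // 3. store a named query that creates the Athena table on top of that location
  await athena.send(
    new CreateNamedQueryCommand({
      Name: 'ddb-export-create-table',
      Database: process.env.ATHENA_DATABASE ?? 'default',
      QueryString: createTableQuery.replace("'s3Location'", `'${s3Location}'`),
    }),
  );
};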
The query creates the Athena table. The export id is set by the Lambda function by replacing the “s3Location” placeholder with something like s3://<<bucket name>>/ddb-exports/AWSDynamoDB/<<ddb-export-id>>/data/.
CREATE EXTERNAL TABLE ddb_exported_table (
Item struct<pk:struct<S:string>,
person:struct<M:struct<
jobArea:struct<S:string>,
firstname:struct<S:string>,
gender:struct<S:string>,
jobType:struct<S:string>,
jobDescriptor:struct<S:string>,
lastname:struct<S:string>
>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3Location'
TBLPROPERTIES ( 'has_encrypted_data'='true');
SELECT
item.pk.S as pk,
item.person.M.firstname.S as firstname,
item.person.M.lastname.S as lastname,
item.person.M.jobArea.S as jobArea,
item.person.M.gender.S as gender,
item.person.M.jobType.S as jobType,
item.person.M.jobDescriptor.S as jobDescriptor
FROM "db_name"."table_name";
After you have started the Lambda function you have to wait until the export is finished. Then you can run the query that creates the Athena table (the Lambda has already deleted the old table). After that you can use the prepared query for the analysis.
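To check from code whether the export is finished, a small helper like the following could be used (an assumption for illustration, not part of the repo):

import { DynamoDBClient, DescribeExportCommand } from '@aws-sdk/client-dynamodb';

const ddb = new DynamoDBClient({});

// returns true once DynamoDB reports the export as COMPLETED (other states: IN_PROGRESS, FAILED)
export const isExportFinished = async (exportArn: string): Promise<boolean> => {
  const { ExportDescription } = await ddb.send(new DescribeExportCommand({ ExportArn: exportArn }));
  return ExportDescription?.ExportStatus === 'COMPLETED';
};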
A more orchestrated approach uses Step Functions. That's better suited for waiting for the results :)
With Step Functions
These are the steps that are orchestrated by the state machine.
It's defined here.
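As an orientation, a state machine along these lines could be sketched with CDK roughly as follows (construct names, IAM resources, and the createTableQuery input field are assumptions; the linked definition in the repo is the source of truth):

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

export class DdbExportSfnStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const table = new dynamodb.Table(this, 'Table', {
      partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
      pointInTimeRecovery: true, // required for exports
    });
    const bucket = new s3.Bucket(this, 'ExportBucket');

    // 1. trigger the export
    const startExport = new tasks.CallAwsService(this, 'ExportToS3', {
      service: 'dynamodb',
      action: 'exportTableToPointInTime',
      parameters: {
        TableArn: table.tableArn,
        S3Bucket: bucket.bucketName,
        S3Prefix: 'ddb-exports',
      },
      iamResources: [table.tableArn],
      resultPath: '$.export',
    });

    // 2. wait and poll until the export is done
    const waitForExport = new sfn.Wait(this, 'WaitForExport', {
      time: sfn.WaitTime.duration(cdk.Duration.minutes(1)),
    });
    const describeExport = new tasks.CallAwsService(this, 'DescribeExport', {
      service: 'dynamodb',
      action: 'describeExport',
      parameters: {
        ExportArn: sfn.JsonPath.stringAt('$.export.ExportDescription.ExportArn'),
      },
      iamResources: [`${table.tableArn}/export/*`],
      resultPath: '$.export',
    });

    // 3. create the new Athena table via a query taken from the state machine input
    const createTable = new tasks.AthenaStartQueryExecution(this, 'CreateAthenaTable', {
      queryString: sfn.JsonPath.stringAt('$.createTableQuery'),
      resultConfiguration: {
        outputLocation: { bucketName: bucket.bucketName, objectKey: 'athena-results' },
      },
    });

    const exportFinished = new sfn.Choice(this, 'ExportFinished?')
      .when(
        sfn.Condition.stringEquals('$.export.ExportDescription.ExportStatus', 'COMPLETED'),
        createTable,
      )
      .otherwise(waitForExport);

    new sfn.StateMachine(this, 'DdbExportStateMachine', {
      definition: startExport.next(waitForExport).next(describeExport).next(exportFinished),
    });
  }
}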
The state machine can be started with the default input values.
It takes a few minutes to complete.
The “Recent queries” section in Athena lists the queries for dropping the old table and creating the new one.
After it's finished you can choose the saved query with the name sfn-ddb-export-read-table. It can be used to query all the data from the DynamoDB table and can be adapted for more “complex” queries.
Code
JohannesKonings / examples-aws-dynamodb-analytics-cdk
Example of how to analyze DynamoDB data with Athena in different ways, created with AWS CDK
diff over all: npm run cdk -- diff
deploy over all: npm run cdk -- deploy
scenarios
ddb-kinesis-athena: DynamoDb -> DynamoDb Streams -> Kinesis Data Streams -> Data Firehose -> S3 -> Glue Crawler -> Athena
diff: npm run cdk:ddb-kinesis-athena -- diff
deploy: npm run cdk:ddb-kinesis-athena -- deploy
ddb-pipe-athena: DynamoDb -> EventBridge Pipe (DynamoDb Streams -> Data Firehose) -> S3 -> Glue Crawler -> Athena
diff: npm run cdk:ddb-pipe-athena -- diff
deploy: npm run cdk:ddb-pipe-athena -- deploy
archive
deploy options
The config file controls the deployment options.
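Inferred from the snippets below, the relevant part of the config type could look roughly like this (the actual type in the repo may contain more fields):

// assumed shape, derived from the config examples in this post
export type KinesisFormat = 'JSON' | 'PARQUET';

export interface Config {
  kinesisFormat: KinesisFormat;
  isQuicksight: boolean;
}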
Firehose
The formats JSON and Parquet can be chosen:
export const config: Config = {
...
kinesisFormat: 'JSON',
...
}
export const config: Config = {
...
kinesisFormat: 'PARQUET',
...
}
Quicksight
export const config: Config = {
...
isQuicksight: true,
...
}
cd cdk
QUICKSIGHT_USERNAME=<<Quicksight user name>> npx cdk deploy
description
see https://github.com/JohannesKonings/test-aws-dynamodb-athena-cdk