Johannes Konings for AWS Community Builders

Originally published at johanneskonings.dev

Example how to visualize DynamoDB item changes with Quicksight (S3 source) created with CDK

This post shows how to visualize DynamoDB data changes with Quicksight. It’s an extension of this post, which describes how to analyze the data with Athena. The setup for creating the DynamoDB table and putting the data changes into an S3 bucket is the same. Instead of creating an Athena table over the data in the S3 bucket, the data is linked to a data source in Quicksight.

Quicksight activation and costs

Quicksight needs to be activated before use. The standard edition is enough for this scenario. The first 30 days are free; the costs are listed here.

quicksight open service

quicksight sign up

quicksight choose standard

Enter a name and an email address

quicksight create standard

Quicksight role

In the standard edition, Quicksight uses a standard role that can be configured via the Quicksight console.

quicksight permission access to aws services

Unfortunately, it is not possible to allow access to specific KMS keys there. For that, we need to add a policy to the role aws-quicksight-service-role-v0.

This adds the needed permissions to that role.

import { Construct } from 'constructs'
import { aws_iam as iam, aws_s3 as s3 } from 'aws-cdk-lib'

export interface QuicksightRoleProps {
  name: string
  bucket: s3.IBucket
}

export class QuicksightRole extends Construct {
  constructor(scope: Construct, id: string, props: QuicksightRoleProps) {
    super(scope, id)

    const quicksightRoleName = 'aws-quicksight-service-role-v0'

    const quicksightRole = iam.Role.fromRoleName(this, 'quicksight-role', quicksightRoleName)

    quicksightRole.attachInlinePolicy(
      new iam.Policy(this, `${props.name}-policy`, {
        statements: [
          new iam.PolicyStatement({
            actions: ['kms:Decrypt', 's3:GetObject', 's3:List*'],
            resources: [props.bucket.bucketArn, `${props.bucket.bucketArn}/*`, props.bucket.encryptionKey!.keyArn],
          }),
        ],
      })
    )
  }
}
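The construct can then be used in the stack, for example like this. This is a minimal sketch; the ids are illustrative, and it assumes the bucket is created with s3.BucketEncryption.KMS so that bucket.encryptionKey (used with the non-null assertion above) is defined.

// minimal usage sketch inside a stack (ids are illustrative)
const bucket = new s3.Bucket(this, 'item-changes-bucket', {
  encryption: s3.BucketEncryption.KMS, // creates a dedicated KMS key, so encryptionKey is defined
})

new QuicksightRole(this, 'quicksight-role', {
  name: 'quicksight',
  bucket: bucket,
})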


Now it’s possible to create a data source from the S3 bucket.

Create a data source and dataset

In Quicksight, a data source is the connection to the data, and a dataset uses this data source and defines how the data will be used.

Quicksight has a lot of different data sources. We want to use the data from S3.

quicksight datasource kinds

Currently, there is no L2 CDK construct for data sources, so we need to use the L1 CloudFormation construct.

In the case of an S3 data source, it’s a link to a manifest file:

    const datasource = new quicksightCfn.CfnDataSource(this, 'datasource', {
      name: datasourceName,
      type: 'S3',
      awsAccountId: Stack.of(this).account,
      dataSourceId: datasourceName,
      dataSourceParameters: {
        s3Parameters: {
          manifestFileLocation: {
            bucket: props.bucket.bucketName,
            key: manifestKey,
          },
        },
      },
      permissions: permissionsDatasource,
    })


This is the definition of the manifest file. More about manifest files here.

    const manifest = {
      fileLocations: [
        {
          URIPrefixes: [`s3://${props.bucket.bucketName}/${props.prefix}/`],
        },
      ],
      globalUploadSettings: {
        format: 'JSON',
      },
    }
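The manifest has to be available as an object in the bucket so that Quicksight can read it. Here is a minimal sketch of uploading it with the aws_s3_deployment module; it assumes manifestKey is the key referenced in the data source above and that Source.jsonData is available in your CDK version.

import { aws_s3_deployment as s3deploy } from 'aws-cdk-lib'

    // upload the manifest definition as a JSON object to the bucket
    new s3deploy.BucketDeployment(this, 'manifest-deployment', {
      destinationBucket: props.bucket,
      sources: [s3deploy.Source.jsonData(manifestKey, manifest)],
    })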


The dataset then defines which fields will be used and can also format these fields.

    const dataset = new quicksightCfn.CfnDataSet(this, 'dataset', {
      name: datasetName,
      awsAccountId: Stack.of(this).account,
      dataSetId: datasetName,
      importMode: 'SPICE',
      physicalTableMap: {
        itemChanges: {
          s3Source: {
            dataSourceArn: datasource.attrArn,
            uploadSettings: {
              format: 'JSON',
            },
            inputColumns: [
              {
                name: 'awsRegion',
                type: 'STRING',
              },
              {
                name: 'eventID',
                type: 'STRING',
              },
              {
                name: 'eventName',
                type: 'STRING',
              },
              {
                name: 'userIdentity',
                type: 'STRING',
              },
              {
                name: 'recordFormat',
                type: 'STRING',
              },
              {
                name: 'tableName',
                type: 'STRING',
              },
              {
                name: 'dynamodb.ApproximateCreationDateTime',
                type: 'STRING',
              },
              {
                name: 'dynamodb.Keys.pk.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.NewImage.pk.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.NewImage.person.M.jobArea.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.NewImage.person.M.firstname.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.NewImage.person.M.gender.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.NewImage.person.M.jobType.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.NewImage.person.M.jobDescriptor.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.NewImage.person.M.lastname.S',
                type: 'STRING',
              },
              {
                name: 'dynamodb.SizeBytes',
                type: 'STRING',
              },
              {
                name: 'eventSource',
                type: 'STRING',
              },
            ],
          },
        },
      },
      logicalTableMap: {
        logicalTableProperty: {
          alias: `${datasetName}-alias`,
          source: { physicalTableId: 'itemChanges' },
        },
      },
      permissions: permissionsDataset,
    });
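The dotted column names mirror the nested JSON structure of the change records that Firehose writes to S3. For orientation, a single record looks roughly like this (all values are illustrative):

{
  "awsRegion": "eu-central-1",
  "eventID": "...",
  "eventName": "INSERT",
  "userIdentity": null,
  "recordFormat": "application/json",
  "tableName": "person",
  "dynamodb": {
    "ApproximateCreationDateTime": 1664719636363,
    "Keys": { "pk": { "S": "1" } },
    "NewImage": {
      "pk": { "S": "1" },
      "person": {
        "M": {
          "firstname": { "S": "Jane" },
          "lastname": { "S": "Doe" },
          "gender": { "S": "female" },
          "jobArea": { "S": "IT" },
          "jobType": { "S": "Engineer" },
          "jobDescriptor": { "S": "Senior" }
        }
      }
    },
    "SizeBytes": 158
  },
  "eventSource": "aws:dynamodb"
}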


The whole definition is here.

Here is a post that describes it with Python CDK.

Permission to see the data source and dataset

Data sources and datasets are only displayed if your user has permission for them. That is not automatically the case if you deploy them with CDK. Therefore, you need to put your user ARN into the permissions. One way to do that is with an environment variable:

QUICKSIGHT_USERNAME=<<Quicksight user name>> npx cdk deploy

    const quicksightUsername = process.env.QUICKSIGHT_USERNAME
    const principalArn = `arn:aws:quicksight:${Stack.of(this).region}:${Stack.of(this).account}:user/default/${quicksightUsername}`

    const permissionsDatasource = [
      {
        principal: principalArn,
        actions: [
          'quicksight:DescribeDataSource',
          'quicksight:DescribeDataSourcePermissions',
          'quicksight:PassDataSource',
          'quicksight:UpdateDataSource',
          'quicksight:DeleteDataSource',
          'quicksight:UpdateDataSourcePermissions',
        ],
      },
    ]

    const permissionsDataset = [
      {
        principal: principalArn,
        actions: [
          'quicksight:DescribeDataSet',
          'quicksight:DescribeDataSetPermissions',
          'quicksight:PassDataSet',
          'quicksight:DescribeIngestion',
          'quicksight:ListIngestions',
          'quicksight:UpdateDataSet',
          'quicksight:DeleteDataSet',
          'quicksight:CreateIngestion',
          'quicksight:CancelIngestion',
          'quicksight:UpdateDataSetPermissions',
        ],
      },
    ]


You can find the Quicksight username here.

quicksight username
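Alternatively, you can list the Quicksight users with the AWS CLI (this assumes the default namespace):

aws quicksight list-users --aws-account-id <<account id>> --namespace default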

Deployment and refresh of the dataset

If you deploy this setup for the first time, the data does not exist yet, but the data source already links to it. That causes a deployment error. To avoid this, a little dummy data file is deployed with an SDK call of a CDK custom resource:

    const dummyJsonString = JSON.stringify({ dummy: 'dummy' }); // dummy content, deleted again after deployment
    const customResourcePutObject = new custom_resources.AwsCustomResource(this, 'prefix-creation-put', {
      onCreate: {
        service: 'S3',
        action: 'putObject',
        parameters: {
          Bucket: props.bucket.bucketName,
          Key: `${props.prefix}/dummy.json`,
          Body: dummyJsonString,
        },
        physicalResourceId: custom_resources.PhysicalResourceId.of('prefix-creation'),
      },
      policy: custom_resources.AwsCustomResourcePolicy.fromSdkCalls({ resources: custom_resources.AwsCustomResourcePolicy.ANY_RESOURCE }),
    });
    props.bucket.grantReadWrite(customResourcePutObject);
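To make sure the dummy object is in place before Quicksight validates the manifest, the data source can depend on this custom resource. A sketch of that wiring (it assumes the datasource construct from above is in scope):

    // create the Quicksight data source only after the dummy object was put
    datasource.node.addDependency(customResourcePutObject);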


After the data source is deployed, this dummy file will be removed:

    const customResourceDeleteObject = new custom_resources.AwsCustomResource(this, 'prefix-creation-delete', {
      onCreate: {
        service: 'S3',
        action: 'deleteObject',
        parameters: {
          Bucket: props.bucket.bucketName,
          Key: `${props.prefix}/dummy.json`,
        },
        physicalResourceId: custom_resources.PhysicalResourceId.of('prefix-creation'),
      },
      policy: custom_resources.AwsCustomResourcePolicy.fromSdkCalls({ resources: custom_resources.AwsCustomResourcePolicy.ANY_RESOURCE }),
    });
    props.bucket.grantReadWrite(customResourceDeleteObject);
    customResourceDeleteObject.node.addDependency(dataset);


After there is some data in the DynamoDB table, you have to refresh the dataset. Here is how it looks if you create 5 new entries, then modify 1 and refresh again, and use this data in an analysis.

quicksight refresh dataset

quicksight dataset imported rows

quicksight dataset analysis

quicksight analysis inserts

quicksight analysis modify
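Instead of using the console, a refresh (ingestion) can also be triggered with the AWS CLI. A sketch, where the ingestion id is just a unique string you choose:

aws quicksight create-ingestion --aws-account-id <<account id>> --data-set-id <<dataset id>> --ingestion-id <<unique id>>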

Cost Alert 💰

⚠️ Don’t forget to delete the Quicksight account after testing.

quicksight delete account

Code

JohannesKonings / examples-aws-dynamodb-analytics-cdk

Example how to Analyze DynamoDB data with Athena in different ways created with AWS CDK

examples-aws-dynamodb-analytics-cdk

diff over all: npm run cdk -- diff
deploy over all: npm run cdk -- deploy

scenarios

ddb-kinesis-athena: DynamoDB -> DynamoDB Streams -> Kinesis Data Streams -> Data Firehose -> S3 -> Glue Crawler -> Athena

diff: npm run cdk:ddb-kinesis-athena -- diff
deploy: npm run cdk:ddb-kinesis-athena -- deploy

ddb-kinesis-athena architecture

ddb-pipe-athena: DynamoDB -> EventBridge Pipe (DynamoDB Streams -> Data Firehose) -> S3 -> Glue Crawler -> Athena

diff: npm run cdk:ddb-pipe-athena -- diff
deploy: npm run cdk:ddb-pipe-athena -- deploy

ddb-pipe-athena architecture

archive

overview

deploy options

The config file controls the deployment options.

Firehose

The formats JSON and Parquet can be chosen:

export const config: Config = {
    ...
    kinesisFormat: 'JSON',
    ...
}
or
export const config: Config = {
    ...
    kinesisFormat: 'PARQUET',
    ...
}

Quicksight

export const config: Config = {
    ...
    isQuicksight: true,
    ...
}

cd cdk

QUICKSIGHT_USERNAME=<<Quicksight user name>> npx cdk deploy

description

see…

Top comments (3)

Ryan Nazareth

@johanneskonings nice blog series - and data analytics workflow, well done! Have you also tried DynamoDB Streams for capturing change items in DynamoDB? docs.aws.amazon.com/amazondynamodb... I wanted to try it out on a project. I think there are limitations in comparison to Kinesis, in terms of data retention (24 hrs) and fewer consumers per shard, but no duplicate records, and it seems easier to set up.

Johannes Konings

Thanks.
DynamoDB Streams I have not tried yet. Another advantage of DynamoDB Streams would be that it has no hourly costs compared to a Kinesis data stream.
Maybe I’ll have a look later.

This video explains how to use a Lambda instead of the Kinesis data stream. It should not be a problem to switch 😊


https://www.youtube.com/watch?v=17AmrTqn0GY&t=1s

Ryan Nazareth

That seems a good incentive to use DynamoDB Streams already - I’ve had a bad experience in the past of forgetting to tear down a Kinesis stream and got a huge bill. I’ll check it out - thanks for the video link!