Our data-analytics team eventually started to play with an AWS Redshift cluster instead of the MariaDB RDS service.
The current task is to spin up a simple Proof of Concept Redshift cluster in AWS.
Let’s do it quickly, without going into details: if this goes to Production, I’ll add another post with a more detailed overview.
For now, we are interested in one main parameter, the node type. It can be set to Dense Storage or Dense Compute, and it determines the cluster’s storage type, CPU, memory, etc.:
When you launch a cluster, one option you specify is the node type. The node type determines the CPU, RAM, storage capacity, and storage drive type for each node. The dense storage (DS) node types are storage optimized. The dense compute (DC) node types are compute optimized.
- Dense Storage: build a simple data warehouse for big data at a lower price by using HDD disks
- Dense Compute: build a “production-like” cluster with fast CPUs, lots of memory, and SSD drives
For the PoC, we obviously choose the Dense Storage type.
Let’s start by creating an IAM role: the data-analytics team will use AWS S3, so we need to grant Redshift permissions to work with it.
Go to IAM, create a new role, and set its type to Redshift – Customizable:
This role will be used later during the cluster creation.
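For those who prefer scripting over the Console, here is a minimal sketch of the same role in Python. It assumes a boto3 IAM client is passed in; the role name `redshift-poc-s3-role` and the choice of the `AmazonS3ReadOnlyAccess` managed policy are my assumptions for illustration, so pick whatever S3 permissions your team actually needs.

```python
import json


def redshift_trust_policy():
    """Trust policy that lets the Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_redshift_s3_role(iam, role_name="redshift-poc-s3-role"):
    """Create the role and attach S3 access; returns the role ARN.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    The role name and the read-only S3 policy are assumptions.
    """
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(redshift_trust_policy()),
    )
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )
    return role["Role"]["Arn"]
```

The returned ARN is the value the cluster needs when the role is attached to it.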
In the AWS Console, go to Redshift and click Quick launch cluster:
Set values for the Cluster identifier, Database name, Master user name, and Master user password, and select the IAM role created above:
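The same form fields map fairly directly onto the parameters of the boto3 `create_cluster` call. Below is a rough sketch that only builds the parameter set. The `ds2.xlarge` node type and the single-node cluster type are my assumptions for a cheap Dense Storage PoC, not something the Quick launch form dictates, and all the values in the usage example are placeholders.

```python
def cluster_params(identifier, db_name, user, password, role_arn,
                   node_type="ds2.xlarge"):
    """Build keyword arguments for the boto3 Redshift create_cluster call.

    node_type defaults to a Dense Storage type (an assumption for this PoC);
    a single-node cluster is also an assumption to keep costs down.
    """
    return {
        "ClusterIdentifier": identifier,
        "DBName": db_name,
        "MasterUsername": user,
        "MasterUserPassword": password,
        "NodeType": node_type,
        "ClusterType": "single-node",
        "IamRoles": [role_arn],
    }


# Placeholder values, for illustration only
params = cluster_params(
    "poc-cluster", "dev", "admin", "ChangeMe12345",
    "arn:aws:iam::123456789012:role/redshift-poc-s3-role",
)
# With real credentials this could be passed on as:
#   boto3.client("redshift").create_cluster(**params)
```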
Wait 5-10 minutes until its status becomes Available:
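Instead of refreshing the Console, the wait can be scripted. This is a small sketch, assuming a boto3 Redshift client is passed in; it relies on the `cluster_available` waiter that boto3 ships for Redshift, and the function name is mine.

```python
def wait_until_available(redshift, identifier):
    """Block until the cluster is available, then return its status.

    `redshift` is expected to be a boto3 client, e.g. boto3.client("redshift").
    """
    # The waiter polls describe_clusters until the status is "available"
    waiter = redshift.get_waiter("cluster_available")
    waiter.wait(ClusterIdentifier=identifier)

    # Read the status back to confirm what the waiter saw
    response = redshift.describe_clusters(ClusterIdentifier=identifier)
    return response["Clusters"][0]["ClusterStatus"]
```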
Even after the cluster reaches the Available state, its DB Health is still in the unknown state:
Go to the cluster and choose Modify:
Create a new AWS Security Group and allow connections to port 5439:
Go back to the cluster, choose Modify again, and specify the newly created Security Group:
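The Security Group rule itself is tiny, so here is a sketch of it as the `IpPermissions` structure that the boto3 EC2 call `authorize_security_group_ingress` accepts. The CIDR in the usage example is a documentation-range placeholder and the description text is my own; for a PoC it is probably wise to allow only your own address rather than the whole Internet.

```python
import ipaddress

REDSHIFT_PORT = 5439  # the port opened in the Security Group above


def redshift_ingress_rule(cidr):
    """Build an IpPermissions list allowing TCP 5439 from one CIDR block."""
    # ip_network raises ValueError for a malformed CIDR or one with host bits set
    network = ipaddress.ip_network(cidr)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": REDSHIFT_PORT,
            "ToPort": REDSHIFT_PORT,
            "IpRanges": [
                {"CidrIp": str(network), "Description": "Redshift PoC access"}
            ],
        }
    ]


# Placeholder address, for illustration only
rule = redshift_ingress_rule("203.0.113.10/32")
# With real credentials and a real group ID this could be applied as:
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=security_group_id, IpPermissions=rule)
```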
To test that it works, let’s configure an SQL Workbench connection.
Install it on Arch Linux from AUR:
$ yaourt -S sql-workbench
$ sqlworkbench &
Find a URL to download the drivers here>>>.
$ wget https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/18.104.22.1688/RedshiftJDBC42-no-awssdk-22.214.171.1248.jar
Open the Connection Profile dialog and click Manage Drivers at the bottom:
Choose Redshift and the driver file you downloaded above:
In AWS, open the Redshift cluster and find its Connection string:
In Workbench, set the JDBC string in the Profile:
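If you need to put the JDBC string together by hand, it follows the `jdbc:redshift://endpoint:port/database` pattern. A tiny helper to illustrate it; the endpoint and database name below are made-up placeholders, so use the values from your own cluster page.

```python
def redshift_jdbc_url(endpoint, database, port=5439):
    """Build a Redshift JDBC URL from the cluster endpoint and database name."""
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


# Hypothetical endpoint, for illustration only
url = redshift_jdbc_url(
    "poc-cluster.abc123example.us-east-1.redshift.amazonaws.com", "dev"
)
print(url)
# jdbc:redshift://poc-cluster.abc123example.us-east-1.redshift.amazonaws.com:5439/dev
```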
Create a test table:
> create table testtable (id int, name varchar(10));
Add some data:
> insert into testtable values (1, 'val');