Timothy Cummins

Amazon Redshift: Changing Data Warehousing

Today I want to talk about a tool I am excited to use: Amazon Redshift. Redshift is a cloud-based data warehouse that lets you query petabytes of structured data, and it is just one of the many tools provided by Amazon Web Services.

Data Warehouse

Let us start with what a data warehouse is. A data warehouse is a database where a company can store its organized data so that it can be quickly accessed for analysis. To do this, the data structure and schema must be defined in advance, and the data usually flows in from transactional systems and business applications.

So now you're probably thinking: cool, a place to store my data, but what makes Amazon Redshift so special? Well, what stands out about Redshift is how far it pushes three things: performance, security and scalability.

Performance

With Redshift, Amazon is setting new standards for how fast you can query your data. The speedup mostly comes down to four things: columnar storage, data compression, zone maps and slices. I will just be touching on columnar storage and data compression here.

Let's start with columnar storage. A traditional row-oriented database stores your data row by row, the way it was entered, so a query that only needs a few columns still has to read every field of every row. With columnar storage, the values of each column are stored together, so the database can scan just the columns a query asks for instead of reading and discarding the rest of each row, which saves a ton of time.
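To make the difference concrete, here is a toy sketch in Python. It is not how Redshift is implemented; plain lists stand in for blocks on disk, and the table, column names and helper functions are all made up for illustration. It simply counts how many values each layout has to touch to sum one column.

```python
# Toy illustration only: Python lists stand in for blocks on disk.
# The table and its column names are hypothetical.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(rows, column):
    """Row layout: every field of every row is read, even the unused ones."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is touched
        total += row[column]
    return total, values_read


def total_column_store(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_reads = total_row_store(rows, "amount")
col_total, col_reads = total_column_store(columns, "amount")

print("row store:   ", row_total, "from", row_reads, "values read")
print("column store:", col_total, "from", col_reads, "values read")
```

Both layouts give the same total, but the column layout reads three values instead of nine. On a wide table with many rows, that gap is where the time savings come from.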

The second performance feature I would like to touch on is data compression. Since Redshift stores data by column, the values that sit next to each other on disk all share one data type, which lets it compress each column with an encoding suited to that type. Also, when you load your data into Redshift, you can let AWS analyze it and automatically choose a compression encoding for each column.
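As a rough illustration of why columns compress so well, here is a minimal run-length encoding sketch in Python. Run-length encoding is just one simple scheme, chosen here because it is easy to read; this is not Redshift's actual implementation, and the `status_column` data and function names are invented for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original list of values."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# Hypothetical column: values of one type, with long runs of repeats.
status_column = ["shipped"] * 4 + ["pending"] * 2 + ["shipped"] * 3

encoded = run_length_encode(status_column)
print(encoded)
print(len(status_column), "values stored as", len(encoded), "pairs")
```

A row-by-row layout would interleave these statuses with order IDs, names and amounts, so the runs of repeated values would never line up like this.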

Security

Nothing to fear, Amazon Web Services security is here. With Redshift, Amazon put a lot of effort into making sure the data you are trusting it with is accessible by you and only you (or your business). For starters, when using this product you get the same AWS security that Amazon uses for its own data. I mean, if they trust it, can't you? On top of that, cluster security groups control who can reach each cluster, and the data in a cluster can be encrypted.

Scalability

Normally with a data warehouse, increasing or decreasing the amount of data you are storing requires a major investment in purchasing and setting up additional hardware and software to contain it. But not with Amazon: since everything is in the cloud, you can easily change the number or type of nodes you need, and with managed storage, capacity is added automatically to support workloads of up to 8 petabytes of compressed data. Concurrent users are also no problem, as Amazon claims it can support a virtually unlimited number of users and queries by adding transient capacity in seconds as concurrency increases.

Data Lake Integration

Ultimately, the feature I find the coolest out of all the things that make Redshift great is its integration with the AWS data lake. As I mentioned earlier, Redshift is used to query your structured data, but what about all the other data you are collecting without knowing yet how you are going to use it? That is where the data lake comes in. You can store all of your semi-structured or unstructured data in the lake, and then, when you know what you want to do with it, you can structure it with AWS Glue and combine it with your structured data in Redshift. When you finish your queries and restructuring and are ready to try out some of the other AWS products, such as Amazon SageMaker for machine learning, you can export your data directly into that service, and Amazon will take care of the formatting and data movement.
