DEV Community

Aryama


Snowflake Introduction


What is Snowflake?

  • It is an analytics database, or data warehouse, delivered as a service

  • It is a SaaS tool to load, analyze and report on massive data volumes

  • It provides data storage and data analytics solutions

  • It is a data warehouse solution that runs only in the cloud

  • Snowflake runs on three cloud providers: Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP)

  • Pay-per-second billing model: you pay only for the data you store and for compute while it is running. When compute is not in use, it is not charged.
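To make the per-second model concrete, here is a minimal Python sketch of how compute credits add up across separate runs of a warehouse. It assumes Snowflake's documented rule of a 60-second minimum each time a warehouse starts or resumes; rounding partial seconds up, the 1 credit/hour rate, and the function names are assumptions of this illustration, not Snowflake's billing code.

```python
import math

# Assumption: per-second billing with a 60-second minimum each time a
# warehouse starts or resumes; partial seconds are rounded up here.
MIN_BILLED_SECONDS = 60


def billed_seconds(run_seconds: float) -> int:
    """Seconds billed for one continuous run of a warehouse."""
    if run_seconds <= 0:
        return 0  # a suspended warehouse is not charged
    return max(MIN_BILLED_SECONDS, math.ceil(run_seconds))


def compute_credits(runs: list[float], credits_per_hour: float) -> float:
    """Credits used by several separate runs (resume -> suspend) of one warehouse."""
    total_seconds = sum(billed_seconds(r) for r in runs)
    return total_seconds * credits_per_hour / 3600


# A 1-credit/hour warehouse that ran for 30 s, later for 10 minutes, then stayed
# suspended: the short run is billed as 60 s, so 660 s are billed in total.
print(compute_credits([30, 600], credits_per_hour=1))
```

The time the warehouse spends suspended between the two runs contributes nothing to the total.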

Alternatives to Snowflake - Google BigQuery, Amazon Redshift, Databricks

Snowflake architecture

[Figure: Snowflake architecture diagram]

It has a multi-cluster, shared data architecture.

Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.
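The hybrid design can be illustrated with a toy Python model: one central store holds all partitions (the shared-disk part), while each compute node scans only its own slice and returns a partial result that a coordinator combines (the shared-nothing, MPP part). Threads stand in for nodes here, and the data and round-robin assignment are hypothetical; in Snowflake the nodes are separate machines holding their portion of the data locally.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared-disk side: one central store of persisted data, visible to every node.
# Each inner list is a toy "partition" of sale amounts.
CENTRAL_STORAGE = [
    [10, 20, 30],
    [5, 5],
    [100],
    [1, 2, 3, 4],
]


def node_scan(partitions):
    """Work done by one compute node on the partitions assigned to it."""
    return sum(sum(p) for p in partitions)


def mpp_sum(storage, num_nodes):
    # Shared-nothing side: each node processes its own slice of the data set
    # (round-robin assignment is an assumption of this sketch).
    slices = [storage[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_sums = list(pool.map(node_scan, slices))
    # The coordinator combines the partial results into the final answer.
    return sum(partial_sums)


print(mpp_sum(CENTRAL_STORAGE, num_nodes=2))  # 180
```

Changing `num_nodes` changes how the work is split but not the answer, which is what lets compute scale out without moving the central data.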

It decouples compute from storage, so each can be scaled independently.

  • It has three layers: the Storage layer, the Compute layer and the Cloud Service layer

Storage layer -
At the centre is storage, which holds table data in both structured and semi-structured form.

Data is compressed and encrypted (AES-256) before it is stored.

Snowflake converts loaded data into its own optimised, compressed, columnar format (proprietary to Snowflake).
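Snowflake's internal format is proprietary, so the sketch below does not reproduce it. It only illustrates, in Python, why a columnar layout compresses well: once rows are pivoted into columns, repeated values sit next to each other and can be encoded compactly. Run-length encoding is a simple stand-in chosen for this example, not a claim about the encodings Snowflake actually uses.

```python
from itertools import groupby

# Toy rows, as they might arrive from a file load.
rows = [
    ("2024-01-01", "EU", 10),
    ("2024-01-01", "EU", 12),
    ("2024-01-01", "US", 7),
    ("2024-01-02", "US", 9),
]


def to_columns(rows, names):
    """Row layout -> column layout: one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Compress a column as (value, repeat_count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["day", "region", "amount"])
compressed = {name: run_length_encode(vals) for name, vals in columns.items()}
print(compressed["day"])     # [('2024-01-01', 3), ('2024-01-02', 1)]
print(compressed["region"])  # [('EU', 2), ('US', 2)]
```

A second benefit of the columnar layout is that a query touching only `amount` can read that one column and skip the others.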

Compute layer - Queries are executed in virtual warehouses (VWs), which are clusters of compute instances provisioned from the underlying cloud provider (for example, Amazon EC2 instances on AWS or Compute Engine instances on GCP). VWs can be scaled up or down on demand.

Warehouses come in various sizes:
• X-Small - single node (DDL)
• Small - two nodes
• Medium - four nodes (data loading)
• Large - eight nodes
• X-Large - sixteen nodes (data processing)
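The sizes above double at each step. Below is a small Python sketch of resizing a warehouse using the node counts from the list. The credit rate assumes one credit per node per hour, which matches Snowflake's published rates for these sizes (X-Small at 1 credit/hour up to X-Large at 16); larger sizes exist but are left out to match the list, and the function names are hypothetical.

```python
# Node counts from the list above; each size has twice the nodes of the previous one.
WAREHOUSE_NODES = {
    "X-Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "X-Large": 16,
}
SIZES = list(WAREHOUSE_NODES)  # dicts keep insertion order, smallest to largest


def scale(size: str, steps: int) -> str:
    """Resize up (steps > 0) or down (steps < 0), staying within the sizes listed here."""
    index = SIZES.index(size) + steps
    index = max(0, min(index, len(SIZES) - 1))
    return SIZES[index]


def credits_per_hour(size: str) -> int:
    # Assumption: one credit per node per hour (X-Small = 1 credit/hour).
    return WAREHOUSE_NODES[size]


new_size = scale("Small", 2)
print(new_size, credits_per_hour(new_size))  # Large 8
```

Because the hourly rate doubles with the node count, a query that parallelises well and finishes in half the time on the next size up costs roughly the same.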

Storage and compute are charged independently, and only for what you use.

For example, if you store terabytes of data but run no queries, you are charged only for storage, not for compute.
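Because the two charges are independent, a bill is simply the sum of a storage line and a compute line. The Python sketch below reproduces the example above; the prices are hypothetical placeholders, not Snowflake's actual rates, which vary by edition, region and cloud provider.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    stored_tb: float        # average terabytes stored during the month
    compute_credits: float  # credits consumed by running warehouses


def monthly_bill(usage: MonthlyUsage, price_per_tb: float, price_per_credit: float) -> dict:
    """Storage and compute are priced separately and simply added together."""
    storage_cost = usage.stored_tb * price_per_tb
    compute_cost = usage.compute_credits * price_per_credit
    return {"storage": storage_cost, "compute": compute_cost,
            "total": storage_cost + compute_cost}


# 5 TB stored and no queries run: the compute line is zero.
# Hypothetical prices, for illustration only.
print(monthly_bill(MonthlyUsage(stored_tb=5, compute_credits=0),
                   price_per_tb=20.0, price_per_credit=3.0))
# {'storage': 100.0, 'compute': 0.0, 'total': 100.0}
```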

Cloud Service layer

  1. Authentication & Authorisation
  2. User & Session Management
  3. Query Compilation, Optimisation & Data caching
  4. Virtual Warehouse Management, Coordination of Data Storage/Updates & Transactions
  5. Metadata Management - Zero Copy Cloning, Time Travel, Data Sharing
  6. Managing the Life Cycle of a Query
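One part of item 3, data caching, can be sketched as a result cache kept by the service layer: a repeated query is answered from the cache as long as the data it reads has not changed, so no warehouse needs to run. This Python model is hypothetical and simplified; Snowflake's actual result-cache rules include further conditions (for example, a retention period) that are not modelled here.

```python
class ToyCloudServices:
    """Hypothetical service layer that keeps a result cache in front of the warehouses."""

    def __init__(self):
        self.table_versions = {}  # table name -> version number, bumped on every write
        self.result_cache = {}    # (query text, table versions) -> cached result
        self.executions = 0       # how many times a warehouse actually ran a query

    def write(self, table):
        self.table_versions[table] = self.table_versions.get(table, 0) + 1

    def run(self, query, tables, execute):
        # The key includes each table's current version, so any write to those
        # tables makes older cache entries unreachable (i.e. invalidates them).
        versions = tuple((t, self.table_versions.get(t, 0)) for t in sorted(tables))
        key = (query, versions)
        if key not in self.result_cache:
            self.executions += 1          # cache miss: a warehouse computes the result
            self.result_cache[key] = execute()
        return self.result_cache[key]


svc = ToyCloudServices()
svc.write("sales")
q = "SELECT SUM(amount) FROM sales"
svc.run(q, ["sales"], lambda: 42)
svc.run(q, ["sales"], lambda: 42)  # identical query, unchanged data: served from cache
svc.write("sales")                 # the table changes
svc.run(q, ["sales"], lambda: 43)  # recomputed
print(svc.executions)              # 2
```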

Features

  1. Unlimited Storage & Compute - Because it runs on cloud infrastructure, Snowflake benefits from the cloud's near-unlimited scalability, elasticity and redundancy, so you can store more data and scale your compute up or down as needed.

  2. Supported by all the major cloud providers (AWS, Azure and GCP)

  3. Data Platform as a Service - There is virtually no software to install, configure or manage. Ongoing maintenance, management, upgrades and tuning are handled by Snowflake.

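  4. Time Travel & Fail-safe -
    As part of its continuous data protection lifecycle, Snowflake lets you access historical data (tables, schemas or databases) at any point within a defined retention period. After that period ends, Fail-safe provides a further 7-day window during which Snowflake itself (not the user) can recover the data.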

  5. Clone or Zero Copy Clone - Cloning creates a copy of a database, schema or table without physically copying the data. The clone is a snapshot of the source object at the time it was cloned; it is writable and independent of the source. Cloning is a single SQL statement, and because a clone initially needs no additional storage (only data changed afterwards is stored separately), many data-copy problems become easy to solve. Cloning is also used to build environments, such as Prod to QA or QA to Dev (or vice versa), without extra storage cost.

  6. Support for semi-structured data such as JSON

  7. On-demand pricing
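Time Travel and zero-copy cloning are both listed under metadata management in the Cloud Service layer. Snowflake's documentation describes its stored micro-partitions as immutable, which is what makes these features cheap: a write records a new version pointing at new and existing partitions instead of changing old ones. The hypothetical Python `ToyTable` below sketches that idea: reading an older version is Time Travel, and copying only the list of partition references is a zero-copy clone. Timestamps, the retention check and all names are illustrative assumptions, and Fail-safe is not modelled.

```python
class ToyTable:
    """Hypothetical table whose versions are tuples of references to immutable partitions."""

    def __init__(self, snapshots):
        # snapshots: list of (timestamp, partitions) pairs, oldest first
        self.snapshots = list(snapshots)

    @classmethod
    def create(cls, ts, partitions):
        return cls([(ts, tuple(partitions))])

    def current(self):
        return self.snapshots[-1][1]

    def insert(self, ts, rows):
        # A write adds a new partition and a new snapshot; old partitions are never changed.
        self.snapshots.append((ts, self.current() + (tuple(rows),)))

    def at(self, ts, now, retention):
        """Time Travel: return the partitions that were current at time ts."""
        if ts < now - retention:
            raise ValueError("ts is outside the retention period")
        versions = [parts for t, parts in self.snapshots if t <= ts]
        if not versions:
            raise ValueError("the table did not exist at ts")
        return versions[-1]

    def clone(self):
        """Zero-copy clone: copy the partition references (metadata), not the data."""
        return ToyTable([self.snapshots[-1]])


def flatten(partitions):
    return [row for part in partitions for row in part]


prod = ToyTable.create(ts=0, partitions=[(1, 2), (3,)])
dev = prod.clone()
dev.insert(ts=5, rows=(99,))   # writing to the clone leaves prod untouched
prod.insert(ts=6, rows=(4,))

print(flatten(prod.current()))                      # [1, 2, 3, 4]
print(flatten(dev.current()))                       # [1, 2, 3, 99]
print(flatten(prod.at(ts=3, now=6, retention=24)))  # [1, 2, 3]
print(prod.current()[0] is dev.current()[0])        # True: partition shared, not copied
```

The last line shows why a clone costs no extra storage at first: both tables point at the same partition objects until one of them writes new data.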

Micro-partitions
