augusto kiniama rosa

Posted on Apr 3, 2024 • Originally published at Medium on Feb 28, 2024

Snowflake Hybrid Tables: All You Need to Know

#hybridtables #data #snowflake #realtimeanalytics

Experience the power of Snowflake’s latest innovation — Hybrid Tables, In Public Preview. Let Me Guide You through My Learnings of the Feature.

Photo by Markus Spiske on Unsplash

The Background

Let me give you some background information first: what is all the fuss about on UNISTORE, aka HYBRID tables? Ten years ago, when Snowflake officially went to the public, they focused on doing a single thing super well. Snowflake built an analytical Online processing (OLAP) type database. Snowflake created this fantastic product that is simple, fast, and flexible. With the separation of Compute vs Storage, they can scale almost unlimitedly and independently. This is what we, customers and partners of Snowflake, appreciate about it.

Snowflake has continuously improved its software every week, similar to any SaaS company. On the other hand, its competitors, such as Oracle, Microsoft, IBM, and Teradata, are typically slow to introduce new features. You can follow my monthly blog posts to keep up with the latest Snowflake features. I have worked in technology companies for 30 years since I was 15, and it’s rare to see such a high rate of innovation from many companies.

Snowflake has released the Hybrid Tables feature into Public Preview, which has changed how things work. They can now perform OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing), which means that a single table type can deliver fast transaction-based performance. As someone with inside knowledge, I have seen how performance has improved since it was first announced in a private preview. As we move into Public Preview, there have been notable improvements in analytical workloads by 50X and transactional workloads by 10X.

Hybrid Tables Overview

Hybrid tables give you fast, high-concurrency point operations with unique constraints, indexing, and integrity checks.

It is a different type of table compared to the Snowflake Traditional Standard tables, and the infrastructure behind it is unique. The secret sauce is the combination of columnar key-value pair databases and caching. Like Iceberg tables, it is just a distinct implementation of standard tables.

As of the Public Preview, you can expect an average of around 20ms for both saves and reads from a HYBRID Table, resulting in a double-digit millisecond latency.

One of the most significant differences between HYBRID and Standard Tables is that HYBRID uses row locking, allowing for much higher concurrency and throughput.

But remember, the same features of the Snowflake Cloud still apply, like caching for repeat read results without using compute capacity.

Unique features of HYBRID Tables:

Primary keys are unique identifiers within a table and enable fast analysis. Hybrid Tables consistently enforce primary keys, while other Snowflake tables provide primary keys but do not enforce them.
Foreign keys create a connection between data in different tables by referring to primary keys.
Referential integrity validates the data before loading to implement foreign key relationships and prevent destructive actions.
Secondary indexes are utilized to speed up the querying of data based on an attribute other than the primary key. This super simple process creates an index on a non-primary vital column.
Hybrid Tables can be queried and joined with any other table in Snowflake.
Row locking instead of table locking for writes
All indexing and maintenance will continue to run behind the scenes without the need to be managed by the customer

For client support of hybrid tables, this are the minimum versions.

Under the hood, Snowflake keeps two copies of the data in two places, one for real-time and writing needs and one for analytical needs. For analytical needs, they are using a large object store instead of the operational row store for real-time needs.

How To Use It

Like any other table, it is easy to launch using the following SQL command.

-- Create hybrid table
CREATE OR REPLACE HYBRID TABLE augusto_hybrid (
  id NUMBER PRIMARY KEY AUTOINCREMENT START 1 INCREMENT 1,
  col1 VARCHAR NOT NULL,
  col2 VARCHAR NOT NULL
);

But let’s look at a case with FOREIGN KEY and Secondary Indexes

CREATE OR REPLACE HYBRID TABLE Orders (
    Orderkey number(38,0) PRIMARY KEY,
    Customerkey number(38,0),
    Orderstatus varchar(20),
    Totalprice number(38,0),
    Orderdate timestamp_ntz,
    Clerk varchar(50),
CONSTRAINT fk_o_customerkey FOREIGN KEY (Customerkey) REFERENCES Customers(Customerkey),
INDEX index_o_orderdate (Orderdate)); -- secondary index to accelerate time-based lookups

Let’s run some testing on the above table

-- Insert data into hybrid table
INSERT INTO augusto_hybrid VALUES(1, 'A1', 'B1');
INSERT INTO augusto_hybrid VALUES(2, 'A2', 'B2');
INSERT INTO augusto_hybrid VALUES(3, 'A3', 'B3');
INSERT INTO augusto_hybrid VALUES(4, 'A4', 'B4');

-- Update data in hybrid table
UPDATE augusto_hybrid SET col2 = 'B3-updated' WHERE id = 3;

-- Delete data from hybrid table
DELETE FROM augusto_hybrid WHERE id = 4;

-- Select data from hybrid table
SELECT * FROM augusto_hybrid;

Limits and Features Not Available Yet

Snowflake implements a few virtual limits that I hear can be bumped up. For example, there is a limit of 1,000 TPS, which your Sales Engineering team can bump up. Remember, during Public Preview, all support still goes through Engineering, not the Support Team.

Note that analytical queries inside HYBRID Tables are around 2 to 3X slower than Snowflake Standard Analytical queries.

Features Not Yet Supported:

Cloning (For HYBRID tables, cloning will require a copy or restore from backup instead of a Metadata operation that is used for Standard Tables)
Clustering Keys
Collations
Data Retention Period
Data sharing
Dynamic Tables
Fail-safe
Materialized Views
Periodic rekeying
Query Acceleration Service
Replication
Search Optimization Service
Snowpipe
Streams
Tri-secret secure encryption
Time Travel
UNDROP

Initial Limits:

Only available at AWS West Oregon, AWS EU Ireland, and AWS Asia Pacific (Sydney)
100 GB limit per account, this can be increased and I am told it can go as high as 10 TB.
1000 TPS (transactions per second), this can be increased at the customer’s request

Costs

The biggest changes here are the introduction to two more billing types: Hybrid Table Storage and Hybrid Table Requests, with Compute costs remaining precisely as before.

Hybrid table storage costs are based on a flat monthly rate per gigabyte of stored data.

Please note that requesting hybrid tables will require additional credits as it utilizes serverless resources on the underlying row storage clusters. The amount of serverless resources you consume is calculated based on the data you read from or write to the storage clusters.

HYBRID tables are premium storage and requests, so I say move data to Standard tables as soon as possible. This discourages using HYBRID tables as the analytical engine unless you need to.

In great news, you are not paying twice for the data being stored twice for HYBRID tables. Snowflake is taking care of that for us.

To keep an eye on the storage resources, you can utilize the following views:

STORAGE_USAGE View (see the HYBRID_TABLE_STORAGE_BYTES column).
DATABASE_STORAGE_USAGE_HISTORY View (see the AVERAGE_HYBRID_TABLE_STORAGE_BYTES column).
HYBRID_TABLES View (data is on a specific hybrid table; see the BYTES column).

Best Use-Cases

Like any other real-time database, from SQL Server to Postgres databases, you can use HYBRID tables for most use-cases outside of performance needs that HYBRID Tables do not fit and other features not yet included, but Snowflake does provide guidance on this with the following

Application state
Data serving
Transactional applications

One clear idea that fit in between application state and data serving is creating your own Master Data Management application to serve as Reference of our key data using Hybrid tables, UDFs and Streamlit all inside Snowflake available to internal data warehouse use or external applications.

I plan to create a demo application at some point to demonstrate some of these concepts.

Conclusion

HYBRID tables open the Snowflake platform to a lot of other workloads and start to become a much more exciting platform that can cover your technology needs from end to end. Even though it is a Public Preview, it already provides a good enough performance for various use cases and potentially most internal or external application needs without managing your infrastructure.

I’m Augusto Rosa, Snowflake Data Super Hero, and Snowflake SME. I’m the VP of Engineering for Infostrux Solutions. Thanks for reading my blog post. You can follow me on LinkedIn or my Blog Posts.

Subscribe to Infostrux Medium Blogs https://medium.com/infostrux-solutions for the most interesting Data Engineering and Snowflake news.

DEV Community