SingleStore
SingleStore DB is a distributed, relational database that handles both transactions and real-time analytics at scale. Querying is done through standard SQL drivers and syntax, leveraging a broad ecosystem of tools and applications. Read the links below to get familiar with SingleStore DB:
How SingleStore DB Works
SingleStore DB is a distributed, relational database that handles both transactions and real-time analytics at scale. It is accessible through standard SQL drivers and supports ANSI SQL syntax including joins, filters, and analytical capabilities (e.g. aggregates, group by, and windowing functions).
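For instance, a single ANSI SQL query can combine aggregation with a window function. A small sketch, using the play_game table created later in this post:

-- Rank games by number of plays; RANK() runs over the grouped result:
SELECT gameName,
       COUNT(*) AS plays,
       RANK() OVER (ORDER BY COUNT(*) DESC) AS popularity_rank
FROM play_game
GROUP BY gameName;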
Data Ingestion
SingleStore DB can load data continuously or in bulk from a variety of sources. Popular loading sources include files, Kafka clusters, cloud stores such as Amazon S3, HDFS, and other databases. As a distributed system, SingleStore DB ingests data streams in parallel to maximize throughput.
SingleStore Pipelines is an easy-to-use built-in capability that extracts, transforms, and loads external data using sources such as Kafka, S3, Azure Blob, and filesystems.
Deploy
SingleStore DB can be deployed on bare metal, on virtual machines, or in the cloud by using SingleStore Tools.
- Managed Service - the scalable cloud database for data-intensive applications, deployed on AWS, Azure, or GCP
- DB Software - manually deploy and manage a SingleStore database cluster on your own hardware
DB Creation
DROP DATABASE IF EXISTS play_game;
CREATE DATABASE play_game;
USE play_game;
CREATE TABLE play_game (
    msgId INT,
    msgDateTime DATETIME NOT NULL,
    gameName VARCHAR(60),
    timeSince DATETIME(6) SERIES TIMESTAMP, -- time column used by SORT KEY, the pipeline, and TIME_BUCKET below
    SORT KEY (timeSince),
    SHARD KEY (msgId)
);
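A quick sanity check after creating the table (the values here are made up for illustration):

-- Insert a sample row and read it back:
INSERT INTO play_game VALUES (1, NOW(), 'chess', NOW(6));
SELECT * FROM play_game;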
Pipelines
SingleStore Pipelines (formerly MemSQL Pipelines) is a built-in database feature that natively ingests real-time data from external sources. As a built-in component of the database, Pipelines can extract, transform, and load external data without the need for third-party tools or middleware. Pipelines is robust, scalable, highly performant, and supports fully distributed workloads.
- Alter Pipeline
- Create Pipeline
- Create Pipeline with Transform
- Create Pipeline into Procedure
- Extract Pipeline into Outfile
Create Pipeline
CREATE PIPELINE play_game
AS LOAD DATA KAFKA '127.0.0.1:9092/play-games-1'
SKIP DUPLICATE KEY ERRORS
INTO TABLE play_game
FORMAT JSON (
    msgId <- msgId,
    msgDateTime <- msgDateTime,
    gameName <- gameName,
    timeSince <- timeSince
);

START PIPELINE play_game;
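Once the pipeline is running, you can check its state and any ingest errors through information_schema (assuming a SingleStore version that exposes the PIPELINES and PIPELINES_ERRORS views):

-- Check the pipeline state:
SELECT PIPELINE_NAME, STATE
FROM information_schema.PIPELINES
WHERE PIPELINE_NAME = 'play_game';

-- Look for ingest errors:
SELECT *
FROM information_schema.PIPELINES_ERRORS
WHERE PIPELINE_NAME = 'play_game';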
Basic Commands
Action | Syntax |
---|---|
Delete Pipeline | DROP PIPELINE play_game; |
Delete Database | DROP DATABASE IF EXISTS play_game; |
Delete Table | DROP TABLE IF EXISTS play_game; |
Stop Pipeline | STOP PIPELINE play_game; |
Test Pipeline | TEST PIPELINE play_game; |
Extract Pipeline | EXTRACT PIPELINE play_game INTO OUTFILE 'file_name.json'; |
Load Data
The easiest way to load data is to first upload it to Amazon S3 or Azure Blob Storage. Then, use SingleStore Pipelines to extract, transform, and load it.
- Load Data from Amazon Web Services (AWS) (see the S3 pipeline sketch after this list)
- Load Data from Microsoft Azure
- Load Data from the Filesystem using a Pipeline
- Load Data from Kafka
- Load Data from MySQL, etc.
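An S3 load looks much like the Kafka pipeline above. A sketch only: the bucket path, region, and credentials below are placeholders:

-- Hypothetical S3 pipeline; bucket, region, and keys are placeholders:
CREATE PIPELINE play_game_s3
AS LOAD DATA S3 'my-bucket/play-games/'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "YOUR_KEY_ID", "aws_secret_access_key": "YOUR_SECRET"}'
INTO TABLE play_game
FORMAT JSON (
    msgId <- msgId,
    msgDateTime <- msgDateTime,
    gameName <- gameName,
    timeSince <- timeSince
);

START PIPELINE play_game_s3;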
Load Data from Kafka
bin/kafka-topics.sh --create --topic play-games-1 --partitions 1 --bootstrap-server localhost:9092 --replication-factor 1
bin/kafka-topics.sh --describe --topic play-games-1 --bootstrap-server localhost:9092
bin/kafka-console-producer.sh --topic play-games-1 --bootstrap-server localhost:9092
Insert data into Kafka using JSON
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic play-games-1 < PlayActivity.json
bin/kafka-console-consumer.sh --topic play-games-1 --from-beginning --bootstrap-server localhost:9092
List Topic
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Delete Topic
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic DummyTopic
SingleStore Stored Procedure Example
CREATE DATABASE IF NOT EXISTS quickstart_kafka;
USE quickstart_kafka;
CREATE TABLE test2 (id INT, fname VARCHAR(50), lname VARCHAR(50), addr VARCHAR(50));

DELIMITER //
-- The procedure receives each batch of pipeline rows as a query parameter:
CREATE OR REPLACE PROCEDURE process_users(GENERIC_BATCH query(GENERIC_JSON json)) AS
BEGIN
    INSERT INTO test2(id, fname, lname, addr)
    SELECT GENERIC_JSON::id, GENERIC_JSON::fname, GENERIC_JSON::lname, GENERIC_JSON::addr
    FROM GENERIC_BATCH;
END //
DELIMITER ;

CREATE OR REPLACE PIPELINE jsonproce1 AS LOAD DATA KAFKA '127.0.0.1:9092/test2'
INTO PROCEDURE process_users (GENERIC_JSON <- %) FORMAT JSON;

TEST PIPELINE jsonproce1;
START PIPELINE jsonproce1;
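After starting the pipeline (and producing JSON messages to the test2 topic), verify that the procedure is inserting rows:

-- Quick verification queries:
SELECT COUNT(*) FROM test2;
SELECT * FROM test2 LIMIT 5;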
Manage Data
- Local and Unlimited Database Storage Concepts
- Unlimited Data Storage
- Backing Up and Restoring Data (see the BACKUP/RESTORE sketch after this list)
- Exporting a particular database: mysqldump -h 127.0.0.1 -u root -p -P 3306 foo > foo.sql
- Exporting all databases: mysqldump -h 127.0.0.1 -u root -p -P 3306 --all-databases > full_backup.sql
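SingleStore also has native BACKUP and RESTORE commands. A minimal sketch; the backup path is a placeholder:

-- Back up the database to a local path (placeholder path):
BACKUP DATABASE play_game TO '/var/backups/play_game';
-- Restore it later (the database must not already exist):
RESTORE DATABASE play_game FROM '/var/backups/play_game';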
Additional Concepts:
Open MemSQL Studio
https://aws.amazon.com/quickstart/architecture/memsql/
https://aws-quickstart.s3.amazonaws.com/quickstart-memsql/doc/memsql-on-the-aws-cloud.pdf
--> SQL Editor
--> Master Aggregator Node (1 master) - aggregators hold and share the cluster metadata
--> Child Aggregator Node (2 nodes)
--> Leaf Node (4 nodes) - leaf nodes store the data
Every cluster has at least one master aggregator and one leaf node.
Cluster
A cluster encompasses all of the nodes that are included in a complete SingleStore DB installation. A cluster contains aggregator nodes and leaf nodes.
Connect to your Cluster
You have three ways to connect to your SingleStore DB cluster: SingleStore DB Studio, the singlestore client application, or through any compatible third-party applications.
Node
A node is a server that has an installation of a SingleStore DB instance.
Leaf
A leaf is a node that stores a subset of a cluster’s data. A cluster typically contains many leaves.
Partition
A partition contains a subset (a shard) of a database’s data. A leaf contains multiple partitions. When you run CREATE DATABASE, SingleStore DB splits the database into partitions, which are distributed evenly among the available leaves. You can specify the number of partitions with the PARTITIONS option of CREATE DATABASE; if you don’t specify it explicitly, the default is used (the number of leaves times the value of the default_partitions_per_leaf engine variable).
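For example, to create a database with an explicit partition count and inspect how its partitions are spread across the leaves:

-- Create a database with 16 partitions, then list them:
CREATE DATABASE demo PARTITIONS 16;
SHOW PARTITIONS ON demo;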
Aggregator
An aggregator is a node that routes queries to the leaves, aggregates the intermediate results, and sends the results back to the client.
There are two types of aggregators: master and child. A cluster contains exactly one master aggregator, a specialized aggregator responsible for cluster monitoring and failover.
A cluster may also contain zero or more child aggregators (depending on query volume).
Time Series Functions:
For storing and manipulating time series data, SingleStore supports the following functions:
FIRST
LAST
TIME_BUCKET
FIRST
An aggregate function that returns the first value of a set of input values, defined as the value associated with the minimum time.
LAST
An aggregate function that returns the last value of a set of input values, defined as the value associated with the maximum time.
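A sketch against the play_game table from earlier, bucketing rows by day and picking the first and last game played in each bucket (FIRST and LAST take a value expression and a time column):

-- Earliest and latest game played per day, using timeSince as the time column:
SELECT TIME_BUCKET('1d', timeSince) AS day,
       FIRST(gameName, timeSince) AS first_game,
       LAST(gameName, timeSince) AS last_game
FROM play_game
GROUP BY 1
ORDER BY 1;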
TIME_BUCKET normalizes timestamps into fixed-size buckets, which makes it easy to group rows into time windows. Some examples against the play_game table:

SELECT TIME_BUCKET('4d'), gameName
FROM play_game
ORDER BY 2, 1;

SELECT TIME_BUCKET('1d'), gameName
FROM play_game
WHERE timeSince BETWEEN '2021-10-06T00:00:10.530+00:00' AND '2021-10-07T23:00:10.530+00:00'
ORDER BY 2, 1;

SELECT TIME_BUCKET('6h', timeSince) AS timeSince, gameName
FROM play_game
WHERE timeSince > NOW() - INTERVAL 1 DAY
ORDER BY 2, 1;
Additional comments
Restart Studio
- sudo systemctl restart singlestoredb-studio
Installation Steps
sudo apt-get update
http://localhost:9000/cluster
http://localhost:8080/cluster/localhost/dashboard