SingleStore
SingleStore DB is a distributed, relational database that handles both transactions and real-time analytics at scale. Querying is done through standard SQL drivers and syntax, leveraging a broad ecosystem of tools and applications. Read the links below to get familiar with SingleStore DB:
How SingleStore DB Works
SingleStore DB is a distributed, relational database that handles both transactions and real-time analytics at scale. It is accessible through standard SQL drivers and supports ANSI SQL syntax including joins, filters, and analytical capabilities (e.g. aggregates, group by, and windowing functions).
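For instance, a single ANSI SQL query can combine aggregation with a window function. A small sketch, using the play_game table created later in this post:

-- Rank games by number of plays; RANK() runs over the grouped result:
SELECT gameName,
       COUNT(*) AS plays,
       RANK() OVER (ORDER BY COUNT(*) DESC) AS popularity_rank
FROM play_game
GROUP BY gameName;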
Data Ingestion
SingleStore DB can load data continuously or in bulk from a variety of sources. Popular loading sources include files, Kafka clusters, cloud stores such as Amazon S3, HDFS, and other databases. As a distributed system, SingleStore DB ingests data streams in parallel to maximize throughput.
SingleStore Pipelines is an easy-to-use built-in capability that extracts, transforms, and loads external data using sources such as Kafka, S3, Azure Blob, and filesystems.
Deploy
SingleStore DB can be deployed on bare metal, on virtual machines, or in the cloud by using SingleStore Tools.
- Managed Service - the scalable cloud database for data-intensive applications, deployed on AWS, Azure, or GCP
- DB Software - manually deploy and manage a SingleStore database cluster on your own hardware
DB Creation
DROP DATABASE IF EXISTS play_game;
CREATE DATABASE play_game;
USE play_game;
CREATE TABLE play_game (
    msgId INT,
    msgDateTime DATETIME NOT NULL,
    gameName VARCHAR(60),
    timeSince DATETIME(6) SERIES TIMESTAMP, -- time column used by SORT KEY, the pipeline, and TIME_BUCKET below
    SORT KEY (timeSince),
    SHARD KEY (msgId)
);
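A quick sanity check after creating the table (the values here are made up for illustration):

-- Insert a sample row and read it back:
INSERT INTO play_game VALUES (1, NOW(), 'chess', NOW(6));
SELECT * FROM play_game;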
Pipelines
SingleStore Pipelines (formerly MemSQL Pipelines) is a built-in database feature that natively ingests real-time data from external sources. As a built-in component of the database, Pipelines can extract, transform, and load external data without the need for third-party tools or middleware. Pipelines is robust, scalable, highly performant, and supports fully distributed workloads.
- Alter Pipeline
- Create Pipeline
- Create Pipeline with Transform
- Create Pipeline into Procedure
- Extract Pipeline into Outfile
Create Pipeline
CREATE PIPELINE play_game
AS LOAD DATA KAFKA '127.0.0.1:9092/play-games-1'
SKIP DUPLICATE KEY ERRORS
INTO TABLE play_game
FORMAT JSON (
    msgId <- msgId,
    msgDateTime <- msgDateTime,
    gameName <- gameName,
    timeSince <- timeSince
);

START PIPELINE play_game;
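Once the pipeline is running, you can check its state and any ingest errors through information_schema (assuming a SingleStore version that exposes the PIPELINES and PIPELINES_ERRORS views):

-- Check the pipeline state:
SELECT PIPELINE_NAME, STATE
FROM information_schema.PIPELINES
WHERE PIPELINE_NAME = 'play_game';

-- Look for ingest errors:
SELECT *
FROM information_schema.PIPELINES_ERRORS
WHERE PIPELINE_NAME = 'play_game';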
Basic Commands
Action | Syntax |
---|---|
Delete Pipeline | DROP PIPELINE play_game; |
Delete Database | DROP DATABASE IF EXISTS play_game; |
Delete Table | DROP TABLE IF EXISTS play_game; |
Stop Pipeline | STOP PIPELINE play_game; |
Test Pipeline | TEST PIPELINE play_game; |
Extract Pipeline | EXTRACT PIPELINE play_game INTO OUTFILE 'file_name.json'; |
Load Data
The easiest way to load data is to first upload it to Amazon S3 or Azure Blob Storage. Then, use SingleStore Pipelines to extract, transform, and load it.
- Load Data from Amazon Web Services (AWS) (see the S3 pipeline sketch after this list)
- Load Data from Microsoft Azure
- Load Data from the Filesystem using a Pipeline
- Load Data from Kafka
- Load Data from MySQL, etc.
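An S3 load looks much like the Kafka pipeline above. A sketch only: the bucket path, region, and credentials below are placeholders:

-- Hypothetical S3 pipeline; bucket, region, and keys are placeholders:
CREATE PIPELINE play_game_s3
AS LOAD DATA S3 'my-bucket/play-games/'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "YOUR_KEY_ID", "aws_secret_access_key": "YOUR_SECRET"}'
INTO TABLE play_game
FORMAT JSON (
    msgId <- msgId,
    msgDateTime <- msgDateTime,
    gameName <- gameName,
    timeSince <- timeSince
);

START PIPELINE play_game_s3;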
Load Data from Kafka
bin/kafka-topics.sh --create --topic play-games-1 --partitions 1 --bootstrap-server localhost:9092 --replication-factor 1
bin/kafka-topics.sh --describe --topic play-games-1 --bootstrap-server localhost:9092
bin/kafka-console-producer.sh --topic play-games-1 --bootstrap-server localhost:9092
Insert data into Kafka using JSON
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic play-games-1 < PlayActivity.json
bin/kafka-console-consumer.sh --topic play-games-1 --from-beginning --bootstrap-server localhost:9092
List Topic
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Delete Topic
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic DummyTopic
SingleStore Stored Procedure Example
CREATE DATABASE IF NOT EXISTS quickstart_kafka;
USE quickstart_kafka;
CREATE TABLE test2 (id INT, fname VARCHAR(50), lname VARCHAR(50), addr VARCHAR(50));

DELIMITER //
-- The procedure receives each batch of pipeline rows as a query parameter:
CREATE OR REPLACE PROCEDURE process_users(GENERIC_BATCH query(GENERIC_JSON json)) AS
BEGIN
    INSERT INTO test2(id, fname, lname, addr)
    SELECT GENERIC_JSON::id, GENERIC_JSON::fname, GENERIC_JSON::lname, GENERIC_JSON::addr
    FROM GENERIC_BATCH;
END //
DELIMITER ;

CREATE OR REPLACE PIPELINE jsonproce1 AS LOAD DATA KAFKA '127.0.0.1:9092/test2'
INTO PROCEDURE process_users (GENERIC_JSON <- %) FORMAT JSON;

TEST PIPELINE jsonproce1;
START PIPELINE jsonproce1;
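After starting the pipeline (and producing JSON messages to the test2 topic), verify that the procedure is inserting rows:

-- Quick verification queries:
SELECT COUNT(*) FROM test2;
SELECT * FROM test2 LIMIT 5;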
Manage Data
- Local and Unlimited Database Storage Concepts
- Unlimited Data Storage
- Backing Up and Restoring Data (see the BACKUP/RESTORE sketch after this list)
- Exporting a particular database: mysqldump -h 127.0.0.1 -u root -p -P 3306 foo > foo.sql
- Exporting all databases: mysqldump -h 127.0.0.1 -u root -p -P 3306 --all-databases > full_backup.sql
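SingleStore also has native BACKUP and RESTORE commands. A minimal sketch; the backup path is a placeholder:

-- Back up the database to a local path (placeholder path):
BACKUP DATABASE play_game TO '/var/backups/play_game';
-- Restore it later (the database must not already exist):
RESTORE DATABASE play_game FROM '/var/backups/play_game';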
Additional Concepts:
Open MemSQL Studio
https://aws.amazon.com/quickstart/architecture/memsql/
https://aws-quickstart.s3.amazonaws.com/quickstart-memsql/doc/memsql-on-the-aws-cloud.pdf
--> SQL Editor
--> Master Aggregator Node (1 master) - aggregators hold and share the cluster metadata
--> Child Aggregator Node (2 nodes)
--> Leaf Node (4 nodes) - leaf nodes store the data
Every cluster has at least one master aggregator and one leaf node.
Cluster
A cluster encompasses all of the nodes that are included in a complete SingleStore DB installation. A cluster contains aggregator nodes and leaf nodes.
Connect to your Cluster
You have three ways to connect to your SingleStore DB cluster: SingleStore DB Studio, the singlestore client application, or through any compatible third-party applications.
Node
A node is a server that has an installation of a SingleStore DB instance.
Leaf
A leaf is a node that stores a subset of a cluster’s data. A cluster typically contains many leaves.
Partition
A partition contains a subset (a shard) of a database’s data. A leaf contains multiple partitions. When you run CREATE DATABASE, SingleStore DB splits the database into partitions, which are distributed evenly among the available leaves. You can specify the number of partitions with the PARTITIONS option of CREATE DATABASE; if you don’t specify it explicitly, the default is used (the number of leaves times the value of the default_partitions_per_leaf engine variable).
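For example, to create a database with an explicit partition count and inspect how its partitions are spread across the leaves:

-- Create a database with 16 partitions, then list them:
CREATE DATABASE demo PARTITIONS 16;
SHOW PARTITIONS ON demo;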
Aggregator
An aggregator is a node that routes queries to the leaves, aggregates the intermediate results, and sends the results back to the client.
There are two types of aggregators: master and child. A cluster contains exactly one master aggregator, a specialized aggregator responsible for cluster monitoring and failover.
A cluster may also contain zero or more child aggregators (depending on query volume).
Time Series Functions:
For storing and manipulating time series data, SingleStore supports the following functions:
FIRST
LAST
TIME_BUCKET
FIRST
An aggregate function that returns the first value of a set of input values, defined as the value associated with the minimum time.
LAST
An aggregate function that returns the last value of a set of input values, defined as the value associated with the maximum time.
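A sketch against the play_game table from earlier, bucketing rows by day and picking the first and last game played in each bucket (FIRST and LAST take a value expression and a time column):

-- Earliest and latest game played per day, using timeSince as the time column:
SELECT TIME_BUCKET('1d', timeSince) AS day,
       FIRST(gameName, timeSince) AS first_game,
       LAST(gameName, timeSince) AS last_game
FROM play_game
GROUP BY 1
ORDER BY 1;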
TIME_BUCKET normalizes timestamps into fixed-size buckets, which makes it easy to group rows into time windows. Some examples against the play_game table:

SELECT TIME_BUCKET('4d'), gameName
FROM play_game
ORDER BY 2, 1;

SELECT TIME_BUCKET('1d'), gameName
FROM play_game
WHERE timeSince BETWEEN '2021-10-06T00:00:10.530+00:00' AND '2021-10-07T23:00:10.530+00:00'
ORDER BY 2, 1;

SELECT TIME_BUCKET('6h', timeSince) AS timeSince, gameName
FROM play_game
WHERE timeSince > NOW() - INTERVAL 1 DAY
ORDER BY 2, 1;
Additional comments
Restart Studio
- sudo systemctl restart singlestoredb-studio
Installation Steps
sudo apt-get update
http://localhost:9000/cluster
http://localhost:8080/cluster/localhost/dashboard