What if you need to store a variety of data in a decentralized way? Objects, arrays, dates, numbers, strings: practically anything. Do you really need to develop a powerful DBMS for this? Often we just need to store and retrieve data in a distributed, open way, as simply as possible and without any special demands.
In this article, I would like to introduce the metastocle library, which can solve the above problem easily, albeit with some limitations.
A bit of background
About a year ago, I had the desire and the need to create a music storage (see this article for more details). From the very beginning, it was clear that everything should be written so that it could later be reused for other entities: books, videos, and so on. So I decided to split everything into layers that can be used independently.
Metastocle is one of those layers. It allows you to store and retrieve many kinds of data (but not files), as opposed to the storacle layer, which handles files.
When we save files, we need to write their hashes somewhere so that we can access them later. This is exactly what metastocle is for: it stores everything else we need, such as the names of the songs, links to the files, and so on.
As a result, all of this was brought to a fairly universal form, and the system consists of three main entities:
- Collections - an entity for defining the data structure, various options, and so on.
- Documents - the data itself, stored as objects.
- Actions (instructions) - a set of rules for processing the required data: filtering, sorting, limiting, and so on.
Let's look at a couple of examples:
Server:
const Node = require('metastocle').Node;

(async () => {
  try {
    const node = new Node({
      port: 4000,
      hostname: 'localhost'
    });
    // Creating a collection
    await node.addCollection('test', { limit: 10000, pk: 'id' });
    await node.init();
  }
  catch(err) {
    console.error(err.stack);
    process.exit(1);
  }
})();
Client:
const Client = require('metastocle').Client;

(async () => {
  try {
    const client = new Client({
      address: 'localhost:4000'
    });
    await client.init();

    // Adding a document
    const doc = await client.addDocument('test', { text: 'hi' });

    // Updating this document
    await client.updateDocuments('test', { text: 'bye' }, {
      filter: { id: doc.id }
    });

    // Adding another document
    await client.addDocument('test', { id: 2, text: 'new' });

    // Getting the second document
    const results = await client.getDocuments('test', {
      filter: { id: 2 }
    });

    // Getting it differently, by the primary key
    const doc2 = await client.getDocumentByPk('test', 2);

    // Adding more documents
    for(let i = 10; i <= 20; i++) {
      await client.addDocument('test', { id: i, x: i });
    }

    // Getting the documents that meet all the conditions
    const results2 = await client.getDocuments('test', {
      filter: { id: { $gt: 15 } },
      sort: [['x', 'desc']],
      limit: 2,
      offset: 1,
      fields: ['id']
    });

    // Deleting documents with id > 15
    await client.deleteDocuments('test', {
      filter: { id: { $gt: 15 } }
    });
  }
  catch(err) {
    console.error(err.stack);
    process.exit(1);
  }
})();
Clients cannot create collections. The structure is set by the network itself; users only work with documents. Collections can also be described declaratively, via node options:
const node = new Node({
  port: 4000,
  hostname: 'localhost',
  collections: {
    test: { limit: 10000, pk: 'id' }
  }
});
Main collection parameters:
- pk - the primary key field. You can omit it if it is not required. If this field is specified, a uuid is created by default, but you can pass any integer or string instead.
- limit - the maximum number of documents per node
- queue - queue mode: if enabled, then when the limit is reached, some existing documents are deleted to make room for new ones
- limitationOrder - if limit and queue are enabled, you can specify sorting rules that determine which documents to delete. By default, the documents that have not been used for the longest time are deleted.
- schema - document field structure
- defaults - default values for document fields
- hooks - document field hooks
- preferredDuplicates - you can specify the preferred number of duplicate documents in the network
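To make these options concrete, here is a sketch of a node configuration combining several of them. This is illustrative only: the specific values are assumptions, and the limitationOrder value is assumed to use the same [field, direction] format as the sort instruction shown later.

```js
const node = new Node({
  port: 4000,
  hostname: 'localhost',
  collections: {
    test: {
      pk: 'id',
      limit: 5000,
      // Queue mode: old documents are removed when the limit is reached
      queue: true,
      // Assumption: same [field, direction] format as the sort instruction;
      // "date" is the field from the defaults example below
      limitationOrder: [['date', 'asc']],
      // Preferred number of duplicates of each document in the network
      preferredDuplicates: 2
    }
  }
});
```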
The structure of the collection fields (schema) can be described as:
{
  type: 'object',
  props: {
    count: 'number',
    title: 'string',
    description: { type: 'string' },
    priority: {
      type: 'number',
      value: val => val >= -1 && val <= 1
    },
    goods: {
      type: 'array',
      items: {
        type: 'object',
        props: {
          title: 'string',
          isAble: 'boolean'
        }
      }
    }
  }
}
All rules can be found in the function utils.validateSchema() in https://github.com/ortexx/spreadable/blob/master/src/utils.js
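To illustrate what such a schema means in practice, here is a deliberately simplified validator sketch. It is not the library's implementation (the real rules live in utils.validateSchema() in the spreadable repository); it only shows how the example schema above can be read: string shorthands for types, props for nested objects, items for arrays, and value for custom checks.

```javascript
// Illustrative sketch only, not the library's actual validation code.
function validate(schema, value) {
  // Shorthand like 'number' means { type: 'number' }
  if (typeof schema === 'string') schema = { type: schema };

  if (schema.type === 'array') {
    if (!Array.isArray(value)) return false;
    // "items" describes every element of the array
    return !schema.items || value.every(v => validate(schema.items, v));
  }

  if (schema.type === 'object') {
    if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
    // "props" describes each field of the object
    for (const key of Object.keys(schema.props || {})) {
      if (!validate(schema.props[key], value[key])) return false;
    }
    return true;
  }

  // Primitive types: 'number', 'string', 'boolean'
  if (typeof value !== schema.type) return false;
  // Optional custom check, like the "priority" range rule above
  return !schema.value || !!schema.value(value);
}

const schema = {
  type: 'object',
  props: {
    count: 'number',
    priority: { type: 'number', value: v => v >= -1 && v <= 1 }
  }
};

console.log(validate(schema, { count: 1, priority: 0 }));  // true
console.log(validate(schema, { count: 1, priority: 5 }));  // false: range check fails
```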
Default values and hooks can be defined like this:
{
  defaults: {
    date: Date.now,
    priority: 0,
    'nested.prop': (key, doc) => Date.now() - doc.date
  },
  hooks: {
    priority: (val, key, doc, prevDoc) => prevDoc ? prevDoc.priority + 1 : val
  }
}
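To make the semantics concrete, here is an illustrative sketch (again, not the library's code) of how such defaults and hooks could be applied when a document is saved. Dotted paths like 'nested.prop' are omitted for brevity:

```javascript
// Illustrative sketch: fill in missing fields from "defaults", then run
// "hooks" on every field they are registered for. Not the library's code.
function applyDefaults(doc, defaults) {
  for (const key of Object.keys(defaults)) {
    if (doc[key] === undefined) {
      const def = defaults[key];
      // A default can be a plain value or a function of (key, doc)
      doc[key] = typeof def === 'function' ? def(key, doc) : def;
    }
  }
  return doc;
}

function applyHooks(doc, hooks, prevDoc = null) {
  for (const key of Object.keys(hooks)) {
    // A hook receives the current value, the key, the new document
    // and the previous version of the document (if any)
    doc[key] = hooks[key](doc[key], key, doc, prevDoc);
  }
  return doc;
}

const defaults = { date: Date.now, priority: 0 };
const hooks = {
  priority: (val, key, doc, prevDoc) => prevDoc ? prevDoc.priority + 1 : val
};

const doc = applyDefaults({ text: 'hi' }, defaults);
// doc.priority === 0, doc.date is a timestamp

const updated = applyHooks({ text: 'bye' }, hooks, doc);
// updated.priority === 1: the hook incremented the previous value
```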
Main features of the library:
- Working on the CRUD principle
- Storing all JavaScript data types that can be serialized, including nested ones
- Data can be added to the storage through any node
- Data can be duplicated for greater reliability
- Queries may contain nested filters
Isomorphism
The client is written in JavaScript and is isomorphic, so it can be used directly in the browser.
You can include the file https://github.com/ortexx/metastocle/blob/master/dist/metastocle.client.js as a script and get access to window.ClientMetastocle, or import it via your build system, etc.
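A minimal browser sketch might look like this; the local script path and the collection name are assumptions, while window.ClientMetastocle is the global mentioned above:

```html
<!-- metastocle.client.js is the file from the repository's dist folder -->
<script src="metastocle.client.js"></script>
<script>
  (async () => {
    // The build exposes the client as window.ClientMetastocle
    const client = new ClientMetastocle({ address: 'localhost:4000' });
    await client.init();
    // "test" is the example collection from the server setup above
    const doc = await client.getDocumentByPk('test', 2);
  })();
</script>
```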
Client API
- async Client.prototype.addDocument() - adding a document to the collection
- async Client.prototype.getDocuments() - getting documents from the collection according to some instructions
- async Client.prototype.getDocumentsCount() - getting the number of documents in the collection
- async Client.prototype.getDocumentByPk() - getting a document from a collection using the primary key
- async Client.prototype.updateDocuments() - updating documents in the collection according to some instructions
- async Client.prototype.deleteDocuments() - deleting documents from the collection according to some instructions
Basic actions (instructions)
.filter - data filtering, example:
{
  a: { $lt: 1 },
  $and: [
    { x: 1 },
    { y: { $gt: 2 } },
    {
      $or: [
        { z: 1 },
        { "b.c": 2 }
      ]
    }
  ]
}
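To show how such a filter is read, here is a small sketch that evaluates a filter of this shape against a document. It supports only the operators appearing above ($and, $or, $gt, $lt, equality, and dotted paths like "b.c") and is not the library's actual implementation:

```javascript
// Resolve a dotted path like "b.c" inside a document
function get(doc, path) {
  return path.split('.').reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Illustrative sketch: evaluate a metastocle-style filter against a document.
// Supports only $and, $or, $gt, $lt and plain equality.
function matches(doc, filter) {
  return Object.keys(filter).every(key => {
    const cond = filter[key];
    if (key === '$and') return cond.every(f => matches(doc, f));
    if (key === '$or') return cond.some(f => matches(doc, f));
    const value = get(doc, key);
    if (cond && typeof cond === 'object') {
      if ('$gt' in cond) return value > cond.$gt;
      if ('$lt' in cond) return value < cond.$lt;
    }
    return value === cond;
  });
}

const filter = {
  a: { $lt: 1 },
  $and: [
    { x: 1 },
    { y: { $gt: 2 } },
    { $or: [{ z: 1 }, { 'b.c': 2 }] }
  ]
};

console.log(matches({ a: 0, x: 1, y: 3, b: { c: 2 } }, filter)); // true
console.log(matches({ a: 0, x: 1, y: 1, b: { c: 2 } }, filter)); // false: y fails $gt
```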
.sort - data sorting, example:
{ sort: [['x', 'asc'], ['y.z', 'desc']] }
.limit - the maximum number of documents to return
.offset - the starting position for data selection
.fields - the fields to include in the returned documents
All instructions and possible values are described in more detail in the readme.
Using the command line
The library can be used via the command line. To do this you need to install it globally: npm i -g metastocle --unsafe-perm=true --allow-root. After that, you can run the necessary actions from the project directory.
For example, run metastocle -a getDocumentByPk -o test -p 1 -c ./config.js to get the document with primary key 1 from the "test" collection. All the actions can be found in https://github.com/ortexx/metastocle/blob/master/bin/actions.js
Limitations
- All data is first stored in memory and only written to a file at certain intervals and when the process exits. Therefore, first, you need enough RAM, and second, keep in mind that you cannot run multiple processes working with the same database.
- Sharding at the level of the entire network is not implemented very effectively yet. Priority is given to duplication, because the size of the network is unstable: nodes can connect and disconnect at any time. So if you want to get a large amount of data from the network, keep in mind that it will all be collected over HTTP, without much optimization.
I chose this stack and these restrictions deliberately, because there was neither the goal nor the possibility of creating a full-fledged DBMS.
Although the library is still a bit crude in terms of optimizing data requests, everything works fine if you follow certain rules:
- Narrow the data selection as much as possible, and try to organize everything so that you get documents by their keys, or by other fields with filters that keep the result set to an optimal size.
- If you still need to pull a lot of data, limit each collection on each server based on the optimal size for transferring it over the network. For example, if 10,000 documents in a collection weigh 100 KB in compressed form, then by limiting the collection on each node to this value, we get everything at an acceptable speed.
For any questions, please contact: