DEV Community

Sergey Shandar


BLOCKSET v0.2


I'm pleased to announce that blockset v0.2 has been released. It's the first working version.

What's the blockset?

The blockset application is a command-line program that stores and retrieves data blocks, using a content-dependent tree (CDT) hash function to compute a universal address for each block. The CDT hash function splits data into small, contiguous parts of varying sizes. This allows the algorithm to detect identical parts in different blocks, even when those parts are located at different positions in the files. As a result, storage and network systems based on a CDT hash function should save space and traffic by storing and transferring each duplicate part only once. For example, storing CI build artifacts in such a storage may save significant space.
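To make content-dependent splitting concrete, here is a minimal Rust sketch of generic content-defined chunking. It is not CDT0: the boundary rule (an FNV-1a hash over a sliding window of `WINDOW` bytes, cutting when 4 selected hash bits are zero) and the names `chunk`, `WINDOW` and `MASK` are assumptions chosen for illustration. Because each cut depends only on the few bytes just before it, inserting data at the front of a buffer shifts every part, yet every part after the first cut in the original data is still found unchanged.

```rust
/// Number of bytes that decide whether a cut happens (hypothetical value).
const WINDOW: usize = 4;
/// A cut happens when 4 selected hash bits are zero, roughly every 16 bytes on average.
const MASK: u64 = 0x0F;

/// 64-bit FNV-1a hash of a small window of bytes.
fn fnv1a(window: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in window {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Splits `data` into parts. A cut after position `end` depends only on
/// the `WINDOW` bytes that end there, never on the absolute offset.
fn chunk(data: &[u8]) -> Vec<&[u8]> {
    let mut parts = Vec::new();
    let mut start = 0;
    for end in WINDOW..=data.len() {
        // `end` is exclusive: the window is data[end - WINDOW..end].
        if ((fnv1a(&data[end - WINDOW..end]) >> 32) & MASK) == 0 {
            parts.push(&data[start..end]);
            start = end;
        }
    }
    if start < data.len() {
        parts.push(&data[start..]);
    }
    parts
}

fn main() {
    // Deterministic pseudo-random data (xorshift32), so no crates are needed.
    let mut state: u32 = 0x1234_5678;
    let original: Vec<u8> = (0..2000)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 17;
            state ^= state << 5;
            (state >> 24) as u8
        })
        .collect();
    // The same data with 7 extra bytes inserted at the front.
    let mut shifted = b"PREFIX!".to_vec();
    shifted.extend_from_slice(&original);

    let a = chunk(&original);
    let b = chunk(&shifted);
    let shared = a.iter().filter(|part| b.contains(*part)).count();
    println!("{} parts, {} still found after the insertion", a.len(), shared);
}
```

A fixed-size splitter would lose every match after such an insertion, because all offsets move; making boundaries depend on content is what lets duplicates be found "at different positions of the files".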

CDT function in the blockset

There are many possible CDT hash functions. As a community, we should settle on only a few to make communication and storage more efficient. After multiple attempts, I selected one, which I call CDT0. It uses SHA-224 as its compression function and Crockford's base32 encoding for a printable address (45 characters long) that is suitable for URLs and file names. I would like to publish an RFC for the function when I have more time.
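As a rough illustration of the printable address, the Rust sketch below encodes 28 bytes (the size of a SHA-224 digest, 224 bits) with Crockford's base32 alphabet in lowercase. Each character carries 5 bits, and 224 / 5 = 44.8, which rounds up to 45 characters. The bit order used here (most significant bit first, zero padding at the end) and the function name `encode_base32` are assumptions; blockset may order bits differently, so this sketch is not guaranteed to reproduce real blockset addresses.

```rust
/// Crockford's base32 alphabet (lowercase); it omits i, l, o and u.
const ALPHABET: &[u8; 32] = b"0123456789abcdefghjkmnpqrstvwxyz";

/// Encodes bytes as Crockford base32, most significant bit first (assumed order).
/// The last character is zero-padded on the right when needed.
fn encode_base32(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // bits not yet emitted, right-aligned
    let mut bits: u32 = 0; // how many bits `buffer` currently holds
    for &byte in bytes {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(ALPHABET[((buffer >> bits) & 0x1F) as usize] as char);
        }
        buffer &= (1u32 << bits) - 1; // drop the bits that were already emitted
    }
    if bits > 0 {
        out.push(ALPHABET[((buffer << (5 - bits)) & 0x1F) as usize] as char);
    }
    out
}

fn main() {
    // A placeholder 28-byte digest; a real address would come from CDT0.
    let digest = [0xABu8; 28];
    let address = encode_base32(&digest);
    println!("{address} ({} characters)", address.len());
}
```

Dropping i, l, o and u avoids characters that are easily confused with 1 and 0 (or that spell accidental words), which is part of why this alphabet suits addresses that people copy into URLs and file names.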

The CDT storage

There are different ways to build storage based on the CDT0 function. The blockset stores parts as a set of relatively small files in a cdt0/ folder. Keeping block parts as individual files in the cdt0/ folder has pros and cons, outlined below:

Advantages of the blockset storage

  • A simple file copy command can synchronize multiple storages. Files with matching names in different storages always have identical content, so users never have to decide whether an overwrite is safe.
  • The files can be stored statically on a CDN, and a relatively simple script can download the parts and restore a requested data block. Each blockset file is relatively small (a few kilobytes), so the script can use a simple fetch function. There is no need for fancy P2P network protocols, nodes, or custom servers.
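The first advantage can be sketched in a few lines of Rust. Because a file's name is the address of its content, synchronizing two storages reduces to copying every file the destination lacks; an existing file with the same name is never overwritten because it already holds identical bytes. The flat `cdt0/<address>` layout and the function name `sync_storage` are assumptions for illustration, not blockset's documented layout.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies every part file from `src` into `dst` that `dst` does not have yet
/// and returns how many files were copied. Files with the same name are
/// skipped: in a content-addressed store, the same name implies the same content.
fn sync_storage(src: &Path, dst: &Path) -> io::Result<usize> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        if !entry.file_type()?.is_file() {
            continue; // this sketch assumes a flat folder of part files
        }
        let target = dst.join(entry.file_name());
        if !target.exists() {
            fs::copy(entry.path(), &target)?;
            copied += 1;
        }
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    // Two throwaway storages under the system temp directory.
    let root = std::env::temp_dir().join("blockset_sync_demo");
    let (a, b) = (root.join("a/cdt0"), root.join("b/cdt0"));
    fs::create_dir_all(&a)?;
    fs::create_dir_all(&b)?;
    // Hypothetical part names; real names would be 45-character CDT0 addresses.
    fs::write(a.join("part1"), b"hello")?;
    fs::write(a.join("part2"), b"world")?;
    fs::write(b.join("part2"), b"world")?;

    let copied = sync_storage(&a, &b)?;
    println!("copied {copied} file(s)");
    fs::remove_dir_all(&root)
}
```

The same one-way rule works for uploading parts to a CDN bucket: anything already present can be skipped without comparing contents.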

Disadvantages of the blockset storage

  • As mentioned above, the blockset maintains many small files, and keeping many small files is not space-efficient: file systems usually allocate space in fixed-size blocks and keep metadata for every file.
  • Each blockset file represents only one hash, although the CDT hash function offers a higher resolution. Using that higher resolution could increase the likelihood of detecting identical parts within data blocks.
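To put a number on the first disadvantage: file systems usually allocate space in fixed-size units, so every file occupies a whole number of units, plus per-file metadata that this sketch ignores. Assuming a 4,096-byte allocation unit (common, but not universal), the Rust helper below shows that a 5,000-byte part occupies 8,192 bytes on disk, wasting 3,192 bytes. The part sizes in `main` are hypothetical.

```rust
/// Bytes a file of `size` bytes occupies when space is allocated in
/// `unit`-byte blocks (metadata such as inodes is not counted).
fn allocated(size: u64, unit: u64) -> u64 {
    (size + unit - 1) / unit * unit
}

fn main() {
    const UNIT: u64 = 4096; // assumed allocation unit
    // Hypothetical part sizes in bytes.
    let sizes = [5_000u64, 1_200, 300, 4_096];
    let stored: u64 = sizes.iter().sum();
    let on_disk: u64 = sizes.iter().map(|&s| allocated(s, UNIT)).sum();
    println!(
        "data: {stored} bytes, on disk: {on_disk} bytes, wasted: {} bytes",
        on_disk - stored
    );
}
```

The smaller the parts are relative to the allocation unit, the larger the fraction of wasted space.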

These problems can be solved in several ways. We can use different internal storage formats and synchronize storages using different protocols, as long as they all use the same CDT function.
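One way to read this paragraph in code: if every backend keys parts by the same CDT0 address, synchronization needs only "list addresses", "read" and "write". The Rust sketch below is hypothetical (the `Storage` trait, `MemoryStorage`, `PackStorage` and `sync` are not part of blockset). It pairs a one-entry-per-part store with a pack store that appends all parts to a single buffer, which is one possible answer to the small-files problem.

```rust
use std::collections::HashMap;

/// Minimal operations any storage backend needs, keyed by a part's address.
trait Storage {
    fn addresses(&self) -> Vec<String>;
    fn read(&self, address: &str) -> Option<Vec<u8>>;
    fn write(&mut self, address: &str, data: &[u8]);
}

/// One entry per part, similar in spirit to one file per part in cdt0/.
#[derive(Default)]
struct MemoryStorage {
    parts: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn addresses(&self) -> Vec<String> {
        self.parts.keys().cloned().collect()
    }
    fn read(&self, address: &str) -> Option<Vec<u8>> {
        self.parts.get(address).cloned()
    }
    fn write(&mut self, address: &str, data: &[u8]) {
        self.parts.insert(address.to_string(), data.to_vec());
    }
}

/// All parts appended to one buffer, plus an index of (offset, length).
#[derive(Default)]
struct PackStorage {
    pack: Vec<u8>,
    index: HashMap<String, (usize, usize)>,
}

impl Storage for PackStorage {
    fn addresses(&self) -> Vec<String> {
        self.index.keys().cloned().collect()
    }
    fn read(&self, address: &str) -> Option<Vec<u8>> {
        let &(offset, len) = self.index.get(address)?;
        Some(self.pack[offset..offset + len].to_vec())
    }
    fn write(&mut self, address: &str, data: &[u8]) {
        if self.index.contains_key(address) {
            return; // same address means same content, nothing to do
        }
        self.index.insert(address.to_string(), (self.pack.len(), data.len()));
        self.pack.extend_from_slice(data);
    }
}

/// Copies the parts missing in `dst`; works for any pair of backends.
fn sync(src: &dyn Storage, dst: &mut dyn Storage) -> usize {
    let known = dst.addresses();
    let mut copied = 0;
    for address in src.addresses() {
        if !known.contains(&address) {
            if let Some(data) = src.read(&address) {
                dst.write(&address, &data);
                copied += 1;
            }
        }
    }
    copied
}

fn main() {
    let mut files = MemoryStorage::default();
    // Hypothetical short addresses; real ones are 45-character CDT0 addresses.
    files.write("addr1", b"hello");
    files.write("addr2", b"world");
    let mut pack = PackStorage::default();
    let copied = sync(&files, &mut pack);
    println!("copied {copied} part(s), pack size {} bytes", pack.pack.len());
}
```

Because `sync` only sees the trait, the internal format of each side can change freely; the shared address space is the only contract.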

Installation of blockset

The blockset can be installed on any computer and platform that supports Rust. To install Rust, see this page.

Installing the blockset:

cargo install blockset

Uninstalling the blockset:

cargo uninstall blockset

Commands

Address validation:

blockset validate 3v1d4j94scaseqgcyzr0ha5dxa9rx6ppnfbndck971ack
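A check similar in spirit to `blockset validate` might look like the Rust sketch below. It verifies only the two properties stated in this post: a length of 45 characters and the Crockford base32 alphabet (lowercase here). The function name `is_valid_address` is hypothetical, and the real command may apply stricter rules (for example, on padding bits) or accept other input forms.

```rust
/// Crockford's base32 alphabet in lowercase (no i, l, o or u).
const ALPHABET: &str = "0123456789abcdefghjkmnpqrstvwxyz";
/// 224 bits of SHA-224 output at 5 bits per character, rounded up.
const ADDRESS_LEN: usize = 45;

/// Returns true if `address` has the expected length and uses only
/// characters from the lowercase Crockford base32 alphabet.
fn is_valid_address(address: &str) -> bool {
    address.len() == ADDRESS_LEN && address.chars().all(|c| ALPHABET.contains(c))
}

fn main() {
    for address in [
        "3v1d4j94scaseqgcyzr0ha5dxa9rx6ppnfbndck971ack", // from the example above
        "3v1d4j94scaseqgcyzr0ha5dxa9rx6ppnfbndck971acu", // 'u' is not in the alphabet
        "3v1d4j94",                                      // too short
    ] {
        println!("{address}: {}", is_valid_address(address));
    }
}
```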

Calculate address:

blockset address ./README.md
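Assuming the address is the root of a hash tree built over a file's parts, as the name "content-dependent tree" suggests, the Rust sketch below shows only that general shape, using a plain binary Merkle tree. It is not CDT0, whose tree shape depends on the content and which uses SHA-224. The standard library's `DefaultHasher` stands in for the real compression function purely so the example needs no crates, and the names `leaf_hash`, `node_hash` and `tree_address` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in for a real compression function such as SHA-224 (NOT cryptographic).
fn leaf_hash(part: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0); // domain tag: leaf
    h.write(part);
    h.finish()
}

/// Combines two child hashes into a parent hash.
fn node_hash(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(1); // domain tag: inner node
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Hashes each part, then merges neighbours level by level until one root remains.
fn tree_address(parts: &[&[u8]]) -> u64 {
    let mut level: Vec<u64> = parts.iter().map(|p| leaf_hash(p)).collect();
    if level.is_empty() {
        return leaf_hash(&[]);
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => node_hash(*left, *right),
                [single] => *single, // an odd node is promoted unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    level[0]
}

fn main() {
    let a: [&[u8]; 3] = [b"alpha", b"beta", b"gamma"];
    let b: [&[u8]; 3] = [b"alpha", b"beta", b"delta"];
    // Identical parts give identical roots; changing any part almost certainly changes the root.
    println!("{:016x}", tree_address(&a));
    println!("{:016x}", tree_address(&b));
}
```

A tree also gives every subtree its own hash, so two files that share a run of parts share those intermediate nodes, not just the individual leaves.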

Add a file to the local storage (cdt0/):

blockset add ./LICENSE

Get a file by address:

blockset get ngd7zembwj6f2tsh4gyxrcyx26h221e3f2wdgfbtq87nd ./old.md
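The `add` and `get` commands form a round trip: storing content yields an address, and the address alone is enough to retrieve the content. The in-memory Rust sketch below mimics that flow under explicit assumptions: the `Store` type and its methods are hypothetical, a whole file is stored as a single part, and the standard library's `DefaultHasher` (printed as hex) replaces CDT0, so the addresses it produces are not blockset addresses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Hypothetical in-memory content-addressed store.
#[derive(Default)]
struct Store {
    parts: HashMap<String, Vec<u8>>,
}

impl Store {
    /// Stand-in address function (NOT CDT0 and NOT cryptographic).
    fn address(data: &[u8]) -> String {
        let mut h = DefaultHasher::new();
        h.write(data);
        format!("{:016x}", h.finish())
    }

    /// Like `blockset add`: stores the data and returns its address.
    fn add(&mut self, data: &[u8]) -> String {
        let address = Self::address(data);
        // Adding identical content twice keeps a single copy.
        self.parts.entry(address.clone()).or_insert_with(|| data.to_vec());
        address
    }

    /// Like `blockset get`: looks the data up by address and re-checks it.
    fn get(&self, address: &str) -> Option<&[u8]> {
        let data = self.parts.get(address)?;
        // A content-addressed store can verify what it returns.
        if Self::address(data) == address {
            Some(data.as_slice())
        } else {
            None
        }
    }
}

fn main() {
    let mut store = Store::default();
    let address = store.add(b"example file contents");
    let again = store.add(b"example file contents");
    assert_eq!(address, again); // same content, same address
    println!("{address} -> {:?}", store.get(&address).map(|d| d.len()));
}
```

The re-check in `get` is what makes fetching parts from an untrusted mirror or CDN safe: corrupted or substituted data does not match its address.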

Internals

The blockset is an open-source project licensed under GPL-3. You can find its source code here. The project is written in Rust, and we've made a deliberate choice to minimize the use of macros. This improves readability and reduces hidden control flow, making the code more transparent for developers. Currently, the blockset code has no third-party dependencies. No source file except main.rs uses I/O directly, which allows us to achieve and maintain 100% code coverage.
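Keeping all I/O in `main.rs` follows a common pattern: core logic talks to a small interface, `main.rs` supplies the real file system, and tests supply an in-memory fake, so every branch can be exercised without touching the disk. The Rust sketch below illustrates the pattern only; the `Io` trait, `copy_if_missing`, `RealIo` and `MemIo` are hypothetical and are not taken from the blockset source.

```rust
use std::collections::HashMap;
use std::io;

/// The only way core logic may touch the outside world (hypothetical interface).
trait Io {
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()>;
    fn exists(&self, path: &str) -> bool;
}

/// Core logic: no direct I/O, so tests can cover every branch.
fn copy_if_missing(sys: &mut impl Io, from: &str, to: &str) -> io::Result<bool> {
    if sys.exists(to) {
        return Ok(false);
    }
    let data = sys.read(from)?;
    sys.write(to, &data)?;
    Ok(true)
}

/// Real implementation, meant to be used only from main.rs.
struct RealIo;

impl Io for RealIo {
    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        std::fs::read(path)
    }
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        std::fs::write(path, data)
    }
    fn exists(&self, path: &str) -> bool {
        std::path::Path::new(path).exists()
    }
}

/// In-memory fake used by unit tests.
#[derive(Default)]
struct MemIo {
    files: HashMap<String, Vec<u8>>,
}

impl Io for MemIo {
    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        self.files.insert(path.to_string(), data.to_vec());
        Ok(())
    }
    fn exists(&self, path: &str) -> bool {
        self.files.contains_key(path)
    }
}

fn main() -> io::Result<()> {
    // In a test, the fake replaces the disk entirely.
    let mut fake = MemIo::default();
    fake.write("a.txt", b"data")?;
    println!("copied: {}", copy_if_missing(&mut fake, "a.txt", "b.txt")?);
    // In main.rs, the same logic would run against the real file system.
    let _real = RealIo;
    Ok(())
}
```

Only the thin `RealIo` wrapper stays untested by unit tests, which is why confining it to a single file makes full coverage of the remaining code achievable.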

Don't hesitate to contact me if you would like to know more, would like to build on either CDT0 or blockset, or need another license:
