Introduction to Scramjet Data Processing Platform

#javascript #typescript #bigdata #serverless

New engine for serverless data processing applications

Our Scramjet data apps engine takes its own approach to deploying and running serverless applications, one that differs in several respects from software buses, integration platforms, and FaaS offerings. This article explains that approach. We hope you find it interesting.

“3 in 1” data processing platform

The heart of our solution and its data engine is called “Scramjet Transform Hub”. It’s available as a standalone software package on GitHub and will be the core element of our Scramjet Cloud Platform.

We call our approach a “3 in 1 data processing platform” because it combines three concepts into one solution:

data processing engine
serverless data applications
complete API with dedicated CLI (covering both I/O and management endpoints)

Let’s look at each point separately.

Data processing engine

Scramjet Transform Hub (STH) creates a unified deployment, runtime, management, and execution plan for serverless applications (sequences).

In short, STH allows you to start data processing in 3 simple steps:

Deploy

si sequence send <sequence-package-tar>

Run

si sequence run <sequence-id>

Send data

curl -H "Content-Type: application/octet-stream" --data-binary "@file.txt" <instance-input-endpoint>

You can send simple HTTP requests or files to a sequence, stream data to it, or have it read data from another stream or API.
Note that, unlike typical microservice architectures, there is no expensive step of building a container image, pushing it to a registry, and then pulling it into a container orchestrator to run the microservice. You can go from a directory with code to a sequence processing your data in less than a minute.
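
The "Send data" step can also be done from code. The sketch below mirrors the curl call using the fetch function built into Node.js 18 and later. The helper names buildInputRequest and sendToInstance are hypothetical and not part of any Scramjet package, and the endpoint URL is whatever your own instance exposes.

```javascript
// Minimal sketch: send data to a running instance's input endpoint (Node.js 18+).
// Helper names are illustrative only; the URL is supplied by the caller.

// Build fetch options equivalent to the curl flags used in the "Send data" step.
function buildInputRequest(data) {
  return {
    method: "POST", // curl also switches to POST when --data-binary is given
    headers: { "Content-Type": "application/octet-stream" },
    body: data, // string or Buffer, sent unchanged
  };
}

// Post the payload and fail loudly on a non-2xx response.
async function sendToInstance(inputEndpoint, data) {
  const response = await fetch(inputEndpoint, buildInputRequest(data));
  if (!response.ok) {
    throw new Error(`Sending data failed with HTTP ${response.status}`);
  }
  return response.status;
}
```

Calling sendToInstance with the instance input endpoint and the contents of file.txt (for example, read with fs.readFileSync) should then behave like the curl command.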

We do package our apps, but their size is measured in kilobytes, not in hundreds of megabytes as with container images. A minimal app design gives better performance, optimized resource usage, and a simpler CI/CD process.

We have prepared a short, 3-minute demo on our YouTube channel that shows the whole process of preparing, deploying, and running an application (sequence). Check it below:

Serverless data applications

We call user applications sequences. They can perform continuous data and stream processing, with no run-time limits or input data size limits.

Each sequence has a straightforward structure — it’s a directory with at least two core files:

package.json — simple JSON file describing sequence metadata
index.[js/ts] — JavaScript or TypeScript file with sequence code. You are free to structure your app in multiple files if you like.

Below is the content of one of our sample “hello world” sequences, which yields integers.

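const { DataStream } = require("scramjet");

// Sequence entry point: receives the input stream plus optional start/end arguments.
module.exports = async function (stream, start = 0, end = 1000) {
  await DataStream.from(async function* () {
    let i = +start || 0; // arguments may arrive as strings, so coerce to a number
    // Emits start+1 through end, one value per second.
    while (i++ < end) {
      await new Promise((res) => setTimeout(res, 1000));
      yield { x: i };
    }
  })
    .do(console.log) // log every item
    .run(); // consume the stream until the generator finishes
};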

You can find an intro readme and more samples in our dedicated repository, scramjetorg/scramjet-cloud-docs.

API & CLI

Let’s look at the Transform Hub API through the commands available in our CLI:

pack [options] — package directory with sequence code into tar.gz file
host [command] — monitor and check the version of the host
config|c [command] — display and manage config
sequence|seq [command] — pack, deploy, manage and monitor sequences (app templates)
instance|inst [command] — manage and monitor instances (running apps)

The above commands (and related API) cover complete management of the data processing engine and serverless apps running on top of it.
Once started, each running instance exposes the following API endpoints:

input, output
stdin, stdout, stderr
log, monitoring
_event (to instance), event (from instance)
stop, kill

This design is “batteries included”, and every running instance is handled in the same way.
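
Because every instance exposes the same set of endpoints, client code can address them uniformly. The helper below is a sketch only: the path layout (API base, then "instance", the instance ID, and the endpoint name) is an assumption to verify against your hub's API documentation, and instanceEndpoint is a hypothetical name.

```javascript
// Sketch: build the URL of a per-instance endpoint.
// Assumption: endpoints live under <apiBase>/instance/<instanceId>/<name>.

// Endpoint names as listed in this article.
const INSTANCE_ENDPOINTS = [
  "input", "output",
  "stdin", "stdout", "stderr",
  "log", "monitoring",
  "_event", "event",
  "stop", "kill",
];

function instanceEndpoint(apiBase, instanceId, name) {
  if (!INSTANCE_ENDPOINTS.includes(name)) {
    throw new Error(`Unknown instance endpoint: ${name}`);
  }
  const base = apiBase.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/instance/${encodeURIComponent(instanceId)}/${name}`;
}
```

Rejecting unknown names early keeps typos from turning into confusing HTTP errors later.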

Why our Scramjet Cloud Platform

Our approach shown above has several benefits:

Freedom and flexibility: no artificial “execution time limit” or “payload size limit” on your apps.
Great price for value: effective data workflows with fully programmable data acquisition and the ability to create patterns between instances performing various data processing tasks.
Performance by design: data is processed immediately, without proxies, queues, or gateways. Light apps with minimal resource consumption.
Works cross-native (Edge & Cloud): spans locations out of the box. Run the same type of apps on edge or smart devices via a standalone Scramjet Transform Hub and in our Scramjet Cloud Platform.

To summarize, the diagram below shows various patterns of chaining data processing on our platform: