Yorkie Liu
Faster Pipcook 1.2, machine learning in JavaScript

Two months later, Pipcook has reached its second stable release (v1.2). Let's take a look at the improvements and enhancements in this version.

Features at a glance

Over the past two months, the development team has made targeted optimizations to service startup, plug-in installation, and pipeline execution time, especially the time it takes a pipeline to start training, which is what internal users run most often. Previously it took more than 5 minutes before model training actually began; with these optimizations, a pipeline now starts training in about 10 seconds.

Train the model faster

In v1.0, each pipeline is divided into stages, such as DataCollect for collecting datasets, ModelDefine for defining models, or DatasetProcess for processing datasets. In the last stable version, training a simple component (image) classification task took nearly 2 minutes just to process the data (and the time grows linearly with the size of the dataset).

There are two reasons for this:

  • In the v1.0 pipeline definition, a stage could not begin until the previous stage had finished processing all of its data, even though stages such as data collection and data processing spend much of their time waiting on I/O while the CPU sits idle.
  • In the v1.0 pipeline definition, the data plug-ins (DataCollect, DataAccess, DataProcess) passed data to each other as file paths, which not only caused a large number of repeated disk reads and writes within a single pipeline run, but also made it impossible to start computations such as normalization early.

So PR#410 introduced an asynchronous pipeline mechanism that uses Sample as the unit of data transfer between plug-ins. The advantages of this are:

  • As soon as the first sample is produced by a plug-in, the subsequent plug-ins can start working. This removes the need for downstream plug-ins to wait until all data has been processed and greatly advances the training start time.
  • Unnecessary and repetitive reads and writes are reduced. Samples are passed between plug-ins in memory, and processed values stay in memory for later plug-ins to use.

With the help of the asynchronous pipeline, we reduced the time before training begins from 1 minute 15 seconds to 11 seconds, and also shortened the overall training time.
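
To illustrate the streaming behavior, here is a minimal TypeScript sketch of the idea. The Sample shape and plug-in signatures below are illustrative only, not the actual Pipcook plug-in API: as soon as the collector yields its first sample, the processing and training steps can start consuming it.

// A minimal sketch of the streaming idea behind the asynchronous pipeline.
// The Sample shape and plug-in signatures are illustrative, not the actual
// Pipcook plug-in API.
interface Sample {
  data: Float32Array; // e.g. decoded image pixels
  label: string;
}

// "DataCollect": yields samples one by one as they become available,
// instead of materializing the whole dataset on disk first.
async function* collect(urls: string[]): AsyncGenerator<Sample> {
  for (const url of urls) {
    const data = await downloadAndDecode(url); // I/O-bound work
    yield { data, label: url.includes('button') ? 'button' : 'other' };
  }
}

// "DataProcess": normalization starts as soon as the first sample arrives.
async function* process(samples: AsyncIterable<Sample>): AsyncGenerator<Sample> {
  for await (const sample of samples) {
    yield { ...sample, data: sample.data.map((v) => v / 255) };
  }
}

// "ModelDefine/ModelTrain": training can begin after the first processed sample.
async function train(samples: AsyncIterable<Sample>): Promise<void> {
  for await (const sample of samples) {
    // feed the sample into the model here
  }
}

// Placeholder for real downloading/decoding.
async function downloadAndDecode(url: string): Promise<Float32Array> {
  return new Float32Array(28 * 28);
}

train(process(collect(['https://example.com/button.png'])));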

Plug-in installation is faster

In the new version, we have also optimized the plug-in installation process. At present, most of Pipcook's pipelines still rely on the Python ecosystem, so installing these plug-ins pulls in both Python and Node.js dependencies. Before v1.2, Pipcook installed them serially, so in PR#477 we parallelized the installation of the Python and Node.js packages to reduce the overall installation time.
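
Conceptually, the change amounts to running the two installs concurrently instead of one after the other. Here is a minimal sketch of the idea in Node.js; the commands and directory layout are illustrative placeholders, not the exact ones Pipcook uses.

import { exec } from 'child_process';
import { promisify } from 'util';

const run = promisify(exec);

// Run the Node.js and Python dependency installs in parallel.
// Commands and paths are placeholders for illustration only.
async function installPluginDeps(pluginDir: string): Promise<void> {
  await Promise.all([
    run('npm install --production', { cwd: pluginDir }),       // Node.js deps
    run('pip install -r requirements.txt', { cwd: pluginDir }), // Python deps
  ]);
}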

In subsequent versions, we will continue to explore the optimizations that parallelization brings, and try to analyze each installation task (Python and Node.js packages) and schedule them for more efficient parallel installation.

Faster initial startup

Starting from Pipcook 1.2, users no longer need to install Pipboard locally. We deployed Pipboard as an online service through Vercel and migrated all the code to imgcook/pipboard.

Users can access Pipboard at https://pipboard.vercel.app/, although some parts still need adjustment; for example, connecting to a remote Pipcook Daemon is not supported yet.

Going forward, Pipboard will be released on its own cycle, independent of Pipcook. In other words, we encourage everyone to build their own Pipboard on top of the Pipcook SDK, and Pipboard itself will be provided as a default demo or sample application.

Support Google Colab

If you have been following Pipcook, you may have noticed that a Google Colab link has been added at the beginning of some tutorials in the official documentation. Yes, Pipcook supports running on Google Colab, which means beginners without a GPU can learn Pipcook using the free GPU/TPU on Colab. Just start from the following two links to begin your front-end component recognition journey:

Plug-in Python runtime for algorithm engineers

To lower the bar for algorithm engineers to contribute models to Pipcook, we have added support for a pure Python plug-in runtime. Apart from defining an additional package.json, contributors can develop plug-ins (model classes) without writing any JavaScript at all. To make it easier for algorithm engineers to get started, we also built an NLP (NER) pipeline on top of this Python-based runtime; the related plug-ins are as follows:

Pipcook SDK released

As mentioned earlier, we moved Pipboard out of Pipcook and release it independently. We hope developers can use the Pipcook SDK to build Pipboard or any other kind of application that suits their needs. Therefore, we are officially releasing the Pipcook SDK in v1.2. It lets you connect to a specified Pipcook service from Node.js or other JavaScript runtimes to manage pipelines and training tasks.

const { PipcookClient } = require('@pipcook/sdk'); // assuming the SDK package name
const client = new PipcookClient('your pipcook daemon host', port);
const pipelines = await client.pipeline.list(); // list all pipelines

Pipcook SDK API documentation: Click here.

Daily (Beta) version and Release version

To let users choose which build of Pipcook to use, we have updated our release cycle over the past two months. The rules are as follows:

  • The Beta (Daily) version is automatically built and released every day by CI (GitHub Actions). If you want to try the latest build, run pipcook init beta or pipcook init --beta to get it.
  • Release version
    • Versions with odd minor numbers (such as 1.1, 1.3, etc.) are unstable releases that mainly land larger experimental features
    • Versions with even minor numbers (such as 1.0, 1.2, etc.) are stable releases, focused mainly on stability and performance fixes and optimizations
    • All release versions follow the SemVer 2.0 specification

Next version plan (v1.4)

According to the plan, we will release Pipcook v1.4 in two months. The development team will still focus on how to make Pipcook "faster".

For example, if you want to use a trained model in a Node.js environment today, you still have to go through a lengthy npm installation (which also installs Python and related dependencies). We hope that a trained model can be used directly, without any tedious preparatory steps.

On the model side, we will support a more lightweight object detection model (YOLO/SSD), which makes it easy to perform object detection tasks in simple scenarios.

Further reading
