DEV Community

Sukma Rizki

Pipeline Concept

The simplest definition, in the author's view, is this: a pipeline is several processes that run concurrently, each of which is one stage in a series of related processing stages.

As an analogy, imagine a routine automatic backup flow in which many databases must be backed up. The backup itself is done with a Go program, not a shell script. In outline, the stages to be carried out might be as follows.

  1. We need a list of all databases that must be backed up, along with their access addresses and credentials.
  2. We run the backup process either sequentially (db1 finishes, then db2, then db3, and so on) or in parallel (the backups of db1, db2, db3, and so on run simultaneously).
  3. Each database backup consists of several steps:

A. Dump the database. The output is a set of files saved to a folder.
B. Archive the dump files, for example in .zip or .tar.gz format.
C. Send the archive file to a backup server, for example AWS S3.

Looking at the case above, performance would likely be better if the backups of the many databases were run in parallel, and the author agrees with this.
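The difference between the sequential and parallel options in step 2 can be sketched as follows. This is only an illustration under stated assumptions: the `Database` type and the functions `backupOne`, `backupSequential`, and `backupParallel` are hypothetical names, and the real dump, archive, and upload work is replaced by string labels.

```go
package main

import (
	"fmt"
	"sync"
)

// Database is a hypothetical entry of the backup list from step 1.
type Database struct {
	Name string
	Addr string
}

// backupOne simulates steps A, B and C for one database, in order.
// The real operations are replaced by string labels (an assumption).
func backupOne(db Database) string {
	dump := "dump(" + db.Name + ")"    // A: dump the database
	archive := "archive(" + dump + ")" // B: archive the dump files
	return "upload(" + archive + ")"   // C: send the archive to the backup server
}

// backupSequential backs up one database after another.
func backupSequential(dbs []Database) []string {
	results := make([]string, 0, len(dbs))
	for _, db := range dbs {
		results = append(results, backupOne(db))
	}
	return results
}

// backupParallel starts one goroutine per database and waits for all of them.
func backupParallel(dbs []Database) []string {
	results := make([]string, len(dbs))
	var wg sync.WaitGroup
	for i, db := range dbs {
		wg.Add(1)
		go func(i int, db Database) {
			defer wg.Done()
			results[i] = backupOne(db) // each goroutine writes only its own slot
		}(i, db)
	}
	wg.Wait()
	return results
}

func main() {
	dbs := []Database{{Name: "db1"}, {Name: "db2"}, {Name: "db3"}}
	for _, r := range backupParallel(dbs) {
		fmt.Println(r)
	}
}
```

Both functions produce the same results. In the parallel version, each database still goes through A, B, and C in order inside its own goroutine.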

It would be even better if the steps of each database backup (A, B, and C) were also run concurrently, each handled by its own goroutine, because I/O would then be used more efficiently. For any single database, the steps still execute in order; B must never run before A, for example. However, as soon as the goroutine responsible for A finishes one database, B can start on that database's output while A moves on to the next database, in parallel. This way the goroutine that handles A does not sit idle.

Please look at the following visualization. Each column represents a goroutine, and the goroutines run simultaneously. Each row represents one step in the sequence. Because the three goroutines form a series of stages, each database still passes through them in order.

(Image: table with one column per pipeline A, B, and C, and one row per sequence step)

In Go, a set of goroutines that execute concurrently but whose flow must be sequential is generally called a pipeline. For now, let's simply assume that pipeline A is the goroutine for process A, pipeline B is the goroutine for process B, and so on.

To make the table easier to understand, please follow the explanation step by step:

  1. Sequence 1: pipeline A dumps db1. At the same time, pipelines B and C are idle.
  2. Sequence 2: the db1 dump is complete, so db1 moves to the next stage, archiving of the db1 dump, which is carried out by pipeline B. At the same time, pipeline A dumps db2. Pipeline C is still idle.
  3. Sequence 3: pipeline A is dumping db3. At the same time, pipeline B has not yet started archiving the db2 dump, because archiving db1 is not finished. Pipeline C is still idle.
  4. Sequence 4: archiving of db1 is complete, so db1 moves to the next stage, sending the archive to the backup server, which is handled by pipeline C. At the same time, pipeline B starts archiving the db2 dump and pipeline A continues dumping.
  5. ... and so on.
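The sequence above can be sketched with one goroutine per pipeline, connected by channels. This is a minimal illustration, not the author's implementation: the function names `source`, `stage`, and `runPipeline` are hypothetical, and the dump, archive, and upload work is simulated with string labels.

```go
package main

import "fmt"

// source emits the database names to back up, then closes its channel.
func source(names ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, n := range names {
			out <- n
		}
	}()
	return out
}

// stage starts one goroutine that reads each item from in, applies the
// (simulated) work named by label, and passes the result to the next stage.
func stage(label string, in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for item := range in {
			out <- label + "(" + item + ")"
		}
	}()
	return out
}

// runPipeline wires pipeline A -> B -> C and collects the final results.
func runPipeline(names ...string) []string {
	dumped := stage("dump", source(names...)) // pipeline A
	archived := stage("archive", dumped)      // pipeline B
	uploaded := stage("upload", archived)     // pipeline C

	var results []string
	for r := range uploaded {
		results = append(results, r)
	}
	return results
}

func main() {
	for _, r := range runPipeline("db1", "db2", "db3") {
		fmt.Println(r)
	}
}
```

Because the channels here are unbuffered, pipeline A can start dumping db2 as soon as pipeline B has taken over db1, which is the overlap described in sequence 2. With a single goroutine per stage, results also come out in the same order the databases went in.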

In this example we assume that pipelines A, B, and C each have only one goroutine. In a real-world implementation, however, each pipeline can have many goroutines.
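One way to give each pipeline several goroutines is to let multiple workers read from the same input channel and write to a shared output channel. The sketch below assumes this approach; `stageN` and `runPipelineN` are hypothetical names, and the work is again simulated with string labels.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// stageN starts `workers` goroutines that all read from in and write to
// one shared output channel.
func stageN(label string, workers int, in <-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range in {
				out <- label + "(" + item + ")"
			}
		}()
	}
	// Close out only after every worker of this stage has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// runPipelineN runs the three stages with the given number of workers each.
func runPipelineN(workers int, names ...string) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, n := range names {
			src <- n
		}
	}()

	dumped := stageN("dump", workers, src)
	archived := stageN("archive", workers, dumped)
	uploaded := stageN("upload", workers, archived)

	var results []string
	for r := range uploaded {
		results = append(results, r)
	}
	// With several workers the completion order is not deterministic,
	// so the results are sorted for stable output.
	sort.Strings(results)
	return results
}

func main() {
	for _, r := range runPipelineN(2, "db1", "db2", "db3") {
		fmt.Println(r)
	}
}
```

Each database still passes through dump, archive, and upload in that order. Only the order in which different databases finish can vary.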

I hope this explanation helps. If anything is still unclear, there are plenty of other sources available on the internet.
