In the previous post, we discussed about sequential algorithms and the RAM Model. Here we will talk about some Parallel machine models that we can select for our task. But before that lets have an understanding what kind of parallelism we can achieve.
Parallelism on a single node
Unless you are using a couple decade old computer, you probably are using a system that has more than 1 cores. And if you can have more than one cores, we can utilise all of them for our algorithm. As everything resides on the same node, we can utilize shared memory to run our algorithms
Distributed Computing
This is where one system is not enough, for several reasons, and we have to utilize multiple nodes to be able to complete the task. An example of this is supercomputers or high performance computers. You must have done matrix multiplication on a 3x3 matrix. Now imagine, you need to do it on matrices which are shared as files a couple gigabytes each.
We are going to focus on parallelism on a single node.
PRAM or Parallel RAM
The attributes of PRAM Model are as follows
- It has p Processing elements, each having a unique id. The number of processors that are to be utilized can also depend upon the size of the input.
- Each processing element can run its own RAM-style program.
- Each processing element has its own registers but shared memory.
- Synchronising the processing elements also has some overhead.
So we have several processing elements(pe) doing its computations. Some information is being stored in registrations, some operations are being performed, and then resultant information is then being sent back to the memory.
But what if multiple processing elements write to the same memory location? We have conflict!
On the basis of this we can have several sub-divisions to the PRAM model.
EREW-PRAM
Exclusive Read Exclusive Write
If multiple pe try to access the same memory location, read or write, the program will crash
CREW-PRAM
Concurrent Read, Exclusive Write
Its fine if multiple pe are trying to read the same memory cell at the same time, but only one should write at a time.
CRCW-PRAM
Concurrent Read and Write
For this a rule is required for concurrent writes.
Execution Costs
We saw that in the RAM Model that the number of instructions tells the total computation required.
There is a bit more to the cost metrics of a PRAM Model as now the instructions are being run in parallel. So on top of space(memory access) and time(which is the maximum time by any PE taken to finish the allocate task), total instructions (all instructions combined).
Remember, if the parallel algorithm is taking more time than the best sequential algorithm you know of, its better if you forget about it, or atleast try to optimize it.
If interested, you can have a look at parallel implementation of the kmp string matching algorithm using WaitGroups
and Channels
.
Let me know in the comments if you would like a deep dive into how to design parallel algorithms.
Top comments (0)