Now I know why NVIDIA stocks are high

#machinelearning #datascience #computerscience #tensorflow

I kept getting notifications that NVIDIA stock was soaring, and I was curious, but for a long time I didn't look into it. (Subconsciously, I knew it had something to do with chipmaking, and that was it.)
When I finally did, I learned about GPGPU (general-purpose computing on GPUs), and I shared my findings with you in my last article:

Harnessing GPU Power for General-Purpose Computing

While doing my usual mundane, purposeless weekend reading, I came across this page from Apple:

Apple - Metal: Computations on GPU

Interesting, huh? Here is my simplified version of it.

Performing Calculations on a GPU with Metal

Essentially, you get the GPU device through Metal, put your data into buffers the GPU can read, run a kernel you have written in the Metal Shading Language (MSL) over that data, and read back the result.

Now for the interesting part: I wanted to see how much of a difference this actually makes. Keep in mind that getting the data to the GPU and getting the result back out has overhead of its own.

I used Apple's example but replaced its operation, which I felt was too simple.

From

result[index] = inA[index] + inB[index];

To

float dotProduct = inA[index] * inB[index];
result[index] = 1.0 / (1.0 + exp(-dotProduct));

I used Claude to make a graph that visualizes how much more complex the new operation is.

Apple's sample verifies the GPU results with a plain for loop. Instead, I computed the same results on the CPU with a dispatch queue (Grand Central Dispatch's dispatch_apply) so I could compare CPU and GPU performance. For both, I recorded the start and end times to get the elapsed time.

Using the DispatchQueue

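// Needed at the top of MetalAdder.m, next to the existing imports
#import <mach/mach_time.h>   // mach_absolute_time, mach_timebase_info
#import <math.h>             // expf
#import <stdlib.h>           // malloc, free

- (void) usingDispatchQueue
{
    const float* a = _mBufferA.contents;
    const float* b = _mBufferB.contents;

    // Store every result so the optimizer cannot discard the work as dead code
    float* cpuResult = malloc(arrayLength * sizeof(float));
    if (cpuResult == NULL) {
        return;
    }

    uint64_t start = mach_absolute_time();
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    // dispatch_apply runs the iterations concurrently and returns when all are done
    dispatch_apply(arrayLength, queue, ^(size_t index) {
        // Element-wise product (named to match the MSL kernel)
        float dotProduct = a[index] * b[index];

        // Sigmoid in single precision, like the kernel (exp and 1.0 would promote to double)
        cpuResult[index] = 1.0f / (1.0f + expf(-dotProduct));
    });

    uint64_t end = mach_absolute_time();
    uint64_t elapsed = end - start;

    // Convert Mach ticks to nanoseconds
    mach_timebase_info_data_t info;
    mach_timebase_info(&info);
    double elapsedNano = (double)elapsed * (double)info.numer / (double)info.denom;

    printf("Time taken - DispatchQueue: %f nanoseconds\n", elapsedNano);

    free(cpuResult);
}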
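The last step of the method converts Mach ticks into nanoseconds. mach_timebase_info reports a numer/denom ratio: on Intel Macs it is typically 1/1, while on Apple silicon it is commonly reported as 125/3 (a 24 MHz tick of roughly 41.67 ns). This minimal pure-C sketch reproduces that arithmetic so it can be checked away from a Mac; ticks_to_nanoseconds is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nanoseconds = ticks * numer / denom, where numer and
 * denom come from mach_timebase_info. Working in double avoids overflowing
 * a 64-bit integer when ticks * numer is large. */
double ticks_to_nanoseconds(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    return (double)ticks * (double)numer / (double)denom;
}
```

With a 125/3 ratio, 24 ticks convert to exactly 1,000 ns, which is why a raw mach_absolute_time difference should not be read as nanoseconds directly.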