DEV Community

제민욱

CUDA Series

Notes on OLCF's CUDA series: https://vimeo.com/showcase/6729038

01. CUDA C Basics


  • Host: The CPU and its memory
  • Device: The GPU and its memory

Simple Processing Flow


  1. Copy input data from host (CPU) memory to device (GPU) memory
  2. Load the GPU program and execute it
  3. Copy the results from device memory back to host memory
  4. Free the allocated memory

Problem::vector addition


  • 1:1 (input:output): each output element c[i] depends only on a[i] and b[i]
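To make the 1:1 mapping concrete, here is a host-side reference loop. It is only a sketch (the function name vector_add_cpu is hypothetical, not from the course): because iteration i touches nothing but index i, the iterations are independent, which is what allows the GPU version to hand each element to its own block or thread.

```cuda
#include <cassert>
#include <cstdio>

// Host-only reference (hypothetical name); compiles unchanged in a .cu file.
// c[i] depends only on a[i] and b[i], so no iteration depends on another.
void vector_add_cpu(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

int main(void) {
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    int c[4];
    vector_add_cpu(a, b, c, 4);
    for (int i = 0; i < 4; ++i) {
        assert(c[i] == a[i] + b[i]);
        printf("%d ", c[i]);
    }
    printf("\n");
    return 0;
}
```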

Concepts

```cuda
__global__ void mykernel(void) {}

mykernel<<<N, 1>>>(); // Grid (N blocks), Block (1 thread)
```
  • __global__ marks a kernel: a function that runs on the device and is launched from host code
  • <<<GRID, BLOCK>>> is the execution configuration:
    • GRID: number of blocks in the grid
    • BLOCK: number of threads per block
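A small sketch of how GRID and BLOCK combine. The kernel record_ids and the host helper launch_ids are hypothetical names made up for this illustration: every thread computes a unique global index from blockIdx.x, blockDim.x and threadIdx.x, and records which block it belongs to and its position inside that block.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its block id and thread id at its global index.
__global__ void record_ids(int *block_of, int *thread_of) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // unique across the whole grid
    block_of[gid] = blockIdx.x;    // 0 .. GRID - 1
    thread_of[gid] = threadIdx.x;  // 0 .. BLOCK - 1
}

// Hypothetical host helper: launches record_ids<<<grid, block>>> and copies the ids back.
// block_of and thread_of must each hold grid * block ints.
void launch_ids(int grid, int block, int *block_of, int *thread_of) {
    size_t size = (size_t)grid * block * sizeof(int);
    int *d_block = nullptr, *d_thread = nullptr;
    cudaMalloc((void **)&d_block, size);
    cudaMalloc((void **)&d_thread, size);
    record_ids<<<grid, block>>>(d_block, d_thread);
    cudaMemcpy(block_of, d_block, size, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaMemcpy(thread_of, d_thread, size, cudaMemcpyDeviceToHost);
    cudaFree(d_block);
    cudaFree(d_thread);
}

int main(void) {
    const int GRID = 2, BLOCK = 4;  // 2 blocks x 4 threads = 8 threads in total
    int block_of[GRID * BLOCK], thread_of[GRID * BLOCK];
    launch_ids(GRID, BLOCK, block_of, thread_of);
    for (int i = 0; i < GRID * BLOCK; ++i) {
        assert(block_of[i] == i / BLOCK && thread_of[i] == i % BLOCK);
        printf("gid %d: blockIdx.x = %d, threadIdx.x = %d\n", i, block_of[i], thread_of[i]);
    }
    return 0;
}
```

With GRID = 2 and BLOCK = 4 the launch creates 8 threads; the block ids come back as 0,0,0,0,1,1,1,1 and the thread ids as 0,1,2,3,0,1,2,3. The formula blockIdx.x * blockDim.x + threadIdx.x is the usual way to give each thread its own element once a launch uses more than one thread per block.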
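Host code for the vector addition, following the four steps above (N, size, the host arrays a, b, c allocated with malloc, and the device pointers d_a, d_b, d_c are assumed to be declared elsewhere):

```cuda
// 1-1. Allocate global memory on the GPU
cudaMalloc((void **)&d_a, size);
cudaMalloc((void **)&d_b, size);
cudaMalloc((void **)&d_c, size);

// 1-2. Copy inputs to the device (host a -> device d_a, host b -> device d_b)
cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

// 2. Load and execute: N blocks, 1 thread per block
add<<<N, 1>>>(d_a, d_b, d_c);

// 3. Copy the result back (device d_c -> host c)
cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

// 4. Free host and device memory
free(a); free(b); free(c);
cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
```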
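The fragment above leaves out the add kernel and the declarations. One way to complete it into a runnable program is sketched below; the value of N, the input data, the helper name run_vector_add and the error checks are assumptions added for illustration. The kernel follows the one-element-per-block layout that add<<<N,1>>> implies: block blockIdx.x handles element blockIdx.x.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 512  // number of elements (assumed; the fragment leaves N undefined)

// Block blockIdx.x computes element blockIdx.x, matching add<<<N, 1>>>.
__global__ void add(const int *a, const int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

// Hypothetical helper: runs the four steps on a[i] = i, b[i] = 2 * i.
// Returns the number of wrong results, or -1 if the launch or the copy back failed.
int run_vector_add(void) {
    size_t size = N * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size), *c = (int *)malloc(size);
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    // 1. Allocate device memory and copy the inputs (host -> device).
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // 2. Launch N blocks of 1 thread; cudaGetLastError reports launch errors.
    add<<<N, 1>>>(d_a, d_b, d_c);
    int errors = (cudaGetLastError() == cudaSuccess) ? 0 : -1;

    // 3. Copy the result back (device -> host); this call waits for the kernel.
    if (errors == 0) {
        if (cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost) != cudaSuccess) {
            errors = -1;
        } else {
            for (int i = 0; i < N; ++i)
                if (c[i] != 3 * i) ++errors;
        }
    }

    // 4. Free device and host memory.
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return errors;
}

int main(void) {
    int errors = run_vector_add();
    printf("%s (errors = %d)\n", errors == 0 ? "PASS" : "FAIL", errors);
    assert(errors == 0);
    return 0;
}
```

One thread per block keeps the kernel simple, but it leaves most of each block's capacity unused; practical launches usually put many threads in each block and index with blockIdx.x * blockDim.x + threadIdx.x.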

02. Shared Memory


Problem::1D Stencil


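  • It is not a 1:1 (input:output) problem: each output element depends on several neighboring input elements.
  • e.g. with radius 3, each output sums 2 × 3 + 1 = 7 inputs, so each input element is read seven times.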
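A host-side reference for the 1D stencil, sketched with a hypothetical read counter to check the "seven reads" figure. The function name stencil_1d_cpu, the counter, and the choice to compute only outputs whose whole window fits inside the array are assumptions for illustration. With RADIUS = 3, every interior input is read 2 × 3 + 1 = 7 times; elements near the ends are read fewer times because their windows are cut off.

```cuda
#include <cassert>
#include <cstdio>

#define RADIUS 3  // stencil radius used in the example
#define LEN 16    // input length for the demo (assumed)

// Host reference: out[i] = in[i - RADIUS] + ... + in[i + RADIUS], computed only
// where the whole window fits (RADIUS <= i < n - RADIUS).
// reads[j] counts how often in[j] is read (hypothetical instrumentation).
void stencil_1d_cpu(const int *in, int *out, int *reads, int n) {
    for (int j = 0; j < n; ++j) reads[j] = 0;
    for (int i = RADIUS; i < n - RADIUS; ++i) {
        int sum = 0;
        for (int k = -RADIUS; k <= RADIUS; ++k) {
            sum += in[i + k];
            ++reads[i + k];
        }
        out[i] = sum;
    }
}

int main(void) {
    int in[LEN], out[LEN] = {0}, reads[LEN];
    for (int j = 0; j < LEN; ++j) in[j] = 1;
    stencil_1d_cpu(in, out, reads, LEN);
    assert(out[8] == 2 * RADIUS + 1);    // seven ones summed
    assert(reads[8] == 2 * RADIUS + 1);  // an interior input is read seven times
    printf("out[8] = %d, reads[8] = %d, reads[0] = %d\n", out[8], reads[8], reads[0]);
    return 0;
}
```

If a GPU kernel computed the same thing straight from global memory, each of those seven reads per input would be a separate global-memory access; that redundancy is what the shared-memory version is meant to cut down.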


Concept::Shared Memory


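  • On-chip memory: much lower latency and higher bandwidth than global memory (off-chip DRAM), but much smaller
  • Allocated per block: visible to all threads of that block, invisible to other blocks
  • Managed explicitly by the programmer, unlike a hardware-managed cache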
```cuda
__shared__ int s[64];  // one copy per block, shared by that block's threads
...
```
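A sketch of the 1D stencil using shared memory, modeled on the common textbook form of this example rather than copied from the slides; BLOCK_SIZE, RADIUS, the padded input layout and the helper stencil_on_gpu are assumptions. Each block loads its BLOCK_SIZE inputs plus a RADIUS-wide halo on each side into a __shared__ array once; every thread then reads its 2 × RADIUS + 1 neighbors from shared memory instead of global memory. The __syncthreads() barrier keeps any thread from reading a neighbor's slot before it has been written.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 3
#define BLOCK_SIZE 256  // threads per block (assumed)

// in has n + 2 * RADIUS elements (RADIUS of padding on each side); out has n.
// out[g] = in[g] + ... + in[g + 2 * RADIUS], the window centered on in[g + RADIUS].
// Must be launched with blockDim.x == BLOCK_SIZE.
__global__ void stencil_1d_shared(const int *in, int *out) {
    __shared__ int temp[BLOCK_SIZE + 2 * RADIUS];         // tile plus left and right halo
    int gindex = blockIdx.x * blockDim.x + threadIdx.x;   // output index
    int lindex = threadIdx.x + RADIUS;                    // this thread's slot in temp

    // Each thread loads one element; the first RADIUS threads also load both halos.
    temp[lindex] = in[gindex + RADIUS];
    if (threadIdx.x < RADIUS) {
        temp[lindex - RADIUS] = in[gindex];                            // left halo
        temp[lindex + BLOCK_SIZE] = in[gindex + RADIUS + BLOCK_SIZE];  // right halo
    }
    __syncthreads();  // the whole tile must be loaded before neighbors are read

    int result = 0;
    for (int offset = -RADIUS; offset <= RADIUS; ++offset)
        result += temp[lindex + offset];  // 2 * RADIUS + 1 reads, all from shared memory
    out[gindex] = result;
}

// Hypothetical host helper; n must be a multiple of BLOCK_SIZE.
void stencil_on_gpu(const int *in, int *out, int n) {
    size_t in_bytes = (n + 2 * RADIUS) * sizeof(int);
    size_t out_bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, in_bytes);
    cudaMalloc((void **)&d_out, out_bytes);
    cudaMemcpy(d_in, in, in_bytes, cudaMemcpyHostToDevice);
    stencil_1d_shared<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_out);
    cudaMemcpy(out, d_out, out_bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
}

int main(void) {
    const int n = 4 * BLOCK_SIZE;
    static int in[n + 2 * RADIUS], out[n];
    for (int i = 0; i < n + 2 * RADIUS; ++i) in[i] = i;
    stencil_on_gpu(in, out, n);
    // The sum of the 7 consecutive integers g .. g + 6 is 7 * (g + 3).
    for (int g = 0; g < n; ++g) assert(out[g] == (2 * RADIUS + 1) * (g + RADIUS));
    printf("stencil OK for %d outputs\n", n);
    return 0;
}
```

The sketch assumes n is a multiple of BLOCK_SIZE and that the input carries RADIUS extra elements on each side, so the halo loads never go out of bounds.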

On Volta (2017) and later GPUs, shared memory (__shared__, managed by software) and the L1 cache (managed by hardware) are carved out of the same on-chip SRAM. Developers can choose how much of this SRAM goes to shared memory versus L1 cache, depending on the application's needs.
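The split is requested per kernel through the runtime. A minimal sketch, assuming a hypothetical kernel reverse64 that uses a static __shared__ array and a hypothetical helper prefer_shared: cudaFuncSetAttribute with cudaFuncAttributePreferredSharedMemoryCarveout takes a percentage of the maximum shared memory (cudaSharedmemCarveoutMaxL1 is 0, cudaSharedmemCarveoutMaxShared is 100). The driver treats the value as a preference, not a guarantee.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses 64 ints through a static __shared__ buffer.
// Launch with exactly 64 threads in one block.
__global__ void reverse64(const int *in, int *out) {
    __shared__ int s[64];
    s[threadIdx.x] = in[threadIdx.x];
    __syncthreads();  // all of s is written before any thread reads it
    out[threadIdx.x] = s[63 - threadIdx.x];
}

// Records a carveout preference for reverse64, in percent of the maximum shared
// memory per SM (0 favors L1, 100 favors shared memory). It is only a hint.
bool prefer_shared(int percent) {
    return cudaFuncSetAttribute(reverse64,
                                cudaFuncAttributePreferredSharedMemoryCarveout,
                                percent) == cudaSuccess;
}

int main(void) {
    int smem_per_sm = 0;
    cudaDeviceGetAttribute(&smem_per_sm, cudaDevAttrMaxSharedMemoryPerMultiprocessor, 0);
    printf("max shared memory per SM on device 0: %d bytes\n", smem_per_sm);

    bool ok = prefer_shared(cudaSharedmemCarveoutMaxShared);  // 100 percent
    assert(ok);

    int h_in[64], h_out[64];
    for (int i = 0; i < 64; ++i) h_in[i] = i;
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, sizeof(h_in));
    cudaMalloc((void **)&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    reverse64<<<1, 64>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 64; ++i) assert(h_out[i] == 63 - i);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The call is kept outside assert() so that it still runs when assertions are compiled out with NDEBUG.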

03. CUDA Optimization (1 of 2)

https://vimeo.com/showcase/6729038/video/398824746
