GPU Algorithms » Single Task (cudaFlow)

tf::cudaFlow provides a template function, single_task, that constructs a task to run the given callable using a single kernel thread.

Run a Task with a Single Thread

You can create a task that runs a kernel function just once, i.e., using one GPU thread. This is handy when you need to set up one or a few global parameters that do not require multiple threads and will be used by multiple kernels afterwards. The following example creates a single-task kernel that sets gpu_parameter to 1.

int* gpu_parameter;
cudaMalloc(&gpu_parameter, sizeof(int));
tf::Task task = taskflow.emplace([&] (tf::cudaFlow& cf) {
  // create a single task to set the gpu_parameter to 1
  tf::cudaTask set_par = cf.single_task([gpu_parameter] __device__ () {
    *gpu_parameter = 1;
  });

  // create two kernel tasks that need access to gpu_parameter
  tf::cudaTask kernel1 = cf.kernel(grid1, block1, shm1, my_kernel_1, ...);
  tf::cudaTask kernel2 = cf.kernel(grid2, block2, shm2, my_kernel_2, ...);

  set_par.precede(kernel1, kernel2);
});

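Since the callable runs on the GPU, it must be declared with the __device__ specifier. When the callable is a lambda, as in the example above, nvcc additionally needs the --extended-lambda flag to compile it.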
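
Conceptually, a single task behaves like a kernel launched with one block of one thread. The plain-CUDA sketch below illustrates that idea without Taskflow. The names run_once, SetParameter, and run_single_task_sketch are hypothetical and introduced only for illustration, and a functor stands in for the __device__ lambda so that the file compiles without extended-lambda support. It is a sketch of the idea, not Taskflow's actual implementation.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: invokes the given callable once on the device.
template <typename C>
__global__ void run_once(C callable) {
  callable();
}

// Hypothetical functor standing in for the __device__ lambda in the example above.
struct SetParameter {
  int* ptr;
  __device__ void operator()() const { *ptr = 1; }
};

// Launches the callable with a single GPU thread and returns the value
// read back from device memory, or -1 if a CUDA call fails.
int run_single_task_sketch() {
  int* gpu_parameter = nullptr;
  if (cudaMalloc(&gpu_parameter, sizeof(int)) != cudaSuccess) {
    return -1;
  }

  // One block of one thread: the callable body executes exactly once.
  run_once<<<1, 1>>>(SetParameter{gpu_parameter});

  // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
  int host_value = 0;
  cudaError_t err = cudaMemcpy(&host_value, gpu_parameter, sizeof(int),
                               cudaMemcpyDeviceToHost);
  cudaFree(gpu_parameter);
  return err == cudaSuccess ? host_value : -1;
}

int main() {
  int value = run_single_task_sketch();
  std::printf("gpu_parameter = %d\n", value);
  assert(value == 1);  // fails if no usable CUDA device is present
  return 0;
}
```

Inside a cudaFlow, the same work is expressed with cf.single_task, which additionally lets the task take part in dependencies such as set_par.precede(kernel1, kernel2).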

Miscellaneous Items

The single-task algorithm is also available in tf::cudaFlowCapturerBase::single_task.