Async process

Jul 20, 2015 at 9:02 PM
Can somebody post some working code that uses async CUDA?

Should I use CudaPageLockedHostMemory to buffer data before sending it to a kernel? What exactly is the workflow? I checked the managedCuda sources, and it looks like only CudaPageLockedHostMemory has async transfers.
Jul 20, 2015 at 10:25 PM
That's right: only CudaPageLockedHostMemory has async copy methods, and CudaDeviceVariable has only AsyncDeviceToDevice copy methods. Page-locked host memory is necessary for async host-to-device and device-to-host copies to be truly asynchronous (with normal pageable host memory the copy is implicitly synchronized, I think).

Check out the simpleStreams sample in the download section for some real code; it is a 1:1 port of the sample in the CUDA SDK.
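
For reference, the general workflow can be sketched with the plain CUDA runtime API in C++. This is not managedCuda code and not the simpleStreams sample itself, only a minimal illustration of the pattern described above: allocate page-locked host memory, queue the copies and the kernel on one stream, then synchronize. The kernel `addOne`, the buffer names, and the sizes are made up for illustration. In managedCuda the pinned buffer would presumably be the CudaPageLockedHostMemory object mentioned above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Report any CUDA runtime error and leave main with a non-zero exit code.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",      \
                         cudaGetErrorString(err_), __LINE__);       \
            return 1;                                               \
        }                                                           \
    } while (0)

int main()
{
    const int n = 1 << 20;              // element count (arbitrary)
    const size_t bytes = n * sizeof(int);

    int *hostBuf = nullptr;             // page-locked (pinned) host buffer
    int *devBuf = nullptr;              // device buffer
    cudaStream_t stream;

    // Pinned host memory is what lets the copies below run asynchronously.
    CHECK(cudaMallocHost((void **)&hostBuf, bytes));
    CHECK(cudaMalloc((void **)&devBuf, bytes));
    CHECK(cudaStreamCreate(&stream));

    for (int i = 0; i < n; ++i) hostBuf[i] = i;

    // Queue copy-in, kernel, and copy-out on the same stream.
    // Each call returns immediately; the stream keeps them in order.
    CHECK(cudaMemcpyAsync(devBuf, hostBuf, bytes,
                          cudaMemcpyHostToDevice, stream));
    addOne<<<(n + 255) / 256, 256, 0, stream>>>(devBuf, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(hostBuf, devBuf, bytes,
                          cudaMemcpyDeviceToHost, stream));

    // The CPU is free to do unrelated work here while the GPU runs.

    // Block until everything queued on the stream has finished.
    CHECK(cudaStreamSynchronize(stream));

    // Verify the round trip.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hostBuf[i] != i + 1) ++mismatches;
    std::printf("mismatches: %d\n", mismatches);

    cudaStreamDestroy(stream);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return mismatches == 0 ? 0 : 1;
}
```

The host buffer must not be read or overwritten between queuing the copies and the synchronize call, since the transfers may still be in flight. Using several streams, each with its own chunk of the data, is what allows copies and kernels to overlap.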