Cuda capability 5.0 support

Aug 12, 2014 at 6:44 PM
Hi Michael,

I have a GTX 750 Ti, which is supposed to be Maxwell with compute capability 5.0 (I have checked the device properties and they report 5.0). However, when I try to use dynamic parallelism, i.e. launching a kernel from a kernel, I get a strange error when I try to load the module. (I am of course compiling with compute_50, sm_50 and -rdc=true, which I have found is needed; I am using it anyway.)
"ErrorNoBinaryForGPU: This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration."
I have written a test app and compiled it directly with nvcc, and it worked. This is why I suspect managedCUDA.

Best regards, Martin.
Aug 12, 2014 at 10:57 PM
Hi Martin,

this is not a problem caused by managedCuda; it comes from the CUDA driver API that we use. To use dynamic parallelism, you need to link the PTX file with the CUDA device runtime library first, before you load the kernel.

For example, a modified vector add kernel from the CUDA samples:
//called from kernel addKernel:
__global__ void addKernel2(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

//called from host:
extern "C"
__global__ void addKernel(int *c, const int *a, const int *b)
{
    addKernel2<<<1, 5>>>(c, a, b);
}
If you compile this to a PTX file (with the option -rdc=true), you can then do the following in C# using CudaLinker:
CudaContext ctx = new CudaContext();
//Add an info and error buffer to see what the linker wants to tell us:
CudaJitOptionCollection options = new CudaJitOptionCollection();
CudaJOErrorLogBuffer err = new CudaJOErrorLogBuffer(1024);
CudaJOInfoLogBuffer info = new CudaJOInfoLogBuffer(1024);
options.Add(new CudaJOLogVerbose(true));
//the buffers have to be part of the option collection, otherwise they stay empty:
options.Add(err);
options.Add(info);
byte[] tempArray = null;

try
{
    CudaLinker linker = new CudaLinker(options);
    linker.AddFile(@"kernel.ptx", CUJITInputType.PTX, null);
    //important: add the device runtime library!
    linker.AddFile(@"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\lib\Win32\cudadevrt.lib", CUJITInputType.Library, null);
    tempArray = linker.Complete();

    CudaKernel k = ctx.LoadKernelPTX(tempArray, "addKernel");

    CudaDeviceVariable<int> a = new int[] { 10, 20, 30, 40, 50 };
    CudaDeviceVariable<int> b = new int[] { 1, 2, 3, 4, 5 };
    CudaDeviceVariable<int> c = new CudaDeviceVariable<int>(5);

    //one parent thread; it launches addKernel2 with 5 threads
    k.GridDimensions = 1;
    k.BlockDimensions = 1;
    k.Run(c.DevicePointer, a.DevicePointer, b.DevicePointer);

    //copy the result back to the host
    int[] erg = c;
}
catch (Exception) //if done right, only catch linker errors...
{
    MessageBox.Show(err.Value); //tell what went wrong
}
//Todo: clean up everything...
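
For reference, the C# example above corresponds roughly to the following calls of the CUDA driver API itself. This is only a sketch under a few assumptions: the file names kernel.ptx and cudadevrt.lib are placeholders for your own paths, the build commands in the comments are one possible way to compile it, and the primary context is used here instead of a newly created context. It is not taken from the managedCuda sources.

```cuda
// Sketch only: link kernel.ptx against the device runtime library
// using the CUDA driver API, then look up the parent kernel.
// Assumed build steps (file names and paths are placeholders):
//   nvcc -ptx -rdc=true -arch=compute_50 kernel.cu -o kernel.ptx
//   nvcc link_sketch.cu -o link_sketch -lcuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

// Abort with a readable message if a driver API call fails.
#define CHECK(call)                                              \
    do {                                                         \
        CUresult r_ = (call);                                    \
        if (r_ != CUDA_SUCCESS) {                                \
            const char *msg_ = NULL;                             \
            cuGetErrorString(r_, &msg_);                         \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    msg_ ? msg_ : "unknown error");              \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

int main(void)
{
    // Placeholder: full path to the toolkit's device runtime library
    // (cudadevrt.lib on Windows, libcudadevrt.a on Linux).
    const char *devrt = "cudadevrt.lib";

    CUdevice dev;
    CUcontext ctx;
    CUlinkState link;
    CUmodule mod;
    CUfunction fn;
    void *image = NULL;
    size_t imageSize = 0;

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Link the PTX together with the device runtime library.
    CHECK(cuLinkCreate(0, NULL, NULL, &link));
    CHECK(cuLinkAddFile(link, CU_JIT_INPUT_PTX, "kernel.ptx", 0, NULL, NULL));
    CHECK(cuLinkAddFile(link, CU_JIT_INPUT_LIBRARY, devrt, 0, NULL, NULL));
    CHECK(cuLinkComplete(link, &image, &imageSize));

    // The linked image is owned by the link state, so load the module
    // before destroying the linker.
    CHECK(cuModuleLoadData(&mod, image));
    CHECK(cuLinkDestroy(link));
    CHECK(cuModuleGetFunction(&fn, mod, "addKernel"));

    printf("linked image size: %zu bytes\n", imageSize);

    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

If the second cuLinkAddFile call is left out, linking or loading is expected to fail, which matches the kind of error described at the top of this thread.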
Aug 14, 2014 at 1:58 PM
Thanks for reply,

I have adopted your code, included those log classes and changed LoadModulePTX to LoadKernelPTX, and it is working. The problem comes back if I use LoadModulePTX: then this ErrorNoBinaryForGPU is thrown.

Anyway, problem solved!

Best, Martin.