JIT PTX Examples

Mar 31, 2013 at 4:19 AM
Great work!

Do you have any JIT PTX examples?
Mar 31, 2013 at 12:56 PM
Hi,

I modified the VecAdd sample to use the PTX JIT options. It's not entirely straightforward, because the values are passed as object[] and must be cast back to their intended types to retrieve the output values:
//Setting PTX JIT options:
CUJITOption[] ops = new CUJITOption[6];
object[] vals = new object[6];

ops[0] = CUJITOption.WallTime;
ops[1] = CUJITOption.InfoLogBufferSizeBytes;
ops[2] = CUJITOption.InfoLogBuffer;
ops[3] = CUJITOption.ThreadsPerBlock; //inside CUDA this seems to set the value of CUJITOption.MaxRegisters
ops[4] = CUJITOption.ErrorLogBufferSizeBytes;
ops[5] = CUJITOption.ErrorLogBuffer;

uint bufferSize = 256;
vals[0] = 0.0f;
vals[1] = bufferSize;
vals[2] = new byte[bufferSize];
vals[3] = (uint)10; //a meaningless number to provoke some warning output...
vals[4] = bufferSize;
vals[5] = new byte[bufferSize];

//load kernel and JIT compile it with given options 
CudaKernel vectorAddKernel = ctx.LoadKernelPTX(stream, "VecAdd", ops, vals);

//Output result values
Console.WriteLine("PTX JIT compilation time: " + ((float)vals[0]).ToString() + " [ms]");
Console.WriteLine("PTX JIT Threads per Block: " + ((uint)vals[3]).ToString());
            
if ((uint)vals[1] > 0)
{
    System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
    string str = enc.GetString(vals[2] as byte[]);
    Console.WriteLine("PTX JIT Info output: " + str);
}

if ((uint)vals[4] > 0)
{
    System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
    string str = enc.GetString(vals[5] as byte[]);
    Console.WriteLine("PTX JIT Error output: " + str);
}
Note that PTX kernels are only JIT-compiled if no compiled version is already cached. To actually see some output in the log buffers, re-create the PTX file.
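
One detail when printing the log buffers: decoding the whole 256-byte array also decodes the unused bytes after the message. A minimal sketch of a helper that stops at the first zero byte, assuming the driver writes a null-terminated string into the buffer. JitLog and Decode are hypothetical names used for illustration and are not part of managedCuda:

```csharp
using System;
using System.Text;

static class JitLog
{
    // Hypothetical helper (not part of managedCuda): decodes only the bytes
    // that precede the first zero byte, assuming the log is null-terminated.
    public static string Decode(byte[] buffer)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        int end = Array.IndexOf(buffer, (byte)0);
        if (end < 0) end = buffer.Length; // no terminator found: decode everything
        return Encoding.ASCII.GetString(buffer, 0, end);
    }
}
```

With that in place, the output line becomes Console.WriteLine("PTX JIT Info output: " + JitLog.Decode(vals[2] as byte[]));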

Best,
Michael
Mar 31, 2013 at 4:00 PM
Thank you very much.

I will try it.
Mar 31, 2013 at 5:40 PM
I modified the kernel and that worked. I would like to pass the kernel as a string, but I don't see how to convert a string to an unmanaged stream or a byte array that LoadKernelPTX accepts. Is there some easy way to do that?
Mar 31, 2013 at 5:59 PM
System.Text.ASCIIEncoding encoder = new System.Text.ASCIIEncoding();
byte[] bytearray = encoder.GetBytes("some string");
There are also UTF encoders (e.g. System.Text.UTF8Encoding) in case you need one.
Mar 31, 2013 at 6:09 PM
I tried that, but it hangs when I try to load the kernel.
Mar 31, 2013 at 6:13 PM
Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resource);

This returns an unmanaged stream, which works. Sending a managed stream makes it hang, just like sending a managed byte array.
Mar 31, 2013 at 6:13 PM
Can you make sure the string is a valid PTX kernel representation, i.e. direct output from nvcc?
Mar 31, 2013 at 6:19 PM
I want to do JIT PTX, so here's what I am sending:
    .version 1.4
    .target sm_10, map_f64_to_f32
    .entry VecAdd (
        .param .u64 __cudaparm_VecAdd_A,
        .param .u64 __cudaparm_VecAdd_B,
        .param .u64 __cudaparm_VecAdd_C,
        .param .s32 __cudaparm_VecAdd_N)
    {
    .reg .u16 %rh<4>;
    .reg .u32 %r<6>;
    .reg .u64 %rd<10>;
    .reg .f32 %f<5>;
    .reg .pred %p<3>;
    .loc    15  27  0
$LDWbegin_VecAdd:
    mov.u16     %rh1, %ctaid.x;
    mov.u16     %rh2, %ntid.x;
    mul.wide.u16    %r1, %rh1, %rh2;
    cvt.u32.u16     %r2, %tid.x;
    add.u32     %r3, %r2, %r1;
    ld.param.s32    %r4, [__cudaparm_VecAdd_N];
    setp.le.s32     %p1, %r4, %r3;
    @%p1 bra    $Lt_0_1026;
    .loc    15  31  0
    cvt.s64.s32     %rd1, %r3;
    mul.wide.s32    %rd2, %r3, 4;
    ld.param.u64    %rd3, [__cudaparm_VecAdd_A];
    add.u64     %rd4, %rd3, %rd2;
    ld.global.f32   %f1, [%rd4+0];
    ld.param.u64    %rd5, [__cudaparm_VecAdd_B];
    add.u64     %rd6, %rd5, %rd2;
    ld.global.f32   %f2, [%rd6+0];
    add.f32     %f3, %f1, %f2;
    ld.param.u64    %rd7, [__cudaparm_VecAdd_C];
    add.u64     %rd8, %rd7, %rd2;
    st.global.f32   [%rd8+0], %f3;
$Lt_0_1026:
    .loc    15  32  0
    exit;
$LDWend_VecAdd:
    } // VecAdd
That works.

This doesn't:
            string ptx = null;

            ptx = @"
                .version 1.4
                .target sm_10, map_f64_to_f32
                .entry VecAdd (
                    .param .u64 __cudaparm_VecAdd_A,
                    .param .u64 __cudaparm_VecAdd_B,
                    .param .u64 __cudaparm_VecAdd_C,
                    .param .s32 __cudaparm_VecAdd_N)
                {
                .reg .u16 %rh<4>;
                .reg .u32 %r<6>;
                .reg .u64 %rd<10>;
                .reg .f32 %f<5>;
                .reg .pred %p<3>;
                .loc    15  27  0
            $LDWbegin_VecAdd:
                mov.u16     %rh1, %ctaid.x;
                mov.u16     %rh2, %ntid.x;
                mul.wide.u16    %r1, %rh1, %rh2;
                cvt.u32.u16     %r2, %tid.x;
                add.u32     %r3, %r2, %r1;
                ld.param.s32    %r4, [__cudaparm_VecAdd_N];
                setp.le.s32     %p1, %r4, %r3;
                @%p1 bra    $Lt_0_1026;
                .loc    15  31  0
                cvt.s64.s32     %rd1, %r3;
                mul.wide.s32    %rd2, %r3, 4;
                ld.param.u64    %rd3, [__cudaparm_VecAdd_A];
                add.u64     %rd4, %rd3, %rd2;
                ld.global.f32   %f1, [%rd4+0];
                ld.param.u64    %rd5, [__cudaparm_VecAdd_B];
                add.u64     %rd6, %rd5, %rd2;
                ld.global.f32   %f2, [%rd6+0];
                add.f32     %f3, %f1, %f2;
                ld.param.u64    %rd7, [__cudaparm_VecAdd_C];
                add.u64     %rd8, %rd7, %rd2;
                st.global.f32   [%rd8+0], %f3;
            $Lt_0_1026:
                .loc    15  32  0
                exit;
            $LDWend_VecAdd:
                } // VecAdd";

            System.Text.ASCIIEncoding encoder = new System.Text.ASCIIEncoding();
            byte[] ptxbytes = encoder.GetBytes(ptx);

            //load kernel and JIT compile it with given options 
            CudaKernel vectorAddKernel = ctx.LoadKernelPTX(ptxbytes, "VecAdd", ops, vals);
Mar 31, 2013 at 6:35 PM
Well, one could complain to NVIDIA about that, but to get things running you only need to append a newline at the end of your string:
    .loc    15  32  0
    exit;
$LDWend_VecAdd:
    } // VecAdd
";
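
If the PTX text is built or loaded at runtime, the same fix can be applied in code rather than by editing the literal. A minimal sketch, assuming the only problem is the missing trailing newline described above. PtxText and ToPtxBytes are hypothetical names used for illustration and are not part of managedCuda:

```csharp
using System;
using System.Text;

static class PtxText
{
    // Hypothetical helper (not part of managedCuda): returns the PTX source
    // as ASCII bytes and makes sure the text ends with a line feed.
    public static byte[] ToPtxBytes(string ptx)
    {
        if (ptx == null) throw new ArgumentNullException("ptx");
        if (!ptx.EndsWith("\n", StringComparison.Ordinal))
            ptx += "\n"; // append the newline the loader appears to need
        return Encoding.ASCII.GetBytes(ptx);
    }
}
```

The result can then be passed as before: byte[] ptxbytes = PtxText.ToPtxBytes(ptx); followed by ctx.LoadKernelPTX(ptxbytes, "VecAdd", ops, vals);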
Marked as answer by Darren996 on 4/4/2014 at 4:09 AM
Mar 31, 2013 at 7:14 PM
Haa haa haa! Wow! Thanks!