Measuring CUDA execution time

Apr 17, 2013 at 10:34 AM
Hi!

I wrote a simple CUDA kernel that calculates something on several float vectors.

Using managedCuda I can measure the kernel execution time: it is returned by the Run() method.

But can I somehow measure the time spent transferring the input data from host to GPU and the results from GPU back to host?

Thanks in advance!
Coordinator
Apr 17, 2013 at 12:55 PM
Hi,

Have a look at CudaStopWatch. It uses the GPU timer and cuEvents to measure time: simply call its Start() method before you start copying and Stop() afterwards. You can then get the elapsed time with GetElapsedTime(), or with GetElapsedTimeNoSync() if you synchronize on the stop event manually.
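
A minimal sketch of that pattern, in case it helps. The buffer size, device index 0, and the GigabytesPerSecond helper are illustrative assumptions and not part of managedCuda; the sketch also assumes GetElapsedTime() reports milliseconds, as CUDA event timers do.

```csharp
using System;
using ManagedCuda;

static class CopyTimingSketch
{
    // Hypothetical helper (not a managedCuda API): converts a byte count
    // and a time in milliseconds into GB/s.
    public static double GigabytesPerSecond(long bytes, float milliseconds)
    {
        return (bytes / 1e9) / (milliseconds / 1e3);
    }

    static void Main()
    {
        // Assumption: device 0 is the GPU of interest.
        using (var ctx = new CudaContext(0))
        using (var timer = new CudaStopWatch())
        {
            var host = new float[1 << 20]; // arbitrary size: 1M floats
            using (var dev = new CudaDeviceVariable<float>(host.Length))
            {
                timer.Start();
                dev.CopyToDevice(host);           // host -> GPU
                timer.Stop();
                float ms = timer.GetElapsedTime(); // waits for the stop event

                long bytes = (long)host.Length * sizeof(float);
                Console.WriteLine("H2D: {0} ms, {1:F2} GB/s",
                    ms, GigabytesPerSecond(bytes, ms));
            }
        }
    }
}
```

Reporting bandwidth next to the raw time makes it easier to compare copies of different sizes.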

Michael
Apr 18, 2013 at 8:09 AM
Hi!

Thanks for your quick reply.

I used CudaStopWatch with the CopyToDevice() and CopyToHost() methods:
timer.Start();
AB.CopyToDevice(hostAB);
timer.Stop();
//...
timer.Start();
results.CopyToHost(hostResults);
timer.Stop();
It seems to work fine, but I've noticed that transferring the results back to the host takes much more time than sending the input data, even though the results are smaller than the input data.

Do you have similar experiences?

Orzeh
Coordinator
Apr 18, 2013 at 9:35 PM
Hi,

What do you mean by "much more time"? Can you give some numbers, and also the size of the data you copy? There is usually a lot of variation in measured times, especially with the default WDDM drivers on Windows (as opposed to TCC or Linux drivers), which is why you should also average several measurements.
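
One way to average several measurements could look like the sketch below. TimeRepeated and Median are hypothetical helpers written for this example, not managedCuda functions; the run count and buffer size are arbitrary, and the first, unrecorded call is a warm-up so one-time setup costs do not skew the numbers.

```csharp
using System;
using System.Linq;
using ManagedCuda;

static class AveragedCopyTiming
{
    // Hypothetical helper: runs the action once as warm-up, then times it
    // 'runs' times with CudaStopWatch and returns the per-run times.
    public static float[] TimeRepeated(Action action, int runs)
    {
        var samples = new float[runs];
        using (var timer = new CudaStopWatch())
        {
            action(); // warm-up, not recorded
            for (int i = 0; i < runs; i++)
            {
                timer.Start();
                action();
                timer.Stop();
                samples[i] = timer.GetElapsedTime();
            }
        }
        return samples;
    }

    // Hypothetical helper: the median is less sensitive to outliers than the mean.
    public static float Median(float[] samples)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2f;
    }

    static void Main()
    {
        // Assumption: device 0; buffer size chosen arbitrarily.
        using (var ctx = new CudaContext(0))
        using (var dev = new CudaDeviceVariable<float>(1 << 20))
        {
            var host = new float[1 << 20];
            float[] h2d = TimeRepeated(() => dev.CopyToDevice(host), 20);
            float[] d2h = TimeRepeated(() => dev.CopyToHost(host), 20);
            Console.WriteLine("H2D mean {0} median {1}", h2d.Average(), Median(h2d));
            Console.WriteLine("D2H mean {0} median {1}", d2h.Average(), Median(d2h));
        }
    }
}
```

Comparing the mean with the median gives a rough idea of how much single outliers distort the result.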

On my PC, copying to the host takes more or less the same time as copying to the device...

Michael