
managedCUDA in VB.NET

Jan 7, 2013 at 8:50 PM


I’m looking at having a play with CUDA using VB.NET. Can you explain how I would use managedCUDA from VB.NET and, if possible, give a simple example?

For example, a simple bit of code that could benefit from parallel processing:

Imports System.Diagnostics

Module Module1

    Sub Main()
        ' Start timing immediately
        Dim sWatch As Stopwatch = Stopwatch.StartNew()

        Dim m As Long = 0

        ' Both bounds are inclusive: 1000001 x 1000001 iterations
        For i As Int32 = 0 To 1000000
            For j As Int32 = 0 To 1000000
                m = m + 1
            Next
        Next

        sWatch.Stop()
        Console.WriteLine("Done - M:{0}, Time:{1}", m, sWatch.Elapsed)
        Console.WriteLine("Press any key to exit")
        Console.ReadKey()
    End Sub

End Module


Done - M:1000002000001, Time:00:33:19.7167315
Press any key to exit
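For reference, the printed M is exactly what the loop bounds predict: VB's `For ... To` includes both bounds, so each loop runs 1,000,001 times and

```latex
m = (1000000 - 0 + 1)^2 = 1000001^2 = 10^{12} + 2 \cdot 10^{6} + 1 = 1000002000001
```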

Jan 9, 2013 at 2:29 PM


Your sample is not a good starting point for Cuda. If you sum up “1” in a loop, every intermediate step depends on the previous one, so how would you parallelize each step? I’m not saying that it’s impossible, but it’s not a good starting point for learning Cuda.
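To make the dependency point concrete: the chain in `m = m + 1` can be broken by giving each chunk of the work its own private partial sum and combining the partial sums at the end (a reduction). Below is a minimal single-threaded C sketch of that idea; the function names and the `num_chunks` parameter are hypothetical, and in a real parallel version each chunk would run on its own thread or GPU block.

```c
#include <assert.h>
#include <stdint.h>

/* Sequential version: every iteration reads the m written by the previous
   one, so the iterations form a dependency chain. */
int64_t count_sequential(int64_t n)
{
    int64_t m = 0;
    for (int64_t i = 0; i < n; i++)
        m = m + 1;
    return m;
}

/* Chunked version: each chunk counts into its own private partial sum, so no
   chunk depends on another; only the final combine step is sequential.
   num_chunks stands in for the number of parallel workers (hypothetical). */
int64_t count_chunked(int64_t n, int64_t num_chunks)
{
    assert(num_chunks > 0);
    int64_t chunk = (n + num_chunks - 1) / num_chunks; /* ceiling division */
    int64_t total = 0;
    for (int64_t c = 0; c < num_chunks; c++) {
        int64_t start = c * chunk;
        int64_t stop = (start + chunk < n) ? start + chunk : n; /* exclusive */
        int64_t partial = 0;              /* private to this chunk */
        for (int64_t i = start; i < stop; i++)
            partial = partial + 1;
        total += partial;                 /* combine step */
    }
    return total;
}
```

For example, `count_chunked(1000, 7)` returns 1000, the same as `count_sequential(1000)`.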

Instead, you should have a look at the provided sample code, especially the vectorAdd sample. Given two large arrays “a” and “b” filled with random data, it computes a third array “c” holding the element-wise sum: c(i) = a(i) + b(i). In VB this would look like:

Dim a(10000) As Single 'float in C#
Dim b(10000) As Single 'float in C#
Dim c(10000) As Single 'float in C#

'Todo: Fill a and b with some data...

For i As Integer = 0 To 10000
    c(i) = a(i) + b(i)
Next

whereas the Cuda implementation can be found in the file
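As a rough illustration of what such a kernel does, here is a CPU-only C sketch of the Cuda thread model (all names hypothetical). In real Cuda C the kernel would be marked `__global__` and would read its coordinates from the built-ins `blockIdx.x`, `blockDim.x` and `threadIdx.x`; here a plain struct and two nested loops stand in for them, so the "threads" run one after another rather than concurrently.

```c
#include <assert.h>

/* Hypothetical stand-in for Cuda's built-in thread coordinates. */
typedef struct { int block_idx, block_dim, thread_idx; } thread_coords;

/* Body of a vectorAdd-style kernel: each "thread" handles exactly one element. */
static void vec_add_kernel(thread_coords t, const float *a, const float *b,
                           float *c, int n)
{
    int i = t.block_idx * t.block_dim + t.thread_idx; /* global element index */
    if (i < n)          /* the last block may have more threads than elements */
        c[i] = a[i] + b[i];
}

/* CPU emulation of a launch with grid_dim blocks of block_dim threads each.
   On a GPU these iterations would run concurrently. */
void launch_vec_add(int grid_dim, int block_dim, const float *a,
                    const float *b, float *c, int n)
{
    for (int blk = 0; blk < grid_dim; blk++)
        for (int thr = 0; thr < block_dim; thr++) {
            thread_coords t = { blk, block_dim, thr };
            vec_add_kernel(t, a, b, c, n);
        }
}
```

Note that no "thread" reads another thread's result, which is why this problem suits Cuda much better than the running sum.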

I know that the sample code is written in C# and not VB, but if you know that the VB syntax

“Dim ‘varName’ As ‘Type’”

becomes

“‘Type’ ‘varName’;”

in C#, and that array elements are addressed using “[]” rather than “()” as in VB, the code should be quite understandable. Besides, your kernel code has to be written in Cuda C anyway, so take it as an opportunity to get familiar with C-like syntax :-)
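Applied to the VB loop above, that mapping gives roughly the following C (a sketch; `N` and `add_arrays` are illustrative names). Note that VB's `Dim a(10000)` declares 10001 elements, because the upper bound is inclusive:

```c
#include <assert.h>

/* VB: Dim a(10000) As Single  ->  C: float a[10001];
   (VB's upper bound is inclusive, so 10000 means 10001 elements) */
#define N 10001

static float a[N], b[N], c[N];

void add_arrays(void)
{
    /* VB: For i As Integer = 0 To 10000 ... Next  ->  C: i <= 10000, i.e. i < N */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];   /* C indexes with [] where VB uses () */
}
```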

Hope this helps 


Jan 10, 2013 at 8:35 AM
Edited Jan 10, 2013 at 8:36 AM

Thanks for your reply.

If I were going to parallelise my previous example, I would break it up into chunks of work,

e.g. doing it with a Parallel.For loop I would do:


Imports System.Diagnostics
Imports System.Threading
Imports System.Threading.Tasks

Module Module1

    Sub Main()

        Dim sWatch As Stopwatch = Stopwatch.StartNew()
        Dim m As Long = 0

        Dim lChunk As Int64 = 1000000
        Dim iOffset As Int32 = 0

        Dim lStartHolder As New List(Of Int64)
        Dim lEndHolder As New List(Of Int64)


        ' Create data chunks
        For i As Int32 = 0 To 1000000
            lStartHolder.Add(i * lChunk + iOffset)
            lEndHolder.Add(i * lChunk + lChunk)
            iOffset = 1
        Next

        Parallel.For(0, lStartHolder.Count, Sub(i)
                                                Dim temp As New _add(i.ToString())
                                                ' Keep the result local to this iteration
                                                Dim lReturnedVal As Long = temp.Process(lStartHolder(i), lEndHolder(i))
                                                ' m is shared by all iterations, so update it atomically
                                                Interlocked.Add(m, lReturnedVal)
                                            End Sub)

        sWatch.Stop()
        Console.WriteLine("Done - M:{0}, Time:{1}", m, sWatch.Elapsed)
        Console.WriteLine("Press any key to exit")
        Console.ReadKey()

    End Sub

End Module

Public Class _add
    Private _sName As String = String.Empty

    Public Sub New(ByVal val As String)
        _sName = val
        'Console.WriteLine(String.Format("{0} created!", val))
    End Sub

    Public Function Process(ByVal myStart As Int64, ByVal myStop As Int64) As Int64
        Dim lReturn As Int64 = 0

        ' Console.WriteLine(String.Format("{0}: adding values from {1} to {2}", _sName, myStart, myStop))
        For i As Int64 = myStart To myStop
            lReturn += 1
        Next

        'Console.WriteLine(String.Format("{0}: ending value {1}", _sName, lReturn))
        Return lReturn
    End Function
End Class
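For comparison, a C sketch of how the `_add.Process` idea maps onto a kernel-style layout (names hypothetical, single-threaded CPU code). The key change from the Parallel.For version is that worker `k` writes only its own slot `results[k]`, so no shared `m` has to be updated by many workers at once; the host adds up the slots afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Same work as _add.Process: count the values in the inclusive range [start, stop]. */
static int64_t process_chunk(int64_t start, int64_t stop)
{
    int64_t r = 0;
    for (int64_t i = start; i <= stop; i++)
        r += 1;
    return r;
}

/* Kernel-style: "thread" k reads its own range and writes only results[k],
   so no two workers ever write the same variable. */
void run_chunks(const int64_t *starts, const int64_t *stops,
                int64_t *results, int count)
{
    for (int k = 0; k < count; k++)   /* on a GPU: one thread per k */
        results[k] = process_chunk(starts[k], stops[k]);
}

/* Host side: combine the per-chunk results once all chunks have finished. */
int64_t sum_results(const int64_t *results, int count)
{
    int64_t m = 0;
    for (int k = 0; k < count; k++)
        m += results[k];
    return m;
}
```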


Now I think I would be able to turn the “_add” class into a CUDA function.

So how could I do this with managedCUDA?

I’ve added a reference to ManagedCuda.dll and can import the namespace with


Imports ManagedCuda

but this is where I'm a bit lost. Any help?

Jan 12, 2013 at 5:18 PM


As I said, start easy. It seems you think of parallelism more in terms of multi-core CPUs than in terms of data parallelism, which is how Cuda works (the idea of one core per data element…).
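One way to see the data-parallel view applied to the original summing example is a tree reduction: in each pass, element `i` adds in element `i + half`, and every addition within a pass is independent, so a GPU can give each one its own thread; only the roughly log2(n) passes run in sequence. Below is a single-threaded C sketch (function name hypothetical). Cuda reduction kernels typically perform such passes inside a block in shared memory, with `__syncthreads()` between passes.

```c
#include <assert.h>
#include <stddef.h>

/* Pairwise (tree) reduction in place. In each pass, element i adds element
   i + half; the indices written (0..active-half-1) never overlap the indices
   read (half..active-1), so all additions in a pass are independent.
   Returns the sum of the first n values; the buffer is overwritten. */
long long tree_reduce(long long *v, size_t n)
{
    if (n == 0)
        return 0;
    size_t active = n;
    while (active > 1) {
        size_t half = (active + 1) / 2;             /* ceil(active / 2) */
        for (size_t i = 0; i + half < active; i++)  /* one "thread" per i */
            v[i] += v[i + half];
        active = half;                              /* next, shorter pass */
    }
    return v[0];
}
```

Summing a million ones this way takes 20 passes instead of a million dependent steps.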

As managedCuda is a wrapper around the so-called Cuda driver API, the general approach is the same as described in the Cuda Programming Guide:

Write the host code in C#/VB.NET and the device kernel in Cuda C in a separate .cu file. The .cu file is compiled with nvcc either to a hardware-version-dependent cubin image or to a more general PTX file.

Then, in the host code, you create a Cuda context for your device, load the Cuda image file and configure launch parameters such as grid and block sizes. Allocate device memory, copy the input data to the device, launch your kernel and copy the results back to the host. The vectorAdd sample is very basic and shows how to do all of these steps. As the code is kept simple, the fact that it’s written in C# shouldn’t stop you from understanding it.
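For the "grid and block sizes" step, the usual calculation is a ceiling division: enough blocks of a chosen size to cover every element, plus a bounds check in the kernel for the leftover threads. A small C sketch (function names hypothetical; the block size of 256 used below is just a common choice, not a requirement):

```c
#include <assert.h>

/* Blocks needed so that grid * block_size >= n (ceiling division, written
   this way to avoid overflow in n + block_size - 1). */
unsigned grid_size(unsigned n, unsigned block_size)
{
    assert(block_size > 0);
    return n / block_size + (n % block_size != 0);
}

/* Threads in the launched grid that have no element to process; a bounds
   check such as "if (i < n)" in the kernel keeps them from writing out of range. */
unsigned idle_threads(unsigned n, unsigned block_size)
{
    return grid_size(n, block_size) * block_size - n;
}
```

With 10001 elements and 256 threads per block, `grid_size` gives 40 blocks, leaving 239 idle threads in the last block.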