SparseMatrix and C#

Feb 10, 2013 at 7:54 PM
Hi all.
I'm pretty new to managedCUDA. I want to solve a linear sparse equation system but was not able to find ANY example even on the web!
can you please give a C# code example with a random sparse matrix (just want to see how should define matrix for you library).

Regards
Coordinator
Feb 11, 2013 at 11:42 AM
Hi,

to use cusparse library from CUDA with C# you have two options: either you use the same API as described in cusparse manual (You can find the entire API as members of CudaSparseNativeMethods), or you use the managedCuda wrapped API (a bit more object oriented and bit more like C#) as the following:
using ManagedCuda;
using ManagedCuda.BasicTypes;
using ManagedCuda.CudaSparse;

(...)

//Create a CudaSparseContext
CudaSparseContext ctx = new CudaSparseContext();
ctx.SetPointerMode(cusparsePointerMode.Host);

int m = 10; //matrix of size 10x10
int nnz = 15; //15 non zero elements

//Matrix descriptor for A
CudaSparseMatrixDescriptor descriptor = new CudaSparseMatrixDescriptor(cusparseMatrixType.Triangular, cusparseFillMode.Upper, cusparseDiagType.NonUnit, cusparseIndexBase.Zero);


//Matrix:
//    |0 1 2 3 4 5 6 7 8 9
//----|-------------------
//  0 |1 0 2 0 3 0 4 0 0 0
//  1 |0 5 0 0 6 0 0 0 0 0
//  2 |0 0 7 0 0 0 0 0 0 0
//  3 |0 0 0 8 0 0 0 0 0 0
//  4 |0 0 0 0 9 0 0 0 0 0
//  5 |0 0 0 0 0 8 0 0 0 0
//  6 |0 0 0 0 0 0 7 0 0 0
//  7 |0 0 0 0 0 0 0 6 0 0
//  8 |0 0 0 0 0 0 0 0 5 4
//  9 |0 0 0 0 0 0 0 0 0 3

//Non zero elements values of A
CudaDeviceVariable<float> csrValA = new float[] {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f, 8.0f, 7.0f, 6.0f, 5.0f, 4.0f, 3.0f};
CudaDeviceVariable<int> csrColIndA = new CudaDeviceVariable<int>(nnz);
CudaDeviceVariable<int> csrRowPtrA = new CudaDeviceVariable<int>(m + 1);

//Column index of each element
int[] csrColIndHost = new int[] { 0, 2, 4, 6, 1, 4, 2, 3, 4, 5, 6, 7, 8, 9, 9 };
//Index in column vector where a new row starts
int[] csrRowPtrHost = new int[] { 0, 4, 6, 7, 8, 9, 10, 11, 12, 14, nnz};

csrColIndA.CopyToDevice(csrColIndHost);
csrRowPtrA.CopyToDevice(csrRowPtrHost);

CudaDeviceVariable<float> X = new CudaDeviceVariable<float>(m * m);
float[] x_host = new float[m * m];
for (int i = 0; i < m; i++)
{
    x_host[i * m + i] = 1.0f;
}
X.CopyToDevice(x_host);

//Result matrix
CudaDeviceVariable<float> Y = new CudaDeviceVariable<float>(m * m);

CudaSparseSolveAnalysisInfo info = new CudaSparseSolveAnalysisInfo();
ctx.CsrsmAnalysis(cusparseOperation.NonTranspose, m, descriptor, csrValA, csrRowPtrA, csrColIndA, info);

ctx.CsrsmSolve(cusparseOperation.NonTranspose, m, m, 1.0f, descriptor, csrValA, csrRowPtrA, csrColIndA, info, X, m, Y, m);

float[] result = Y;
Whereas you can still find the detailed function descriptions in the original manual. If you have problems to understand the dense matrix format itself, google for "CSR" and all the other formats used in cusparse. Every dense matrix library uses the same formats and there are a lot of good descriptions available ;-)

Best
Michael
Feb 11, 2013 at 12:52 PM
Hi Michael,
Thanks for you explanation.
I want to solve sparse EQ system to check the performance on huge (or giant :) ) matrices.
  1. Is there any way to do not create and pass an array n*m to API? for example for a 5000x5000 matrix, an array with length 25 M should be created that will need extra memory and so on.
  2. I didn't find out how you created the 'csrRowPtrHost' vector. Can you please explain more? I also saw the cusparse manual (at http://docs.nvidia.com/cuda/pdf/CUDA_CUSPARSE_Users_Guide.pdf) but didn't find anything about this indexing.
Regards
Coordinator
Feb 11, 2013 at 2:06 PM
Hi,

to 1) There's no sparse matrix lib solver I know able to do that, but I usually don't work with sparse matrices and don't want to tell they don't exist ;-), for cusparse you will have to provide storage for a n*m matrix.
to 2) Have a look at http://netlib.org/linalg/html_templates/node91.html#SECTION00931100000000000000, CSR is the same as CRS and be aware that indices can start at 0 or 1, depending on definition, cusparse can handle both.

Best
Feb 25, 2013 at 11:09 PM
Edited Feb 25, 2013 at 11:11 PM
Hi, i'm doing my end of studies work about sparse matrices operations using GPGPU, and i'm using managed CUDA.
Thank you very much for the example, i'm trying to do the same but using complex number matrices. I have a problem witch calling the analisis and solving methods. I use this:

CudaSparseNativeMethods.cusparseZcsrsm_analysis(cusparseOperation.NonTranspose, m, descriptor, csrValA, csrRowPtrA, csrColIndA, info);

CudaSparseNativeMethods.cusparseZcsrsm_solve(cusparseOperation.NonTranspose, m, m, uno , descriptor, csrValA, csrRowPtrA, csrColIndA, info, B, m, X, m);

because i try to use ctx.Zcsrsm_solve and doesn't work. And with the above lines i have an error of arguments.

what member functions may i use for calling complex numbers analysys and solving?
Can you help me?. Thanks
Feb 26, 2013 at 5:09 PM
Edited Feb 26, 2013 at 5:20 PM
Hi again,

Jeje, sorry!!. Reading the source code of cudasparsecontext.cs i found taht you renamed the functions to CsrsmAnalysis and CsrsmSolve. And now it works

Thanks for your great work!!

Best.
Abraham
Coordinator
Feb 26, 2013 at 7:55 PM
Hi,
well you gave yourself an answer faster than I could :-)
If you use the CudaSparseNativeMethods members, you need to initiate a context as it is done in CudaSparseContext constructor; the cusparse manual from NVIDIA describes all this in detail. Using CudaSparseContext class on the other hand omits all these init calls to you, as this is all done internally.

Best
Michael
Feb 28, 2013 at 2:17 AM
Hi,

I wrote my program and it runs ok, but i have a problem:
Solving Ax=b systems cudasparse is slower than other "no-gpu" algoritms(for example QRSolve).
I'm doing something wrong? is there any way to perform the operations?

Best
Abraham

Here is my code( sorry coments in spanish ;)
public static cuFloatComplex[] solvecuda(int m, int n, int nnz, cuFloatComplex[] Ahost, cuFloatComplex[] Bhost)
{

            CudaSparseContext ctx = new CudaSparseContext(); // creamos contexto cuda managed(hace implicitamente todo)
            ctx.SetPointerMode(cusparsePointerMode.Host); // definimos el puntero a host
            CudaSparseMatrixDescriptor descriptor = new CudaSparseMatrixDescriptor();// descriptor de matriz de tipo general
            CudaDeviceVariable<cuFloatComplex> A = new CudaDeviceVariable<cuFloatComplex>(m); /// Creamos la matriz DENSA A en el DEVICE
            A.CopyToDevice(Ahost); /// Copiamos A al DEVICE
            CudaDeviceVariable<cuFloatComplex> csrValA = new CudaDeviceVariable<cuFloatComplex>(nnz); /// Creamos la matriz DENSA A en el DEVICE
            CudaDeviceVariable<int> csrColIndA = new CudaDeviceVariable<int>(nnz);  // creamos array de indices de columnas en DEVICE
            CudaDeviceVariable<int> csrRowPtrA = new CudaDeviceVariable<int>(n + 1); // creamos array de filas en DEVICE

            // PRIMERO Transformacion de DENSA A CSR de A en DEVICE
            // Calculamos nnzPerRow

            CudaDeviceVariable<int> nnzPerRowCol = new CudaDeviceVariable<int>(n);
            int nnzTotalDevHostPtr = 0;
            ctx.Nnz(cusparseDirection.Column, n, n, descriptor, A, n, nnzPerRowCol, ref nnzTotalDevHostPtr);

            // Pasamos a CSR
            ctx.Dense2csr(n, n, descriptor, A, n, nnzPerRowCol, csrValA, csrRowPtrA, csrColIndA);
            // Ahora tenemos:
            // csrValA,csrRowPtrA y csrColIndA que es la matriz a en formato CSR para poder operar

            /// SEGUNDO  asignacion de B y X

            //Vector Resultado X

            CudaDeviceVariable<cuFloatComplex> X = new CudaDeviceVariable<cuFloatComplex>(n);

            //Vector RIGHT-HAND B en formato DENSE

            CudaDeviceVariable<cuFloatComplex> B = new CudaDeviceVariable<cuFloatComplex>(n);
            B.CopyToDevice(Bhost); /// Copiamos B a DEVICE

            /// Llamamos a analisis y solve

            // Analisis de la matriz A
            CudaSparseSolveAnalysisInfo info = new CudaSparseSolveAnalysisInfo();

            ctx.CsrsvAnalysis(cusparseOperation.NonTranspose, n, descriptor, csrValA, csrRowPtrA, csrColIndA, info);// Analizamos
            // Resolver el sistema Ax=B
            CudaDeviceVariable<cuFloatComplex> uno = new cuFloatComplex(1, 1); // escalar (1,1) para hacer neutro el RIGHT-SIDE de AX=B
            ctx.CsrsvSolve(cusparseOperation.NonTranspose, n, uno, descriptor, csrValA, csrRowPtrA, csrColIndA, info, B, X);//Resolvemos
            // convertimos Cintensidades de cuFloatComplex[] a AP.Complex[] 

            cuFloatComplex[] result=new  cuFloatComplex[n];
            X.CopyToHost(result);
            

            /// Limpiamos los recursos capturados
            /// 
            // 
            if (csrValA != null)
                csrValA.Dispose();
            if (csrRowPtrA != null)
                csrRowPtrA.Dispose();
            if (csrColIndA != null)
                csrColIndA.Dispose();
            if (nnzPerRowCol != null)
                nnzPerRowCol.Dispose();
            if (uno != null)
                uno.Dispose();
            if (info != null)
                info.Dispose();
            if (A != null)
                A.Dispose();
            if (X != null)
                X.Dispose();
            if (B != null)
                B.Dispose();
            if (descriptor != null)
                descriptor.Dispose();
            if (ctx != null)
                ctx.Dispose();
}
Coordinator
Feb 28, 2013 at 12:22 PM
Hi,
well I don't know what you compare, but be aware that only the call to ctx.CsrsvSolve(...) does actually solve your equation system. All the rest is only needed once for initialisation and should not be taken into account for performance measurements or be called if you have to solve several systems of equations. A context is needed only once, and the analysis method ctx.CsrsvAnalysis(...) is only needed if dimensions change. Also you don't mention what hardware you're using: Intel Core I7 Extreme Edition vs. Geforce 2xx or a K20x vs. Pentium 3 mobile?
Also the first call to cusparse library will be very slow as .net needs to load the whole DLL-file, which is quiet large, so don't include this into your time measurements, too.

Best,
Michael
Mar 4, 2013 at 9:38 PM
Edited Mar 4, 2013 at 9:41 PM
Hi,

First of all thank you for your indications.
My problem is that i must solve several systems of equations and as you say i must define once the context, analysys method and etc.. and i defined those on every call to my function.

I'm tryng to do this but i dont know how . I tryed to create a public static class with the context,analysys, etc.. but later i saw in the reference that this is incorrect.
Can you help me with this? how can i create the context and access to it in other point of my program?
Thanks for all

Best,
Abraham