## Users guide for the *ELPA* library ##

This document provides a guide to using the *ELPA* library in user applications.

### Online and local documentation ###

Local documentation (via man pages) should be available (if *ELPA* has been installed with the documentation):

For example "man elpa2_print_kernels" should provide the documentation for the *ELPA* program which prints all
the available kernels.

Also an [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2017.05.001/html/index.html)
for each *ELPA* release is available.


## API of the *ELPA* library ##

With release 2017.05.001.rc1 of the *ELPA* library the interface has been rewritten substantially, in order to have a more
generic interface and to avoid future interface changes.

For compatibility reasons the interface defined in the previous release 2016.11.001 is also still available
IF AND ONLY IF *ELPA* has been built with support of this legacy interface.

If you want to use the legacy interface, please refer to section "B) Using the legacy API of the *ELPA* library".

The legacy API defines all the functionality as it has been defined in *ELPA* release 2016.11.001. Note, however,
that all future features of *ELPA* will only be accessible via the new API defined in release 2017.05.001.rc1 or later.

## A) Using the final API definition of the *ELPA* library ##

Using *ELPA* with the latest API is done in the following steps (a code sketch of the full sequence follows the list):

- include elpa headers "elpa/elpa.h" (C case) or use the Fortran module "use elpa"

- define an instance of the elpa type

- call elpa_init
  note that at the moment the only supported API version number is 20170403

- call elpa_allocate to allocate an instance of *ELPA*
  note that you can define (and configure individually) as many different instances
  of ELPA as you want, e.g. one for CPU-only computations and one for larger matrices on GPUs

- use ELPA-type function "set" to set matrix and MPI parameters

- call the ELPA-type function "setup"

- set or get all possible ELPA tunable options with ELPA-type functions get/set

  At the moment the following tunable options are available:

     - "solver" can either be ELPA_SOLVER_1STAGE or ELPA_SOLVER_2STAGE
     - "real_kernel" can be one of the available real kernels (a list of available kernels can be
        queried with the ELPA helper binary elpa2_print_kernels)
     - "complex_kernel" can be one of the available complex kernels (a list of available kernels can be
     - "qr" can be either 0 or 1, switches QR decomposition off/on for ELPA_SOLVER_2STAGE
       only available in real-case for blocksize at least 64
     - "gpu" can be either 0 or 1, switches GPU computations off or on, assuming that the installation
       of the ELPA library has been build with GPU support enables
     - "timings" can be either 0 or 1, switches time measurements off or on
     - "debug" can be either 0 or 1, switches detailed debug messages off/on

- call an ELPA-type compute function (see the list below)

  At the moment the following ELPA compute functions are available:

    - "eigenvectors" solves the eigenvalue problem for single/double real/complex valued matrices and
                     returns the eigenvalues AND eigenvectors
    - "eigenvalues" solves the eigenvalue problem for single/double real/complex valued matrices and
                     returns the eigenvalues
    - "hermetian_multipy" computes C = A^T * B (real) or C = A^H * B (complex) for single/double
      real/complex matrices
    - "cholesky" does a cholesky factorization for a single/double real/complex matrix
    - "invert_triangular" inverts a single/double real/complex triangular matrix
    - "solve_tridiagonal" solves the single/double eigenvalue problem for a real tridiagonal matrix

- if the ELPA object is not needed any more, call the ELPA-type function destroy

- call elpa_uninit at the end of the program
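
To illustrate these steps as a whole, here is a minimal Fortran sketch of the call sequence
(an illustration, not a complete program: declarations of the matrix arrays and the MPI setup are
omitted, the layout variables na, nev, na_rows, na_cols, nblk, my_prow and my_pcol are assumed
to be prepared as in the MPI example of section B, and the convenience function elpa_deallocate
is assumed to implement the "destroy" step):

   use elpa
   class(elpa_t), pointer :: e
   integer :: success

   ! initialize ELPA; 20170403 is the only supported API version at the moment
   if (elpa_init(20170403) /= ELPA_OK) then
     print *, "ELPA API version not supported"
     stop
   endif

   ! allocate an instance of ELPA
   e => elpa_allocate()

   ! set the matrix and MPI parameters
   call e%set("na", na, success)                            ! global dimension of the matrix
   call e%set("nev", nev, success)                          ! number of eigenvectors requested
   call e%set("local_nrows", na_rows, success)              ! local rows of the distributed matrix
   call e%set("local_ncols", na_cols, success)              ! local columns of the distributed matrix
   call e%set("nblk", nblk, success)                        ! block size of the block-cyclic distribution
   call e%set("mpi_comm_parent", mpi_comm_world, success)
   call e%set("process_row", my_prow, success)              ! row coordinate of this MPI task
   call e%set("process_col", my_pcol, success)              ! column coordinate of this MPI task

   success = e%setup()

   ! set a tunable run-time option, e.g. the solver
   call e%set("solver", ELPA_SOLVER_2STAGE, success)

   ! compute eigenvalues ev and eigenvectors z of the matrix a
   call e%eigenvectors(a, ev, z, success)

   ! cleanup
   call elpa_deallocate(e)
   call elpa_uninit()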

## B) Using the legacy API of the *ELPA* library ##

This section describes the usage of the *ELPA* library with the legacy interface.

### General concept of the *ELPA* library ###

The *ELPA* library consists of two main parts:
- *ELPA 1stage* solver
- *ELPA 2stage* solver

Both variants of the *ELPA* solvers are available for real- and complex-valued matrices in single and double precision.

Thus *ELPA* provides the following user functions (see man pages or [online](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2016.11.001/html/index.html) for details):

- elpa_get_communicators                        : set the row / column communicators for *ELPA*
- elpa_solve_evp_complex_1stage_{single|double} : solve a {single|double} precision complex eigenvalue problem with the *ELPA 1stage* solver
- elpa_solve_evp_real_1stage_{single|double}    : solve a {single|double} precision real eigenvalue problem with the *ELPA 1stage* solver
- elpa_solve_evp_complex_2stage_{single|double} : solve a {single|double} precision complex eigenvalue problem with the *ELPA 2stage* solver
- elpa_solve_evp_real_2stage_{single|double}    : solve a {single|double} precision real eigenvalue problem with the *ELPA 2stage* solver
- elpa_solve_evp_real_{single|double}           : driver for the {single|double} precision real *ELPA 1stage* or *ELPA 2stage* solver
- elpa_solve_evp_complex_{single|double}        : driver for the {single|double} precision complex *ELPA 1stage* or *ELPA 2stage* solver

Furthermore *ELPA* provides the utility binary "elpa2_print_kernels": it tells the user
which *ELPA 2stage* compute kernels have been installed and which default kernels are set.

If you want to solve an eigenvalue problem with *ELPA*, you have to decide whether you
want to use the *ELPA 1stage* or the *ELPA 2stage* solver. Normally, *ELPA 2stage* is the better
choice since it is faster, but there are matrix dimensions where *ELPA 1stage* is superior.

Independent of the choice of the solver, the concept of calling *ELPA* is always the same:

#### MPI version of *ELPA* ####

In this case, *ELPA* relies on a BLACS distributed matrix.
To solve an eigenvalue problem for this matrix with *ELPA*, one has

1. to include the *ELPA* header (C case) or module (Fortran)
2. to create row and column MPI communicators for ELPA (with "elpa_get_communicators")
3. to call the *ELPA driver* or directly call *ELPA 1stage* or *ELPA 2stage* for the matrix.

Here is a very simple MPI code snippet for using *ELPA 1stage*. For the definition of all variables
please have a look at the man pages and/or the online documentation (see above). A full version
of a simple example program can be found in ./test_project/src.


   ! All ELPA routines need MPI communicators for communicating within
   ! rows or columns of processes, these are set in elpa_get_communicators

   success = elpa_get_communicators(mpi_comm_world, my_prow, my_pcol, &
                                    mpi_comm_rows, mpi_comm_cols)

   if (myid==0) then
     print '(a)','| Past split communicator setup for rows and columns.'
   end if

   ! Determine the necessary size of the distributed matrices,
   ! we use the Scalapack tools routine NUMROC for that.

   na_rows = numroc(na, nblk, my_prow, 0, np_rows)
   na_cols = numroc(na, nblk, my_pcol, 0, np_cols)

   !-------------------------------------------------------------------------------
   ! Calculate eigenvalues/eigenvectors

   if (myid==0) then
     print '(a)','| Entering one-step ELPA solver ... '
     print *
   end if

   success = elpa_solve_evp_real_1stage_{single|double} (na, nev, a, na_rows, ev, z, na_rows, nblk, &
                                    na_cols, mpi_comm_rows, mpi_comm_cols)

   if (myid==0) then
     print '(a)','| One-step ELPA solver complete.'
     print *
   end if


#### Shared-memory version of *ELPA* ####

If the *ELPA* library has been compiled with the configure option "--with-mpi=0",
no MPI will be used.

Still the **same** call sequence as in the MPI case can be used (see above).

#### Setting the row and column communicators ####

SYNOPSIS
   FORTRAN INTERFACE
       use elpa1

       success = elpa_get_communicators (mpi_comm_global, my_prow, my_pcol, mpi_comm_rows, mpi_comm_cols)

       integer, intent(in)   mpi_comm_global:  global communicator for the calculation
       integer, intent(in)   my_prow:          row coordinate of the calling process in the process grid
       integer, intent(in)   my_pcol:          column coordinate of the calling process in the process grid
       integer, intent(out)  mpi_comm_rows:    communicator for communication within rows of processes
       integer, intent(out)  mpi_comm_cols:    communicator for communication within columns of processes

       integer               success:          return value indicating success or failure of the underlying MPI_COMM_SPLIT function

   C INTERFACE
       #include "elpa_generated.h"

       success = elpa_get_communicators (int mpi_comm_global, int my_prow, int my_pcol, int *mpi_comm_rows, int *mpi_comm_cols);

       int mpi_comm_global:  global communicator for the calculation
       int my_prow:          row coordinate of the calling process in the process grid
       int my_pcol:          column coordinate of the calling process in the process grid
       int *mpi_comm_rows:   pointer to the communicator for communication within rows of processes
       int *mpi_comm_cols:   pointer to the communicator for communication within columns of processes

       int  success:         return value indicating success or failure of the underlying MPI_COMM_SPLIT function
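
The input coordinates my_prow and my_pcol must describe a valid two-dimensional process grid.
As a hedged illustration (this grid setup is the user's responsibility and not part of the
*ELPA* API; nprocs and myid are assumed to come from the usual MPI initialization), one common
way to build such a grid is:

   ! factor nprocs into an (almost) square process grid np_rows x np_cols
   do np_cols = nint(sqrt(real(nprocs))), 2, -1
     if (mod(nprocs, np_cols) == 0) exit
   enddo
   np_rows = nprocs / np_cols

   ! derive this task's grid coordinates from its rank (column-major ordering;
   ! this must be consistent with the BLACS grid used to distribute the matrix)
   my_prow = mod(myid, np_rows)
   my_pcol = myid / np_rows

   success = elpa_get_communicators(mpi_comm_world, my_prow, my_pcol, &
                                    mpi_comm_rows, mpi_comm_cols)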


#### Using *ELPA 1stage* ####

After setting up the *ELPA* row and column communicators (by calling elpa_get_communicators),
only the real or complex valued solver has to be called:

SYNOPSIS
   FORTRAN INTERFACE
       use elpa1
       success = elpa_solve_evp_real_1stage_{single|double} (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows,
       mpi_comm_cols)

       With the definitions of the input and output variables:

       integer, intent(in)    na:            global dimension of quadratic matrix a to solve
       integer, intent(in)    nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       real*{4|8},  intent(inout) a:         locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       integer, intent(in)    lda:           leading dimension of locally distributed matrix a
       real*{4|8},  intent(inout) ev:        on output the first nev computed eigenvalues
       real*{4|8},  intent(inout) q:         on output the first nev computed eigenvectors
       integer, intent(in)    ldq:           leading dimension of matrix q which stores the eigenvectors
       integer, intent(in)    nblk:          blocksize of block cyclic distribution, must be the same in both directions
       integer, intent(in)    matrixCols:    number of columns of locally distributed matrices a and q
       integer, intent(in)    mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       integer, intent(in)    mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       logical                success:       return value indicating success or failure

   C INTERFACE
       #include "elpa.h"

       success = elpa_solve_evp_real_1stage_{single|double} (int na, int nev, {float|double} *a, int lda, {float|double} *ev, {float|double} *q, int ldq, int nblk, int matrixCols, int
       mpi_comm_rows, int mpi_comm_cols);

       With the definitions of the input and output variables:

       int     na:            global dimension of quadratic matrix a to solve
       int     nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       {float|double} *a:     pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       int     lda:           leading dimension of locally distributed matrix a
       {float|double} *ev:    pointer to memory containing on output the first nev computed eigenvalues
       {float|double} *q:     pointer to memory containing on output the first nev computed eigenvectors
       int     ldq:           leading dimension of matrix q which stores the eigenvectors
       int     nblk:          blocksize of block cyclic distribution, must be the same in both directions
       int     matrixCols:    number of columns of locally distributed matrices a and q
       int     mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       int     mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       int     success:       return value indicating success (1) or failure (0)

DESCRIPTION
       Solve the real eigenvalue problem with the 1-stage solver. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the
       elpa_get_communicators(3) function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols.
       The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues
       will be stored in q. All memory of the arguments must be allocated outside the call to the solver.

   FORTRAN INTERFACE
       use elpa1
       success = elpa_solve_evp_complex_1stage_{single|double} (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows,
       mpi_comm_cols)

       With the definitions of the input and output variables:

       integer,     intent(in)    na:            global dimension of quadratic matrix a to solve
       integer,     intent(in)    nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       complex*{8|16},  intent(inout) a:         locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       integer,     intent(in)    lda:           leading dimension of locally distributed matrix a
       real*{4|8},      intent(inout) ev:        on output the first nev computed eigenvalues
       complex*{8|16},  intent(inout) q:         on output the first nev computed eigenvectors
       integer,     intent(in)    ldq:           leading dimension of matrix q which stores the eigenvectors
       integer,     intent(in)    nblk:          blocksize of block cyclic distribution, must be the same in both directions
       integer,     intent(in)    matrixCols:    number of columns of locally distributed matrices a and q
       integer,     intent(in)    mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       integer,     intent(in)    mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       logical                    success:       return value indicating success or failure

   C INTERFACE
       #include "elpa.h"
       #include <complex.h>

       success = elpa_solve_evp_complex_1stage_{single|double} (int na, int nev, {float|double} complex *a, int lda, {float|double} *ev, {float|double} complex *q, int ldq, int nblk, int
       matrixCols, int mpi_comm_rows, int mpi_comm_cols);

       With the definitions of the input and output variables:

       int             na:            global dimension of quadratic matrix a to solve
       int             nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       {float|double} complex *a:     pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       int             lda:           leading dimension of locally distributed matrix a
       {float|double}         *ev:    pointer to memory containing on output the first nev computed eigenvalues
       {float|double} complex *q:     pointer to memory containing on output the first nev computed eigenvectors
       int             ldq:           leading dimension of matrix q which stores the eigenvectors
       int             nblk:          blocksize of block cyclic distribution, must be the same in both directions
       int             matrixCols:    number of columns of locally distributed matrices a and q
       int             mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       int             mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       int             success:       return value indicating success (1) or failure (0)

DESCRIPTION
       Solve the complex eigenvalue problem with the 1-stage solver. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the
       elpa_get_communicators(3) function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols.
       The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues
       will be stored in q. All memory of the arguments must be allocated outside the call to the solver.


The *ELPA 1stage* solver does not need or accept any parameters other than those in the above
specification.
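
For example, a call to the double-precision complex *ELPA 1stage* solver could look as
follows (a sketch only; the matrix layout and the communicators are assumed to be set up
as in the MPI example above):

   use elpa1

   success = elpa_solve_evp_complex_1stage_double(na, nev, a, lda, ev, q, ldq, &
                                                  nblk, matrixCols, mpi_comm_rows, mpi_comm_cols)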

#### Using *ELPA 2stage* ####

The *ELPA 2stage* solver can be used in the same manner as the *ELPA 1stage* solver.
However, the 2stage solver can be used with different compute kernels, which offers
more possibilities for configuration.

It is recommended to first call the utility program

elpa2_print_kernels

which will list all the compute kernels that can be used with *ELPA 2stage*. It will
also give information on whether a kernel can be set via environment variables.

##### Using the default kernels #####

If no kernel is set, either via an environment variable or the *ELPA 2stage API*, then
the default kernels will be used.

##### Setting the *ELPA 2stage* compute kernels #####

##### Setting the *ELPA 2stage* compute kernels with environment variables #####

If the *ELPA* installation allows setting the compute kernels with environment variables,
setting the variables "REAL_ELPA_KERNEL" and "COMPLEX_ELPA_KERNEL" will set the compute
kernels. The environment variable setting will take precedence over all other settings!

The utility program "elpa2_print_kernels" can list which kernels are available and which
would be chosen. This reflects the default kernel setting as well as any settings made
with the environment variables.

##### Setting the *ELPA 2stage* compute kernels with API calls #####

It is also possible to set the *ELPA 2stage* compute kernels via the API.

As an example, the API for the ELPA real double-precision 2stage solver is shown:

SYNOPSIS
   FORTRAN INTERFACE
       use elpa1
       use elpa2
       success = elpa_solve_evp_real_2stage_double (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows,
       mpi_comm_cols, mpi_comm_all, THIS_REAL_ELPA_KERNEL, useQR, useGPU)

       With the definitions of the input and output variables:

       integer, intent(in)            na:            global dimension of quadratic matrix a to solve
       integer, intent(in)            nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       real*{4|8},  intent(inout)         a:         locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       integer, intent(in)            lda:           leading dimension of locally distributed matrix a
       real*{4|8},  intent(inout)         ev:        on output the first nev computed eigenvalues
       real*{4|8},  intent(inout)         q:         on output the first nev computed eigenvectors
       integer, intent(in)            ldq:           leading dimension of matrix q which stores the eigenvectors
       integer, intent(in)            nblk:          blocksize of block cyclic distribution, must be the same in both directions
       integer, intent(in)            matrixCols:    number of columns of locally distributed matrices a and q
       integer, intent(in)            mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       integer, intent(in)            mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)
       integer, intent(in)            mpi_comm_all:  communicator for all processes in the processor set involved in ELPA
       logical, intent(in), optional: useQR:         optional argument; switches to QR-decomposition if set to .true.
       logical, intent(in), optional: useGPU:        decide whether GPUs should be used or not

       logical                        success:       return value indicating success or failure

   C INTERFACE
       #include "elpa.h"

       success = elpa_solve_evp_real_2stage_double (int na, int nev,  double *a, int lda,  double *ev, double *q, int ldq, int nblk, int matrixCols, int
       mpi_comm_rows, int mpi_comm_cols, int mpi_comm_all, int THIS_ELPA_REAL_KERNEL, int useQR, int useGPU);

       With the definitions of the input and output variables:

       int     na:            global dimension of quadratic matrix a to solve
       int     nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       double *a:             pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       int     lda:           leading dimension of locally distributed matrix a
       double *ev:            pointer to memory containing on output the first nev computed eigenvalues
       double *q:             pointer to memory containing on output the first nev computed eigenvectors
       int     ldq:           leading dimension of matrix q which stores the eigenvectors
       int     nblk:          blocksize of block cyclic distribution, must be the same in both directions
       int     matrixCols:    number of columns of locally distributed matrices a and q
       int     mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       int     mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)
       int     mpi_comm_all:  communicator for all processes in the processor set involved in ELPA
       int     useQR:         if set to 1 switch to QR-decomposition
       int     useGPU:        decide whether the GPU version should be used or not

       int     success:       return value indicating success (1) or failure (0)


DESCRIPTION
       Solve the real eigenvalue problem with the 2-stage solver. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the
       elpa_get_communicators(3) function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols.
       The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues
       will be stored in q. All memory of the arguments must be allocated outside the call to the solver.
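
For illustration, here is a hedged Fortran sketch of a call that explicitly selects a
compute kernel (the kernel constant REAL_ELPA_KERNEL_GENERIC is an assumption for this
example; any kernel listed by elpa2_print_kernels for your installation can be used instead):

   use elpa1
   use elpa2

   ! select the generic real kernel and leave QR decomposition switched off
   success = elpa_solve_evp_real_2stage_double(na, nev, a, lda, ev, q, ldq, nblk,        &
                                               matrixCols, mpi_comm_rows, mpi_comm_cols, &
                                               mpi_comm_all,                             &
                                               THIS_REAL_ELPA_KERNEL=REAL_ELPA_KERNEL_GENERIC, &
                                               useQR=.false.)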

##### Setting up *ELPA 1stage* or *ELPA 2stage* with the *ELPA driver interface* #####

Since release ELPA 2016.05.004 a driver routine allows one to choose more easily which solver (1stage or 2stage) will be used.

As an example the real double-precision case is explained:

 SYNOPSIS

 FORTRAN INTERFACE

  use elpa_driver

  success = elpa_solve_evp_real_double (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows, mpi_comm_cols, mpi_comm_all, THIS_REAL_ELPA_KERNEL=THIS_REAL_ELPA_KERNEL, useQR, useGPU, method=method)

  Generalized interface to the ELPA 1stage and 2stage solver for real-valued problems

  With the definitions of the input and output variables:


  integer, intent(in)            na:                    global dimension of quadratic matrix a to solve

  integer, intent(in)            nev:                   number of eigenvalues to be computed; the first nev eigenvalues are calculated

  real*8,  intent(inout)         a:                     locally distributed part of the matrix a. The local dimensions are lda x matrixCols

  integer, intent(in)            lda:                   leading dimension of locally distributed matrix a

  real*8,  intent(inout)         ev:                    on output the first nev computed eigenvalues

  real*8,  intent(inout)         q:                     on output the first nev computed eigenvectors

  integer, intent(in)            ldq:                   leading dimension of matrix q which stores the eigenvectors

  integer, intent(in)            nblk:                  blocksize of block cyclic distribution, must be the same in both directions

  integer, intent(in)            matrixCols:            number of columns of locally distributed matrices a and q

  integer, intent(in)            mpi_comm_rows:         communicator for communication in rows. Constructed with elpa_get_communicators

  integer, intent(in)            mpi_comm_cols:         communicator for communication in columns. Constructed with elpa_get_communicators

  integer, intent(in)            mpi_comm_all:          communicator for all processes in the processor set involved in ELPA

  integer, intent(in), optional: THIS_REAL_ELPA_KERNEL: optional argument, choose the compute kernel for 2-stage solver

  logical, intent(in), optional: useQR:                 optional argument; switches to QR-decomposition if set to .true.

  logical, intent(in), optional: useGPU:                decide whether the GPU version should be used or not

  character(*), optional         method:                use 1stage solver if "1stage", use 2stage solver if "2stage", (at the moment) use 2stage solver if "auto"

  logical                        success:               return value indicating success or failure


 C INTERFACE

 #include "elpa.h"

 success = elpa_solve_evp_real_double (int na, int nev, double *a, int lda, double *ev, double *q, int ldq, int nblk, int matrixCols, int mpi_comm_rows, int mpi_comm_cols, int mpi_comm_all, int THIS_ELPA_REAL_KERNEL, int useQR, int useGPU, char *method);


 With the definitions of the input and output variables:


 int     na:                    global dimension of quadratic matrix a to solve

 int     nev:                   number of eigenvalues to be computed; the first nev eigenvalues are calculated

 double *a:                     pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols

 int     lda:                   leading dimension of locally distributed matrix a

 double *ev:                    pointer to memory containing on output the first nev computed eigenvalues

 double *q:                     pointer to memory containing on output the first nev computed eigenvectors

 int     ldq:                   leading dimension of matrix q which stores the eigenvectors

 int     nblk:                  blocksize of block cyclic distribution, must be the same in both directions

 int     matrixCols:            number of columns of locally distributed matrices a and q

 int     mpi_comm_rows:         communicator for communication in rows. Constructed with elpa_get_communicators

 int     mpi_comm_cols:         communicator for communication in columns. Constructed with elpa_get_communicators

 int     mpi_comm_all:          communicator for all processes in the processor set involved in ELPA

 int     THIS_ELPA_REAL_KERNEL: choose the compute kernel for 2-stage solver

 int     useQR:                 if set to 1 switch to QR-decomposition

 int     useGPU:                decide whether the GPU version should be used or not

 char   *method:                use 1stage solver if "1stage", use 2stage solver if "2stage", (at the moment) use 2stage solver if "auto"

 int     success:               return value indicating success (1) or failure (0)

 DESCRIPTION
 Solve the real eigenvalue problem. The value of method decides whether the 1stage or 2stage solver is used. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the elpa_get_communicators function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols. The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues will be stored in q. All memory of the arguments must be allocated outside the call to the solver.
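
 A hedged Fortran sketch of a driver call that selects the 2stage solver via the method
 argument (all layout variables and communicators as in the examples above):

   use elpa_driver

   ! the driver dispatches to the 2stage solver because of method="2stage"
   success = elpa_solve_evp_real_double(na, nev, a, lda, ev, q, ldq, nblk,        &
                                        matrixCols, mpi_comm_rows, mpi_comm_cols, &
                                        mpi_comm_all, method="2stage")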

##### Setting up the GPU version of *ELPA* 1 and 2 stage #####

Since release ELPA 2016.11.001.pre *ELPA* offers GPU support, IF *ELPA* has been built with the configure option "--enable-gpu-support".

At run-time the GPU version can be used by setting the environment variable "ELPA_USE_GPU" to "yes", or by calling the *ELPA* functions
(elpa_solve_evp_real_{double|single}, elpa_solve_evp_real_1stage_{double|single}, elpa_solve_evp_real_2stage_{double|single}) with the
argument "useGPU = .true." or "useGPU = 1" for the Fortran and C case, respectively. Please note that, similar to the choice of the
*ELPA* 2stage compute kernels, the environment variable takes precedence over the setting in the API call.

Further note that it is NOT allowed to define the usage of GPUs AND to EXPLICITLY set an ELPA 2stage compute kernel other than
"REAL_ELPA_KERNEL_GPU" or "COMPLEX_ELPA_KERNEL_GPU".