## Users guide for the *ELPA* library ##

This document provides a guide to using the *ELPA* library in user applications.

### Online and local documentation ###

Local documentation (via man pages) should be available (if *ELPA* has been installed with the documentation):

For example "man elpa2_print_kernels" should provide the documentation for the *ELPA* program which prints all
the available kernels.

Also an [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2017.11.001/html/index.html)
for each *ELPA* release is available.


## API of the *ELPA* library ##

With release 2017.05.001 of the *ELPA* library, the interface has been rewritten substantially in order to provide a more generic interface and to avoid future interface changes.

For compatibility reasons, the interface defined in the previous release 2016.11.001 is also still available,
IF AND ONLY IF *ELPA* has been built with support for this legacy interface.

If you want to use the legacy interface, please refer to section "B) Using the legacy API of the *ELPA* library".

The legacy API defines all the functionality as it has been defined in *ELPA* release 2016.11.001. Note, however,
that all future features of *ELPA* will only be accessible via the new API defined in release 2017.05.001 or later.

## A) Using the final API definition of the *ELPA* library ##

Using *ELPA* with the latest API is done in the following steps (a minimal Fortran sketch is given after this list):

- include the elpa header "elpa/elpa.h" (C case) or use the Fortran module "use elpa"

- define an instance of the elpa type

- call elpa_init
  note that at the moment the only supported API version number is 20170403

- call elpa_allocate to allocate an instance of *ELPA*
  note that you can define (and configure individually) as many different instances
  of *ELPA* as you want, e.g. one for CPU-only computations and another one for larger matrices on GPUs

- use ELPA-type function "set" to set matrix and MPI parameters

- call the ELPA-type function "setup"

- set or get all possible ELPA tunable options with ELPA-type functions get/set

  At the moment the following tunable options are available:

     - "solver" can either be ELPA_SOLVER_1STAGE or ELPA_SOLVER_2STAGE
     - "real_kernel" can be one of the available real kernels (a list of available kernels can be
        queried with the ELPA helper binary elpa2_print_kernels)
     - "complex_kernel" can be one of the available complex kernels (a list of available kernels can be
     - "qr" can be either 0 or 1, switches QR decomposition off/on for ELPA_SOLVER_2STAGE
       only available in real-case for blocksize at least 64
     - "gpu" can be either 0 or 1, switches GPU computations off or on, assuming that the installation
       of the ELPA library has been build with GPU support enables
     - "timings" can be either 0 or 1, switches time measurements off or on
     - "debug" can be either 0 or 1, switches detailed debug messages off/on

- call one of the ELPA-type compute functions, e.g. "eigenvectors" (see the list below)

  At the moment the following ELPA compute functions are available:

    - "eigenvectors" solves the eigenvalue problem for single/double real/complex valued matrices and
                     returns the eigenvalues AND eigenvectors
    - "eigenvalues" solves the eigenvalue problem for single/double real/complex valued matrices and
                     returns the eigenvalues
    - "hermitian_multiply" computes C = A^T * B (real) or C = A^H * B (complex) for single/double
      real/complex matrices
    - "cholesky" does a Cholesky factorization of a single/double real/complex matrix
    - "invert_triangular" inverts a single/double real/complex triangular matrix
    - "solve_tridiagonal" solves the eigenvalue problem for a single/double precision real tridiagonal matrix

- if the ELPA object is not needed anymore, call the ELPA-type function destroy

- call elpa_uninit at the end of the program
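
The following is a minimal Fortran sketch of this call sequence. It is only a sketch: all variables
(na, nev, na_rows, na_cols, nblk, my_prow, my_pcol, a, ev, z) are assumed to be set up as for a usual
BLACS-distributed matrix, and the constant ELPA_OK as well as the parameter names passed to "set"
(e.g. "na", "local_nrows", "mpi_comm_parent", "process_row") are assumed here and should be checked
against the man pages of your installation.

   use elpa
   class(elpa_t), pointer :: e
   integer :: status

   ! initialize the ELPA library; 20170403 is the only supported API version at the moment
   if (elpa_init(20170403) /= ELPA_OK) then
     print *, "ELPA API version not supported"
     stop
   endif

   ! allocate and configure an ELPA instance
   e => elpa_allocate()

   call e%set("na", na, status)                       ! global matrix dimension
   call e%set("nev", nev, status)                     ! number of eigenvectors wanted
   call e%set("local_nrows", na_rows, status)         ! local rows of the distributed matrix
   call e%set("local_ncols", na_cols, status)         ! local columns of the distributed matrix
   call e%set("nblk", nblk, status)                   ! block size of the block-cyclic distribution
   call e%set("mpi_comm_parent", mpi_comm_world, status)
   call e%set("process_row", my_prow, status)         ! row coordinate of this MPI task
   call e%set("process_col", my_pcol, status)         ! column coordinate of this MPI task

   status = e%setup()

   ! set a tunable option, e.g. the solver
   call e%set("solver", ELPA_SOLVER_2STAGE, status)

   ! compute the first nev eigenvalues (ev) and eigenvectors (z) of the matrix a
   call e%eigenvectors(a, ev, z, status)

   ! release the instance ("destroy" step) and shut down the library
   call elpa_deallocate(e)
   call elpa_uninit()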

## B) Using the legacy API of the *ELPA* library ##

The following describes the usage of the *ELPA* library with the legacy interface.

### General concept of the *ELPA* library ###

The *ELPA* library consists of two main parts:
- *ELPA 1stage* solver
- *ELPA 2stage* solver

Both variants of the *ELPA* solvers are available for real and complex valued matrices, in single and double precision.

Thus *ELPA* provides the following user functions (see man pages or [online](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2016.11.001/html/index.html) for details):

- elpa_get_communicators                        : set the row / column communicators for *ELPA*
- elpa_solve_evp_complex_1stage_{single|double} : solve a {single|double} precision complex eigenvalue problem with the *ELPA 1stage* solver
- elpa_solve_evp_real_1stage_{single|double}    : solve a {single|double} precision real eigenvalue problem with the *ELPA 1stage* solver
- elpa_solve_evp_complex_2stage_{single|double} : solve a {single|double} precision complex eigenvalue problem with the *ELPA 2stage* solver
- elpa_solve_evp_real_2stage_{single|double}    : solve a {single|double} precision real eigenvalue problem with the *ELPA 2stage* solver
- elpa_solve_evp_real_{single|double}           : driver for the {single|double} precision real *ELPA 1stage* or *ELPA 2stage* solver
- elpa_solve_evp_complex_{single|double}        : driver for the {single|double} precision complex *ELPA 1stage* or *ELPA 2stage* solver



Furthermore *ELPA* provides the utility binary "elpa2_print_kernels": it tells the user
which *ELPA 2stage* compute kernels have been installed and which default kernels are set.

If you want to solve an eigenvalue problem with *ELPA*, you have to decide whether you
want to use the *ELPA 1stage* or the *ELPA 2stage* solver. Normally, *ELPA 2stage* is the better
choice since it is faster, but there are matrix dimensions where *ELPA 1stage* is superior.

Independent of the choice of the solver, the concept of calling *ELPA* is always the same:

#### MPI version of *ELPA* ####

In this case, *ELPA* relies on a BLACS distributed matrix.
To solve an eigenvalue problem for this matrix with *ELPA*, one has

1. to include the *ELPA* header (C case) or module (Fortran)
2. to create row and column MPI communicators for ELPA (with "elpa_get_communicators")
3. to call to the *ELPA driver* or directly call *ELPA 1stage* or *ELPA 2stage* for the matrix.

Here is a very simple MPI code snippet for using *ELPA 1stage*. For the definition of all variables
please have a look at the man pages and/or the online documentation (see above). A full version
of a simple example program can be found in ./test_project/src.


   ! All ELPA routines need MPI communicators for communicating within
   ! rows or columns of processes, these are set in elpa_get_communicators

   success = elpa_get_communicators(mpi_comm_world, my_prow, my_pcol, &
                                    mpi_comm_rows, mpi_comm_cols)

   if (myid==0) then
     print '(a)','| Past split communicator setup for rows and columns.'
   end if

   ! Determine the necessary size of the distributed matrices,
   ! we use the Scalapack tools routine NUMROC for that.

   na_rows = numroc(na, nblk, my_prow, 0, np_rows)
   na_cols = numroc(na, nblk, my_pcol, 0, np_cols)

   !-------------------------------------------------------------------------------
   ! Calculate eigenvalues/eigenvectors

   if (myid==0) then
     print '(a)','| Entering one-step ELPA solver ... '
     print *
   end if

   success = elpa_solve_evp_real_1stage_{single|double} (na, nev, a, na_rows, ev, z, na_rows, nblk, &
                                   matrixCols, mpi_comm_rows, mpi_comm_cols)

   if (myid==0) then
     print '(a)','| One-step ELPA solver complete.'
     print *
   end if


#### Shared-memory version of *ELPA* ####

If the *ELPA* library has been compiled with the configure option "--with-mpi=0",
no MPI will be used.

Still the **same** call sequence as in the MPI case can be used (see above).

#### Setting the row and column communicators ####

SYNOPSIS
   FORTRAN INTERFACE
       use elpa1

       success = elpa_get_communicators (mpi_comm_global, my_prow, my_pcol, mpi_comm_rows, mpi_comm_cols)

       integer, intent(in)   mpi_comm_global:  global communicator for the calculation
       integer, intent(in)   my_prow:          row coordinate of the calling process in the process grid
       integer, intent(in)   my_pcol:          column coordinate of the calling process in the process grid
       integer, intent(out)  mpi_comm_rows:    communicator for communication within rows of processes
       integer, intent(out)  mpi_comm_cols:    communicator for communication within columns of processes

       integer               success:          return value indicating success or failure of the underlying MPI_COMM_SPLIT function

   C INTERFACE
       #include "elpa_generated.h"

       success = elpa_get_communicators (int mpi_comm_global, int my_prow, int my_pcol, int *mpi_comm_rows, int *mpi_comm_cols);

       int mpi_comm_global:  global communicator for the calculation
       int my_prow:          row coordinate of the calling process in the process grid
       int my_pcol:          column coordinate of the calling process in the process grid
       int *mpi_comm_rows:   pointer to the communicator for communication within rows of processes
       int *mpi_comm_cols:   pointer to the communicator for communication within columns of processes

       int  success:         return value indicating success or failure of the underlying MPI_COMM_SPLIT function


#### Using *ELPA 1stage* ####

After setting up the *ELPA* row and column communicators (by calling elpa_get_communicators),
only the real or complex valued solver has to be called:

SYNOPSIS
   FORTRAN INTERFACE
       use elpa1
       success = elpa_solve_evp_real_1stage_{single|double} (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows,
       mpi_comm_cols)

       With the definitions of the input and output variables:

       integer, intent(in)    na:            global dimension of quadratic matrix a to solve
       integer, intent(in)    nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       real*{4|8},  intent(inout) a:         locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       integer, intent(in)    lda:           leading dimension of locally distributed matrix a
       real*{4|8},  intent(inout) ev:        on output the first nev computed eigenvalues
       real*{4|8},  intent(inout) q:         on output the first nev computed eigenvectors
       integer, intent(in)    ldq:           leading dimension of matrix q which stores the eigenvectors
       integer, intent(in)    nblk:          blocksize of block cyclic distribution, must be the same in both directions
       integer, intent(in)    matrixCols:    number of columns of locally distributed matrices a and q
       integer, intent(in)    mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       integer, intent(in)    mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       logical                success:       return value indicating success or failure

   C INTERFACE
       #include "elpa.h"

       success = elpa_solve_evp_real_1stage_{single|double} (int na, int nev, {float|double} *a, int lda, {float|double} *ev, {float|double} *q, int ldq, int nblk, int matrixCols, int
       mpi_comm_rows, int mpi_comm_cols);

       With the definitions of the input and output variables:

       int     na:            global dimension of quadratic matrix a to solve
       int     nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
234
       {float|double} *a:     pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       int     lda:           leading dimension of locally distributed matrix a
       {float|double} *ev:    pointer to memory containing on output the first nev computed eigenvalues
       {float|double} *q:     pointer to memory containing on output the first nev computed eigenvectors
       int     ldq:           leading dimension of matrix q which stores the eigenvectors
       int     nblk:          blocksize of block cyclic distribution, must be the same in both directions
       int     matrixCols:    number of columns of locally distributed matrices a and q
       int     mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       int     mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       int     success:       return value indicating success (1) or failure (0)

DESCRIPTION
       Solve the real eigenvalue problem with the 1-stage solver. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the
       elpa_get_communicators(3) function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols.
       The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues
       will be stored in q. All memory of the arguments must be allocated outside the call to the solver.

   FORTRAN INTERFACE
       use elpa1
       success = elpa_solve_evp_complex_1stage_{single|double} (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows,
       mpi_comm_cols)

       With the definitions of the input and output variables:

       integer,     intent(in)    na:            global dimension of quadratic matrix a to solve
       integer,     intent(in)    nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
261
       complex*{8|16},  intent(inout) a:         locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       integer,     intent(in)    lda:           leading dimension of locally distributed matrix a
       real*{4|8},      intent(inout) ev:        on output the first nev computed eigenvalues
       complex*{8|16},  intent(inout) q:         on output the first nev computed eigenvectors
       integer,     intent(in)    ldq:           leading dimension of matrix q which stores the eigenvectors
       integer,     intent(in)    nblk:          blocksize of block cyclic distribution, must be the same in both directions
       integer,     intent(in)    matrixCols:    number of columns of locally distributed matrices a and q
       integer,     intent(in)    mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       integer,     intent(in)    mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       logical                    success:       return value indicating success or failure

   C INTERFACE
       #include "elpa.h"
       #include <complex.h>

       success = elpa_solve_evp_complex_1stage_{single|double} (int na, int nev, {float|double} complex *a, int lda, {float|double} *ev, {float|double} complex *q, int ldq, int nblk, int
       matrixCols, int mpi_comm_rows, int mpi_comm_cols);

       With the definitions of the input and output variables:

       int             na:            global dimension of quadratic matrix a to solve
       int             nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
284
       {float|double} complex *a:     pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       int             lda:           leading dimension of locally distributed matrix a
       {float|double}         *ev:    pointer to memory containing on output the first nev computed eigenvalues
       {float|double} complex *q:     pointer to memory containing on output the first nev computed eigenvectors
       int             ldq:           leading dimension of matrix q which stores the eigenvectors
       int             nblk:          blocksize of block cyclic distribution, must be the same in both directions
       int             matrixCols:    number of columns of locally distributed matrices a and q
       int             mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       int             mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)

       int             success:       return value indicating success (1) or failure (0)

DESCRIPTION
       Solve the complex eigenvalue problem with the 1-stage solver. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the
       elpa_get_communicators(3) function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols.
       The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues
       will be stored in q. All memory of the arguments must be allocated outside the call to the solver.


The *ELPA 1stage* solver does not need or accept any other parameters than those in the above
specification.

#### Using *ELPA 2stage* ####

The *ELPA 2stage* solver can be used in the same manner as the *ELPA 1stage* solver.
However, the 2stage solver can be used with different compute kernels, which offers
more possibilities for configuration.

It is recommended to first call the utility program

elpa2_print_kernels

which will list all the compute kernels that can be used with *ELPA 2stage*. It will
also tell whether a kernel can be set via environment variables.

##### Using the default kernels #####

If no kernel is set, either via an environment variable or the *ELPA 2stage API*, then
the default kernels will be used.

##### Setting the *ELPA 2stage* compute kernels #####

##### Setting the *ELPA 2stage* compute kernels with environment variables #####

If the *ELPA* installation allows setting the compute kernels with environment variables,
setting the variables "REAL_ELPA_KERNEL" and "COMPLEX_ELPA_KERNEL" will set the compute
kernels. The environment variable setting will take precedence over all other settings!

The utility program "elpa2_print_kernels" can list which kernels are available and which
would be chosen. This also reflects the setting of the default kernel and any settings made
via the environment variables.

##### Setting the *ELPA 2stage* compute kernels with API calls #####

It is also possible to set the *ELPA 2stage* compute kernels via the API.

As an example the API for ELPA real double-precision 2stage is shown:

SYNOPSIS
   FORTRAN INTERFACE
       use elpa1
       use elpa2
       success = elpa_solve_evp_real_2stage_double (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows,
       mpi_comm_cols, mpi_comm_all, THIS_REAL_ELPA_KERNEL, useQR, useGPU)

       With the definitions of the input and output variables:

       integer, intent(in)            na:            global dimension of quadratic matrix a to solve
       integer, intent(in)            nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       real*{4|8},  intent(inout)         a:         locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       integer, intent(in)            lda:           leading dimension of locally distributed matrix a
       real*{4|8},  intent(inout)         ev:        on output the first nev computed eigenvalues
       real*{4|8},  intent(inout)         q:         on output the first nev computed eigenvectors
       integer, intent(in)            ldq:           leading dimension of matrix q which stores the eigenvectors
       integer, intent(in)            nblk:          blocksize of block cyclic distribution, must be the same in both directions
       integer, intent(in)            matrixCols:    number of columns of locally distributed matrices a and q
       integer, intent(in)            mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       integer, intent(in)            mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)
       integer, intent(in)            mpi_comm_all:  communicator for all processes in the processor set involved in ELPA
       integer, intent(in), optional: THIS_REAL_ELPA_KERNEL: optional argument; choose the compute kernel for the 2-stage solver
       logical, intent(in), optional: useQR:         optional argument; switches to QR-decomposition if set to .true.
       logical, intent(in), optional: useGPU:        decide whether GPUs should be used or not

      logical                        success:       return value indicating success or failure

   C INTERFACE
       #include "elpa.h"

       success = elpa_solve_evp_real_2stage_double (int na, int nev,  double *a, int lda,  double *ev, double *q, int ldq, int nblk, int matrixCols, int
       mpi_comm_rows, int mpi_comm_cols, int mpi_comm_all, int THIS_ELPA_REAL_KERNEL, int useQR, int useGPU);

       With the definitions of the input and output variables:

       int     na:            global dimension of quadratic matrix a to solve
       int     nev:           number of eigenvalues to be computed; the first nev eigenvalues are calculated
       double *a:             pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols
       int     lda:           leading dimension of locally distributed matrix a
       double *ev:            pointer to memory containing on output the first nev computed eigenvalues
       double *q:             pointer to memory containing on output the first nev computed eigenvectors
       int     ldq:           leading dimension of matrix q which stores the eigenvectors
       int     nblk:          blocksize of block cyclic distribution, must be the same in both directions
       int     matrixCols:    number of columns of locally distributed matrices a and q
       int     mpi_comm_rows: communicator for communication in rows. Constructed with elpa_get_communicators(3)
       int     mpi_comm_cols: communicator for communication in columns. Constructed with elpa_get_communicators(3)
       int     mpi_comm_all:  communicator for all processes in the processor set involved in ELPA
       int     useQR:         if set to 1 switch to QR-decomposition
       int     useGPU:        decide whether the GPU version should be used or not

       int     success:       return value indicating success (1) or failure (0)


DESCRIPTION
       Solve the real eigenvalue problem with the 2-stage solver. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the
       elpa_get_communicators(3) function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols.
       The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues
       will be stored in q. All memory of the arguments must be allocated outside the call to the solver.
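
As an illustration, a call that explicitly selects a kernel could look like the following Fortran
sketch. It is only a sketch: the argument order follows the C synopsis above (with lda passed
explicitly, as in the *ELPA 1stage* example earlier), and the kernel constant
REAL_ELPA_KERNEL_GENERIC is assumed purely as an example; the valid constants for your
installation are listed by elpa2_print_kernels and in the man pages.

   use elpa1
   use elpa2

   ! Sketch: explicitly request a specific real kernel for the 2-stage solver
   ! (REAL_ELPA_KERNEL_GENERIC is an assumed example constant);
   ! QR-decomposition and GPU usage are switched off.
   success = elpa_solve_evp_real_2stage_double(na, nev, a, lda, ev, q, ldq, nblk, matrixCols, &
                                               mpi_comm_rows, mpi_comm_cols, mpi_comm_all,    &
                                               REAL_ELPA_KERNEL_GENERIC, .false., .false.)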

##### Setting up *ELPA 1stage* or *ELPA 2stage* with the *ELPA driver interface* #####

Since release ELPA 2016.05.004, a driver routine allows one to choose more easily which solver (1stage or 2stage) will be used.

As an example the real double-precision case is explained:

 SYNOPSIS

 FORTRAN INTERFACE

  use elpa_driver

  success = elpa_solve_evp_real_double (na, nev, a(lda,matrixCols), ev(nev), q(ldq, matrixCols), ldq, nblk, matrixCols, mpi_comm_rows, mpi_comm_cols, mpi_comm_all, THIS_REAL_ELPA_KERNEL=THIS_REAL_ELPA_KERNEL, useQR, useGPU, method=method)

  Generalized interface to the ELPA 1stage and 2stage solver for real-valued problems

  With the definitions of the input and output variables:


  integer, intent(in)            na:                    global dimension of quadratic matrix a to solve

  integer, intent(in)            nev:                   number of eigenvalues to be computed; the first nev eigenvalues are calculated

  real*8,  intent(inout)         a:                     locally distributed part of the matrix a. The local dimensions are lda x matrixCols

  integer, intent(in)            lda:                   leading dimension of locally distributed matrix a

  real*8,  intent(inout)         ev:                    on output the first nev computed eigenvalues

  real*8,  intent(inout)         q:                     on output the first nev computed eigenvectors

  integer, intent(in)            ldq:                   leading dimension of matrix q which stores the eigenvectors

  integer, intent(in)            nblk:                  blocksize of block cyclic distribution, must be the same in both directions

  integer, intent(in)            matrixCols:            number of columns of locally distributed matrices a and q

  integer, intent(in)            mpi_comm_rows:         communicator for communication in rows. Constructed with elpa_get_communicators

  integer, intent(in)            mpi_comm_cols:         communicator for communication in columns. Constructed with elpa_get_communicators

  integer, intent(in)            mpi_comm_all:          communicator for all processes in the processor set involved in ELPA

  integer, intent(in), optional: THIS_REAL_ELPA_KERNEL: optional argument, choose the compute kernel for 2-stage solver

  logical, intent(in), optional: useQR:                 optional argument; switches to QR-decomposition if set to .true.

  logical, intent(in), optional: useGPU:                decide whether the GPU version should be used or not

  character(*), optional         method:                use 1stage solver if "1stage", use 2stage solver if "2stage", (at the moment) use 2stage solver if "auto"

  logical                        success:               return value indicating success or failure


 C INTERFACE

 #include "elpa.h"

 success = elpa_solve_evp_real_double (int na, int nev, double *a, int lda, double *ev, double *q, int ldq, int nblk, int matrixCols, int mpi_comm_rows, int mpi_comm_cols, int mpi_comm_all, int THIS_ELPA_REAL_KERNEL, int useQR, int useGPU, char *method);


 With the definitions of the input and output variables:


 int     na:                    global dimension of quadratic matrix a to solve

 int     nev:                   number of eigenvalues to be computed; the first nev eigenvalues are calculated

 double *a:                     pointer to locally distributed part of the matrix a. The local dimensions are lda x matrixCols

 int     lda:                   leading dimension of locally distributed matrix a

 double *ev:                    pointer to memory containing on output the first nev computed eigenvalues

 double *q:                     pointer to memory containing on output the first nev computed eigenvectors

 int     ldq:                   leading dimension of matrix q which stores the eigenvectors

 int     nblk:                  blocksize of block cyclic distribution, must be the same in both directions

 int     matrixCols:            number of columns of locally distributed matrices a and q

 int     mpi_comm_rows:         communicator for communication in rows. Constructed with elpa_get_communicators

 int     mpi_comm_cols:         communicator for communication in columns. Constructed with elpa_get_communicators

 int     mpi_comm_all:          communicator for all processes in the processor set involved in ELPA

 int     THIS_ELPA_REAL_KERNEL: choose the compute kernel for 2-stage solver

 int     useQR:                 if set to 1 switch to QR-decomposition

 int     useGPU:                decide whether the GPU version should be used or not

 char   *method:                use 1stage solver if "1stage", use 2stage solver if "2stage", (at the moment) use 2stage solver if "auto"

 int     success:               return value indicating success (1) or failure (0)

 DESCRIPTION
 Solve the real eigenvalue problem. The value of method decides whether the 1stage or 2stage solver is used. The ELPA communicators mpi_comm_rows and mpi_comm_cols are obtained with the elpa_get_communicators function. The distributed quadratic matrix a has global dimensions na x na, and a local size lda x matrixCols. The solver will compute the first nev eigenvalues, which will be stored on exit in ev. The eigenvectors corresponding to the eigenvalues will be stored in q. All memory of the arguments must be allocated outside the call to the solver.
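
A Fortran sketch of a driver call that forces the 2stage solver is given below. This is a sketch
under the assumption that the positional argument order matches the C interface above (with lda
passed explicitly); the optional kernel, QR and GPU arguments are omitted so the installation
defaults apply.

   use elpa_driver

   ! Sketch: let the driver run the 2-stage solver via the "method" keyword argument;
   ! leaving out THIS_REAL_ELPA_KERNEL, useQR and useGPU keeps the default settings.
   success = elpa_solve_evp_real_double(na, nev, a, lda, ev, q, ldq, nblk, matrixCols, &
                                        mpi_comm_rows, mpi_comm_cols, mpi_comm_all,    &
                                        method="2stage")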

##### Setting up the GPU version of *ELPA* 1 and 2 stage #####

Since release ELPA 2016.011.001.pre *ELPA* offers GPU support, IF *ELPA* has been built with the configure option "--enable-gpu-support".

At run-time the GPU version can be used by setting the environment variable "ELPA_USE_GPU" to "yes", or by calling the *ELPA* functions
(elpa_solve_evp_real_{double|single}, elpa_solve_evp_real_1stage_{double|single}, elpa_solve_evp_real_2stage_{double|single}) with the
argument "useGPU = .true." or "useGPU = 1" for the Fortran and C case, respectively. Please note that, similar to the choice of the
*ELPA* 2stage compute kernels, the environment variable takes precedence over the setting in the API call.

Further note that it is NOT allowed to define the usage of GPUs AND to EXPLICITLY set an ELPA 2stage compute kernel other than
"REAL_ELPA_KERNEL_GPU" or "COMPLEX_ELPA_KERNEL_GPU".