Andreas Marek committed
## Users guide for the *ELPA* library ##

This document provides the guide for using the *ELPA* library with the new API (API version 20170403 or higher).
If you want to use the deprecated legacy API (we strongly recommend against this), please refer to the document
[USERS_GUIDE_DEPRECATED_LEGACY_API.md](USERS_GUIDE_DEPRECATED_LEGACY_API.md).

If you need instructions on how to build *ELPA*, please look at [INSTALL.md](INSTALL.md).

### Online and local documentation ###

Local documentation (via man pages) should be available (if *ELPA* has been installed with the documentation):

For example, "man elpa2_print_kernels" should provide the documentation for the *ELPA* program that prints all
the available kernels.

An [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2019.05.001/html/index.html)
is also available for each *ELPA* release.

### API of the *ELPA* library ###

With release 2017.05.001 of the *ELPA* library the interface has been rewritten substantially, in order to have a more generic 
interface and to avoid future interface changes.

For compatibility reasons the interface defined in the previous release 2016.11.001 is also still available
**IF AND ONLY IF** *ELPA* has been built with support for this legacy interface.

The legacy API provides all the functionality as it was defined in *ELPA* release 2016.11.001. Note, however,
that all future features of *ELPA* will only be accessible via the new API defined in release 2017.05.001 or later.

As mentioned, we advise against it, but if you want to use the legacy API, please look at the document
[USERS_GUIDE_DEPRECATED_LEGACY_API.md](USERS_GUIDE_DEPRECATED_LEGACY_API.md).

The old, obsolete legacy API will be deprecated in the future!
Already now, all new features of ELPA are only available with the new API. Thus, there
is no reason to keep the legacy API around for too long.

The release ELPA 2018.11.001 was the last release in which the legacy API was
enabled by default (it could be disabled at build time).
With release ELPA 2019.05.001 the legacy API is disabled by default; however,
it can still be switched on at build time.
Most likely, with the release ELPA 2019.11.001 the legacy API will be deprecated and
no longer supported.


### Table of Contents: ###

- I)   General concept of the *ELPA* API
- II)  List of supported tunable parameters
- III) List of computational routines
- IV)  Using OpenMP threading
- V)   Influencing default values with environment variables
- VI)  Using autotuning

## I) General concept of the *ELPA* API ##

Using *ELPA* just requires a few steps:

- include the ELPA header "elpa/elpa.h" (C case) or use the Fortran module "use elpa"

- define an instance of the elpa type (a `class(elpa_t), pointer` in Fortran, an `elpa_t` handle in C)

- call elpa_init

- call elpa_allocate to allocate an instance of *ELPA*;
  note that you can define (and configure individually) as many different instances
  of ELPA as you want, e.g. one for CPU-only computations and one for larger matrices on GPUs

- use the ELPA-type function "set" to set the matrix and MPI parameters

- call the ELPA-type function "setup"

- set or get any of the ELPA tunable options with the ELPA-type functions set/get

- call the ELPA-type function "eigenvectors" or any other computational routine (see Section III)

- if the ELPA object is not needed anymore, call elpa_deallocate

- call elpa_uninit at the end of the program

To be more precise, a basic call sequence for Fortran and C looks as follows:

Fortran synopsis

```Fortran
  use elpa
  use mpi                  ! provides MPI_COMM_WORLD
  class(elpa_t), pointer :: e
  integer :: success

  ! na, nev, na_rows, na_cols, nblk, my_prow, my_pcol and the arrays a, ev, z
  ! are assumed to be declared and set up by the calling program

  if (elpa_init(20171201) /= ELPA_OK) then        ! put here the API version that you are using
    print *, "ELPA API version not supported"
    stop
  endif

  e => elpa_allocate(success)
  if (success /= ELPA_OK) then
    ! react on the error
    ! we urge every user to always check the error codes
    ! of all ELPA functions
  endif

  ! set parameters describing the matrix and its MPI distribution
  call e%set("na", na, success)                          ! size of the na x na matrix
  call e%set("nev", nev, success)                        ! number of eigenvectors that should be computed ( 1<= nev <= na)
  call e%set("local_nrows", na_rows, success)            ! number of local rows of the distributed matrix on this MPI task
  call e%set("local_ncols", na_cols, success)            ! number of local columns of the distributed matrix on this MPI task
  call e%set("nblk", nblk, success)                      ! size of the BLACS block cyclic distribution
  call e%set("mpi_comm_parent", MPI_COMM_WORLD, success) ! the global MPI communicator
  call e%set("process_row", my_prow, success)            ! row coordinate of MPI process
  call e%set("process_col", my_pcol, success)            ! column coordinate of MPI process

  success = e%setup()

  ! if desired, set any number of tunable run-time options
  ! look at the list of possible options as detailed later in
  ! USERS_GUIDE.md
  call e%set("solver", ELPA_SOLVER_2STAGE, success)

  ! set the AVX BLOCK2 kernel, otherwise ELPA_2STAGE_REAL_DEFAULT will
  ! be used
  call e%set("real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, success)

  ! use the method eigenvectors to solve the eigenvalue problem and obtain
  ! eigenvalues and eigenvectors
  ! other possible methods are described in USERS_GUIDE.md
  call e%eigenvectors(a, ev, z, success)

  ! cleanup
  call elpa_deallocate(e)

  call elpa_uninit()
```

C Synopsis:
```C
   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>
   #include <elpa/elpa.h>

   /* na, nev, na_rows, na_cols, nblk, my_prow, my_pcol and the arrays a, ev, z
      are assumed to be declared and set up by the calling program */

   elpa_t handle;
   int error;

   if (elpa_init(20171201) != ELPA_OK) {                          // put here the API version that you are using
     fprintf(stderr, "Error: ELPA API version not supported\n");
     exit(1);
   }

   handle = elpa_allocate(&error);
   if (error != ELPA_OK) {
     /* react on the error code */
     /* we urge the user to always check the error codes of all ELPA functions */
   }

   /* Set parameters describing the matrix and its MPI distribution */
   elpa_set(handle, "na", na, &error);                                           // size of the na x na matrix
   elpa_set(handle, "nev", nev, &error);                                         // number of eigenvectors that should be computed ( 1<= nev <= na)
   elpa_set(handle, "local_nrows", na_rows, &error);                             // number of local rows of the distributed matrix on this MPI task
   elpa_set(handle, "local_ncols", na_cols, &error);                             // number of local columns of the distributed matrix on this MPI task
   elpa_set(handle, "nblk", nblk, &error);                                       // size of the BLACS block cyclic distribution
   elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);    // the global MPI communicator
   elpa_set(handle, "process_row", my_prow, &error);                             // row coordinate of MPI process
   elpa_set(handle, "process_col", my_pcol, &error);                             // column coordinate of MPI process

   /* Setup */
   error = elpa_setup(handle);

   /* if desired, set any number of tunable run-time options */
   /* look at the list of possible options as detailed later in
      USERS_GUIDE.md */
   elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);

   // set the AVX BLOCK2 kernel, otherwise ELPA_2STAGE_REAL_DEFAULT will
   // be used
   elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, &error);

   /* use elpa_eigenvectors to solve the eigenvalue problem */
   /* other possible methods are described in USERS_GUIDE.md */
   elpa_eigenvectors(handle, a, ev, z, &error);

   /* cleanup */
   elpa_deallocate(handle);
   elpa_uninit();
```
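
Both synopses assume that `na_rows`, `na_cols`, `my_prow` and `my_pcol` are already known. The following is a minimal sketch of how they could be computed for a 2D block-cyclic distribution. It re-implements the logic of ScaLAPACK's `numroc` in plain C, assumes that the first block is owned by process (0,0), and assumes a row-major ordering of MPI ranks in the process grid. The helper names `local_extent` and `grid_coords` are hypothetical and not part of *ELPA*; in a real program you would typically use ScaLAPACK's `numroc` and the BLACS routine `blacs_gridinfo` instead.

```C
#include <assert.h>

/* Number of rows (or columns) of an n x n matrix, distributed block-cyclically
   in blocks of size nb over nprocs processes, that end up on process iproc.
   Mirrors the logic of ScaLAPACK's numroc with the source process set to 0. */
int local_extent(int n, int nb, int iproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0 && iproc >= 0 && iproc < nprocs);

    int nblocks   = n / nb;                  /* number of complete blocks */
    int extent    = (nblocks / nprocs) * nb; /* full rounds of blocks every process gets */
    int extrablks = nblocks % nprocs;        /* complete blocks left after the full rounds */

    if (iproc < extrablks)
        extent += nb;                        /* one additional complete block */
    else if (iproc == extrablks)
        extent += n % nb;                    /* the trailing partial block, if any */
    return extent;
}

/* Hypothetical helper: row and column coordinates of an MPI rank in an
   np_rows x np_cols process grid, assuming row-major rank ordering. */
void grid_coords(int rank, int np_cols, int *my_prow, int *my_pcol)
{
    assert(rank >= 0 && np_cols > 0);
    *my_prow = rank / np_cols;
    *my_pcol = rank % np_cols;
}
```

With these helpers, the values passed to the set method would be `na_rows = local_extent(na, nblk, my_prow, np_rows)` and `na_cols = local_extent(na, nblk, my_pcol, np_cols)`, where `np_rows` and `np_cols` are the (assumed) dimensions of the process grid.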

## II) List of supported tunable parameters ##

The following table lists all supported parameters which can be used to tune (influence) the runtime behaviour of *ELPA* ([see here if you cannot read it in your editor](https://gitlab.mpcdf.mpg.de/elpa/elpa/wikis/USERS_GUIDE)):

| Parameter name | Short description     | default value               | possible values         | since API version |
| :------------- |:--------------------- | :-------------------------- | :---------------------- | :---------------- |
| solver         | use ELPA 1-stage <br> or 2-stage solver | ELPA_SOLVER_1STAGE | ELPA_SOLVER_1STAGE <br> ELPA_SOLVER_2STAGE | 20170403 |
| gpu            | use GPU (if built <br> with GPU support) | 0 | 0 or 1 | 20170403 |
| real_kernel    | real kernel to be <br> used in ELPA 2 | ELPA_2STAGE_REAL_DEFAULT | see output of <br> elpa2_print_kernels | 20170403 |
| complex_kernel | complex kernel to <br> be used in ELPA 2 | ELPA_2STAGE_COMPLEX_DEFAULT | see output of <br> elpa2_print_kernels | 20170403 |
| omp_threads    | OpenMP threads used <br> (if built with OpenMP <br> support) | 1 | >= 1 | 20180525 |
| qr             | use QR decomposition in <br> ELPA 2 real | 0 | 0 or 1 | 20170403 |
| timings        | enable time <br> measurement | 1 | 0 or 1 | 20170403 |
| debug          | give debug information | 0 | 0 or 1 | 20170403 |

## III) List of computational routines ##

The following compute routines are available in *ELPA*. Please have a look at the man pages or the [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2019.05.001/html/index.html) for details.

| Name                     | Purpose                                                                          | since API version |
| :----------------------- | :------------------------------------------------------------------------------- | :---------------- |
| eigenvectors             | solve standard eigenvalue problem <br> compute eigenvalues and eigenvectors      | 20170403 |
| eigenvalues              | solve standard eigenvalue problem <br> compute eigenvalues only                  | 20170403 |
| generalized_eigenvectors | solve generalized eigenvalue problem <br> compute eigenvalues and eigenvectors   | 20180525 |
| generalized_eigenvalues  | solve generalized eigenvalue problem <br> compute eigenvalues only               | 20180525 |
| hermitian_multiply       | compute (real) a^T x b <br> (complex) a^H x b                                    | 20170403 |
| cholesky                 | compute a Cholesky factorisation                                                 | 20170403 |
| invert_triangular        | invert an upper triangular matrix                                                | 20170403 |
| solve_tridiagonal        | solve EVP for a tridiagonal matrix                                               | 20170403 |
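
To make the semantics of `hermitian_multiply` concrete in the real case, the sketch below is a serial, non-distributed reference for c = a^T x b on matrices stored column-major (the Fortran/ScaLAPACK storage order). It is only meant for cross-checking small results on a single process; it ignores any additional options the *ELPA* routine accepts (see its documentation), and the function name `reference_atb` is hypothetical.

```C
#include <assert.h>
#include <stddef.h>

/* Serial reference for the real case of hermitian_multiply: c = a^T * b.
   a is n x m, b is n x k, c is m x k; all matrices are stored column-major,
   i.e. element (i, j) of an nrows x ncols matrix is at index i + j * nrows. */
void reference_atb(int n, int m, int k,
                   const double *a, const double *b, double *c)
{
    assert(n >= 0 && m >= 0 && k >= 0);
    for (int j = 0; j < k; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            /* (a^T b)(i, j) = sum over l of a(l, i) * b(l, j) */
            for (int l = 0; l < n; ++l)
                sum += a[l + (size_t)i * n] * b[l + (size_t)j * n];
            c[i + (size_t)j * m] = sum;
        }
    }
}
```

For example, with a = [[1, 2], [3, 4]] and b the 2 x 2 identity, the result is a^T = [[1, 3], [2, 4]].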

## IV) Using OpenMP threading ##

If *ELPA* has been built with OpenMP threading support, you can specify the number of OpenMP threads that *ELPA* will use internally.
Please note that it is **mandatory** to set the number of threads both with the OMP_NUM_THREADS environment variable **and**
with the **set method**

```Fortran
call e%set("omp_threads", 4, error)
```

**or the *ELPA* environment variable**

`export ELPA_DEFAULT_omp_threads=4` (see Section V for an explanation of this variable).

Just setting the environment variable OMP_NUM_THREADS is **not** sufficient.

This explicit setting is required so that the number of threads can be treated as an autotunable option.
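
One way to keep OMP_NUM_THREADS and the *ELPA* setting consistent is to read the environment variable once and pass the same value to the set method. The sketch below only shows the parsing part; the helper name `parse_thread_count`, the fallback value and the upper sanity limit of 4096 are assumptions of this example, not *ELPA* behaviour.

```C
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: interprets the string s (e.g. the value of
   getenv("OMP_NUM_THREADS")) as a thread count. Returns `fallback` if s is
   NULL, empty, not a plain positive integer, or above an arbitrary limit. */
int parse_thread_count(const char *s, int fallback)
{
    assert(fallback >= 1);
    if (s == NULL || *s == '\0')
        return fallback;

    char *end;
    long n = strtol(s, &end, 10);
    if (*end != '\0' || n < 1 || n > 4096)   /* rejects e.g. "4,2" or "abc" */
        return fallback;
    return (int)n;
}
```

A caller would then use something like `int nthreads = parse_thread_count(getenv("OMP_NUM_THREADS"), 1);` followed by `elpa_set(handle, "omp_threads", nthreads, &error);`, so that both settings agree.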

## V) Influencing default values with environment variables ##

For each tunable parameter mentioned in Section II, there exists a default value. This means that if this parameter is **not explicitly** set by the user with the
*ELPA* set method, *ELPA* takes the default value for the parameter. E.g., if the user does not set a solver method, then *ELPA* will take the default "ELPA_SOLVER_1STAGE".

The user can change this default value by setting an environment variable to the desired value.

The name of this variable is always constructed in the following way:
```
ELPA_DEFAULT_tunable_parameter_name=value
```

For example, in the case of the solver, the user can set

```
export ELPA_DEFAULT_solver=ELPA_SOLVER_2STAGE
```

in order to make the 2-stage solver the default.

**Important note:**
The default value is completely ignored if the user has manually set a parameter-value pair with the *ELPA* set method!
Thus the above environment variable will **not** have an effect if the user code contains a line such as
```Fortran
call e%set("solver", ELPA_SOLVER_1STAGE, error)
```
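
The precedence described above (a value set explicitly with the set method, then the `ELPA_DEFAULT_<name>` environment variable, then the built-in default) can be modelled as follows. This is a hypothetical sketch of the documented behaviour, not *ELPA*'s actual implementation; the function names are made up, and values are handled as plain strings.

```C
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Builds the environment variable name for a tunable parameter,
   e.g. "solver" -> "ELPA_DEFAULT_solver". Returns 0 on success,
   -1 if the buffer is too small. */
int default_env_name(const char *param, char *buf, size_t len)
{
    assert(param != NULL && buf != NULL);
    int n = snprintf(buf, len, "ELPA_DEFAULT_%s", param);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Resolves the effective value of a parameter following the documented
   precedence: explicitly set value > environment variable > built-in default.
   explicit_value is NULL if the user never called the set method. */
const char *effective_value(const char *param, const char *explicit_value,
                            const char *builtin_default)
{
    char name[128];
    if (explicit_value != NULL)
        return explicit_value;               /* the set method always wins */
    if (default_env_name(param, name, sizeof name) == 0) {
        const char *env = getenv(name);
        if (env != NULL)
            return env;                      /* ELPA_DEFAULT_<param> replaces the default */
    }
    return builtin_default;
}
```

In this model, exporting `ELPA_DEFAULT_solver=ELPA_SOLVER_2STAGE` changes the result only when `explicit_value` is NULL, which mirrors the important note above.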

## VI) Using autotuning ##

Since API version 20171201, *ELPA* supports the autotuning of some "tunable" parameters (see Section II). The idea is that if *ELPA* is called multiple times (as is typical in
self-consistent field iterations), some parameters whose optimal values are hard for the user to determine can be tuned automatically. Note that not every parameter mentioned in Section II can actually be autotuned. At the moment, only the parameters listed in the table below are affected by autotuning.

There are two ways in which the user can influence the autotuning steps:

1.) the user can set one of the following autotuning levels
- ELPA_AUTOTUNE_FAST
- ELPA_AUTOTUNE_MEDIUM
- ELPA_AUTOTUNE_EXTENSIVE

Each level defines a different set of tunable parameters. The autotuning options will be extended in future releases of the *ELPA* library; at the moment the following
sets are supported:

| AUTOTUNE LEVEL          | Parameters                                                                       |
| :---------------------- | :------------------------------------------------------------------------------- |
| ELPA_AUTOTUNE_FAST      | { solver, real_kernel, complex_kernel, omp_threads }                             |
| ELPA_AUTOTUNE_MEDIUM    | all of above + { gpu, partly gpu }                                               |
| ELPA_AUTOTUNE_EXTENSIVE | all of above + { various blocking factors, stripewidth, intermediate_bandwidth } |

2.) the user can **remove** tunable parameters from the list of autotuning possibilities by explicitly setting them,
e.g. if the user sets in their code

```Fortran
call e%set("solver", ELPA_SOLVER_2STAGE, error)
```
**before** invoking the autotuning, then the solver is fixed and no longer considered for autotuning. Thus ELPA_SOLVER_1STAGE would be skipped and, consequently, so would all autotuning parameters that depend on ELPA_SOLVER_1STAGE.
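
The interplay of the two mechanisms can be pictured as a set computation: the level selects a set of parameters, and every parameter the user set explicitly is removed from that set. The sketch below models this with bit masks, using only the named parameters of the FAST and MEDIUM rows of the table (the "partly gpu" entry and the EXTENSIVE level are left out). It is a hypothetical illustration of the documented behaviour, not how *ELPA* implements autotuning, and it ignores the dependent parameters mentioned above.

```C
#include <assert.h>

/* Hypothetical bit-mask model of some tunable parameters from the table. */
enum {
    P_SOLVER         = 1 << 0,
    P_REAL_KERNEL    = 1 << 1,
    P_COMPLEX_KERNEL = 1 << 2,
    P_OMP_THREADS    = 1 << 3,
    P_GPU            = 1 << 4
};

enum level { LEVEL_FAST, LEVEL_MEDIUM };

/* Parameters covered by a level; each level includes the previous one. */
unsigned level_params(enum level lvl)
{
    unsigned fast = P_SOLVER | P_REAL_KERNEL | P_COMPLEX_KERNEL | P_OMP_THREADS;
    switch (lvl) {
    case LEVEL_FAST:   return fast;
    case LEVEL_MEDIUM: return fast | P_GPU;
    }
    assert(0 && "unknown autotuning level");
    return 0;
}

/* Parameters that remain open for autotuning after the user has
   explicitly set the ones contained in explicitly_set. */
unsigned tuned_params(enum level lvl, unsigned explicitly_set)
{
    return level_params(lvl) & ~explicitly_set;
}
```

In this model, setting the solver before calling the autotuning setup corresponds to `tuned_params(LEVEL_MEDIUM, P_SOLVER)`, which leaves only the kernels, the thread count and the GPU flag to be tuned.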

The user can invoke autotuning in the following way:

Fortran synopsis

```Fortran
  ! prepare elpa as you are used to (see Section I)
  ! only the steps for autotuning are commented
  use elpa
  use mpi                                         ! provides MPI_COMM_WORLD
  class(elpa_t), pointer :: e
  class(elpa_autotune_t), pointer :: tune_state   ! create an autotuning pointer
  integer :: success, i
  logical :: unfinished

  ! na, nev, na_rows, na_cols, nblk, my_prow, my_pcol, scf_cycles and the
  ! arrays a, ev, z are assumed to be declared and set up by the calling program

  if (elpa_init(20171201) /= ELPA_OK) then
    print *, "ELPA API version not supported"
    stop
  endif

  e => elpa_allocate(success)

  ! set parameters describing the matrix and its MPI distribution
  call e%set("na", na, success)
  call e%set("nev", nev, success)
  call e%set("local_nrows", na_rows, success)
  call e%set("local_ncols", na_cols, success)
  call e%set("nblk", nblk, success)
  call e%set("mpi_comm_parent", MPI_COMM_WORLD, success)
  call e%set("process_row", my_prow, success)
  call e%set("process_col", my_pcol, success)

  success = e%setup()

  tune_state => e%autotune_setup(ELPA_AUTOTUNE_MEDIUM, ELPA_AUTOTUNE_DOMAIN_REAL, success)   ! prepare autotuning, set AUTOTUNE_LEVEL and the domain (real or complex)

  ! do the loop of subsequent ELPA calls which will be used to do the autotuning
  do i=1, scf_cycles
    unfinished = e%autotune_step(tune_state, success)   ! check whether autotuning is finished; if not, do the next step

    if (.not.(unfinished)) then
      print *,"autotuning finished at step ",i
    endif

    call e%eigenvectors(a, ev, z, success)       ! do the normal computation

  enddo

  call e%autotune_set_best(tune_state, success)         ! from now on use the values found by autotuning

  call elpa_autotune_deallocate(tune_state)    ! cleanup autotuning object
```

C Synopsis
```C
   /* prepare ELPA the usual way; only the steps for autotuning are commented */
   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>
   #include <elpa/elpa.h>

   /* na, nev, na_rows, na_cols, nblk, my_prow, my_pcol, scf_cycles and the
      arrays a, ev, z are assumed to be declared and set up by the calling program */

   elpa_t handle;
   elpa_autotune_t autotune_handle;                               // handle for autotuning
   int error, i, unfinished;

   if (elpa_init(20171201) != ELPA_OK) {
     fprintf(stderr, "Error: ELPA API version not supported\n");
     exit(1);
   }

   handle = elpa_allocate(&error);

   /* Set parameters describing the matrix and its MPI distribution */
   elpa_set(handle, "na", na, &error);
   elpa_set(handle, "nev", nev, &error);
   elpa_set(handle, "local_nrows", na_rows, &error);
   elpa_set(handle, "local_ncols", na_cols, &error);
   elpa_set(handle, "nblk", nblk, &error);
   elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);
   elpa_set(handle, "process_row", my_prow, &error);
   elpa_set(handle, "process_col", my_pcol, &error);
   /* Setup */
   error = elpa_setup(handle);

   autotune_handle = elpa_autotune_setup(handle, ELPA_AUTOTUNE_FAST, ELPA_AUTOTUNE_DOMAIN_REAL, &error);   // create autotune object

   // repeatedly call ELPA, e.g. in an scf iteration
   for (i=0; i < scf_cycles; i++) {

     unfinished = elpa_autotune_step(handle, autotune_handle, &error);      // check whether autotuning is finished; if not, do the next step

     if (unfinished == 0) {
       printf("ELPA autotuning finished in the %d th scf step \n", i);
     }

     /* do the normal computation */
     elpa_eigenvectors(handle, a, ev, z, &error);
   }
   elpa_autotune_set_best(handle, autotune_handle, &error);  // from now on use the values found by autotuning
   elpa_autotune_deallocate(autotune_handle);                // cleanup autotuning
```