## Users guide for the *ELPA* library ##

This document provides the guide for using the *ELPA* library with the new API (API version 20170403 or higher).
If you want to use the deprecated legacy API (we strongly recommend against this), please refer to the document
[USERS_GUIDE_DEPRECATED_LEGACY_API.md](USERS_GUIDE_DEPRECATED_LEGACY_API.md).

If you need instructions on how to build *ELPA*, please look at [INSTALL.md](INSTALL.md).

### Online and local documentation ###

Local documentation (via man pages) should be available (if *ELPA* has been installed with the documentation):

For example, "man elpa2_print_kernels" should provide the documentation for the *ELPA* program which prints all
the available kernels.

An [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2018.11.001.rc1/html/index.html)
is also available for each *ELPA* release.

### API of the *ELPA* library ###

With release 2017.05.001 of the *ELPA* library the interface has been rewritten substantially, in order to have a more generic
interface and to avoid future interface changes.

For compatibility reasons the interface defined in the previous release 2016.11.001 is also still available
**IF AND ONLY IF** *ELPA* has been built with support for this legacy interface.

The legacy API defines all the functionality as it has been defined in *ELPA* release 2016.11.001. Note, however,
that all future features of *ELPA* will only be accessible via the new API defined in release 2017.05.001 or later.

As mentioned, we advise against it, but if you want to use the legacy API please look at the document
[USERS_GUIDE_DEPRECATED_LEGACY_API.md](USERS_GUIDE_DEPRECATED_LEGACY_API.md).

### Table of Contents: ###

- I)   General concept of the *ELPA* API
- II)  List of supported tunable parameters
- III) List of computational routines
- IV)  Using OpenMP threading
- V)   Influencing default values with environment variables
- VI)  Using autotuning

## I) General concept of the *ELPA* API ##

Using *ELPA* requires just a few steps:

- include the ELPA header "elpa/elpa.h" (C case) or use the Fortran module with "use elpa"

- define an instance of the ELPA type

- call elpa_init

- call elpa_allocate to allocate an instance of *ELPA*;
  note that you can define (and configure individually) as many different instances
  of ELPA as you want, e.g. one for CPU-only computations and one for larger matrices on GPUs

- use the ELPA-type function "set" to set the matrix and MPI parameters

- call the ELPA-type function "setup"

- set or get all possible ELPA tunable options with the ELPA-type functions get/set

- call the ELPA-type function "eigenvectors" or one of the other computational routines (see Section III)

- if the ELPA object is not needed any more, release it with elpa_deallocate

- call elpa_uninit at the end of the program

More precisely, a basic call sequence for Fortran and C looks as follows:

Fortran synopsis

```Fortran
  use elpa
  class(elpa_t), pointer :: e
  integer :: success

  if (elpa_init(20171201) /= ELPA_OK) then        ! put here the API version that you are using
    print *, "ELPA API version not supported"
    stop
  endif

  e => elpa_allocate(success)
  if (success /= ELPA_OK) then
    ! react on the error
    ! we urge every user to always check the error codes
    ! of all ELPA functions
  endif

  ! set parameters describing the matrix and its MPI distribution
  call e%set("na", na, success)                          ! size of the na x na matrix
  call e%set("nev", nev, success)                        ! number of eigenvectors that should be computed (1 <= nev <= na)
  call e%set("local_nrows", na_rows, success)            ! number of local rows of the distributed matrix on this MPI task
  call e%set("local_ncols", na_cols, success)            ! number of local columns of the distributed matrix on this MPI task
  call e%set("nblk", nblk, success)                      ! size of the BLACS block cyclic distribution
  call e%set("mpi_comm_parent", MPI_COMM_WORLD, success) ! the global MPI communicator
  call e%set("process_row", my_prow, success)            ! row coordinate of MPI process
  call e%set("process_col", my_pcol, success)            ! column coordinate of MPI process

  success = e%setup()

  ! if desired, set any number of tunable run-time options
  ! look at the list of possible options as detailed later in
  ! USERS_GUIDE.md
  call e%set("solver", ELPA_SOLVER_2STAGE, success)

  ! set the AVX BLOCK2 kernel, otherwise ELPA_2STAGE_REAL_DEFAULT will
  ! be used
  call e%set("real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, success)

  ! use method eigenvectors to solve the eigenvalue problem and obtain
  ! eigenvalues and eigenvectors
  ! other possible methods are described in USERS_GUIDE.md
  call e%eigenvectors(a, ev, z, success)

  ! cleanup
  call elpa_deallocate(e)

  call elpa_uninit()
```

C Synopsis:

```C
   #include <stdio.h>
   #include <stdlib.h>
   #include <elpa/elpa.h>

   elpa_t handle;
   int error;

   if (elpa_init(20171201) != ELPA_OK) {                          // put here the API version that you are using
     fprintf(stderr, "Error: ELPA API version not supported\n");
     exit(1);
   }

   handle = elpa_allocate(&error);
   if (error != ELPA_OK) {
     /* react on the error code */
     /* we urge the user to always check the error codes of all ELPA functions */
   }

   /* Set parameters describing the matrix and its MPI distribution */
   elpa_set(handle, "na", na, &error);                                           // size of the na x na matrix
   elpa_set(handle, "nev", nev, &error);                                         // number of eigenvectors that should be computed (1 <= nev <= na)
   elpa_set(handle, "local_nrows", na_rows, &error);                             // number of local rows of the distributed matrix on this MPI task
   elpa_set(handle, "local_ncols", na_cols, &error);                             // number of local columns of the distributed matrix on this MPI task
   elpa_set(handle, "nblk", nblk, &error);                                       // size of the BLACS block cyclic distribution
   elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);    // the global MPI communicator
   elpa_set(handle, "process_row", my_prow, &error);                             // row coordinate of MPI process
   elpa_set(handle, "process_col", my_pcol, &error);                             // column coordinate of MPI process

   /* Setup */
   error = elpa_setup(handle);

   /* if desired, set any number of tunable run-time options */
   /* look at the list of possible options as detailed later in
      USERS_GUIDE.md */

   elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);

   // set the AVX BLOCK2 kernel, otherwise ELPA_2STAGE_REAL_DEFAULT will
   // be used
   elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, &error);

   /* use elpa_eigenvectors to solve the eigenvalue problem */
   /* other possible methods are described in USERS_GUIDE.md */
   elpa_eigenvectors(handle, a, ev, z, &error);

   /* cleanup */
   elpa_deallocate(handle);
   elpa_uninit();
```
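
Both synopses take na_rows and na_cols as given. As a minimal sketch, assuming the usual BLACS block-cyclic layout in which the first block of each dimension is owned by process row/column 0, the local extents could be computed with a helper along the lines of ScaLAPACK's numroc. The name local_extent is hypothetical and not part of *ELPA*:

```C
/* Hypothetical helper (not part of ELPA): number of rows (or columns) of a
   dimension of size n that are stored on process iproc out of nprocs, when
   the dimension is distributed block-cyclically with block size nblk.
   Assumption: the first block is owned by process 0. */
static int local_extent(int n, int nblk, int iproc, int nprocs)
{
    int nblocks = n / nblk;                   /* number of complete blocks      */
    int extent  = (nblocks / nprocs) * nblk;  /* blocks that every process gets */
    int extra   = nblocks % nprocs;           /* leftover complete blocks       */

    if (iproc < extra) {
        extent += nblk;                       /* one more complete block        */
    } else if (iproc == extra) {
        extent += n % nblk;                   /* the trailing partial block     */
    }
    return extent;
}
```

For na = 10 and nblk = 2 on a 2 x 2 process grid, local_extent(10, 2, 0, 2) gives 6 and local_extent(10, 2, 1, 2) gives 4. Calling it once with (my_prow, number of process rows) and once with (my_pcol, number of process columns) yields the values passed as "local_nrows" and "local_ncols".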

## II) List of supported tunable parameters ##

The following table gives a list of all supported parameters which can be used to tune (influence) the runtime behaviour of *ELPA* ([see here if you cannot read it in your editor](https://gitlab.mpcdf.mpg.de/elpa/elpa/wikis/USERS_GUIDE)).

| Parameter name | Short description     | default value               | possible values         | since API version |
| :------------- |:--------------------- | :-------------------------- | :---------------------- | :---------------- |
| solver         | use ELPA 1 stage <br> or 2 stage solver | ELPA_SOLVER_1STAGE | ELPA_SOLVER_1STAGE <br> ELPA_SOLVER_2STAGE | 20170403 |
| gpu            | use GPU (if built <br> with GPU support) | 0 | 0 or 1 | 20170403 |
| real_kernel    | real kernel to be <br> used in ELPA 2 | ELPA_2STAGE_REAL_DEFAULT | see output of <br> elpa2_print_kernels | 20170403 |
| complex_kernel | complex kernel to <br> be used in ELPA 2 | ELPA_2STAGE_COMPLEX_DEFAULT | see output of <br> elpa2_print_kernels | 20170403 |
| omp_threads    | OpenMP threads used <br> (if built with OpenMP <br> support) | 1 | >= 1 | 20180525 |
| qr             | use QR decomposition in <br> ELPA 2 real | 0 | 0 or 1 | 20170403 |
| timings        | enable time <br> measurement | 1 | 0 or 1 | 20170403 |
| debug          | give debug information | 0 | 0 or 1 | 20170403 |
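
One way to read the "since API version" column: a parameter can only be relied upon if the API version your code is written against is at least that value. The following C sketch encodes the column as a lookup table. The helper name param_available is hypothetical and not part of *ELPA*, and the numbers are simply copied from the table above:

```C
#include <stddef.h>
#include <string.h>

/* "since API version" values copied from the table of tunable parameters */
struct param_since { const char *name; int since; };

static const struct param_since param_table[] = {
    { "solver",         20170403 },
    { "gpu",            20170403 },
    { "real_kernel",    20170403 },
    { "complex_kernel", 20170403 },
    { "omp_threads",    20180525 },
    { "qr",             20170403 },
    { "timings",        20170403 },
    { "debug",          20170403 },
};

/* Hypothetical helper (not part of ELPA): returns 1 if name is listed with a
   "since" value not newer than api_version, 0 if it is listed but newer,
   and -1 if it is not listed at all. */
static int param_available(const char *name, int api_version)
{
    size_t n = sizeof param_table / sizeof param_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(param_table[i].name, name) == 0) {
            return api_version >= param_table[i].since ? 1 : 0;
        }
    }
    return -1;
}
```

With the version 20171201 used in the synopsis of Section I, param_available("omp_threads", 20171201) yields 0. Under this reading, code that sets omp_threads should be written against API version 20180525 or later.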
       
## III) List of computational routines ##

The following compute routines are available in *ELPA*. Please have a look at the man pages or the [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2018.11.001.rc1/html/index.html) for details.


| Name         | Purpose                                                                 | since API version |
| :----------- | :---------------------------------------------------------------------- | :---------------- |
| eigenvectors | solve std. eigenvalue problem <br> compute eigenvalues and eigenvectors | 20170403  |
| eigenvalues  | solve std. eigenvalue problem <br> compute eigenvalues only             | 20170403  |
| generalized_eigenvectors | solve generalized eigenvalue problem <br> compute eigenvalues and eigenvectors | 20180525 |
| generalized_eigenvalues  | solve generalized eigenvalue problem <br> compute eigenvalues only             | 20180525 |
| hermitian_multiply       | do (real) a^T x b <br> (complex) a^H x b                                       | 20170403 |
| cholesky                 | do Cholesky factorisation                                                      | 20170403 |
| invert_triangular        | invert an upper triangular matrix                                              | 20170403 |
| solve_tridiagonal        | solve EVP for a tridiagonal matrix                                             | 20170403 |


## IV) Using OpenMP threading ##

If *ELPA* has been built with OpenMP threading support you can specify the number of OpenMP threads that *ELPA* will use internally.
Please note that it is **mandatory** to set the number of threads to be used with the OMP_NUM_THREADS environment variable **and**
with the **set method**

```Fortran
call e%set("omp_threads", 4, error)
```

**or the *ELPA* environment variable**

export ELPA_DEFAULT_omp_threads=4 (see Section V for an explanation of this variable).

Just setting the environment variable OMP_NUM_THREADS is **not** sufficient.

This is necessary to make the threading an autotunable option.
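
Since both settings have to agree, one option is to read OMP_NUM_THREADS once and pass the same number to the set method. The following is a minimal sketch in C that uses only the standard library; the helper name parse_thread_count is hypothetical and not part of *ELPA*:

```C
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of ELPA): interpret the string s (typically
   the result of getenv("OMP_NUM_THREADS")) as a thread count. Returns
   fallback if s is NULL, empty, or not a single positive integer. A
   comma-separated list such as "4,2" (nested parallelism) is also rejected. */
static int parse_thread_count(const char *s, int fallback)
{
    char *end = NULL;
    long v;

    if (s == NULL || *s == '\0') {
        return fallback;
    }
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 1 || v > INT_MAX) {
        return fallback;
    }
    return (int)v;
}
```

In a C program the result of parse_thread_count(getenv("OMP_NUM_THREADS"), 1) could then be passed as the value in elpa_set(handle, "omp_threads", ..., &error), so that the environment variable and the *ELPA* setting cannot drift apart.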

## V) Influencing default values with environment variables ##

For each tunable parameter mentioned in Section II, there exists a default value. This means that if this parameter is **not explicitly** set by the user with the
*ELPA* set method, *ELPA* takes the default value for the parameter. E.g. if the user does not set a solver method, then *ELPA* will take the default "ELPA_SOLVER_1STAGE".

The user can change this default value by setting an environment variable to the desired value.

The name of this variable is always constructed in the following way:
```
ELPA_DEFAULT_tunable_parameter_name=value
```

For example, in the case of the solver the user can set

```
export ELPA_DEFAULT_solver=ELPA_SOLVER_2STAGE
```

in order to define the 2stage solver as the default.

**Important note**

The default value is completely ignored if the user has manually set a parameter-value pair with the *ELPA* set method!
Thus the above environment variable will **not** have an effect if the user code contains a line such as
```Fortran
call e%set("solver",ELPA_SOLVER_1STAGE,error)
```
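
The precedence described in this section can be summarised as: a value given to the set method first, then the ELPA_DEFAULT_ environment variable, then the built-in default. The C sketch below only illustrates this rule and the naming scheme. It is not how *ELPA* implements it, and both function names are hypothetical:

```C
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "ELPA_DEFAULT_<parameter>" into buf.
   Returns 0 on success and -1 if buf (of size len) is too small. */
static int default_env_name(const char *parameter, char *buf, size_t len)
{
    int n = snprintf(buf, len, "ELPA_DEFAULT_%s", parameter);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical model of the precedence rule:
   explicit_value: value passed to the set method, or NULL if never set
   env_value:      value of the ELPA_DEFAULT_ variable, or NULL if unset
   builtin:        built-in default as listed in Section II */
static const char *effective_value(const char *explicit_value,
                                   const char *env_value,
                                   const char *builtin)
{
    if (explicit_value != NULL) {
        return explicit_value;   /* the set method always wins          */
    }
    if (env_value != NULL) {
        return env_value;        /* otherwise the environment variable  */
    }
    return builtin;              /* otherwise the built-in default      */
}
```

For the solver example, default_env_name("solver", buf, sizeof buf) produces "ELPA_DEFAULT_solver", and effective_value("ELPA_SOLVER_1STAGE", "ELPA_SOLVER_2STAGE", "ELPA_SOLVER_1STAGE") returns "ELPA_SOLVER_1STAGE", matching the situation described in the important note.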

## VI) Using autotuning ##

Since API version 20171201 *ELPA* supports the autotuning of some "tunable" parameters (see Section II). The idea is that if *ELPA* is called multiple times (as is typical in
self-consistent iterations), some parameters can be tuned to an optimal value that is hard for the user to set by hand. Note that not every parameter mentioned in Section II can actually be tuned with the autotuning. At the moment, only the parameters mentioned in the table below are affected by autotuning.

There are two ways in which the user can influence the autotuning steps:

1.) the user can set one of the following autotuning levels
- ELPA_AUTOTUNE_FAST
- ELPA_AUTOTUNE_MEDIUM

Each level defines a different set of tunable parameters. The autotuning option will be extended by future releases of the *ELPA* library. At the moment the following
sets are supported:

| AUTOTUNE LEVEL          | Parameters                                              |
| :---------------------- | :------------------------------------------------------ |
| ELPA_AUTOTUNE_FAST      | { solver, real_kernel, complex_kernel, omp_threads }    |
| ELPA_AUTOTUNE_MEDIUM    | all of above + { gpu, partly gpu }                      |
| ELPA_AUTOTUNE_EXTENSIVE | all of above + { various blocking factors, stripewidth, intermediate_bandwidth } |

2.) the user can **remove** tunable parameters from the list of autotuning possibilities by explicitly setting this parameter,
e.g. if the user sets in their code

```Fortran
call e%set("solver", ELPA_SOLVER_2STAGE, error)
```
**before** invoking the autotuning, then the solver is fixed and no longer considered for autotuning. Thus ELPA_SOLVER_1STAGE would be skipped and, consequently, so would all autotuning parameters which depend on ELPA_SOLVER_1STAGE.
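
How the two mechanisms combine can be modelled with sets: the level selects the candidate parameters, and every parameter that was set explicitly is removed again. The following C sketch uses bit flags for this. It only mirrors the table above and is not *ELPA*'s internal logic, and all names in it are hypothetical. ELPA_AUTOTUNE_EXTENSIVE is left out because the table does not enumerate its blocking factors:

```C
/* Hypothetical bit flags, one per tunable parameter named in the table */
enum {
    TUNE_SOLVER         = 1 << 0,
    TUNE_REAL_KERNEL    = 1 << 1,
    TUNE_COMPLEX_KERNEL = 1 << 2,
    TUNE_OMP_THREADS    = 1 << 3,
    TUNE_GPU            = 1 << 4,
    TUNE_PARTLY_GPU     = 1 << 5
};

/* Candidate sets per level, as listed in the table */
#define LEVEL_FAST   (TUNE_SOLVER | TUNE_REAL_KERNEL | TUNE_COMPLEX_KERNEL | TUNE_OMP_THREADS)
#define LEVEL_MEDIUM (LEVEL_FAST | TUNE_GPU | TUNE_PARTLY_GPU)

/* Parameters still open for autotuning: those selected by the level minus
   those the user has fixed explicitly with the set method. */
static unsigned tunable_set(unsigned level, unsigned explicitly_set)
{
    return level & ~explicitly_set;
}
```

For example, tunable_set(LEVEL_FAST, TUNE_SOLVER) leaves only the real kernel, the complex kernel and the number of OpenMP threads to be tuned. The model does not capture the dependent options that are skipped together with ELPA_SOLVER_1STAGE.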

The user can invoke autotuning in the following way:


Fortran synopsis

```Fortran
  ! prepare elpa as you are used to (see Section I)
  ! only steps for autotuning are commented
  use elpa
  class(elpa_t), pointer :: e
  class(elpa_autotune_t), pointer :: tune_state   ! create an autotuning pointer
  integer :: success
  integer :: i
  logical :: unfinished

  if (elpa_init(20171201) /= ELPA_OK) then
    print *, "ELPA API version not supported"
    stop
  endif
  e => elpa_allocate(success)

  ! set parameters describing the matrix and its MPI distribution
  call e%set("na", na, success)
  call e%set("nev", nev, success)
  call e%set("local_nrows", na_rows, success)
  call e%set("local_ncols", na_cols, success)
  call e%set("nblk", nblk, success)
  call e%set("mpi_comm_parent", MPI_COMM_WORLD, success)
  call e%set("process_row", my_prow, success)
  call e%set("process_col", my_pcol, success)

  success = e%setup()

  tune_state => e%autotune_setup(ELPA_AUTOTUNE_MEDIUM, ELPA_AUTOTUNE_DOMAIN_REAL, success)   ! prepare autotuning, set AUTOTUNE_LEVEL and the domain (real or complex)

  ! do the loop of subsequent ELPA calls which will be used to do the autotuning
  do i=1, scf_cycles
    unfinished = e%autotune_step(tune_state, success)   ! check whether autotuning is finished; if not, do the next step

    if (.not.(unfinished)) then
      print *,"autotuning finished at step ",i
    endif

    call e%eigenvectors(a, ev, z, success)       ! do the normal computation

  enddo

  call e%autotune_set_best(tune_state, success)         ! from now on use the values found by autotuning

  call elpa_autotune_deallocate(tune_state)    ! cleanup autotuning object
```

C Synopsis

```C
   /* prepare ELPA the usual way; only steps for autotuning are commented */
   #include <stdio.h>
   #include <stdlib.h>
   #include <elpa/elpa.h>

   elpa_t handle;
   elpa_autotune_t autotune_handle;                               // handle for autotuning
   int error;
   int i, unfinished;

   if (elpa_init(20171201) != ELPA_OK) {
     fprintf(stderr, "Error: ELPA API version not supported\n");
     exit(1);
   }

   handle = elpa_allocate(&error);

   /* Set parameters describing the matrix and its MPI distribution */
   elpa_set(handle, "na", na, &error);
   elpa_set(handle, "nev", nev, &error);
   elpa_set(handle, "local_nrows", na_rows, &error);
   elpa_set(handle, "local_ncols", na_cols, &error);
   elpa_set(handle, "nblk", nblk, &error);
   elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);
   elpa_set(handle, "process_row", my_prow, &error);
   elpa_set(handle, "process_col", my_pcol, &error);
   /* Setup */
   error = elpa_setup(handle);

   autotune_handle = elpa_autotune_setup(handle, ELPA_AUTOTUNE_FAST, ELPA_AUTOTUNE_DOMAIN_REAL, &error);   // create autotune object

   // repeatedly call ELPA, e.g. in an scf iteration
   for (i=0; i < scf_cycles; i++) {

     unfinished = elpa_autotune_step(handle, autotune_handle, &error);      // check whether autotuning is finished; if not, do the next step

     if (unfinished == 0) {
       printf("ELPA autotuning finished in the %d th scf step \n",i);
     }

     /* do the normal computation */
     elpa_eigenvectors(handle, a, ev, z, &error);
   }
   elpa_autotune_set_best(handle, autotune_handle, &error);  // from now on use the values found by autotuning
   elpa_autotune_deallocate(autotune_handle);        // cleanup autotuning
```