## Users guide for the *ELPA* library ##

This document provides the guide for using the *ELPA* library with the new API (API version 20170403 or higher).
If you want to use the deprecated legacy API (we strongly recommend against this), please use the ELPA release
2019.05.002 or older.

If you need instructions on how to build *ELPA*, please look at [INSTALL.md](INSTALL.md).

### Online and local documentation ###

Local documentation (via man pages) should be available (if *ELPA* has been installed with the documentation):

For example `man elpa2_print_kernels` should provide the documentation for the *ELPA* program, which prints all
the available kernels.

An [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2021.05.001.rc1/html/index.html)
for each *ELPA* release is also available.

### API of the *ELPA* library ###

With release 2017.05.001 of the *ELPA* library, the interface has been rewritten substantially in order to provide a more generic
interface and to avoid future interface changes.


### Table of Contents: ###

- I)   General concept of the *ELPA* API
- II)  List of supported tunable parameters
- III) List of computational routines
- IV)  Using OpenMP threading
- V)   Influencing default values with environment variables
- VI)  Autotuning
- VII) A simple example of how to use ELPA in an MPI application

## I) General concept of the *ELPA* API ##

Using *ELPA* just requires a few steps:

- include the ELPA header `elpa/elpa.h` (C case) or use the Fortran module `use elpa`

- define an instance of the ELPA type

- call elpa_init

- call elpa_allocate to allocate an instance of *ELPA*
  note that you can define (and configure individually) as many different *ELPA* instances
  as you want, e.g. one for CPU-only computations and one for larger matrices on GPUs

- use ELPA-type function "set" to set matrix and MPI parameters

- call the ELPA-type function "setup"

- set or get all possible ELPA tunable options with ELPA-type functions get/set

- call ELPA-type function solve or others

- if the ELPA object is not needed anymore, call the ELPA-type function destroy

- call elpa_uninit at the end of the program

To be more precise, a basic call sequence for Fortran and C looks as follows:

Fortran synopsis

```fortran
 use elpa
 class(elpa_t), pointer :: elpa
 integer :: success

 if (elpa_init(20171201) /= ELPA_OK) then       ! put here the API version that you are using
   print *, "ELPA API version not supported"
   stop
 endif
  elpa => elpa_allocate(success)
  if (success /= ELPA_OK) then
    ! react on the error
    ! we urge every user to always check the error codes
    ! of all ELPA functions
  endif

  ! set parameters describing the matrix and its MPI distribution
  call elpa%set("na", na, success)                          ! size of the na x na matrix
  call elpa%set("nev", nev, success)                        ! number of eigenvectors that should be computed ( 1<= nev <= na)
  call elpa%set("local_nrows", na_rows, success)            ! number of local rows of the distributed matrix on this MPI task 
  call elpa%set("local_ncols", na_cols, success)            ! number of local columns of the distributed matrix on this MPI task
  call elpa%set("nblk", nblk, success)                      ! size of the BLACS block cyclic distribution
  call elpa%set("mpi_comm_parent", MPI_COMM_WORLD, success) ! the global MPI communicator
  call elpa%set("process_row", my_prow, success)            ! row coordinate of MPI process
  call elpa%set("process_col", my_pcol, success)            ! column coordinate of MPI process

  success = elpa%setup()

  ! if desired, set any number of tunable run-time options
  ! look at the list of possible options as detailed later in
  ! USERS_GUIDE.md
  call elpa%set("solver", ELPA_SOLVER_2STAGE, success)

  ! set the AVX BLOCK2 kernel, otherwise ELPA_2STAGE_REAL_DEFAULT will
  ! be used
  call elpa%set("real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, success)

  ! use method solve to solve the eigenvalue problem to obtain eigenvalues
  ! and eigenvectors
  ! other possible methods are described in USERS_GUIDE.md
  call elpa%eigenvectors(a, ev, z, success)

  ! cleanup
  call elpa_deallocate(elpa)

  call elpa_uninit()
```

C Synopsis:
```c
   #include <elpa/elpa.h>

   elpa_t handle;
   int error;

   if (elpa_init(20171201) != ELPA_OK) {                          // put here the API version that you are using
     fprintf(stderr, "Error: ELPA API version not supported");
     exit(1);
   }

   handle = elpa_allocate(&error);
   if (error != ELPA_OK) {
     /* react on the error code */
     /* we urge the user to always check the error codes of all ELPA functions */
   }


   /* Set parameters describing the matrix and its MPI distribution */
   elpa_set(handle, "na", na, &error);                                           // size of the na x na matrix
   elpa_set(handle, "nev", nev, &error);                                         // number of eigenvectors that should be computed ( 1<= nev <= na)
   elpa_set(handle, "local_nrows", na_rows, &error);                             // number of local rows of the distributed matrix on this MPI task 
   elpa_set(handle, "local_ncols", na_cols, &error);                             // number of local columns of the distributed matrix on this MPI task
   elpa_set(handle, "nblk", nblk, &error);                                       // size of the BLACS block cyclic distribution
   elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);    // the global MPI communicator
   elpa_set(handle, "process_row", my_prow, &error);                             // row coordinate of MPI process
   elpa_set(handle, "process_col", my_pcol, &error);                             // column coordinate of MPI process

   /* Setup */
   error = elpa_setup(handle);

   /* if desired, set any number of tunable run-time options */
   /* look at the list of possible options as detailed later in
      USERS_GUIDE.md */

   elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);
   // set the AVX BLOCK2 kernel, otherwise ELPA_2STAGE_REAL_DEFAULT will
   // be used
   elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, &error);

   /* use method solve to solve the eigenvalue problem */
   /* other possible methods are described in USERS_GUIDE.md */
   elpa_eigenvectors(handle, a, ev, z, &error);

   /* cleanup */
   elpa_deallocate(handle);
   elpa_uninit();
```

## II) List of supported tunable parameters ##

The following table gives a list of all supported parameters which can be used to tune (influence) the runtime behaviour of *ELPA* ([see here if you cannot read it in your editor](https://gitlab.mpcdf.mpg.de/elpa/elpa/wikis/USERS_GUIDE))

| Parameter name | Short description     | Default value               | Possible values         | Since API version |
| :------------- | :-------------------- | :-------------------------- | :---------------------- | :---------------- |
| solver         | use ELPA 1 stage <br> or 2 stage solver | ELPA_SOLVER_1STAGE | ELPA_SOLVER_1STAGE <br> ELPA_SOLVER_2STAGE | 20170403 |
| gpu            | use GPU (if built <br> with GPU support) | 0 | 0 or 1 | 20170403 |
| real_kernel    | real kernel to be <br> used in ELPA 2 | ELPA_2STAGE_REAL_DEFAULT | see output of <br> elpa2_print_kernels | 20170403 |
| complex_kernel | complex kernel to <br> be used in ELPA 2 | ELPA_2STAGE_COMPLEX_DEFAULT | see output of <br> elpa2_print_kernels | 20170403 |
| omp_threads    | OpenMP threads used <br> (if built with OpenMP <br> support) | 1 | >1 | 20180525 |
| qr             | use QR decomposition in <br> ELPA 2 real | 0 | 0 or 1 | 20170403 |
| timings        | enable time <br> measurement | 1 | 0 or 1 | 20170403 |
| debug          | give debug information | 0 | 0 or 1 | 20170403 |

## III) List of computational routines ##

The following compute routines are available in *ELPA*. Please have a look at the man pages or the [online doxygen documentation](http://elpa.mpcdf.mpg.de/html/Documentation/ELPA-2021.05.001.rc1/html/index.html) for details.


| Name                     | Purpose                                                                        | Since API version |
| :----------------------- | :----------------------------------------------------------------------------- | :---------------- |
| eigenvectors             | solve std. eigenvalue problem <br> compute eigenvalues and eigenvectors        | 20170403 |
| eigenvalues              | solve std. eigenvalue problem <br> compute eigenvalues only                    | 20170403 |
| generalized_eigenvectors | solve generalized eigenvalue problem <br> compute eigenvalues and eigenvectors | 20180525 |
| generalized_eigenvalues  | solve generalized eigenvalue problem <br> compute eigenvalues only             | 20180525 |
| hermitian_multiply       | do (real) a^T x b <br> (complex) a^H x b                                       | 20170403 |
| cholesky                 | do Cholesky factorisation                                                      | 20170403 |
| invert_triangular        | invert an upper triangular matrix                                              | 20170403 |
| solve_tridiagonal        | solve EVP for a tridiagonal matrix                                             | 20170403 |


## IV) Using OpenMP threading ##

If *ELPA* has been built with OpenMP threading support, you can specify the number of OpenMP threads that *ELPA* will use internally.
Please note that it is **mandatory** to set the number of threads to be used with the OMP_NUM_THREADS environment variable **and**
with the **set method**

```fortran
call e%set("omp_threads", 4, error)
```

**or the *ELPA* environment variable**

```
export ELPA_DEFAULT_omp_threads=4
```

(see Section V for an explanation of this variable).

Just setting the environment variable OMP_NUM_THREADS is **not** sufficient.

This is necessary to make the threading an autotunable option.
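Taken together, a job script or shell session would typically contain both settings (the value 4 is only an example):

```shell
# OMP_NUM_THREADS alone is not sufficient -- ELPA must also be told explicitly
export OMP_NUM_THREADS=4
export ELPA_DEFAULT_omp_threads=4
```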

## V) Influencing default values with environment variables ##

For each tunable parameter mentioned in Section II, there exists a default value. This means that if this parameter is **not explicitly** set by the user via the
*ELPA* set method, *ELPA* takes the default value for the parameter. E.g. if the user does not set a solver method, then *ELPA* will take the default `ELPA_SOLVER_1STAGE`.

The user can change this default value by setting an environment variable to the desired value.

The name of this variable is always constructed in the following way:
```
ELPA_DEFAULT_tunable_parameter_name=value
```

For example, in the case of the solver the user can set

```
export ELPA_DEFAULT_solver=ELPA_SOLVER_2STAGE
```

in order to define the 2stage solver as the default.

**Important note**
The default value is completely ignored if the user has manually set a parameter-value pair with the *ELPA* set method!
Thus the above environment variable will **not** have an effect if the user code contains a line
```fortran
call e%set("solver",ELPA_SOLVER_1STAGE,error)
```

## VI) Using autotuning ##

Since API version 20171201 *ELPA* supports the autotuning of some "tunable" parameters (see Section II). The idea is that if *ELPA* is called multiple times (as is typical in
self-consistent iterations), some parameters can be tuned to an optimal value, which is hard for the user to set. Note that not every parameter mentioned in Section II can actually be tuned with the autotuning. At the moment, only the parameters mentioned in the table below are affected by autotuning.

There are two ways in which the user can influence the autotuning steps:

1.) The user can set one of the following autotuning levels:
- ELPA_AUTOTUNE_FAST
- ELPA_AUTOTUNE_MEDIUM
- ELPA_AUTOTUNE_EXTENSIVE

Each level defines a different set of tunable parameters. The autotuning option will be extended in future releases of the *ELPA* library; at the moment the following
sets are supported:

| AUTOTUNE LEVEL          | Parameters                                              |
| :---------------------- | :------------------------------------------------------ |
| ELPA_AUTOTUNE_FAST      | { solver, real_kernel, complex_kernel, omp_threads }    |
| ELPA_AUTOTUNE_MEDIUM    | all of above + { gpu, partly gpu }                      |
| ELPA_AUTOTUNE_EXTENSIVE | all of above + { various blocking factors, stripewidth, intermediate_bandwidth } |

2.) The user can **remove** tunable parameters from the list of autotuning possibilities by explicitly setting the parameter,
e.g. if the user sets in the code

```fortran
call e%set("solver", ELPA_SOLVER_2STAGE, error)
```
**before** invoking the autotuning, then the solver is fixed and no longer considered for autotuning. Thus ELPA_SOLVER_1STAGE would be skipped and, consequently, so would all autotuning parameters which depend on ELPA_SOLVER_1STAGE.

The user can invoke autotuning in the following way:


Fortran synopsis

```fortran
 ! prepare elpa as you are used to (see Section I)
 ! only steps for autotuning are commented
 use elpa
 class(elpa_t), pointer :: elpa
 class(elpa_autotune_t), pointer :: tune_state   ! create an autotuning pointer
 integer :: success

 if (elpa_init(20171201) /= ELPA_OK) then
   print *, "ELPA API version not supported"
   stop
 endif
  elpa => elpa_allocate(success)

  ! set parameters describing the matrix and its MPI distribution
  call elpa%set("na", na, success)
  call elpa%set("nev", nev, success)
  call elpa%set("local_nrows", na_rows, success)
  call elpa%set("local_ncols", na_cols, success)
  call elpa%set("nblk", nblk, success)
  call elpa%set("mpi_comm_parent", MPI_COMM_WORLD, success)
  call elpa%set("process_row", my_prow, success)
  call elpa%set("process_col", my_pcol, success)

  success = elpa%setup()

  tune_state => elpa%autotune_setup(ELPA_AUTOTUNE_MEDIUM, ELPA_AUTOTUNE_DOMAIN_REAL, success)   ! prepare autotuning, set AUTOTUNE_LEVEL and the domain (real or complex)

  ! do the loop of subsequent ELPA calls which will be used to do the autotuning
  do i=1, scf_cycles
    unfinished = elpa%autotune_step(tune_state, success)   ! check whether autotuning is finished; if not, do the next step

    if (.not.(unfinished)) then
      print *,"autotuning finished at step ",i
    endif

    call elpa%eigenvectors(a, ev, z, success)       ! do the normal computation

  enddo

  call elpa%autotune_set_best(tune_state, success)         ! from now on use the values found by autotuning

  call elpa_autotune_deallocate(tune_state)    ! cleanup autotuning object 
```

C Synopsis
```c
   /* prepare ELPA the usual way; only steps for autotuning are commented */
   #include <elpa/elpa.h>

   elpa_t handle;
   elpa_autotune_t autotune_handle;                               // handle for autotuning
   int error;

   if (elpa_init(20171201) != ELPA_OK) { 
     fprintf(stderr, "Error: ELPA API version not supported");
     exit(1);
   }

   handle = elpa_allocate(&error);

   /* Set parameters describing the matrix and its MPI distribution */
   elpa_set(handle, "na", na, &error);
   elpa_set(handle, "nev", nev, &error);
   elpa_set(handle, "local_nrows", na_rows, &error);
   elpa_set(handle, "local_ncols", na_cols, &error);
   elpa_set(handle, "nblk", nblk, &error);
   elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);
   elpa_set(handle, "process_row", my_prow, &error);
   elpa_set(handle, "process_col", my_pcol, &error);
   /* Setup */
   elpa_setup(handle);

   autotune_handle = elpa_autotune_setup(handle, ELPA_AUTOTUNE_FAST, ELPA_AUTOTUNE_DOMAIN_REAL, &error);   // create autotune object

   // repeatedly call ELPA, e.g. in an SCF iteration
   for (i=0; i < scf_cycles; i++) {

     unfinished = elpa_autotune_step(handle, autotune_handle, &error);      // check whether autotuning finished. If not do next step

     if (unfinished == 0) {
       printf("ELPA autotuning finished in the %d-th SCF step\n", i);
     }


      /* do the normal computation */
      elpa_eigenvectors(handle, a, ev, z, &error);
   }
   elpa_autotune_set_best(handle, autotune_handle, &error);  // from now on use the values found by autotuning
   elpa_autotune_deallocate(autotune_handle);        // cleanup autotuning
```


## VII) A simple example of how to use ELPA in an MPI application ##

The following is skeleton code of a basic example of how to use ELPA. The purpose is to show the steps that have
to be done in an MPI application which wants to call ELPA, namely

- Initializing the MPI
- creating a BLACS-distributed matrix
- IMPORTANT: it is very, very important that you check the return value of "descinit" for your BLACS distribution!
  ELPA relies on the distribution it works on being _valid_. If this is not the case, the behavior is undefined!
- using this matrix within ELPA

The skeleton is not meant to be copied and pasted, since the details will always depend on the application which
calls ELPA.

For simplicity, only a Fortran example is shown.

```fortran

use elpa
use mpi

implicit none

integer :: mpierr, myid, nprocs
integer :: np_cols, np_rows, npcol, nprow, my_prow, my_pcol
integer :: na_rows, na_cols, nev, success, i, j
integer :: my_blacs_ctxt, sc_desc(9), info
integer :: na = [some value] ! global dimension of the matrix to be solved
integer :: nblk = [some value] ! the block size of the scalapack block cyclic distribution
real*8, allocatable :: a(:,:), ev(:), z(:,:)
class(elpa_t), pointer :: elpa

!-------------------------------------------------------------------------------
!  MPI Initialization

call mpi_init(mpierr)
call mpi_comm_rank(mpi_comm_world,myid,mpierr)
call mpi_comm_size(mpi_comm_world,nprocs,mpierr)  

!-------------------------------------------------------------------------------
! Selection of number of processor rows/columns
! the application has to decide how the matrix should be distributed
np_cols = [ some value ]
np_rows = [ some value ]


!-------------------------------------------------------------------------------
! Set up BLACS context and MPI communicators
!
! The BLACS context is only necessary for using Scalapack.
!
! For ELPA, the MPI communicators along rows/cols are sufficient,
! and the grid setup may be done in an arbitrary way as long as it is
! consistent (i.e. 0<=my_prow<np_rows, 0<=my_pcol<np_cols and every
! process has a unique (my_prow,my_pcol) pair).
! For details look at the documentation of  BLACS_Gridinit and
! BLACS_Gridinfo of your BLACS installation

my_blacs_ctxt = mpi_comm_world
call BLACS_Gridinit( my_blacs_ctxt, 'C', np_rows, np_cols )
call BLACS_Gridinfo( my_blacs_ctxt, nprow, npcol, my_prow, my_pcol )

! compute for your distributed matrix the number of local rows and columns 
! per MPI task, e.g. with
! the Scalapack tools routine NUMROC 

! Set up a scalapack descriptor for the checks below.
! For ELPA the following restrictions hold:
! - block sizes in both directions must be identical (args 4+5)
! - first row and column of the distributed matrix must be on row/col 0/0 (args 6+7)

call descinit( sc_desc, na, na, nblk, nblk, 0, 0, my_blacs_ctxt, na_rows, info )

! check the return code
if (info .ne. 0) then
  print *,"Invalid blacs-distribution. Abort!"
  stop
endif

! Allocate matrices 

allocate(a (na_rows,na_cols))
allocate(ev(na))

! fill the matrix with reasonable values

a(i,j) = [ your problem to be solved]

! Up to this point these were all the prerequisites which have to be done in the
! application if you have a distributed eigenvalue problem to be solved, independent of
! whether you want to use ELPA, Scalapack, EigenEXA or alike

! Now you can start using ELPA

if (elpa_init(20171201) /= ELPA_OK) then        ! put here the API version that you are using
  print *, "ELPA API version not supported"
  stop
endif
elpa => elpa_allocate(success)
if (success /= ELPA_OK) then
  ! react on the error
  ! we urge every user to always check the error codes
  ! of all ELPA functions
endif

 ! set parameters describing the matrix and its MPI distribution
 call elpa%set("na", na, success)                          ! size of the na x na matrix
 call elpa%set("nev", nev, success)                        ! number of eigenvectors that should be computed ( 1<= nev <= na)
 call elpa%set("local_nrows", na_rows, success)            ! number of local rows of the distributed matrix on this MPI task 
 call elpa%set("local_ncols", na_cols, success)            ! number of local columns of the distributed matrix on this MPI task
 call elpa%set("nblk", nblk, success)                      ! size of the BLACS block cyclic distribution
 call elpa%set("mpi_comm_parent", MPI_COMM_WORLD, success) ! the global MPI communicator
 call elpa%set("process_row", my_prow, success)            ! row coordinate of MPI process
 call elpa%set("process_col", my_pcol, success)            ! column coordinate of MPI process

 success = elpa%setup()

 ! if desired, set any number of tunable run-time options
 ! look at the list of possible options as detailed later in
 ! USERS_GUIDE.md
 call elpa%set("solver", ELPA_SOLVER_2STAGE, success)

 ! set the AVX BLOCK2 kernel, otherwise ELPA_2STAGE_REAL_DEFAULT will
 ! be used
 call elpa%set("real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, success)

 ! use method solve to solve the eigenvalue problem to obtain eigenvalues
 ! and eigenvectors
 ! other possible methods are described in USERS_GUIDE.md
 call elpa%eigenvectors(a, ev, z, success)

 ! cleanup
 call elpa_deallocate(elpa)

 call elpa_uninit()
```