Welcome to the git-based distribution of the ELPA eigensolver library.

If you are reading this file, you have obtained the ELPA library
through the git repository that hosts the source code and also allows
you to contribute improvements to the project if necessary.

In your use of ELPA, please respect the copyright restrictions
found below and in the "COPYING" directory in this repository. In a
nutshell, if you make improvements to ELPA, copyright for such
improvements remains with you, but we request that you relicense any
such improvements under the same exact terms of the (modified) LGPL v3
that we are using here. Please do not simply absorb ELPA into your own
project and then redistribute binary-only without making your exact
version of the ELPA source code (unmodified or MODIFIED) available as
well. 


*** Citing:

  A description of some algorithms present in ELPA can be found in:

  T. Auckenthaler, V. Blum, H.-J. Bungartz, T. Huckle, R. Johanni,
  L. Krämer, B. Lang, H. Lederer, and P. R. Willems,
  "Parallel solution of partial symmetric eigenvalue problems from
  electronic structure calculations", 
  Parallel Computing 37, 783-794 (2011).
  doi:10.1016/j.parco.2011.05.002. 

  Please cite this paper when using ELPA. We also intend to publish an
  overview description of the ELPA library as such, and ask you to
  make appropriate reference to that as well, once it appears.


*** Copyright: 

Copyright of the original code rests with the authors inside the ELPA
consortium. The code is distributed under the terms of the GNU Lesser General 
Public License version 3 (LGPL).

Please also note the express "NO WARRANTY" disclaimers in the GPL.

Please see the file "COPYING" for details, and the files "gpl.txt" and
"lgpl.txt" for further information.


*** Using ELPA: 

ELPA is designed to be compiled (Fortran) on its own, to be later
linked to your own application. In order to use ELPA, you must still
have a set of separate libraries that provide

  - Basic Linear Algebra Subroutines (BLAS)
  - Lapack routines
  - Basic Linear Algebra Communication Subroutines (BLACS)
  - Scalapack routines
  - a working MPI library

Appropriate libraries can be obtained and compiled separately on many
architectures as free software. Alternatively, pre-packaged libraries
are usually available from proprietary HPC compiler vendors.

For example, Intel's ifort compiler comes with the "Math Kernel Library"
(MKL), which provides BLAS/Lapack/BLACS/Scalapack functionality (except on
Mac OS X, where the BLACS and Scalapack parts must still be obtained
and compiled separately).

A very usable general-purpose MPI library is OpenMPI (ELPA was tested
with OpenMPI 1.4.3 for example). Intel MPI seems to be a very well
performing option on Intel platforms.

Examples of how to use ELPA are included in the accompanying
test_*.f90 subroutines in the "test" directory. A Makefile is also
included as a minimal example of how to build and link ELPA to any
other piece of code.


*** Structure of this repository:

* README file - this file. Please also consult the ELPA Wiki, and
  consider adding any useful information that you may have.

* COPYING directory - the copyright and licensing information for ELPA.

* src directory - contains all the files that are needed for the
  actual ELPA subroutines. If you are attempting to use ELPA in your
  own application, these are the files which you need. 

- elpa1.f90 - routines for the one-stage solver.
  The one-stage solver (elpa1.f90) can be used standalone, without elpa2.

- elpa2.f90 - ADDITIONAL routines needed for the two-stage solver.
  elpa2.f90 requires elpa1.f90 and a version of elpa2_kernels.f90, so
  always compile them together.

- elpa2_kernels.f90 - optimized linear algebra kernels for ELPA.
  This file is a generic version of optimized linear algebra kernels
  for use with the ELPA library. The standard elpa2_kernels.f90 runs
  on every platform but it is optimized for the Intel SSE instruction
  set. Best performance is achieved with the Intel ifort compiler and
  compile flags -O3 -xSSE4.2.

  For optimum performance on special architectures, you may wish to
  investigate whether hand-tuned versions of this file give additional
  gains. If so, simply remove elpa2_kernels.f90 from your compilation
  and replace with the version of your choice. It would be great if
  you could contribute such hand-tuned versions back to the
  repository. (The LGPL requirement for redistribution holds in any case.)

- elpa2_kernels_bg.f90
  Example of optimized ELPA kernels for the BlueGene/P
  architecture. Use instead of the standard elpa2_kernels.f90
  file. elpa2_kernels_bg.f90 contains assembler instructions for the
  BlueGene/P processor which IBM's xlf Fortran compiler can handle.
  
- elpa_qr directory (development version only)
  This directory contains routines for an alternative implementation
  of the QR-decomposition of "tall and skinny" matrices, which is
  needed for the reduction to banded form. The usage of this 
  alternative implementation can be switched on and off by setting
  the variable "which_qr_decomposition" in elpa2.f90 (default = on).

* test directory

- Contains the Makefile that demonstrates how to compile and link to
  the ELPA routines

- All files starting with test_... demonstrate the use of the ELPA
  library (but are not needed for using it).

- The test_* programs build their own random matrices, solve the eigenvalue
  problem and write timings.

  There are three parameters that control the test_* programs:
  - na   = matrix dimension
  - nev  = number of eigenvalue / eigenvector pairs that are actually needed
  - nblk = algorithmic block size, usually 16, 32 or 64 
    (important - do not set to unreasonable values)

  These input parameters are set to the default values na=4000, nev=1500,
  nblk=16 in the header of each source file test*.f90.

  Optionally, they can be controlled at runtime by supplying an input file 
  called 'test_parameters.in'.

  This file may contain any subset of these lines, or none at all (input
  values not specified in 'test_parameters.in' are set to their default
  values).

  The format of test_parameters.in is simple, for instance:

  na   8000
  nev  6000
  nblk 32

  to change all values. Order of lines does not matter.
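
  The lookup rules described above (missing entries fall back to the
  defaults, line order is irrelevant) can be sketched in a few lines of
  Python. This is only an illustration of the file format: the real test
  programs are Fortran and may parse the file differently, and
  parse_test_parameters is a hypothetical helper, not part of ELPA.

```python
# Illustrative sketch only, not ELPA code.
# Defaults are the values quoted in this README.
DEFAULTS = {"na": 4000, "nev": 1500, "nblk": 16}


def parse_test_parameters(text):
    """Return na/nev/nblk, falling back to the defaults for missing keys."""
    params = dict(DEFAULTS)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not "<known name> <value>".
        if len(fields) != 2 or fields[0] not in params:
            continue
        params[fields[0]] = int(fields[1])
    return params


# nev is omitted and the lines are in arbitrary order.
example = "nblk 32\nna   8000\n"
print(parse_test_parameters(example))
```

  In this example only na and nblk are overridden, so nev keeps its default.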

- All test programs solve an eigenvalue problem and check the correctness
  of the result by evaluating || A*x - x*lambda || and checking the
  orthogonality of the eigenvectors:

  test_real         Real eigenvalue problem, 1 stage solver
  test_real_gen     Real generalized eigenvalue problem, 1 stage solver
  test_complex      Complex eigenvalue problem, 1 stage solver
  test_complex_gen  Complex generalized eigenvalue problem, 1 stage solver
  test_real2        Real eigenvalue problem, 2 stage solver
  test_complex2     Complex eigenvalue problem, 2 stage solver
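
  To make the two correctness criteria concrete, here is a minimal serial
  sketch in pure Python for a hand-picked 2x2 symmetric matrix. It is not
  the code used by the test programs (which run in parallel in Fortran);
  the helper names and the choice of the Euclidean norm are assumptions
  made for illustration.

```python
import math

# Illustrative only: a tiny serial check, not ELPA's test code.
A = [[2.0, 1.0],
     [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
eigenvalues = [1.0, 3.0]
eigenvectors = [[s, -s], [s, s]]   # eigenvectors[k] belongs to eigenvalues[k]


def matvec(matrix, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def max_residual(matrix, lams, vecs):
    """Largest Euclidean norm of A*x - x*lambda over all eigenpairs."""
    worst = 0.0
    for lam, x in zip(lams, vecs):
        r = [ax - lam * xi for ax, xi in zip(matvec(matrix, x), x)]
        worst = max(worst, math.sqrt(dot(r, r)))
    return worst


def max_orthogonality_error(vecs):
    """Largest deviation of the products x_i . x_j from the identity."""
    worst = 0.0
    for i, x in enumerate(vecs):
        for j, y in enumerate(vecs):
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot(x, y) - target))
    return worst


print(max_residual(A, eigenvalues, eigenvectors))
print(max_orthogonality_error(eigenvectors))
```

  Both numbers should be at the level of floating-point rounding error
  for a correct solution.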

- There are two programs which read matrices from a file, solve the
  eigenvalue problem, print the eigenvalues and check the correctness
  of the result (all using elpa1 only)

  read_real         for the real eigenvalue problem
  read_real_gen     for the real generalized eigenvalue problem
                    A*x - B*x*lambda = 0

  read_real has to be called with 1 command line argument (the file
  containing the matrix). The file must be in ASCII (formatted) form.

  read_real_gen has to be called with 3 command line arguments. The
  first argument is either 'asc' or 'bin' (without quotes) and
  determines the format of the following files. 'asc' refers to ASCII
  (formatted) and 'bin' to binary (unformatted). Command line
  arguments 2 and 3 are the names of the files which contain matrices
  A and B.

  The structure of the matrix files for read_real and read_real_gen
  depends on the format of the files:

  * ASCII format (both read_real and read_real_gen):

    The files must contain the following lines:

      - 1st line containing the matrix size
      - followed by the upper half of the matrix in column-major
        (i.e. Fortran) order, one number per line:
        a(1,1)
        a(1,2)
        a(2,2)
        ...
        a(1,i)
        ...
        a(i,i)
        ...
        a(1,n)
        ...
        a(n,n)
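
    As an illustration of this ordering, the following Python sketch
    produces the lines of such a file for a small symmetric matrix. The
    helper name upper_half_ascii is hypothetical, and the sketch assumes
    that the reader accepts plain decimal notation for the numbers.

```python
# Illustrative sketch: lay out a symmetric matrix in the ASCII order
# described above (size first, then the upper half column by column).
def upper_half_ascii(a):
    """Return the lines of an ASCII matrix file for the symmetric matrix a."""
    n = len(a)
    lines = [str(n)]
    for j in range(n):            # column index (0-based)
        for i in range(j + 1):    # rows 0..j, i.e. a(1,j+1) .. a(j+1,j+1)
            lines.append(repr(float(a[i][j])))
    return lines


a = [[4.0, 1.0, 2.0],
     [1.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
print("\n".join(upper_half_ascii(a)))
```

    For an n x n matrix the file has 1 + n*(n+1)/2 lines; writing the
    returned lines to a file gives an input that could be passed to
    read_real.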


  * Binary format (read_real_gen only):

    The files must contain the following records:

      - 1st record:  matrix size  (type integer)
      - 2nd record:  a(1,1)
      - 3rd record:  a(1,2)  a(2,2)
      - ...
      - ...          a(1,i)   ...   a(i,i)
      - ...
      - ...          a(1,n)      ...         a(n,n)

    The type of the matrix elements a(i,j) is real*8.
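
    The record layout can be sketched with Python's struct module. The
    README does not show the I/O statements of read_real_gen, so the
    framing used here is an assumption: sequential unformatted access
    with a 4-byte little-endian length marker before and after each
    record (a common convention, but compiler-dependent), and a 4-byte
    default integer for the matrix size. The helper names are
    hypothetical.

```python
import struct


def fortran_record(payload):
    """Frame a payload with 4-byte length markers (assumed convention)."""
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker


def upper_half_binary(a):
    """Return the bytes of a binary matrix file for the symmetric matrix a."""
    n = len(a)
    # 1st record: matrix size as a 4-byte integer (assumed default kind).
    out = fortran_record(struct.pack("<i", n))
    # One record per column j holding a(1,j) .. a(j,j) as real*8 values.
    for j in range(n):
        column = [float(a[i][j]) for i in range(j + 1)]
        out += fortran_record(struct.pack("<%dd" % len(column), *column))
    return out


m = [[4.0, 1.0],
     [1.0, 5.0]]
data = upper_half_binary(m)
print(len(data))
```

    If the Fortran side was built with different record markers or a
    different byte order, the two format strings would need to be
    adapted accordingly.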