Changelog for next release

- not yet decided

Changelog for ELPA 2021.05.001

- allow the user to set the mapping of MPI tasks to GPU IDs via set/get
- experimental feature: port to AMD GPUs; works correctly, performance not yet
  clear; only tested with --with-mpi=0
- On request, ELPA can print the pinning of MPI tasks and OpenMP threads
- support for FUGAKU: some minor problems caused by compiler issues still
  have to be fixed
- BUG FIX: if matrix is already banded, check whether bandwidth >= 2. DO NOT
  ALLOW a bandwidth = 1, since this would imply that the input matrix is
  already diagonal which the ELPA algorithms do not support
- BUG FIX in internal test programs: do not consider a residual of 0.0 to be
  an error
- support for skew-symmetric matrices now enabled by default
- BUG FIX in generalized case: in setups like "mpiexec -np 4 ./validate_real_double_generalized_1stage_random 90 90 45"
- ELPA_SETUPS now checks (in case of MPI runs) whether the user-provided BLACSGRID is reasonable (i.e. ELPA
  no longer relies on the user checking, prior to calling ELPA, whether the BLACSGRID is ok); if this check
  fails, ELPA returns with an error
- limit number of OpenMP threads to one, if MPI thread level is not at least MPI_THREAD_SERIALIZED
- allow checking of the supported threading level of the MPI library at build time
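
The OpenMP thread limit mentioned above can be illustrated with a minimal
sketch. This is not ELPA code: the enum and the helper name
`effective_omp_threads` are hypothetical and only mirror the ordering of the
MPI thread-support levels (SINGLE < FUNNELED < SERIALIZED < MULTIPLE), which
the MPI standard guarantees. How ELPA implements the check internally is not
stated in this changelog.

```python
from enum import IntEnum


class MpiThreadLevel(IntEnum):
    """Illustrative stand-in for the MPI thread-support levels.

    Only the ordering matters here; the integer values are arbitrary
    and are not taken from any MPI implementation.
    """
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def effective_omp_threads(provided, requested_threads):
    """Hypothetical helper: apply the rule from the changelog entry.

    If the MPI library provides less than SERIALIZED, fall back to a
    single OpenMP thread; otherwise keep the requested thread count.
    """
    if requested_threads < 1:
        raise ValueError("requested_threads must be >= 1")
    if provided < MpiThreadLevel.SERIALIZED:
        return 1
    return requested_threads


if __name__ == "__main__":
    for level in MpiThreadLevel:
        print(level.name, effective_omp_threads(level, 8))
```

With this sketch, a request for 8 threads is reduced to 1 for the SINGLE and
FUNNELED levels and left unchanged for SERIALIZED and MULTIPLE.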

Changelog for ELPA 2020.11.001

- this release contains mostly bugfixes:
- fix determination whether a _ is needed to link Fortran to C
- fix an error in the real block4 kernel for aarch64 NEON
- add missing test_scalapack_template.F90 to EXTRA_DIST list
- fix error in the GPU kernel
- switch from python2 to python3
- experimental feature: complex kernels for aarch64 NEON
- experimental feature: kernels for ARM SVE

Changelog for ELPA 2020.05.001

- Enable compilation with gcc v10
- Fix a bug in elpa_multiply_a_b (GPU)
- improved documentation, including fixing of typos and errors in markdown
- Fix a bug in the calling of Cannon's algorithm which might lead to crashes
  for a square process grid
- improvements and bugfixes of the ELPA2 stage GPU version, see
   https://arxiv.org/abs/2002.10991
- bugfix for the build of AVX-512 KNL kernels
- clean separation of SIMD instructions for AVX and AVX2 kernels
- better error checking for allocations / deallocations of CPU and GPU memory
- experimental feature of matrix redistribution
- bugfix in the cpuid tests
- bugfix in elpa2_print_kernels
- bugfix when configuring --with-gpu-support-only

Changelog for ELPA 2019.11.001

- fix a bug when using parallel make builds
- check the cpuid set during build time
- add experimental feature "heterogenous-cluster-support"
- add experimental feature for 64bit integer BLAS/LAPACK/SCALAPACK support
- add experimental feature for 64bit integer MPI support
- support of ELPA for real valued skew-symmetric matrices, please cite:
  https://arxiv.org/abs/1912.04062 
- cleanup of the GPU version
- bugfix in the OpenMP version
- bugfix on the Power8/9 kernels
- bugfix on ARM aarch64 FMA kernels

Changelog for ELPA 2019.05.002

- repackaging of the source, since the legacy interface was forgotten in the
  2019.05.001 release

Changelog for ELPA 2019.05.001

- elpa_print_kernels supports GPU usage
- fix an error if PAPI measurements are activated
- new simple real kernels: block4 and block6
- C functions can be built with optional arguments if the compiler supports it
  (configure option)
- allow measurements with the likwid tool
- users can define the default-kernel at build time
- ELPA versioning number is provided in the C header files
- as announced a year ago, the following deprecated routines have finally been
  removed; see DEPRECATED_FEATURES for the replacement routines, which were
  introduced a year ago. Removed routines:
  -> mult_at_b_real
  -> mult_ah_b_complex
  -> invert_trm_real
  -> invert_trm_complex
  -> cholesky_real
  -> cholesky_complex
  -> solve_tridi
- new kernels for ARM aarch64 added
- fix an out-of-bound-error in elpa2

Changelog for ELPA 2018.11.001

- improved autotuning
- improved performance of generalized problem via Cannon's algorithm
- checkpointing functionality of elpa objects
- store/read/resume of autotuning
- Python interface for ELPA
- more ELPA functions have an optional error argument (Fortran) or required
  error argument (C) => ABI and API change

Changelog for ELPA 2018.05.001

- significantly improved performance on K-computer
- added interface for the generalized eigenvalue problem
- extended autotuning functionality

Changelog for ELPA 2017.11.001

- significant improvement of performance of GPU version
- added new compute kernels for IBM Power8 and Fujitsu Sparc64
  processors
- a first implementation of autotuning capability
- correct some type statements in Fortran
- correct detection of PAPI in configure step

Changelog for ELPA 2017.05.003

- remove bug in invert_triangular, which had been introduced
  in ELPA 2017.05.002

Changelog for ELPA 2017.05.002

Mainly bugfixes for ELPA 2017.05.001:
- fix memory leak of MPI communicators
- tests for hermitian_multiply, cholesky decomposition and
- deal with a problem on Debian (mawk)

Changelog for ELPA 2017.05.001

Final release of ELPA 2017.05.001
Since rc2 the following changes have been made
- more extensive tests during "make check"
- distribute missing C headers
- introduce analytic tests
- Fix stack overflow in some kernels

Changelog for ELPA 2017.05.001.rc2

This is the release candidate 2 for the ELPA 2017.05.001 version.
In addition to the changes from rc1, it fixes some smaller issues:
- add missing script "manual_cpp"
- cleanup of code

Changelog for ELPA 2017.05.001.rc1

This is the release candidate 1 for the ELPA 2017.05.001 version.
It provides a first version of the new, more generic API of the ELPA library.
Smaller changes to the API might be possible in the upcoming release
candidates. For users who would like to use the older API of the ELPA
library, the API as defined with release 2016.11.001.pre is frozen and
also supported.

Apart from the API change to be more flexible for the future, this release
offers the following changes:

- faster GPU implementation, especially for ELPA 1stage
- the restriction of the block-cyclic distribution blocksize = 128 in the GPU
  case is relaxed
- Faster CPU implementation due to better blocking
- support of already banded matrices (new API only!)
- improved KNL support

Changelog for pre-release ELPA 2016.11.001.pre

This pre-release contains an experimental API which will most likely
change in the next stable release

- support for single-precision (real and complex case) eigenvalue problems
- GPU support in ELPA 1stage and 2stage (real and complex case)
- change of API (w.r.t. ELPA 2016.05.004) to support runtime-choice of GPU usage

Changelog for release ELPA 2016.05.004

- fix a problem with the private state of module precision
- distribute test_project with dist tarball
- generic driver routine for ELPA 1stage and 2stage
- test case for elpa_mult_at_b_real
- test case for elpa_mult_ah_b_complex
- test case for elpa_cholesky_real
- test case for elpa_cholesky_complex
- test case for elpa_invert_trm_real
- test case for elpa_invert_trm_complex
- fix building of static library
- better choice of AVX, AVX2, AVX512 kernels
- make assumed size Fortran arrays default

Changelog for release ELPA 2016.05.003

- fix a problem with the build of SSE kernels
- make some (internal) functions public, such that they
  can be used outside of ELPA
- add documentation and interfaces for new public functions
- shorten file names and directory names for test programs
  in order to bypass the "make argument list too long" error

Changelog for release ELPA 2016.05.002

- fix problem with generated *.sh- check scripts
- name library differently if built without MPI support
- install only public modules

Changelog for release ELPA 2016.05.001

- support building without MPI for one node usage
- doxygen and man pages documentation for ELPA
- cleanup of documentation
- introduction of SSE gcc intrinsic kernels
- Remove errors due to unaligned memory
- removal of Fortran "contains functions"
- Fortran interfaces for assembly and C kernels