elpa issues
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues

Issue #6: Refactor QR part (Andreas Marek, updated 2017-07-16)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/6
The QR part should be refactored and cleaned.

Issue #27: Remove unnecessary data copies if MPI is not used (Andreas Marek, updated 2022-12-12)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/27

Issue #45: Unify real/complex QR paths (Andreas Marek, updated 2017-07-16)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/45

Issue #50: stripe_width in trans_ev_tridi_to_band (Andreas Marek, updated 2017-09-06)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/50

Issue #53: Investigate stripe_width issues for AVX512 kernels (Lorenz Huedepohl, updated 2017-09-06)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/53
See commit 565de5d.

Issue #58: Port routines used in generalized eigenvalue problem to GPU (Pavel Kus, updated 2017-08-24; assignee: Pavel Kus)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/58

Issue #61: QR decomposition does not work with "analytic" matrix (Andreas Marek, updated 2017-09-13; assignee: Pavel Kus)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/61
Since it does not work, this combination is not enabled at the moment.

Issue #64: ELPA cannot be built on various (unsupported?) architectures (Andreas Marek, updated 2017-12-29)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/64
See https://bugzilla.redhat.com/show_bug.cgi?id=1512229

Issue #71: Investigate CPU memory allocation (Pavel Kus, updated 2019-10-22; assignee: Pavel Kus)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/71
Investigate why it is not possible to run with a much larger matrix on machines with very large memory (e.g. a node equipped with Optane memory).

Issue #72: Lift the restriction bandwidth == nblk in the GPU version of ELPA 2 (Pavel Kus, updated 2019-10-29)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/72
For the GPU version of ELPA 2, the intermediate bandwidth is always taken to be the ScaLAPACK block size. It would be better if the optimal value (as for the CPU version) could be selected, since this is very important for performance. It is, however, hard-coded somewhere in the band reduction step.
Issue #79: Print in the test programs the number of GPUs used (Andreas Marek, updated 2021-04-15)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/79
If ELPA is built with GPU support, print at startup of the test programs the number of GPUs used.
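A minimal sketch of what such a startup printout could look like, assuming a CUDA build; cudaGetDeviceCount and cudaGetErrorString are CUDA runtime API calls, while the helper name and its placement in the test programs are hypothetical:

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper for the ELPA test programs: report at startup
 * how many GPUs are visible (only MPI rank 0 prints). */
void print_gpu_count(int mpi_rank)
{
    int ngpus = 0;
    cudaError_t err = cudaGetDeviceCount(&ngpus);

    if (mpi_rank == 0) {
        if (err == cudaSuccess)
            printf("ELPA test: %d GPU(s) visible\n", ngpus);
        else
            printf("ELPA test: no usable GPU (%s)\n", cudaGetErrorString(err));
    }
}
```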
Issue #81: Toeplitz test cases hang for really small matrices, na=4 (Andreas Marek, updated 2021-05-06)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/81
If you use 4 MPI tasks for a setup of na=4, nev=4, nblk=1, the test cases for Toeplitz matrices hang.
The test cases for other matrix setups do work, however.
It seems that the code hangs in the "solve" step.
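The reported setup, expressed as a hedged sketch against ELPA's public C interface; the BLACS grid setup and the Toeplitz matrix filling (which live in ELPA's test framework) are omitted, and the exact signatures of elpa_deallocate/elpa_uninit vary slightly between ELPA versions:

```c
#include <elpa/elpa.h>

/* Sketch of the hanging configuration: 4 MPI tasks, na=4, nev=4, nblk=1.
 * Caller is assumed to have set up the BLACS grid and the distributed
 * (here Toeplitz) matrix `a`; `ev` and `q` receive eigenvalues/vectors. */
void reproduce(int mpi_comm, int my_prow, int my_pcol,
               int local_rows, int local_cols,
               double *a, double *ev, double *q)
{
    int error;
    elpa_t handle;

    if (elpa_init(20171201) != ELPA_OK) return;  /* API version check */
    handle = elpa_allocate(&error);

    elpa_set(handle, "na", 4, &error);           /* matrix size        */
    elpa_set(handle, "nev", 4, &error);          /* all eigenpairs     */
    elpa_set(handle, "nblk", 1, &error);         /* block size 1       */
    elpa_set(handle, "local_nrows", local_rows, &error);
    elpa_set(handle, "local_ncols", local_cols, &error);
    elpa_set(handle, "mpi_comm_parent", mpi_comm, &error);
    elpa_set(handle, "process_row", my_prow, &error);
    elpa_set(handle, "process_col", my_pcol, &error);
    error = elpa_setup(handle);

    /* The hang is reported inside the solve step: */
    elpa_eigenvectors(handle, a, ev, q, &error);

    elpa_deallocate(handle, &error);
    elpa_uninit(&error);
}
```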
Issue #85: GPU improvements in tridi_to_band for non-MPI (Andreas Marek, updated 2021-07-09)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/85
A lot of memory transfers to/from the device could be avoided, similar to the CUDA_AWARE_MPI case.

Issue #87: Check status of GPU port of generalized routines (Andreas Marek, updated 2021-08-25)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/87

Issue #88: Check status of MPI redistribution routine (Andreas Marek, updated 2021-08-25)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/88

Issue #89: GPU version: limit restriction to power-of-2 block sizes (Andreas Marek, updated 2021-08-25)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/89

Issue #90: Interfaces for device-memory arrays (Andreas Marek, updated 2021-09-02)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/90

Issue #91: 1 MPI rank per GPU, rest with OpenMP threads (Andreas Marek, updated 2022-02-09)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/91

Issue #92: External sanity checker for ELPA settings/input (Andreas Marek, updated 2021-08-25)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/92

Issue #94: Setting of GPU kernel depends on order of set calls (Andreas Marek, updated 2022-02-03)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/94
first set("solver",2stage) and then
set("kernel",GPU_KERNEL)
it uses the CPU kernel (the default kernel seems to be set)
In the other order it works correctlyWhen setting
first set("solver",2stage) and then
set("kernel",GPU_KERNEL)
it uses the CPU kernel (the default kernel seems to be set)
In the other order it works correctly
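The two orderings from the report, as a hedged C-interface sketch; the key names "solver" and "real_kernel" and the constants shown are from ELPA's C headers, but the GPU kernel constant differs between ELPA versions (e.g. ELPA_2STAGE_REAL_GPU, later ELPA_2STAGE_REAL_NVIDIA_GPU):

```c
#include <elpa/elpa.h>

/* Sketch of the two set-call orderings from the report; `handle` is an
 * already allocated and configured elpa_t. */
void set_order_demo(elpa_t handle)
{
    int error;

    /* Reported as broken: solver first, then kernel; the default
     * (CPU) kernel ends up being used. */
    elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);
    elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_GPU, &error);

    /* Reported as working: kernel first, then solver. */
    elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_GPU, &error);
    elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);
}
```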