Commit 84888b93 authored by Lorenz Huedepohl

Convert ELPA_2013.08 to a branch

parent bd93cec5
Welcome to the ELPA library git repository.
This central repository houses all versions of the ELPA library. At
the time of this writing (01/31/2013), the ELPA developers provide the
following versions, each in a separate subdirectory available here:
- ELPA_2011.12
This is a stable version of the entire ELPA library including all
necessary files to build it. When using ELPA for production, we
recommend using this version, as it has seen a lot of testing and
production use.
This version should only receive updates for clear and unambiguous
bug fixes. Each bug fix must be documented in the README file. (At
the time of this writing, no known bugs exist.)
- ELPA_2013.02.BETA
This is a quasi-stable version of the entire ELPA library including all
necessary files to build it.
It enhances ELPA_2011.12 with intrinsic kernels for Intel Westmere,
Intel Sandy Bridge, Intel Haswell (through the compiler's -fma flag), and
AMD Interlagos.
This version has passed several extensive tests.
- ELPA_development_version
This is the development version of the ELPA library. Any useful new
changes should be made here, and a number of changes will be made in
the near future. If you want the version presently "branded" as
stable, please refer to ELPA_2011.12 at this time.
- ELPA_development_version_GPU
This is the development version of the ELPA library with GPU support.
Any useful new changes should be made here, and a number of changes will
be made in the near future. If you want the version presently "branded" as
stable, please refer to ELPA_2011.12 at this time.
- tar-archives
Snapshots of the versions listed above, packed as tarballs.
ELPA Version numbering scheme:
We have decided to use the format "ELPA_year.month" to designate
specific stable point releases of the ELPA library. Point releases
will be made at irregular intervals, solely determined by when we
consider the development version absolutely stable. At the time of the
"branching", the new stable version should be an exact copy of the
development one.
In addition to the directories with the sources for the aforementioned
versions of ELPA, there is a directory "tar-archives", which contains
tarballs with current snapshots of the different versions.
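As an illustration of the "ELPA_year.month" naming scheme described above, the following sketch splits a release name into its year and month using POSIX shell parameter expansion. The function name elpa_release_parts is hypothetical and not part of ELPA; names that do not follow the scheme (such as ELPA_development_version) are rejected.

```shell
#!/bin/sh
# Split a release name of the form ELPA_year.month into year and month.
# The helper name elpa_release_parts is hypothetical, not part of ELPA.
elpa_release_parts() {
    name=$1
    case "$name" in
        ELPA_[0-9][0-9][0-9][0-9].[0-9][0-9]*) ;;
        *) echo "not a point release: $name" >&2; return 1 ;;
    esac
    version=${name#ELPA_}   # e.g. 2011.12 or 2013.02.BETA
    year=${version%%.*}     # text before the first dot
    rest=${version#*.}      # text after the first dot
    month=${rest%%.*}       # text before the next dot, if there is one
    echo "$year $month"
}

elpa_release_parts ELPA_2011.12   # prints: 2011 12
```

A suffix such as ".BETA" is ignored by this sketch, so ELPA_2013.02.BETA yields "2013 02".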
How to install ELPA:
----------------------
ELPA ships with a typical "configure" and "make" procedure, which is the
recommended way to install it. If you do not want to install ELPA as a
library but rather include it in your own source code, see the
"Makefile.example" in ./test for how this is done. In that case, please
distribute all ELPA files with your code.
The configure installation is best done in four steps:
1) run configure:
Point configure to your BLACS/ScaLAPACK installation with the
variables "BLACS_LDFLAGS" and "BLACS_FCFLAGS".
"BLACS_LDFLAGS" should contain the correct link line for your
BLACS/ScaLAPACK installation, and "BLACS_FCFLAGS" the include path
and any other flags you need at compile time.
It is recommended that you use the "rpath functionality" in the link line;
otherwise it will be necessary to update the LD_LIBRARY_PATH environment
variable.
You can either specify your own builds of LAPACK/BLACS/ScaLAPACK
or use specialized vendor packages, e.g. Intel's MKL if it is
available. If you do not set these variables, ELPA will not be
built!
Set any further optimisation flags you would like with the
variables "FCFLAGS", "CFLAGS", and "CXXFLAGS", e.g. FCFLAGS="-O3 -xAVX".
Check the available options with "configure --help".
If available, you can e.g. choose specially optimized ELPA kernels
for your system.
Set the "--prefix" flag if you want an installation location other than
the default "/usr/local/".
2) run "make"
3) run "make check"
This runs a simple test of ELPA. At the moment, "mpiexec" is required.
If it is not available on your system, you can run the
binaries "test_real", "test_real2", "test_complex", and "test_complex2"
yourself. At the moment the tests check whether the residual and the
orthogonality of the computed eigenvectors are below a threshold of
5e-12. If this test fails, or if you believe the threshold should be
even lower, please talk to us.
4) run "make install"
Note that a pkg-config file for ELPA is produced.
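The four steps above might look roughly like the following sketch. All directories, the install prefix, and the library names on the link line (-lscalapack -llapack -lblas) are placeholders chosen for illustration; they are assumptions about a generic system, not values taken from the ELPA sources, and must be adapted to your BLACS/ScaLAPACK installation.

```shell
#!/bin/sh
# Sketch of the four install steps. Paths and library names are
# placeholders (assumptions); adjust them to your system.
SCALAPACK_LIBDIR=${SCALAPACK_LIBDIR:-/opt/scalapack/lib}
SCALAPACK_INCDIR=${SCALAPACK_INCDIR:-/opt/scalapack/include}
ELPA_PREFIX=${ELPA_PREFIX:-$HOME/opt/elpa}

# Link line with an rpath entry, so LD_LIBRARY_PATH need not be changed.
BLACS_LDFLAGS="-L$SCALAPACK_LIBDIR -Wl,-rpath,$SCALAPACK_LIBDIR -lscalapack -llapack -lblas"
# Include path and any other flags needed at compile time.
BLACS_FCFLAGS="-I$SCALAPACK_INCDIR"

if [ -x ./configure ]; then
    # 1) configure
    ./configure --prefix="$ELPA_PREFIX" \
        BLACS_LDFLAGS="$BLACS_LDFLAGS" \
        BLACS_FCFLAGS="$BLACS_FCFLAGS" \
        FCFLAGS="-O3 -xAVX" || exit 1
    # 2) build
    make || exit 1
    # 3) run the tests (needs mpiexec)
    make check || exit 1
    # 4) install into $ELPA_PREFIX
    make install || exit 1
else
    echo "run this from the top-level ELPA source directory" >&2
fi
```

Passing the variables as VAR=value arguments to configure, rather than exporting them beforehand, keeps the settings visible in the configure command line that is recorded in config.log.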
How to use ELPA:
-----------------
Using ELPA should be quite simple. It is similar to ScaLAPACK, but the API
is different. See the examples in the directory "./test", which show how
to invoke ELPA from Fortran code.
If you installed ELPA with the build procedure, a pkg-config file is produced.
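A minimal sketch of how the installed pkg-config file could be used to obtain compile and link flags. It assumes the module is called "elpa" (after the installed elpa.pc) and that the file lives in lib/pkgconfig under the install prefix. The compiler wrapper mpif90 and the file my_program.f90 are hypothetical and only serve to show where the flags would go.

```shell
#!/bin/sh
# Query compile and link flags for ELPA through pkg-config.
# ELPA_PREFIX is a placeholder for the prefix given to configure.
ELPA_PREFIX=${ELPA_PREFIX:-/usr/local}
PKG_CONFIG_PATH="$ELPA_PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
export PKG_CONFIG_PATH

if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists elpa; then
    ELPA_CFLAGS=$(pkg-config --cflags elpa)
    ELPA_LIBS=$(pkg-config --libs elpa)
    # Print the compile command rather than running it;
    # mpif90 and my_program.f90 are hypothetical.
    echo "mpif90 $ELPA_CFLAGS my_program.f90 $ELPA_LIBS -o my_program"
else
    echo "elpa.pc not found in PKG_CONFIG_PATH=$PKG_CONFIG_PATH" >&2
fi
```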
ACLOCAL_AMFLAGS = ${ACLOCAL_FLAGS} -I m4
AM_FCFLAGS = @AM_FCFLAGS@ @FC_MODINC@modules @FC_MODOUT@modules
AM_LDFLAGS = @AM_LDFLAGS@ @BLACS_LDFLAGS@
BLACS_LDFLAGS = @BLACS_LDFLAGS@
# libelpa
lib_LTLIBRARIES = libelpa.la
libelpa_la_SOURCES = src/elpa1.f90 src/elpa2.f90
if WITH_BGP
libelpa_la_SOURCES += src/elpa2_kernels_bg.f90
else
libelpa_la_SOURCES += src/elpa2_kernels.f90
endif
libelpa_la_LDFLAGS = -version-info $(ELPA_SO_VERSION)
# install any .mod files in the include/ dir
elpa_includedir = $(includedir)/elpa
nobase_elpa_include_HEADERS = $(wildcard modules/*)
# other files to distribute
filesdir = $(datarootdir)
files_DATA = \
test/read_real.f90 \
test/read_real_gen.f90 \
test/test_complex2.f90 \
test/test_complex.f90 \
test/test_complex_gen.f90 \
test/test_real2.f90 \
test/test_real.f90 \
test/test_real_gen.f90
# pkg-config stuff
pkgconfigdir = $(libdir)/pkgconfig
pkgconfig_DATA = elpa.pc
# test programs
#bindir = $(abs_top_builddir)
bin_PROGRAMS = test_real test_real2 test_complex test_complex2
test_real_SOURCES = test/test_real.f90
test_real_LDADD = libelpa.la
test_real2_SOURCES = test/test_real2.f90
test_real2_LDADD = libelpa.la
test_complex_SOURCES = test/test_complex.f90
test_complex_LDADD = libelpa.la
test_complex2_SOURCES = test/test_complex2.f90
test_complex2_LDADD = libelpa.la
check_SCRIPTS = test_real.sh test_real2.sh test_complex.sh test_complex2.sh
TESTS = $(check_SCRIPTS)
test_real.sh:
	echo "mpiexec -n 2 ./test_real > /dev/null 2>&1" > test_real.sh
	chmod +x test_real.sh
test_real2.sh:
	echo "mpiexec -n 2 ./test_real2 > /dev/null 2>&1" > test_real2.sh
	chmod +x test_real2.sh
test_complex.sh:
	echo "mpiexec -n 2 ./test_complex > /dev/null 2>&1" > test_complex.sh
	chmod +x test_complex.sh
test_complex2.sh:
	echo "mpiexec -n 2 ./test_complex2 > /dev/null 2>&1" > test_complex2.sh
	chmod +x test_complex2.sh
CLEANFILES = test_real.sh test_real2.sh test_complex.sh test_complex2.sh
@FORTRAN_MODULE_DEPS@
/* config.h.in. Generated from configure.ac by autoheader. */
/* Define to 1 if you have the <dlfcn.h> header file. */
#undef HAVE_DLFCN_H
/* Define to 1 if you have the <inttypes.h> header file. */
#undef HAVE_INTTYPES_H
/* Define to 1 if you have the <memory.h> header file. */
#undef HAVE_MEMORY_H
/* Define to 1 if you have the <stdint.h> header file. */
#undef HAVE_STDINT_H
/* Define to 1 if you have the <stdlib.h> header file. */
#undef HAVE_STDLIB_H
/* Define to 1 if you have the <strings.h> header file. */
#undef HAVE_STRINGS_H
/* Define to 1 if you have the <string.h> header file. */
#undef HAVE_STRING_H
/* Define to 1 if you have the <sys/stat.h> header file. */
#undef HAVE_SYS_STAT_H
/* Define to 1 if you have the <sys/types.h> header file. */
#undef HAVE_SYS_TYPES_H
/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H
/* Define to the sub-directory in which libtool stores uninstalled libraries.
*/
#undef LT_OBJDIR
/* Define to 1 if your C compiler doesn't accept -c and -o together. */
#undef NO_MINUS_C_MINUS_O
/* Name of package */
#undef PACKAGE
/* Define to the address where bug reports for this package should be sent. */
#undef PACKAGE_BUGREPORT
/* Define to the full name of this package. */
#undef PACKAGE_NAME
/* Define to the full name and version of this package. */
#undef PACKAGE_STRING
/* Define to the one symbol short name of this package. */
#undef PACKAGE_TARNAME
/* Define to the home page for this package. */
#undef PACKAGE_URL
/* Define to the version of this package. */
#undef PACKAGE_VERSION
/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS
/* Version number of package */
#undef VERSION
AC_PREREQ([2.69])
AC_INIT([elpa],[2011.12.002], elpa-library@rzg.mpg.de)
AC_CONFIG_SRCDIR([src/elpa1.f90])
AM_INIT_AUTOMAKE([foreign -Wall subdir-objects])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_HEADERS([config.h])
AM_SILENT_RULES([yes])
AX_CHECK_GNU_MAKE()
if test x$_cv_gnu_make_command = x ; then
AC_MSG_ERROR([Need GNU Make])
fi
AC_CHECK_PROG(CPP_FOUND,cpp,yes,no)
if test "x${CPP_FOUND}" = xno; then
AC_MSG_ERROR([no cpp found])
fi
# gnu-make fortran module dependencies
m4_include([fdep/fortran_dependencies.m4])
FDEP_F90_GNU_MAKE_DEPS
AC_PROG_INSTALL
#AC_PROG_CPP
AM_PROG_CC_C_O
AM_PROG_AR
AC_LANG(Fortran)
m4_include([m4/ax_prog_fc_mpi.m4])
dnl check whether an mpi compiler is available;
dnl if not abort since it is mandatory
AX_PROG_FC_MPI([],[have_mpi=yes],[have_mpi=no
if test "x${have_mpi}" = xno; then
AC_MSG_ERROR([no mpi found])
fi])
AC_SUBST([ELPA_LIB_VERSION], [2011.12.002])
# this is the version of the API, should be changed in the major revision
# if and only if the actual API changes
AC_SUBST([ELPA_SO_VERSION], [0:0:0])
AC_FC_FREEFORM
AC_FC_MODULE_FLAG
AC_FC_MODULE_OUTPUT_FLAG
dnl check which ELPA kernel was requested
AC_MSG_CHECKING([whether the BG/P kernel was specified])
AC_ARG_WITH([BGP-kernel],
[AS_HELP_STRING([--with-BGP-kernel],
[use kernel tuned for Bluegene/P])],
[with_bgp_kernel=yes],[with_bgp_kernel=no])
AM_CONDITIONAL([WITH_BGP],[test x"$with_bgp_kernel" = x"yes"])
AC_MSG_RESULT([${with_bgp_kernel}])
save_FCFLAGS=$FCFLAGS
save_LDFLAGS=$LDFLAGS
FCFLAGS="$FCFLAGS $BLACS_FCFLAGS"
LDFLAGS="$LDFLAGS $BLACS_LDFLAGS"
dnl check whether one can link with specified MKL (desired method)
AC_MSG_CHECKING([whether we can compile a Fortran program using MKL])
AC_COMPILE_IFELSE([AC_LANG_SOURCE([
program test_mkl
use mkl_service
character*198 :: string
call mkl_get_version_string(string)
write(*,'(a)') string
end program
])],
[can_compile_with_mkl=yes],
[can_compile_with_mkl=no]
)
AC_MSG_RESULT([${can_compile_with_mkl}])
if test x"$can_compile_with_mkl" = x"yes" ; then
AC_MSG_CHECKING([whether we can link a Fortran program with MKL])
AC_LINK_IFELSE([AC_LANG_SOURCE([
program test_mkl
use mkl_service
character*198 :: string
call mkl_get_version_string(string)
write(*,'(a)') string
end program
])],
[can_link_with_mkl=yes],
[can_link_with_mkl=no]
)
AC_MSG_RESULT([${can_link_with_mkl}])
fi
dnl if not mkl, check all the necessary individually
if test "x${can_link_with_mkl}" = "xyes" ; then
WITH_MKL=1
else
dnl first check blas
AC_SEARCH_LIBS([dgemm],[blas],[can_link_with_blas=yes],[can_link_with_blas=no])
AC_MSG_CHECKING([whether we can link a program with a blas lib])
AC_MSG_RESULT([${can_link_with_blas}])
if test "x${can_link_with_blas}" = "xno" ; then
AC_MSG_ERROR([could not link with blas: specify path])
fi
dnl now lapack
AC_SEARCH_LIBS([dlarrv],[lapack],[can_link_with_lapack=yes],[can_link_with_lapack=no])
AC_MSG_CHECKING([whether we can link a program with a lapack lib])
AC_MSG_RESULT([${can_link_with_lapack}])
if test "x${can_link_with_lapack}" = "xno" ; then
AC_MSG_ERROR([could not link with lapack: specify path])
fi
dnl now blacs
AC_SEARCH_LIBS([blacs_gridinit],[mpiblacs],[can_link_with_blacs=yes],[can_link_with_blacs=no])
AC_MSG_CHECKING([whether we can link a program with a blacs lib])
AC_MSG_RESULT([${can_link_with_blacs}])
if test "x${can_link_with_blacs}" = "xno" ; then
AC_MSG_ERROR([could not link with blacs: specify path])
fi
dnl now scalapack
AC_SEARCH_LIBS([pdtran],[mpiscalapack],[can_link_with_scalapack=yes],[can_link_with_scalapack=no])
AC_MSG_CHECKING([whether we can link a program with a scalapack lib])
AC_MSG_RESULT([${can_link_with_scalapack}])
if test "x${can_link_with_scalapack}" = "xno" ; then
AC_MSG_ERROR([could not link with scalapack: specify path])
fi
dnl check whether we can link everything together
AC_MSG_CHECKING([whether we can link a Fortran program with all blacs/scalapack])
AC_LINK_IFELSE([AC_LANG_SOURCE([
program dgemm_test
integer , parameter:: n_cols=3,l_cols=3
real :: hvm(n_cols,l_cols)
call dgemm('T','N',n_cols,n_cols,l_cols,1.,hvm,ubound(hvm,1), &
hvm(1,n_cols+1),ubound(hvm,1),0.,hvm,ubound(hvm,1))
end program dgemm_test
])],
[can_link_with_blacs_scalapack=yes],
[can_link_with_blacs_scalapack=no]
)
AC_MSG_RESULT([${can_link_with_blacs_scalapack}])
if test "x${can_link_with_blacs_scalapack}" = "xyes" ; then
WITH_BLACS=1
else
AC_MSG_ERROR([We can link neither with MKL nor with another ScaLAPACK. Please specify BLACS_LDFLAGS and BLACS_FCFLAGS!])
fi
fi
dnl important: reset them again!
FCFLAGS=$save_FCFLAGS
LDFLAGS=$save_LDFLAGS
LT_INIT
AC_SUBST([AM_FCFLAGS])
AC_SUBST([AM_LDFLAGS])
AC_SUBST([WITH_MKL])
AC_SUBST([WITH_BLACS])
AC_SUBST([CFLAGS])
AC_SUBST([FCFLAGS])
AC_SUBST([CPPFLAGS])
AC_SUBST([LDFLAGS])
AC_SUBST([RANLIB])
AC_SUBST([FC_MODINC])
AC_SUBST([FC_MODOUT])
AC_SUBST(BLACS_LDFLAGS)
AC_SUBST(BLACS_FCFLAGS)
mkdir modules
AC_CONFIG_FILES([
Makefile
elpa.pc:elpa.pc.in
])
AC_OUTPUT
../fdep
\ No newline at end of file
! This file is part of ELPA.
!
! The ELPA library was originally created by the ELPA consortium,
! consisting of the following organizations:
!
! - Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG),
! - Bergische Universität Wuppertal, Lehrstuhl für angewandte
! Informatik,
! - Technische Universität München, Lehrstuhl für Informatik mit
! Schwerpunkt Wissenschaftliches Rechnen ,
! - Fritz-Haber-Institut, Berlin, Abt. Theorie,
! - Max-Planck-Institut für Mathematik in den Naturwissenschaften,
! Leipzig, Abt. Komplexe Strukturen in Biologie und Kognition,
! and
! - IBM Deutschland GmbH
!
!
! More information can be found here:
! http://elpa.rzg.mpg.de/
!
! ELPA is free software: you can redistribute it and/or modify
! it under the terms of the version 3 of the license of the
! GNU Lesser General Public License as published by the Free
! Software Foundation.
!
! ELPA is distributed in the hope that it will be useful,
! but WITHOUT ANY WARRANTY; without even the implied warranty of
! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
! GNU Lesser General Public License for more details.
!
! You should have received a copy of the GNU Lesser General Public License
! along with ELPA. If not, see <http://www.gnu.org/licenses/>
!
! ELPA reflects a substantial effort on the part of the original
! ELPA consortium, and we ask you to respect the spirit of the
! license that we chose: i.e., please contribute any changes you
! may have back to the original ELPA library distribution, and keep
! any derivatives of ELPA under the same license that we chose for
! the original distribution, the GNU Lesser General Public License.
!
!
! --------------------------------------------------------------------------------------------------
!
! This file contains the compute intensive kernels for the Householder transformations.
! It should be compiled with the highest possible optimization level.
!
! On Intel use -O3 -xSSE4.2 (or the SSE level fitting to your CPU)
!
! Copyright of the original code rests with the authors inside the ELPA
! consortium. The copyright of any additional modifications shall rest
! with their original authors, but shall adhere to the licensing terms
! distributed along with the original code in the file "COPYING".
!
! --------------------------------------------------------------------------------------------------
subroutine double_hh_trafo(q, hh, nb, nq, ldq, ldh)
implicit none
integer, intent(in) :: nb, nq, ldq, ldh
real*8, intent(inout) :: q(ldq,*)
real*8, intent(in) :: hh(ldh,*)
real*8 s
integer i
! Safety only:
if(mod(ldq,4) /= 0) STOP 'double_hh_trafo: ldq not divisible by 4!'
! Calculate dot product of the two Householder vectors
s = hh(2,2)*1
do i=3,nb
s = s+hh(i,2)*hh(i-1,1)
enddo
! Do the Householder transformations
! Always a multiple of 4 Q-rows is transformed, even if nq is smaller
do i=1,nq-8,12
call hh_trafo_kernel_12(q(i,1),hh, nb, ldq, ldh, s)
enddo
! i > nq-8 now, i.e. at most 8 rows remain
if(nq-i+1 > 4) then
call hh_trafo_kernel_8(q(i,1),hh, nb, ldq, ldh, s)
else if(nq-i+1 > 0) then
call hh_trafo_kernel_4(q(i,1),hh, nb, ldq, ldh, s)
endif
end
! --------------------------------------------------------------------------------------------------
! The following kernels perform the Householder transformation on Q for 12/8/4 rows.
! Please note that Q is declared complex*16 here.
! This is a hint for compilers that packed arithmetic can be used for Q
! (relevant for Intel SSE and BlueGene double hummer CPUs).
! --------------------------------------------------------------------------------------------------
subroutine hh_trafo_kernel_12(q, hh, nb, ldq, ldh, s)
implicit none
integer, intent(in) :: nb, ldq, ldh
complex*16, intent(inout) :: q(ldq/2,*)
real*8, intent(in) :: hh(ldh,*), s
complex*16 x1, x2, x3, x4, x5, x6, y1, y2, y3, y4, y5, y6
real*8 h1, h2, tau1, tau2
integer i
x1 = q(1,2)
x2 = q(2,2)
x3 = q(3,2)
x4 = q(4,2)
x5 = q(5,2)
x6 = q(6,2)
y1 = q(1,1) + q(1,2)*hh(2,2)
y2 = q(2,1) + q(2,2)*hh(2,2)
y3 = q(3,1) + q(3,2)*hh(2,2)
y4 = q(4,1) + q(4,2)*hh(2,2)
y5 = q(5,1) + q(5,2)*hh(2,2)
y6 = q(6,1) + q(6,2)*hh(2,2)
!DEC$ VECTOR ALIGNED
do i=3,nb
h1 = hh(i-1,1)
h2 = hh(i,2)
x1 = x1 + q(1,i)*h1
y1 = y1 + q(1,i)*h2
x2 = x2 + q(2,i)*h1
y2 = y2 + q(2,i)*h2
x3 = x3 + q(3,i)*h1
y3 = y3 + q(3,i)*h2
x4 = x4 + q(4,i)*h1
y4 = y4 + q(4,i)*h2
x5 = x5 + q(5,i)*h1
y5 = y5 + q(5,i)*h2
x6 = x6 + q(6,i)*h1
y6 = y6 + q(6,i)*h2
enddo
x1 = x1 + q(1,nb+1)*hh(nb,1)