elpa
Commit 953a4ed6, authored Jan 03, 2013 by Alexander Heinecke
working SSE and AVX intrinsic version of complex 2hv
parent da10d57e
Changes: 6
ELPA_2011.12.Intrinsics/src/elpa2_kernels/elpa2_tum_kernels_complex_sse-avx_1hv.cpp
...
...
@@ -11,6 +11,7 @@
// with their original authors, but shall adhere to the licensing terms
// distributed along with the original code in the file "COPYING".
//
// Author: Alexander Heinecke (alexander.heinecke@mytum.de)
// --------------------------------------------------------------------------------------------------
#include <complex>
...
...
ELPA_2011.12.Intrinsics/src/elpa2_kernels/elpa2_tum_kernels_complex_sse-avx_2hv.cpp
This diff is collapsed.
ELPA_2011.12.Intrinsics/src/elpa2_kernels/elpa2_tum_kernels_real_sse-avx_2hv.c
...
...
@@ -11,6 +11,7 @@
// with their original authors, but shall adhere to the licensing terms
// distributed along with the original code in the file "COPYING".
//
// Author: Alexander Heinecke (alexander.heinecke@mytum.de)
// --------------------------------------------------------------------------------------------------
#include <x86intrin.h>
...
...
ELPA_2011.12.Intrinsics/src/elpa2_kernels/elpa2_tum_kernels_real_sse-avx_4hv.c
...
...
@@ -11,6 +11,7 @@
// with their original authors, but shall adhere to the licensing terms
// distributed along with the original code in the file "COPYING".
//
// Author: Alexander Heinecke (alexander.heinecke@mytum.de)
// --------------------------------------------------------------------------------------------------
#include <x86intrin.h>
...
...
ELPA_2011.12.Intrinsics/src/elpa2_kernels/elpa2_tum_kernels_real_sse-avx_6hv.c
...
...
@@ -11,6 +11,7 @@
// with their original authors, but shall adhere to the licensing terms
// distributed along with the original code in the file "COPYING".
//
// Author: Alexander Heinecke (alexander.heinecke@mytum.de)
// --------------------------------------------------------------------------------------------------
#include <x86intrin.h>
...
...
ELPA_2011.12.Intrinsics/test/Makefile
...
...
@@ -11,7 +11,7 @@ F90OPT=$(F90) -mavx
 CC=gcc -O3
 CCOPT=$(CC) -mavx -funsafe-loop-optimizations -funsafe-math-optimizations -ftree-vect-loop-version -ftree-vectorize
 MKL_HOME=/opt/intel/mkl/lib/intel64
-LIBS=-mkl=sequential -L$(MKL_HOME) -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
+LIBS=-mkl=sequential -L$(MKL_HOME) -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lstdc++
 #
 # ------------------------------------------------------------------------------
 # Settings for Intel Fortran (Linux), Intel Composer XE 2011 (ifort 12.1) with SSE3:
...
...
@@ -95,17 +95,17 @@ test_complex_gen: test_complex_gen.o elpa1.o
 	$(F90) -o $@ test_complex_gen.o elpa1.o $(LIBS)
 
 ifeq ($(X86),1)
-#test_real2: test_real2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
-# $(F90) -o $@ test_real2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
-#
-#test_complex2: test_complex2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
-# $(F90) -o $@ test_complex2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
+test_real2: test_real2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
+	$(F90) -o $@ test_real2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
 
-test_real2: test_real2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
-	$(F90) -o $@ test_real2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
+test_complex2: test_complex2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
+	$(F90) -o $@ test_complex2.o elpa1.o elpa2.o elpa2_tum_kernels_complex_sse-avx_1hv.o elpa2_tum_kernels_complex_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
 
-test_complex2: test_complex2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
-	$(F90) -o $@ test_complex2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
+#test_real2: test_real2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
+# $(F90) -o $@ test_real2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
+#
+#test_complex2: test_complex2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o
+# $(F90) -o $@ test_complex2.o elpa1.o elpa2.o elpa2_kernels_complex.o elpa2_tum_kernels_real_sse-avx_2hv.o elpa2_tum_kernels_real_sse-avx_4hv.o elpa2_tum_kernels_real_sse-avx_6hv.o $(LIBS)
 else
 test_real2: test_real2.o elpa1.o elpa2.o elpa2_kernels_real.o elpa2_kernels_complex.o
 	$(F90) -o $@ test_real2.o elpa1.o elpa2.o elpa2_kernels_real.o elpa2_kernels_complex.o $(LIBS)
...
...