elpa
Commit c602e2bf
Authored Apr 14, 2016 by Andreas Marek
Single precision SSE BLOCK 4 real kernel
Parent: 47da2281
Changes: 2
src/elpa2_kernels/elpa2_kernels_real_sse_4hv_single_precision.c
This diff is collapsed.
src/mod_compute_hh_trafo_real.F90
@@ -850,7 +850,6 @@ module compute_hh_trafo_real
           do j = ncols, 2, -2
             w(:,1) = bcast_buffer(1:nbw,j+off)
             w(:,2) = bcast_buffer(1:nbw,j+off-1)
-            print *, "calling sse block2"
 #ifdef WITH_OPENMP
             call double_hh_trafo_real_sse_2hv_single(a(1,j+off+a_off-1,istripe,my_thread), &
                                                      w, nbw, nl, stripe_width, nbw)
@@ -953,7 +952,6 @@ module compute_hh_trafo_real
 #endif /* WITH_NO_SPECIFIC_REAL_KERNEL */
           ! X86 INTRINSIC CODE, USING 4 HOUSEHOLDER VECTORS
           do j = ncols, 4, -4
-            print *, "calling 4"
             w(:,1) = bcast_buffer(1:nbw,j+off)
             w(:,2) = bcast_buffer(1:nbw,j+off-1)
             w(:,3) = bcast_buffer(1:nbw,j+off-2)