- 02 Feb, 2015 1 commit
-
-
Lorenz Huedepohl authored
There is currently no reliable way to measure RAM accesses with PAPI. The previous approach, counting load and store instructions, is not very useful, because it is unknown how many bytes are transferred by each instruction.

On certain CPUs there is a reliable way to measure this via an "uncore" performance counter. To find out whether your CPU (and/or Linux kernel version) supports this, check whether the files

  /sys/devices/uncore_imc/events/data_reads
  /sys/devices/uncore_imc/events/data_writes

exist. To access these counters from an unprivileged program, the "paranoia" level of the perf subsystem has to be set to at most 0, adjustable via the file

  /proc/sys/kernel/perf_event_paranoid

Along with this change there is a small API/ABI breakage, as some keyword arguments related to the memory measurement have been renamed or split up.
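The availability check described in this commit message could be sketched in C roughly as follows. This is an illustration only and not part of ftimings: the helper names file_exists, read_int_file and uncore_imc_usable are hypothetical, and the paths are passed in as parameters so that the logic can be tried out without the real sysfs/procfs files.

```c
#include <stdio.h>
#include <unistd.h>

/* Return 1 if `path` exists, 0 otherwise. */
static int file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Read a single integer from `path` into *value.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_int_file(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: return 1 if both uncore event files exist and
 * the perf paranoia level read from `paranoid_path` is at most 0. */
int uncore_imc_usable(const char *reads_path,
                      const char *writes_path,
                      const char *paranoid_path)
{
    int level;

    if (!file_exists(reads_path) || !file_exists(writes_path))
        return 0;
    if (read_int_file(paranoid_path, &level) != 0)
        return 0;
    return level <= 0;
}
```

Called with the three paths named above, uncore_imc_usable returns 1 only if both event files exist and the paranoia level is 0 or lower. It deliberately ignores the case of a privileged process, for which the paranoia level may not matter.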
-
- 21 Jul, 2014 3 commits
-
-
Andreas Marek authored
-
Lorenz Hüdepohl authored
-
Lorenz Huedepohl authored
Simplified highwater_mark.c a bit, whitespace cleanup
-
- 18 Jun, 2014 2 commits
-
-
Lorenz Huedepohl authored
-
Lorenz Huedepohl authored
Even though they are never used in an undefined way per se, there is an associated(node, threshold_node) statement which compares a valid pointer to the undefined state of threshold_node (or own_node). This could erroneously evaluate to true, should threshold_node accidentally hold precisely the address of a valid node.
-
- 28 May, 2014 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
- 13 May, 2014 1 commit
-
-
Lorenz Huedepohl authored
Additionally, one can now also measure loads and stores, and thus the memory bandwidth and, in turn, the arithmetic intensity.

One caveat, though: the user is responsible for providing a meaningful value for the number of bytes transferred in one load/store, via the "bytes_per_ldsr" parameter of the new function %set_print_options. So far, I have no way of obtaining this value programmatically, and it also can and will vary for different sections of a program. For example, an SSE movapd instruction loads/stores 16 bytes, but is still counted as one "load and store" instruction, just like a 1-byte mov.

Feel free to advise me on a better set of machine counters.

Also, somewhat updated documentation.
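To illustrate how strongly the results depend on "bytes_per_ldsr", here is a minimal sketch in C of the arithmetic involved. It is not taken from the ftimings sources: the struct mem_metrics and the function derive_mem_metrics are hypothetical names, and the sketch assumes that the byte count is simply the instruction count multiplied by bytes_per_ldsr, with seconds > 0 and a non-zero byte count.

```c
/* Derived memory metrics, estimated from a "load and store"
 * instruction count (hypothetical helper type for illustration). */
struct mem_metrics {
    double bytes;                /* estimated bytes transferred */
    double bandwidth;            /* bytes per second */
    double arithmetic_intensity; /* floating point operations per byte */
};

/* Assumes seconds > 0 and ldsr_count * bytes_per_ldsr > 0. */
struct mem_metrics derive_mem_metrics(long long ldsr_count,
                                      double bytes_per_ldsr,
                                      long long flop_count,
                                      double seconds)
{
    struct mem_metrics m;

    /* Every counted instruction is assumed to move bytes_per_ldsr bytes. */
    m.bytes = (double)ldsr_count * bytes_per_ldsr;
    m.bandwidth = m.bytes / seconds;
    m.arithmetic_intensity = (double)flop_count / m.bytes;
    return m;
}
```

With the same counter values, choosing 16 bytes (as for movapd) instead of 1 byte per instruction raises the reported bandwidth by a factor of 16 and lowers the arithmetic intensity by the same factor, which is why a meaningful value has to come from the user.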
-
- 07 May, 2014 1 commit
-
-
Lorenz Huedepohl authored
-
- 06 May, 2014 1 commit
-
-
Lorenz Huedepohl authored
Now one can select which kinds of measurements are taken by calling the member functions %measure_flops and %measure_memory of a timer_t object. For example:

    type(timer_t) :: timer

    call timer%measure_flops(.true.)
    call timer%measure_memory(.true.)
    call timer%enable()

An explicit ftimings_init() call is no longer necessary; PAPI will be initialized on the first %measure_flops(.true.) call.
-
- 05 May, 2014 1 commit
-
-
Lorenz Huedepohl authored
-
- 02 May, 2014 4 commits
-
-
Lorenz Huedepohl authored
-
Lorenz Huedepohl authored
Otherwise, very fast routines are simply not printed
-
Lorenz Huedepohl authored
If the argument 'name' is not given to timer_stop, close the currently active region without further sanity checking.
-
Lorenz Huedepohl authored
Next to some refactoring into four separate source files, support for also recording values of performance counters via the PAPI library was added. At the moment a FLOP count is measured and results are presented in timer_print as MFlop/s.
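As a rough sketch of the conversion behind such an MFlop/s figure: the C function below is a hypothetical illustration, not the code used by timer_print, and it assumes that 1 MFlop means 10^6 floating point operations and that seconds > 0.

```c
/* Convert a raw floating point operation count and an elapsed
 * wall-clock time in seconds into MFlop/s (1 MFlop = 1.0e6 operations). */
double mflops(long long flop_count, double seconds)
{
    return (double)flop_count / seconds / 1.0e6;
}
```

For instance, 3000000 counted operations in 2 seconds correspond to 1.5 MFlop/s.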
-
- 31 Jan, 2014 1 commit
-
-
Lorenz Huedepohl authored
-
- 27 Jan, 2014 2 commits
-
-
Lorenz Huedepohl authored
-
Lorenz Huedepohl authored
-
- 17 Jan, 2014 1 commit
-
-
Lorenz Huedepohl authored
Make only timer_t visible via the module, some cleanup
-
- 14 Jan, 2014 1 commit
-
-
Lorenz Huedepohl authored
-
- 18 Jun, 2013 4 commits
-
-
Lorenz Hüdepohl authored
-
Lorenz Hüdepohl authored
The resulting package is called ftimings-$(FTIMINGS_API_VERSION)-$(FC)
-
Lorenz Hüdepohl authored
-
Lorenz Hüdepohl authored
-