1. 25 Mar, 2015 1 commit
  2. 24 Mar, 2015 2 commits
  3. 20 Mar, 2015 1 commit
  4. 19 Mar, 2015 6 commits
  5. 16 Mar, 2015 1 commit
  6. 12 Mar, 2015 2 commits
  7. 04 Mar, 2015 1 commit
  8. 02 Mar, 2015 3 commits
  9. 27 Feb, 2015 1 commit
  10. 25 Feb, 2015 1 commit
  11. 16 Feb, 2015 1 commit
    • C-API for ftimings · c9a7e72c
      Lorenz Huedepohl authored
      This is a rather big update: ftimings can now be used via a C API; see
      test/c_test.c for an example of how to use it.
      
      This step also led to a slight overhaul of the Fortran API: there are
      now a number of ..._node member functions of timer_t that can be used
      if one cannot or does not want to specify the node of interest via an
      explicit chain of names. An example:
      
      Instead of
      
        type(timer_t) :: timer
      
        ...
      
        call timer%print("foo", "bar", "baz")
      
      one can now also do
      
        type(timer_t) :: timer
        type(node_t) :: node
      
        ...
      
        node = timer%get_root_node()
        node = node%get_child("foo")
        node = node%get_child("bar")
        node = node%get_child("baz")
      
        call timer%print_node(node)
      
      This construction might sometimes be necessary, e.g. if the hierarchy is
      very dynamic or if the currently provided maximum of six levels in the
      non-_node functions is not sufficient (but think about whether you
      REALLY need more than six levels).
      
      The C side works similarly; there, variadic argument lists even remove
      the restriction on the number of levels. Still, _node functions are
      provided there as well. All C-API symbols are prefixed with "ftimings_"
      in order to avoid name clashes.
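
The variadic name-chain idea can be illustrated with a minimal C sketch. Everything here is a hypothetical stand-in: `struct demo_node`, `demo_get_child` and `demo_find` are not ftimings symbols, and terminating the argument list with a NULL pointer is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a timing-tree node (not the ftimings type). */
struct demo_node {
    const char *name;
    struct demo_node *children; /* first child, or NULL */
    struct demo_node *next;     /* next sibling, or NULL */
};

/* Return the direct child of `parent` called `name`, or NULL if absent. */
struct demo_node *demo_get_child(struct demo_node *parent, const char *name)
{
    for (struct demo_node *c = parent->children; c != NULL; c = c->next) {
        if (strcmp(c->name, name) == 0) {
            return c;
        }
    }
    return NULL;
}

/* Walk a chain of names of arbitrary length, starting at `root`.
 * ASSUMPTION: the caller ends the list with (const char *)NULL.
 * Returns the node reached, or NULL if any name is not found. */
struct demo_node *demo_find(struct demo_node *root, ...)
{
    assert(root != NULL);

    va_list ap;
    va_start(ap, root);

    struct demo_node *node = root;
    const char *name;
    /* Stop at the sentinel, or as soon as a lookup fails. */
    while (node != NULL && (name = va_arg(ap, const char *)) != NULL) {
        node = demo_get_child(node, name);
    }

    va_end(ap);
    return node;
}
```

With such a helper, `demo_find(&root, "foo", "bar", "baz", (const char *)NULL)` corresponds to the three chained get_child calls in the Fortran example, and the depth is limited only by the number of arguments passed.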
  12. 02 Feb, 2015 1 commit
    • Measure RAM access with Linux perf API · 49797bea
      Lorenz Huedepohl authored
      There is currently no reliable way to measure RAM accesses with PAPI.
      The previous approach of counting load and store instructions is not
      very useful, as it is unknown how many bytes are transferred by each
      instruction.
      
      On certain CPUs there is a reliable way to measure this via an "uncore"
      performance counter. One can check whether the CPU (and/or Linux kernel
      version) supports this by checking whether the files
      
      	/sys/devices/uncore_imc/events/data_reads
      	/sys/devices/uncore_imc/events/data_writes
      
      exist.
      
      To access these counters from an unprivileged program one has to set the
      "paranoia" level of the perf subsystem to at most 0, adjustable via the
      file
      
      	/proc/sys/kernel/perf_event_paranoid
      
      Along with this change there is a small API/ABI breakage, as some keyword
      arguments related to the memory measurement have been renamed or split up.
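
The two prerequisites described above (the uncore event files exist, and the paranoia level is at most 0) could be checked from C roughly as follows. This is a sketch only: the function names are hypothetical and not part of ftimings, and the code merely tests the conditions named in the commit message without opening any perf counters.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Paths as quoted in the commit message. */
#define IMC_READS  "/sys/devices/uncore_imc/events/data_reads"
#define IMC_WRITES "/sys/devices/uncore_imc/events/data_writes"
#define PARANOID   "/proc/sys/kernel/perf_event_paranoid"

/* Return 1 if both uncore IMC event files exist, 0 otherwise. */
int imc_events_present(void)
{
    return access(IMC_READS, F_OK) == 0 && access(IMC_WRITES, F_OK) == 0;
}

/* Read an integer paranoia level from `path` into *level.
 * Returns 0 on success, -1 if the file cannot be opened or parsed;
 * *level is left untouched when the file cannot be opened. */
int read_paranoid_level(const char *path, int *level)
{
    assert(path != NULL && level != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    int ok = (fscanf(f, "%d", level) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Return 1 if both conditions from the commit message hold:
 * the event files exist and the paranoia level is at most 0. */
int imc_usable_unprivileged(void)
{
    int level;
    if (!imc_events_present()) {
        return 0;
    }
    if (read_paranoid_level(PARANOID, &level) != 0) {
        return 0;
    }
    return level <= 0;
}
```

A caller might use `imc_usable_unprivileged()` to decide whether to enable memory measurement at all, and fall back to timing-only output otherwise.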
  13. 21 Jul, 2014 3 commits
  14. 18 Jun, 2014 2 commits
  15. 28 May, 2014 2 commits
  16. 13 May, 2014 1 commit
    • Counter for memory bandwidth (loads + stores) · 803a3959
      Lorenz Huedepohl authored
      Additionally, one can now also measure loads and stores, and thus the
      memory bandwidth and, from that, the arithmetic intensity.
      
      One caveat, though: The user is responsible for providing a meaningful
      value for the number of bytes transferred in one load/store, via the
      "bytes_per_ldsr" parameter of the new function %set_print_options.
      
      So far, I have no way of obtaining this value programmatically, and
      it also can and will vary for different sections of a program.
      
      For example, an SSE movapd instruction loads/stores 16 bytes, but is
      still counted as one "load and store" instruction, just like a
      1-byte mov. Feel free to advise me on a better set of machine counters.
      
      Also, somewhat updated documentation.
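
To make the caveat concrete, here is a small C sketch of the arithmetic involved: bytes are estimated as the load/store count times a user-supplied bytes-per-instruction value, bandwidth is bytes per second, and arithmetic intensity is floating-point operations per byte. The function and parameter names are hypothetical and chosen for this illustration; how ftimings computes its printed values internally is not shown in the commit message.

```c
#include <assert.h>

/* Estimated bytes moved: instruction count times the ASSUMED number of
 * bytes per load/store instruction (the user-supplied guess). */
double est_bytes(long long ldst_count, double bytes_per_ldst)
{
    return (double)ldst_count * bytes_per_ldst;
}

/* Estimated memory bandwidth in bytes per second. */
double est_bandwidth(long long ldst_count, double bytes_per_ldst,
                     double seconds)
{
    assert(seconds > 0.0);
    return est_bytes(ldst_count, bytes_per_ldst) / seconds;
}

/* Estimated arithmetic intensity in floating-point operations per byte. */
double est_intensity(long long flop_count, long long ldst_count,
                     double bytes_per_ldst)
{
    double bytes = est_bytes(ldst_count, bytes_per_ldst);
    assert(bytes > 0.0);
    return (double)flop_count / bytes;
}
```

For the same counter readings, say 4,000,000 floating-point operations and 1,000,000 load/store instructions, assuming 16 bytes per instruction gives an intensity of 0.25 flop/byte, while assuming 1 byte gives 4.0 flop/byte. The factor of 16 between the two is exactly the uncertainty the caveat describes.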
  17. 07 May, 2014 1 commit
  18. 06 May, 2014 1 commit
    • Allow the user to activate FLOPS/RAM measurements · 3fe6c3d1
      Lorenz Huedepohl authored
      Now one can select which kinds of measurements are taken by calling the
      member functions %measure_flops and %measure_memory of a timer_t
      object. For example:
      
        type(timer_t) :: timer
      
        call timer%measure_flops(.true.)
        call timer%measure_memory(.true.)
      
        call timer%enable()
      
      An explicit ftimings_init() call is no longer necessary; PAPI will
      be initialized on the first %measure_flops(.true.) call.
  19. 05 May, 2014 1 commit
  20. 02 May, 2014 4 commits
  21. 31 Jan, 2014 1 commit
  22. 27 Jan, 2014 2 commits
  23. 17 Jan, 2014 1 commit