1. 26 Mar, 2015 1 commit
    • A couple of new routines to query usage of children · d336cb45
      Lorenz Huedepohl authored
      The type timer_t gained two new kinds of methods for querying the
      resource usage of its children. The method
      
        %get_in_children
      
      queries the seconds spent in all children of the queried node, and
      
        %get_without_children
      
      conversely returns the number of seconds spent excluding all children.
      
      Additionally, there are also versions where the node in question is
      passed directly instead of by a string path (*_node), and versions where
      the result is a type(value_t) with all measurements (get_value_*).
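
The relationship between the two queries can be sketched in C as follows. This is only an illustration under stated assumptions: the `struct node` layout and the function names `seconds_in_children` and `seconds_without_children` are hypothetical and are not taken from ftimings, and "all children" is read here as the direct children, whose totals are assumed to already include their own descendants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timing node, NOT the actual ftimings data structure. */
struct node {
    double total_seconds;            /* time for this node, children included */
    const struct node *first_child;  /* head of the child list, or NULL */
    const struct node *next_sibling; /* next node on the same level, or NULL */
};

/* Seconds spent in the children of n. Each child's total is assumed to
 * include its own descendants, so summing the direct children suffices. */
double seconds_in_children(const struct node *n)
{
    assert(n != NULL);
    double sum = 0.0;
    for (const struct node *c = n->first_child; c != NULL; c = c->next_sibling)
        sum += c->total_seconds;
    return sum;
}

/* Seconds spent in n itself, excluding everything measured in its children. */
double seconds_without_children(const struct node *n)
{
    assert(n != NULL);
    return n->total_seconds - seconds_in_children(n);
}
```

With this reading, the two values always add up to the node's own total, which makes for a quick sanity check on any timing tree.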
  2. 25 Mar, 2015 1 commit
  3. 24 Mar, 2015 2 commits
  4. 20 Mar, 2015 1 commit
  5. 19 Mar, 2015 6 commits
  6. 16 Mar, 2015 1 commit
  7. 12 Mar, 2015 2 commits
  8. 04 Mar, 2015 1 commit
  9. 02 Mar, 2015 3 commits
  10. 27 Feb, 2015 1 commit
  11. 25 Feb, 2015 1 commit
  12. 16 Feb, 2015 1 commit
    • C-API for ftimings · c9a7e72c
      Lorenz Huedepohl authored
      This is a rather big update: ftimings can now be used via a C API, see
      test/c_test.c for an example of how to use it.
      
      This step also led to a slight overhaul of the Fortran API: there are
      now a number of ..._node member functions of timer_t that can be used
      if one cannot or does not want to specify the node of interest via an
      explicit chain of names. An example:
      
      Instead of
      
        type(timer_t) :: timer
      
        ...
      
        call timer%print("foo", "bar", "baz")
      
      one can now also do
      
        type(timer_t) :: timer
        type(node_t) :: node
      
        ...
      
        node = timer%get_root_node()
        node = node%get_child("foo")
        node = node%get_child("bar")
        node = node%get_child("baz")
      
        call timer%print_node(node)
      
      This construction might sometimes be necessary, e.g. if the hierarchy is
      very dynamic or if the currently provided maximum of six levels in the
      non-_node functions is not sufficient (but think about whether you
      REALLY need more than six levels).
      
      The C side works similarly; there, variadic argument lists even remove
      the restriction on the number of levels. Still, _node functions are
      provided there as well. All C-API symbols are prefixed with "ftimings_"
      in order to avoid name clashes.
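
To illustrate how a variadic argument list lifts the level limit, here is a minimal sketch of a path lookup in C. Everything in it is an assumption made for illustration: `struct node`, `get_child` and `find_node` are hypothetical names, not ftimings symbols, and the convention of ending the name list with a `(char *)NULL` sentinel is a choice of this sketch, not necessarily what the ftimings C API does (see test/c_test.c for the real usage).

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical named tree node, NOT the actual ftimings data structure. */
struct node {
    const char *name;
    struct node *first_child;
    struct node *next_sibling;
};

/* Return the direct child of parent called name, or NULL if there is none. */
static struct node *get_child(struct node *parent, const char *name)
{
    assert(parent != NULL && name != NULL);
    for (struct node *c = parent->first_child; c != NULL; c = c->next_sibling)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}

/* Descend from root along a chain of names of arbitrary length.
 * The chain must be terminated by (char *)NULL (assumed convention).
 * Returns NULL as soon as one level cannot be found. */
struct node *find_node(struct node *root, ...)
{
    va_list ap;
    struct node *cur = root;
    const char *name;

    va_start(ap, root);
    while (cur != NULL && (name = va_arg(ap, char *)) != NULL)
        cur = get_child(cur, name);
    va_end(ap);
    return cur;
}
```

A call such as `find_node(&root, "foo", "bar", "baz", (char *)NULL)` would then walk three levels, and nothing prevents passing a longer chain.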
  13. 02 Feb, 2015 1 commit
    • Measure RAM access with Linux perf API · 49797bea
      Lorenz Huedepohl authored
      There is currently no reliable way to measure RAM accesses with PAPI;
      the previous approach of counting load and store instructions is not
      very useful, as it is unknown how many bytes are transferred by each
      instruction.
      
      On certain CPUs there is a reliable way to measure this via an "uncore"
      performance counter. You can check whether your CPU (and/or Linux
      kernel version) supports this by checking whether the files
      
      	/sys/devices/uncore_imc/events/data_reads
      	/sys/devices/uncore_imc/events/data_writes
      
      exist.
      
      To access these counters from an unprivileged program one has to set
      the "paranoia" level of the perf subsystem to at most 0, adjustable via
      the file
      
      	/proc/sys/kernel/perf_event_paranoid
      
      Along with this change there is a small API/ABI breakage as some keyword
      arguments related to the memory measurement have been renamed/split-up.
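
The two prerequisites described above can be checked programmatically. The following is a minimal sketch using only standard C and POSIX calls (`access`, `fopen`, `fscanf`); the function names are hypothetical and this is not how ftimings itself performs the check.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if both uncore IMC event files exist, 0 otherwise. */
int uncore_imc_available(void)
{
    return access("/sys/devices/uncore_imc/events/data_reads", F_OK) == 0
        && access("/sys/devices/uncore_imc/events/data_writes", F_OK) == 0;
}

/* Reads an integer level from the given file into *level.
 * Returns 0 on success, -1 if the file cannot be opened or parsed;
 * *level is left untouched on failure. */
int read_paranoia_level(const char *path, int *level)
{
    assert(path != NULL && level != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int value;
    int ok = (fscanf(f, "%d", &value) == 1);
    fclose(f);
    if (!ok)
        return -1;
    *level = value;
    return 0;
}

/* Returns 1 if the event files exist and the paranoia level is at most 0. */
int unprivileged_ram_counters_usable(void)
{
    int level;
    if (!uncore_imc_available())
        return 0;
    if (read_paranoia_level("/proc/sys/kernel/perf_event_paranoid", &level) != 0)
        return 0;
    return level <= 0;
}
```

If `unprivileged_ram_counters_usable()` returns 0, either the CPU or kernel does not expose the uncore events, or the paranoia level has to be lowered by an administrator.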
  14. 21 Jul, 2014 3 commits
  15. 18 Jun, 2014 2 commits
  16. 28 May, 2014 2 commits
  17. 13 May, 2014 1 commit
    • Counter for memory bandwidth (loads + stores) · 803a3959
      Lorenz Huedepohl authored
      Additionally, one can now also measure loads and stores, and thus the
      memory bandwidth and, in turn, the arithmetic intensity.
      
      One caveat, though: the user is responsible for providing a meaningful
      value for the number of bytes transferred in one load/store, via the
      "bytes_per_ldsr" parameter of the new function %set_print_options.
      
      So far, I have no way of obtaining this value programmatically, and
      it can and will vary between different sections of a program.
      
      For example, an SSE movapd instruction loads/stores 16 bytes, but is
      still counted as one "load and store" instruction, just like a
      1-byte mov. Feel free to advise me on a better set of machine counters.
      
      Also, somewhat updated documentation.
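
To make the caveat concrete, here is a small sketch of how the derived quantities depend on the assumed bytes per load/store. The function and parameter names are hypothetical and are not part of ftimings; the formulas are simply bytes = count × bytes per instruction, bandwidth = bytes / seconds, and arithmetic intensity = floating point operations / bytes.

```c
#include <assert.h>

/* Estimated bytes moved, given a load/store instruction count and a
 * user-supplied guess for the bytes transferred per instruction. */
double bytes_transferred(long long ldst_count, double bytes_per_ldst)
{
    assert(ldst_count >= 0 && bytes_per_ldst > 0.0);
    return (double)ldst_count * bytes_per_ldst;
}

/* Estimated memory bandwidth in bytes per second. */
double memory_bandwidth(long long ldst_count, double bytes_per_ldst,
                        double seconds)
{
    assert(seconds > 0.0);
    return bytes_transferred(ldst_count, bytes_per_ldst) / seconds;
}

/* Estimated arithmetic intensity in floating point operations per byte. */
double arithmetic_intensity(long long flop_count, long long ldst_count,
                            double bytes_per_ldst)
{
    assert(flop_count >= 0 && ldst_count > 0);
    return (double)flop_count / bytes_transferred(ldst_count, bytes_per_ldst);
}
```

For the same counter readings, assuming 16 bytes per instruction (as for movapd) instead of 1 byte raises the estimated bandwidth by a factor of 16 and lowers the estimated arithmetic intensity by the same factor, which is why the chosen value matters so much.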
  18. 07 May, 2014 1 commit
  19. 06 May, 2014 1 commit
    • Allow the user to activate FLOPS/RAM measurements · 3fe6c3d1
      Lorenz Huedepohl authored
      One can now select which kinds of measurements are taken by calling the
      member functions %measure_flops and %measure_memory of a timer_t
      object. For example:
      
        type(timer_t) :: timer
      
        call timer%measure_flops(.true.)
        call timer%measure_memory(.true.)
      
        call timer%enable()
      
      An explicit ftimings_init() call is no longer necessary; PAPI will
      be initialized on the first %measure_flops(.true.) call.
  20. 05 May, 2014 1 commit
  21. 02 May, 2014 4 commits
  22. 31 Jan, 2014 1 commit
  23. 27 Jan, 2014 2 commits