diff --git a/README.rst b/README.rst new file mode 100644 index 0000000000000000000000000000000000000000..5c3892effb2502b323be67fa0e67351a228f3b6d --- /dev/null +++ b/README.rst @@ -0,0 +1,323 @@ +.. sectnum:: + +.. contents:: + + +Introduction +=============== + +Very often applications need some simple and cheap timing information, both for internal purposes, such as cleanly shutting down (checkpointing) before batch system time limits are encountered, or for profiling the various parts of the code. For this, we developed a simple library at the `MPCDF <https://www.mpcdf.mpg.de>`_ called ftimings, that is primarily aimed at Fortran-based codes. It is similar in spirit to the venerable perflib in that it uses explicit calls placed by the user in the code to instrument sections of interest. [#]_ It always measures wall-clock time, and calls from within OpenMP parallel regions are only considered for the first thread. + +The ftimings library provides a simple object-oriented Fortran interface, and can also be used from a C/C++ code or from C/C++ code sections of a Fortran program. A specific feature of ``ftimings`` is that timing sections can be nested and will be presented consistently as a tree in the output. + +Availability +============ +ftimings is open-source and is already available as a module on many of the clusters as well as on the Draco and Cobra installations at the MPCDF. It provides a `pkg-config <https://www.freedesktop.org/wiki/Software/pkg-config/>`_ file with the usual ``--libs`` and ``--cflags`` arguments, as well as the Fortran specific flags in ``–variables=fcflags``. For codes that are already equipped with perflib instrumentation, there is also a wrapper library that provides a compatible API of most of perflib's interface. There are additional pkg-config files (e.g. ftimings_perflib-1-gcc-8) that provide the necessary linker arguments to link with that shim layer. [#]_ + + +Quick start +=========== +************* +Prerequisites +************* + +**Supported compilers**: + +**Source code**: The source code and documentation, also for the C API, can be found in the libraries repository, publicly available at our `gitlab instance <https://gitlab.mpcdf.mpg.de/mpcdf/ftimings/>`_. + +**Compiling the ftimings**: + + +********************************* +Adding ftimings to a Fortran code +********************************* + +========================================== +Add the necessary parts to the source code +========================================== + +.. code-block:: fortran + + use ftimings + type(timer_t) :: timer + call timer%enable() + call timer%start("section1") + call timer%stop("section1") + call timer%print("section1") + call timer%free() + +========================================== +Compile the code and link with the library +========================================== + +.. code-block:: bash + + #> module load gcc/8 ftimings + #> LDFLAGS="$(pkg-config ftimings-1-gcc-8 --libs) -Wl,-rpath=$FTIMINGS_HOME/lib" + #> FCFLAGS=$(pkg-config ftimings-1-gcc-8 --variable=fcflags) + #> gfortran $FCFLAGS foo.F90 -o foo $LDFLAGS + +=== +Run +=== +Finally, run your program as usual. The timing information will be printed to the standard output when the ``timer%print`` is called. + + +******************************* +Adding ftimings to a C/C++ code +******************************* + +========================================== +Add the necessary parts to the source code +========================================== + +a) add the ftimings headers to code + +.. code-block:: C + + #include <ftimings.h> + +For C++ codes, this header must be included in an extern "C" block to unmangle the function names. + +.. code-block:: C++ + + extern "C" { + #include <ftimings.h> + } + +b) instantiate a ``ftimer_t`` object at the top of the main function or wherever you want to start timing + +.. code-block:: C + + ftimer_t *timer; + timer = ftimings_create(); + +c) register ftimings error handler with timer name + +.. code-block:: C + + const char *timername = "mytimer"; + ftimings_register_error_handler(timer, error_handler, (void*) timername); + +d) enable the timer + +.. code-block:: C + + ftimings_enable(timer); + +e) start/stop the timer: wrap the sections of your code for timing + +.. code-block:: C + + ftimings_start(timer, "section1"); + […] + ftimings_stop(timer, "section1"); + +f) print the measured times + +.. code-block:: C + + ftimings_print(timer, 0.0); + +g) destroy the ``ftimer_t`` object + +.. code-block:: C + + ftimings_destroy(timer); + + +========================================== +Compile the code and link with the library +========================================== + +.. code-block:: bash + + #> module load intel/18.0.5 ftimings + #> LDFLAGS="$(pkg-config ftimings-1-intel-18.0.5 --libs) -Wl,-rpath=$FTIMINGS_HOME/lib" + #> CFLAGS=$(pkg-config ftimings-1-intel-18.0.5 --variable=cflags) + #> icpc $CFLAGS foo.cpp -o foo $LDFLAGS + +=== +Run +=== +Finally, run your program as usual. The timing information will be printed to the standard output when the ``ftimings_print()`` is called. + + +Examples +======== +*********** +Basic usage +*********** + +======= +FORTRAN +======= +A very minimal mock-up of a code could be instrumented like this: + +.. code-block:: fortran + + program phi + use ftimings + type(timer_t) :: timer + [...] + + call timer%enable() + + call timer%start('init') + call some_init() + call timer%stop('init') + + call timer%start('main-loop') + do i = 1, 4 + call timer%start('a') + call a() + call timer%stop('a') + + call timer%start('b') + call b() + call timer%stop('b') + + call c() + end do + call timer%stop('main-loop') + + call timer%print() + end program + + +The call ``timer%print()`` statement will produce an output such as this: + +:: + + #> ./a.out + /= Group [s] fraction + | ============ ============ + |_ [Root] (running) 18.002003 1.000 + |_ (own) 0.000250 0.000 + |_ init 2.000139 0.111 + |_ main-loop 16.001608 0.889 + |_ (own) 4.000575 0.250 + |_ a (4x) 4.000535 0.250 + |_ b (4x) 8.000498 0.500 + +As you can see, the information is stored in a tree, and nested ``timer%start()`` calls appear as child nodes of their encompassing sections. This, of course, necessitates that the user does not close an outer section before all started inner sections have been closed. The user is responsible for ensuring that this is the case. However, the library detects such misuse and disables the affected timer_t object after printing a warning message. + +Every node is labeled with the argument given to the ``%start()``/``%stop()`` subroutines and by default lists two values, its time duration and the fraction this particular duration is relative to its parent node. It is also possible to record various other resource data, such as allocated memory or FLOP counts (on hardware that supports it). +The time values for multiple identical ``%start()``/``%stop()`` pairs in the same encompassing section are accumulated together and increase a counter value, visible in parenthesis after the node names a and b. The mean time duration for a single call of the b subroutine would thus be about 2 seconds. + +The measurements are constructed such that all child nodes on the same level exactly add up to the value of their parent node, sections of the code that are not enclosed by timing sections are put into a special node (own). Here, this represents the loop overhead and the time spent in the c subroutine. + +===== +C/C++ +===== + +The same example in C/C++ language can be written as: + +.. code-block:: C + + #include <ftimings.h> + + int main(){ + ftimer_t *timer; + const char *timername = "mytimer"; + timer = ftimings_create(); + ftimings_register_error_handler(timer, error_handler, (void*) timername); + [...] + ftimings_enable(timer); + ftimings_start(timer, "init"); + some_init(); + ftimings_stop(timer, "init"); + ftimings_start(timer, "main-loop"); + for(int i = 1; i <= 4; ++i){ + ftimings_start(timer, "a"); + a(); + ftimings_stop(timer, "a"); + ftimings_start(timer, "b"); + b(); + ftimings_stop(timer, "b"); + c(); + } + ftimings_stop(timer, "main-loop");% + + ftimings_print(timer, 0.0); + + } + +More examples are available in the ``./examples`` directory. + +Other features +============== + +***************** +Run-time querying +***************** + +In addition to this profiling functionality, the library also allows access these values from within the program while it is running. This can, for example, be used to schedule the stop of the program after a certain duration, or to abort it should a certain operation take an abnormal amount of time. + +For this, the procedure ``%since()`` is available which returns the amount of seconds since an unclosed ``%start()`` call has been made with the supplied label. Example: + +.. code-block:: fortran + + print *, 'It is ', timer%since('main-loop'), ' seconds ago since start(''main-loop'') has been called' + +There is also the %get() method to query already closed sections: + +.. code-block:: fortran + + call timer%start('init') + call some_init() + call timer%stop('init') + print *, 'some_init() took', timer%get('init'), ' seconds' + +******************** +Hierarchical queries +******************** +To query child nodes, additional arguments with their labels should be supplied. For example, to get the time spent in all those b calls, use + +.. code-block:: fortran + + print *, 'All the b() calls took', timer%get('main-loop', 'b'), ' seconds' + +The same scheme can be used for the ``%since()`` method. Additionally, there are functions to sum up all the time spent in any descendant nodes with a given name. Suppose you enclosed all communication parts of your code with ``start``/``%stop('comm')`` calls, you can query the total amount of time spent in those with the ``%in_entries()`` method: + +.. code-block:: fortran + + print *, 'Spent', timer%in_entries('comm'), ' seconds in communication' + +In order to consider only those sections below a certain parent, additional arguments should be provided before the label name to be queried, + +.. code-block:: fortran + + print *, 'Spent', timer%in_entries('main-loop', 'b', 'comm'), ' seconds in comm sections in b()' + +******* +Sorting +******* + +By default, arguments are inserted into the tree in the order in which their ``%start()`` calls were done. To get a better overview in very large trees covering a whole simulation code, it is often better to sort the resulting tree levels. For this, there is the ``%sort()`` method that sorts the internal tree structure. Note that the original order is then lost. + +********** +Thresholds +********** + +Additionally, sometimes one is not overly concerned with many sections that cover only a short time duration. It is possible to exclude those by passing the optional argument threshold to the ``%print()`` method of ``timer_t``. Then, all child sections that took less than that threshold are subsumed under a single entry node (below threshold) and are not shown individually. That way the printed tree is still consistent in that all child nodes' values sum up to their parent's total, but unimportant nodes can be hidden. + + +Overheads +========= +Every first call to ``timer%start()`` at a certain point in the tree allocates a small amount of memory. If the timer is no longer needed, all that memory can be freed again with the ``%free()`` method. + +Additionally, timer instances start and can be ``%disabled()``, in which case almost all operations return immediately without (as much of) the overhead that would be necessary when doing the actual time measurements. That said, the actual overhead when the timer is enabled should be in the order of thousands of cycles per ``start%()``/``%stop()`` pair. + +Due to this small overhead it is of course never advised to instrument completely down to the innermost loops. On a per-function level, though, the overhead should be small enough in most cases – otherwise your functions are problematically small anyway. It is of course possible to create multiple ``timer_t`` objects, that instrument the code in different granularity, and which do not need to be enabled all the time. + +License and Attribution +======================= +Copyright 2014 Lorenz Hüdepohl + +This project is licensed under the GNU Lesser General Public License v3.0. Full text of the license is available `here <https://gitlab.mpcdf.mpg.de/loh/ftimings/blob/master/COPYING.LESSER>`_. + +------------ + +.. [#] Note that the focus of perflib is slightly different from ftimings', perflib has less overhead and strives to exclude that overhead from the resulting numbers. ftimings' focus is to provide hierarchical/nested time information. +.. [#] Note the peculiar name, ftimings-1-gcc-8, of the pkg-config file. This signifies the API version (1) of ftimings as well as the Fortran compiler used to build it – since Fortran modules are compiler dependent, unfortunately this is necessary. \ No newline at end of file