Update diagnosticfiles authored by Rainer Weinberger
...@@ -21,18 +21,18 @@ Table of contents
# Diagnostic output
Arepo outputs not only the simulation snapshots and reduced data via
the halo-finder files, but also a number of (mostly ASCII) diagnostic
log files, which contain important information about the code
performance and runtime behavior.
In practice, to quickly check the performance of large production
runs, it is useful to inspect the ``timebins.txt`` and ``cpu.txt``
files. The former reports how many simulation elements are evolved
with which timestep sizes, i.e. characteristics of the simulated
system; the latter reports the computational time spent in each part
of the code, which can be influenced to some degree by the values of
the code parameters.
For ongoing simulations, these can be checked via
...@@ -43,18 +43,18 @@ For ongoing simulations, these can be checked via
stdout
======
The standard output contains general information about the simulation
status, and many of the main routines print general information to it.
The output itself is mainly relevant for reconstructing what the
simulation did, which is needed e.g. for debugging purposes.
balance.txt
===========
Output of the fractional cpu time used in each individual step,
optimized to be machine readable (while ``cpu.txt`` is more human
readable).
Symbol key:
total = '-' / '-'
treegrav = 'a' / ')'
...@@ -116,21 +116,24 @@ example:
cpu.txt
=======
One block is written for each sync-point. This file reports
measurements of the different timers built into Arepo. Each
computationally expensive operation has a different timer attached to
it, which makes it possible to closely monitor where the computational
time is spent. Some of the timers (e.g. treegrav) have sub-timers for
individual operations. This is denoted by the indentation hierarchy
in the first column. The fraction of time spent in different code
parts, as well as the absolute amount, is highly problem dependent.
The timers make it possible to identify inefficient parts of the
overall algorithm and concentrate on the most time-consuming parts of
the code. There is also the option ``OUTPUT_CPU_CSV``, which returns
the same data as a more easily machine-readable ``cpu.csv`` file.

The different columns are: name; wallclock time (in s) this step;
percentage this step; wallclock time (in s) cumulative; percentage up
to this step. A typical block of cpu.txt looks like the following
(here a gravity-only, tree-only run):
Step 131, Time: 0.197266, CPUs: 1, MultiDomains: 8, HighestActiveTimeBin: 20
...@@ -183,10 +186,10 @@ the following (here a gravity-only, tree-only run):
domain.txt
==========
The load balance (cpu work and memory) for both gravity and hydro
calculations is reported for each timebin individually, at every
sync-point. Ideally balanced runs have a value of 1; the higher the
value, the more imbalanced the simulation.
DOMAIN BALANCE, Sync-Point 13314, Time: 0.997486
Timebins: Gravity Hydro cumulative grav-balance
...@@ -208,11 +211,11 @@ energy.txt
==========
In specified intervals (in simulation time, specified by the parameter
`TimeBetStatistics`) the total energy and its components are computed
and written into the file `energy.txt`. This file also contains the
cumulative energy that had to be injected into the system to ensure
positivity of the thermal energy. All output is in code units. Note:
this only works with up to 6 particle types. The columns are
1. simulation time / scale factor
2. total thermal energy
...@@ -245,7 +248,7 @@ The columns are
29. total injected energy due to positivity enforcement of thermal energy
Two example lines:
0.96967 3.29069e+06 0 4.27406e+07 3.29069e+06 0 1.65766e+06 0 0 3.9302e+07 0 0 0 0 0 0 0 0 1.78097e+06 0 0 0 503.489 3047.89 0 0 65.5756
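As a minimal sketch of how these columns can be read back, the Python snippet below picks columns 1, 2 and 29 out of a line of `energy.txt`. The names `ENERGY_COLUMNS`, `parse_energy_line` and `read_energy` are hypothetical and not part of Arepo. The snippet assumes whitespace-separated numeric columns and uses only the column meanings listed above.

```python
# Hypothetical helper names; column numbers are 1-based as in the list above.
ENERGY_COLUMNS = {
    "time": 1,              # simulation time / scale factor
    "thermal_energy": 2,    # total thermal energy
    "injected_energy": 29,  # energy injected by positivity enforcement
}


def parse_energy_line(line, columns=ENERGY_COLUMNS):
    """Return the selected columns of one energy.txt line as floats."""
    values = [float(token) for token in line.split()]
    needed = max(columns.values())
    if len(values) < needed:
        raise ValueError(
            "expected at least %d columns, got %d" % (needed, len(values))
        )
    # Convert the 1-based column number to a 0-based list index.
    return {name: values[number - 1] for name, number in columns.items()}


def read_energy(path="energy.txt"):
    """Parse every non-empty line of the file at `path`."""
    with open(path) as handle:
        return [parse_energy_line(line) for line in handle if line.strip()]
```

Plotting `injected_energy` against `time` is one way to see whether the positivity enforcement is triggered often during a run.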
...@@ -257,8 +260,8 @@ Two example lines
info.txt
========
At every sync-point, the time-bins, time, timestep and number of
active particles are written into this file, e.g.
Sync-Point 13327, TimeBin=16, Time: 0.999408, Redshift: 0.000592464,
Systemstep: 0.000147974, Dloga: 0.000148072, Nsync-grv: 17679,
...@@ -267,14 +270,15 @@ are written into this file, e.g.
memory.txt
==========
Arepo uses its own internal memory manager. This means that one large
chunk of memory is reserved initially for Arepo (specified by the
parameter `MaxMemSize`), and the allocation for individual arrays is
then handled internally from this pool. The reason for introducing
this was to avoid memory fragmentation during runtime on some
machines, but also to have detailed information about how much memory
Arepo actually needs and to terminate if this exceeds a pre-defined
threshold. ``memory.txt`` reports this internal memory usage, and how
much memory is actually needed by the simulation.
MEMORY: Largest Allocation = 816.742 Mbyte | Largest Allocation Without Generic = 132.938 Mbyte
...@@ -462,13 +466,15 @@ is actually needed by the simulation.
sfr.txt
=======
In case ``USE_SFR`` is active, Arepo will create a ``sfr.txt`` file,
which reports the stars created in every call of the star-formation
routine.

The individual columns are:
* time (code units or scale factor)
* total stellar mass to be formed in the timestep prior to stochastic
  sampling (code units),
* instantaneous star formation rate of all cells (Msun/yr),
* instantaneous star formation rate of active cells (Msun/yr),
* total mass in stars formed in this timestep (after sampling) (code units),
...@@ -490,13 +496,15 @@ timebins.txt
============
Arepo is optimized for time-integrating both hydrodynamical and
gravitational interactions on the largest possible timestep that is
allowed by the timestep criterion and by the binary hierarchy of
timesteps. For each timestep, a linked list of particles on this
particular integration step exists, and their statistics are reported
in `timebins.txt`. In this file, the number of gas cells and
collisionless particles in each timebin (i.e. integration timestep) is
reported for each sync-point, as well as the cpu time and the fraction
of the total cost contributed by each timebin. A typical block looks
like
Sync-Point 2658, Time: 0.307419, Redshift: 2.25289, Systemstep: 9.1027e-05, Dloga: 0.000296144
...@@ -513,12 +521,13 @@ as the cpu time and fraction spent on each timebin. A typical bock looks like
------------------------
Total active: 185 143
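
The first line of each block (the sync-point header) is a comma-separated list of key/value pairs. The Python snippet below is a minimal sketch of splitting such a line, assuming the separators seen in the example headers of `timebins.txt` and `info.txt` (`:`, `=` or a single space). `parse_sync_header` is a hypothetical helper, not an Arepo tool.

```python
import re


def _to_number(text):
    """Convert a token to int if possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_sync_header(line):
    """Split a sync-point header line into a dict of numeric fields."""
    fields = {}
    for part in line.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        match = re.match(r"^(.*?)\s*[:=]\s*(\S+)$", part)
        if match:
            key, value = match.group(1), match.group(2)
        else:
            # e.g. "Sync-Point 2658": key and value separated by a space
            key, _, value = part.rpartition(" ")
        fields[key] = _to_number(value)
    return fields


header = ("Sync-Point 2658, Time: 0.307419, Redshift: 2.25289, "
          "Systemstep: 9.1027e-05, Dloga: 0.000296144")
print(parse_sync_header(header))
```

Collecting `Systemstep` over many sync-points gives a quick view of how the smallest occupied timestep evolves during a run.
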
timings.txt
===========
The performance of the gravitational tree algorithm is reported in
`timings.txt` for each sync-point. An example of a single sync-point
looks like the following:
Step(*): 372, t: 0.0455302, dt: 0.000215226, highest active timebin: 19 (lowest active: 19, highest occupied: 19)
...@@ -529,3 +538,5 @@ the following:
maximum number of nodes: 31091, filled: 0.677246
avg times: all=0.519064 tree1=0.515797 tree2=0 commwait=1.00136e-05 sec
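
The `avg times` line can be read back with a few lines of Python. This is a minimal sketch that assumes the `name=value` layout of the example line above. `parse_avg_times` is a hypothetical helper, and reading `tree1` as a share of `all` is an assumption based only on the field names.

```python
import re


def parse_avg_times(line):
    """Return the name=value pairs of an 'avg times:' line as floats."""
    if not line.strip().startswith("avg times:"):
        raise ValueError("not an 'avg times' line")
    # Each field looks like name=number; the trailing 'sec' has no '='.
    return {name: float(value) for name, value in re.findall(r"(\w+)=(\S+)", line)}


example = "avg times: all=0.519064 tree1=0.515797 tree2=0 commwait=1.00136e-05 sec"
times = parse_avg_times(example)

# Assumed interpretation: share of the total time spent in 'tree1'.
tree1_share = times["tree1"] / times["all"]
print(times, tree1_share)
```
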