Diagnostic output
Arepo will not only output the simulation snapshot and reduced data via the halo-finder files, but also a number of (mostly ASCII) diagnostic log files which contain important information about the code performance and runtime behavior.
In practice, to quickly check the performance of large production runs, it is useful to inspect the timebins.txt and cpu.txt files. The former gives information on how many simulation elements are evolved with which timestep sizes, i.e. characteristics of the simulated system; the latter provides information about the computational time spent in each part of the code, which can be influenced to some degree by the values of the code parameters.
For ongoing simulations, these can be checked via
tail -n 30 timebins.txt
tail -n 200 cpu.txt
stdout
The standard output contains general information about the simulation status, and many of the main routines print progress information to it. This output is mainly relevant for reconstructing what the simulation did, which is needed e.g. for debugging purposes.
balance.txt
Output of the fractional CPU time used in each individual step, formatted to be machine-readable (while cpu.txt is more human-readable).
Symbol key
total = '-' / '-'
treegrav = 'a' / ')'
treebuild = 'b' / '('
insert = 'c' / '*'
branches = 'd' / '&'
toplevel = 'e' / '^'
treecostm = 'f' / '%'
treewalk = 'g' / '$'
treewalk1 = 'h' / '#'
treewalk2 = 'i' / '@'
treebalsndrcv = 'j' / '!'
treeback = 'm' / '7'
treedirect = 'r' / '2'
pm_grav = 's' / '1'
ngbtreebuild = 't' / 'Z'
ngbtreevelupdate = 'u' / 'Y'
voronoi = 'v' / 'X'
insert = 'w' / 'W'
findpoints = 'x' / 'V'
cellcheck = 'y' / 'U'
geometry = 'z' / 'T'
exchange = 'A' / 'S'
dynamic = 'B' / 'R'
hydro = 'C' / 'Q'
gradients = 'D' / 'P'
fluxes = 'F' / 'N'
fluxcomm = 'H' / 'L'
updates = 'J' / 'j'
vertex vel = 'K' / 'I'
mhd = '4' / 'p'
domain = 'U' / 'y'
peano = 'V' / 'x'
drift/kicks = 'W' / 'w'
timeline = 'X' / 'v'
treetimesteps = 'Y' / 'u'
i/o = 'Z' / 't'
logs = '1' / 's'
sfrcool = '2' / 'r'
fof = '#' / 'h'
subfind = '$' / 'g'
refine = '%' / 'f'
mesh_derefine = '^' / 'e'
images = '&' / 'd'
initializ. = '*' / 'c'
restart = '(' / 'b'
misc = ')' / 'a'
Example:
Step= 13147 sec= 0.212 Nsync-grv= 5181 Nsync-hyd= 342 dhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhttwwwwwxxzBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBDFFFUUUUUUUUUUUUUUUUUUVVVVWXY2222^^
Step= 13148 sec= 0.001 Nsync-grv= 1 Nsync-hyd= 1 rvvwwwwwwwwwwwwwwwxxxxxxxxxxxxxyyzzzzzABBBDDFFFFFFFFHVVVVWWWYYYY1111111111111111111111111111112222222222222^))))))))))))
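Each character of the symbol string stands for an equal slice of the step's wallclock time, so counting symbols recovers the fractional time per code part. Below is a minimal Python sketch of this idea (not part of Arepo). It assumes the symbol string is the last whitespace-separated token of a step line, uses only a subset of the key above, and maps only the first symbol of each pair; the second symbol of each pair presumably marks the same part in a different state (e.g. imbalance), which is an assumption here.

import collections

# Subset of the symbol key above (first symbol of each pair); extend as needed.
WORK = {'a': 'treegrav', 'v': 'voronoi', 'C': 'hydro', 'U': 'domain',
        'W': 'drift/kicks', 'Z': 'i/o', '2': 'sfrcool', ')': 'misc'}

def decode(step_line):
    """Return the fraction of the step's wallclock time per code part."""
    symbols = step_line.split()[-1]        # symbol string, e.g. 'dhhh...^^'
    counts = collections.Counter(symbols)
    return {WORK.get(s, s): n / len(symbols) for s, n in counts.items()}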
cpu.txt
For each sync-point, such a block is written. This file reports measurements of the different timers built into Arepo. Each computationally expensive operation has a separate timer attached to it, which allows one to closely monitor where the computational time is spent. Some of the timers (e.g. treegrav) have sub-timers for individual operations, denoted by the indentation hierarchy in the first column. The fraction of time spent in different code parts, as well as the absolute amount, is highly problem-dependent. The timers make it possible to identify inefficient parts of the overall algorithm and to concentrate optimization efforts on the most time-consuming parts of the code. There is also the option OUTPUT_CPU_CSV, which returns the same data as a more easily machine-readable cpu.csv file.
The different columns are: name; wallclock time (in s) this step; percentage this step; wallclock time (in s) cumulative; percentage up to this step. A typical block of cpu.txt looks like the following (here a gravity-only, tree-only run):
Step 131, Time: 0.197266, CPUs: 1, MultiDomains: 8, HighestActiveTimeBin: 20
diff cumulative
total 1.03 100.0% 99.22 100.0%
treegrav 1.00 97.6% 96.12 96.9%
treebuild 0.01 0.7% 0.76 0.8%
insert 0.00 0.5% 0.54 0.5%
branches 0.00 0.2% 0.21 0.2%
toplevel 0.00 0.0% 0.00 0.0%
treecostm 0.00 0.3% 0.30 0.3%
treewalk 0.99 96.5% 94.96 95.7%
treewalk1 0.99 96.5% 94.96 95.7%
treewalk2 0.00 0.0% 0.00 0.0%
treebalsndrcv 0.00 0.0% 0.00 0.0%
treeback 0.00 0.0% 0.00 0.0%
treedirect 0.00 0.0% 0.01 0.0%
ngbtreebuild 0.00 0.0% 0.00 0.0%
ngbtreevelupdate 0.00 0.0% 0.01 0.0%
voronoi 0.00 0.0% 0.00 0.0%
insert 0.00 0.0% 0.00 0.0%
findpoints 0.00 0.0% 0.00 0.0%
cellcheck 0.00 0.0% 0.00 0.0%
geometry 0.00 0.0% 0.00 0.0%
exchange 0.00 0.0% 0.00 0.0%
dynamic 0.00 0.0% 0.00 0.0%
hydro 0.00 0.0% 0.01 0.0%
gradients 0.00 0.0% 0.00 0.0%
fluxes 0.00 0.0% 0.00 0.0%
fluxcomm 0.00 0.0% 0.00 0.0%
updates 0.00 0.0% 0.00 0.0%
vertex vel 0.00 0.0% 0.00 0.0%
mhd 0.00 0.0% 0.00 0.0%
domain 0.01 1.4% 1.65 1.7%
peano 0.01 0.5% 0.59 0.6%
drift/kicks 0.00 0.4% 0.74 0.7%
timeline 0.00 0.0% 0.03 0.0%
treetimesteps 0.00 0.0% 0.00 0.0%
i/o 0.00 0.0% 0.03 0.0%
logs 0.00 0.0% 0.03 0.0%
sfrcool 0.00 0.0% 0.00 0.0%
refine 0.00 0.0% 0.00 0.0%
mesh_derefine 0.00 0.0% 0.00 0.0%
images 0.00 0.0% 0.00 0.0%
initializ. 0.00 0.0% 0.00 0.0%
restart 0.00 0.0% 0.00 0.0%
misc 0.00 0.0% 0.02 0.0%
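If OUTPUT_CPU_CSV is not enabled, the plain-text blocks can still be parsed directly. A hedged Python sketch (not part of Arepo; the regular expression is modeled solely on the block format shown above). Splitting the file on lines that start with "Step" yields one block per sync-point.

import re

# One row: name, diff seconds, diff percent, cumulative seconds, cumulative percent.
ROW = re.compile(r'^(\s*)([\w./ -]+?)\s+([\d.]+)\s+([\d.]+)%\s+([\d.]+)\s+([\d.]+)%\s*$')

def parse_cpu_block(lines):
    """Map timer name -> (seconds this step, cumulative seconds).

    The indentation captured in group 1 encodes the sub-timer
    hierarchy, which this sketch ignores.
    """
    timers = {}
    for line in lines:
        m = ROW.match(line)
        if m:
            timers[m.group(2).strip()] = (float(m.group(3)), float(m.group(5)))
    return timers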
domain.txt
The load balance (CPU work and memory), both for gravity and hydro calculations, is reported for each timebin individually at every sync-point. An ideally balanced run has a value of 1; the higher the value, the more imbalanced the simulation.
DOMAIN BALANCE, Sync-Point 13314, Time: 0.997486
Timebins:       Gravity       Hydro  cumulative     grav-balance      hydro-balance
 |bin=18          47268       26570       64920  m  1.000 | 1.000     1.000 | 1.000
 |bin=16          12523        2399       17652  m  1.000 | 1.000     1.000 | 1.000
>|bin=14           5129         331        5129  m  1.000 | 1.000 *   1.000 | 1.000
-------------------------------------------------------------------------------------
BALANCE,  LOAD:  1.000  1.000  1.000  WORK:  1.000  1.000
-------------------------------------------------------------------------------------
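To keep an eye on the worst imbalance over a whole run, it is enough to scan the summary lines. A small Python sketch (not part of Arepo), assuming the "BALANCE, LOAD: ... WORK: ..." lines always follow the layout shown above:

# Scan domain.txt for the largest balance factor reported in any summary line.
worst = 1.0
with open('domain.txt') as f:
    for line in f:
        if line.startswith('BALANCE'):
            values = [float(tok) for tok in line.split() if tok[0].isdigit()]
            worst = max([worst] + values)
print('worst imbalance factor:', worst)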
energy.txt
At intervals in simulation time specified by the parameter TimeBetStatistics, the total energy and its components are computed and written into the file energy.txt. This file also contains the cumulative energy that had to be injected into the system to ensure positivity of the thermal energy. All output is in code units. Note: this only works with up to 6 particle types. The columns are
1. simulation time / scale factor
2. total thermal energy
3. total potential energy
4. total kinetic energy
5. internal energy particle type 0
6. potential energy particle type 0
7. kinetic energy particle type 0
8. internal energy particle type 1
9. potential energy particle type 1
10. kinetic energy particle type 1
11. internal energy particle type 2
12. potential energy particle type 2
13. kinetic energy particle type 2
14. internal energy particle type 3
15. potential energy particle type 3
16. kinetic energy particle type 3
17. internal energy particle type 4
18. potential energy particle type 4
19. kinetic energy particle type 4
20. internal energy particle type 5
21. potential energy particle type 5
22. kinetic energy particle type 5
23. total mass in particle type 0
24. total mass in particle type 1
25. total mass in particle type 2
26. total mass in particle type 3
27. total mass in particle type 4
28. total mass in particle type 5
29. total injected energy due to positivity enforcement of thermal energy
Two example lines:
0.96967 3.29069e+06 0 4.27406e+07 3.29069e+06 0 1.65766e+06 0 0 3.9302e+07 0 0 0 0 0 0 0 0 1.78097e+06 0 0 0 503.489 3047.89 0 0 65.5756 0 7.71477
0.978903 3.28666e+06 0 4.18631e+07 3.28666e+06 0 1.67505e+06 0 0 3.84203e+07 0 0 0 0 0 0 0 0 1.76774e+06 0 0 0 503.306 3047.89 0 0 65.7586 0 7.71477
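Since the file is a plain table, it can be loaded directly with numpy. A short sketch of a conservation check (not part of Arepo) under the 29-column layout above; note that in comoving/cosmological integrations the total energy is not expected to be conserved:

import numpy as np

data = np.loadtxt('energy.txt')                  # one row per TimeBetStatistics interval
time = data[:, 0]
e_total = data[:, 1] + data[:, 2] + data[:, 3]   # thermal + potential + kinetic
drift = np.abs(e_total - e_total[0]) / np.abs(e_total[0])
print('max relative energy drift:', drift.max())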
info.txt
At every sync-point, the timebin, time, timestep, and number of active particles are written into this file, e.g.
Sync-Point 13327, TimeBin=16, Time: 0.999408, Redshift: 0.000592464, Systemstep: 0.000147974, Dloga: 0.000148072, Nsync-grv: 17679, Nsync-hyd: 2739
memory.txt
Arepo uses its own internal memory manager. This means that one large chunk of memory is reserved for Arepo at startup (its size is specified by the parameter MaxMemSize), and the allocation of individual arrays is then handled internally from this pool. This was introduced to avoid memory fragmentation during runtime on some machines, but also to provide detailed information about how much memory Arepo actually needs and to terminate the run if this exceeds a pre-defined threshold. memory.txt reports this internal memory usage, i.e. how much memory the simulation actually needs.
MEMORY:  Largest Allocation = 816.742 Mbyte  |  Largest Allocation Without Generic = 132.938 Mbyte

-------------------------- Allocated Memory Blocks---- ( Step     0 )------------------
Task    Nr  F                    Variable     MBytes   Cumulative  Function|File|Linenumber
------------------------------------------------------------------------------------------
   0     0  0                  Parameters     0.0040       0.0040  read_parameter_file()|src/io/parameters.c|648
   0     1  0             ParametersValue     0.0157       0.0197  read_parameter_file()|src/io/parameters.c|649
   0     2  0               ParamtersType     0.0001       0.0198  read_parameter_file()|src/io/parameters.c|650
   0     3  0                   IO_Fields     0.0052       0.0250  init_field()|src/io/io.c|130
   0     4  0                       RateT     0.1527       0.1777  InitCool()|src/cooling/cooling.c|794
   0     5  0                      PhotoT     0.0058       0.1835  ReadIonizeParams()|src/cooling/cooling.c|691
   0     6  0                slab_to_task     0.0005       0.1840  my_slab_based_fft_init()|src/gravity/pm/pm_mpi_fft.c|101
   0     7  0            slabs_x_per_task     0.0001       0.1840  my_slab_based_fft_init()|src/gravity/pm/pm_mpi_fft.c|116
   0     8  0        first_slab_x_of_task     0.0001       0.1841  my_slab_based_fft_init()|src/gravity/pm/pm_mpi_fft.c|119
   0     9  0            slabs_y_per_task     0.0001       0.1841  my_slab_based_fft_init()|src/gravity/pm/pm_mpi_fft.c|122
   0    10  0        first_slab_y_of_task     0.0001       0.1842  my_slab_based_fft_init()|src/gravity/pm/pm_mpi_fft.c|125
   0    11  0                  Exportflag     0.0001       0.1843  allocate_memory()|src/utils/allocate.c|46
   0    12  0                 Exportindex     0.0001       0.1843  allocate_memory()|src/utils/allocate.c|47
   0    13  0             Exportnodecount     0.0001       0.1844  allocate_memory()|src/utils/allocate.c|48
   0    14  0                        Send     0.0001       0.1844  allocate_memory()|src/utils/allocate.c|50
   0    15  0                        Recv     0.0001       0.1845  allocate_memory()|src/utils/allocate.c|51
   0    16  0               TasksThatSend     0.0001       0.1846  allocate_memory()|src/utils/allocate.c|53
   0    17  0               TasksThatRecv     0.0001       0.1846  allocate_memory()|src/utils/allocate.c|54
   0    18  0                  Send_count     0.0001       0.1847  allocate_memory()|src/utils/allocate.c|56
   0    19  0                 Send_offset     0.0001       0.1848  allocate_memory()|src/utils/allocate.c|57
   0    20  0                  Recv_count     0.0001       0.1848  allocate_memory()|src/utils/allocate.c|58
   0    21  0                 Recv_offset     0.0001       0.1849  allocate_memory()|src/utils/allocate.c|59
   0    22  0            Send_count_nodes     0.0001       0.1849  allocate_memory()|src/utils/allocate.c|61
   0    23  0           Send_offset_nodes     0.0001       0.1850  allocate_memory()|src/utils/allocate.c|62
   0    24  0            Recv_count_nodes     0.0001       0.1851  allocate_memory()|src/utils/allocate.c|63
   0    25  0           Recv_offset_nodes     0.0001       0.1851  allocate_memory()|src/utils/allocate.c|64
   0    26  0             Mesh_Send_count     0.0001       0.1852  allocate_memory()|src/utils/allocate.c|66
   0    27  0            Mesh_Send_offset     0.0001       0.1852  allocate_memory()|src/utils/allocate.c|67
   0    28  0             Mesh_Recv_count     0.0001       0.1853  allocate_memory()|src/utils/allocate.c|68
   0    29  0            Mesh_Recv_offset     0.0001       0.1854  allocate_memory()|src/utils/allocate.c|69
   0    30  0            Force_Send_count     0.0001       0.1854  allocate_memory()|src/utils/allocate.c|71
   0    31  0           Force_Send_offset     0.0001       0.1855  allocate_memory()|src/utils/allocate.c|72
   0    32  0            Force_Recv_count     0.0001       0.1855  allocate_memory()|src/utils/allocate.c|73
   0    33  0           Force_Recv_offset     0.0001       0.1856  allocate_memory()|src/utils/allocate.c|74
   0    34  1                           P    11.2500      11.4356  allocate_memory()|src/utils/allocate.c|77
   0    35  1                        SphP    15.3125      26.7481  allocate_memory()|src/utils/allocate.c|80
   0    36  1     NextActiveParticleHydro     0.1562      26.9044  timebins_allocate()|src/time_integration/timestep.c|686
   0    37  1          NextInTimeBinHydro     0.1562      27.0606  timebins_allocate()|src/time_integration/timestep.c|689
   0    38  1          PrevInTimeBinHydro     0.1562      27.2169  timebins_allocate()|src/time_integration/timestep.c|692
   0    39  1   NextActiveParticleGravity     0.3125      27.5294  timebins_allocate()|src/time_integration/timestep.c|686
   0    40  1        NextInTimeBinGravity     0.3125      27.8419  timebins_allocate()|src/time_integration/timestep.c|689
   0    41  1        PrevInTimeBinGravity     0.3125      28.1544  timebins_allocate()|src/time_integration/timestep.c|692
   0    42  1                          DC    14.0267      42.1811  voronoi_init_connectivity()|src/mesh/voronoi/voronoi_dynamic_update|709
   0    43  1             DomainStartList     0.0001      42.1812  domain_allocate()|src/domain/domain.c|386
   0    44  1               DomainEndList     0.0001      42.1812  domain_allocate()|src/domain/domain.c|387
   0    45  1      DomainFirstLocTopleave     0.0001      42.1813  domain_allocate()|src/domain/domain.c|388
   0    46  1        DomainNLocalTopleave     0.0001      42.1813  domain_allocate()|src/domain/domain.c|389
   0    47  1                    TopNodes     0.0020      42.1833  domain_allocate()|src/domain/domain.c|390
   0    48  1                  DomainTask     0.0001      42.1835  domain_allocate()|src/domain/domain.c|391
   0    49  1  DomainListOfLocalTopleaves     0.0001      42.1836  domain_Decomposition()|src/domain/domain.c|144
   0    50  1         Ngb_DomainNodeIndex     0.0001      42.1837  ngb_treeallocate()|src/ngbtree/ngbtree.c|1346
   0    51  1                   Ngb_Nodes     3.0735      45.2573  ngb_treeallocate()|src/ngbtree/ngbtree.c|1348
   0    52  1                ExtNgb_Nodes     0.7684      46.0257  ngb_treeallocate()|src/ngbtree/ngbtree.c|1352
   0    53  1                Ngb_Nextnode     0.1564      46.1821  ngb_treeallocate()|src/ngbtree/ngbtree.c|1355
   0    54  1                  Ngb_Father     0.1562      46.3383  ngb_treeallocate()|src/ngbtree/ngbtree.c|1356
   0    55  1                  Ngb_Marker     0.2661      46.6044  ngb_treeallocate()|src/ngbtree/ngbtree.c|1358
   0    56  1                          VF    16.0329      62.6373  initialize_and_create_first_tetra()|src/mesh/voronoi/voronoi_3d.c|143
   0    57  1                          DP     4.4672      67.1045  initialize_and_create_first_tetra()|src/mesh/voronoi/voronoi_3d.c|145
   0    58  1                          DT    17.6054      84.7099  initialize_and_create_first_tetra()|src/mesh/voronoi/voronoi_3d.c|148
   0    59  1                 ListExports     0.6153      85.3252  create_mesh()|src/mesh/voronoi/voronoi.c|169
   0    60  1                 List_InMesh     0.1250      85.4502  create_mesh()|src/mesh/voronoi/voronoi.c|172
   0    61  1                      List_P     0.2500      85.7002  create_mesh()|src/mesh/voronoi/voronoi.c|174
   0    62  1                         DTC    11.7369      97.4371  create_mesh()|src/mesh/voronoi/voronoi.c|176
   0    63  1                    PrimExch     0.0001      97.4372  mesh_setup_exchange()|src/mesh/voronoi/voronoi_exchange.c|193
   0    64  1                    GradExch     0.0001      97.4373  mesh_setup_exchange()|src/mesh/voronoi/voronoi_exchange.c|194
   0    65  0                 Sndpm_count     0.0001      97.4373  pmforce_uniform_optimized_prepare_densi()|src/gravity/pm/pm_periodic.c|733
   0    66  0                Sndpm_offset     0.0001      97.4374  pmforce_uniform_optimized_prepare_densi()|src/gravity/pm/pm_periodic.c|734
   0    67  0                 Rcvpm_count     0.0001      97.4374  pmforce_uniform_optimized_prepare_densi()|src/gravity/pm/pm_periodic.c|735
   0    68  0                Rcvpm_offset     0.0001      97.4375  pmforce_uniform_optimized_prepare_densi()|src/gravity/pm/pm_periodic.c|736
   0    69  0                      partin     2.0000      99.4375  pmforce_uniform_optimized_prepare_densi()|src/gravity/pm/pm_periodic.c|880
   0    70  0                     rhogrid    16.2500     115.6875  pmforce_uniform_optimized_prepare_densi()|src/gravity/pm/pm_periodic.c|1038
   0    71  0                   forcegrid    16.2500     131.9375  pmforce_periodic()|src/gravity/pm/pm_periodic.c|1597
   0    72  0                     flistin     0.5000     132.4375  pmforce_uniform_optimized_readout_force()|src/gravity/pm/pm_periodic.c|1328
   0    73  0                    flistout     0.5000     132.9375  pmforce_uniform_optimized_readout_force()|src/gravity/pm/pm_periodic.c|1329
   0    74  0                      scount     0.0001     132.9376  myMPI_Alltoallv()|src/mpi_utils/myalltoall.c|67
   0    75  0                      rcount     0.0001     132.9376  myMPI_Alltoallv()|src/mpi_utils/myalltoall.c|68
   0    76  0                        soff     0.0001     132.9377  myMPI_Alltoallv()|src/mpi_utils/myalltoall.c|69
   0    77  0                        roff     0.0001     132.9377  myMPI_Alltoallv()|src/mpi_utils/myalltoall.c|70
------------------------------------------------------------------------------------------
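The allocation table can be mined for the dominant memory consumers. A Python sketch (not part of Arepo) that relies only on the seven-column row layout shown above and collects the rows of all blocks in the file:

rows = []
with open('memory.txt') as f:
    for line in f:
        parts = line.split()
        # Data rows: task, nr, flag, variable, MBytes, cumulative, function|file|line
        if len(parts) == 7 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((float(parts[4]), parts[3], parts[6]))

for mbytes, name, origin in sorted(rows, reverse=True)[:5]:
    print(f'{mbytes:10.4f} MB  {name:<26}  {origin}')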
sfr.txt
If USE_SFR is active, Arepo will create a sfr.txt file, which reports the stars created in every call of the star-formation routine.
The individual columns are:
- time (code units or scale factor)
- total stellar mass to be formed in the timestep prior to stochastic sampling (code units),
- instantaneous star formation rate of all cells (Msun/yr),
- instantaneous star formation rate of active cells (Msun/yr),
- total mass in stars formed in this timestep (after sampling) (code units),
- cumulative stellar mass formed (code units).
Example:
4.373019e-01 9.714635e-03 1.100743e+02 1.405136e+02 2.280941e-02 2.752464e+01
4.373667e-01 4.007455e-04 1.104648e+02 5.795346e+00 0.000000e+00 2.752464e+01
4.374315e-01 2.009357e-02 1.104276e+02 2.905270e+02 0.000000e+00 2.752464e+01
4.374962e-01 3.904148e-04 1.103389e+02 5.643836e+00 0.000000e+00 2.752464e+01
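As with energy.txt, the file is a plain table and numpy can read it directly; a sketch (not part of Arepo) assuming the six-column layout listed above:

import numpy as np

t, m_pre, sfr_all, sfr_active, m_formed, m_cum = np.loadtxt('sfr.txt', unpack=True)
print('final cumulative stellar mass (code units):', m_cum[-1])
print('peak instantaneous SFR of all cells (Msun/yr):', sfr_all.max())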
timebins.txt
Arepo is optimized for time-integrating both the hydrodynamical and the gravitational interactions on the largest possible timestep permitted by the timestep criterion and by the binary hierarchy of timesteps. For each timestep, a linked list of the particles on this particular integration step exists, and their statistics are reported in timebins.txt. In this file, the number of gas cells and collisionless particles in each timebin (i.e. integration timestep) is reported for each sync-point, as well as the CPU time and the fraction of the total cost contributed by each timebin. A typical block looks like
Sync-Point 2658, Time: 0.307419, Redshift: 2.25289, Systemstep: 9.1027e-05, Dloga: 0.000296144
Occupied timebins: gravity      hydro               dt    cumul-grav  cumul-sph A D   avg-time  cpu-frac
    bin=19          57159      30466  0.004738310805          65419      32200           3.04    100.0%
    bin=17           6854       1222  0.001184577701           8260       1734           0.36      1.0%
    bin=16           1221        369  0.000592288851           1406        512           0.19      0.0%
 X  bin=15            185        143  0.000296144425            185        143 <         0.02      0.3%
               ------------------------
Total active:         185        143
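For an ongoing run, the most recent block can be extracted and the bin lines matched with a regular expression. A Python sketch (not part of Arepo) based only on the block layout shown above; the exact columns may differ with compile-time options:

import re

BIN = re.compile(r'bin=\s*(\d+)\s+(\d+)\s+(\d+)\s+([\d.eE+-]+)')

block = []
with open('timebins.txt') as f:
    for line in f:
        if line.startswith('Sync-Point'):
            block = []                     # keep only the most recent sync-point
        block.append(line)

for line in block:
    m = BIN.search(line)
    if m:
        print(f'bin={m.group(1):>2}  gravity={m.group(2):>8}  '
              f'hydro={m.group(3):>8}  dt={m.group(4)}')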
timings.txt
The performance of the gravitational tree algorithm is reported in timings.txt for each sync-point. An example of a single sync-point looks like the following:
Step(*): 372, t: 0.0455302, dt: 0.000215226, highest active timebin: 19 (lowest active: 19, highest occupied: 19)
Nf= 65536 timebin=19 total-Nf=24510464
work-load balance: 1 (1 0), rel1to2: 1
number of iterations: max=1, exported fraction: 0
part/sec: raw=127058, effective=127058 ia/part: avg=32.2464
maximum number of nodes: 31091, filled: 0.677246
avg times: all=0.519064 tree1=0.515797 tree2=0 commwait=1.00136e-05 sec
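The "work-load balance" entries provide a quick measure of how well the tree walk parallelizes. A sketch (not part of Arepo) that averages them over all sync-points, matching only the line format shown above:

import re

balance = []
with open('timings.txt') as f:
    for line in f:
        m = re.search(r'work-load balance:\s*([\d.eE+-]+)', line)
        if m:
            balance.append(float(m.group(1)))

if balance:
    print(f'mean tree work-load balance: {sum(balance) / len(balance):.3f}')
    print(f'worst: {max(balance):.3f}')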