TurTLE issues
https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues
2017-06-23T13:27:05Z

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/11
simplified wrapper
2017-06-23T13:27:05Z
Cristian Lalescu

Generating code piece by piece from the Python wrapper is not easy to deal with the first time you see it done.
I think I should write a cleaner version of the wrapper.
I'm doing this in branch feature/new-wrapper.
In the interest of cleaner code to discuss/diagnose/optimise during the scaling workshop (issue #10), I may need to do this sooner rather than later.
Broadly speaking:
1. write an abstract class "direct_numerical_simulation", which will contain all the code that's currently written as strings in "_code.py".
2. write children of "direct_numerical_simulation" that will contain the code written as strings in "_fluid_solver.py" and "NSVorticityEquation.py".
3. write minimal child of "_code.py", which will generate a small main file, initial conditions and hdf5 stats datasets, and then compile and launch.

Cristian Lalescu

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/10
LRZ scaling tests
2017-06-23T13:27:05Z
Cristian Lalescu

We need to set up initial conditions and parameters for the test runs.
Copy paste of e-mail from Nicolay J. Hammer:
> the LRZ Scaling Workshop 2017 is approaching. As time during the workshop will be precious and a considerable amount of resources (SuperMUC as well as support staff) have been reserved exclusively for this workshop, it is crucial that you can proceed with your work as efficiently as possible. In order to ensure this, we suggest that you prepare the following items:
>
> 1. a simulation which runs on 1 node, within 2-10 mins. (this allows you to run one node profiling in reasonable time)
> 2. a small simulation which can be run from 16 to 128 nodes within 2-10 mins. (this allows you to run MPI profiling and change the code accordingly)
> 3. a medium simulation which can be run from 128 nodes to 1 island (512 nodes) which runs in a short time (maximum 10-20 mins.)
> 4. a large simulation, with our help during the workshop, which scale your code from 1 island to 4 islands and beyond.
>
> We plan to have a short introduction round in the beginning of the workshop. Therefore, we ask you to prepare a small presentation (max. 15 mins.) which introduces your application and its background. The introduction will take place on Monday afternoon (see below).
Comment: we can run the code with and without particles.
With particles, the normal way to run the code would be to also sample fields at the particle positions (as well as the particle locations themselves) reasonably often.
Sampling fields should be discussed in issue #9.
Question: do we need to introduce the sampling functionality for the scaling workshop?
As far as I can tell, this is not fundamentally different from just writing the positions of the particles, or the `rhs` values, plus the interpolation step.
However, @bbramas reports that the I/O is very expensive, and I'm not sure what would happen if the HDF5 file is accessed many more times.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/9
sampling of fields at particle locations
2017-06-23T13:27:05Z
Cristian Lalescu

This is a "feature request" issue.
Desired functionality:
1. user computes arbitrary quantity from field information, at grid points --- result is stored as a generic `field` object.
2. user would like to sample the values of this field at particle locations, and store the results in a given HDF5 group, in a dataset with a given name.
Desired solution:
A template function that
1. takes as parameters:
- `f`, a pointer to a `field<rnumber, FFTW, fc>`
- `p`, a pointer to a `particles_system<double>`
- `gid`, an `hid_t` identifying an HDF5 group
- `fname`, a `std::string`
2. performs the interpolation
3. creates/overwrites the dataset `fname + std::string("/") + std::to_string(p->step_idx)` in the group `gid`
4. writes the result of the interpolation into the dataset

2017-05-05

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/8
test runs on supermuc
2017-06-23T13:27:05Z
Cristian Lalescu

We need to set up the test runs for supermuc.
1. go through cluster description, figure out environment setup.
2. get "compile_library" to work.
3. see what modifications are necessary to work with the queueing system.
4. get small and medium jobs to run.

Berenger Bramas

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/7
NavierStokes vs NSVorticityEquation
2017-06-23T13:27:05Z
Cristian Lalescu

We're using tests/test_vorticity_equation.py to compare the "old code" with the "new code".
Status:
1. library compiles, and both codes seem to run using 1 thread per process on 'tolima' and 'chichi-P' (my desktop machine, running some openSUSE version, and my laptop, running some Debian rolling release; I can look up details if needed), as well as on the machine that @bbramas is using.
2. NSVE crashes on 'tolima' when using 2 threads per process; it seems to work on the other machines.
3. trajectories obtained with the two codes look different: [trajectories.pdf](/uploads/0956509bf40a237edc0dd33ef5bcc490/trajectories.pdf)
I'd say (3) takes priority over everything else, since in the past problems in the algorithm led to segfaults and other crash-causing behavior.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/6
HDF5 tuning
2017-06-23T13:27:05Z
Cristian Lalescu

At the moment, the various files are generated from Python.
Could someone with experience in HDF5 please have a look at this page http://docs.h5py.org/en/latest/high/file.html, at the "Version Bounding" section?
My impression is that with bfps it would be perfectly safe to use 'latest', but I'm not sure how that would work on the different clusters.
Also, I'm not sure what the different visualisation tools can handle.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/5
FFT scaling
2017-06-23T13:27:05Z
Cristian Lalescu

The code needs to be fast at FFTs, so this issue is addressed to FFTs.
Here are preliminary scaling results for the code itself, obtained for the 1536^3 test cases [scaling.pdf](/uploads/390114e10896cf74040e2fcd9953e5cf/scaling.pdf).
For this discussion, only the "ftest" line is relevant.
My interpretation is that the direct FFTW approach scales quite reasonably.
Procedure for plot:
1) take snapshot from 1536^3 DNS, run four different DNS for 64 time steps with this snapshot as initial condition:
- "ftest": only run Navier Stokes solver
- "ptest-1e5": same as "ftest", but add 10^5 particles, with sampling at every timestep.
- "ptest-2e7": same as "ftest", but add 2 x 10^7 particles, with sampling at every timestep.
- "ptest-2e7-lessiO": same as "ftest", but add 2 x 10^7 particles, with sampling at every 16 timesteps.
2) The jobs are run using 128, 192, 256, 384 and 512 MPI processes on draco, always using a number of nodes at full capacity.
In fact, only "ftest" can run on 512 processes, since the particle code can't run if fewer than 4 z slices are allocated to each MPI process (1536/512 = 3).
3) Afterwards, read the overall execution time from the output file for each process, average over all processes for each run, and plot as a function of the number of processes.
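Step 3 could be sketched as follows. This is a minimal illustration and not the code from `timing_analyzer.py`: the function names are hypothetical, and it assumes that each process reports its overall execution time on a line of the form `[TIMING-<rank>] @code::main_start = <seconds>s`, which may not match the actual output files.

```python
import re
from statistics import mean

# Assumed line format, e.g. "[TIMING-255] @code::main_start = 2519.59s".
TIMING_LINE = re.compile(
    r"\[TIMING-(\d+)\]\s+@code::main_start\s+=\s+([0-9.eE+-]+)s")


def mean_execution_time(log_text):
    """Average the overall execution time over all ranks found in log_text."""
    per_rank = {int(rank): float(seconds)
                for rank, seconds in TIMING_LINE.findall(log_text)}
    if not per_rank:
        raise ValueError("no overall timing lines found")
    return mean(list(per_rank.values()))


def scaling_table(logs_by_nprocs):
    """Turn {number of processes: log text} into sorted (nprocs, mean time) pairs."""
    return sorted((nprocs, mean_execution_time(text))
                  for nprocs, text in logs_by_nprocs.items())
```

Plotting the second column of `scaling_table(...)` against the first would then give one line of the scaling plot.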
To generate the plot, I am using the file
https://gitlab.mpcdf.mpg.de/clalescu/bfps_addons/blob/develop/tests/timing_analyzer.py
It's currently set up to work with my peculiar file structure, but I trust the "check_scaling" function is clearly enough defined.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/4
new fluid solver
2017-06-23T13:27:05Z
Cristian Lalescu

Task: rewrite algorithm from `fluid_solver.cpp` in `vorticity_equation.cpp`, but using the `field` class.
Desirable result: `vorticity_equation.cpp` should not know anything about FFTW or other possible backends.
We need a clear procedure to decide whether or not this is a reasonable goal with respect to efficiency/scalability.
@bbramas, I will ask for help with this "procedure" after I address the first three immediate tasks below.
Branch: `feature/new-solver`.
Immediate tasks as of commit 2253a7926065822e762ba548e53e49db81456d9b:
- test script `tests/test_vorticity_equation.py` should check that the results obtained with the two different codes are identical to within numerical precision.
- timings should be placed throughout the `vorticity_equation` methods.
- the test code should also be able to handle particles; this can be achieved by simply passing the `get_rdata()` pointers instead of `fs->rvelocity`.
- figure out how to perform statistics with new code --- choose between direct translation of old statistics methods to the new class, or moving some functionality to the python class.
- the `vorticity_equation` code should be able to use either binary I/O or HDF5 I/O (at the moment it's just binary).
- the `field` class and related classes (`field_layout` and `kspace`) should be made consistent with the `fftw_interface.hpp` header.
- the `field.?pp` files should be split into three.
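The first task in this list, checking that the two codes agree to within numerical precision, could be sketched as below. The helper names and the default tolerance `rtol=1e-5` are assumptions chosen for illustration, not values taken from `tests/test_vorticity_equation.py`.

```python
def max_relative_difference(a, b):
    """Largest |a_i - b_i|, scaled by the largest magnitude in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    scale = max(max(map(abs, a), default=0.0),
                max(map(abs, b), default=0.0))
    if scale == 0.0:
        # Both sequences are empty or identically zero.
        return 0.0
    return max(abs(x - y) for x, y in zip(a, b)) / scale


def results_agree(a, b, rtol=1e-5):
    """True if the two result sequences match to within the relative tolerance."""
    return max_relative_difference(a, b) <= rtol
```

Scaling by the largest magnitude, rather than element by element, avoids spurious failures for entries that are close to zero.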
More general things to consider:
- some of the functionality from a solver might be better off as a method of the `field` class (for instance the computation of curls).
In particular the `field.?pp` files define a method for computing the gradient of a `field` object into another `field` object; what are the pros and cons of such an approach?
- particle code should work directly with the `field` class (such that scalar/tensorial fields can be interpolated directly).
There's an old `feature/field-interpolator` branch that addresses this, and there's also the `hack/tensor-interpolation` branch that needs to be cleaned up.
@bbramas, please let me know if you can think of anything I missed in this plan.

Cristian Lalescu

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/3
Timing results mixed between processes
2017-06-23T13:27:05Z
Berenger Bramas

Sometimes the output from different CPUs is mixed up.
The timing output should be in order, but sometimes it is not, even though we are using right-to-left blocking communication:
```cpp
void show(const MPI_Comm inComm) const {
    int myRank, nbProcess;
    int retMpi = MPI_Comm_rank(inComm, &myRank);
    assert(retMpi == MPI_SUCCESS);
    retMpi = MPI_Comm_size(inComm, &nbProcess);
    assert(retMpi == MPI_SUCCESS);

    // Only serialize the output when writing to a shared terminal stream.
    const bool orderedOutput = (&outputStream == &std::cout || &outputStream == &std::clog);

    if (orderedOutput && myRank != nbProcess - 1) {
        // Print in reverse order: wait until rank myRank+1 has finished printing.
        char tmp;
        MPI_Recv(&tmp, 1, MPI_BYTE, myRank + 1, 99, inComm, MPI_STATUS_IGNORE);
    }
    std::stack<std::pair<int, const std::shared_ptr<CoreEvent>>> events;
    for (int idx = static_cast<int>(root->getChildren().size()) - 1; idx >= 0; --idx) {
        events.push({0, root->getChildren()[idx]});
    }
    outputStream << "[TIMING-" << myRank << "] Local times.\n";
    outputStream << "[TIMING-" << myRank << "] :" << root->getName() << "\n";
    // output here
    outputStream.flush();
    if (orderedOutput && myRank != 0) {
        // Signal rank myRank-1 that it may print now.
        char tmp = 0;
        MPI_Send(&tmp, 1, MPI_BYTE, myRank - 1, 99, inComm);
    }
}
```

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/2
Performance Discussion
2017-06-23T13:27:05Z
Berenger Bramas

The first results will be posted here, to start the discussion and decide on a direction.
## N1536_kMeta2 Using 256 CPUs
### Config
```bash
./scaling_tests.py --ncpu 256 --src-simname N1536_kMeta2 --src-iteration 8192 --src-wd /hydra/ptmp/khr/bfps/database/ --fluid --p1e5 --p2e7
version 1.7.post45+g6385432.dirty
```
### IO (write)
Note that some parts have been removed.
The last process gives us:
```
[TIMING-255] @code::main_start = 2519.59s
[TIMING-255] @fluid_solver::read = 44.0502s
[TIMING-255] @fluid_solver::compute_velocity = 13.3542s (Min = 0.125834s ; Max = 0.13815s ; Average = 0.134891s ; Occurrence = 99)
[TIMING-255] @fluid_solver::ift_velocity = 125.755s (Min = 1.72731s ; Max = 10.7683s ; Average = 1.90539s ; Occurrence = 66)
[TIMING-255] @particles_io_base::particles_io_base = 2.94317s
[TIMING-255] @rFFTW_distributed_particles::rFFTW_distributed_particles = 0.047429s
[TIMING-255] @rFFTW_distributed_particles::read = 20.0988s
[TIMING-255] @field::compute_stats = 216.914s (Min = 3.08093s ; Max = 3.86351s ; Average = 3.28658s ; Occurrence = 66)
[TIMING-255] @field::cospectrum = 10.5377s (Min = 0.158073s ; Max = 0.180153s ; Average = 0.159663s ; Occurrence = 66)
[TIMING-255] @field::ift = 107.045s (Min = 1.58284s ; Max = 1.73614s ; Average = 1.6219s ; Occurrence = 66)
[TIMING-255] @field::compute_rspace_stats = 99.3301s (Min = 1.30216s ; Max = 1.95447s ; Average = 1.505s ; Occurrence = 66)
[TIMING-255] @MPI_Bcast = 10.2841s (Min = 0.134494s ; Max = 0.247863s ; Average = 0.155819s ; Occurrence = 66)
[TIMING-255] @FIELD_RLOOP = 69.8258s (Min = 1.02012s ; Max = 1.0942s ; Average = 1.05797s ; Occurrence = 66)
[TIMING-255] @MPI_Allreduce = 19.2172s (Min = 0.0614824s ; Max = 0.783903s ; Average = 0.29117s ; Occurrence = 66)
[TIMING-255] @rFFTW_distributed_particles::write2 = 248.374s (Min = 6.31219s ; Max = 33.165s ; Average = 7.30512s ; Occurrence = 34)
[TIMING-255] @MPI_Allreduce = 219.672s (Min = 1.2993e-05s ; Max = 0.345064s ; Average = 6.46095e-05s ; Occurrence = 3400000)
[TIMING-255] @std::copy = 0.00545869s (Min = 2.1e-07s ; Max = 4.514e-06s ; Average = 4.31585e-07s ; Occurrence = 12648)
[TIMING-255] @write_rhs = 26.3912s (Min = 0.000231546s ; Max = 0.164712s ; Average = 0.000263912s ; Occurrence = 100000)
[TIMING-255] @fluid_solver::compute_Lagrangian_acceleration = 376.551s (Min = 11.3228s ; Max = 11.5214s ; Average = 11.4106s ; Occurrence = 33)
[TIMING-255] @fluid_solver::compute_Lagrangian_acceleration = 296.386s (Min = 8.90616s ; Max = 9.05127s ; Average = 8.98139s ; Occurrence = 33)
[TIMING-255] @fluid_solver::compute_velocity = 7.7664s (Min = 0.0877993s ; Max = 0.153871s ; Average = 0.117673s ; Occurrence = 66)
[TIMING-255] @fluid_solver::ift_velocity = 64.1085s (Min = 1.90793s ; Max = 1.95632s ; Average = 1.94268s ; Occurrence = 33)
[TIMING-255] @fluid_solver::compute_pressure = 213.898s (Min = 6.44201s ; Max = 6.55077s ; Average = 6.48176s ; Occurrence = 33)
[TIMING-255] @fluid_solver_base::clean_up_real_space = 0.145752s (Min = 0.00114871s ; Max = 0.00354554s ; Average = 0.00220837s ; Occurrence = 66)
[TIMING-255] @fluid_solver_base::dealias = 72.9218s (Min = 1.09773s ; Max = 1.12227s ; Average = 1.10488s ; Occurrence = 66)
[TIMING-255] @rFFTW_distributed_particles::write = 432.386s (Min = 6.30417s ; Max = 7.74553s ; Average = 6.55131s ; Occurrence = 66)
[TIMING-255] @MPI_Allreduce = 428.053s (Min = 1.54e-05s ; Max = 1.05938s ; Average = 6.48565e-05s ; Occurrence = 6600000)
[TIMING-255] @fluid_solver::step = 758.14s (Min = 23.6557s ; Max = 23.7339s ; Average = 23.6919s ; Occurrence = 32)
[TIMING-255] @fluid_solver::omega_nonlin = 724.025s (Min = 6.99106s ; Max = 7.86799s ; Average = 7.54193s ; Occurrence = 96)
[TIMING-255] @fluid_solver::compute_velocity = 11.5398s (Min = 0.105677s ; Max = 0.136042s ; Average = 0.120206s ; Occurrence = 96)
[TIMING-255] @fluid_solver::omega_nonlin::fftw = 405.633s (Min = 3.66764s ; Max = 4.54959s ; Average = 4.22534s ; Occurrence = 96)
[TIMING-255] @fluid_solver::omega_nonlin::RLOOP = 7.43167s (Min = 0.0762171s ; Max = 0.0804303s ; Average = 0.0774132s ; Occurrence = 96)
[TIMING-255] @fluid_solver::omega_nonlin::fftw-2 = 170.379s (Min = 1.76048s ; Max = 1.78986s ; Average = 1.77478s ; Occurrence = 96)
[TIMING-255] @fluid_solver_base::dealias = 106.051s (Min = 1.10247s ; Max = 1.12183s ; Average = 1.1047s ; Occurrence = 96)
[TIMING-255] @fluid_solver::omega_nonlin::CLOOP = 7.8091s (Min = 0.0796459s ; Max = 0.0867639s ; Average = 0.0813448s ; Occurrence = 96)
[TIMING-255] @fluid_solver::omega_nonlin::add_forcing = 0.000439908s (Min = 3.099e-06s ; Max = 2.6225e-05s ; Average = 4.58237e-06s ; Occurrence = 96)
[TIMING-255] @fluid_solver::write = 44.8473s
```
First, from this result it appears that the FFTW calls take a large part of the execution time.
But here I would like to look at the I/O.
So we have:
```
[TIMING-255] @rFFTW_distributed_particles::write2 = 248.374s (Min = 6.31219s ; Max = 33.165s ; Average = 7.30512s ; Occurrence = 34)
[TIMING-255] @MPI_Allreduce = 219.672s (Min = 1.2993e-05s ; Max = 0.345064s ; Average = 6.46095e-05s ; Occurrence = 3400000)
[TIMING-255] @std::copy = 0.00545869s (Min = 2.1e-07s ; Max = 4.514e-06s ; Average = 4.31585e-07s ; Occurrence = 12648)
[TIMING-255] @write_rhs = 26.3912s (Min = 0.000231546s ; Max = 0.164712s ; Average = 0.000263912s ; Occurrence = 100000)
[TIMING-255] @rFFTW_distributed_particles::write = 432.386s (Min = 6.30417s ; Max = 7.74553s ; Average = 6.55131s ; Occurrence = 66)
[TIMING-255] @MPI_Allreduce = 428.053s (Min = 1.54e-05s ; Max = 1.05938s ; Average = 6.48565e-05s ; Occurrence = 6600000)
```
We can compare this output to that of the first process, because these write functions should behave differently on the root process than on the others.
```
[TIMING-0] @rFFTW_distributed_particles::write2 = 248.148s (Min = 6.3105s ; Max = 33.165s ; Average = 7.29846s ; Occurrence = 34)
[TIMING-0] @MPI_Allreduce = 55.3869s (Min = 1.5148e-05s ; Max = 0.00178043s ; Average = 1.62903e-05s ; Occurrence = 3400000)
[TIMING-0] @write_state_chunk = 158.072s (Min = 4.1521e-05s ; Max = 0.345009s ; Average = 4.64919e-05s ; Occurrence = 3400000)
[TIMING-0] @particles_io_base::write_state_chunk = 154.6s (Min = 4.0557e-05s ; Max = 0.345007s ; Average = 4.54707e-05s ; Occurrence = 3400000)
[TIMING-0] @std::copy = 0.00690804s (Min = 2.6e-07s ; Max = 4.824e-06s ; Average = 5.43256e-07s ; Occurrence = 12716)
[TIMING-0] @write_rhs = 26.5682s (Min = 0.000247992s ; Max = 0.184024s ; Average = 0.000265682s ; Occurrence = 100000)
[TIMING-0] @rFFTW_distributed_particles::write = 432.388s (Min = 6.30419s ; Max = 7.74556s ; Average = 6.55134s ; Occurrence = 66)
[TIMING-0] @MPI_Allreduce = 107.475s (Min = 1.5153e-05s ; Max = 0.00127309s ; Average = 1.62841e-05s ; Occurrence = 6600000)
[TIMING-0] @write_point3D_chunk = 310.065s (Min = 4.1581e-05s ; Max = 1.05933s ; Average = 4.69796e-05s ; Occurrence = 6600000)
```
So it appears that MPI_Allreduce is called a very large number of times (6600000 times here).
Looking at process 0, the calls themselves are not that expensive; but since the write operation on process 0 is slow, the other processes end up waiting for it in MPI_Allreduce.
So the problem here is that all the processes must synchronize 6600000 times, while process 0 writes between each pair of calls.
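A quick arithmetic check of this reading, using only numbers copied from the timing output above for `rFFTW_distributed_particles::write` (the variable names are just labels for this sketch):

```python
# Values copied from the timing output above.
allreduce_rank0 = 107.475      # s, [TIMING-0]   @MPI_Allreduce
allreduce_rank255 = 428.053    # s, [TIMING-255] @MPI_Allreduce
write_chunk_rank0 = 310.065    # s, [TIMING-0]   @write_point3D_chunk
write_calls = 66               # Occurrence of rFFTW_distributed_particles::write
allreduce_calls = 6_600_000    # Occurrence of MPI_Allreduce inside write

# Extra time the last rank spends inside MPI_Allreduce compared to rank 0.
extra_wait = allreduce_rank255 - allreduce_rank0
# Number of reductions issued per call to write.
calls_per_write = allreduce_calls // write_calls

print(f"extra time in MPI_Allreduce on rank 255: {extra_wait:.3f}s")
print(f"time in write_point3D_chunk on rank 0:   {write_chunk_rank0:.3f}s")
print(f"MPI_Allreduce calls per write:           {calls_per_write}")
```

The extra time on rank 255 (about 320 s) is of the same order as the time rank 0 spends in `write_point3D_chunk` (about 310 s), which is consistent with the other processes waiting for rank 0. The 100000 reductions per call to `write` would match one reduction per particle in the `--p1e5` case, although that last link is an inference and is not stated in the output.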
### Merged time
We also have this information from the output:
```
[MPI-TIMING] @MPI_Allreduce
[MPI-TIMING] Stack => rFFTW_distributed_particles::write << code::main_start << BFPS <<
[MPI-TIMING] Done by 256 processes
[MPI-TIMING] Total time for all 110289s (average per process 430.817s)
[MPI-TIMING] Min time for a process 107.475s Max time for a process 436.467s
[MPI-TIMING] The same call has been done 1683000107 times by all process (duration min 9.959e-06s max 1.05939s avg 6.55313e-05s)
[MPI-TIMING] @MPI_Allreduce
[MPI-TIMING] Stack => rFFTW_distributed_particles::write2 << code::main_start << BFPS <<
[MPI-TIMING] Done by 256 processes
[MPI-TIMING] Total time for all 56085s (average per process 219.082s)
[MPI-TIMING] Min time for a process 55.3869s Max time for a process 219.854s
[MPI-TIMING] The same call has been done 867000055 times by all process (duration min 1.0033e-05s max 0.345071s avg 6.46886e-05s)
```
### Question
Improving the I/O could reduce the writing time for P0, improve the synchronization, and thus might reduce the time of the complete operation.
But is it really necessary to merge the results on P0? Why not have parallel output?

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/1
particle checkpoints vs sampling
2017-06-23T13:27:05Z
Cristian Lalescu

Low priority.
Particle "state" and "rhs" data is required for continuing DNS. It should be saved in double precision.
Sampled fields come from single precision computations, so it makes sense to save them in single precision.
Also full particle trajectories only need to be saved in single precision as far as postprocessing is concerned, since the quantities we're interested in will not be affected by lack of precision.
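The trade-off between the two precisions can be illustrated with the standard library alone. This is only a sketch: the sample value is arbitrary and the helper name is hypothetical, and it says nothing about how the HDF5 datatypes are actually set up in the code.

```python
import struct


def roundtrip_single(x):
    """Store a double precision value as IEEE 754 single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


position = 3.141592653589793   # arbitrary sample value, e.g. one particle coordinate
stored = roundtrip_single(position)
relative_error = abs(stored - position) / abs(position)

print(struct.calcsize("<d"), "bytes per value in double precision")
print(struct.calcsize("<f"), "bytes per value in single precision")
print(f"relative error after the single precision round trip: {relative_error:.1e}")
```

Single precision halves the storage per value, at the cost of a relative rounding error that is bounded by 2^-24 for values in the normal range. That is harmless for postprocessing, but a reason to keep checkpoint data in double precision.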
What needs to happen is:
- modify HDF5 datatype used in _particles.h5 file for sampled fields.
- modify existing method into "checkpoint", and generate an adequate method for "sampling states" (i.e. write same data to different datasets).

Cristian Lalescu

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/17
optimize sampling HDF5 access
2017-10-18T14:49:08Z
Cristian Lalescu

Each call to `sample_from_particles_system` opens the particle file, writes, and then closes the particle file.
As far as I know, this means flushing the buffer, so we lose any "small data writes" optimization that HDF5 can supply.
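The two access patterns can be contrasted without HDF5. The sketch below uses plain Python text files as a stand-in, purely to keep the example self-contained; the class names are hypothetical, and it illustrates the usage pattern only, not HDF5's actual buffering behaviour.

```python
import os
import tempfile


class ReopeningSampler:
    """Opens, writes to, and closes the file on every sample."""

    def __init__(self, path):
        self.path = path
        self.open_count = 0

    def sample(self, line):
        # Leaving the "with" block closes the file, which flushes the buffer.
        with open(self.path, "a") as f:
            self.open_count += 1
            f.write(line + "\n")


class PersistentSampler:
    """Opens the file once and keeps it open until close() is called."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a")
        self.open_count = 1

    def sample(self, line):
        # Data may stay in the buffer until it fills up or the file is closed.
        self.file.write(line + "\n")

    def close(self):
        self.file.close()


with tempfile.TemporaryDirectory() as tmp:
    reopening = ReopeningSampler(os.path.join(tmp, "reopening.txt"))
    persistent = PersistentSampler(os.path.join(tmp, "persistent.txt"))
    for step in range(3):
        reopening.sample(f"step {step}")
        persistent.sample(f"step {step}")
    persistent.close()
    print(reopening.open_count, persistent.open_count)
```

Both samplers produce the same file contents; the difference is only in how often the file is opened and flushed.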
Related: see https://gitlab.mpcdf.mpg.de/clalescu/bfps_addons/blob/feature/new-multiscale-particles/bfps_addons/cpp/full_code/multi_scale_particles.cpp.
That file shows the common usage pattern for the sampling functionality; that is what needs to be optimized.
All of this applies to `sample_particles_system_position` as well, obviously.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/18
review of alternate sampling I/O
2017-10-20T12:35:51Z
Cristian Lalescu

In b3e45e2003ec1441428932f7edcd066831d90a2c I made some changes to the `particles_output_sampling_hdf5` class (and related code), such that when I do need to sample particle data, the particle file stays open.
My understanding is that HDF5 is much more efficient if a file is kept open, since the buffer is only flushed when full.
@bbramas, can you please review my changes and confirm that I didn't do anything stupid (in particular with the temporary data arrays, and the particle positions)?
In principle look at the latest commit in the "feature/efficient-particle-sampler" branch, since I may tweak it soon.

Berenger Bramas

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/16
particle sample output arrays have wrong shape
2017-11-02T08:36:43Z
Cristian Lalescu

The output datasets of `sample_from_particles_system` and `sample_particles_system_position` should have the same shape as the corresponding datasets from the checkpoint file.
At the moment they have an extra dimension of size 1.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/20
we need temporal interpolation of fields for particles
2019-01-30T16:45:53Z
Cristian Lalescu

The time scale of interacting particles is much smaller than the time scale of the fields, so we need a mechanism to perform several particle integration steps per field integration step.
This means we need to interpolate the fields in time as well as in space.
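One simple option, shown only as a sketch under the assumption that linear interpolation between the field at the start and at the end of a field step is accurate enough, is the following. The function names are hypothetical, and the approach actually adopted may differ.

```python
def interpolate_in_time(f_start, f_end, fraction):
    """Linearly blend two samples of a field, taken at the start and end of a field step.

    fraction = 0 returns the values at the start, fraction = 1 the values at the end.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return [(1.0 - fraction) * a + fraction * b
            for a, b in zip(f_start, f_end)]


def particle_substeps(f_start, f_end, nsubsteps):
    """Yield the field values seen at the beginning of each particle substep."""
    for k in range(nsubsteps):
        yield interpolate_in_time(f_start, f_end, k / nsubsteps)
```

Here `f_start` and `f_end` stand for field values that have already been interpolated in space to the particle positions; whether the time interpolation should happen before or after the spatial one is a separate design question.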
This needs to be addressed in today's meeting, and then we'll set down a way forward in further comments here.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/13
particle clouds
2019-02-17T05:44:47Z
Cristian Lalescu

We should be able to do particle I/O using arbitrarily shaped HDF5 arrays.
The shape of the initial condition should be used for all subsequent checkpoints/samples etc.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/24
possible memory leak
2019-03-01T15:03:59Z
Cristian Lalescu

memory usage on cobra increases over time.
in particular the output of
"sstat -j job_id -o maxrss"
increases over time.
this may be related to #23.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/25
compilation error on intel 18
2019-03-06T11:58:01Z
Cristian Lalescu

In brief: code does not compile with intel 18 or later.
Exact error:
```
bfps/cpp/particles/particles_field_computer.hpp(80): error: expression must have a constant value
constexpr int nb_components_in_field = nbcomp(field);
^
bfps/cpp/particles/particles_field_computer.hpp(80): note: the value of parameter "field" (declared at line 76) cannot be used as a constant
constexpr int nb_components_in_field = nbcomp(field);
^
detected during:
instantiation of "void particles_field_computer<partsize_t, real_number, interpolator_class, interp_neighbours>::apply_computation<field_class,size_particle_positions,size_particle_rhs>(const field_class &, const real_number *, real_number *, partsize_t) const [with partsize_t=long long, real_number=double, interpolator_class=particles_generic_interp<double, 1, 0>, interp_neighbours=1, field_class=field<float, FFTW, THREE>, size_particle_positions=3, size_particle_rhs=3]" at line 387
of "bfps/cpp/particles/particles_distr_mpi.hpp"
```
First reported by LRZ support, reproduced on cobra with intel compiler.
Configuration files to reproduce are attached.
Steps to reproduce:
1. clone repository, change directory
2. switch to develop branch
3. execute `python setup.py compile_library`
4. extremely long error message will be given at `test_interpolation.cpp`, originating in `particles_field_computer.hpp`, as detailed above.
[host_information.py](/uploads/12e4013e28dfbf8a3dbbfefc993c1b77/host_information.py)
[bashrc](/uploads/3c06788edf12ebc9b4d0a856d6bf2aee/bashrc)
[machine_settings.py](/uploads/fbcda844f15d8fdf261f0d7e94d8d076/machine_settings.py)

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/27
fix stopping mechanism
2019-06-27T11:37:07Z
Cristian Lalescu

launcher should check if `stop_<simname>` file exists in working directory, and it should throw an error if it does.

https://gitlab.mpcdf.mpg.de/TurTLE/turtle/-/issues/31
handle custom cmake requirements
2019-10-11T09:02:39Z
Cristian Lalescu

At line 233 of `TurTLE/_code.py` there is a small block of code meant to handle custom cmake configurations for libraries needed by custom executables.
In particular this is used by "strain_vort_alignment" from `turtle_addons` right now.
I assume having an "exec_name_extra_cmake.txt" file in the current directory is not the most reasonable thing to do.
We should see how this problem is handled elsewhere, and choose a reasonable solution for TurTLE.